Speaker 0
0:00 – 1:35
Civic Tech Chat is a monthly podcast about the civic technology movement. We seek to harness the power technology has to improve the delivery of public services to people everywhere. This month's episode will cover the topic of open data in more depth. To get an idea of what open data is at a high level, please take a look at our breakdown session covering open data with a brief explanation of the policy. Let's get into a brief history of open data in Chicago before we hop into our interview. The initiative began in May 2010, with the Daley administration adding Freedom of Information Act logs to the City of Chicago website. In 2011, the initiative became part of Mayor Emanuel's transition plan, eventually becoming its own site. The portal started to include items like budgetary data, building permit information, location-based data, and lobbyist information, among other items. The data portal became formal policy as of December 2012 through the use of an executive order, which called for the appointment of a chief data officer. Lastly, in February 2013, the city of Chicago joined GitHub, allowing for the use of open source projects. There's a great deal to discuss regarding the present state of open data in the city. To get into that, we'll be welcoming Tom Schenk, Chicago's chief data officer, so that he can share more about that. Thank you, Tom, for the time out of your day to be on our program. If you could, for the listener, introduce yourself, tell us a bit about your story and what you do.
Speaker 1
1:35 – 1:54
Yeah. My name is Tom Schenk. I'm the chief data officer for the city of Chicago. And my role every day is to think about how we can strategically use data across the city to improve the quality of life for Chicago's residents, the people who live here, the people who visit here, and also to improve the efficiency of our operations in the city.
Speaker 0
1:55 – 2:05
Oh, well, that sounds like a rather large job. In exploring that a bit, what would you describe as your personal why for your involvement in the field and that role?
Speaker 1
2:06 – 3:22
So for me, I've always thought that the next way to progress within government is to apply more data to it. Governments have certainly been using data for a long period of time. Data was, in fact, in our Constitution early on, around conducting the census, because people knew that data was very important. And as data has become more cheaply commoditized and it's easier to do analysis with it, it's only going to become more and more ingrained within government. So there's a real opportunity, a real greenfield of how we can use data to improve quality of life, especially with recent methods, methods that have been developed in the last few decades around advanced statistics, which really allow ways of using statistics that haven't been done before. Going beyond just counting individuals, going beyond just averages of things, counting number of people and programs, but combining that data and being able to maybe predict what's going to happen and be able to get ahead of the problem, to be able to have a more nuanced understanding of what's happened within our cities, within our government. I've always wanted to take those tools and apply them to day-to-day life, day-to-day operations within government to, again, improve the quality of life for people who live here and just make government a better operation.
Speaker 0
3:23 – 3:34
I can definitely hear the passion for this in the way you describe that. Additionally, with your role, how would you describe a day in the life of a chief data officer?
Speaker 1
3:35 – 4:36
So it's a mixture of things. Sometimes it's sitting in meetings and working with different teams and different departments and trying to coordinate those. There's certainly that. There's even engineering, coding and programming. We create new platforms. We create new programs. We do statistics. And so it's spending that time actually engineering a solution and trying to figure out a solution that works. Not a lot of people have done this work before, so we're trying to figure that out, the best way to approach these problems. And it's working with community organizations as well, with Open Uptown, with Chi Hack Night, with Code for America, Sunlight Foundation, who all have a stake in making sure that their cities work well and that governments are transparent, and other community groups who are interested in guiding how data can be used to improve the cities and setting those boundaries. And so it's working with those groups as well, trying to figure out what we can do that's better.
Speaker 0
4:37 – 4:43
It sounds like this is something that involves wearing a lot of hats, to where there's a part of it that's technological and part of it that's people oriented. Am I hearing you correctly in that description?
Speaker 1
4:44 – 5:30
Absolutely. The technology is easy. The technology is what we're trained on. I always like to compare it to doctors and folks that you hear from who have gone through medical school. They'll talk about how they've been trained in medicine, and the medicine is the easy part. The difficulty is always the people side of it. It's talking with groups. It's talking with departments. It's about actually taking data and making it operational and connecting it with the people who are going to use it, to make sure it's actually usable, to make sure it's actually solving some sort of actual problem and not just a theoretical exercise. So the technology, that's what we're trained on. That's what we know really well. The statistics we know very, very well. Connecting that with actual individuals is where the actual work is.
Speaker 0
5:30 – 5:39
Now let's delve a bit into Chicago's Open Data Initiatives. For someone who isn't familiar with the data portal, how would you describe the program?
Speaker 1
5:40 – 5:59
So the data portal is a central location where you can search and find data that is available. And by available, I mean something that you can certainly read and look at online. You can look at those graphs, but it's also machine readable, so you can download that information, manipulate that information, use that data for other purposes.
Speaker 0
6:00 – 6:05
And what would you say is the current state of open data in the city of Chicago?
Speaker 1
6:06 – 7:22
I think we're very strong. We're one of the first data portals to have been published, starting off early with popular datasets such as crimes, the list of all the salaries, which is actually the most popular dataset on the portal by far, business licenses and building permits, which are all helpful and frequently FOIAed. And recently we've been able to push into other areas that are hot topics within government. Last year we released the strategic subject list to provide additional transparency around the strategic subject list program at the Chicago Police Department, and we published data from the new agency COPA, which is the new Citizen Police Accountability Board, listing all the complaints that have been registered there. So making sure that we're still on beat in talking about and releasing data around important issues, I think we've been able to continue that progress and also to improve the usability. Last year we launched the redesigned version of the data portal, which collectively incorporates feedback that we've gotten from a lot of our users over the last few years. And we tested with users, casual users and also pro users. And we've been able to redesign the data portal. We're still constantly making improvements there to make sure it's also a friendly exercise.
Speaker 0
7:23 – 7:40
Speaking of the redesign, I do have a question for you about that as well. As you mentioned, last year there was the major redesign of the data portal. How did these changes look to improve accessibility and usability, and what methods, if any, are used to quantify that sort of information?
Speaker 1
7:41 – 9:48
So discoverability was key. One of a couple of things that we knew from our testing and surveying was that the plurality of our users, the largest user group, was casual users. That is to say, people who are not students, people who are not IT pros during the day, but who had a variety of different occupations that were not technically oriented in any sort of way. And so they were 45% of our user group. So we knew we needed to orient more towards casual users, that these were not just technical folks. But one of the problems was discoverability. What's there? And then also to provide context around information as well. So there's obviously data that's available on the portal, but we also knew that people wanted to be connected with apps, that it wasn't just about raw data. So we connected them with similar applications such as the snow plow tracker, which is in the universe of open data, but it's not open data in its own right. It's a map showing that data. So we wanted to make sure we could provide people with context and links to supporting resources, including training videos, apps, and blog posts explaining data that's on the portal, and make sure that could be an effective bridge. To test this out, we partnered with the Smart Chicago Collaborative, now the City Tech Collaborative, and their civic user testing group, the CUTGroup, where we did testing with end users in libraries. And again, these were casual users, not IT pros, to get feedback from them as well, to see how well they were able to navigate through different phases and stages. And for a long period of time, we put it out to our technology power user groups such as Chi Hack Night and asked for feedback, and we did a feedback session with our technical users as well. So we took that collective feedback and made some tweaks and modifications on it.
And now actually on a year-to-year basis, we've allocated a bit more resources to make tweaks as we go along with the data portal, as opposed to always doing these big rewrites every few years. It's about being more responsive on an ongoing basis as opposed to doing a big project every four or five years, which was the case this past time. So we think that's gonna be a more pleasant experience in the long run as well.
Speaker 0
9:49 – 10:18
It definitely sounds like you're describing what would be a more iterative process, instead of trying to do a whole lot all at once, which definitely sounds like a more efficient way to go about it. In your prior answer, I also heard you mention the Civilian Office of Police Accountability, or COPA, as you called it. I'd be curious if you could elaborate a bit on its importance as a program and its involvement in open data, and what benefits Chicago and Chicagoans get out of its inclusion?
Speaker 1
10:19 – 12:01
We have a lot of user groups and user bases that come to the data portal. But when it comes to COPA and the data that COPA has, it's very clear that groups that demand transparency from governments are very interested in this information and data. And so we've had a lovely time working with COPA and the new staff that they have over at COPA, really trying to think about how we can step forward with accountability. There's the data portal, which I mentioned earlier has complaints opened with COPA and its previous entity, IPRA. But COPA also maintains a portal which publishes documentation and, in fact, videos and audio recordings of incidents that involve police shootings that are under review. And even though that's not on the data portal, that's another bit of police transparency and accountability that's being provided to the public. Again, it's not on the portal because it just wasn't the same fit, for a purely technical reason, but it's also another step. So it's working with them and making sure that COPA has the IT infrastructure to be able to do this sort of work, to even maintain this sort of information, and then working out how we can responsibly publish the information without putting at risk or threatening the people who register these complaints or the officers who are the subject of these complaints and have a right to due process before more information is published. Making sure we work with them to be able to do that is key. It's been quite productive this past year, and we're very much looking forward to the next few months and few years where we're gonna work with them.
Speaker 0
12:01 – 12:12
Alright. And then to look at this again at a bit of a higher level, what would you say about the Chicago data portal most differentiates it from other open data initiatives?
Speaker 1
12:13 – 13:40
So one of the greatest things that we have here in Chicago is this robust community who's interested in the data portal. Again, it's groups like Open Uptown, Chi Hack Night, and folks that used to go to Open Government Chicagoland, which is an older meetup group, or other meetup groups such as Python meetup groups, R meetup groups, students and universities, you know, a great crop of universities around here who use the portal. And again, the plurality of our users, who are casual users, just folks day to day trying to find information that's pertinent to them. One of the things I think we do a very good job of, and what makes that work well, is working with each one of those groups, trying to hear those voices, and being responsive and reflecting those needs in our portal, modifying the data portal itself, and also creating other sites and other little applications to be able to help out these different user groups. It's hard to create a monolithic website that can serve all. So it's identifying new needs and creating other websites. For instance, when we created OpenGrid, which is a map-based interface that allows you to see what's happening in Chicago purely through a map, it was in part to answer some of those needs, because people didn't just wanna see data, but they wanted to see the context of their neighborhoods, where they lived, where they would take their kids to the park. They would wanna see all this information on a single map, and OpenGrid and other projects that we have kind of reflect that.
Speaker 0
13:41 – 13:51
Additionally, part of your role, as I've read, is to chair the Open Data Advisory Group. How would you describe this to someone who isn't familiar with it, and what role does it play?
Speaker 1
13:52 – 15:19
So one of the key things that we have to have, of course, is data to put on the portal, and part of that is identifying data with departments that's appropriate for the portal. And sometimes this comes in conjunction with new legislation or new initiatives that we have in the City of Chicago. One of the big datasets we released this past year, and actually a very big dataset, was all the taxi trips in the City of Chicago. And we worked very closely with the Business Affairs and Consumer Protection department, who oversees taxis, because they're the ones that collect that data as part of a piece of legislation that was just passed a couple of years ago. So we partnered with them and explored the benefits of publishing this data to the portal, both for the public who can use it and then also for ourselves as well. And we worked with them to responsibly publish that data and maintain the anonymity and privacy of those who are riding in the cabs and those who are driving the cabs. But that's a small example of how we work with departments to be able to liberate this data so it's usable, but doing it in a responsible way: making sure, of course, that the data has some semblance of accuracy, that this is data that is looked at and maintained and well kept, publishing the fields that are usable, publishing the information that those departments use on a day-to-day basis, and making sure that that information is being made available to others, because that is the information that we are trying to make transparent for others.
Speaker 0
15:20 – 15:40
I'm definitely hearing a bit of a theme here, whether it's adding new datasets or ensuring the accuracy of them as you do so. And that seems to be the idea of some semblance of change management, which would tend to come up in a conversation about data. How does this process work with Chicago's data portal?
Speaker 1
15:41 – 16:46
So oftentimes, the data that we publish on the portal is part of a larger enterprise or a larger operation. So other aspects of my team that will build dashboards, that would do analytics, we will work with them on a number of different things where open data new throughout the city of Chicago. So we're building a new application to help a department navigate that better. And some of that data is going to be published through the data portal because that's just kind of a piece of the package of what we deliver. It's, well, let's make sure you can track information better on the one hand. And then two, let's take some of that information and make it available to others, so others can share in this higher quality of data. And so we want to make sure that we're addressing a number of different needs when we're working on data because the demand for data is great. The demand for better quality data is great as well. And so the data portal fits tactically in one of the many steps that we work on with different departments across the city.
Speaker 0
16:47 – 17:04
And if there's someone out there listening that has a strong interest in this program and wants, say, to contribute to these goals themselves, how would they go about doing that and working with the city?
Speaker 1
17:05 – 18:49
So there's a number of different ways and it really depends on where you're coming from. Because again, we have a broad base of users, and people can contribute in a number of different ways. One, I would encourage them to participate in their local civic technology groups, to go to Open Uptown, to go to Chi Hack Night, to go to a number of different organizations that come together and meet and talk about data or use data. My team and I, we're often at these meetings. We're listening. You can reach out to us at dataportal@cityofchicago.org if you have ideas or recommendations for different datasets that you would like to see on the data portal. And we certainly collect those. And we certainly notice when certain ideas start ramping up more, and we'll prioritize those ideas as they ramp up. If you're more of a coder, or if you're a data analyst who has some technical skills and would like to use data to help try to improve your community, you can certainly go to our data portal. There's APIs that power all of this, where you can reach out and grab that information and create applications. I would encourage you to work with a local nonprofit to see what their challenges are, and maybe they need some technical advice and some help, because that is something that local nonprofits often struggle with. See how you can connect the data portal with your local nonprofit, or a nonprofit that you like a lot, and how you can help them as well to make that data usable. You can also contribute to our projects. We have a number of open source projects, many of which use open data. I mentioned OpenGrid earlier. We have an R package called RSocrata, which is a wrapper around Socrata that allows you to program in R and easily interface with the Socrata data portal, which is what our data portal is. And you can find those projects at github.com/chicago.
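For listeners who want to try the APIs mentioned in this answer, here is a minimal sketch in Python of querying a Socrata-powered portal over HTTP. It assumes the portal's public domain is data.cityofchicago.org and that datasets are exposed at the standard Socrata `/resource/<dataset-id>.json` endpoint; the dataset ID `xxxx-xxxx` and the helper names `build_soda_url` and `fetch_rows` are placeholders invented for illustration, not something from the interview. Look up a real dataset ID on the portal before running a fetch.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public domain of the City of Chicago data portal.
PORTAL_DOMAIN = "data.cityofchicago.org"


def build_soda_url(dataset_id, limit=5, where=None, domain=PORTAL_DOMAIN):
    """Build a Socrata Open Data API (SODA) query URL for one dataset.

    dataset_id is the portal's short identifier for a dataset; the value
    used in the demo below is a hypothetical placeholder.
    """
    params = {"$limit": limit}
    if where:
        # $where takes a SoQL filter expression; the column names it can
        # reference depend on the dataset you query.
        params["$where"] = where
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


def fetch_rows(url):
    """Download the query result and decode it into a list of dicts.

    Requires network access, so it is not called in the demo below.
    """
    with urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    # Replace the placeholder ID with a real one from the portal, then
    # pass the resulting URL to fetch_rows() to download the rows.
    print(build_soda_url("xxxx-xxxx", limit=3))
```

Building the URL is kept separate from fetching so the query can be inspected, or pasted into a browser, before any data is downloaded. R users can get similar access through the RSocrata package mentioned above.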
Speaker 0
18:49 – 19:14
Excellent. And I will definitely make sure the links you mentioned are in the episode description for those listening, in case they'd like to click and learn a bit more. You started earlier in our discussion mentioning that the state of open data in the city is strong. So knowing that that's the baseline, I do wanna ask, what changes, if any, do you believe are most needed regarding how the city handles its data?
Speaker 1
19:15 – 20:40
First and foremost is that the job is never done. So we need to continue to expand and explore new datasets that get published to the data portal. There's a certain arms race that always exists between cities. We're very competitive with each other in the best of ways. It's some friendly competition; we're always trying to do a better job than Boston, San Francisco, New York City in terms of the breadth of offering, the quality of data, the depth of data, different and new interesting categories. We're always working on that and we're always wanting to add new datasets. Usability is still key. We have not solved for all those different user groups and what they need to be able to do. One of the things that we've noticed still befuddles users of our data portal: sometimes they go to the data portal not because they're actually seeking information, but because they think it's a way they can apply to get business licenses, or it's a place where they can submit a 311 request, not just look at 311 requests. So in fact, working with other cities, we are looking at ways that we can help funnel users to the correct destination. City websites are very large and monolithic because they're covering a lot of different topics. So sometimes there's confusion for the end user. What are ways that we can improve that, to connect people with the websites that they need to visit?
Speaker 0
20:41 – 20:53
You mentioned there that bit about friendly competition between cities and pushing each other to improve a bit. Is there anything that you've seen other cities mimicking as they push their own open data programs?
Speaker 1
20:54 – 23:16
We constantly are mimicking each other. I'm a cofounder, and just recently was the first chair, and now San Francisco is the chair, of something called the Civic Analytics Network, which is based out of the Harvard Kennedy School. And Chicago is one of the cofounding cities that brought together different cities, chief data officers, chief analytics officers, people who head the analytic enterprises in different cities and a few counties, in fact. We talk on the phone once a month. We meet in person at Harvard twice a year. We're backed by a grant from the Arnold Foundation, which generously put together money to facilitate us coming together. We've put out planning agendas around what we feel the future of open data should be. As I alluded to earlier, we talk quite often to see what other cities are doing. So we're always collecting new notes from different cities. It can be minor, such as what are different ways that we can anonymize data and publish it to a portal in a way that's responsible; that has proven to be worthwhile and is an example of an area where we've come together. We've also come together in terms of demanding better support for geographic data on data portals. We feel that that needs to be treated as a first-class dataset, and that has not been done so far. And we've also enjoyed insights from our different cities. One of the things I found very insightful was that the city of Boston just recently redesigned and relaunched their data portal. One of the more interesting insights that they had during their user research was that they retitled their data portal, and instead of data portal they call it Analyze Boston, because in their user research they found users did not associate data with being information. But if you use the word analyze, such as Analyze Boston, they found more and more users were saying, hey, that's a place that has something I might be interested in.
And they're going to be more likely to visit Analyze Boston because they feel they're more akin to the word analyze as opposed to data, which sounds technical and sounds like something that was not intended for them in the first place. So I'm not sure how many other cities are gonna change the name of their data portal to Analyze plus their respective city, but these are the sort of insights that we can take with us.
Speaker 0
23:17 – 23:40
It is really fascinating how what you name a program can really set the tone for how people interact with it and what their interest in it is. Going on, I wanted to ask you a couple of, I guess, more lighthearted questions about the program. The first of which is, what do you think is the most underappreciated dataset on the portal?
Speaker 1
23:41 – 25:45
You know, that's a fantastic, an utterly fantastic question. So I'm gonna have to think here for a second because there are quite a few. As I mentioned earlier, salaries is the most popular dataset, usually followed by the crimes dataset. There's a fantastic love for the problematic landlords dataset, or the bad landlords dataset, where people can go and find information about landlords who constantly cause problems for others. One of the general categories of data I think is underappreciated is all the maps that we have on the portal. In fact, most of the data portal is maps, and people really need to go through and look at all the different ways that the city gets divided up and the different ways that the city got formed. And in fact, if you look really closely at the city boundaries, a lot of people know about Norridge and a couple of suburbs that are completely enclosed by our city. But one of the funny stories that we had was, looking at the data portal, we realized that Mount Greenwood Cemetery was not part of the city of Chicago. And we did some research. We weren't sure if this was a mapping error, if this was something serious. We actually contacted local historians and said, is Mount Greenwood Cemetery part of the city of Chicago? And it wasn't. They came back to us and they said, well, largely, the city of Chicago would never take over cemeteries. The cemeteries were always on unincorporated land because back then, a lot of people would sell the same plot over and over, and the city didn't want to have anything to do with that. But Mount Greenwood Cemetery, because it was privately owned for a long period of time, never got absorbed by the city of Chicago. And so there's this other hole in the map besides Norridge and Harwood Heights, these other suburbs. You have this little one-square-mile area that's not part of the city of Chicago.
So if you look deep in some of our datasets, sometimes you unwind a funny story that you first think is a funny data error, but it's actually something that represents the narration of the formation of the city of Chicago.
Speaker 0
25:46 – 25:58
Yeah. That is definitely true. That sounds like quite a story to unravel from what started as just a peek at some data. That is really interesting. Do you have a favorite application that makes use of the data portal?
Speaker 1
25:58 – 27:15
So a couple of my favorite applications that I like to point out are some old applications, early applications that were developed. One was wasmycartowed.com, where you can go and enter your license plate and see right away whether or not your car has been towed. So it's one of those applications that's amazingly simple. It's very illustrative of how these very simple things can be used by residents to just make life a little bit better. And there's a sister application to that, which was an application to find out where you can park your car, and it takes in all of our street sweeping schedules. It's a website called Sweep Around Us, sweeparound.us, where you can go and enter your address and get text message alerts. You can get email alerts or just simply download the calendar to Outlook or Google Calendar or iCal and see when your street is gonna be swept. So you can go move your car. And if indeed you did get swept and somebody towed your car, you can go to this other website and figure out where it got towed to. So these are things that affect your life day to day, but they're incredibly important because they just make life a little bit easier.
Speaker 0
27:16 – 27:38
Yeah. It definitely sounds like those applications help folks out a great deal. Now, I guess, for our concluding questions, let's look a bit to the future. The first one I'd like to ask you is, if there's a listener out there that is now interested in a career in data science and analysis, is there any advice that you would give that enterprising individual?
Speaker 1
27:40 – 28:52
The breadth of knowledge is what's key here. In governments, in cities, at all levels of government, you're gonna deal with almost every single topic. So meetings can bounce between taxi legislation and taxi data to things like street sweeping, and then bounce over to things like remediation for lead poisoning and reconstructing homes to reduce the chance of lead poisoning, to the rate that mosquitoes are reproducing and where mosquitoes might pop up, to what E. coli levels are in Lake Michigan. So you're going to deal with just about every issue, from infrastructure to health to people to animals to insects. All these different things are things that you have to deal with. And you're dealing with them within the context of research and data. So obviously, you need to know how to use technology. You need to know how to do data analysis. But as I mentioned earlier, that's easy to learn. That's something that gets imparted to a lot of individuals. The struggle is connecting those with actual use cases and having some semblance of education in these different areas, so you can ask good questions and reasonably connect data and technology with actual use cases.
Speaker 0
28:53 – 29:00
Excellent. And, as far as looking into the future goes, how does the future of open data look to you in the city of Chicago?
Speaker 1
29:02 – 29:46
So I think, again, we're gonna be very strong. We have a lot of work to do in terms of opening up more and more data, but we're on a good path to be able to do that. The biggest challenge I think we have in front of us is, again, making it more and more usable, expanding that user base, expanding it to people who don't want to deal with looking through loads and loads of data but are trying to find information that's valuable to them. That's a very challenging usability challenge in front of us, but we're looking forward to it. We think there's a lot of people interested, and there's a lot of interesting conversations that we're having about trying to pioneer that sort of next phase of open data, of greater usability and expanding our use case to more and more residents of Chicago.
Speaker 0
29:47 – 29:52
And to draw us to a close here, do you have any final thoughts you'd like to leave us with?
Speaker 1
29:54 – 31:13
What we find very enjoyable, and what also helps the city of Chicago, is the participation from residents and visitors and programmers and students and local organizations: to show up at local meetups, to show interest in things like open data, and then to try to use that data. And very importantly, using that information, that data, to be useful for residents, for nonprofits in your community, for your local church. If we can make those connections, and if Chicago's residents and the skilled folks that we have here who work in technology and data can help us complete that last mile of delivery to the end user and work on these particular use cases, what we're going to be able to do is, in effect and asynchronously, have governments and people working together to improve the city together. This is what the interaction between people and government has always been for the last hundreds of years, as long as cities have been around. But using technology and data is gonna allow us to scale it up in a way that has never been done before in human history. And then we can be truly optimistic about the future of Chicago and the future of cities by us working together.
Speaker 0
31:14 – 31:43
Tom, thank you so much for taking the time out of your day to talk to us. I have no doubt that there will be listeners out there that take a tremendous amount of inspiration from what you've shared with us today. So thank you so much. Thank you very much, and thank you for taking the time as well. That concludes this episode covering open data. You can follow us on Twitter using the handle at civic tech chat. Visit us on the web at civictech.chat, and subscribe to the podcast for content updates wherever it is you download your podcasts.