Speaker 0
0:10 – 0:12
Welcome to Tech Talk, by
Speaker 1
0:13 – 0:13
CDT.
Speaker 2
0:16 – 1:17
Welcome to CDT's Tech Talk, where we dish on tech and Internet policy while also explaining what these policies mean to our daily lives. I'm Jamal Magby, and it's time to talk tech. In today's episode, I'll chat with the coauthors of the book Data Feminism, Catherine D'Ignazio and Lauren Klein. After, you will hear from Larry Norden, director of election reform at the Brennan Center, as he talks election infrastructure. So let's get to it. Data is fundamental to the modern world. From economic development to health care to education and public policy, we rely on numbers to allocate resources and make crucial decisions. But what happens when the data collected doesn't represent the community it is intended to serve? And here to answer just that, the coauthors of the book Data Feminism, Catherine D'Ignazio and Lauren Klein. Ladies, thank you both for being here. Thank you for having us. For having us. So for our listeners who don't know, can you explain exactly what data feminism is and what inspired you both to write the book?
Speaker 0
1:19 – 3:08
Data Feminism is really a book about power and data science. You know, when you think about it, data is really powerful. Right? Probably all of us can pull examples from the headlines where some study or some data set has been used to say, look, here's the problem. Or look, this is the change that we should make. But we've also seen the opposite. Right? That data collected by corporations is wielded against the powerless. We see this in issues like Facebook controlling the advertisements that users see. We see this all over the world, where corporations are able to track our purchases and then sell us or not sell us certain things. And we wrote this book because we realized that while data has sort of made these issues of power seem very present and very new, actually, the imbalances that they represent are not new. These issues of oppression, about who has power and who doesn't, these are as old as time. And, you know, feminism is this body of thought and also action that, for just as long, has been thinking about how power works in the world. Right? So thinking about issues of gender inequality, but also, you know, as you think about the reasons why people experience issues of gender inequality, it also comes back to issues of power. And so that's sort of the framework for the book, you know, that there are these issues of power all over data and through data science, and that, you know, we, both Catherine and I, have spent a lot of time thinking about feminism, being feminists, being women in the world. And we think that one of the ways to try to sort of rewrite the balance of power that we see happening right now is to learn from feminism, to take the lessons of feminist activists and feminist thinkers and try to figure out how they can be put to use in a context involving data.
Speaker 2
3:09 – 3:14
So how are the powerless or underrepresented communities affected by a lack of data or by the data sets that do exist?
Speaker 3
3:17 – 5:27
Sure. This is Catherine. I'll take this one. So, there are a couple of ways that we talk about in Data Feminism that underrepresented communities are adversely affected by data. And they're kind of two flip sides of the same problem. So one of the things we talk about is this problem of missing data. Those are the words of Mimi Onuoha. And so missing data problems are things where we have really important issues at stake, but nobody's actually collecting data. And this happens over and over again in relationship to, for example, women's health, or women's mobility in the city, or, Mimi highlights cases of police killings of citizens. Like, there is no comprehensive federal database of these things. We also highlight in the book the case of maternal mortality data. Until recently, we didn't have good federal data collection efforts on these super important issues. And so those kinds of issues disproportionately affect women, people of color, marginalized groups, and so on. So that's a problem of underrepresented communities being underrepresented, in a way, in data. But then on the flip side of that, the other problem that happens is being overrepresented in data. And that's where we get into things like surveillance, or datasets where people say, oh, this is the crime dataset for a particular city, for example. Well, no. Actually, because policing practices are not policing all of the neighborhoods in the city equally, black and brown communities are overrepresented in those datasets because they're overly surveilled by the police. And so these are kind of two flip sides of the same problem, but the root cause is the same. It comes back to, as Lauren was saying, these power imbalances. And so they're kind of equally bad problems, but they're also equally important problems that we have to be aware of and working on.
Speaker 2
5:28 – 5:41
Yeah. That's very true. And I wanna move on. So with this data we collect, does it ever represent things as just black and white? Is it ever purely one or the other?
Speaker 3
5:43 – 5:51
Lauren, you've gotta take this, because I think Lauren wrote in the book, her words were that it's a double-edged sword. Okay. And so I feel like you could address this one.
Speaker 0
5:51 – 7:37
Yeah. Sure. I mean, I guess, you know, this is really at the heart of the issue. Right? It's that working with data is really complicated, and it's always contextual. Right? You know, Catherine and I both teach these classes involving data and, you know, data science and statistical methods to our students. And this is something that I just say all the time. Right? You really need to understand the context that surrounds every single step of the way in these data projects. Because if you don't, then you truly don't know what you're looking for. You don't know what questions to ask. You don't know which questions have already been asked. You don't know the decisions that were made or the decisions that were not made, or that no one knew to make, you know, even in the process of data collection. And so that's really, you know, among the takeaways from the book, just that context matters. Right? And this is where you start to see that, really, it's very hard to find a data science project that's unequivocally good. You know, because under the hood, probably, like, Amazon was processing your data. You know, even if that's the only thing you find. And, well, I guess there are actually some projects that are probably, like, unequivocally bad. You know, but most projects do fall into this middle ground, and what stops them from descending into reproducing, you know, racism or sexism or other forms of oppression is the people working with the data being attentive to those possibilities. Right? So what information they're getting from the data, from their models, you know, from their visualizations or whatever, and then what information they're not, and they need to stop and think, you know, why is this happening? What am I seeing? Who have I not talked to? 
Who's not represented in these datasets? Those are equally important questions to ask.
Speaker 2
7:38 – 7:48
Wow. So once we have all of this data, what part does emotion play when collecting and visualizing it?
Speaker 3
7:50 – 10:37
So I'm glad you brought up emotion because, actually, one of the principles in our book is to elevate emotion and embodiment. So, we feel like emotion and data are not words that we typically put together. Right? And in that particular chapter of the book, we talk about how what's really been valorized, if you look at data visualization practices and what people who have written about data visualization kind of promote, is a very minimalistic view from above in terms of data visualization. So some of the technical writers talk about, you know, having no persuasion, just being a purely objective representation of the facts with no decoration, you know, nothing to try to sway the reader or the looker with any of those sort of pesky emotional things that intervene. And we say, well, this is a completely false binary, because first of all, emotion is involved all the time. And actually, persuasion is involved all the time with all of our forms of communication. Aristotle's actual definition of rhetoric had persuasion at its core. So we are always being persuasive, even if we're not directly trying to, like, manipulate in terms of propaganda or something like that. Mhmm. So we kind of figured, well, what happens when we say emotions are valid? Like, let's bring the emotions into data visualization. What do we gain by still valorizing reason and logic, you know, we're not abandoning facts or anything, but also valorizing emotion? And we show some of the experiments that have gone on in that regard. And there are just really amazing artistic projects. There are amazing design-based projects, moving image and sort of animated projects, that use more emotional levers to move us to understand something using data and to possibly act on that understanding using the data. 
So we say, basically, elevate emotion, and look what we can get. And we also introduce a term from Kelly Dobson called data visceralization to say, well, what if we even open it beyond just the visual? Like, the visual is one sort of sensory register, but what if we open it up to, like, our whole 3D spatial register, to olfactory data, you know, to auditory data? So how might we actually, like, visceralize data? And what might that do for us in terms of learning, cognition, understanding, and so on?
Speaker 2
10:38 – 10:46
Nice. So with that, what are some strategies data scientists can use to elevate emotion and combat data bias?
Speaker 0
10:48 – 12:02
That's such a good question. We should say you should read our book, Data Feminism. Right? But, I mean, really, and we can talk about this a little bit more a little bit later, our goal in writing the book was to help people do better data science. Right? And so we do spend a lot of time pointing out problems, but our ultimate goal is to think about how we can do better. But one of the things that we do say in the book is that, you know, there's so much talk about bias in data and biased algorithms and all this stuff, and, like, what we can do to fix them and sort of combat the bias that we're seeing. But that's kind of asking the wrong question. You know, really, the reason why you get biased data is because of larger issues of oppression. Right? And so viewing biased data, or an algorithm or something that seems to be racist, as a thing you can fix after the fact, that's not a good enough solution. Right? And rather, what we argue for in the book is to look at larger systems of oppression and ask why the data that went in might be biased to begin with. Right? And so an example we talk about in the book, you know, there's a line that, I mean, I feel like, you know, Ben Green says it. The AI Now Institute says
Speaker 3
12:04 – 12:04
says it. You hear this increasingly,
Speaker 0
12:05 – 14:34
that predictive systems, they don't predict the future, but instead they reproduce the past. And that's so, so true. Like, there have been a couple of instances recently of what Safiya Noble called digital redlining, you know, where different groups of people see different results online, depending mostly on their geographic location. But if you know what redlining is, right, this refers to the historical practice of denying black people home loans, essentially on the basis of the neighborhood that they lived in, saying that that was, like, too risky of a neighborhood to give a personal loan. But, really, you know, ZIP code was a proxy for just being racist. Right? Right. And so, you know, we see all of these systems now that are based on geographic data and making decisions based on geographic data, but they have their roots in these discriminatory practices that, again, weren't the start of the problem, but really accelerated the sort of divergent investments into different places in our country. And you see this everywhere. So everything from, like, when you try to get car insurance, and I see this in Atlanta where I live, which was an actual segregated city. If you live on the East Side versus the West Side, you pay more for car insurance if you live on the West Side, because that's viewed as a riskier neighborhood. Right? And that has to do with histories of location-based discrimination. Right? Right. You also see this in things that you wouldn't expect to be discriminatory, things like Pokémon Go, where there are more Pokémon Go stops in whiter, wealthier neighborhoods than there are in blacker, poorer neighborhoods. And, again, you might think, like, look, you know, the creators of Pokémon, or whoever designed that algorithm to generate the stops, they're not trying to be racist. Right? 
But they have produced a biased algorithm because they did not stop to think, you know, what are the larger structural forces that are contributing to this geographic data that I'm feeding into my algorithm and that are giving me differential outcomes? And so, really, I think the larger takeaway should be, you know, let's stop and think about why this bias is happening and, rather than sort of patch it up, say, like, what can I do instead to try to overturn, or do my part to somehow try to dismantle, some of the structural differentials of power that we see?
Speaker 2
14:35 – 14:44
Right. Wow. So what are your hopes for data science and data ethics moving forward? What do you hope to see?
Speaker 3
14:45 – 16:54
So, I mean, I've been really excited that the work in this space is growing. And by this space, I mean a kind of critical pushback, a kind of analysis of power, a beginning of an understanding of how these larger oppressive forces like sexism and racism are entering into these datasets. Even just from the time when Lauren and I started writing this book, I think when we first did it was, like, late 2016, to now, I just feel like there's been a huge explosion of really amazing work that's, like, really pushing the envelope in a lot of different ways, whether it's AI, with the work of Joy Buolamwini on facial recognition and facial surveillance, or it's housing data, with the Anti-Eviction Mapping Project, or it's counter-cartography, like the work of Margaret Pearce, or it's these really amazing books, like those of Safiya Noble, Virginia Eubanks, and so on. Or even conferences, like the Fairness, Accountability, and Transparency conference. I think it's becoming a place for multidisciplinary folks who are working to kind of come and encounter each other. Like, I was actually just hearing from a student of mine who went to the conference this year in Barcelona. She's a computer science student. She was like, you know, it was really exciting, but it was very confusing, because you have these wildly different perspectives at the table. Like, you have someone doing a technical review of a risk assessment system, and then you have somebody coming up and saying, like, let's abolish all forms of quantification. And then you have somebody else presenting a data visualization. And, like, it's a little confusing, but it was kind of awesome to have all those things in one place. 
And so that to me is what is hopeful, that there's so much more work that's happening in the space, and some of these silos, in certain cases, are getting broken down. They're not getting broken down in other places. So that's maybe where we need to do more work. Yeah. But at least there are these spaces that are beginning to emerge where we can kind of collide all of these things. But, Lauren, what do you think in terms of
Speaker 0
16:55 – 18:51
You know, at the risk of being cliché, I sort of feel like the future is in our students. You know, I'm teaching this class this semester called feminist data science, and it's been so heartening to see. You know, it used to be that you had to convince people that algorithms were biased or that there were things like oppression being encoded into computational systems. And for every single student who shows up in the class, like, that is a given. Right? And they begin the class by asking, like, what do we do? Tell us how. What's the next step? We wanna change things. And, you know, my answer to them, which I think is exciting but they sometimes think is a bit overwhelming, is, you know, like, I'm not really sure. You know, the path is not clear. Like, I can't really tell you exactly what to do. But here are some models that we can follow. Here's some of the good work being done right now. But I'm just really heartened that there are so many people who want to take, you know, all of the skills at their disposal. And I would also say, like, their humanity too. You know, all the students, I think, really bring their own life experiences into this class, and their experiences of people they know, and of having done, you know, internships and volunteer work. You know, like, they really are bringing their whole selves to this challenge. And that, to me, seems like, you know, the right approach and the necessary approach, and it's one that, you know, I think, to Catherine's point earlier about these disciplines previously being so siloed. Mhmm. 
I do think that an increasing number of people are recognizing that, you know, these are really complicated problems that require our whole selves, and everything that we know, and everything that everyone we know knows, and then the things that we don't know and need to ask others to help us with. And I think that approach is really what's gonna help us figure out more equitable solutions in the future.
Speaker 2
18:52 – 18:53
So there is hope.
Speaker 3
18:54 – 19:23
Really? Totally hope. Yeah. We tried to write a hopeful book. I mean, we're deeply critical in certain ways, but it was our goal to try to show some path forward. A lot of the stories in the book are not just stories of bad stuff happening in the world, but stories about people that are doing things differently, and, like, models that we wanted to sort of elevate and say, hey, we think this is the seed of a really good future model for how to work with data.
Speaker 2
19:24 – 19:36
Nice. Well, Catherine and Lauren, it has been a pleasure having you both, and thank you so much for joining us. Data Feminism is currently available for preorder through MIT Press and amazon.com. Ladies, it's been a pleasure.
Speaker 3
19:37 – 19:40
Thank you so much. Our pleasure entirely.
Speaker 0
19:40 – 19:44
Thank you. Thanks so much. It's been great to talk to you. Thank you.
Speaker 2
19:48 – 19:52
Next up, Larry Norden, director of election reform for the Brennan Center.
Speaker 1
19:53 – 31:15
Alright. Well, thank you, Anne. And thanks to all of the groups involved in partnering with the Brennan Center on the event today. I think it's gonna be an interesting and informative conversation. I've been asked to, let's see if I can, oh, yes. I've been asked to present an overview of some of the biggest challenges facing American election infrastructure. And I'm gonna discuss a little bit about why they're not insurmountable challenges in 2020, but also what we can do in the longer term, after 2020, to start making some bigger changes to secure our election infrastructure. It isn't fair, but I'm gonna use Iowa and the caucuses as an introduction to this topic. I was a little bit worried about Nevada over the weekend, but, fortunately, I didn't have to overhaul my slides last night. It didn't seem like there were too many big disasters there, as far as we know. The reason I think it's not fair to use Iowa as an example, before I go ahead and do it, I should say, is, of course, that there was no cyberattack on the infrastructure they were using there, as far as we know. And, of course, as others have pointed out, the caucuses were run by a political party, in this case the Democrats. They're not run by professional election officials, by the states or the counties, as our primaries are and as the general election in November will be. Nevertheless, I think there are some important lessons going into 2020. And the first is that vendors are a point of vulnerability in our elections. Often on Capitol Hill, when people talk about our election infrastructure and election security, they talk about election officials, they talk about states and counties. But much of our election infrastructure is created and supported by private vendors. They touch nearly every aspect of our elections. 
So folks may know that there are three big manufacturers of voting machines in the United States, and they control about 90% of the market for voting systems. But there are certainly hundreds of additional companies that maintain and program these machines, that build and maintain voter registration databases and electronic poll books used to determine who's eligible to vote, and that perform other essential functions for our elections. And yet, unlike other vendors in other sectors that have been deemed part of critical infrastructure, like dams or energy or defense, there are no federal regulations over these vendors. And in fact, there's been very little federal oversight of these vendors to date. What this means is we don't even have a full picture of how many vendors there are working on our election infrastructure, either manufacturing or servicing. We don't know where they're working. We don't know what kind of screening they do of employees that perform critical functions. We don't know who owns them. Maryland famously, or infamously, learned in the past couple of years that a vendor for the voter registration systems was owned by a Russian oligarch. They only found that out because the FBI informed them of that. We don't know what their supply chain practices are. We don't know where their parts come from, and we don't know what kind of internal cybersecurity practices they enforce. So election officials can know what kind of security practices they put in place in their offices, but they really don't know, when they're dealing with vendors and purchasing products or services from them, what they're doing. They can ask, and they can trust them about what they're doing, but they really can't know. We're not gonna get that problem fixed before 2020. But I do think that there's a bipartisan interest in tackling this problem. 
There was a hearing at the House Administration Committee in the past couple of months where both Republicans and Democrats expressed concern about this issue. And certainly, when I talk to election officials of both parties, they say that this is something that they want to address. So, as I said, I don't think we'll solve this problem before 2020. I don't think that means we need to be despondent about 2020. The Department of Homeland Security, election officials, and state and local governments have all done a lot to secure our elections and our election infrastructure since 2016. And, of course, for the first time in more than a decade, Congress has provided money to the states to help secure their systems, to spot and patch vulnerabilities in those systems that they purchase from these vendors. Nevertheless, I do think that this is a real weakness going into 2020. And the solution, as always in elections, is to hope for the best and to prepare for the worst. So that brings me to the second lesson from Iowa that I wanna talk about, which is that a great danger of cyberattacks is system-wide failure. No election is perfect. There are always technical problems that we read about and see in elections. But if the reporting app in Iowa, you know, had just a few glitches, and only some precincts had trouble reporting their results, I wouldn't be here talking about Iowa today. The problem in Iowa was that the failure was system-wide, and system-wide failure is different. And it's a danger of cyberattacks that entire communities or jurisdictions can be targeted for system-wide failure. A system-wide attack could be particularly damaging if, unlike in Iowa, it prevented people in large numbers from voting or having their votes accurately counted. 
So that means systems like voter registration databases, electronic poll books, which are used to determine eligibility when voters check in at the polling places, and, of course, voting machines. The answer to this vulnerability is to build in redundancies, and redundancies on the redundancies, to ensure resiliency. So here's an example of one of those pieces of infrastructure that I was talking about, electronic poll books. You know, what might happen if this system was attacked or failed? It might not start up, so we'd have difficulty checking people in. It might have inaccurate information. You get long lines. People get told that they can't vote a regular ballot. Maybe they're even sent away. We've seen examples of this in nearly every federal election. But at a county or statewide level, it would be a real mess, and I would argue a bigger mess than the problems that we saw in the Iowa caucus. So what kinds of things can we do to ensure resiliency under those circumstances? Well, there are 41 states that use these electronic poll books. Only 12 of them require polling places to have a paper backup for these electronic poll books. That seems like an obvious solution, having something that's not on this tablet to go to if the system fails. And, of course, even if you have a paper backup, it's possible that the paper backup itself could be corrupted in some way, if the voter registration database is infiltrated in some way. And there we have a federal solution for that. The federal solution is provisional ballots. We can have people vote and go back and check later whether or not there was some problem with the data that we had. So that's a really good federal fail-safe. Unfortunately, most states don't have any minimums on the number of provisional ballots that are required in polling places, and we have had instances in the past where polling places ran out of provisional ballots. 
So the Brennan Center has recommended that every polling place have about two to three hours' worth of provisional materials for peak voting, to get through if there's some kind of system-wide attack on the database or on e-poll books. Of course, voting machine failures themselves could also be a problem. There are 20 states that use electronic voting machines to cast or mark ballots. And there, again, emergency paper ballots that can be broken out so that people can vote on them are key, but many states, again, don't have minimums for emergency paper ballots in case of that kind of a failure. That brings me to the third and final lesson I wanna talk about from Iowa, which is that paper backups are essential. If voters in Iowa had voted on that app that failed, instead of it just being a reporting app, we really would have had problems in Iowa. There would not have been a record to go back to that people could have trusted. We might have lost those votes entirely. We still, unfortunately, have states that are using paperless voting machines in the United States. And this is despite the fact that there is near-universal agreement, and there has been since at least 2016, that we need to get rid of these systems as soon as possible. The good news is that we've drastically reduced the number of paperless machines that we use in the United States. We've gone from about 30,000,000 people voting on these paperless systems in 2016 to, I would guess, less than 16,000,000 coming up in this new election. And there really aren't, as you can see from this map, any battleground states that will be using paperless machines in the 2020 election this November. One challenge is that, in addition to having paper, we really should be routinely looking at it to check the totals that the software is reporting. 
And only about half of all states require that kind of review of the paper before certification, and even fewer check a statistically significant number of the paper ballots. So I'm just gonna wrap up by saying, while that all may sound dire, I think the good news for 2020 is that all the things that I talked about as important resiliency measures are things that can be done in 2020. It's not too late. It's probably not even too late for the primaries, and certainly not for November. Getting backup paper poll books in the polling places is something that is very accomplishable. Having enough minimums of emergency paper ballots and provisional ballots is something that we can do this year, and even things like conducting post-election audits are things that we can get done in time for the November election.
Speaker 2
31:22 – 31:51
That's it for this episode of Tech Talk. For the latest on what CDT is doing to create a vibrant digital future, follow us on Twitter, like us on Facebook, or visit cdt.org. If you're interested in making interference-free elections a reality, be sure to visit cdt.org/elect sec month. Also, don't forget, CDT's annual Tech Prom is taking place April 23 here in Washington, DC. For more information, please visit cdt.org/techprom. I'm Jamal Magby. Thanks for listening.