Yakov Feygin and Nick Vincent: On Data Dividends

Speaker 0 0:02 – 1:13

This is a RadicalxChange production. Today, I welcome to the podcast Nick Vincent of Northwestern University and Yakov Feygin of the Berggruen Institute. The backstory to this conversation is a lengthy research collaboration focused on how the value of data gets captured, and with that in mind, how to design a tax that would fairly redistribute it. Along with researchers Brent Hecht, Hanlin Li, Luisa Scarcella, Chirag Lala, and others, Nick, Yakov, and I worked on this together for much of 2019 and 2020. You can see the results of our collaboration at datadividends.org, a proposal for a simple, eminently implementable tax that would go to the heart of the economic distortion caused by the data economy. In this conversation, we focus on how data and other assets get their value, compare data policy to the industrial policy of the Depression era, and much more. I hope you enjoy the conversation. I am Matt Prewitt, and this is RadicalxChange(s) with Nick Vincent and Yakov Feygin.

Speaker 1 1:20 – 1:54

So I thought we should start by introducing the two of you a little bit. We have worked together a lot over the past year, and I've gotten to know you guys well in sort of a quasi-professional setting. But I'd love to hear, in your own words, sort of what it is that you do and, you know, how you found your way to the work that you're doing now. So maybe we start with Nick. Cool. Yeah. Sounds good. Okay. Yeah. So I'm Nick Vincent, and I'm right now a fourth-year PhD student at Northwestern,

Speaker 2 1:55 – 4:35

in the Technology and Social Behavior program, which is a kind of unique interdisciplinary joint program. So it's half in the school of computer science and half in communication. And I guess the research areas that I kind of identify with are computer science, but also human-computer interaction, machine learning, and social computing. And the big question, or the broad overview of what I'm interested in, is trying to measure the value of people's data and make them aware of it, and then develop tools that make it easier for people to kind of collectively act around the value of that data. And so I guess we'll talk more about that all throughout here. But the idea is that maybe the public can basically act on these powerful intelligent technologies like AI that have started to dominate a lot of media discussions of technology, and kind of take advantage of the fact that ultimately the public, and we as a collective, are responsible for the success of these technologies. In terms of what I do in, like, my day to day, it's mostly writing research papers, and that kind of ranges from collecting data about online platforms, about search engines, platforms like Reddit, Wikipedia, Stack Overflow, to doing experiments that draw on research methods from machine learning. So setting up models and training models to do machine learning tasks and looking at how the performance is different in different situations. And so I've kind of adopted a lot of methods from machine learning and social computing for that. That's about it. Anything else you had in mind? How'd you get so interested in data? Oh, yeah. What is it that brought you into that? So I guess it started with, or like, this is also kind of why I ended up at this PhD program. 
I was interested in the broad topic of intelligent technology exacerbating wealth and economic inequality and the many downstream effects that come from that. So before I was a PhD student, I worked as a web developer for a while. And when I was an undergrad, I did some research in machine learning for neurology at my school, which was UCLA in California. And I also did research on acoustics, signal processing of acoustic data, at the Naval Research Labs in San Diego. And so this research was very much applied and mostly just math and coding and signal processing, detection specifically. And I wanted to kind of understand how data was more broadly connected to society and to things like economic inequality and its downstream effects. So that's kind of the, I guess, the trajectory that brought me to this PhD program now. Was there something you noticed about the way that

Speaker 1 4:35 – 4:51

data was being used in the work that you were doing before you got into this PhD program, for example, that made you think there's a deeper question here than most people realize?

Speaker 2 4:51 – 5:53

Yeah. So I guess it's something that I think is almost a running joke for many people in the machine learning community, which is that the day-to-day work of a data scientist or a machine learning engineer is oftentimes utterly divorced from the actual process of collecting and creating data. So it's really easy to spend lots of time trying different models out and different hyperparameters and doing graduate student descent and stuff like that, without ever really thinking about the activities and the social context and all the things that had to happen to get your data in the first place. And that's not quite as applicable to things like neurology and acoustics as it is to maybe, like, social media data, but it still is. Even these things that seem like really highly specific domains are produced under these really complex social dynamics. I mean, at the risk of sounding like, you know, saying we live in a society, it's really easy to do the work of data science and not think about the fact we live in a society. So I think that was kind of a factor Yeah. For me. Alright. Let's put a pin in that. I would love to,

Speaker 1 5:54 – 6:07

dig in on the difference between social data and data for neurology or something like that. That's super interesting. But, Yakov Feygin, tell us a little bit about yourself.

Speaker 3 6:08 – 9:19

Well, I probably have the oddest background you can find for this. I'm actually an economic historian. I used to be an economist and then became so institutionalist that I became a historian. I never worked on data or, honestly, was interested in data until I moved to California and started working in a policy shop, where I'm still working, that does a lot of work with California politics. And, basically, my first week at this job, my boss came up to me and said there's this discussion of a data dividend in California. We think there might be something there, and we want you involved with it. And I was shocked because I'd never thought about the issue, and I honestly was pretty skeptical of any kind of social policy that's just a payment for, you know, using an individual's data. I didn't think it could actually be worth that much. Right? But that's kind of how I fell into it. And, you know, when I started looking through it, I thought there were really interesting historical analogies. As I started going through the literature, especially actually the work of, what's his name? Oh, god. I'm sorry. It doesn't matter, it'll come to me in a second. But the analogy to utility problems and to the installation of new technologies really kind of made something click in my head. And what I thought is, like, well, look, this isn't just a problem of classical valuation, which is what, Matt, you kind of made me think about. Right? This is a problem of how to understand the social, I guess, value of a collective resource, which is, on some level, an almost impossible economic problem for conventional economics. But when we think about institutionalism, right, the old economics of the nineteen twenties to really the fifties, right, that created the New Deal and much of the American state that actually still functions, it's not that hard of a problem. 
It's a problem with utilities, with the proper uses of law to create public control. Mhmm. And when I realized that, I began to think that this went from something very sci-fi, to my practical brain of, like, well, you know, let's just focus on getting the tax base right, getting institutions funded, doing the right kind of fiscal policy, which is the world I was born in and really work in for the most part, right, to going, there might actually be something really interesting here and a real alternative to the traditional division between, you know, regulation through antitrust and regulation through some kind of market-based individual payment scheme. There might be some third option, which is this utility option that I think we all came on to. Right? That is more about reconfiguring the way this market works than just accepting it and trying to tinker around the edges, really, which I think those two other approaches are doing. Right?

Speaker 1 9:20 – 9:51

Yeah. So you said something very interesting there about the distinction between looking at it as a valuation problem and looking at it as a sort of institutional control problem. Can you say Yeah. Can you say more about that? Sure. I mean, like, most simply. Right? And I think maybe, actually, just to make this more accessible to listeners who might be jumping right in, can you give an example that doesn't have to do with data? Yeah. Absolutely.

Speaker 3 9:52 – 11:52

Sure. So in Econ 101, right, you come in and you learn marginal pricing. Right? That's the kind of dominant neoclassical model as well. And the marginal pricing model is just, you know, the price is the marginal cost of making the next item. So, like, let's think about oranges. Right? The price of an orange is the marginal cost of picking and marketing another orange. Right? And that's where you get the price. Right? Now, even that's not true. No. That's not true in the real world. And what, you know, conventional textbooks will do is they'll try to introduce frictions and kind of say this is the ideal state, and in the real world, there are frictions. Right? But it's still the theory. If you're an institutionalist, especially an old institutionalist, you come at it a little differently. Right? You come to think about, okay, what are the pricing powers that come from the a priori distribution, the distributional factors and pricing factors that come in to create this market, rather than thinking about what a perfect market is in a vacuum. Mhmm. And then price becomes less of a function. And in fact, if you go down there, there are some institutionalists who think markets don't set prices at all. There are people who think that most prices in the capitalist economy are administered. Like, there are some studies that suggest 65% of prices aren't even determined by markets, but by concentration of firms. Right? Mhmm. Firm concentration. That's a whole other debate that I don't think we have time to get into. Right? But I think there's something to that. So the problem isn't to, you know, create the perfect market for data. Right? The problem is to regulate these institutions properly. Right. So that you can get a stake back from the use of what we, I think, all around here consider a common resource.

Speaker 1 11:53 – 12:19

So just to try to bring that into simpler terms, and correct me if I'm wrong, what you're saying is that if we just look at assets in terms of what their sort of correct valuation is, we miss this other question of what the sort of institutional background is that creates the situation in which the prices are determined.

Speaker 3 12:20 – 13:26

Exactly. Right. So coming back to the example of the orange, it's not the orange in isolation and its marginal value. It's, for example, the fact that there's a monopsony, right, among grocery stores in the area that sets the value. Right? The really classic kind of pop example is the famous Pete Buttigieg interview, right, where it wasn't the market that was creating the pricing decisions on the Canadian bread. Right? It was McKinsey trying to figure out how much pricing power, you know, how to set the right price to get to the price target in each sector. Right? Right. The really famous meme of the journalist going, you work for a company that manipulated bread prices. I actually don't think that's exactly right. You just, quote, unquote, work for a company. No one's manipulating prices here because there is no really objective price. Right? But that opens up a whole different vista of thinking about how to do policy. Right? So instead of improving market imperfections,

Speaker 1 13:27 – 14:56

which is the traditional way you think about it, you have to think about creating the market a priori. Right. So, speaking personally, one of the things that led me to thinking this way about all different kinds of markets actually is simply the problem of land. Mhmm. So thinking about how land gets its value, you know, through sort of college and my conventional economics education, I really just sort of never asked myself these kinds of questions. I just thought about, you know, there are assets, and then they have value. But I never really thought about, you know, what's behind the asset. Why is the asset an asset? And land, I think, is a great example. I mean, maybe it's not the best example because it already kind of asks people to question some basic assumptions that they're probably making in order to see that example, but I still think it's a little bit more tangible than data. So it does help make the point, I think. The point being that when we think about the value of land, and if we ask ourselves how can we sort of, you know, price it more accurately, for example, we're sort of wandering away from this entire area of inquiry of why it is that land is being bought and sold.

Speaker 3 14:56 – 16:06

Yeah. Exactly. Right. I mean, there's a historical kind of parable we teach in introductory, you know, history, which is actually a very contested one, and I've actually stopped teaching it, to be honest, because I'm not sure how true it is anymore. But enclosure. Right? The classical story of enclosure being the thing that creates capitalism. Right? And that you had these medieval economies in which land ownership wasn't the same kind of right that it is now, that anyone in the community could graze on the land because it was considered a common property. Right? And then, you know, eventually you get a legal regime change and land becomes literally a commodity and thus owned by the lord. So, like, essentially, it becomes enclosed, with rents that you pay to graze your sheep on it, and thus, you have migration into cities. Right? Because people are displaced. That little change is now kind of contested as to when, how, why that happened, but that was always the classical kind of primitive accumulation story you got, right, about the rise of capitalism. That's, like, exactly

Speaker 1 16:07 – 16:59

and it's a great kind of but even if it's not historically true, I still think it's a great story though. Right? Right. Right. And there's also you know, the this connects to the idea about the whole the rise of cities and, you know, how how did the, how did urban land become valuable? You know? Yeah. The idea being that the reason urban land is so valuable is that there's this whole network of activity going on around it, which, you know, which means that that holding title to a piece of urban land gives you access to this whole network that the owner of the land didn't create. Mhmm. And, so that the sort of the institutional shape of the thing that we're buying and selling, you know, might not match with the actual sort of value creation process, behind it.

Speaker 3 16:59 – 17:04

Yep. Exactly. Right? That's that rent, the famous rent, right, that we always talk about.

Speaker 1 17:04 – 17:32

Yeah. So this connects with data. How does this connect with, with data? Like, how is you know, there because there are obviously, analogies and disanalogies, galore here. But, like, how, and this is for, you know, both of you. How how does this story that we're telling connect to the way we should understand data and the digital economy?

Speaker 3 17:33 – 19:43

I think the kinda key breakthrough, and this is something I think, Nick, you really explained to me more. You and Hanlin really explained to me something that I think I understood intuitively, but never really understood the implications of, which is, you know, the little bits of information we're generating, like, our data stream and our activities. Right? It's not that valuable. Right? Because it makes sense. Right? Like, how much predictive value does one subject's preferences really have? Right? Not that much, apparently, right, in machine learning. Yeah. A whole lot of subjects Together. That you can compare. Right? That seems to be a lot more valuable. Right? So it's not really our personal data stream that's valuable. It's really the fact that you're mapping social interactions that aren't considered, quote, unquote, in the economy, much like land wasn't at one point considered in, quote, unquote, the economy, right, of, like, commodity production or commodities. That's the thing that's new. Right? And so it's very hard, practically impossible, to say, you know, how much any of our data is worth relative to one another. But then it becomes a lot easier to think about in aggregation. So it becomes like a way of saying, okay. You are taking snapshots of society and generating value from it, which is a really interesting way of thinking through, okay, now what is society owed back and how do you regulate the terms of access to that information? Right? Right. Especially because if you're scaling, there seem to be more and more returns to scale, which reminds you a lot of something like a natural monopoly, like, you know, an electrical company. And we know how to deal with that. We have regulation. 
We have ways of regulating that, taxing that, setting prices on that, that are pretty well known and have been encountered over and over again as we kinda deal with these kind of things. And then you can begin to kinda see a way of solving the problems. So there's a,

Speaker 1 19:44 – 21:21

I think there's kind of a technical way of describing what you just described, which is that, you know, data has this kind of increasing returns to scale characteristic. And I know this gets complicated, but, you know, as a general matter, the way that you just described it, data is more valuable in combination than individually. So when we combine many people's data, we get something that's worth more than the sum of its parts. Right? Yeah. It's, you know, a natural monopoly. Yeah. Networking. Right? And so the way that I kind of like to think about this in less quantitative terms, I guess, is it's like the difference between a whole and a part. So if you were to take a painting and tear it into 100 pieces and sell the pieces, each of those pieces would be worth less than a hundredth of the value of the painting because you've destroyed the whole Mhmm. By cutting it up into parts. Right? And so in a way, the way that I think about this kind of increasing returns to scale stuff that we often frame in quantitative or economic terms is that that's what's going on. You know? I mean, when we have a lot of data contributed from a lot of different sources, it adds up to something more valuable, you know, greater than the sum of its parts. It's like that. Is that a,

Speaker 3 21:21 – 21:25

is that a fair way to think about it? Yeah. Yeah. I think so. I think that's a really great analogy.

Speaker 2 21:26 – 22:51

Yeah. I would wanna add. So one complexity that actually is kind of important though is a really common characteristic, if you look at a particular technology, a technology that does a specific task. So I want to have a system that detects cars. It tells me if a picture has a car in it or not. Or I want a system that will recommend for me the top five movies that I haven't seen yet. What are the next five movies I should watch? Right? So if I define a particular task, almost every single machine learning system will display this diminishing returns curve where you start with really bad performance. And as you collect a little bit of data, you jump up, and then you cap off at the top. It's almost by definition because you can't go above 100% accuracy or below 0% error. And this is pretty important because this, like, diminishing returns curve is why, when a company has data from a billion users, an individual person choosing to include their data or not include their data in that dataset makes almost no difference. And in fact, there's a random chance that deleting your data could even make the system better if, you know, you have odd taste compared to the average population. So it is true that, like, data is only useful in aggregate, but also there's this diminishing returns when you look at a particular task. That gets more complicated if you define, like, a pipeline of tasks or a suite of tasks, and you can get really complicated. And that's probably not useful to do, especially over audio. But, yeah, I just wanna add that in. That's a really interesting
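[Editor's illustration] The diminishing returns curve Nick describes can be sketched numerically. The snippet below is a minimal toy, not anything the speakers present: it assumes a power-law learning curve, and the function names and the parameters `ceiling`, `scale`, and `decay` are hypothetical values chosen only for illustration. It compares what one extra user adds when the dataset is tiny with what one extra user adds at a billion users.

```python
def shortfall(n_users: int, scale: float = 0.6, decay: float = 0.5) -> float:
    """Gap between current accuracy and the ceiling (assumed power-law shape)."""
    if n_users < 1:
        raise ValueError("n_users must be at least 1")
    return scale * n_users ** (-decay)


def accuracy(n_users: int, ceiling: float = 0.95) -> float:
    """Stylized single-task accuracy: rises quickly, then flattens below `ceiling`."""
    return ceiling - shortfall(n_users)


def marginal_gain(n_users: int) -> float:
    """Accuracy added by one more user's data, given n_users already in the set."""
    return shortfall(n_users) - shortfall(n_users + 1)


if __name__ == "__main__":
    for n in (10, 1_000, 1_000_000, 1_000_000_000):
        print(f"{n:>13,} users: accuracy={accuracy(n):.5f}, "
              f"gain from one more user={marginal_gain(n):.3e}")
```

Under these assumed parameters, the gain from one more user shrinks by many orders of magnitude between ten users and a billion, which is the sense in which a single person's data "makes almost no difference" for a single task.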

Speaker 3 22:52 – 24:09

Exactly. Right. That's a really interesting point about the, like, science and mechanics of data science versus the economics of data science, as I understand it. Right? Because, you know, you want to encourage a better machine. Right? Like, a better predictive algorithm to do x, y, and z, you know, unless it has other downside effects. That's a whole different conversation, like, you know, an algorithm to kill us all versus one to cure cancer. Right? But, you know, that's where you want the value added to come from. Right? That's innovation. You don't want it to come from the fact that you're sitting on the nest egg, that you're sitting on all of this information that just keeps getting updated. Right? That's the network effect. It's not in the, like, algorithm. Right? Or it's not in the process you design. It's in the source for the process. And the great thing about data, right, from the point of view of a platform, is that if it's used, it's not like oil or something where it's burned up. You can just use it again over and over and over. Yeah. Alright. That's, you know, a great business to be in, in some way, but it's not necessarily one that fosters a lot of innovation See. Or the gains.

Speaker 1 24:09 – 26:18

I mean, I I think that, Nick, you know, you you emphasize you emphasize this idea that that that diminishing returns dynamic happens when you apply data to a particular task. Yep. Mhmm. And I think that that's the key. That's the key. That, when you when we talk about the value of data, just open ended, full stop, we're not talking about its value for a particular task. We're talking about its value for any task that we might apply it to. And and I think that this is this is where the conversation often gets confusing, and I'd I'd like to try to sort of draw out this this distinction a bit more clearly. Because the fact that data, has diminishing returns to scale when applied to particular task doesn't mean that it doesn't have increasing returns to scale economically overall because the number of tasks that we might apply it to is huge, and it can actually increase. So, for example, you know, if we if we compiled the data of lots and lots of people to try to train, machine learning algorithm or or or sorry. Like, a machine vision algorithm is what I'm gonna say. You know? Eventually, we will hit diminishing returns to scale in terms of improving that particular improving the machine vision. But if we continue accruing data for this purpose, we might learn we might discover eventually that, actually, now we there's something else we can do with it. Right? There might be we we might be able to sort of pivot our, business model away from machine vision into some other some other area of of inquiry or some other task that the data, enables that we may not have even foreseen when we started compiling data for for machine vision. And so this is where this is where it gets complicated, and I think where the the idea that data is an increasing return to scale process overall depends upon this idea that there is an indefinite number of of tasks that we might apply it to. Yeah.
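[Editor's illustration] The distinction drawn here, between one task that saturates and an open-ended set of tasks the same data might serve, can be made concrete with a toy model. Everything in it is an assumption for illustration only: the task names, the `unlock_at` thresholds at which a new use becomes feasible, and the `max_value` figures are hypothetical and do not come from the conversation or from the Data Dividends report.

```python
def task_value(n_records: int, unlock_at: int, max_value: float) -> float:
    """Value of one task: zero until enough data exists, then saturating toward max_value."""
    if n_records < unlock_at:
        return 0.0
    return max_value * (1.0 - unlock_at / n_records)


# Hypothetical tasks: (name, records needed before the task is feasible, value ceiling).
TASKS = [
    ("ad targeting", 1_000, 1.0),
    ("daily-routine inference", 100_000, 2.0),
    ("location resale", 10_000_000, 4.0),
]


def portfolio_value(n_records: int) -> float:
    """Total value across every task the dataset can currently support."""
    return sum(task_value(n_records, unlock, cap) for _, unlock, cap in TASKS)


if __name__ == "__main__":
    for n in (10**4, 10**5, 10**6, 10**7, 10**8):
        first = task_value(n, TASKS[0][1], TASKS[0][2])
        print(f"{n:>11,} records: first task={first:.4f}, all tasks={portfolio_value(n):.4f}")
```

In this sketch the first task is nearly flat by a hundred thousand records, yet the total keeps climbing as further uses come within reach. Whether real data shows increasing returns overall depends on how many such uses exist and what they are worth, which the toy does not settle.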

Speaker 3 26:19 – 27:04

The analogy I use to kinda clear that up in my head is the difference between, like, an electrical, you know, steel press and the power grid. The steel press probably has diminishing returns, but we're not talking about the steel press here. We're talking about the power line that you have to plug it into. Right? Right. Does that make sense? And that usually doesn't have that much diminishing returns. I mean, there are some diminishing returns, obviously, in electrical transmission, but not as much usually as, you know, like, the production of, I don't know, something like copper or something that's sitting next to me. But that's a necessary input. Yeah. And one that has large scale effects.

Speaker 1 27:05 – 28:39

And I mean, another example I think I think an to to try to create a concrete example of how data can do this, of how, you know, how this happens in the context of data. Like, you might imagine, suppose I had a, a social network that gathered information about, about where people like to shop, in order to serve them ads about where they should shop. Right? So then I get I I get more and more data that helps me serve ads to them, you know, to try to to try to direct them to this shop or that shop, and it's diminishing returns thing. Eventually, I've got so much data from so many people that, my ability to improve this algorithm is is decreasing. At that point, I might discover that that information reveals something completely different about all the people. Right? It might reveal, you know, so now I know more than I thought I knew. Now I also know their daily routines and their you know, I can geolocate them. And so now that I know their daily routines and geolocate them, I can I can, not just serve them ads about where to shop, but I can serve them give them incentives to to do something completely different, or I can I can sell the data to their location to someone else? Yeah. Exactly. I mean, I I can sell location to you know? So in other words, you know, there there isn't, like, a clean distinction between the different purposes for which we might be collecting data, and this is and that's where the increasing returns dynamics emerge.

Speaker 2 28:41 – 31:31

Yeah. And I mean, there are a couple other factors too that play a role in this apparent, I guess, yeah. What we have is kind of an apparent contradiction here, but it's actually not a contradiction at all. Yeah. Another factor is that the relationship between machine learning performance as it's measured by machine learning researchers and real world outcomes is not always one to one and is oftentimes unclear. So, you know, you could tell me, oh, I have this recommender system. It's going to recommend you where you should shop. And this version has 89% accuracy. Accuracy defined in terms of how often you send the recommendation and someone clicks it. That's insanely high, but let's just use it for the example. And then, you know, system A has 89% accuracy and system B has 91% accuracy. It could be really hard for me to figure out how that 2% accuracy will actually translate in terms of long-term customer satisfaction, immediate sales, immediate changes to the user base. There's just a ton of second-order effects that go on here. And a big one is that if there's two online free services that you access via your web browser, and they're both quote unquote free, and one has quote unquote 99% accuracy and one has 98, I'll just pick the 99% every time. So there's this winner-take-all effect. You know, the superstars have been talked about a lot in the context of the twenty-first century economy and especially, like, tech firms. And I think people like to use, this was in The Second Machine Age, a really popular book by Erik Brynjolfsson, the example of Instagram starting with a really small number of employees, and then getting bought out by Facebook for an absurd amount of money, which was just kind of an unprecedented thing to happen. 
And yeah, I guess that's kind of an example of where now Instagram has this dominant position, and it's really hard to to beat them. So you can have these tasks, these machine learning tasks that have their diminishing returns in the context of single task. And that's mainly relevant to how an individual or a group of people can affect the performance of a particular task. And it's not necessarily, making any claims about the broader economic, diminishing or increasing returns. So I think that's kind of like one way to resolve the contradiction and why they're important. And in terms of, like, what it means for data dividend, the natural monopoly increasing returns suggests it kind of is a motivation for why there should even be, state intervention in the first place. Whereas the diminishing returns of data for a single task tell us about, for instance, how much an individual might expect to get for their their personal paycheck and why a personal paycheck would be really small and it's not necessarily the best use as opposed to funding public goods or something like that. Right. Yeah. Sorry. Just kind of a little bit of thinking out loud there in terms of how this is an apparent contradiction and how to resolve it.

Speaker 1 31:32 – 32:55

No. Yeah. I mean, I I think that that's that's exactly right. I mean and it picks up on that same theme of of parts and wholes. Yeah. Right? So, you know, if we look at, if we look at a dataset in terms of its ability to help us do a particular task, you know, in a sense, we're we're looking at, we're looking at a part rather than the whole. You know, the the reality because the reality is that, you know, a a data set that can help, serve more accurate ads for electronic sales might also be able to, help us categorize people politically or, or or socioeconomically or in terms of sexual preferences or, or any number of other things that someone might be willing to pay for. Right? Anyway, that's a good transition maybe to, to the data dividends work. So, I guess, starting with with Nick, would you like to tell tell us a little bit about the the data dividends project that we that we worked on together and, how we went about it and what recommendations, we make?

Speaker 2 32:55 – 34:39

Yeah. Definitely. And I guess I have two helpful correctors available here if I make any errors. So the project, I guess, began almost two years ago, or maybe even more than two years ago now. And the idea back then was that it was in response to the 2019 State of the State speech. The governor of California had a line or two about a data dividend, which kind of led to a lot of speculation. There was a bunch of op-eds in the various California newspapers asking, what does this mean? What's the data dividend? And, basically, we kind of digitally congregated a bunch of folks from academia, from industry, working in policy and think tanks and whatnot, who were all interested in this question. And we wanted to kind of sit down together, virtually, and try to figure that out. What would a data dividend look like? So we kind of tasked ourselves as an ad hoc, non-political volunteer group to try to write a report answering what a data dividend for the state of California would look like. How do we think it would work best? Could we amass some, basically, recommendations? And so since then, we've been working on this report, and maybe I'll let you all expound on this. But the basic idea is, it's a little bit of a rebrand. So a lot of people, when they hear data dividends, their immediate thought is, oh, I'm going to generate some data. I'm gonna tap my phone and click my laptop and I'll get a paycheck because I generated that data. And while we're not entirely saying that's impossible, we're recommending something a little different, which is a data dependence tax funded program for public goods. That's like a one-sentence version. I don't know. Yakov, do you wanna expand on that a little more?

Speaker 3 34:40 – 38:21

Yeah. It's that plus using the tax as a way of restructuring the incentives and institutions of the system. Right? You know, taxation isn't just about collecting revenues. It's about shaping incentives, but also asserting the power of the state over an economic sphere, right, going back to Schumpeter. So the other thing we kind of built around this kind of tax, which is a tax not as a way of individual remuneration but as a public goods building system, is ways of getting individual participation and collective participation. So one thing is using the tax to incentivize, you know, consumer collectives that can decide whether they wanna get remunerated for some parts of their data as a kind of bloc of consumers, and what kind of privacy conditions they can attach to the use of their data. Right? In a vacuum, you know, there's no reason for, like, say, Facebook to agree to that or want to work with that. But if there is a tax attached to it that, you know, gets discounted per, say, user that accesses through one of these cooperatives, right, given that they are well regulated, and Matt's thought a lot about this, like, you're the guy who's thought about the regulation part, right, then Facebook might have an incentive to actually play ball and become a public, kind of embedded corporation. The other aspect that we were really interested in, which again comes back to this idea of thinking of these as utilities, is to have what we call the data relations board, which is something of an analog to the California utilities board, which is actually the first utilities board in the country, if not the world, actually, right, to regulate the uses of data, but not only regulate them, but to make them more public, kind of make them a public resource. 
So we think that there should be more and more public ownership of data, public data trusts, in which, if there are publicly important datasets that are anonymous enough and that are secure enough, we think those should be administered by the public in the name of the public. And the tax helps get us there by, one, funding these things, right, and allowing the public to collect that information, but, two, also encouraging firms to put that kind of information into the public domain through a tax incentive, right, to lower their tax. And that's really important because that is the utility part, right, of it all. Right? It's the idea that because this has the scale effects we've been talking about, this is the kind of thing that publics need to control, because there's just no way for this to be a, quote unquote, competitive market. Right? Yeah. We've had a long tradition, especially in America, of dealing with natural monopolies by making them into utilities rather than just breaking them up, because sometimes that's not feasible. Right? There are places in which breaking up large firms works. Right? Like the Facebook and Instagram thing, the classic example. But there are maybe some where it doesn't make sense. Right? Because there's too much cross integration in production processes to make a breakup useful, rather than a kind of backdoor quasi nationalization, essentially, of the monopoly part.

Speaker 1 38:22 – 38:58

Yeah. So the fundamental idea, if I'm glossing this correctly, is that there's this kind of spectrum between things that are sort of more suited for more private like control and other things that are more suited for more public like control. Mhmm. And there's something that's happening in this sort of massive aggregation of data that pushes these big piles of aggregated data along that spectrum. So they start to look more like things that should have a public like control.

Speaker 3 38:59 – 39:44

Yes. That is a great way to put it. And not only is the tax revenue raising, which is good, you know, states especially, because they don't have a printing press, need a lot of revenue. Right? Especially California, because we have Proposition 13, which lets us not tax land, which makes life very difficult and is very regressive. But that's one part. It's also almost like, you know, putting a flag down that this isn't the wild west, that value can't somehow come out of nowhere, that private actors can't be the only ones who set value here. The public can, too. And that's actually a deeper, more important

Speaker 1 39:45 – 40:57

function of taxation and revenues. And I think this, again, goes back to that sort of part and whole thing. In other words, you know, if we sort of tack onto the metaphor a little bit further, the sorts of things that are more suited to more private like control tend to be things that are more divisible. Right? Assets that can be sort of divided into a million pieces and managed independently without harm to the value or the utility of the whole. Right? Whereas the sorts of massive datasets that are driving the big tech platforms and much of the, you know, cutting edge AI and the modern economy in general tend to be things where these datasets can't just be sort of fragmented into a million pieces without destroying their potential for good or for ill, which means that there should be a stronger measure of public control and sort of shared participation in the value that they create. Right?

Speaker 2 40:58 – 41:50

Yeah. I'd agree with that. I'd also add just a, I think that maybe it's my own fault, I've been talking about, like, machine learning and AI specifically a lot, but it's not just tech companies that are benefiting from this socially produced data. It's basically, literally, any large firm. They probably have a team of data scientists, or they probably have some position in the firm, maybe business analysts, etcetera, that is benefiting from data. Of course, the tech companies and people who are actually doing the cutting edge AI research might be benefiting even more, but it's not just tech companies and AI that we're talking about. And I'm saying that because it's my own fault. I tend to overly fixate on, like, kind of viewing these things through a lens of machine learning loss functions and using research methods and framing everything in terms of counterfactual questions about machine learning models.

Speaker 3 41:50 – 43:28

But that's actually a bit too limited. So I wanna walk back my own limitedness. Really, that's a really important caveat in thinking about the tax. Right? Because Yep. I think we've, you know, done a good job on, and particularly your colleague, Hanlin, I think has been built in. Also, our colleagues, Peng Lin and Luisa, who are in different time zones, so they can't join us right now, have really done, like, the work showing that we can design a data dependence tax that taxes, you know, a tech firm at a much higher rate than, like, a company that, you know, makes toilet paper, I don't know why that was my first topic, because that's a deficit item these days, and uses that data to try to predict where to send their supplies. Right? Because they're not really dependent on that data. They're not storing as vast an amount of it. That means you can tier it. Right? And as you know, the experiments run by our colleagues have shown those tiers are actually pretty good at saying that, you know, Downey isn't gonna fall under a data dependence tax to any large extent, but Facebook, whose entire business model is based on that, is going to fall under it, which I think really gets at this distinction that you made about the collective value of data and its reusability and its kind of constant malleability to create more and more value, even though individual processes get diminishing returns, versus, you know, companies that still use data, who doesn't, right, but aren't really dependent on it in their business model.

Speaker 1 43:29 – 43:47

Yeah. So could you summarize that a little bit more, Yakov, in terms of, you know, so this data dependence tax idea is close to the core of the proposal, of the data dividends idea. And can you say a little bit more about what that means?

Speaker 3 43:48 – 45:04

So, yeah, what we're looking at is actually really two separate taxes in our recommendation. Right? The first one is extremely straightforward, which is just a transaction tax on data broker companies. Right? Companies that sell data to other companies or buy data from companies. Right? That's just a straightforward sales tax, essentially. Right? The more complex one is the one on data storage, right, on the storage of individually identifiable users. So we've actually kind of played with the definition of that. And here, actually, the California Consumer Privacy Act, the CCPA, is actually pretty good, we found, at getting at that: it's not just registered users, but their larger network, to the point where a user can be identified as an individual. That's the thing we're taxing. Right? But we're not taxing it on a one to one basis. We are making a cutout for company size and revenue. Right? And we're also making cutouts for the amount. We're making, like, a marginal tier. Like, we're taxing this more on the marginal user, so there's, like, a nice little tier kind of going through, right, on

Speaker 1 45:04 – 45:19

how many users you get. So, basically, the tax tries to create a higher tax for very big companies Mhmm. whose business model depends upon the aggregation

Speaker 3 45:20 – 45:42

of data pertaining to many, many people. Yes. Exactly. Right? Rather than a smaller company or a large company that's only incidentally using data to optimize one or two processes that it's doing. Right? Which is, we want to encourage that, really, not, you know, tax that. Right. And actually, I think there's another interesting possible misunderstanding

Speaker 1 45:43 – 47:16

about the sort of thrust or intention of the dividends idea here, which relates to what we were just talking about in the sense of public control. So when you hear about a tax like this, that taxes a certain kind of business model more than another, you might, you know, if you're sort of versed in Econ 101 or something, think of that as like a Pigovian tax. In other words, a tax meant to discourage a kind of thing that we don't want people to be doing, like a cigarette tax or something like that. But I think that what we're aiming at is actually a little deeper, a little more subtle than a sin tax like that. You know, the idea behind a data dependence tax isn't that it's bad to create big, data dependent businesses. It's just that those types of businesses more resemble things that the public ought to have a stake in, and that the public ought to have a measure of control over. So in other words, it's not about taxing data dependent businesses because we just want everyone to use pen and paper and not get the benefits of modern technology. It's that these kinds of business models just demand a greater sort of public stake.
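The storage tax Yakov describes a few turns earlier, with a cutout for company size and revenue and a rate that rises on the marginal user, can be sketched as a marginal bracket schedule, in the same way income tax brackets work. This is only an illustration of the mechanism: the bracket thresholds, the per-user rates, and the revenue cutout below are invented numbers, not figures from the report, and the function name is hypothetical.

```python
# Hypothetical marginal brackets: (user count where the bracket starts,
# dollars per identifiable user per year within that bracket).
# All numbers are illustrative assumptions, not the report's figures.
BRACKETS = [
    (0, 0.00),             # first million users untaxed
    (1_000_000, 0.05),
    (10_000_000, 0.25),
    (100_000_000, 1.00),   # the very largest aggregations pay the most per user
]

# Hypothetical cutout: firms below this annual revenue owe nothing.
REVENUE_CUTOUT_USD = 25_000_000


def storage_tax(identifiable_users: int, annual_revenue_usd: float) -> float:
    """Tax owed on stored, individually identifiable users.

    Each bracket's rate applies only to the users that fall inside that
    bracket, so the rate climbs on the marginal user, not on all users.
    """
    if annual_revenue_usd < REVENUE_CUTOUT_USD:
        return 0.0

    tax = 0.0
    for i, (lower, rate) in enumerate(BRACKETS):
        if identifiable_users <= lower:
            break
        # Upper edge of this bracket, or the firm's user count if smaller.
        is_last = i + 1 == len(BRACKETS)
        upper = identifiable_users if is_last else min(identifiable_users, BRACKETS[i + 1][0])
        tax += (upper - lower) * rate
    return tax
```

Under these made-up numbers, a large firm that holds data on half a million people incidentally owes nothing, while a platform with two hundred million identifiable users pays most of its bill on the top bracket, which is the "tiering" effect described in the conversation.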

Speaker 3 47:17 – 48:06

And I love your phrasing on that, by the way, because the words control and stake are just not used enough now in economic policy. Right? There used to be public control, not necessarily public ownership, but this idea that the public should have some control in some private activities is actually a founding part of American economic policy, going back to the founding generation. So Alexander Hamilton's writing and even Thomas Jefferson's writing, right, who you usually think of as opposed on things. And it really does run all the way up through the sixties, that public control over certain key elements that are really embedded within society is considered a goal of policy.

Speaker 1 48:08 – 48:50

And we don't really talk about it that way anymore, but I think that's an important thing. So, Yakov, you probably know more about this than I do, but Keynes Mhmm. was interested in an older idea about ethics, you know, stemming from, like, G. E. Moore, this kind of, like, turn of the century Yeah. emphasis on, you know, what goes under the name organic unities. So there's a sort of idea of organicism. Then we can do more about this. Yeah. So heard of it too. But I know the exact person that you're interested in. I know you've been, like, reading up on this. But Yeah. I'd love to. It's actually just something that I'm sort of starting to explore as well, to sort of figure out what,

Speaker 3 48:51 – 49:29

what Keynes was interested in here and how it relates to what we're talking about. That's definitely, like, Zach Carter's new book, and even more so my friend Alex Williams, who is, like, just doing, I'm gonna plug it, like, a deep chapter a month dive into The General Theory, like, reading it as a philosophical text. But, yeah, it is a very different way of doing economics than what we're used to. And this is an area that I do work on. Right? Like, the history of economic thought. Economics wasn't always about, like, solving an optimization problem.

Speaker 1 49:29 – 50:12

Right. And just to set it up for the listeners who have no idea what I'm talking about, the idea of organic unity has to do with the idea of increasing returns and the idea that wholes can be greater than their parts. So in other words, there can be sort of institutional and social structures in society that add up to more than the sum of their parts, and that therefore, you know, ought to be cultivated and so on. You know, Keynes didn't actually write very much about this, but he was heavily influenced by this way of thinking. And I think it's a really useful lens through which to understand his

Speaker 3 50:13 – 52:20

his appreciation of the importance of the role of government in society. And, you know, if you really wanna push on this line, actually, if you look at what Keynesian theory is, essentially, the entire Keynesian project in The General Theory is disproving Say's Law, which is that supply makes its own demand through rational decisions. In a variety of more complex ways, it reveals more and more about how a modern economy works. It really is just, like, if you actually read the book, it's an extremely chatty book. He's just teeing off against this one assumption in far more complex and beautiful ways. But what he's trying to get at is that outcomes that are individually rational are collectively irrational and imperfect. Right. And that's why, you know, in the context that he's writing in, the Great Depression, the argument is this could all be solved by wages falling. And he's showing, like, no, actually, it won't, because that's individually rational, but as a collective unity that actually makes the situation even worse, because who's going to buy the output then? Mhmm. And then why would you hire people? I mean, that's like a very vulgar Keynesian summary of the book, but that's what he's trying to do. And he makes that point in far more complicated ways, introducing financial instruments, introducing a philosophy of uncertainty, etcetera, etcetera, right, to explain why that happens. But yeah. No. Absolutely. I mean, I do think Keynes is not a, quote, unquote, modern economist. He's closer to the institutionalists in some ways. And that's why he's great to read compared to, you know, like, a textbook or something that you read these days. There's just not that kind of deep philosophical engagement with this anymore. I wish there was. You know, the most you get is Karl Popper. Some people reading Popper and just deciding that that's all correct. Right. 
And then and then, you know,

Speaker 1 52:21 – 52:23

Keynes' disciple, John Kenneth Galbraith,

Speaker 3 52:24 – 52:53

sort of carries forward some of that. Yeah. And then through that, you also, slightly on the edge of Keynes, get the Keynes of the American post-Keynesians, the Minskys, the Weintraubs, etcetera, who are influenced again by the American institutionalists who we were talking about earlier, who are very interested in these structures and instabilities within the accumulation process.

Speaker 1 52:53 – 54:23

Right. So I think the theme here is basically that we've kind of stumbled into a place in our policy thinking, in our thinking about data, in our thinking about economics, in our thinking about policy in general, where we've kind of missed the forest for the trees, basically. Mhmm. Right? We're trying to formalize things in terms of their particular application or in terms of a particular way of understanding their value without seeing the way that they fit together into a broader whole. You know, at least my kind of interpretation of what's happened over the past ten years or so with the data economy is just that, you know, these new technologies have connected us so much more intimately that the untenability of this has become very apparent. You know, it's becoming clearer that we need to look at things as organic wholes, you know, as G. E. Moore might have put it, instead of as discrete parts with their own separate valuations that can move independently of one another. Yeah. Nick, what are your thoughts? Maybe we can turn back to the question of different kinds of data, like, you know, data that isn't collected from people per se or Mhmm.

Speaker 2 54:24 – 60:05

Yeah. Okay. So I got a little lost with the economic history there. I'm not gonna lie. But I was kind of thinking of a bit of a thought experiment that I think will be hopefully pretty quick and helpful. Which is that you imagine a village somewhere on earth in the pre internet era. And there's a store in the middle of the village. And there's a guy with a really good memory who likes to sit in the store and memorize what everyone buys. And then he goes to people's houses and offers them special deals on the stuff that he remembers that they like to buy. So I like to buy milk. So he comes to my house and tries to give me a deal on milk, and that's how he makes money. Basically, I'm trying to, you know, imagine a human recommender system, who is basically making money off of data from all the people in the town, just watching them. And there's no technology here. There's literally no Internet, no electricity here. Philosophically, the underlying arguments that we use to justify this data dividends proposal could be used to justify taxing this one guy, or basically saying, hey, buddy, you're making money off of the preferences and the lives and activities of everyone in this town. You should pay it into a public good somehow. And in a sense that's done through taxes. So the reason I think that's a useful thought experiment is because the same thing is happening now. It's just super, super, supercharged by tech companies and large firms more generally who are able to collect data, because one person can only memorize so many preferences. And the scale that we're doing it on now is just completely insane. And the reason that story is also useful is because when we first talk about data, or when I say your data is fueling AI systems, which is something I say a lot. 
I start a lot of, like, my talks with it, and it's kind of a point I return to a lot in my research. It's easy to think, you might say, okay, give me an example. And I would say, well, did you know that Wikipedia is used as training data in tons of natural language processing research and graph research, for instance, because of the knowledge graph that underlies Wikipedia. It also appears all the time in search engines. Like, if you search for anything with a Wikipedia page, you'll almost always get the Wikipedia page as one of the top results. So if you're a Wikipedia editor, that means that you helped fuel AI. You helped fuel hundreds of AI research papers, for instance. And you might say, okay, that's cool. I see that. I can see how sitting down and writing a Wikipedia article is like doing a job. I can see how I'm kind of a laborer, in a sense, who worked for the tech company to help produce this awesome AI system. But I'm not a Wikipedia editor. So maybe I'm not included in this. And I guess the point is that now, because we've commodified literally almost every interaction from the moment you wake up until you go to sleep, depending on how networked, how, you know, digitally connected you are, that is not the only kind of data fueling AI systems. And in fact, while it's maybe easier to study in some cases, Wikipedia is really great to study because it's all open, just the fact that you like milk is going to help the equivalent, the digital, supercharged, orders of magnitude bigger version of the guy memorizing everyone's favorite stuff at the store. It's going to help these tech companies sell more milk to other people like you, just by going through your daily activities. 
So the public collectively has basically become these data laborers, and really anything that can be recorded into a CSV file or whatever database file format a company is using can and will be used to train models to make profits. And if we don't have some mechanism to return that, or to help fund public goods, or make those benefits shared more widely, what's gonna happen is just the concentration of power. And people kind of talk about this dystopian vision of, you know, the streets are empty and literally no one has a job and robots are doing absolutely everything. And there's a lot of science fiction about this. So I won't, you know, describe that right now, but there's also kind of a shorter term version, you know, in the next five years or so, where economic inequality just continues to grow. And there's tons of negative effects of that. And so that's kind of the motivation for why we have to do something. And just to return to that. So I think that the village store milk thought experiment is a nice example of why it's not just about data that you produce. It's the fact that you went to the hospital and got a brain scan. Whether you like it or not, you might be in a study now. Medicine's actually a little interesting because they're a little bit better. Well, in some senses, there's been a lot of big ethical debates in the field, and there's been a lot of really bad, like, ethical malpractice in medicine. So it's discussed more prominently. And now, let me put it this way. You're more likely to get asked for consent to appear in a medical study than you are to appear in a tech company's recommender system. I think I can stand by that. 
But nonetheless, basically, just from the fact that you went to the hospital, you're gonna end up helping fuel some AI technology somewhere, and that's why you should be seeing some of the gains. And the way to do it in the short term is probably to,

Speaker 1 60:07 – 61:25

contribute to public goods. Yeah. I mean, and the interesting thing also is whether consent is enough. You know, because the main difference, it seems to me, or one of the main differences, between the shopkeeper recommender and, like, a modern system is the winner take all dynamics of it. Yeah. So in a village a long time ago, you could have a bunch of different shopkeepers who are recommending things to people, and some might be a little bit better than the others. But if one was a little bit better than the next, then that wouldn't put the second one out of business. You know, everybody wouldn't flock to the first one, right? So what that suggests to me is that the gains from the systems need to be distributed totally independently of whether there is consent or not. So let me give you an example of what I mean. Or instead of giving an example, let me ask a question. Like, what do you think about GPT-3 or something like that? OpenAI, for example. If we have AI systems that are extremely data dependent, that are open to be used by anyone in the public, do you think that's less of a problem, or do you think that solves the problem?

Speaker 2 61:26 – 63:35

Okay. Yeah. That's a really great question. So I do wanna comment on GPT-3 a little bit because that's a really interesting one. So I actually wrote a blog post about that paper, about the system, because it got so much prominence and it is literally trained on Wikipedia and Reddit. There's some other stuff in there too. So there's the Common Crawl, which is basically, like, the internet, all of the internet you can scrape with a bot. And there's a big dataset of books, so published authors, but there's a really good chance, if you're listening, that you helped to train GPT-3. And it's also a really interesting one because there's a pretty good case to be made that that's a system that might end up causing more harm than it does help people. In terms of, like, replicating harmful language that's found on the Internet, because, of course, there's a lot of toxicity and hateful speech on places like Reddit. And it's hard to clean that out when you're basically training a model on all the Internet. Did one of the bots become a Nazi within, like, an hour or something like that? Yeah. So that was years ago. That was Microsoft's Tay. That's like a classic example in kind of, like, AI gone wrong talks. Everyone loves that one. And that was a long time ago. And, yeah, it hasn't gone a lot better. So, yeah, there's kind of a whole separate discussion to be had about when a system is deemed societally harmful or not. But GPT-3 is one that might be harmful. Nonetheless, it's gonna make companies money. Companies are gonna make lots of money off language models. I think no one will argue with that. And these language models are being trained by the text we all wrote. If you wanna train a giant language model, you wanna see what are all the ways that people write. 
And you're basically telling your machine to look at all the ways that people write and write like that. And doing it with a lot of parameters and a lot of money on compute and just, you know, training, chugging on your weights. So it's a perfect example. It's an indisputable example of how data that you consciously contributed is gonna help tech companies make money, and it might harm people in the process. What do we do about it? And so, okay. So now, is democratization the answer?

Speaker 1 63:35 – 63:40

But I can just download, you know, it's open source. Right? I mean, I can just download it.

Speaker 2 63:41 – 65:24

No. No. No. Sorry. This is not a huge controversy, but it's become a little controversial. OpenAI actually did this thing with GPT-2, the previous one, and had a big press release saying this model is too dangerous to release, because the spammers are gonna use it starting tomorrow and start producing, like, the most, you know, authentic looking spam ever. And so we're not gonna release it. And people got on the Internet and, you know, said, OpenAI, not so open anymore, etcetera. And that's basically still true. So, I mean, I don't know if it actually matters, because the truth of it is that these models are so big that the capital required to operate them is just enormous. So even if they made everything open source, we can't train our own version of GPT-3 on my laptop or even on my, you know, school server cluster. It just costs so much money. Right. So there's a really interesting dynamic going on there too, where the data that they're using is actually primarily commons data and not proprietary data like search engines and recommender systems are using. But on the other hand, there's just the enormous cost of capital, and, you know, getting the engineers to hook all the things together, to duct tape everything together properly, is really tough. So basically, they could release an API. They can make a free API. That'd be really expensive, it would cost them a lot of money, and basically let anyone log in and just kind of produce text using prompts. And right now, I think their plan is they're partnering with Microsoft to kinda make that available as a paid service. So I think that also maybe got some Internet laughs.

Speaker 1 65:25 – 65:44

The upshot, though, is essentially that even if this were open source, or even if there was an API that anybody could access, it would still essentially concentrate power in the hands of well capitalized parties who are best positioned to apply it to a problem. Yes.

Speaker 3 65:45 – 65:58

Nick, could I ask you a question then? What are the exact inputs that are so capital intensive? Is it just servers and manpower, essentially, like, engineer time?

Speaker 2 65:59 – 66:46

Yeah. So it's engineering and compute. So the thing here is basically, like, a really crude example. I mean, I don't really remember the architecture of GPT-3 off the top of my head. But basically, what it's doing is it's reading everything on the internet and then updating a bunch of model weights. I mean, sometimes people will say, like, deep learning can be crudely described as just stacking linear regressions on top of each other and just keep stacking and stacking until you can't anymore. That is a little crude, and it kind of maybe downplays the degree of innovation that's gone on in terms of, like, clever optimization tricks and clever architecture tricks, and different ways to hook different weights together that will give you better outputs.
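The "stacked linear regressions" description above can be made concrete with a toy forward pass. This is a minimal sketch for illustration only, not GPT-3's architecture: the layer sizes, the random weights, and all function names are assumptions. One detail the crude description leaves out is the nonlinearity between layers (here a ReLU); without it, a stack of linear layers would collapse into a single linear regression.

```python
import random


def linear(x, weights, bias):
    """One 'linear regression' layer: y_j = sum_i x_i * weights[i][j] + bias[j]."""
    return [
        sum(x_i * row[j] for x_i, row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


def relu(values):
    """Elementwise nonlinearity applied between the stacked layers."""
    return [max(0.0, v) for v in values]


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for a layer mapping n_in -> n_out values."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]
    bias = [0.0] * n_out
    return weights, bias


def forward(x, layers):
    """Run the input through every layer, with ReLU between (not after) layers."""
    for i, (weights, bias) in enumerate(layers):
        x = linear(x, weights, bias)
        if i < len(layers) - 1:
            x = relu(x)
    return x


def count_parameters(layers):
    """Total number of trainable weights and biases in the stack."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


rng = random.Random(0)
sizes = [4, 8, 8, 1]  # hypothetical toy sizes: 4 inputs, two hidden layers, 1 output
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

output = forward([1.0, 2.0, 3.0, 4.0], layers)
n_params = count_parameters(layers)  # 121 for these toy sizes
```

Training means nudging every one of those parameters over and over while reading the data, which is why the parameter count, together with the amount of text, drives the compute bill discussed in this part of the conversation.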

Speaker 3 66:46 – 66:49

Everything can be a linear regression at its most raw.

Speaker 2 66:50 – 67:18

Yeah. But, basically, it's the cost of electricity and computer access. And just, like, the models are so big. There's so many parameters that even, like, hooking up the computing infrastructure is challenging. I think there's tons of authors on the paper, and there's, like, a gazillion more people in the acknowledgments, because it's just such a monumental task to even, you know, do the nuts and bolts of this, basically.

Speaker 3 67:18 – 67:46

And, I mean, that's really interesting because that really sinks this, quote, unquote, data economy right back into the real economy. Right? Yeah. Real physical stuff. Right? Because it's not like this stuff is now costless, or has little capital cost, like the data. You really need stuff to operate it. Right? And that brings us back to the realm of stuff we really easily understand, like hurdle rates, CapEx, all that stuff that, like, you just do day to day.

Speaker 2 67:47 – 68:26

Yeah. Sorry. I'm looking it up. So I think there's an estimate, this is from Lambda Labs, which is a GPU company. They offer GPU services, and they estimated it at $4,600,000 to train it one time. And, of course, a thing that critics have pointed out is that when language changes, like, if a new term or a new social movement enters the vernacular, or, you know, new memes come out, you have to train it again. So there's kind of an interesting side thing about the notion of data going stale and how long your data remains good for.
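
The point about stale data can be put as simple arithmetic. The sketch below uses the $4,600,000 per-run estimate quoted above; the retraining intervals are hypothetical values picked only to show how the yearly bill scales, not figures from the conversation.

```python
# Per-run training cost quoted in the conversation (Lambda Labs estimate).
COST_PER_TRAINING_RUN_USD = 4_600_000

def annual_training_cost(months_data_stays_fresh):
    """Yearly training spend if the model is retrained whenever its data goes stale.

    months_data_stays_fresh is a hypothetical parameter: how long a trained
    model remains good enough before it needs to be trained again.
    """
    retrains_per_year = 12 / months_data_stays_fresh
    return COST_PER_TRAINING_RUN_USD * retrains_per_year

for months in (12, 6, 3, 1):
    cost = annual_training_cost(months)
    print(f"data fresh for {months:>2} months -> ${cost:,.0f} per year")
```

The shorter the shelf life of the data, the more the one-time training figure turns into a recurring capital requirement.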

Speaker 1 68:26 – 69:01

That also can increase the capital requirements of these kinds of things. And there's also a winner-take-all dynamic here too. Right? Because... Yeah. For sure. Even if everyone had their own GPT-3 instance, let's say the price of training it goes way, way down so that everybody can train their own GPT-3, it'll still be the case that whoever has the best one, whoever has the best GPT-3 trained on the most recent data on the largest dataset, will be able to fool the other ones, essentially.

Speaker 2 69:01 – 70:18

Yeah. Yeah. And another aspect too is that to actually make money off of this, you're gonna need the scale. Basically, you'll need all the resources that a business needs to operate in the first place. I mean, there's been a lot of things that are proposed to do with language models. So you can, you know, automate what was previously thought to be, you know, deep work, like writing, autocomplete, reports. People have done some pretty cool stuff in terms of getting it to write web pages and do design work. You can type in, like, make two red boxes next to each other, and it will create the HTML and CSS code, the web code, for that. And so, basically, if I'm just an individual, if I had my own instance somehow and, you know, I magically got $4,600,000, I don't know what I would actually do with the model even. Whereas if I was running a firm, there's probably a lot of things I could do with it. Right. And so this is kind of a whole side topic about the role of automation in the concentration of power, and then also the role of automation in transforming the economy, which I don't wanna make any claims about.

Speaker 1 70:19 – 71:03

Well, I'm curious. I mean, one of the things that all this draws out for me, and I'm curious what you think about this, Nick, is that the whole logic of open source and open data and so on just seems to be breaking down. It just seems to be reaching its limit as a strategy for genuinely, you know, distributing control over important systems and giving more people access to economic value. It just seems to be reaching its apotheosis. Am I wrong about that?

Speaker 2 71:04 – 71:59

I mostly agree. I'm kind of thinking out loud here. I don't have, like, a a super strong take on this. Like an interesting observation is that machine learning research, people talk a lot about like reproducibility crisis in science. Machine learning research is actually some of the best. Like, people really do some crazy stuff with making their their code, actually runnable. And it's still it's still a big problem. And there's been some pretty shocking papers where, folks will go back and try to reproduce all of the top papers from, like, the last three years of of top conferences. And then they'll find out only 20% of them shared their code, and only 50% of the authors will respond when you ask for their code and stuff like that. But it's still really good. And and part of that is driven by the fact that the big tech companies have done a great job of of open sourcing some of the software. So the big machine learning libraries are TensorFlow and and PyTorch, and those are, supported by Google and Facebook and open source contributors.

Speaker 1 72:00 – 72:57

No. No. No. I don't wanna be misunderstood. Don't get me wrong. I've been speaking the language of open source and open data and so on for a long time, and I absolutely agree that it has created, you know, enormous value that would otherwise just be gummed up in the Bell Labs or whatever. But I think that when you look at these edge cases, like GPT-3, like, sort of, you know, the cutting-edge data value extraction systems, you are starting to see, I think, where the logic of open breaks down and where a more thoughtful and complex institution-building approach to sharing value is going to be required. Does that make sense? Oh, yeah. Totally agree.

Speaker 2 72:57 – 74:19

I guess one thing I would add is that, yes, all the open source is really good and impressive. But here's a little bit of a fun story that relates to this, which is that since I started thinking about this topic a lot, something I try to do in a variety of spaces that have a lot of people who are interested in data and machine learning stuff is ask people how often they actually use their own models in their own life. Like, okay. I could, in theory, download the absolute best mathematical code for updating my model weights for some kind of model and apply it to making a recommender system for my own life, where I recommend whether I should get a dark roast coffee or light roast coffee each morning. Right. In theory, I could do that. The tools are all there. It would be really easy. There's these amazing READMEs that would help me install the latest software and format my data correctly, and I could do it. I guess that's kind of a silly task because that's more of a signal processing task, it's not really any sort of tough recommendation, it's just a pattern matching task. But I can do it. And basically, no one's doing this as far as I can tell. If anyone is listening and they do do this, I'd be really curious. Please let me know what you're doing. I'm dying to know. But that relates to the fact that most of these systems are only useful when,

Speaker 3 74:20 – 74:27

in collaboration. I'm sorry. What's that? I think that person might be slightly insane too, but also interesting to talk to.

Speaker 2 74:28 – 75:19

Maybe. I mean, yeah. I am dying to know. But ultimately, these systems are only valuable because they're connecting us. Right? There is, like, maybe this very grand, philosophical, beautiful interpretation of even the most gross, ugly, unethical targeted ad system that's, you know, tricking people into buying bad products and whatnot. The weights underlying that model are this tapestry representing collective human experience. And that's the thing that's useful. And me just trying to figure out what type of coffee I should have in the morning to optimize my happiness, which is, you know, perhaps a silly task in the first place, is not useful.

Speaker 1 75:19 – 75:26

Right. And, again, that has to do that's about the combination of information between people. Right? Yeah. I mean sort of the social

Speaker 2 75:27 – 76:18

So to connect that back, the open source movement is awesome. It is like an amazing example of collective action. Like, the actual production of code artifacts is collective action, and there's tons of amazing collective action literature on the topic. For instance, like, around Linux and everything related to Linux. But I think there are some individualist, like, mindsets underlying it in some cases, like the idea of building your reputation and becoming a contributor and whatnot. And to use these technologies, you need institutions, and you need to manage the fact that you're getting data from a bunch of people, which is tough. Why would people trust... you know, if I wanna start a recommender system tomorrow, why would anyone trust me with their data about what coffee they like?

Speaker 1 76:19 – 77:06

And so that's why you need institutions, or that's one reason why. I I think that's a great way of thinking about it. That's a great it's almost like you can imagine it almost like a, like a like a garden that needs a gardener or something. Right? In in other words, you you can you know, by keeping things open, you can, create more activity, more sort of cross pollination and and and invention. But there is, you know, at at at some point, public concerns come into the picture that need to be addressed, through not necessarily through government, but through some kinds of, through some kind of institutional framework that doesn't quite

Speaker 2 77:07 – 78:03

that doesn't quite fit into the rubric of the software collaboration. I have a new metaphor. Can I workshop it? You just inspired it. So when people talk about open source solving the problem or open data solving the problem, they're kind of suggesting, let's say that the benefits of AI are fresh fruits and vegetables in this case. And the powers that be are suggesting everyone should just have their own garden, just have a garden on your balcony. But the problem is that we all live in tiny, urban, densely packed environments. And the best we can do is grow one vegetable every two weeks, which is not enough to sustain us. Like, the conditions in modernity have put us into this state. And instead, we should band together and have farms. There's something there. I'll workshop that a little more. I think there's a good metaphor. There's maybe a presentation slide there or something.

Speaker 3 78:04 – 80:04

I think that's actually a great metaphor, and it only makes the need to control and to, like, really embed and understand this economy even more dire. When you said that $4,700,000 per-instance number, you know, that was jaw dropping, because that makes incumbents more and more powerful, especially ones that are already, you know, sitting on these stacks of data. And that really cries out for something like a data industrial policy. I think we've been playing with that word before. We never really defined what that is. But, you know, if this is a huge driver of economic activity, there really does need to be some thought in that. This is actually my wheelhouse. This is what I was trained in and worked on. Right? It's some kind of set of public investments into building an infrastructure. Right? Because those are huge hurdle rates, possibly. Those are massive. And, yeah, the goal of industrial policy is to grasp where and how those can be overcome. You know, there are many arguments for it, from national interest to equity to economic growth. What this process of working with you guys has kind of made me realize is that this stuff really, really isn't costless. We need to think about not only the distributional problems on one hand, but what are the uses and the solicitations of this on the other. Right? It's not just output. Like, yeah, the input itself is really hard to think about, and I think we've only begun to think about that problem. Like, just acknowledging it was already a huge step. Yeah. Totally agree.

Speaker 1 80:05 – 80:48

Yeah. I agree as well. And I think that a sort of understanding is emerging that we need to make really big decisions about the direction of our society, you know, in this area of how we handle data and what it's used for. And the idea of allowing these decisions to emerge through the logic of atomized individuals following economic incentives just isn't going to work out.

Speaker 3 80:49 – 80:59

The economic incentives themselves are in the eye of the beholder. They're never not rooted in the larger whole, and love this kind of language use.

Speaker 1 81:00 – 81:18

So what else, what else should we talk about? What else is on your guys' minds in the in in connection with this conversation that we could explore for a few minutes? Still can't get over the fact that it costs $4,700,000 per instance of CapEx to get a result that was supposedly reprocess.

Speaker 2 81:19 – 82:12

Yeah. So maybe just to clarify that. In theory, you only have to train it once per round, for however long you think it's good for. So once you're done, your weights are all set, and you can give it a prompt. You can say, I'm going to go to the, and it will, using all the text on the Internet ever, say the most likely thing is, I'm going to go to the store and buy beer. No, that's probably not it, but you can do that forever. You can do that infinitely many times until you decide that you're too out of date and you need to train it again. I had some notes here, but also I have a bunch of new thoughts from this conversation. Oh, one thing that I guess would maybe be an interesting thing to talk about is, like, how would we move towards something like data dividends or data industrial policy or data co-ops?
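
The "train once, then prompt as many times as you like" pattern described here can be illustrated with a toy next-word predictor. This is a deliberately swapped-in technique: it counts word pairs (a bigram model) rather than training a large neural network as GPT-3 does, and the three-sentence corpus is made up for the example.

```python
from collections import Counter, defaultdict

def train(corpus):
    """'Training': count which word follows which. Done once, up front."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prompt):
    """'Inference': look up the most frequent follower of the prompt's last word."""
    last = prompt.lower().split()[-1]
    if last not in counts:
        return None  # the model never saw this word during training
    return counts[last].most_common(1)[0][0]

# Hypothetical stand-in for "all the text on the Internet".
corpus = [
    "I am going to go to the store",
    "we went to the store yesterday",
    "she walked to the park",
]

model = train(corpus)  # the expensive step happens here, once
print(predict_next(model, "I'm going to go to the"))  # cheap, repeatable lookup
print(predict_next(model, "a brand new meme"))  # unseen word: nothing to say
```

The second prompt hints at the staleness problem raised earlier: a word that entered the language after training is simply absent from the counts until the model is trained again.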

Speaker 1 82:13 – 82:16

Yeah. And we've talked about data leverage a little bit. Yeah.

Speaker 3 82:17 – 82:17

Okay.

Speaker 2 82:18 – 82:20

So I can just,

Speaker 1 82:20 – 83:22

yeah, yeah, give a little background there. Let me set it up, just to... For sure. ...pull this together, and then that's perfect. So we went through this process as a working group putting together a data dividends proposal, with great colleagues including Hanlin Li, Brent Hecht, Luisa Scarcella, and Chirag Lala, which kind of represents an actionable policy that could start to move in this direction, move in the direction of a rational data industrial policy or a way of giving the public a stake in the data economy. But there are lots of other interesting ideas out there about how to move towards that, some of which you've outlined in some great recent research, Nick. So I wonder if you could tell us a bit about that. Yeah. Absolutely.

Speaker 2 83:23 – 87:14

And so it's really super connected, which is great. So in some of the recent work, we focused on this concept called data leverage. And the core idea is the same as data dividends, which is that the public and society broadly are responsible for the success of AI because they play a role in producing the data that trains it and fuels it. And that also applies to other data-driven technologies. And so because we're responsible for that, that means that there's a lot of ways that the public could basically change the performance. So if myself and a group of my friends are mad at a tech company because of something they've done, we could basically just say we're gonna stop using that platform, cut off our data contributions to them, and make their AI system worse. And so we've conceptualized this as a data strike, because we were doing data labor before, and when you withhold your labor, that's a strike. So withholding your data labor is a data strike. And there's other versions too. So we could give companies bad data, and that would be data poisoning. So I'd lie to a recommender system. I go on YouTube and I queue up a music video I hate to try to trick YouTube into recommending that music video to other people like me out there in the world. And then a third version, a third data lever as we've conceptualized it, is called conscious data contribution. And that's where I basically take my data and give it to a competitor and say, hey, I really want you to be able to compete with this company whose practices I dislike. Here's a bunch of data. Hopefully, this helps your AI systems get better. So there's all these ways that, because AI is reliant on data contributions from the public, we can basically change their outcomes. And this has always been true. 
It's not like this is a new thing that's happening in 2020. And in fact, in the past, social movements that involve protesting or boycotting, like, Facebook, for instance, has been the target of some boycotts, these things would also have the effect of changing how their AI systems work and maybe lowering the accuracy of ad targeting, for instance. But it's something that we think in the future can be enhanced. And there's a really great opportunity for researchers in machine learning, researchers in the social sciences and in design and human-computer interaction, and policymakers to kind of have this coalition. So for instance, there's great opportunities to build tools that make it easy to join these things. Because the big advantage is that striking is quite hard to organize, but data strikes are actually quite easy to join. I could delete my YouTube account and start only using YouTube from a Mozilla container browser, which tries to do a bunch of stuff for anti-tracking. I could very easily switch platforms. I can just, you know, use a new search engine, perhaps. This is a bit harder for social networks because there's a lot of reasons why you don't wanna leave a social network, but there's still kind of in-between options. Right? I can continue to message my family members on Facebook, but stop clicking their ads, if I used to click their ads, for instance, or stop using their friend recommendations. So there's a lot of in between. There's a lot of ways that people can start exerting data leverage. And if done collectively, it could be really effective. I wanted to bring this up. I feel a little bad about doing it, but I can't help myself. We've recently seen one of the biggest examples of online collective action ever, I think, in terms of GameStop and Wall Street Bets on Reddit. 
The amount of traffic on Wall Street Bets is pretty wild. The rhetoric is all framed around collective action, which is really interesting. And yeah, I don't know. It looks like you might have some thoughts on this, Yakov, but I do not know. I mean, this is a really difficult ethical question because, like,

Speaker 3 87:14 – 87:27

the rhetoric is, I think, insane. Yeah. But for what is a classical boiler room scheme pump and dump and, like, as we speak, I'm seeing people posting that stock is collapsing and people are losing their entire life savings.

Speaker 2 87:28 – 87:53

Yeah. So I I'm definitely not not, trying to promote this or, you know, make any sort of recommendations, but it's a really it is an example of massive online collective action organized through, an online platform like Reddit. And this you could basically imagine something like this happening around, trying to bring down a new AI technology that people are really mad about or trying to support a new,

Speaker 3 87:54 – 88:15

AI business that people really like. That requires, like, there's a politics there. Right? Because the whole thing is, a lot of people in the industry that I know who are, let's just put it, extremely left leaning are really worried that people are using very good critiques of the financial industry to get rich off of people who don't know better.

Speaker 2 88:15 – 88:16

Yeah. And,

Speaker 3 88:18 – 89:03

so the question is, you know, it comes back to institutions. Right? Why we need institutions is that this kind of action can let really, really bad actors in to really hurt large amounts of people. And that's kind of why you need some kind of, you know, like, union or something like that, or, like, a co-op, that can really guide this kind of action and have some, like, fiduciary responsibility to members, rather than, you know, some guy screaming short, short, short, to squeeze the hedge funds, where there might actually be more hedge funds on the long side. You know? Yeah. Easy to squeeze the shorts and then get very, very rich.

Speaker 2 89:03 – 91:12

Yeah. And this actually also relates to the dangers of data leverage. So the topic of people trying to fool machine learning systems has been studied for a long time under the lens of security, not under the lens of using this as a way of addressing economic inequality, but of, let's make sure our machine learning systems don't get hacked, basically, by bad training data. And so this is gonna be a concern for data leverage too. But to bring it all back together, basically, there's other examples. So for instance, the book #HashtagActivism describes a lot of recent campaigns around, like, social justice topics that were pretty successful at spreading their message through decentralized action on Twitter without formal leaders. Mhmm. And a lot of the individuals participating did so at really low cost to themselves. And there's a lot of discourse in the literature about this, and people have a variety of takes. But online collective action is possible. There's indisputable evidence that online collective action is possible and strong evidence that it can be done around political topics. And so data leverage is one way that the public might advocate for something like a data dividend. Our proposal has a lot of stuff that is trying to make this not painful. So to give companies ways to get tax breaks, to make it so companies aren't disincentivized from getting new users, to make sure we're not stifling innovation. That's, like, one of our core principles that we really stuck to from the beginning: to avoid complexity, not create a bunch of loopholes. 
So there's a bunch of ways we tried to not make this horrible on companies, but obviously, there's still going to be, you know, some resistance, and ultimately the people in power at firms are gonna make decisions economically. And so data leverage is a way that the public could say, hey, if you don't go along with something that funds some public goods or gives me some of my value back, I'm not gonna continue helping you make these insanely profitable AI technologies. So that's how it all connects, to try to tie it all together.

Speaker 1 91:12 – 95:23

Yeah. So I think another piece of background here is that RadicalxChange Foundation has also done some research on the sort of institutional framework for data cooperatives or data coalitions that would create a legal, regulated kind of intermediary that would mediate the relationship between regular people and, you know, data-dependent businesses and other entities. And part of the reason behind that is along the lines of what Yakov was saying, which is that it's possible to imagine these kinds of intermediaries themselves becoming exploitative, so there's good reasons to think about how to impose the right kinds of duties on them and the right kinds of requirements of independence upon them and things like that. And there's, you know, interesting conversations going on around that sort of thing in the European Union, which is taking steps to create something like these kinds of intermediaries through something called the Data Governance Act. The Data Governance Act may or may not end up being a great thing. The devil's in the details, and I think time will tell. But pending the establishment of a sort of well-regulated network of intermediaries that can represent people's data interests in their lives, there is this question of what can we do to organize to either push the world in this direction a little bit or start to rebalance the power situation between ourselves and big platforms. And I think that your work on data leverage is articulating ways of doing that, articulating ways that we could think about what a data coalition or a data cooperative might be able to do now, without some new regulatory framework formalizing its power. So I think it's super interesting, basically. 
It's it's really interesting to think about the possibilities if people did, organize and, coordinate their behavior in this kind of, informal way, you know, so that they could you could get you could imagine large numbers of people providing really valuable data to, to companies that made certain promises about how they would use it Yeah. Or something. Or you could imagine people, you know, you you could imagine a cooperative like this kind of doing the doing the legal research and doing the hard work to figure out what, you know, where the legal lines are in terms of data poisoning and, you know, what can we what can we do, to sort of, confound platforms, in terms of the quality of the data that they're that they're receiving, in order to try to inspire them to, you know, agree to agree to use their data in a better way or or something like that. And, you know, as you say, the I the idea of withdrawing data, so the third sort of category of data leverage, which is like a data strike, That does seem to me to be harder to to organize, informally. I mean, it's it's notoriously hard to to get everyone to pull themselves out of a network that they're already embedded in, basically. Yeah. But, so that one actually I guess I'm a little less optimistic about sort of, data strikes happening, without some kind of coordinated, lesser, you know, more robust system of coordination. But data poisoning and and active data contribution do seem like really, promising levers that, that we should be experimenting with.

Speaker 2 95:23 – 98:24

Yeah. Yeah. So I'll add, we have a paper out. There's a preprint on arXiv now. It's called Data Leverage: A Framework for Empowering the Public in its Relationship with Technology Companies. That's right. And we talk about how all these three different levers are strong and not strong in different circumstances and how they might be used. And the one thing that I guess I'll pitch for the listeners is that there's exciting opportunities for everyone here. So as just an individual interested in the space, there's ways that you could try to start exerting data leverage now, in the sense that you can choose which AI technologies you are contributing to or not contributing to. Depending on where you live, your government may help you do that more or less. So the more data protection regulation that you have, the easier it will be for you to do that. But for machine learning researchers, there's a ton of really awesome opportunities to basically research on explanatory models. So, like, questions about how the model would perform in this alternate situation are inherently about data leverage. So if I take a classic learning curve, where I train my model with 10% of the data, 20%, 30%, 40%, 50%, I basically wanna see how my performance scales with the training data size. Each time I rerun it, I'm simulating a new data strike. If I know, you know, how my performance is at 70%, that tells me how effective it would be if 30% of the users got together in a data strike. It also tells me how effective it would be if 70% of the users got together and engaged in a conscious data contribution campaign. 
So there's tons of this in the machine learning literature already, but basically, the more we know about models, and like I said, machine learning academics are awesome in terms of making really great, easy to, like, visualizations, for instance, and public resources on the research. The more we do that, the easier it will be to reason about how effective data leverage can be. And then for, like, designers and people on the web design side and on the tool side, there's just a ton of opportunities to build browser extensions, for instance, that kind of automate the process. Because obviously, no one wants to sit down and go through a list of 50 companies and decide which ones am I gonna contribute data to today and which ones am I not going to. In the future, ideally, that would probably be done by an intermediary, a data co-op. But even in the short term, there's potential for software and tools to help take some of that cognitive load away. That's what software is great for. And then also, yeah, I'd love to talk to you about this sometime, Matt. The legal side of this is really interesting too, and I'm sure there's a bunch of nuance there and, yeah, tons of research to be done. So just, yeah, really exciting. My advisor, Brent Hecht, used this really nice analogy, that data leverage is a giant tent and all the sides are open. We really think all the sides are open. Like, everyone has something to gain here and, you know, we're excited about that aspect of the work a lot.
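
The learning-curve reading of a data strike described in this turn can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the synthetic one-feature dataset, the simple threshold classifier, and the chosen fractions. It is not the experimental setup from the paper mentioned above, and a real system's curve could look very different.

```python
import random

def make_data(n, rng):
    """Synthetic 'user contributions': one noisy feature and a binary label."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        feature = rng.gauss(1.0 if label else -1.0, 1.5)
        data.append((feature, label))
    return data

def train_threshold(train_set):
    """Fit a tiny classifier: a threshold halfway between the two class means."""
    means = {}
    for lbl in (0, 1):
        vals = [x for x, y in train_set if y == lbl]
        means[lbl] = sum(vals) / len(vals) if vals else 0.0
    return (means[0] + means[1]) / 2.0, means[1] >= means[0]

def accuracy(model, test_set):
    """Share of held-out examples the threshold classifier labels correctly."""
    threshold, one_is_above = model
    correct = sum(((x > threshold) == one_is_above) == bool(y) for x, y in test_set)
    return correct / len(test_set)

def learning_curve(train_set, test_set, fractions, rng):
    """Retrain on a random share of the data; each rerun simulates one strike size."""
    results = {}
    for frac in fractions:
        k = max(2, int(len(train_set) * frac))
        subset = rng.sample(train_set, k)
        results[frac] = accuracy(train_threshold(subset), test_set)
    return results

rng = random.Random(42)
train_set = make_data(2000, rng)
test_set = make_data(1000, rng)
fractions = [0.01, 0.1, 0.3, 0.5, 0.7, 1.0]
curve = learning_curve(train_set, test_set, fractions, rng)
for frac, acc in curve.items():
    print(f"{frac:.0%} of users contribute ({1 - frac:.0%} on strike): accuracy {acc:.3f}")
```

Read one way, the 70% row estimates what the platform keeps if 30% of users strike; read the other way, it estimates what a competitor gains if 70% of users make a conscious data contribution. For a model this simple the curve may flatten quickly, which would suggest that small strikes barely register, and that is exactly the kind of question the speaker says more learning-curve research could answer.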

Speaker 1 98:25 – 99:59

So if we do manage to steer the cruise ship in the right direction and institute things like data dividends policies or the Data Freedom Act, data coalitions stuff, or we manage to organize people felicitously into data leverage campaigns. If we manage to establish more kind of shared control over this gigantic shared asset that is powering the contemporary economy and contemporary life, what is it gonna look like ten years from now? In other words, like, you know, what kind of problems is it realistic to imagine we might solve? Are we gonna have a healthier public discourse? Are we gonna be able to solve the sorts of speech problems that, you know, Twitter and Facebook have been dealing with recently? Are we gonna have a more egalitarian distribution of wealth? What's the endgame here? Where do we want to be in ten years? And what's the vision that we need to communicate to a broader public to convince, you know, millions and millions of people that with everything else going on in the world, this is a topic that we all need to be engaged in and we all need to be pushing in the right direction on.

Speaker 3 100:00 – 101:48

I don't know about speech or issues like that. I think those are complex and beyond my remit. But I do think a vision I have, at least, of this isn't just a more egalitarian distribution of, you know, money-equivalent goods or income, but much more use of this technology for specific needs and much more rapid installation of it into your daily life. Like, I'm not a technical in this regard. I actually think that these kinds of policies can accelerate the use of AI for very specific problems that aren't always going to be such high, you know, return-on-investment problems, and much more use of it to, you know, engage with public services, for optimizing and improving public services. Right? If you get this kind of thing right as industrial policy, you're actually boosting innovation, and you're boosting, most importantly, the rate of installation. Right? So rather than it being concentrated in those firms that have the operational expenditure ability and CapEx to do it and the kind of network effects, I wanna see lots and lots of firms compete, like, using this stuff, so that you as a consumer are able to set the terms, right, of what you want these things to solve, and us as a society to be able to set the terms of, you know, what direction we're setting the innovation into.

Speaker 2 101:52 – 103:17

Yeah. I totally agree with that. So I think that my most confident prediction is that innovation will increase, because with all the data siloed in a small number of really big firms, it seems to me less likely that we get new innovative ways to use data and to create new technologies from automation and classifiers and recommendations and whatnot, and, you know, a million more ways that data-dependent technologies can be used. I do think that this is one tool in the tool belt of addressing concerns of economic inequality specifically. And of course, yeah, there's, you know, issues of global politics, there's issues of law and questions around, you know, I don't even think, maybe actually, Yakov, you might be able to answer this. You might not answer this for me. Sorry. Do economists even agree on what, like, the good level of, like, income inequality is, for instance? Oh, god. I mean, there are all kinds of, like, models of that. Okay. Yeah. So obviously, I don't wanna speculate on that, but I certainly think we can kind of put the brakes on the potential runaway train of data-driven technologies really exacerbating income and wealth inequality in particular.

Speaker 3 103:17 – 103:40

I think we shouldn't be afraid of innovation, by the way. I kinda wanna, like, clarify: by direction of innovation, I mean innovation that doesn't exacerbate inequality, right, that creates good sustainable growth, not innovation that displaces people into terrible jobs, right? Like, less Ubers, more, you know, smart production of solar cells.

Speaker 1 103:41 – 104:18

Right? Which, you know, I mean, if innovation displaces people and puts people in terrible jobs, you know, does it even deserve to be called innovation? Yeah. I mean, that's a really good point. Right? Like, what is the innovation here? There's not much like actual or stuff here. Yeah. Or it's like, you know, the term innovation will lose its positive connotation quite quickly if that's what it means. That is a better way to say what I'm trying to say. But, you know, we need to think about the structure of our social institutions and, like, our actual lives when we think about, you know, what's next and how we can think, you know, better.

Speaker 2 104:19 – 106:50

Yeah. Also, just one other thing. I actually do imagine that things like data co-ops exerting data leverage will help with things along the lines of content moderation. Mhmm. And I say help a little carefully because, of course, these are ultimately political topics, and what I think is a good outcome may not be what someone else does. So, you know, assume Twitter has some policy by which they're going to delete posts and block posts and, you know, clean up the bad content. In fact, it's probably gonna involve data-dependent technologies, and we've all tweeted here, I think. And so our tweets will be fueling that particular intelligent technology. So that's a real example of data labor. And actually, Twitter also announced this new Birdwatch program. It's kind of a community moderation program that's drawn some criticism, but also a lot of excitement around the potential for adopting kind of a more Wikipedia-esque crowdsourcing model. Anyway, just to conclude, my point here is that if we had data unions right now, when these issues come up, they would let people make their voices heard, basically. They would say, Twitter, my data union has 100,000 people in it, and they agreed with our statement that we don't support, I don't know, banning political figures, or we do support banning political figures when they're inciting violence. And obviously, direct democracy has problems. I think there's maybe some interesting interactions with quadratic voting, actually, for the RadicalxChange crowd: ways to use data leverage in a manner that tries to take advantage of quadratic voting, in fact. But it'll certainly be more democratic than what we have now. Right now, there's nothing. Like, Twitter can basically do whatever they want. They might get some media attention, and people might go on Twitter and type, I'm so mad at you, @jack. Lots of people do that. Sorry.
That is a thing that happens when Twitter doesn't adopt the policies they wanted. And people banding together and actually threatening, we're going to withhold the data that's making your AI systems better if you don't change something, I think it's a better world than what we have right now. Even if the immediate outcome isn't what I want, I think in aggregate over ten years, that's a better world. But I guess that's a political philosophy in and of itself. But I guess that's answering the fact that content moderation is a question of politics. But I think what we're talking about here can offer a better way of resolving these political disputes than what we have right now. Yeah. That's it.

Speaker 1 106:50 – 107:13

Yeah. I'm with you. Democracy is messy, and even when democracy is working well, lots of people are unhappy. But it's better than the alternative most of the time. Thank you so much, both of you, for taking the time to talk today. Thank you for having us.

Speaker 2 107:13 – 107:17

Yeah. This is a lot of fun. Thanks. Yeah. Right. And,

Speaker 1 107:18 – 107:23

yeah, if there's anything you'd wanna plug before we jump off, go ahead.

Speaker 2 107:24 – 107:49

Tons of amazing collaborators. There's too many. I can't plug them all because this work and all the work that I've done is, like, super collective, but I'll specifically plug my lab. So the People, Space, and Algorithms research group at Northwestern, my adviser Brent Hecht, and my cohort mate, my, you know, equally academically aged PhD student, Hanlin Li, who's in the same program as me. They've been super influential to all this work.

Speaker 3 107:49 – 108:34

So yeah. But big thanks to them. Yeah. And just to, like, you know, the three of us are here, but we've had many, many collaborators who just worked so hard on this report. Right? They just couldn't make this thing because of timing, and I wish they were here. It's actually really unfortunate they aren't. Like, Luisa, Hanlin, Chirag, Brent, we had inputs from all kinds of, like, a list of people longer than my arm. Like, we're gonna have to have a whole, like, separate page for it. Right? Yeah. As I said, I came into this thing really skeptical and very, very unsure about the value of doing this work. Right? But this was a really great experience, and I think this is a really important issue.

Speaker 2 108:34 – 109:15

One more plug, if we can squeeze it in, is that hopefully our report and our website and maybe this talk itself will help you to reconceptualize data dividends. A lot of people, when I say I've been working on a project related to data dividends, turn their nose up and say, oh, I've heard about this. You're gonna give people pity pittance paychecks and nothing will change. And I think it's really stupid. They didn't say "I think it's really stupid" to me, but they said a version of that. And hopefully, what we've done here can convince you that there could be a really good implementation of what could be called data dividends.

Speaker 1 109:15 – 109:46

Yes. Totally agree. I think this is an incredibly important area that more people need to be tracking if you want to sort of understand the political economic landscape that we're heading into and the major, you know, questions of the next decade or two. I think that this is at least one of the most important areas of inquiry, and I'm really proud to work with you guys to put out this report. And,

Speaker 2 109:47 – 109:49

great to talk to you. Great talking. Sweet.

Speaker 0 109:55 – 110:42

Thank you to everyone supporting RadicalxChange Foundation. This production would not be possible without you. Go to radicalxchange.org and donate. Thank you so much to Nick and Yakov for coming on to the show, and also to the musicians and to the team, Jennifer Marrone and Leon Erickson, for your great production work. Again, check out datadividends.org to find the latest version of the proposal that Nick, Yakov, the rest of the team, and I worked on. And have a great weekend. This is RadicalxChange(s). This is a RadicalxChange production.
