{
  "metadata": {
    "transaction_key": null,
    "request_id": "metagov:stray-metagov-20221123",
    "sha256": null,
    "created": "2025-10-27T23:38:02.710126+00:00",
    "duration": null,
    "channels": 1,
    "models": [
      "metagov-manual"
    ],
    "model_info": {
      "metagov-manual": {
        "name": "metagov-manual",
        "version": "2025-10-01",
        "arch": "manual"
      }
    },
    "warnings": null,
    "summary_info": null
  },
  "results": {
    "channels": [],
    "utterances": [
      {
        "speaker": "Speaker 1",
        "start": 0.0,
        "end": 0.0,
        "transcript": "We do tend to thanks. We do tend to record these sessions, so, you will see a little pop up on your screen now. So welcome everybody to this week's MediGov seminar. We are gonna be hearing from Jonathan Stray, who is brilliant on a on a number of different topics, but has most recently been focusing on the ways that algorithmic recommender systems shape our lives. And in particular, thinking about how we interact with content across social media, news, etcetera, through the recommendations of these systems. And he's been thinking a lot about how, polarization plays out in this space, how, much agency people do or don't have about influencing these systems and has been focused on thinking about ways that, people can really govern these, these technologies that increasingly shape our lives. So we're gonna hear from him, and then we will have some space for conversation as well. I welcome you throughout, Jonathan's presentation to drop comments in the chat if there are questions that you have or if there are things that, you just wanna share resources, links, that would be great as well. So, sent thanks so much for putting your intro in the chat. I will follow suit. But with that said, I'm gonna turn it over to you, Jonathan."
      },
      {
        "speaker": "Speaker 2",
        "start": 15.0,
        "end": 15.0,
        "transcript": "Thanks, Bea. Nice to be to be collaborating with you again. I recognize a couple of you, but mostly new. Okay. So as Bea said, I'm at the Center for Human Compatible AI at UC Berkeley. I study recommender systems. You probably have all heard this term, the algorithms that filter what we see. Previously, I was doing computational journalism at Columbia. I started out in computer science, so I've really been between the worlds of computers and information for a long time now. I've got about a twenty minute talk, and then, then we can chat. So let's see. Oh, yeah. And I as Bee mentioned, I also do a lot of work on on political polarization. I write a newsletter called the Better Conflict Bulletin, which I will grow a link to. There you go. Alright. So the talk today is democracy and democratic control recommender system. So how do we control these things in a democratic way, and and what what does that even mean? So just sort of starting from the top here, recommender systems are interesting for a bunch of reasons. One of which is that they are the the the some of the largest deployed AI systems. Most of the people in the world interact with, at least one of these systems every day. Most major platforms use them. And and the reason is they're solving a very fundamental problem, which is that, there is way too much information in the world for anybody to keep up. Like, that that idea almost doesn't even make sense when you think about the mismatch in order of magnitude. That's one of the arguments that people should see at least some personalized information. Specialization is maybe one way for society to deal with the fact that there's too much for any one person. And so we need systems that filter this down in some ways. And, of course, we have search engines. The the the basic limitation of search engines is they can only give you what they ask for. 
So maybe the motivating example is a news recommender where, obviously, the whole point is to show you things you weren't searching for. Recommenders make a difference in a lot of people's lives. This is one breakdown of the categories of stakeholders that I use sometimes. And there's basically four. Right? So, obviously, we normally think about the person who gets the recommendations. We often talk about the creator or producer as well. So think about a system like YouTube, where you wanna encourage people to upload videos, or a system like Spotify, which, of course, depends on artists making music and licensing it to the platform. But, of course, the platform or the operator themselves is also a stakeholder because they have sustainability concerns. And then, also, we have externalities. So one classic example of that might be, you know, if Amazon recommends high carbon products, that is an environmental externality. Or if Yelp recommends that everybody go to a particular restaurant, then, you know, it's going to be a crowded restaurant instead of a quiet restaurant. So we often break them down like this, and there's a whole discipline of multistakeholder recommendation now. Another way to think about it is this sort of society-in-the-loop model that Iyad Rahwan has proposed. And I really like this model, the idea that there is an AI system which performs some social function, and in particular, it's a contested function. It's a much more realistic image of what we're doing in the sort of, like, AI alignment paradigm, because there's some sort of social contract where we all, let's say, agree, in quotes, to be governed by an algorithmic system. And, you know, it's not as simple as solving the sort of agency problem for one person because people have different ideas of what the system should do. 
So I've got this four-level model for thinking about how we control recommender systems, starting with policy and going down through operations, the algorithms, and the users. And I wanna sort of explain what the arrows are here. So the loopback from users to policy is, in some sense, the democratic governance loop. Right? That's stakeholder input on policies, that's voting, that's sort of the classic idea of governance that we have, which is participation in policymaking. However, there's this other arrow which goes from users to algorithms, and that is because recommender systems are all voting systems, implicitly or explicitly, or at least any system where my actions affect someone else is a voting system. So there's actually two sort of democratic feedback loops here. One is direct algorithmic input, and another is at that policy level. And then the operations level is something that is mostly ignored in analysis of this problem so far, but which I really wanna get into because I think it's actually really important in practice. So starting at the policy level, you know, I have another long talk on the policy aspects of recommender systems. And this is one breakdown of the types of policy that you can think about, and I'm thinking of top-down sorts of policies here. So you could ask people to disclose certain things about the recommender. You could ask them to measure and manage certain types of harms, or you could say we need certain processes. So appeals processes for content moderation, risk assessments for the systemic risks defined in the Digital Services Act. I have talked a bit elsewhere about what transparency in recommender systems could mean. So this is from a post that I did with some colleagues this year. 
But I'm actually not gonna talk about that too much today because, although transparency can be an essential part of democratic decision-making processes, I really wanna focus on sort of the core problem of how do you decide what it is that the system is supposed to do, and how do you make it do that? So instead of transparency, I'm gonna go from the top down through that diagram. And so I'm gonna talk about recommender policymaking. So here's one of Aviv's slides from a post he just made, and I think you could find it at the URL on the bottom left there. The fundamental challenge with recommender policymaking is that, in many cases, you are talking about speech regulation. And, you know, as opposed to, say, regulating which products can be sold on a market, speech regulation has unique implications for freedom of expression and, of course, in the US, First Amendment, so constitutionality problems. And you don't really want governments setting speech policy, but you don't really want private interests doing it either. So you sort of have this who's-left conundrum. And this is a nice answer to that question. Who's left? Well, you know, sort of everyone else. So we're now starting to see experiments using citizens' assemblies, also called deliberative mini-publics, or there's, like, three or four names for this. But the idea is basically you do a lottery, perhaps a stratified lottery. You get some citizens together. You give them resources for expert consultation. You have them make the decision. So that's one model for sort of democratic governance from the top down that is a little bit different from, you know, like, Congress passing a law or something. But, of course, you still have the problem, okay, so what laws? Right? And you might imagine that what we want to look at is the potential harms of such systems, which means that we want to sort of measure and manage them in some way. 
And this is a slide from the Facebook files, which I think is far more interesting than almost any of the other coverage of that set of documents. So you may remember there was a bunch of coverage about, you know, Instagram gives teen girls body image problems. I don't think that coverage was very good because it sort of misrepresented what the source material was. So first of all, it was a focus group. There was no platform data at all used in that. It's a type of study that anybody outside of Instagram could have done, and many scholars did do. It was, you know, sort of 30 or 40 teen girls that they were asking. And that stat, that one third number, that was of teen girls who had body image problems already, who said it made their body image problems worse. So it's not really measuring the thing that we care about, which is, is Instagram causing people to have body image problems? But also in those files, there was a presentation on this study, which was a large piece of research. This is a 100,000 person study across nine countries, and they asked about two things, negative social comparison and positive social comparison. So negative social comparison is I look at that workout video, and, you know, I feel bad about myself. Positive social comparison is I watch that workout video, and I feel inspired. And they found that, you know, both of these effects are happening. And they said about ten percent of users had positive social comparison often or more. And so we have this interesting problem, which is, if we look at this and we say this is an unacceptably high rate of negative social comparison, then, okay, what is the acceptable rate? Because it's never going to be zero. As one platform engineer put it to me: look, we have 2 billion users. Anything that can happen does happen. So how do we set appropriate thresholds, and how do we think about the trade-offs? 
If the system was only harmful, we would just get rid of it. It's kind of like saying we could eliminate car crashes if we just eliminated cars. It's true, but it's not an interesting policy discussion. So that's sort of the challenge there. And then the next question is, you know, are we even measuring the right thing? What things should we measure when we try to evaluate these harms? Assuming we had measurements that we thought were reliable and we knew how to interpret, the next question would be operations. How do we implement this in practice? And there's actually quite a bit of information that talks about how platforms use metrics internally. These are often used as KPIs, and there's actually some reports that certain types of metrics are used for performance bonuses as well. So it's really a question of, you know, you manage what you can measure. So what are we managing to? This is another interesting document from the Facebook files. It's from a document that's all about how to do experiments on the news feed. And one of the things it says here is that if the experimental change that you wanna deploy changes any of a list of metrics by more than a certain amount, you have to contact a point of contact forty-eight hours in advance and tell them that you're gonna deploy this thing. And there's pages and pages of these. There's about a hundred of them, and this is what they look like. Right? So it's everything from, like, you know, are people playing games on Facebook? There's this metric they have, which is meaningful connection, where they have a threshold which says that if you exchange a certain number of messages or interactions with someone, that connection is considered meaningful. It's sort of a proxy for, like, this is a close friend or close relationship. 
And so if you wanna deploy a change at Facebook and it changes this metric by more than negative 0.2%, then you have to contact this person and negotiate with them. Or here's another couple of metrics. So NEQ is news ecosystem quality. This is a relatively crude metric of the quality of news that is being shared based, I believe, mostly on this crowdsourced news trust survey that Facebook runs. So they ask people, you know, how much do you trust news from this domain? And there's actually an external application that shows that this is actually a pretty good predictor of bipartisan news quality. And then they also keep track of, you know, how many links were on politics topics. So you can see there's a wide variety of metrics that are monitored. There's hundreds here, and some of them are very straight-up sort of engagement business concerns, and some of them are much more sort of quality-of-the-service type measures. And I wanna try to, like, get people to think beyond the, like, you know, optimizing-for-an-objective-function type of view of AI. I mean, yes, there are objective functions built into these algorithmic systems, but they are managed by teams of people who have literally hundreds of different other measures that they are optimizing and negotiating. So it's much more sort of bureaucratic and has a lot more sort of internal politics than the standard picture of AI optimization for a singular goal. So you can make metrics that embody normative goals. So this is from a paper out of Spotify, and they have a diversity goal. And they say a playlist is diverse if it contains songs from different levels of popularity. And this serves multiple goals simultaneously. They're interested in sort of flattening out the superstar economics of music production. Right? 
They want somebody other than the top 10 artists to be viewed, and they're interested also in musical exploration and helping people broaden their musical taste. So this bottom graph, this is actually the only published result that I know of from a platform that shows the trade-off between a sort of normative measure and an engagement measure. So relevance, which is the vertical axis here, is measured by basically surveys, you know, asking users how much do you like the product. And what they find is that if you increase diversity too much, which is going to the left, people are like, okay, but you aren't showing me music I like. So there's sort of a knee here, which is the sweet spot where you can increase one measure which we think is normatively good, this sort of diversity measure that makes the ecosystem healthier and so forth, without losing too much on these other measures. And so this is really where the art of operations comes in. It's sort of mapping this trade-off space between different things that we care about. And, of course, engagement is something that we care about because no media organization can survive without engagement. It doesn't matter if it's ad supported or a nonprofit or whatever. I mean, I worked at ProPublica, and they're a nonprofit organization, and they have to produce a report every year to their funders that says this many people read our stories. Otherwise, nobody wants to fund them. So we've talked about the policy and the operational level. I wanna talk about the algorithmic level now. So at the core of most ranking systems is this thing that is sometimes called the value model or the scoring model. And, generally, what it does is it combines predicted probabilities of engagement with some other factors, often called integrity signals. 
So, like, you know, you can't really build a classifier which tells you if an article contains falsehoods, but you can build a classifier that kinda makes a guess based on a bunch of signals like tone and style and domain and so forth. So a lot of these things are done probabilistically, and they're combined in some formula like this. These are actually very large pieces of code. There's typically hundreds to thousands of different signals that are combined. There's a bunch of different special cases, so you're gonna use different ranking on, you know, a post in your timeline versus comments. And there's different places where you use this. So the YouTube home page, where there's no query, versus the recommendations on a video, where it's relative to the item you're currently looking at, versus after you do a search, where it's relative to the query that you've done. There's lots and lots of variations on this. But sort of at heart, this is how these systems work. And these are normally hand-tuned with respect to the metrics that I was talking about in the previous section. So in some sense, that function is the heart of the system and encodes most of the sort of normative payload. And so there have been some experiments with direct participatory design of a ranking function. And this was a fascinating paper where they were trying to design one. It's a nonprofit that matches grocery stores and bakeries that have extra food with drivers who are volunteers, and with food banks. And what you wanna do is you need to think about them all at the same time. Right? You don't wanna leave too much excess food undelivered. 
You don't want to give delivery drivers long routes that are far away from where they're starting from, and you want the food banks to, like, maybe you wanna distribute to make sure that, you know, if they haven't had a delivery in a while, they get some food soon. So there's all kinds of interesting trade-offs here. And what they did is they actually went through a very hands-on elicitation process using two methods. One is pairwise comparison, so they show them two matches, and they say which one do you like better. The other is directly assigning values to a linear function. Right? So they actually worked with them to directly write one of these. Very high touch. They did this for about 30 different stakeholders, and then they sort of run that, that's a model of each stakeholder's preferences, and they run them all at the same time, and they vote. And that's how they built this ranking system. So you can do this. I don't think it scales terribly well. This is actually a very small ranking system with a very small number of stakeholders and a very small number of parameters. So instead of trying to sort of hand-build ranking functions in a participatory way, one of the major techniques that we see in practice is we try to infer them using survey data. And so here's YouTube asking for a rating on a recommendation. And one of the things to notice here is this category of things they're asking about. This is sort of the abstraction boundary between the normative and the technical. You can, in principle, ask a user to evaluate any property. Right? You can ask them any question. And you don't have to ask only about particular items. You can ask also about series of readings. So you can ask, you know, this conversation you had on Twitter, right, now you're asking about a sequence of tweets. You know, how did you feel about that? 
And then these measures, of course, can be used at the sort of operational, sort of managerial level, but, increasingly, they are incorporated into the ranking systems themselves, and this is from a paper out of YouTube. And if you read it very carefully, what it says is that there's two types of objectives they optimize for. One is the sort of classic engagement, you know, click, watch time, that sort of thing. Then there's these user satisfaction objectives. By the way, this weighted combination box is exactly the value model we were just talking about. And if you read what they say satisfaction objectives are, well, some of it is what we might also call engagement, like likes, but some of it is the survey data. So they're actually building a model of what a user would answer on a survey if you gave them that survey, which sounds wild, but isn't really different in principle from trying to predict whether the user would click the like button. I mean, if I click the like button or I click the yes-I-found-this-useful button on the survey, I'm still just clicking a button. So to the extent that you can predict what people will like, you know, you can also predict what people are gonna answer on a survey. And so the algorithm actually uses the survey data to build a proxy model, which then becomes part of the ranking. So that is a way that you get sort of, let's call it, society-scale or at least community-scale feedback directly in a recommender system, and this is very common in production. I wrote a paper a couple years ago that sort of fleshed this out a little bit, talked about a few real world examples, and this is kind of what this process could look like if you were thinking about sort of community governance at the metrics level. Right? So you have to have some sort of consultation process that defines the metric you're interested in. 
You probably can't get it exactly, so you probably have to develop a proxy that we hope signifies when we're doing a good job at this. And that proxy is applied first at the operational level because you actually sort of have more power but less resolution at the operational level. Even if we don't know how to incorporate some sort of feedback into an algorithm, we can still watch it move. But we're watching averages if we look at sort of overall user satisfaction or, like, those metrics we saw earlier, you know, the fraction of people who are having close connections on Facebook. And then, you know, maybe we can build that in at the algorithmic level, and the advantage of doing that would be that we can then optimize at the individual level rather than trying to look at, you know, only these very broad averages or maybe a few different subgroups. We can actually have the system try to serve people's interests personally. And then metrics always slip. That's one of the big lessons of metrics, that even a very good metric today might not be a good metric tomorrow because things change. User behavior changes. There's adaptation. There's adversarial behavior. The world changes. There's a long list of AI models which broke when the pandemic started. You know, all of your, like, supply chain just-in-time models broke completely. So you can't just sort of set and forget. You have to sort of continually evaluate the metrics against the ground truth. And the near-final thing I wanna say is this omits a huge category that we can think of as a democratic interaction, which is that most real recommenders are voting systems. And sometimes they are explicitly voting systems, as here on Reddit. And there's a few different voting systems, you know, hot, new, top. And for a while, Reddit was open source, and you could actually go look up the formulas that they used, and they're not very complicated formulas. 
What hot is, I think, is, like, upvotes minus downvotes times a time-weighted decay constant. So it prefers things that are newer. And these are relatively straightforward formulas, and they're simple enough that you could explain to users what they do, which is a sort of procedural fairness, which is maybe one way to think about what would legitimate, or what would make, a voting system democratic. But even a system like TikTok is a voting system in the sense that the ranking that any individual user sees is dependent on a deterministic process based on the actions of all other users. And so it's implicitly a voting system, and most recommenders do this. And the reason most recommenders are implicitly voting systems rather than trying to just serve you in a vacuum is because each individual person gives only a very limited amount of information about what their preferences, desires, and needs are. So given this sort of paucity of information that I have on any one person, the only real way to get a good result is to look for patterns and to say, ah, I think that you are this type of person. In the policy world, this is often called profiling. I think that's not a very good word because it sort of evokes the idea that those profiles are interpretable when, in fact, they're usually, like, clusterings or sort of abstract embeddings of user behavior. But in any case, as we say in statistics, real recommenders borrow strength. They look at, okay, well, what this person is doing is similar to what this other person is doing, and the simplest, sort of cleanest example of that is, you know, people who liked this movie also liked that movie. And so, if you sort of invert the sense of that statement, you see that that is a voting system. Right? I liked this movie. Therefore, I voted for this movie to be shown to people who also liked the other movies that I watched. 
And so there's this whole avenue for recommenders as democratic information filtering processes. And so Reddit is really at the forefront in terms of operational examples. But you could imagine a type of explainable recommender design that was optimizing for sort of legitimate procedural fairness, where people understood how the voting system worked and were comfortable saying that this was a good system. Alright. The last thing I wanna talk about is, I've been sort of talking about democratic control of recommender systems like, you know, we can just decide what the outcome is, but we can't. And that is because all real recommender systems are embedded in a very complex feedback loop between these different stakeholders. And this happens, you know, even if your recommender algorithm is just a chronological list, which, of course, only works for certain types of applications. But even where it does, you still get these feedback loops. And you've got this inner feedback loop, which is the recommender adjusting to user interactions. So this is the feedback loop you would be worried about in terms of rabbit hole or filter bubble type problems. But you've also got this outer feedback loop, which is what content creators are incentivized to produce. And this is a sort of slower but potentially much more powerful feedback loop where you might worry about, let's say, incentivizing politicians or journalists to do inflammatory things because they get more distribution on the platform. So we can't really just turn a knob and get the outcomes we want. We can sort of poke at one part of the system. If we're talking about regulating algorithms, we're talking about this, but we don't get to control what users do or what creators do, at least not in a direct fashion, and they will surprise us. 
And this leads towards, you know, so my general sort of methodological approach is I try to start from real recommenders and sort of work backwards from that, in terms of experiments and analyzing the data we have and that sort of thing. But you can also start from first principles and work forwards, and there are a group of scholars who are doing that. They are developing recommender systems along game theory fundamentals. And the models they're building are not really capable of modeling a modern platform yet, but we are starting to see some very interesting results in these abstract models. In particular, we're starting to see theoretical results on the sort of equilibria between the recommender system ranking policy, the users' preferences, and the creators' production. So we have mathematical models of this feedback that we were talking about earlier. And there's some interesting potential results here. Right? So you can think about equilibria. You know, so if I develop my ranking system in this way, that will set up the incentives so that the producers, let's say, you might normatively want, like, a certain type of diversity of content rather than everybody sort of maximizing for one type of most popular thing. Okay. I will stop there, and looking forward to the conversation."
      },
      {
        "speaker": "Speaker 1",
        "start": 30.0,
        "end": 30.0,
        "transcript": "Wonderful. Thank you so much. I know that, that we had some good conversation going in the chat, but not too many questions shared yet. So, I will invite folks to raise their hands or to drop questions in the chat. I'm happy to voice over questions for folks as well. Maybe to give folks a oh, actually, we see Morgan has a hand up. So, Morgan, why don't you kick us off?"
      },
      {
        "speaker": "Speaker 3",
        "start": 45.0,
        "end": 45.0,
        "transcript": "Thanks. So this is amazing. Thank you so much for the presentation. I actually come from a world of search engine optimization. So, like, this is super fascinating stuff for me. And most recently, I've gotten really interested in, like, the TikTok algorithm specifically. So, you know, when we think about recommending content on an on a platform like TikTok, which relies so heavily on, you know, user engagement to your point, what what are the steps that can be taken to to make sure people aren't kind of funneling down into these into these echo chambers? I know there was a study recently. I I can't remember who did it, but it it someone experimented where they they followed certain, you know, red flags, certain, keywords. And in 400 videos, they ended up in what they called alt right TikTok. And so it doesn't take long. And so I'm just curious, like, what are the measures? What are the guardrails that we can put in place to stop that from happening when most of these algorithms rely so heavily on that kind of model?"
      },
      {
        "speaker": "Speaker 2",
        "start": 60.0,
        "end": 60.0,
        "transcript": "Yeah. So I think you're probably thinking of The Wall Street Journal with bot experiments. So so there's quite a lot known about, you know, if you want to stop that, what you do. And the basic answer is diversification, and all production recommenders pretty much have a diversification pass after the ranking pass. And there's various criteria you could diversify along. So, you know, you in this case, maybe you're thinking of topic or political ideology, but you also might wanna think about or platforms do think about format. You know, don't put too many videos in a row. Source, you know, don't put 10 posts from your friend in a row, that type of thing. And there's actually a little Facebook page where there's a little interactive game. Let's say model cards. Or is it Instagram? Yeah. Here you go. This is actually not a bad little little it's one of their transparency attempts. Yeah. The if you click through the Instagram one, there's a little interactive game, which basically tells you how the the ranker works and has you try to rank things. So all production recommenders more or less have a diversification pass. The issue of whether people are actually getting sort of influenced down down rabbit holes by these systems is pretty complicated. The major challenge with the face the, Wall Street Journal bot study is that the bots couldn't change. They were programmed to have a sort of, monomaniacal interest in, let's say, depressing content. And so, you know, no no real human behaves that way. And and furthermore, it it that model precludes exactly the causal questionnaire you're interested in. What you wanna know is how are users changing as a result of feedback loops interacting with the system. But if you try to do that study with a bot, the bot definitionally can't change. So I would say that we actually there's almost no ecologically valid work trying to study, this effect, which is which is very unfortunate. But I'll I'll stop there. 
Luke Thorburn and I are working on a much larger article on this problem."
      },
      {
        "speaker": "Speaker 3",
        "start": 75.0,
        "end": 75.0,
        "transcript": "Thank you so much. I realized the study I was referencing is actually from Media Matters."
      },
      {
        "speaker": "Speaker 2",
        "start": 90.0,
        "end": 90.0,
        "transcript": "It was"
      },
      {
        "speaker": "Speaker 3",
        "start": 105.0,
        "end": 105.0,
        "transcript": "much smaller. I'll I'll link it here. But, no, that's that's super helpful."
      },
      {
        "speaker": "Speaker 2",
        "start": 120.0,
        "end": 120.0,
        "transcript": "Mhmm. Thank you."
      },
      {
        "speaker": "Speaker 4",
        "start": 135.0,
        "end": 135.0,
        "transcript": "Yes. I can jump in. Sure. Yeah. So I guess I'm I'm I'm one of the people arguing for democracy. Thanks for the slide in there. And I think there is this question, of course, so which people always ask me, which is, what is democracy anyway? What do we like, oh, the the rec you your voting is at democracy. Like, doesn't need to be representative. Doesn't need to be, like, maybe some under some frames you wanna be deliberative. Like, this is what I would advocate for. And I guess to get really concrete, like, which layer does which kind at which layer do which kinds of democracy matter, and how do we make that happen? And then to be even very, very concrete, you have a slide about participatory recommender alignment. And you like, step one was, choose a a well-being component measure with multistakeholder input. And I'm wondering, is there any obstacle that you imagine where you wouldn't where you couldn't instead have that input not be multistakeholder, but through democratic? A a because multistakeholder processes, I I don't see as democratic processes. I see them as elite processes that attempt to approximate what, the important interest groups might want under some definition of important interest group. Like, is there any reason why you wouldn't use, for example, a, you know, a representative deliberative process sort of, like, in the platform democracy slide to do that? Mhmm."
      },
      {
        "speaker": "Speaker 2",
        "start": 150.0,
        "end": 150.0,
        "transcript": "Yeah. I I think you could. I, you know, I wrote that paper before. I was really thinking about sort of, citizens assemblies type processes, and I think I think they could be a very natural fit for that that type of process. I mean, I think there is this this challenge of, like, where where in the process do you do the democracy? Right? Because there are thousands of different decisions that have to be made at at all scales, and it's not reasonable to say, well, you know, we're gonna have a vote on every single one of those. So there is this sort of like like all democracies, there is an agenda setting process, which should be receive at least as much of our analytical efforts as the voting process. And so you have to think, well, what are the what are the sort of top level decisions where it's important to have some sort of democratic governance? And and I think choosing the metrics to monitor is probably one of those places. But, you know, this that's also a highly technical question. Right? So the think of how the EPA works with the regulation of of pollutants. We don't have a vote on what the, you know, legal maximum for lead in our rivers is. We have congress who passes a a law giving the EPA authority to regulate it, and then we have specialists at the EPA who, you know, write those regulations. So should we have votes at some other level in that process? I mean, I I'm not sure. Boom. Okay. So Can I res okay?"
      },
      {
        "speaker": "Speaker 4",
        "start": 165.0,
        "end": 165.0,
        "transcript": "Yeah. Sure. Just I I guess the the distinction here is that you can have democratic processes that are that I I guess I guess, like, the the the goal of using a deliberate democratic process would be that you actually are bringing in that expertise, and you're actually bringing the population that is being impacted up to the level where they have an at least in a they have a sense of the the trade offs from you might still have that multistakeholder group, but the multi stakeholder group is going through a democratic filter. Right? So there is a representative population. There there's there's a a democratic body that is representative of the overall population that isn't hearing from the multi stakeholder group and having the time to to do their best to make sense of what those what those perspectives from those experts, let's say, those people who are high hired by the EPA to do. And then then they come to the conclusion. And, actually, this is exactly what has been done for things like water policy in several parts of Australia Mhmm. Where that's a really contentious issue. And it's, again, it's an allocation issue. It's a cost issue. And so you can imagine, like, oh, we need this much of the recommend. Like, this this factor in this place should be in this way, and it's it's almost a I think it's actually closer in some way to participatory budgeting, thing. But you can have deliberative participatory budgeting for allocation of attention within a recommender system. And that sort of I feel like this is a natural sort of progression of of this frame, and I'm curious if that resonates."
      },
      {
        "speaker": "Speaker 2",
        "start": 180.0,
        "end": 180.0,
        "transcript": "Yeah. I mean, I would definitely like to see more deliberative processes involved in recommending your governance. I think that makes a lot of sense, but I you have to draw a line somewhere. So democratic processes are slow. They are time consuming. They are more challenging when you have more deeply technical decisions to make. So I think I think one of the challenges is where where where in the stack and for what things do you put that deliberative layer? Yeah. And then to and be an answer to your question, you know, what types of democracy would we not replicate? Well, I mean, democracy has pretty certain well known weaknesses. Right? It's it's not very it's not a very fast decision making process. So, for example, I wouldn't want democratic processes involved in the sort of inner loop of responding to acute security threats, for example. I think there, you need to have the people managing it being able to make decisions very locally."
      },
      {
        "speaker": "Speaker 1",
        "start": 195.0,
        "end": 195.0,
        "transcript": "Yeah. I wanted to extend a little bit on on this point around how, things are slow and challenging, but also maybe the reality that not everyone wants to participate. Actually, it's kind of interesting to me watching my partner, Rob, interact with social media where, like, every time I watch a YouTube video that I like, I give it a thumbs up. Like, I'm like, yeah. Tell them that I love it. Tell the creator and also tell YouTube. And, like, my partner doesn't necessarily do that. Right? They're, like, a lot more conservative in, the types of reactions that they share online. And I was curious, if you know of any data or if the work that you've done is revealed, kind of how much you know, sometimes I've been prompted with one of those, like, was this relevant to you? Like, how how much do people actually respond to those things? Are people interested in participating? Participating? Like, do we have, a sense of kind of the burden of democracy or or participation in even these really lightweight voting systems, like the Reddit thumbs up, thumbs down?"
      },
      {
        "speaker": "Speaker 2",
        "start": 210.0,
        "end": 210.0,
        "transcript": "Yeah. So some of this is a question of sort of affordances and, you know, you can build a system that encourages people to give thumbs up, thumbs down. So so Facebook is a classic example of this. Right? The the thumbs up sign is both a social signal to your friends, which is why why you do it, but also a way to convince people to give information that helps in content ranking. So it's actually a very cleverly designed interaction. But other types of participatory systems. So let's think about recommender controls as a type of participatory system. Right? Even if it's just sort of setting controlling what you see, I I I don't know. Maybe that you can sort of look at it sideways and think about it as a type of democratic governance. But that those controls will also give information that is helpful for for other users, in which case it is, again, implicitly a voting system. Reliably, we see that only a few percent of users on major platforms adjust the controls. So there's just not a lot. It's like privacy settings. Right? Like, yes, we want everyone to have fine grade control of their information, but most people just won't. And so there's sort of two possibilities. Either it just needs better designs and no one has really cracked the code of building controls that people wanna use. And if we could figure out that design, then we would get see much higher engagement, or people are just never gonna do it. Most people just aren't going to care. And in any case, I think what it means is that we have to get the defaults right. So there's some balance between you know, we have participatory processes for how having people control what they see and also what other people see, and we have processes for making, you know, hundreds of other choices that we're never gonna get this type of of high touch democratic input on to get the defaults right. 
And I've sort of had this ongoing conversation with Ben Shneiderman, who is a figure in the human-centered AI movement. And he's like, you know, these systems, what we want is we want better human control of these systems and, like, social choice. And my sort of counterargument is often, like, okay, but we're still gonna need to manage to metrics in some way because there's sort of a paucity of feedback. We just don't have enough of the right types of feedback to avoid making a huge number of highly normatively consequential decisions for users who are never gonna tell us what they want. So that's, I don't know, that's the balance we have to walk with all this stuff. Go ahead, Eleanor."
      },
      {
        "speaker": "Speaker 5",
        "start": 225.0,
        "end": 225.0,
        "transcript": "Kind of a low level question. So please shoot down if it's out of scope. But how do when you're clustering different, like, user profiles, even though, like, semantically, it's not really a profile like you were explaining Mhmm. How do you, from a system design level, figure out whether those kind of abstract, like, clusters have any semantic meaning? Is it important that they have some like, how do you evaluate that process if we're trying to, like we can't be all the time with that really low individual democratic level Mhmm. So we abstract up to, like, a clustering of people. How do you actually evaluate that?"
      },
      {
        "speaker": "Speaker 2",
        "start": 240.0,
        "end": 240.0,
        "transcript": "Well, you can you can try to interpret clusterings or I'm I'm gonna use embedding spaces. So I I hope you're all at least have heard of the concept of a of an embedding. Okay. So I'm just gonna use that word, and then if people wanna catch up on what that is, I can we can we can work out a a form to do that. So you so it's this abstract space. So, for example, a production large scale recommender system might use a user embedding that is a 128 dimensional. And you can sometimes interpret not really individual dimensions, like individual numbers, but vectors in that space. So so directions in that space notionally. And you've probably all seen the papers where this with word embeddings where you have, like, you know, man is to woman as king is to queen type of stuff. So you can recover semantics, in various ways, and there's a whole, line of research on interpreting, these embeddings. And, you know, it's closely connected to, interpreting neuron activation in deep neural networks. So you can do it. But I think I think one of the questions here is, okay. What what do you get by doing that? Right? And I because because what these what these embeddings are actually used for is as inputs to a model which predicts a user action, which predicts engagement, which predicts watch time, which predicts what you would answer if I asked you on a survey whether this video was inspirational. And so the embeddings, because they are trained in a to build these predictive models, they're going to encode whatever information is most predictive for whether you said the video was inspirational. So, gosh, I I feel like I'm wandering around a bunch of really complicated issues, but hope I don't know. Maybe that was helpful."
      },
      {
        "speaker": "Speaker 5",
        "start": 255.0,
        "end": 255.0,
        "transcript": "No. Yeah. I was I was just trying to think about, like not even from the explainability piece, but, like, from a metrics piece. I guess I I since I'm not as familiar with this particular type of of of feedback loop"
      },
      {
        "speaker": "Speaker 2",
        "start": 270.0,
        "end": 270.0,
        "transcript": "Mhmm."
      },
      {
        "speaker": "Speaker 5",
        "start": 285.0,
        "end": 285.0,
        "transcript": "Coming more from the CS side of things, I was just curious, like, how what is what is your accuracy level? Right? Like, what what are how are you evaluating that you are doing a a good job? Right. And"
      },
      {
        "speaker": "Speaker 2",
        "start": 300.0,
        "end": 300.0,
        "transcript": "It is two okay. Yes. Okay. So this"
      },
      {
        "speaker": "Speaker 5",
        "start": 315.0,
        "end": 315.0,
        "transcript": "is two Like, relative to those cluster those to those high level embeddings. Right. Just because, like, they aren't necessarily semantically meaningful. That that's kind of what I was looking for."
      },
      {
        "speaker": "Speaker 2",
        "start": 330.0,
        "end": 330.0,
        "transcript": "Right. Yeah. Yeah. Okay. So there's a there's a short answer. It's a it's a very well formed question. You have to train the models on something. So what you train them on is predictive accuracy for user action. Alright? So whether that's liking or answering a survey, that that's what they're trained on, and so they're going to encode whatever information is predictive. But that's not necessarily how the overall system is evaluated because there's this operations level where then you deploy the system and you watch all of these other metrics, which often can't be predicted in this way. So, like, say you're looking at user retention. Well, user retention is not an outcome. It's not a short term outcome of showing a a a a single item to a user the way that a like is or the way that a survey response is. And so you can't train on user retention because it's not you don't have the data. So instead, what you do is you try to hand tune, this value model, which combines which weights the the model predictions in some way and say, I think that if the model predicts that the user spends a certain amount of time on our site, that they'll keep their subscription. And then you deploy two different versions of it, and you AB test them, and you watch them for three months, and you see which one could do better. So there's there's there's an inner algorithmic optimization loop and an outer human optimization loop that operates at much lower, resolution both in terms of the amount of data you watch and the time over which you make choices. And I think that's that's the model I wanna get into people's head is that there's actually two metrics driven loops here."
      },
      {
        "speaker": "Speaker 1",
        "start": 345.0,
        "end": 345.0,
        "transcript": "Anyone have a last question for Jonathan before we come to our end? Or, Jonathan, are there any parting words you wanna leave us with?"
      },
      {
        "speaker": "Speaker 2",
        "start": 360.0,
        "end": 360.0,
        "transcript": "Oh, gosh. So I think there is there's basically two areas that I'm sort of let's call it three areas that I'm excited about in terms of future work here. One is the sort of deliberative platform democracy area that Aviv is really pushing on. Another is what I call the survey paradigm, which is I'm, you know, working to I and others are working to develop a methodology to be able to, like, ask somebody every month, like, hey. How are you feeling about your Netflix use? You know? Are you are you happy with the amount of time you're spending? Is the stuff you're watching making you miserable? You know, any of the sort of normative concerns we might have, we could just ask people. And then building technology to algorithmically chase a very sparse, long term, slow moving outcome. Right? I wanna try to algorithmically optimize for things other than short term engagement. And then the last area that I think there's really a lot of progress could be made is, you know, people, I think, rightly get very upset that the the it's what we're really talking about is attention directing algorithms or attention allocation. And so, you know, every time somebody tweets, you know, why why aren't people talking about this or, you know, the news isn't covering this or this should be viral and it isn't, what they're doing is making a claim that attention has been misallocated. And so what we might be able to produce is a voting system where everybody agrees that, like, okay. You know, for all of its faults and for all of the times it didn't allocate attention the way I wanted, at least there's some sort of procedural fairness. And I think that's sort of what Reddit has that doesn't, is there's some reasonably legible process by which attention is allocated. 
And I think we really know almost nothing about what is the design space of legible voting systems for attention allocation that might produce a sort of procedural fairness that people would accept in the public square. So it's kind of an explainable recommender challenge, but the challenge is not to explain the recommendations. The challenge is to produce a sort of procedural fairness."
      },
      {
        "speaker": "Speaker 1",
        "start": 375.0,
        "end": 375.0,
        "transcript": "Fantastic. Well, thank you so much for your presentation, and, I know there's been a lot of juicy chatter in the chat as well, so I think that'll be followed up in the Medigov Slack. But as per Medigov tradition, I would love to invite everyone to momentarily unmute themselves so that we can give Jonathan a round of applause. Three, two, one. Awesome. Thank you all so much."
      },
      {
        "speaker": "Speaker 2",
        "start": 390.0,
        "end": 390.0,
        "transcript": "Thanks, everyone. Well, I'm easy to find if you wanna chat. I'm trying to catch up on the comments here."
      }
    ],
    "summary": null
  }
}