{
  "metadata": {
    "transaction_key": null,
    "request_id": "metagov:hadfieldmenell-metagov",
    "sha256": null,
    "created": "2025-10-27T23:38:00.695558+00:00",
    "duration": null,
    "channels": 1,
    "models": [
      "metagov-manual"
    ],
    "model_info": {
      "metagov-manual": {
        "name": "metagov-manual",
        "version": "2025-10-01",
        "arch": "manual"
      }
    },
    "warnings": null,
    "summary_info": null
  },
  "results": {
    "channels": [],
    "utterances": [
      {
        "speaker": "Speaker 1",
        "start": 0.0,
        "end": 0.0,
        "transcript": "It is my great pleasure to introduce Dylan Hadfield Manel, who has recently finished his PhD from Berkeley and is now just starting at MIT. And we're going to hear from Dylan about let me get this title right. Artificial Intelligence Needs Normative Infrastructure. Dylan, please take it away. And and everybody else, please, start sharing any questions, thoughts you have in the chat, and we'll surface them after after Dylan's done. Thank you so much, Dylan, for joining us."
      },
      {
        "speaker": "Speaker 2",
        "start": 15.0,
        "end": 15.0,
        "transcript": "Yes. Thank you so much for the introduction. And just one question on process. Should I be watching the chat to see questions as they come in so we can respond to them in real time? Or"
      },
      {
        "speaker": "Speaker 1",
        "start": 30.0,
        "end": 30.0,
        "transcript": "I I think probably better if you if you give your introduction, and then I'll guide I'll I can help guide the discussion in the chat after that."
      },
      {
        "speaker": "Speaker 2",
        "start": 45.0,
        "end": 45.0,
        "transcript": "Okay. Yeah. Just I prefer to give talks with that replicate an in person experience like this when I can. So I've I've got it set up so I can see hand raises and things like that, and I welcome interjections."
      },
      {
        "speaker": "Speaker 3",
        "start": 60.0,
        "end": 60.0,
        "transcript": "Great."
      },
      {
        "speaker": "Speaker 2",
        "start": 75.0,
        "end": 75.0,
        "transcript": "Alright. Sense. With that, it's a pleasure to be here. My name is Dylan Hadfield Manel. I'll be starting as a professor in exactly one week at MIT on July 1. Well, it depends how many days are in June, actually, but I won't do the math in front of you all. And I'm gonna be talking about, artificial intelligence and the need for normative infrastructure there. So first, I kind of wanna start out with principal agent problems from economics. This is a classic problem that we we look at, and it's what occurs when you have a gap between the person on whose behalf you are acting and the actual agent that is carrying out those actions. One of my favorite paper titles of all time comes from this literature. It's On the Folly of Rewarding a While Hoping for b. And it does a great job of summarizing what it means just in the title. And there's this lovely sort of abstract that I often go through to talk about this. The bottom paragraph here is is I think the main thing, which is that numerous examples exist of reward systems which are fouled up in that the behaviors which are rewarded are those which the rewarder is trying to discourage. And a lot of my research is about how this kind of a problem can arise in the context of artificial intelligence and artificial agents. And this is not my paper, but this is a recent one that just came out a few weeks ago, by Sanhil Mullainathan and Ziat Overmayer. And this ports this underlying idea to some, draws an interesting analogy to it in artificial intelligence. And so they wrote on the inequity of predicting a while hoping for b. And I give them credit for taking a paper title that I wish I had taken. So the underlying idea behind this that that that or that that I sort of draw this behind is the fact that sort of proxy metrics tend to do an incomplete job of representing what we care about. And so this leads to the incomplete contracting problems specifically within principal agent problems. 
So an incomplete contract is one that has gaps, missing provisions, and ambiguities and has to be completed either by renegotiation or by the courts with strictly positive probability in some states of the world. And down here below is a table from a paper of mine from 2019 where we looked at the connections between contractual incompleteness in a legal setting and reward misspecification in artificial intelligence applications. And we go through and identify several causes from the literature on incomplete contracting. And it's relatively straightforward for most of these to identify clear examples within artificial intelligence of these kinds of problems and issues. And one of the core points that this builds on is Goodhart's law, which is that any observed statistical regularity will tend to collapse once pressure is applied upon it for control purposes. And this is something that Kerr was certainly drawing upon, and that Mullainathan and Obermeyer were looking at. But this leads to a question of, like, why does this actually occur? What properties of the world are there that lead to this kind of pervasive phenomenon that we observe? So David Manheim and Scott Garrabrant have done some initial work in this area where they tried to categorize different reasons behind Goodhart's law and different mechanisms through which it can operate. So there's regressional Goodhart, where you have noise that's added into your proxy, and exploiting that noise leads to a gap between what you care about and what you end up optimizing for. There's extremal Goodhart, where you actually push beyond the range in which the metric was trained. There's causal Goodhart, where you have an underlying cause behind your metric that's distinct from the goal that you care about, and they break down a couple of different dependency relationships under which you can get this opportunity to uncorrelate your metric and your goal. 
And then finally, they talk about adversarial Goodhart, which is the case where an adversary might be involved in crafting the metric in some way, shape, or form. And so you get one that performs well on your local data, but not very well in the future. And one of the things that they really look at here is this idea of an ignored additional cause that can lead to these problems. And so this was a problem that I looked at with one of my coauthors, a student named Simon Zhuang. And we looked at trying to formalize a little bit of the mathematical properties of an environment that lead to this type of Goodharting. And so the utility model that we looked at is our model of both the world and the designer and what they might care about. So we formalize our state space with an attribute space, which says that the set of available states is lower bounded by some value b, with no explicit upper bound on the different attributes. So our states are gonna be represented by vectors, lower bounded by b. And to account for shared resources between these different attributes of utility, we have a constraint function which says that each attribute has a monotonic resource use and that the total resource use is bounded by some value. We then say that our designer, the principal on whose behalf we're acting, has a utility function. And our primary assumption here is that the utility function depends on all of these attributes. So we're essentially assuming that there are no irrelevant attributes in this model. Then we look at our model of the proxy that we use to convey this goal to our system. We have a proxy attribute space instead of the true attribute space. And our primary assumption here is that this is a subset of the true attributes. So we're formalizing the idea that value is complex, and it is costly to represent different features or attributes of value to your system. 
And so in practice, in real systems, we should expect that they will end up working with incomplete representations of the values that they should be acting on behalf of. And then for our proxy metric, we don't do anything fancy. All we say is that it's going to have the same kind of relationship with its attributes that the true utility does. So the proxy metric depends on our attributes and increases as those attributes increase. Now our optimization model for how the system takes this proxy and operates in the world is one of incremental optimization. So we say that the state changes through some rate function that depends on your proxy metric. In a sense, this is going to be increasing the values that you're incentivizing with the proxy. And then we also assume that there's a complete optimization of this proxy metric. So in the limit of infinite time, your state eventually reaches the optimal point according to your proxy. Altogether, this gives us a model of a misaligned artificial agent in a principal-agent kind of setting where we have a reward designer who has a true utility that we'll call r star in her head. And she's got a proxy that she specifies to the system, which then goes off and produces a sequence of changes in the world incentivized by that proxy. So our key assumption here really is that there's a split between the true utility and the proxy utility, and the true utility depends on more features than the proxy utility does. Additionally, we have a shared resource constraint between these attributes of value in order to restrict the reachable space. For an example of what this might look like in an AI setting, we can think about the problem of incentivizing a recommender system like the one that Facebook might use to sort through the news feed or YouTube might use to recommend different types of videos. 
The true features in this case of our very simplified utility model would be things like the clicks that someone makes on the site, their engagement with different pieces of content, so a more qualitative notion of engagement. You could look at the prevalence of hate speech in that system and then also something very complex and removed like community well-being, which you'd like to try to incorporate in some way. The shared resources that we have are both cognitive limits on the user's attention as well as physical limits on the space and the user's speed. So we put together this model. We picked some numbers for it, and we selected a proxy that only depended on clicks and engagement. And we got the result that you might be expecting, given the context of this talk and my points about Goodhart's law earlier on, which is to say that as we increase proxy utility, eventually, true utility crests and falls. And eventually, we actually uncorrelate our proxy metric from true utility such that increasing our proxy is actually a good way to decrease true utility. And furthermore, we didn't try this just for one combination of which features were in the subset. We tried it for all of them and looked at the true utility that was generated. And what we find is that this cresting behavior is quite common. In fact, we saw it every time. And this is more than just the particular details of the environment that we picked. Our primary result in the paper is this theorem here. 
So what it says is: suppose we have utility function u and state space s. Then if the set of reachable states is compact, and, really, this is the set of reachable states with at least a fixed value of u, if that's compact, then, in a sense, any proxy utility function that only depends on a subset of attributes will either lead you to hit an environmental bound, our boundary condition for attributes, or it will drive your utility arbitrarily low. And this is kind of our primary result in the paper. And the way that I see it is this is our attempt to formalize an aspect of Goodhart's law. And now we want to build systems that can do a better job of aligning with what we care about. So we look at the ways that you might weaken this theorem in order to get around its conclusions. So first, we have impact regularization as something that you might do. And this is a way to loosely introduce dependence on all features of the world through a reference to the current world state. So what we're saying is that instead of maximizing just the proxy metric, you could maximize the proxy metric minus some cost on the total feature distance. And it turns out that this allows you to guarantee improvement over your initial state, but, as you might expect, also limits the total utility you can reach at optimum. On the other hand, what we can do is relax the assumption that there's a fixed utility function, and this introduces a notion of interactivity. So, specifically, what we show is that if you choose your features carefully, there's a local neighborhood around your state such that a proxy utility function has correlation with true utility. So if you give me a state, I can draw a boundary around that and give you a proxy utility function such that increases in the proxy, at least locally, lead to increases in true utility. 
And in doing so, you can actually craft a solution that has guaranteed improvement locally, provided you can update the incentives of the AI system fast enough. And as we take this kind of interactivity idea for managing misalignment and gaps in specifications, we should be able to see evidence of this in real AI systems. And to do this, we looked at a survey of how recommender systems have been modified in the past in order to better align with human values. And so this is a paper we put in a workshop at ICML last summer. And we identified this kind of recurring pattern in the way that these incentives have been updated. Step one is an identify step, where system designers become aware of either a desired or undesired outcome and create an associated abstract category. This is all human work. And of past examples of this, probably the most famous one is clickbait, which did not really exist before the Internet. I mean, there were things about false advertising, but it wasn't important for us to clearly identify this particular type of content and develop a category until people started trying to manipulate Internet systems, and it became relevant to identify this. Another category would be recognizing that recommender systems are implicated in the spread of conspiracy theories or things like harassment online. And so this is identifying your problem or a goal you'd like to accomplish and creating an associated concept. The next step is to operationalize this idea, which is developing a concrete procedure in order to identify instances or measure the prevalence of that abstract concept. This can be hand-coded new metrics. So for clickbait, one of the solutions that the industry developed was return time. So how long after clicking on a link does a user wait before they come back to the system? And if that's very short, they take that as a proxy for something being clickbait. 
The more modern solution now often involves large groups of people classifying content to produce different content moderation signals and AI models that can then be deployed. And then the final step is to actually adjust the system based on these operationalized concepts. So you can modify your recommender system in order to increase or decrease the prevalence of that concept. And this happens either by changing the incentives of employees, changing the metrics that you use in A/B tests when you choose what to launch, or directly modifying the incentives of the AI system itself. And I think what's nice about this is that this aligns with that interactive management of metrics that we predicted should happen in the previous paper. And, really, I think when you look at this, there's a really interesting question on what actually happens at this identify step, and how can we make that faster and better. Okay. So this is kind of a picture of misalignment. We've presented the idea that AI systems can be thought of as a type of principal-agent problem with their designers. And we've talked about different strategies that you might use in order to manage misalignment issues. I can see a raised hand here. Yes. Question."
      },
      {
        "speaker": "Speaker 4",
        "start": 90.0,
        "end": 90.0,
        "transcript": "Yeah. Hi. Thank you. I just wanted to just to make this a little bit more concrete, can you give an example of what an undesired outcome would be?"
      },
      {
        "speaker": "Speaker 2",
        "start": 105.0,
        "end": 105.0,
        "transcript": "So an an undesired outcome would be that you build a recommender system you're, say, Facebook. You build a recommender system to identify links, and it's paying attention to clicks. And what you notice is that lots and lots of people are clicking on really there's a proliferation of listicles and sort of shallow content. And you don't think that your users are actually getting value from that content directly. And so that, like, that is a pattern of behaviors both in, like, a pattern of the types of content that's being generated and a pattern in terms of the user behavior for how they interact with that con with that content. And if we go back to, say, 1995, I think we did not have clickbait as a word. I actually should look up when it got when it officially got added to dictionaries when I give this talk in the future. But that that word, like, that you know, in in principle, we could have had that exist, but there's sort of a social construction of this is what is a category that we would like to collectively identify and then then change. And and so in this case, that's happening internally to the company, and, well, we'll we'll see where we're headed next in a little bit. K."
      },
      {
        "speaker": "Speaker 4",
        "start": 120.0,
        "end": 120.0,
        "transcript": "And I guess the the the follow-up question that would be then why is clickbait necessarily undesired. Right? Because I can imagine I I there's there's ways in which I can see how it is undesired, but then there's also ways where I can see people love clickbait sometimes, and it gives them that boost of dopamine, and they, you know, just wanted to see that weird picture or that weird whatever the thing was. So I'm curious about how you decide that that's undesired."
      },
      {
        "speaker": "Speaker 2",
        "start": 135.0,
        "end": 135.0,
        "transcript": "That is a really good question that I think, we should return to later on in the talk. Right now, at at this at this stage, what we're talking about is just the things internal to a company managing its recommender system. So, really, when on this slide, when we say undesired, all that that means is someone at the company has decided that this is bad. Yes. I've got another hand there."
      },
      {
        "speaker": "Speaker 5",
        "start": 150.0,
        "end": 150.0,
        "transcript": "Thanks. Yeah. I've been working in our commander system and AI impact assessment, so I'm super curious in aligning our commander systems to human values. How do you account for filter bubble and echo chamber effects? So"
      },
      {
        "speaker": "Speaker 2",
        "start": 165.0,
        "end": 165.0,
        "transcript": "I wanna be clear that this paper is not describing a procedure for aligning recommender systems with human values. This paper is summarizing and surveying public information about ways that previous recommender systems have been updated to better align with well, different, I I guess, you know, cue well, we're calling them human values, I guess. But so so, really, what this is what this paper was the main contribution in this paper was kind of hunting through the various blogs in which companies had posted updates about, we changed our recommender system to do x and kind of compiling that all into one place. In terms of questions on the the types of questions you're raising about filter bubbles and I'm sorry. I what was the other"
      },
      {
        "speaker": "Speaker 6",
        "start": 180.0,
        "end": 180.0,
        "transcript": "Echo chamber."
      },
      {
        "speaker": "Speaker 2",
        "start": 195.0,
        "end": 195.0,
        "transcript": "And echo chambers?"
      },
      {
        "speaker": "Speaker 5",
        "start": 210.0,
        "end": 210.0,
        "transcript": "I guess that's sort of their interactions when people interact from a human perspective, like me interacting with our commander system. That is something that's, like, might happen. Right?"
      },
      {
        "speaker": "Speaker 2",
        "start": 225.0,
        "end": 225.0,
        "transcript": "Right. So so I think one of the I I think what that gets at like, the the core tension that that identifies is perhaps some components of the mutability of preferences and the ways that recommender systems can interact with people and and change their preferences. It's not a matter of sort of aligning to a static set of preferences that that an individual might have. Mhmm. And that's a really interesting question. I actually have a couple of slides at the very end of this talk that I I can get into about some work that we're doing that looks at trying to understand what basically, looking through a model of a recommender system interacting with someone who has non static preferences that change in response to what they see, and trying to identify what even are the the ways to to talk about good versus bad influence."
      },
      {
        "speaker": "Speaker 7",
        "start": 240.0,
        "end": 240.0,
        "transcript": "Thank you."
      },
      {
        "speaker": "Speaker 2",
        "start": 255.0,
        "end": 255.0,
        "transcript": "Thank you. I've got another question up here as well. Please go ahead."
      },
      {
        "speaker": "Speaker 8",
        "start": 270.0,
        "end": 270.0,
        "transcript": "I just wanted to kinda ask a question about the I I mean, in the context of these questions we're we're hearing about a kind of map versus territory distinction with these algorithms because when you talk about the humans and the metrics that you're optimizing for when you actually cross validate and, like, train your recommender system versus their, like, their ability to express your real goals or so, like, a gap between your model of good and good from the perspective of the user. And, I mean, the feedback loop that you're showing kind of implicitly solves for continuously trying to adjust your your, you know, map of the territory to better map the territory, you know, assuming the people and their wants and needs are the territory. And I'm I'm curious whether this sort of management of the epistemic gap in the in a in data systems is motivating for you when looking at these kinds of effectively model governance procedures."
      },
      {
        "speaker": "Speaker 2",
        "start": 285.0,
        "end": 285.0,
        "transcript": "Yes. Absolutely. And I think if I can, I will defer, like, my I'll defer that question to until I've gone through the next few slides? Because I think the tension that all of you are are highlighting here is that, especially for a group like this, I've set up a bit of a straw man. We started off talking about AI systems, alignment, principal agent problems, and we're we're identifying that even in a scenario where it is only a principal and an agent, you still have phenomena of misalignment that are difficult to manage. But as as you've all pointed out, there's there's much, much more to this picture. So let me return to to this a little bit and and talk about so we we've described a way that a system can deal with these types of epistemic gaps. And now I wanna think about the ways that people deal with those epistemic gaps or or not necessarily epistemic gaps, but but we'll sort of see. So in one of the early papers that I identified these misalignment problems in artificial intelligence, they they talk about side effects. With an example of a robot that's been given a task to carry a bunch of blocks from one side of the room to the other. And the way that you would do this in, like, a standard grad school, like, grad student solution with the mark off decision process is you give it a big reward for for doing that. And what they work through in the paper is, like, oh, well, the system, unless you tell it that it's bad to knock over the vase and block it, there happens to be some expensive vase in the middle of the room, it will go through and and knock it over. And now if you give a person sort of a similar analogous thing, right, you you give them a contract where you tell them, we'll give you a lot of money if you carry these boxes from one side of the room to the other, they're not gonna knock over the base. And the reason why they're not gonna do it is a combination of things. 
One is more of a standard problem you talk about in artificial intelligence, which is people will often be able to predict that the vase will be broken, but that's not enough. Right? You need to also predict that it's bad to break the vase. And humans will pretty reliably do this. And in doing so, they'll infer that there's this negative reward, which is a cost, on breaking the vase. And what do they use? What cognitive tools do they use in order to do this? Well, it's really cognitive tools about external normative structure. It's cognitive tools which tell them that, oh, I'm being hired for this job. If I break that vase, someone's gonna take me to court and sue me and recoup damages for it. And we know that in at least the societies I'm referencing for this, that would be what's likely to happen. Although it doesn't have to be official institutions, it could also be more soft governance style things. It could be that, you know, you're actually helping your friend move, and you just know that if you break that vase, they're all gonna get mad at you, and they're not gonna invite you to go get beer with them afterwards. Right? And so you get these external components of incentives, which are crucial to avoiding the negative outcome here. So to summarize, human contracts rely on tons of structure. What is it reasonable to think that the parties had in mind when they agreed? And then what counts as reasonable, and these other gap-filling components, are provided by institutions, norms, and law. And in order to build systems that effectively align with our goals writ large, I'd argue that we need to provide similar components to AI systems and allow AI systems to draw upon these types of external normative structure. And this is basically what I think we can try to work towards. 
So the proposal is really that in our model of a principal-agent system for artificial intelligence, where we have our designer, they've got the true reward function, and they've got a proxy, and the system optimizes for that, there's one gap which is between this r tilde and r star. But then there's a second aspect of the gap, which is that all of these other people are stakeholders in the system, and they care about its output. And the thing that's kind of missing technologically is what are the affordances for these external parties? What are the actions that they can take to modify the incentives of AI systems? Now as we look at this idea, it's very nice to suggest, and this is actually happening right now. So if we look at recommender systems again, there are groups like FactMatter, the Global Disinformation Index, and a combination of nonprofit and for-profit companies that are all creating metrics external to these platforms that could be incorporated into these systems. The barriers to adoption are that there's no direct connection between PR concerns and the adoption or inclusion of new metrics. And in fact, adopting a new metric is often bad from a PR incentives perspective because it communicates to the world that you were doing something wrong or that you were missing something. There's a strong incentive not to change current systems due to profit motives, and you just can't really apply contested metrics at the platform level. If we think about content moderation at the level of hate speech for a global platform, I certainly believe that there's no single metric that the world as it stands right now would be happy to rest on. Like, this is what defines hate speech. And you end up with some loose combination of all those that's aggregated and doesn't leave anyone happy. 
And I think there's a clean system design for what's effectively a values layer for an AI services ecosystem that can get around some of these issues. So this idea builds upon three different parties. There's the user population and the AI service. This is basically, you know, the normal standard model. The new thing that we're introducing is these external parties, these metric providers. And what these can do is provide definitions of values to AI services. And in doing so, they are society's answer to how do we manage this epistemic gap in a way that is distributed and shares power. Then once these value definitions go into an AI service, rather than just selecting, oh, here's the values we want, we're gonna apply them across our platform, the AI service can then present these libraries of values to users along with experiences for learning about and prioritizing values, which then allows users to connect back to the AI service by presenting themselves as a point within this latent value space. In a sense, what this does is it allows external parties to have input into the space of values that we're modeling people within. And just a heads up, my computer has decided to start showing me the rainbow pinwheel of death. I'm not sure if the rest of you can see it, but I'll just continue on briefly because this is kind of our last real slide, and then we can take questions. Once users provide their choices of which metrics they want to influence their experience, the real thing that makes all of this work is financial incentives for metric providers. We can't expect providers to do this work for free. This is in fact some of the most important work that our societies will do. And so one of the key questions, and a question I'd love to get input from you guys on, is what are the ways to set the incentives of those metric providers based on user choices about which values matter to them. 
One of the key reasons that you want these incentives to be there and related to user choices is that the bottom leg of this triangle is really a trust relationship, because the users are now able to trust in the value definitions and terms provided by metric providers rather than directly trusting the AI service. And I think this opens up a wide range of possibilities and really lays the groundwork for more value pluralism in online platforms. And with that, I'll stop here and move to take questions."
      },
      {
        "speaker": "Speaker 1",
        "start": 300.0,
        "end": 300.0,
        "transcript": "Awesome. Thank you so much, Dylan. You've already started taking some questions as well, so, which is which is great. But, I'm just looking at the who's who's been surfacing issues. And the first person we haven't heard from yet is Divya. Divya, do you wanna give voice to your question?"
      },
      {
        "speaker": "Speaker 6",
        "start": 315.0,
        "end": 315.0,
        "transcript": "Sure. Yeah. I think you started touching on this. I asked the question a bit earlier in the presentation with the idea of these external metrics. But at the beginning, I had this question around, you know, a lot of this seems to be about deciding whose preferences to align with, and part of that is just, you know, you mentioned the first model was around in a company. Those are just the people who decide, and that's where we're going from. And so my question was around, are there ways to mediate between conflicting preferences that you're exploring? And I would add on now, do you think coming from this idea of value pluralism being engendered by the metric providers, is that the way to mediate between conflicting preferences?"
      },
      {
        "speaker": "Speaker 3",
        "start": 330.0,
        "end": 330.0,
        "transcript": "Yes. I think that's my answer here. Also, can you all hear me? My computer decided to restart. So I'm"
      },
      {
        "speaker": "Speaker 6",
        "start": 345.0,
        "end": 345.0,
        "transcript": "Yes. Yeah."
      },
      {
        "speaker": "Speaker 3",
        "start": 360.0,
        "end": 360.0,
        "transcript": "Great. Okay. Good. So, yes, I think my answer here is that those metric providers combined with effective effective agency for users on which metrics end up impacting their experience is is part of my answer around, whose values to align with. Now this doesn't say there aren't any collective decisions to be made. There's still moderation questions that you have to decide on at the level of, which types of metrics will you allow onto your platform. But I think that that the the dynamics of that decision seem very different to me, and the incentives behind it seem very different to me in in the sense that you don't have to get something that everyone is happy with. You have to provide an option that gives a a reasonable choice for most people and and doesn't have anything that's considered too morally objectionable by the rest."
      },
      {
        "speaker": "Speaker 6",
        "start": 375.0,
        "end": 375.0,
        "transcript": "Mhmm. What do"
      },
      {
        "speaker": "Speaker 4",
        "start": 390.0,
        "end": 390.0,
        "transcript": "you think are"
      },
      {
        "speaker": "Speaker 6",
        "start": 405.0,
        "end": 405.0,
        "transcript": "the major sorry. Not to take up too much time. Feel free to speed around this answer. What do you think are the major kind of barriers to that? Because in my view, you know, we already have in a sense the capability to as you mentioned, some of these external metrics are already being constructed. There's a lot of broad understanding over what they might look like, and yet we don't see a huge amount of adoption. We already see these alignment problems in the in the small scale with all of the kinds of information ecosystem, for example, pieces you mentioned. So what do you think are the broad kind of barriers here? And and, you know, if you have thoughts on solving them, obviously, that would be great."
      },
      {
        "speaker": "Speaker 3",
        "start": 420.0,
        "end": 420.0,
        "transcript": "I think one of the big barriers is that this is there are a lot of moving pieces that all need to be put in place before a system like this can get off the ground. There's user knowledge around familiarity with what different metrics mean, how to trust the institutions that are are sort of the that are providing them in effect. And there is a problem additionally of these metrics are all out there, but they're all one off, and they're all being marketed as, basically approaches to do better content moderation. And I think one of the key things about the proposal here is that it's not content moderation. It is it is something quite different in in that you are allowing external control of the ranking system directly."
      },
      {
        "speaker": "Speaker 6",
        "start": 435.0,
        "end": 435.0,
        "transcript": "Great. Thank"
      },
      {
        "speaker": "Speaker 3",
        "start": 450.0,
        "end": 450.0,
        "transcript": "you. Yeah. Great. Thank you."
      },
      {
        "speaker": "Speaker 1",
        "start": 465.0,
        "end": 465.0,
        "transcript": "Alright. Turn it over to Josh. Josh has a a bunch of ideas in the in the chat. What do you wanna raise?"
      },
      {
        "speaker": "Speaker 3",
        "start": 480.0,
        "end": 480.0,
        "transcript": "I lost access to the chat through or"
      },
      {
        "speaker": "Speaker 2",
        "start": 495.0,
        "end": 495.0,
        "transcript": "yeah."
      },
      {
        "speaker": "Speaker 9",
        "start": 510.0,
        "end": 510.0,
        "transcript": "I don't know if you get Josh Josh over. No worries. I'll I'll summarize."
      },
      {
        "speaker": "Speaker 3",
        "start": 525.0,
        "end": 525.0,
        "transcript": "Great. Thank you."
      },
      {
        "speaker": "Speaker 9",
        "start": 540.0,
        "end": 540.0,
        "transcript": "This is a really interesting talk. I guess I have a ton of questions. One really short one was just on the most recent topic following the divvy. So when you talk about offering metrics, are you literally talking about, like, the utility functions that the AI service, like, the model will be consuming? Like, you have an algorithm, like, again, and you'll be saying, like, like, the discriminator is gonna be using this function to optimize, you know, like, we'll be optimizing this thing. Like, that's, like, that level of specific specificity, or are you saying there's some, like, sort of, like, big abstract value function that's getting, you know, derived to control these underlying actual implementations?"
      },
      {
        "speaker": "Speaker 3",
        "start": 555.0,
        "end": 555.0,
        "transcript": "I think that's a that's a technical question. Right? It's a there are questions about what is technologically possible in building that affordance. Right? So so I I think there does need to be an affordance for externally defined features. Basically, some of managing that epistemic gap has to happen outside of the company because PR incentives are so weak and perverse. And that there's, like you know, if you look at that picture, you know, identify, operationalize, adjust, that identify step is, like, a clear failing Mhmm. Of of where current systems fall short. So so I think that the main thing I'm trying to argue behind on this talk is is that those affordances should be there and that we need them in some way. Now that the practical reality is of should they be a trained neural net that an external group manages the dataset for and perhaps the training for, that's a really interesting question that has to do with, like, technical questions on the exploitability of those objectives. Right? So, like, all of the things from before about the system working around gaps in the specification will still apply."
      },
      {
        "speaker": "Speaker 9",
        "start": 570.0,
        "end": 570.0,
        "transcript": "Yeah. I think it's really super interesting subject because, you know, like, I guess it's really wonderful that you're approaching this from an ethical perspective because we're kind of like I've been talking to another person kind of at FHI about, like, the the, essentially, the economics perspective on AI disintegration. Exactly how we'd like you know, what parts of these different sort of, like, workflows can actually be separated in the different companies? Because what you're proposing here is essentially, like, the, you know, like, this Metro provider nominally, like, a different institution would be, like, somehow governing these AI sort of services. Right? But, like Mhmm. There exist a separate institutions, whereas, typically, the workflow right now, like, they have to be sitting in the same room on the same team working on the same stuff because, like, these systems are actually, like, very, very, like, fragile in ways. They need to be quite fine tuned."
      },
      {
        "speaker": "Speaker 3",
        "start": 585.0,
        "end": 585.0,
        "transcript": "Right. Right. And I think that it's like, I I think there is a really direct analogy to contracting here of external institutions providing the definition of terms that set the incentives of agents in the economy. Right? Like, I think that that that nails down very, very well. And then there are this is why it's not a purely this is a it's a fundamentally interdisciplinary problem to solve here because you have to be thinking about the the, like, practical realities of what different groups can exist, what are they capable of doing, what do they want to be doing, alongside these really, you know, complex questions around the state of practice of artificial intelligence, And how do we build systems that that do what even one person wants is is very, very hard, and this is this moves pretty far beyond that."
      },
      {
        "speaker": "Speaker 9",
        "start": 600.0,
        "end": 600.0,
        "transcript": "I'm gonna be very terrible and ask a really quick follow-up question. Well So sorry to monopolize. But it's actually directly related to this contract perspective. So I've actually been kind of randomly thinking about the taxonomy of contracts recently."
      },
      {
        "speaker": "Speaker 1",
        "start": 615.0,
        "end": 615.0,
        "transcript": "Josh Josh, you've already done two questions, and we have"
      },
      {
        "speaker": "Speaker 9",
        "start": 630.0,
        "end": 630.0,
        "transcript": "some other"
      },
      {
        "speaker": "Speaker 1",
        "start": 645.0,
        "end": 645.0,
        "transcript": "people in line. I'm sorry."
      },
      {
        "speaker": "Speaker 9",
        "start": 660.0,
        "end": 660.0,
        "transcript": "I'll follow-up later in the slide."
      },
      {
        "speaker": "Speaker 1",
        "start": 675.0,
        "end": 675.0,
        "transcript": "Yeah. Please do. We we have a question, that was building on that, from Peter who we haven't heard from. So if it's okay, I'd like to drop Peter in. Mhmm. Do you wanna, explain what you're thinking?"
      },
      {
        "speaker": "Speaker 10",
        "start": 690.0,
        "end": 690.0,
        "transcript": "Yeah. So the the thing that really strikes me is if we start to try to build this this library and I think, you know, a few of us have projects that that may turn into parts of it. What should we try to build first? Should we should we try to build like a completely abstract ethical reasoning system or should we try to build something that is specific to this kind of industrial scale problem that the social media platforms have which is either like you know over Youtube videos or some of the library of media like how do you express different ethical judgments or over conversations and statements in online conversations, posts on forums, how do you express values? Like do you have any intuition about where the the best place to start is?"
      },
      {
        "speaker": "Speaker 3",
        "start": 705.0,
        "end": 705.0,
        "transcript": "I think my and this is where I certainly I don't have that much confidence in my intuitions per se, but what what I've been trying to do here is to meet people where they are and and do things that are practically useful. I I think that this is"
      },
      {
        "speaker": "Speaker 2",
        "start": 720.0,
        "end": 720.0,
        "transcript": "there"
      },
      {
        "speaker": "Speaker 3",
        "start": 735.0,
        "end": 735.0,
        "transcript": "are a lot of different moving parts that that sort of all have to kind of get there, and and they they mutually support each other. Right? The existence of metric providers that are integrated with AI services means that more users will know about which metrics they trust and which ones they don't. And so I I tend to think getting something that's a proof of concept of what this looks like in place in in actual AI systems is is the thing to drive for and in order to make that happen you have to have an actual AI service, metric providers from institutions that have some degree of of social legitimacy, and you need, like, a a nontrivial user base that's interacting with those types of systems. And, I think figuring out how and and you have to have money flowing in the system. That that's the that's that's what we need to work towards, or at least that's what I'm trying to work towards, I think."
      },
      {
        "speaker": "Speaker 1",
        "start": 750.0,
        "end": 750.0,
        "transcript": "And sorry to keep messing up the order here, but I wanna make sure to get in as many voices as we can. Amy had a question as well."
      },
      {
        "speaker": "Speaker 9",
        "start": 765.0,
        "end": 765.0,
        "transcript": "Please."
      },
      {
        "speaker": "Speaker 7",
        "start": 780.0,
        "end": 780.0,
        "transcript": "Oh, yeah. Thank sorry for cutting in front of the line. Good to see you again, Dylan. I had a question about process versus outcome here. Mhmm. So it's so correct me if I'm wrong. It seems like you're mainly talking about alignment of outcomes via these metrics. But are you thinking at all about procedural aspects here? So, like, with regards to the not just the metric institution, which I think you can design to potentially be legitimate, but also the connection to the AI algorithm itself. Because I because I see potential tensions there."
      },
      {
        "speaker": "Speaker 3",
        "start": 795.0,
        "end": 795.0,
        "transcript": "I'd love to talk about what those tensions are. They're like, you know,"
      },
      {
        "speaker": "Speaker 7",
        "start": 810.0,
        "end": 810.0,
        "transcript": "you have to explainability, transparency, like, auditing services that might be trying to audit these algorithms. The the process by which you're getting to these particular outcomes might be something that people might object to perhaps Right."
      },
      {
        "speaker": "Speaker 3",
        "start": 825.0,
        "end": 825.0,
        "transcript": "Yes. So I don't think this I I think of this as being complementary to algorithmic auditing ideas, and definitely not something that supplants them. What I think this get if you're thinking about this as an auditor, what this gives you is lots and lots of metrics that you know you can run within the system to look for things that you don't like and don't trust. So the the space of possible metrics, at least the way that I see it, it it defines a set of different recommendation options that the system can provide. And by probing those and looking at the distribution of the population within those outcomes, I think you can get insights into the process by which people are sorted into that. The the primary process component that I think about is the way that people that way that people are embedded into a point in that latent space. So what is the preference solicitation mechanism that that you want to work out with users and how do you like the the problem I think about a lot is overcoming the incentives"
      },
      {
        "speaker": "Speaker 2",
        "start": 840.0,
        "end": 840.0,
        "transcript": "around"
      },
      {
        "speaker": "Speaker 3",
        "start": 855.0,
        "end": 855.0,
        "transcript": "effectively learning people's preferences despite the fact that the short term incentives for that are really, quite weak. And, in order for this system to work, you have to have effective agency and control over what your point is. And I think about that through my best answer for that right now is paid surveys that match demographic measurements of the of the user population. So So you actually find a subset that you survey and pay to actually go through a preference elicitation process, and then that can serve as a type of demographic or regional default for setting the prior on how someone sorted into those different latent points."
      },
      {
        "speaker": "Speaker 7",
        "start": 870.0,
        "end": 870.0,
        "transcript": "Thanks. Yeah. I think this community in particular is really interested in that question of, like, the perceived legitimacy around these types of user preference schemes?"
      },
      {
        "speaker": "Speaker 3",
        "start": 885.0,
        "end": 885.0,
        "transcript": "Yeah. I I would love to talk more around that or or if there are suggestions that that people would have to to follow-up, that would be really interesting for me."
      },
      {
        "speaker": "Speaker 1",
        "start": 900.0,
        "end": 900.0,
        "transcript": "Great. Just as a last thought, Bobby mentioned something about about working in industry in a metric provider with the tantalizing possibility of quitting next month. Do you have any final thoughts based on this on this discussion and and your experience? Or, from Bobby and"
      },
      {
        "speaker": "Speaker 5",
        "start": 915.0,
        "end": 915.0,
        "transcript": "Yeah. Maybe I wanted to jump in on one quick question. In when in your, slide affording for external metrics, there was no relationship between the user population and the metrics provider?"
      },
      {
        "speaker": "Speaker 3",
        "start": 930.0,
        "end": 930.0,
        "transcript": "There was. My computer just decided to have a rainbow community at death right before that. Okay. Okay. That bottom leg is talking about sort of the incentives for metric providers should create incentives for them to build trust with populations that they serve, and I think that's a really important component of the system that's just missing right now. And going back, you can have data sharing relationships that leverage that trust with metric providers. So you might have other data that's actually not present on the, on the platform itself that you could share with the metric providers that they would have access to in in order to better match their metrics."
      },
      {
        "speaker": "Speaker 5",
        "start": 945.0,
        "end": 945.0,
        "transcript": "Thanks. Yeah. I'll look to dig into."
      },
      {
        "speaker": "Speaker 1",
        "start": 960.0,
        "end": 960.0,
        "transcript": "Thank you. Okay. Well, thank you, so much, Dylan. This is a really good discussion. We have a lot more questions and and comments in the chat than we were able to get to. Thanks all for your patience in that. Before we wrap up, please, take a moment in a when I count to three or from three to unmute and share some applause and gratitude, to Dylan for for joining us today. So three, two, one. Go."
      },
      {
        "speaker": "Speaker 2",
        "start": 975.0,
        "end": 975.0,
        "transcript": "Thank"
      },
      {
        "speaker": "Speaker 3",
        "start": 990.0,
        "end": 990.0,
        "transcript": "you. Thank you all, and this is a great end of talk, Ritu. I'll probably steal this in"
      },
      {
        "speaker": "Speaker 2",
        "start": 1005.0,
        "end": 1005.0,
        "transcript": "my"
      },
      {
        "speaker": "Speaker 3",
        "start": 1020.0,
        "end": 1020.0,
        "transcript": "next one."
      }
    ],
    "summary": null
  }
}