{
  "metadata": {
    "transaction_key": null,
    "request_id": "metagov:a-rule-based-approach-to-mitigating-ai-risk-miyazono",
    "sha256": null,
    "created": "2025-10-27T23:37:59.752732+00:00",
    "duration": null,
    "channels": 1,
    "models": [
      "metagov-manual"
    ],
    "model_info": {
      "metagov-manual": {
        "name": "metagov-manual",
        "version": "2025-10-01",
        "arch": "manual"
      }
    },
    "warnings": null,
    "summary_info": null
  },
  "results": {
    "channels": [],
    "utterances": [
      {
        "speaker": "Speaker 1",
        "start": 0.0,
        "end": 0.0,
        "transcript": "Alright. Cool. Liz, take it away. Okay. Thank you. And today, we're grateful that Evan Miyazono has come to be our seminar guest presenter. Evan is, I believe, Medigov's newest research director. So welcome. Thank you for coming. It's been incredible to be in conversation with you the past few months, and I personally need to know more about your work on governing AI with rules, not values. It's also funny because I work a lot on values. So I'm very interested to hear how do you do this, and why are you quoting Alexander Hamilton from the Federalist Papers? Thank you so much for being here, and over to you. Thanks."
      },
      {
        "speaker": "Speaker 2",
        "start": 15.0,
        "end": 15.0,
        "transcript": "Well, thank you, Liz, and thank you so much for having me. I'm very excited for this conversation. I expect it to be much more of a conversation. Please do interrupt. Like, the last I don't know how many slides. Last, like, eight slides at least are, like, basically just prompts for conversation or, like, visualizations for things that I will I, like, want to explore with you rather than, like, me talking. So expected this will be a little bit of, like, me contact staying or providing context. You can interrupt whenever. And, there will also be a bunch of things that I am doing that are not covered in the slides. And so I might, like, drop breadcrumbs at that and a and then say, like, if you're interested in that, let's talk more about it offline. For instance, I don't think I have anything on in these slides about my interests in having chips that disable themselves if you do disallowed workloads on or if you try if or rather, chips that shut off in software if you try and get them to do something that's against the policy and that that disable themselves in hardware if you try to remove that module. That is a thing that I am actively trying to push forward. Nothing on that in in here. And I would expect that people in this group may bulk at the centralization of power that that may result in, and I may appeal to authority and point out that Vatalika did say that compute governance restrictions seem like the least unlibertarian way to achieve safe AGI. Happy to provide that that quote later if useful. So yes. If the treadmill is too distracting for anyone, let me know. The there we go. I also try to be very quick with the links. The as of, like, brief background, my academic experience is materials engineering and experimental physics. Rather quickly pivoted after leaving grad school to join protocol labs. I was the first PhD hire there. 
I built out the research program generally, all of the, like, how do we evaluate, hire, and, like, scale internal research? Didn't do much of the, like, direct researcher management, but also built out the grants, adviserships, sponsorships, all of that. And that included at some point asking what does the best research program look like? That is more, better science, faster and cheaper. Faster and cheaper are very quantitative. The other two are very subjective. You need tools to elicit preferences and aggregate those, and I started really getting into a lot of social choice theory stuff. And then you have to build mechanisms to try to elicit proposals and direct researchers in those directions. So it also included a lot of, like, weird game theoretic type of things. So that became kind of like a venture studio to support public goods, like, new mechanisms for funding open source software as a generalization of funding research as a public good. And two years ago, I left all of that to try to make actual progress against AI. So more things that I will not talk about anymore in this conversation, but would be happy to go into more depth. To the specific question of why am I quoting Federalist Papers: one, I really like quoting Federalist Papers. The Federalist Papers, I think, have a lot of good insights that people who don't think about the social side of sociotechnical problems benefit from exposure to. A lot of the ideas that I hear from the, like, decentralized governance side are often just restatements or reinventions of things that have been discussed, which is unsurprising to everyone in this group. But part of the reason why that quote was there is that this blog post was elicited slash commissioned by the Cosmos Institute, who very much like leveraging or building on the shoulders of the philosophical giants of relevant fields. So you're welcome to take a look at this later. 
This is gonna be mostly talking through a lot of the content there in a, like, structurally very different way. Starting off with a question of what does democratic regulation of AI look like, I think that this is an interesting narrowing of the question of, like, how do democracy and AI intersect? And I claim that for the most part, regulation is trying to prevent catastrophic outcomes and, to some extent, maintain the existence of public goods. I find people saying things like 'this model won't self-replicate, we tried it under controlled conditions' to look a lot like 'this railing is totally safe, look how hard I can push on it.' And I don't think this is how we build railings now. I think that we have quantifiable estimates of what safe properties look like, how those properties will be battle tested in the world. And we really probably shouldn't have a whole lot of railings going up with, like, random heuristics. I did look up the story, and it seems like it may be apocryphal, that every bridge builder in the Roman Empire had to stand under the bridge while a legion marched over it to show that it was safe. It would be nice if something like that were true, but it would kind of be nicer if we had some similar level of confidence in systems that we are building, particularly software systems we're building now. I also like to distinguish between helpful warning labels, where you see it and you kind of get a very clear impression of what it's saying, or you can pretty quickly evaluate, like, how relevant is this to me? How much do I need to understand this? In the case of, like, if you see a complex warning label and you can just avoid the thing and you don't have to deal with it, you'll probably just see that and say, like, okay. Alright. This is beyond my risk threshold. 
I would contrast this with, for anyone who's been to California, these warning labels, as very unhelpful warning labels that are required by law, and they basically go on anything that you could plausibly consider a cancer risk in California. And so here they are on a package of bamboo skewers and sitting in a coffee shop, with no clear indication of what it is that is exposing you to cancer risks. It's just there. It has to be there. Welcome to California. You can categorize this however you like: 'ChatGPT can make mistakes. Check important info.' This feels to me a lot like 'this bridge doesn't hold infinite weight. Check before you drive over it with any load.' The question becomes how valuable is it to even have this tool from a societal perspective if they've just abdicated all liability? I think that there's, like, a real caveat emptor moment here, and it would be really nice to move beyond that. And so I think that having something that lets us move beyond that almost technologically feels potentially, like, arguably necessary and sufficient for progress. This does kind of feed into then what the question of democratic control looks like. I expect a lot of people here to say, constitutional AI. This is great. Like, this exists. Humans get together, write a, yeah. Zargan's air quotes indicate that he is also on the same page as I am about the risks and flaws here. Like, in constitutional AI, you have a bunch of people get together, write a bunch of principles, and then you have the AI system follow those principles. But, like, what, in my opinion, this looks like if you actually map it onto the legal system is humans as constitution authors, AI as defendant, AI as prosecutor, AI as judge. This doesn't feel like control because people aren't checking those AI outputs against the constitution. It's an AI that's doing that. So, like, this is not control. This is, like, gesturing at control. 
And I also really appreciate the metaphor of, like, AI isn't constructed. It's grown. And if you personify AI, what we're currently doing is a lot like raising a child on the unadulterated Internet for some number of years and then putting them in a finishing school called RLHF and expecting that they will be fully functioning members of society. I think that there are many things that should be done about this, but one of them is maybe starting to consider the insufficiency of AI alignment as a goal. And, actually, sorry. My throat is doing something weird. Sorry for my voice getting odd occasionally. Michael Nielsen actually posted something to his notebook very recently on, I think it was titled the insufficiency of alignment. Oh, reconsidering alignment as a goal. Here it is. And it kind of echoes a lot of these general concerns. One of the things that I would highlight is growing interest in the field of AI control, which is this notion that instead of saying we're gonna take an AI system that doesn't want the things that I want and make it a system that does want the things that I want, let's instead make a box that we trust and trust the things that come out of the box because we trust the box. This is the 'don't make uranium safe' idea: make enough safeguards in your generator so that you can trust the generator even though we didn't make the uranium safe. So in this context, there's a really great paper that I have down here. Yeah. 'AI Control: Improving Safety Despite Intentional Subversion,' where you can show that if you trust a weaker model, but you don't trust a more powerful model, you can have some, like, heuristic, empirical confidence that the more powerful system is not deceiving you because you can play some games, build some, like, cool mechanisms that can give you control in this very high dimensional space of generating software. 
There's also, in a much lower dimensional space, this notion of shielding from reinforcement learning, where you can prune out unsafe actions in this very low dimensional, like, did the autonomous robot crash into the wall? And in this system, it's low dimensional enough that you can, like, specify formally and objectively, this is what unsafe looks like. And then you can generate a computational proof that the system will never enter the unsafe region, will never do an unsafe thing. And I think that the challenge here is that we have these very high dimensional, approximate, like, one or two nines guarantees, and then we have these low dimensional, mathematically grounded, provable guarantees to the extent that your model is correct, that being your model of the world. So the question becomes, can you kind of bridge these? Can you create a hybrid that is a very high dimensional system, like code or English language, but with very quantifiable guarantees? And of this, like, general summary slide, I would say that if you are looking at something that is very example driven, 'treat others as you wanna be treated,' this seems to be the general alignment perspective, which is very different from something that is much more rule based, 'don't make classmates cry.' On this side, I think that we agree that this is neither necessary nor sufficient, but it correlates really strongly with the goal and is very objective when it comes to evaluating whether or not one of the children is misbehaving, which I think is very different from something that is much more subjective and would be great if executed well, but where it's very hard to convince other people that you are acting in a way where you are treating others as you wanna be treated. 
I think that, like, a useful note is this notion of are we doing optimization or constraints when we have instructions we're giving AI systems? So the question being, if you want an AI system to do something, but the company wants the opposite, do those preferences cancel each other out? That seems to be in many ways how current systems work, where if you could explain well enough why it should give you the recipe for a drug, or you can find some loopholes, then you can cancel things out. But if there is a way to specify, do not go outside this bound, then something that looks like an international policy that may have some overlap with a national or a personal policy, you can AND all of those together. And you can satisfy the ANDed set of constraints, where only things that are safe by all standards are allowed through. And I feel like this matches the way we do governance currently much more than this kind of summing together value metrics or preferences. I also think this political structure is an interesting shout out for this group because in a world where you're saying, like, we've agreed as a society, don't do these sets of things, democracies are institutions that are very good at, like, deciding, deliberating, and outputting policies. And if we can make those policies formalized to be sufficiently scalable across AI outputs, which I haven't explained how yet but will, that feels like it is a natural extension of democracy. Whereas going to something where we align a system to an individual, we say, like, here is this person's or this group's or, like, an extrapolated set of values. We are going to put those values into a thing and then we are just going to trust it and propagate it. That feels much more authoritarian. Zargan, that seems long for me to pause to read. Do you wanna just talk through it?"
      },
      {
        "speaker": "Speaker 3",
        "start": 30.0,
        "end": 30.0,
        "transcript": "I mean, I can say it was now I'm interrupting. I was trying not to interrupt you. I was just saying that the rules based safety is something that you assert through, basically, low level specifications, and that values based safety is more like what we want from the emergent properties. And that going back to your, like, examples of, like, transportation systems and such, the the cyber physical systems in the world that are like that that become a high dimensional and entangled, Basically, we aim to achieve both. But to your point, you have to have the the rules based specifications for the components in order to reliably relate their behavior to the emergent behaviors which do or don't satisfy the values."
      },
      {
        "speaker": "Speaker 2",
        "start": 45.0,
        "end": 45.0,
        "transcript": "Yeah. I definitely agree. And I will also add that I may be framing this more as a either or, and I much more believe in a, like, let's just do all let let's do all the things to the extent we can, trying to reduce risk through values, doing that better is good, and we should also, try to have rule or constraint based, structures as well."
      },
      {
        "speaker": "Speaker 4",
        "start": 60.0,
        "end": 60.0,
        "transcript": "And just interrupting, are you saying that the emergent values should originate in the rule based structure and be an extension of it?"
      },
      {
        "speaker": "Speaker 2",
        "start": 75.0,
        "end": 75.0,
        "transcript": "I think that the I think they should be separate where we should put rules in place and require that there's evidence that actions do not break rules."
      },
      {
        "speaker": "Speaker 4",
        "start": 90.0,
        "end": 90.0,
        "transcript": "No. I was actually more asking what Zargan was saying. What I what I'm what I'm asking is in the case where you're not doing that and you are using a generative process to create behavior, you should still look at your rules based stuff as context, so to still get it vaguely right even when you don't have a rule."
      },
      {
        "speaker": "Speaker 3",
        "start": 105.0,
        "end": 105.0,
        "transcript": "Yeah. I mean so there there's a there's a complementarity between what rules produce in terms of constraints and the degrees of freedom that they leave open. And so, inevitably, there's a generative kind of nondeterministic component in in these systems. And so I I don't wanna derail. So I'll just note that I have some some ongoing work that I know Evan has read and Alon, who's on the call, contributed to on the relationship between rules and institutions where protocols in particular, which are in the rules bound category and institutions which are closer to stable emergent patterns of behavior. And to Kevin's point, like, we need both. Like, we we have to figure out how to make these frameworks fit together in order to get safe infrastructures. Anyway, I'll stop."
      },
      {
        "speaker": "Speaker 2",
        "start": 120.0,
        "end": 120.0,
        "transcript": "Pointing a little bit more at why value based alignment might be insufficient, I would say that if well, I guess, as a, like, toy example, I think there are a lot of people who work on AI alignment where if you said, I have a protocol by which I can put a person into a box with the AI system, and the AI system will come out or, like, the person will emerge after, like, spending eighty hours with it or, like, eight hundred hours with the thing, and the AI system will will understand and pursue goals in exactly the way that person would. People would say, like, cool. We're done. We can pack it up and go home. And I would expect people in this group to say, like, that is maybe half the problem. Expanding that to, like, how do we get all of humanity to be represented in a meaningful way? How do we define what that means? Like, if you if each of these machines costs like, if if that booth could be built and it could be built for, like, $5,000,000. This is not a solution to what we would consider the problem partly because what you want today isn't what you want tomorrow. Unclear which one you align to. And if it's not actually even your values, it's, like, OpenAI's approximation of your personal values, then anything that's on too long a time horizon is going to feel paternal paternalizing or, sorry, patronizing. That's the word. Anything on too long time horizon is gonna feel patronizing. Anything on too short a time horizon is gonna be the, like, engagement engine click baity type stuff. Unclear how you split that difference with alignment alone. It's also unclear how you merge values across a group. 
I guess a fun example that I use here is if you assume that every piece of software can be sufficiently generated by an AI system aligned with societal values and it can make all the right design decisions because it's aligned with societal values, that means that every time an EM and a PM have disagreed, one of them was misaligned with societal values. I think that there are, like, real design choices and trade offs and world models that can differ validly, and that alignment kind of assumes the nonexistence of those or it assumes that one of those is right, which seems strong. Also, there is this note that values are steeped in ideology. And if you align an AI system with your values, then anyone who doesn't share your ideology is going to consider that AI system an existential threat. In the next few slides, I will present a system where, in theory, you should feel equally comfortable with a system that is built by a company that you do not trust at all as you would one from a company you trust. So I like to anchor this in this notion that reviewing outputs is hard as things get increasingly complex. Typically, you generate an output, and then you have to review that output. And then you regenerate the thing because you didn't get what you want. And this is, like, roughly how things work with, like, an internal employee where you try and align them with your values over time. You get them believing in the mission. You get them to do things the way your company does them. This is very different from how you work with an external contractor where you say these are the requirements. And from those requirements, a solution is developed, and you do validation and verification. And all of the engineers are thinking, like, 'I know more about V&V than Evan does,' and you're right. 
But the interesting thing about this is that I think this scales very well to AI systems if you can write down the requirements in a way that is objectively checkable. Because in that case, this loop becomes: you get a tool that helps you express specifications. You get a tool that helps you do validation and understand if you specified the thing you wanted. And, actually, I should probably pull this up because I didn't include it in these slides because I forgot. But I like this notion, so this is just straight from that blog post. There is what you want, what you asked for, your model of the world, and how reality really works. And, there we go, there are gaps between these."
      },
      {
        "speaker": "Speaker 1",
        "start": 135.0,
        "end": 135.0,
        "transcript": "Formal"
      },
      {
        "speaker": "Speaker 2",
        "start": 150.0,
        "end": 150.0,
        "transcript": "verification closes this gap between the spec and the model, but you need validation, that first loop I had, to yeah. Zee will definitely shop chat more. You can close this this gap with something like formal verification. This gap, you can only improve by making your moral model more explicit or being flexible on what model you use. And this, you require validation to be able to reduce The but my hope would be that you could once once you generate a sufficiently good set of requirements, you could have AI based development and verification that the solution does indeed match the requirements. I claim that this should work, for specific domains across a lot of different domains. And there are also people who have started talking about what this should look like at a very broad scale. What would it look like for human level AI systems to kind of unify their or to give to have a specification language that covered most, if not all, of the concerns that you would have and what would that AI architecture look like. Specifically, you should have a separable auditable world model. You should be able to create a specification in that, like, digital twin of the relevant system that states what safety means and have a verifier that creates a proof certificate that the safety region the specified safety region is not violated by is not exited by any proposed action. So this is a position paper from almost a year ago. I will note that, like, many people who come across this first look at this and say, like, this is a very modernist approach. Are you insane? Have you learned nothing? And I would say it's actually much more of a metamodern approach where it's acknowledging that all models are wrong, but we need tools to be able to find the gaps between models. It's better to have, like, I think also phrased well as plans are useless, but planning is everything. Models are useless, but modeling is everything. And oh, it did make it into the slide. 
So this was the gap. Save that for later. So this paper has somewhat of a community around it. I organized a summit to try and bring people together to say, what is the next thing that people are doing? And it seems like there are next steps, some of which include a project that I am undertaking that I'll share a bit more on. One of the projects represented here is an attempt to build the theory that will implement this. This is by a researcher who I used to manage. Managing him meant mostly taking diagrams like this and turning them into diagrams like this. But the general idea is to let stakeholders use AI powered tools for expressing their theories and requirements, putting them into something that looks like a GitHub for multiphysics simulation, plugging them into a proof-generating solution generator, and getting a proof out that you can then check. And Atlas started off with this plan of let's try to make progress building specification languages for all of these things. The guaranteed safe AI framework is kind of like, we'll have one model and one specification language. And I think, really, we have a lot of specifications out in the world in a bunch of different forms. Could we use AI based tools to convert those into objective, validatable specifications that formalize what those policies mean? And so for instance, in law, there's a project called Catala that was trying to formalize tax code at scale. Cyber has a lot of things checked here because we have specification languages for expressing the specific properties you want with mathematical certainty and objectivity. And the notion of formal verification is, like, let's take these programs. That's cool. Normalizing data use agreements, but would be excited to hear more about that. So okay. I think that there's enough conversation topics that I wanna kind of blast through some examples. Each has, like, a very narrow point that I wanna make. 
One is that example of preventing latent bias in classifiers. I think we've all probably heard about horrible errors that happened in terms of classification. This one was, like, if you have 'women's' on your resume, because the classifier was trained on existing resumes, the classifier downgraded you. You could take a biased classifier, and I would say move the review of 'is this classifier biased,' which is an objective output from a subjective decision making process. You can move that subjective decision making process upstream of the action rather than downstream of the action. And you can do that, I think, with reasonably good rigor. So the key component missing here is something like an open source variation generator where, given a resume, you create variants of that resume that toggle membership in relevant protected categories. And you feed each of those into the biased resume classifier and you average the result. This has the effect of essentially projecting down, in, like, resume scoring space, what the variation generator has stated are equivalents. This has to be open source because you need this to be a target of governance. People need to deliberate. Like, are these resumes actually equally good? But this starts making this something that gives you guarantees around transparency for an intrinsically nontransparent process, because no one wants to share what their classifier is evaluating based on because that's gonna be some sort of competitive edge. But you don't need to get that, and you also don't need some crazy homomorphic encryption type evaluation here. You can have something that I think is better than what we have now, which arguably is a pretty low bar, but an important standard to try and beat. So"
      },
      {
        "speaker": "Speaker 3",
        "start": 165.0,
        "end": 165.0,
        "transcript": "Can I make a quick comment? You you've described"
      },
      {
        "speaker": "Speaker 2",
        "start": 180.0,
        "end": 180.0,
        "transcript": "what I"
      },
      {
        "speaker": "Speaker 3",
        "start": 195.0,
        "end": 195.0,
        "transcript": "would consider, like, a a pretty canonical robustness requirement. So, like, this is something that you could do in just about any problem where you have a a sensitivity that is undesirable, and you can define. In this case, it's bias is what that that that sensitivity is. Your variation generator plus, like, smoothing your results out is Yep. Like I think that to your meta language for standards, like, this pattern is maps nicely onto robust control and robust analysis. And anytime you have a decision making, apparatus or, like, a a control system that needs to be robust to this kind of, like, insensitive to something, you can do a version of what you described here. So it's it's a Yeah. It's just this example. That's actually something you should be able to apply over and over and over again."
      },
      {
        "speaker": "Speaker 2",
        "start": 210.0,
        "end": 210.0,
        "transcript": "That is that is precisely the point that I want people to take away from this. Steve, go ahead."
      },
      {
        "speaker": "Speaker 4",
        "start": 225.0,
        "end": 225.0,
        "transcript": "So I'm a little bit confused about the, the variation generation. So are you taking existing resumes and just altering them slightly by changing the race of the applicant or something like that?"
      },
      {
        "speaker": "Speaker 2",
        "start": 240.0,
        "end": 240.0,
        "transcript": "Yeah. The claim would be that for each protected category, can you have a system that, given a resume, makes a resume that is of equal quality, very subjective, of equal quality, to be determined by governance processes that toggles membership. So for, like, can you make one that looks equally good where the candidate is clearly much older or much younger? Could you make one where could you make a resume that has all the relevant indicators for membership in a religion or a different religion?"
      },
      {
        "speaker": "Speaker 4",
        "start": 255.0,
        "end": 255.0,
        "transcript": "Yeah. What I'm asking, I suppose, is are you trying to decorrelate it entirely from the original resume? In other words, aren't you just blanking the context and running it through the the classifier so you wouldn't have to totally, you know, decorrelate it and take all identifiers out? You could literally just, you know, take out white and put in black or whatever it is and then send it out. I don't understand the degree that it's being varied before it's, comparatively classified because you're you're com you're, comparing one to one. In other words, there's there's the original resume and then the varied resume, and then you're trying to see, on average, how much that causes a variance. Right?"
      },
      {
        "speaker": "Speaker 2",
        "start": 270.0,
        "end": 270.0,
        "transcript": "Yeah."
      },
      {
        "speaker": "Speaker 3",
        "start": 285.0,
        "end": 285.0,
        "transcript": "Quick point though. There's two things going on at the same time, maybe this will help, is if you have a governance over, like, say, a tolerance range, so, like, how much can the output vary as a function of a thing a single bit flip, basically. You say, okay. I want to say that the amount the outcome can change as a result of any record being flipped on a particular bit, say, race or gender. There's a kind of empirical, like, okay. We're gonna retrain until we pass those tests, but they're also back to Evan's point about, like, making things provable. There are certain techniques for training a classifier which can, like, make this not an issue. So underlying under the hood of classifiers is a solution to a convex optimization problem. You can constrain it. You can literally assert that feasible solutions are only those that are insensitive to a particular bit flip. It will get worse in terms of its performance, statistically speaking, and its loss function because, basically, anytime you can train an optimization problem, you make the objective function worse. But in the process"
      },
      {
        "speaker": "Speaker 4",
        "start": 300.0,
        "end": 300.0,
        "transcript": "You get hold on to somebody and then say, run over there. Okay. Right. Yeah."
      },
      {
        "speaker": "Speaker 3",
        "start": 315.0,
        "end": 315.0,
        "transcript": "No. But you get a strong guarantee. You get a technical affordance that they won't. So then when you run these tests, it passes all of them because you coded the constraint that you can't have a different outcome on on flipping a bit in the constraint space of the underlying optimization that solve the classifier. I actually had a long conversation with folks in the scientific computing community about, like, why the heck this isn't supported directly in, like, scikit learn when you're, like, building stuff. And the short answer was, like, nobody who's paying for development understands or cares. And that's a real problem in my opinion. But we could intervene in problems like this pretty directly if we have specifications and specification languages that are saying, hey. We expect insensitivity on these dimensions. You create an incentive to build into the tools the technical affordances to get those guarantees. And"
      },
      {
        "speaker": "Speaker 2",
        "start": 330.0,
        "end": 330.0,
        "transcript": "if, like, race is a radio button on the application, that is very feasible. And the reason that this open source variation generator is not absolutely trivial is because that could be the name. Well, also hard to find."
      },
      {
        "speaker": "Speaker 3",
        "start": 345.0,
        "end": 345.0,
        "transcript": "Regulatory issues where now because the approaches have been to ignore this stuff, we end up, like, blanking this stuff out, which unfortunately means that if you remove it, you can't use it to enforce the nonbias. And so you get this weird interaction effect with the way that risk and regulation work, where if you bring those variables into focus at all, you're, like, at risk of being considered biased when you actually need them in your dataset in order to assert these, like, technical assurances at the constraint enforcement level. And I realize we've gotten kind of technical, but the point is, like, if you actually shift the way people talk about and build this stuff, you can actually improve significantly on what I would consider the values margins, but you're doing it through a rules based approach, which in this case means literally coding directly into your classifiers the fact that you must be insensitive to a particular bit flip."
      },
      {
        "speaker": "Speaker 2",
        "start": 360.0,
        "end": 360.0,
        "transcript": "And also the fact that having a specification language or tools that are open and public and targets a debate, and it's not just three whitish guys on a Zoom call discussing this also helps. But that yes. The so another example I wanted to point to was that I I I strongly believe that having specification language that making it easy to use specifications for software is going to be easier than seeing Vibe coding through to its logical conclusion. And the kinds of things that I've been pitching are we should have tools in these gray arrows that move us from higher in the top left of this diagram where we have, like, I want a piece of software or I want a thing that does blank. And then tools help you make informed design decisions and work you down to formal descriptions of various pieces that generate for you verified implementations. This is kind of like a long term vision b where, like, you're you're making decisions, but the tools help you understand and make those decisions rather than requiring you to take the actions or yeah. And and this is like engineering management rather than engineering grunt work. It's like how do you how do you split how do you make AI systems to be better engineering contractors rather than be be the engineers? And so the the step that we're taking is we've built a tool that gives you natural language documentation mapped against a mechanized formal spec. And from that, the plan is to have annotations where if something is highlighted in red, it's because it's not represented from one side on the other side. And you can get AI since we're needed. This is, like, very early stages, but the goal would be to have something that made a jump for, like, a soft or for a a small software project to be able to go from, basically your I think what we're targeting is basically this arrow here. Can you go from component requirements and reference documentation to component architecture? 
And our hope is that by roughly October, we will be able to load a natural language spec, have a mechanized spec, introduce an error to the mechanized spec, give it to an engineer who is familiar with the field and the work that is being specified, so here it would be cryptography, and have the engineer, with no formal methods experience, find the error or errors that we introduced in the mechanized spec. So that's kind of the, like, where I think that this should go. I gave a previous call out to this injection of formalizing the tax code. I love pointing out that the tax code actually is mechanized out there. People don't realize it, and it wasn't really done with the collaboration of the lawmakers. But, like, there's a server that the IRS runs, and the numbers go in and numbers come out. And, like, I think that there are probably a lot of bugs and weird edge cases, and it would be really nice if we had tools so that lawmakers could actually just write tax code as, like, executable tax code. And I think that, like, the why now on all of this is because there was a ton of work a few decades ago on building ontologies and languages and ways of specifying these sorts of things, but they were really hard to learn. And you could read them, but they were pretty hard to write in. And language models, I think, are making all of these things much easier and much easier to adopt. And so it's a matter of resurrecting the right things, strapping on the right prompts and context, and just seeing how well things go. Or if they're not doing sufficiently well, building up the datasets, sending them to the frontier labs, and getting the models to be better at those things. I think there's also, like, fun examples I could talk to of, like, what would it mean to have a verifiably compliant dark pool? 
I think that these are, like, things that you could conceivably build, but I think it'd be worth having a moment for, or, like, eight minutes, I guess, for questions and conversations. Sorry. That went longer than I expected."
      },
      {
        "speaker": "Speaker 1",
        "start": 375.0,
        "end": 375.0,
        "transcript": "Thank you so much, Evan."
      },
      {
        "speaker": "Speaker 5",
        "start": 390.0,
        "end": 390.0,
        "transcript": "Yeah. Super interesting talk, Evan. Thank you. So I guess my question maybe this is a bit facile, but hopefully it'll entertain me. Like, you know, the the idea of Asimov's laws of robotics was kind of speaking to this impossibility of rule based constraint and, you know, kind of like the Hofstadter style rules based AI, you know, all those projects that we kinda left behind in the seventies. Sounds like you're coming back around. I mean, how do you how do you engage with that? Really more cultural history rather than scientific history around the sort of the foibles and failures of rule based control for intelligent systems."
      },
      {
        "speaker": "Speaker 2",
        "start": 405.0,
        "end": 405.0,
        "transcript": "I think that people that I talk to now assume that it is not like, we kind of stopped claiming that rules are sufficient. And I think that, like, I dent I I would claim that they, like, something that looks more like defense in-depth have rules, have all the other techniques as well, and as many things as possible to prevent bad outcomes. I think that there is I guess one of the things that I would point to is this I feel like much of the, like, good old fashioned AI control was we'll have one ontology. We'll make it the best possible ontology. It will be sufficient it will be either right or sufficient. And moving to a framework in which can we move between models or have multiple check against multiple models in a way that reduces the likelihood that any one model missing a thing can lead to bad outcomes. I yeah. I'm curious if other people have thoughts. I Steve, I'm curate I'd love to hear your third way. I think also there's just an acknowledgment that, like, while those are poor ways of achieving capabilities, they might be good ways of constraining against bad outcomes."
      },
      {
        "speaker": "Speaker 4",
        "start": 420.0,
        "end": 420.0,
        "transcript": "Yeah. So as far as getting back to the whole idea of value alignment, yeah, I agree with all your critiques of it and so on and so forth. But I think there is a a better alternative in which we can essentially segregate ourselves into our true selves and our sort of aspirational selves, and we can put our professed preferences into a an AI agent that interacts with the world as an aspirational digital self. And these aspirational preferences are much easier to be aligned than, true human preferences. And I have a number of papers that I've written over the past six months elaborating all the aspects of this. I'm about to do a huge coming out unless wrong. I, based a lot of my this work on sort of a Steve, Alejandro's work, with foresight, so we have similar origin stories, but I've taken it in a different direction. So, yeah, I mean, I"
      },
      {
        "speaker": "Speaker 2",
        "start": 435.0,
        "end": 435.0,
        "transcript": "I'm very excited about it."
      },
      {
        "speaker": "Speaker 4",
        "start": 450.0,
        "end": 450.0,
        "transcript": "I did. I I did actually a community call on the digital twin concept and this idea of having these aspirational selves take over democracy, but with strict fiduciary duties to the individual. So it's a new way to aggregate things, not a constitutional AI at all, but an ongoing process which is refined over time and doesn't, I believe, have any of the problems that you elucidated under your critique of, you know, value based alignment."
      },
      {
        "speaker": "Speaker 2",
        "start": 465.0,
        "end": 465.0,
        "transcript": "Yeah."
      },
      {
        "speaker": "Speaker 4",
        "start": 480.0,
        "end": 480.0,
        "transcript": "So yeah. So it what we what we want today isn't what we want tomorrow that's thoroughly addressed by the, you know, changing aspirational nature of things, and you're always changing what you want your twin to represent out in the world. What I want is what we want. So there's all sorts of reproductive groups that, you know, bind together with very low various levels of subsidiarity so that it really isn't a problem. Daniel knows all about that type of stuff. And, values are steeped in ideology. That's true, but this is something that can be resolved over time. And once again, ultimately, the human's true values are not what are being tested here, which are always gonna be biased in every other damn thing. But rather can they refine their aspirational values over time and then aggregate those into a meaningful, coherent, functioning, Pareto Pareto positive interactions, essentially."
      },
      {
        "speaker": "Speaker 2",
        "start": 495.0,
        "end": 495.0,
        "transcript": "Anyway, I'm very excited to read more about that."
      },
      {
        "speaker": "Speaker 4",
        "start": 510.0,
        "end": 510.0,
        "transcript": "Okay. Well, excellent. We will communicate."
      },
      {
        "speaker": "Speaker 3",
        "start": 525.0,
        "end": 525.0,
        "transcript": "I I know we're gonna be at time in a second, so I'll just note that if people wanna do an extra, like, twenty minutes of jamming on these topics, I'm happy to hang out. I love Evan's work, and I've been following it closely and have lots more thoughts. And it seems like there might be a group that wants to hang out for a few minutes."
      },
      {
        "speaker": "Speaker 2",
        "start": 540.0,
        "end": 540.0,
        "transcript": "Yeah. I've got some time."
      },
      {
        "speaker": "Speaker 1",
        "start": 555.0,
        "end": 555.0,
        "transcript": "That's great. So what we'll do in that case is, ask Evan now for a concluding statement on his on what he shared today, and we'll wrap the recording at the top of the hour and just allow the discussion to continue."
      },
      {
        "speaker": "Speaker 2",
        "start": 570.0,
        "end": 570.0,
        "transcript": "I think concluding statement would be feel free to reach out with with things you'd like to with, like, questions, comments, possible applications, avenues for possible collaboration. Let me know if you come across people or problems that would benefit from specifications in this style. And I have unintentionally become kind of a shelling point for people interested in this general direction. So happy to happy to try and do matchmaking there. Oftentimes, people who are sufficiently proactive are the limiting factor rather than resources or problems or solutions. So if you find yourself being able to help with that, I would love to chat."
      },
      {
        "speaker": "Speaker 4",
        "start": 585.0,
        "end": 585.0,
        "transcript": "If I'm allowed a a quick one, is there any reason why didn't you include the premodern"
      },
      {
        "speaker": "Speaker 2",
        "start": 600.0,
        "end": 600.0,
        "transcript": "face? Lack of familiarity. I No. No. No. No. No. No. No. No. No. No. No. No. No. No. No"
      }
    ],
    "summary": null
  }
}