Tech Talk: Talking Tech with Umang Bhatt on Algorithmic Resignation

CDT Tech Talks | 2025-01-13 | 30:43

In today’s episode, we tackle a fascinating question: What happens when an AI system deployed by a company decides to "resign"—stopping its recommendations or restricting access to its outputs? Can such actions help mitigate reputational or legal risks for organizations? To help us explore this, we’re joined by Dr. Umang Bhatt, Assistant Professor and Faculty Fellow at the Center for Data Science at New York University, CDT Non-Resident Fellow, and co-author of the paper When Should Algorithms Resign?: A Proposal for AI Governance, which delves into this thought-provoking concept.

Top Keywords

resignation 0.017
uncertainty 0.009
system 0.009
systems 0.008
individuals 0.007
algorithmic resignation 0.007
might 0.006
good 0.006
context 0.006
human 0.005
hurricane 0.005
work 0.005

Transcript

Speaker 0 0:00 – 0:10

Welcome to CDT's Tech Talk, where we dish on tech and Internet policy while also explaining what these policies mean to our daily lives. I'm Jamal Magby, and it's time to talk tech.

Speaker 1 0:20 – 0:22

Welcome to Tech Talk by

Speaker 0 0:23 – 1:13

CDT. What happens when a company deploys an AI system and that system decides to step back and stop providing recommendations? What happens when an AI system restricts access to outputs, and can doing this help protect organizations from potential reputational or legal risk? Here to help us unpack these questions is Dr. Umang Bhatt, assistant professor and faculty fellow at the Center for Data Science at New York University, CDT non-resident fellow, and coauthor of the paper When Should Algorithms Resign? A Proposal for AI Governance, which explores this groundbreaking concept. Thanks so much for joining us today. It's a pleasure to be here. First thing, I wanna say congratulations on your paper, and thank you so much for being a non-resident fellow. Your paper talks about algorithmic resignation.

Speaker 1 1:13 – 3:30

And I'd like to hear from you, what does that exactly mean? So I guess it makes sense to give a little bit of a backstory as to how we even got here. The thing I love about being involved with the Center for Democracy and Technology, and CDT generally, is that most of my training is in machine learning and AI. And one of the themes that arises, especially in the area where I'd done my PhD, the group that I was in mostly focused on uncertainty quantification, Bayesian machine learning more generally. And their whole thesis is saying that we know when predictions and models fail because we have calibrated estimates of when systems are uncertain. And if you're armed with that information, you might decide to do clever things like potentially defer prediction to humans. Okay? So this is something that was there. It's existed in the statistics community, in the Bayesian machine learning community, for a while. But I never found it in the public discourse around responsible AI, where we think about empowering individuals, providing them with agency, making sure that we adhere to fairness principles and transparency principles more generally. But we weren't acknowledging that these systems are inherently uncertain. And organizations, individuals, potentially, would change their decision making were they exposed to that uncertainty information. And we started thinking about how we can operationalize this idea of what's known as selective classification, only doing classification in certain parts of the input space, or notions of deferring to human judgment. What would it look like for us to communicate that to organizations more generally? And that's where we birthed the idea of algorithmic resignation. Algorithmic resignation really lets individual organizations consider when they decide not to use AI systems.
It might be some function of the uncertainty in the system itself. It might be some function of the individuals who are being supported, or just a function of the policies that are in the space you're in. Right? You don't need to get fancy about this. There are explicit rules in certain areas where you can't use these systems in these contexts. So that's a little bit about resignation and how we got into this line of work.
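
The selective-classification idea described in this answer can be sketched in a few lines. This is only an illustration, not the method from the paper: the function name `selective_predict` and the 0.8 threshold are hypothetical, and it assumes the model's class probabilities are reasonably well calibrated.

```python
from typing import Optional, Sequence


def selective_predict(probs: Sequence[float], threshold: float = 0.8) -> Optional[int]:
    """Return the most likely class index, or None to defer to a human.

    `probs` are the model's class probabilities for one input (assumed
    calibrated). `threshold` is a hypothetical confidence cutoff.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return None  # the system "resigns" on this input
    return best


# Confident input: the system answers. Uncertain input: it defers.
print(selective_predict([0.05, 0.92, 0.03]))
print(selective_predict([0.40, 0.35, 0.25]))
```

Raising the threshold makes the system answer on fewer inputs and hand more of them to a person, which is the trade-off an organization would have to set deliberately.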

Speaker 0 3:30 – 3:37

Can you help me understand what's the difference between resignation and just turning the algorithm off?

Speaker 1 3:38 – 5:39

I think the way I like to think about it is that resignation, in some sense, and again this deferral literature, is a cleverer variant of turning off the system entirely. Because, really, it's answering the question of when I do this. Right? I think this has been touted in the AI safety community as a kill switch, an off button, something that turns off the AI system when it goes rogue. But on a more basic level, on a day-to-day level, when do we adaptively decide to use specific AI systems and not to use AI systems? And can we get deliberate and precise about it? And that's why I'm pretty excited about this, because it lets us get pretty granular. It lets us say, actually, we know you're pretty good at this. We want you to be able to struggle through this writing assignment by yourself, unaided. Right? We want you to be able to make pasta sauce without a Tesla robot in your kitchen. You enjoy doing this. This brings you joy. And so it provides us with protections on a micro level, which gets us pretty beautiful macro gains. Now there are certain contexts where you would definitely consider completely barring access, barring usage entirely. For certain individuals, there might be no condition under which it's reasonable for us to actually let you use an AI system. So I like to think about it like this: you might need to take certain exams or have certain qualifications before you're allowed to give some sort of financial advice or trade particular stocks. We have regimented security clearance levels or certifications that are required before you get to that point. And if we turn resignation on its head, it's basically the same thing. It's saying that there might be certain areas on a micro level where we're not letting you play ball with us yet. Yeah.
But there are certain areas where you are permitted, and there's a potential form of recourse there. So that's a little bit about how it differs from turning off the system,

Speaker 0 5:40 – 5:59

entirely. Your paper emphasizes how algorithmic resignation can help organizations navigate reputational and legal challenges. I'd like to hear a little bit more about how this concept aligns with or anticipates frameworks like, for example, the EU's AI Act.

Speaker 1 6:00 – 12:32

Yeah. I think this is something where, again, we're now going beyond the hat of a machine learning engineer who's just interested in not giving uncertain predictions to individuals. Because, effectively, uncertainty quantification tells us, okay, I'm 10% confident in this particular answer that I'm giving you, which basically means 90% of the time, we're expecting to be wrong. And sometimes it might be correct and you might move on, but 90% of the time, it's not gonna work. That's what this means. So what do we do with that information? Do we communicate that to you? The literature varies in this context, because communicating that information shows the integrity of an organization saying, look, we know we're not perfect, and we're telling you upfront we're not perfect. Right? And maybe you're in an unregulated regime, so you're not with the FAA, where there are really strict guidelines for airplanes and, in particular, safety requirements that are put in place. So you're in the wild, wild west of the consumer world. Yeah. Where communicating this information to you might not do anything. They could tell you, yeah, we're right about 10% of the time. And you're like, oh, cool. Doesn't matter. I didn't even know. And you run with it. Right? So, yeah, there's an interesting question around how our instantiation of algorithmic resignation connects to a larger discourse around human oversight, and what role users and decision makers have in the context of deploying and using AI systems. Right? The thing I like to think about, and I say this all the time to my students and when I give talks, is that I know we'll build pretty powerful AI systems. Mhmm. And we're all in awe at the progress and capability that we've seen in the last few years. Oh, yeah.
But it's not clear that we know when to use them. We're not using them cleverly. Right? We haven't built the roads to put our cars on. We haven't built the tar that'll then let these cars run fast. And the question is, how do we put, again, protections in place? What does the crosswalk look like that lets humans still cross the road when we have cars running around? Right? When do you introduce traffic lights of some sort? And so resignation starts to instantiate this conversation, and the EU AI Act also has a similar mechanism in some sense. Now, again, the EU AI Act applies more specifically to specific use cases, but the ethos that's presented in much of this piece of regulation applies more generally. So Article 14 of the AI Act talks about human oversight. What role do humans have? And how do they integrate this information into their thinking? So I think our discussion around algorithmic resignation starts to instantiate this. It's effectively saying, okay, well, what does oversight look like? Because oversight ought to prevent the misuse of AI systems. And we're basically saying, well, when do we throw up our hands and say, don't use this AI system, because I know it doesn't work, or you're already very good at this, or it's cost inefficient for an organization to waste queries on an individual who's an expert on this particular question. Right? If you have a PhD in mathematics and you're asking what two plus two is, is that something you're gonna need? And we've had some work on this, showing that there are patterns that differ between experts using AI systems, large language models, and nonexperts using large language models.
Particularly, we had a paper that appeared in the Proceedings of the National Academy of Sciences, PNAS, that effectively said that if you interactively evaluate, a.k.a. you just watch experts use these AI systems in practice, you'll see their usage patterns differ, which then suggests to us that oversight patterns need to differ, which suggests to us that when they decide not to use it, a.k.a. when they resign the use of AI systems, will also differ. And this becomes pretty important. And then just to get concrete here, in the EU AI Act, Article 14, there is a very explicit statement, again connecting back to the off button, that says we need human oversight. I'll read it pretty precisely, if that's alright. The high-risk AI system must be provided to a deployer such that natural persons to whom human oversight is assigned, so people who are just using these systems, are enabled, as appropriate and proportionate, to understand the capabilities and limitations. So you're told when it works, when it doesn't work. Right? You're able to remain aware of the tendency to automatically rely. So this connects to some other work we've had, which I'm happy to discuss: if I don't just turn off the system entirely, I might communicate to you that you're over-relying, or I communicate to you that the system is bad. Right? So instead of just a Big Brother, 1984-esque off button where an organization's pulling the plug on your access. Right? I just think about those cartoons where it's, like, denied access, denied access. We don't want that world. Right? We want a world where you might be told that you are over-relying or that you're automatically relying, or where we're creating alertness checks for you along the way. That's what that particular article suggests.
Along with correctly interpreting a high-risk AI system's output, deciding when not to use the AI system, it very explicitly says this, and then when to intervene in the operation and interrupt the system through a stop button. So there is that stop button all the way at the end, but it's like that last safeguard of human oversight. It safeguards human judgment and protects it in a very specific context, very nicely. So, yeah, I think that this is how a lot of the ideas that we were thinking about when we were working on resignation naturally connect with some of the important work that's happening in the regulatory community, which is why I've been very excited about joining the CDT community. I don't have legal training. I'm not trained as a policymaker. I like working with them, understanding how some of the tools that we can build, or some of the capabilities we can endow these systems with, can explicitly address some of these. As you can tell, it's pretty analogous, the way I talk about it technically and the way it's literally written in legislation. They're very different languages, but it feels, yeah, like a translation of each other with some lossiness. So. And I think this
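
One way to picture the graduated oversight described in this answer (show the output, warn about over-reliance, or withhold and defer) is as a small decision rule. This is a hypothetical sketch, not anything specified in the AI Act or the paper: the names `Action` and `oversight_action`, the use of an acceptance rate as a proxy for automatic reliance, and both thresholds are assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    SHOW = "show the output"
    WARN = "show the output with an over-reliance warning"
    WITHHOLD = "withhold the output and defer to the human"


def oversight_action(confidence: float, acceptance_rate: float,
                     min_confidence: float = 0.5,
                     max_acceptance: float = 0.95) -> Action:
    """Pick an oversight response for one query.

    confidence: the system's (assumed calibrated) confidence in its answer.
    acceptance_rate: fraction of recent suggestions this user accepted
        unchanged, used here as a rough, hypothetical proxy for automatic reliance.
    """
    if confidence < min_confidence:
        return Action.WITHHOLD  # the system is likely wrong: resign
    if acceptance_rate > max_acceptance:
        return Action.WARN      # the user may be relying automatically
    return Action.SHOW


print(oversight_action(confidence=0.10, acceptance_rate=0.60))
print(oversight_action(confidence=0.90, acceptance_rate=0.99))
print(oversight_action(confidence=0.90, acceptance_rate=0.60))
```

The stop button sits outside this rule: it is the human's last-resort override, whatever the system decides to show.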

Speaker 0 12:32 – 12:46

this kinda leads me to my next question because a key theme in the paper is the balance between human and algorithmic decision making. Yep. And I'd like us to keep exploring how you envision organizations determining

Speaker 1 12:46 – 25:33

when an AI should resign and when a human should step up? Yeah. This is a tough question. This is a tough question, and it requires taking a stance on what it would look like. I think this is hard because we have another work that was very recently accepted that starts to actually claim that this is a really personal question. And it's a question where access ought to be personalized. So even two people with identical training, or identical, pretty similar behavior, will still require or want an AI system at different times. And in different contexts, it'll require organizations to be precise about two different individuals potentially being provided AI systems for the same task at different times. And I think that this is counterintuitive to start. Right? It really requires being precise on what you care about optimizing for. And if you care about performance, then maybe you decide to forgo and not care about human judgment entirely, in which case maybe this discussion is actually not for you. I guess you're not interested in maximizing human agency, and maybe there are certain contexts where that is true. Yeah. So you're in this position where you're trading off how the people in your organization who you're trusting to make decisions feel, how confident they are, what their self-perception is of their competence on a particular task, with the competence on the task itself, the performance, accuracy as we'd say. That is at the heart of this. Right? So I want to make sure that you feel good about being an employee at my organization.
So, for example, if you're doing content moderation, this is a classic example: if you're just really good at moderating one specific type of content, do I just give you that all the time? Is that okay? Is that permissible? Is that humane? Right? Say you're incredible at moderating the content that comes out of India, but you're atrocious at everything else, for some reason, let's just say. How would that look? Does that degrade your self-perception of competence? Maybe your performance is high, but I need to potentially resign the use of an AI system, or potentially encourage some sort of diversity in the task, which maintains your high level of alertness. So it really depends on the things you care to optimize for. And I always encourage individuals and companies that I talk to about this to be more precise about it. And I think there are lots of great, smart people who've written about how what companies optimize for, mhmm, is reflected in their performance in the markets. So if you're optimizing for profit share, you might make harsh private equity cuts, and we've seen this in the airline industry very explicitly. Boeing is a great example of this. What would happen if you optimized manufacturing patterns to save costs? But if you cared about worker management and you cared about your employees, you may obliquely, inadvertently end up at the profit-maximizing outcome. And it's counterintuitive, and I'm not one to opine on this.
So this is, obviously, just from what I've read. This is not my research area. But as you see, it connects with, now we're getting into more economic conversations, where John Kay, an economist, has written a book called Obliquity, which literally just talks about inadvertent objectives. But it very naturally aligns with this question of resignation because it's effectively saying, what do you care about? And if you care about performance, then make sure you don't provide the system when you know the system is wrong and someone might over-rely, or you know that this person will check out and start over-relying. Or, because they're so good at this task, I can actually save costs and just let them do this task by themselves and feel good about themselves. So you end up with these varying context-specific, task-specific examples of when you decide to actually provide AI assistance. Some of the work that we've done in this space effectively says, if you're watching people answer queries, quiz questions, there's a classic large language model dataset called MMLU. Massively, oh my god, I couldn't even tell you what MMLU stood for. It's Massive Multitask Language Understanding. I apologize. If you look at the MMLU questions, they're multiple choice questions that span 57 different subjects. So you take a few subjects and you watch the way individuals, human decision makers, answer questions. Now a lot of the benchmarking in the large language model community will just be, how good are LLMs at doing this task, which I think is great and important if you care about capabilities. But in practice, again, if we care about use, what I care about more is what happens when I show individuals, you know, LLM outputs and then have that aid their decision making.
And you'll see that some people might be pretty good at, let's say, elementary-level mathematics. Right? And other people might be good at law questions, or computer science questions, or biology questions. So everybody has differing expertise. These are elementary-level questions. So each of us will have our expertise, quote, unquote, in different categories. So now the question is, if I decide to introduce an AI system into this context, and now I'm providing you with a large language model output, mhmm, how do you integrate that into your decision making? Does it make someone who's already very good at computer science worse off? Does it make someone who's good at history great at history? What ends up happening? And you'll see that almost everybody has one or two categories where they actually are better off not having access to the AI system. Alright. Almost every category. And it might be nonnegligibly so, and it might not be in terms of accuracy. It might be in the way you perceive yourself and how confident you are in your decision making. Mhmm. Because the system might erode our confidence. It actually tended to potentially erode our confidence, and it differs across individuals. So what we find is that we can actually learn, in a personalized fashion, when you're answering these questions over these various school subjects, that different people will have different policies for when they will need an LLM to help them do well at the task. And to me, this is surprising insofar as it really suggests to us that at the heart of this resignation discussion is, how do I communicate to you that you are already really good at this, and you do not need decision support or AI assistance? Or that I'm only providing you with AI assistance because you're not very good.
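
A rough sketch of what a personalized access policy could look like, given one person's quiz history with and without LLM assistance. This is a deliberately naive comparison of average accuracy per subject, not the learning procedure from the work mentioned here; the function name `learn_access_policy` and the sample records are hypothetical.

```python
from statistics import mean


def learn_access_policy(records):
    """Map each subject to True (offer the LLM) or False (resign).

    records: iterable of (subject, used_llm, correct) tuples for ONE person.
    """
    outcomes = {}
    for subject, used_llm, correct in records:
        per_subject = outcomes.setdefault(subject, {True: [], False: []})
        per_subject[used_llm].append(1.0 if correct else 0.0)

    policy = {}
    for subject, by_condition in outcomes.items():
        aided, unaided = by_condition[True], by_condition[False]
        if not aided or not unaided:
            continue  # no comparison possible yet; leave the subject undecided
        policy[subject] = mean(aided) > mean(unaided)
    return policy


# Hypothetical quiz log for one person: strong at math, weaker at law.
records = [
    ("math", False, True), ("math", False, True),
    ("math", True, False), ("math", True, True),
    ("law", False, False), ("law", False, True),
    ("law", True, True), ("law", True, True),
]
policy = learn_access_policy(records)
print(policy)
```

A second person with a different history would get a different policy for the same subjects, which is the personalization point being made. A fuller version would also need to account for self-reported confidence, not just accuracy.
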
And this led us to another line of work where we start to communicate to you. Like, hey, bud, you're actually great at this. Keep going. Positive reinforcement. And you can imagine analogs, like negative reinforcement. Actually, this system's, like, mhmm, not great. You're on your own, bud. And you can imagine different styles of what we call frictions, taking a page from the behavioral econ literature, from Cass Sunstein and Richard Thaler's work on nudging. This has been huge in the context of advertising for a long time, and still is. Introducing that in the large language model community, and in the AI community more generally, is a way of potentially finding a middle ground where you're maybe not resigning entirely, but you're putting up frictions that are saying, red flag, red flag. Are you sure you wanna proceed? Are you sure you wanna proceed? This system's not very good. This system's not very good. You can still access the LLM and still see the assistance. So these are design decisions that we expect organizations to make alongside internal research teams and folks who are actually thinking about this. The good thing is most machine learning engineers and researchers in this space will be familiar with how uncertain some of these large language models tend to be, or more generally, how uncertain any AI system tends to be. So the natural question is, what do you do in the context when these systems aren't very good? And if they are uncertain, how do you proceed? The one thing I'll just mention before I get some more questions from you, if you have any more: the way in which we communicate uncertainty to individuals is really important. And I think there's not enough good literature in the modern AI era on this particular topic, especially in this era of large language models.
Like, you don't query ChatGPT and get a response that's like, I don't really know what you're saying, but here's what I, right? Like, yeah. I mean, that's how we conversationally talk amongst each other. We communicate uncertainty, express doubt, and even visually, we tend to do this. So I grew up in New Jersey, so we seldom had hurricanes. But I remember Hurricane Sandy as a kid. Before we lost power for two weeks, you would be able to turn on the TV and see where this hurricane was gonna go. And for those who've seen a hurricane plot, in some sense, it's like, here's where the eye of the hurricane is, and it's a conic region that extends out towards where they expect landfall to be. Now, mhmm, the hurricane doesn't get bigger. That is the first thing every kid, I definitely went through that. Right? I was like, that's pretty big. There's something geospecifically wrong. So first things first, it doesn't get bigger. But we've expressed uncertainty over the models that we have of where this hurricane could go, and we've communicated it to a populace without saying it explicitly. It's somewhere on the Eastern Seaboard. Sandy was terrible. And some other ones we have pretty accurate models of, and we're like, okay, we know it'll be in this region, and then there's a string of other decisions that are made as a result of that information. And this is such a clever, clear, quiet way of communicating uncertainty. There's uncertainty in these models. We don't know where this hurricane is gonna go. We think we know where it's gonna go. It should go somewhere in this 95% confidence interval near the Carolinas or near New Jersey, on the coast of Jersey. And we've communicated that to individuals, and then we let them make their own decisions, saying, okay.
Like, you can choose to prepare as you see fit. Right? Or there's a government intervention in that particular context. But coming back to the conversation around uncertainty, in some sense, if we had vast amounts of uncertainty, you don't see hurricanes that are the size of the Eastern Seaboard. We don't decide to show estimates that are really bad. Right? So the questions that arise around uncertainty communication are that there's a time and place for us to provide models as assistance to individuals, there's a time and place for us to communicate uncertainty about those models to individuals, and then there's a time to say something like it's truly fact. Right? And then that's like, you need to leave Tampa, Florida. Yeah. You can get out like that. There's no uncertainty left. Leave now. And so you can imagine, in a different organizational context, in that decision-making process, this is when you potentially will decide to evacuate a particular town. There's some interesting work on providing frictions in the context of manufacturing settings, and health care settings as well. A lot of the work that my group does is focused on human-machine collaboration more generally and the algorithms that underpin it. And resignation has interesting legal context, but there are also fun, interesting mechanisms to know when Jamal needs access to ChatGPT versus Umang to complete the same task, and that requires, like, immune modeling procedures. And it will also require different uncertainty communication, or different communication of information. Like, maybe you have one individual who respects the government so much that just saying that this has been approved by a particular federal body is enough. Like, yeah, this is an FDA-approved drug. Incredible. That's amazing. People are just, like, right away.
And then there are other people for whom that exact same communication of fact will mean something completely different. Right? It's the same piece of information, but you'll perceive it differently. It'll change your decision making in a different manner, and this becomes a part of our narrative around helping organizations understand when you should resign the use of an AI system in favor of human judgment. And for some individuals, that might be never. For other individuals, it might be always.
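
The hurricane cone point, that the region widens because the forecast gets less certain and not because the storm grows, can be illustrated with a toy calculation. This is not how forecasters actually construct their cones; it assumes, purely for illustration, that position error accumulates like a random walk, so a 95% interval grows with the square root of the forecast horizon. The function name and the default of 10 km per square-root hour are made up.

```python
import math


def cone_half_width(hours_ahead: float,
                    sigma_per_sqrt_hour: float = 10.0,
                    z: float = 1.96) -> float:
    """Half-width (km) of a 95% interval on storm position, toy model.

    Assumes position error behaves like a random walk, so its standard
    deviation is sigma_per_sqrt_hour * sqrt(hours_ahead). Both the model
    and the default numbers are illustrative assumptions.
    """
    return z * sigma_per_sqrt_hour * math.sqrt(hours_ahead)


# The storm is the same size at every horizon; only the uncertainty widens.
for hours in (12, 24, 48, 72):
    print(f"{hours:>2} h ahead: +/- {cone_half_width(hours):.0f} km")
```

The same picture applies to AI assistance: past some width the interval stops being useful to show, which is the point at which withholding the estimate becomes the more honest choice.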

Speaker 0 25:34 – 25:43

With those frictions built in, have you found that any industries are better suited to adopt algorithmic resignation

Speaker 1 25:44 – 29:24

Yeah. Early on. Yeah. Or what are those industries? Yeah. Great question. Great question. I think that very explicitly, in highly regulated industries, I'd expect this to be used sooner. Mhmm. Specifically, in settings like banking and finance, you're not even seeing the large adoption of AI systems in the context of systems that affect their major product. Now, I'm not talking about your customer service bots. I'm not talking about other systems that you have. There's a tried and tested set of regulations that you need to go through with your model risk management team to get any AI system approved. And a lot of what's currently being built may not be up to par with respect to that. But I do expect that the explicit prescription of where AI systems can and cannot be used will be attractive to people in these highly regulated industries, like finance, like health care. I think there's interesting context around potentially national security and defense, around explicit prescriptions of, here's the protocols we follow, and here's when we decide to defer and say no, this is the judgment that we're going to use instead. And in education, I do think that in the classroom, we'll see certain classrooms permit the use of ChatGPT. We've seen outright bans from schools. Is that correct? That's an outright resignation. It is gone. That's obviously like banning Wikipedia. And now the way we viewed the Encyclopedia Britannica is how I imagine all of these school kids today view Wikipedia. Right? Because they can query AI systems in the way that we were querying Wikipedia when I was growing up.
And I still remember as a school kid, people were saying, you're not allowed to use Wikipedia for this. But I was like, no, but this is a pretty good summary. What, you just want me to go to the citations, and then cite those? So that's fine. It's a citation finder in some sense. So an outright ban is a natural reaction. It is an ultimate form of resignation, but it's probably very inappropriate. Right? So, again, to be clear, there are these highly regulated industries in which there are specific contexts, and as I go and start to talk to individuals around this, we're putting together a catalog of private sector use cases around AI. There's a lot of great work on public sector use of AI. There are inventories that OMB is starting to collect, and there's lots of other fun things that people are trying to do around understanding how agencies and the government use AI, and it's happening in most governments around the world. But, really, how are private sector organizations using AI systems? And can we understand what their governance structure is? And more importantly, for me, I'm curious about where they decide not to use those AI systems that they built. So you built something that's powerful, and I want to know very precisely when you decide not to use it. And so aside from those highly regulated industries, education just feels right because we've seen those outright bans. Right? And we've seen them be very public. Public, again. Oh, no ChatGPT allowed in these classrooms because it stifles writing, but maybe it doesn't. Does it help with editing, maybe? Does it make you a better editor? Does it permit us to do fact finding better? Or does it at least help us ideate in some sense? Or is it like mode finding?
And we don't necessarily know, but this is a context, a space, where I expect us to see more clever, precise characterizations and policies of where AI should and should not be used. Again, it'll always come with that prescription. Anytime you see both of those things, that's where my light bulb goes off, like, oh, you're just effectively instantiating,

Speaker 0 29:25 – 29:47

the resignation of an algorithm in favor of whatever human judgment or whatever process you've been using for a long time. Before we wrap, this has been a fantastic conversation. So thank you so much. And I just want you to share where listeners can learn more about you and your work and the work of your group, and where they can find you. Yeah. So you can find me on all the normal social media.

Speaker 1 29:49 – 30:32

And I'm most active on what was formerly known as Twitter, and on Bluesky, and you can reach me via email. My group is currently based at New York University, and we have lots of fun projects going on, which we're always excited to talk to people about. And I have my website, which I imagine Jamal will link as well. So thank you very much for your time. I appreciate CDT for all the great work they do to entertain academics like us, to hear our crazy blue-sky ideas and see if they make any sense to the policy community, to the tech policy community, or just in general.

Speaker 0 30:32 – 30:42

So, thanks very much for your time, Jamal. Of course. Thank you. And it's been a pleasure. And thank you again, and we look forward to having you back on soon. Yeah. Looking forward.

Speaker 1 30:42 – 30:43

Thanks.