Zhu Metagov

Speaker 1 0:00 – 0:00

Awesome. Thank you so much, Haiyi, for joining us in our Metagov seminar. Haiyi is a professor at Carnegie Mellon in the HCII and has been doing a lot of really awesome work on AI and governance and on online communities such as Wikipedia. I will let you have the floor, Haiyi.

Speaker 2 0:15 – 0:15

Thank you, Amy. Let me share my screen.

Speaker 1 0:30 – 0:30

Yeah. Your screen? Yeah, I can see it.

Speaker 2 0:45 – 0:45

Okay. Great.

Speaker 1 1:00 – 1:00

People will be leaving questions in the chat while the talk is going on; that's the norm we've developed. Then, after your presentation is over, we'll have a twenty-ish-minute discussion.

Speaker 2 1:15 – 1:15

Okay. Great. Hi, everyone. My name is Haiyi Zhu. I am an assistant professor in the Human-Computer Interaction Institute at CMU. It is my pleasure to speak here. First of all, I'm really interested in how tools and technologies have changed the way we connect, communicate, exchange, and work with each other. We have been seeing these technologies impact groups, communities, organizations, and societies in important and complex ways. At a high level, I do research in HCI and social computing. I'm really interested in studying how these technologies and groups, communities, and organizations co-evolve and transform each other in the process, and how we can design better AI technologies to support people, groups, communities, and organizations. My research areas include community-driven AI design, which I'm going to spend most of today's talk on; the future of work, with AI and human interaction; and ethical AI, for example developing toolkits to better help practitioners design their machine learning systems. So in today's talk, I'm going to focus on what I call community-driven AI design. I conduct research across multiple domains and communities, trying to see how we can design AI systems that benefit the community. For example, I have been working on AI-supported content moderation. As we know, Wikipedia faces significant challenges in maintaining the quality of its content: the English Wikipedia alone receives about 160,000 new edits every day. In today's talk, I will share some of our recent results on how we are helping Wikipedia communities redesign or create better quality prediction systems and helping Wikipedia editors moderate content more effectively.
Another domain I work on is online mental health communities. People with mental health problems are increasingly turning to online peer support communities for help instead of professional services, because these peer support services are much cheaper and more accessible. I work with 7 Cups, one of the largest mental health communities in the world. Our work is to understand people's experiences in these online peer mental health communities and to design various AI tools and systems, like a training chatbot and peer support environments. I also work on AI-supported child protection in collaboration with Allegheny County. For example, we have been working with Allegheny County to redesign the algorithmic screening tool that aids call screeners in making recommendations regarding further investigation. I also study gig work, gig platforms, and the gig economy as a whole, with the goal of designing decision support tools to empower gig workers. So the focus is: how can we design, deploy, and evaluate innovative AI tools that better support the members of a community and improve overall community well-being? In today's talk, I will focus on the first one: AI-supported content moderation in the context of Wikipedia and its quality prediction system, called ORES. Feel free to ask any question; you can unmute yourself or put your question in the chat. A bit of background: as I mentioned, a lot of new edits are made on Wikipedia every day by various groups of editors all over the world, and quality control is one of the major tasks facing the Wikipedia community. In the past few years, they have been developing and using an automatic, machine-learning-based system called the Objective Revision Evaluation Service, ORES.
ORES has predictive models that generate predictions about edit quality. One model predicts whether an incoming edit is damaging or not, another predicts whether it is a good-faith edit or not, and separate models predict article quality. These predictions have been incorporated into more than 30 tools and applications. For example, if you go to Recent Changes on Wikipedia, you will see a filter embedded in the interface. You can use that filter to identify damaging or non-damaging edits, and that damaging score is provided by the ORES prediction model. The ORES model is also incorporated into editing tools like Huggle. Let me see, we have a question: how would you characterize some other major tasks of the Wikipedia community, aside from quality control? I think there is a very good paper; I think it's called "WikiWork." They looked at all the barnstars and used them as a way to understand the different kinds of work, the different kinds of tasks, that take place on Wikipedia. Quality control, or vandal fighting, is one of the tasks, but there are other tasks like conflict resolution and community building, and of course editing and contributing content and generating policies. Maybe later I can find that paper, or some of you might know it and can share the link in the chat. So there are a lot of different kinds of tasks on Wikipedia, and quality control is just one of them. Let's see, next one. Wait a minute, where is my screen? Okay.
In the context of Wikipedia and ORES, we have conducted a series of studies and also have some ongoing work, which I'm going to talk about today. First, we tried to understand Wikipedia community stakeholders' values for a quality prediction system like ORES. That is an interview study: we interviewed different stakeholders on Wikipedia to understand their values. The second study I'm going to talk about is how we created a visualization system to help capture and explain the trade-offs between different community values. Then I'm going to briefly talk about our ongoing work, where we are conducting community deliberation workshops that allow community stakeholders to discuss and negotiate the trade-offs. First, the qualitative work on understanding Wikipedia community members' values. We conducted interviews with 16 participants: the ORES creator, two tool developers, four Wikimedia product team members, seven Wikipedia editors, and two researchers who study Wikipedia. In the interviews, we asked about their role on Wikipedia, their experience related to ORES, and their opinions and ideas for the future. The interviews were semi-structured, to give a lot of flexibility and allow many open questions. We analyzed our data using a grounded theory approach: we coded every line of the interview transcripts and held group meetings to cluster the codes and to discuss and iterate on the themes. In the results, we identified two creator values, meaning values the creator of ORES considers important, and five convergent community values, meaning values that community members collectively consider important.
In today's talk I will just highlight a few. The five convergent community values we identified are effort reduction, human authority, workflow support, positive engagement, and community trust. If you're interested, you can refer to our paper; in the interest of time, I will only focus on two of them. The first thing our participants repeatedly told us is that it's important for ORES and other quality prediction systems to reduce the effort of community maintenance. For example, one tool developer told us that if we can leverage the manpower we do have with more automation, people will have less backlog and can focus on other contributions. So one important value is to design the system in a way that reduces the effort of community maintenance. Meanwhile, participants also repeatedly mentioned that positive engagement, specifically encouraging positive engagement from diverse editor groups, is extremely important. For example, one researcher said, "I think article quality is driven to a large extent by the diversity of the hundreds of users," and one Wikimedia Foundation employee told us that the current ecosystem of Wikipedia limits the diversity of contributors, so the ecosystem needs to change to be more welcoming to certain kinds of people. When we looked deeper into these values, we realized there are often conflicts and tensions between them. Take the two values I just mentioned, effort reduction and positive engagement with diverse editor groups: if you map them onto actual system criteria for quality prediction systems, they map to different criteria.
For example, effort reduction probably maps to high overall accuracy and a low false negative rate, that is, catching all the potentially damaging edits. Positive engagement with diverse editor groups, however, might map to criteria like a low false positive rate: you don't want to falsely label good edits as damaging, because doing so will discourage those editors. You might also want low disparity in model performance across editor groups. The model should treat newcomers, experienced editors, and anonymous editors equally, or at least similarly, so that you can positively engage all these diverse groups. On the other hand, when you actually design these models, you cannot satisfy all of these criteria simultaneously. We already know there is a trade-off between false negatives and false positives, and there is also a trade-off between overall accuracy and equalizing performance across groups: recent work on fairness in machine learning shows that when you try to improve some fairness metrics, overall accuracy can decrease. That motivated our next work: creating a visualization system to capture the trade-offs between these important community values and explain them to community members, so that they can decide what an acceptable trade-off is. We created a visualization system called ORES Explorer. It is a set of visualizations to help application designers and community members understand the inherent trade-offs in the Wikipedia ORES system.
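To make the two sets of criteria concrete, here is a minimal Python sketch, assuming each edit is represented as a hypothetical `(editor_group, damaging_score, is_actually_damaging)` tuple. The function names, group labels, and toy records are illustrative only and are not ORES's actual API or data. It computes per-group false positive and false negative rates at one threshold, plus the largest gap between groups as one simple way to express the "low disparity" criterion.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record format: (editor_group, damaging_score, is_actually_damaging)
Record = Tuple[str, float, bool]


def group_error_rates(records: Iterable[Record], threshold: float = 0.5) -> Dict[str, Dict[str, float]]:
    """Per-group false positive rate (good edits flagged) and false negative rate (damage missed)."""
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for group, score, is_damaging in records:
        flagged = score > threshold  # edits scoring above the threshold are predicted damaging
        c = counts[group]
        if is_damaging:
            c["pos"] += 1
            if not flagged:
                c["fn"] += 1  # a damaging edit the model missed
        else:
            c["neg"] += 1
            if flagged:
                c["fp"] += 1  # a good edit wrongly labeled as damaging
    return {
        group: {
            "fpr": c["fp"] / c["neg"] if c["neg"] else 0.0,
            "fnr": c["fn"] / c["pos"] if c["pos"] else 0.0,
        }
        for group, c in counts.items()
    }


def max_gap(rates: Dict[str, Dict[str, float]], metric: str) -> float:
    """Largest difference in one metric between any two groups (0.0 means equal error rates)."""
    values = [r[metric] for r in rates.values()]
    return max(values) - min(values)


# Toy, made-up records purely for illustration.
records = [
    ("experienced", 0.10, False),
    ("experienced", 0.20, False),
    ("experienced", 0.90, True),
    ("newcomer", 0.60, False),
    ("newcomer", 0.30, False),
    ("newcomer", 0.40, True),
    ("newcomer", 0.80, True),
]
rates = group_error_rates(records)
print(rates)
print("FPR gap:", max_gap(rates, "fpr"))
```

In this toy data the model makes no errors on experienced editors but wrongly flags half of the newcomers' good edits, so the false positive gap is 0.5; a single overall accuracy number would hide that disparity.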
We followed an iterative design process. We started by identifying the needs around understanding these trade-offs, moved on to ideation, designed a series of prototypes from low fidelity to high fidelity, conducted user testing, and refined our prototypes based on users' feedback. The visualization system has four interfaces. The first is called About ORES. It's a landing page that gives a basic overview of the ORES system: how the ORES score is generated and how ORES makes predictions according to a threshold. We also present figures to help people better understand machine learning concepts and metrics like false positives and false negatives, so that users have the knowledge they need to explore the visualizations in the next steps. Then we have an interface called the Threshold Explorer. The threshold setting determines the trade-off between false positives and false negatives. For example, if the threshold is set to 0.5, any edit with a score above 0.5 will be predicted as damaging; moving the threshold produces either a more lenient or a more strict model. The explorer lets users play with the threshold setting and visualize the trade-offs of a particular setting. We also have the Group Disparity Visualizer, which lets users see and compare the model's performance on different groups in terms of accuracy, false positives, false negatives, etc. Finally, we have an interface called the Threshold Recommender, which starts from the user's task, for example an automatic reverting tool or an editing tool.
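The threshold trade-off that the Threshold Explorer visualizes can be reproduced in a few lines. This is a minimal sketch with made-up scores and labels (not ORES output), assuming, as in the 0.5 example in the talk, that an edit is predicted damaging when its score is above the threshold; `confusion_at` is a hypothetical helper name.

```python
from typing import List, Tuple


def confusion_at(scores: List[float], labels: List[bool], threshold: float) -> Tuple[int, int]:
    """Count (false positives, false negatives) when edits scoring above `threshold` are flagged."""
    fp = sum(1 for s, damaging in zip(scores, labels) if s > threshold and not damaging)
    fn = sum(1 for s, damaging in zip(scores, labels) if s <= threshold and damaging)
    return fp, fn


# Made-up damaging scores and ground-truth labels for eight edits.
scores = [0.05, 0.20, 0.35, 0.45, 0.55, 0.70, 0.85, 0.95]
labels = [False, False, True, False, True, False, True, True]

# Raising the threshold gives a more lenient model: fewer good edits flagged, more damage missed.
for t in (0.3, 0.5, 0.7):
    fp, fn = confusion_at(scores, labels, t)
    print(f"threshold={t}: false positives={fp}, false negatives={fn}")
```

On this toy data, moving the threshold from 0.3 to 0.7 lowers false positives from 2 to 0 while raising false negatives from 0 to 2, which is the kind of trade-off the explorer lets users see directly.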
This recommender directly recommends a threshold to the user according to their preferences and goals. We then conducted an evaluation with 10 participants: five from the Wikipedia community and five from outside it. We wanted to see whether these tools can help address the challenges Wikipedia community members face, and to what extent they generalize. We asked participants to use ORES Explorer and asked them questions afterwards. Our evaluation suggests that ORES Explorer improved participants' understanding of the trade-offs associated with different threshold settings and of the impacts of different thresholds on different stakeholder groups on Wikipedia. We also identified some interesting phenomena. For example, the Group Disparity Visualizer helped surface the ORES model's performance disparity across editor groups: the model performed best when assessing experienced editors' edits, but the error rate was much higher for inexperienced editors like newcomers and anonymous editors. However, most participants accepted the disparity as a natural occurrence and were not really concerned about its fairness implications. With these tools, we believe we can help community members understand and navigate these trade-offs. The next step, which is our current ongoing work, is to conduct community workshops that allow community stakeholders to discuss and negotiate the trade-offs. Our plan is to combine the visualization with community deliberation workshops to explain the tensions to community members and resolve them among community members.
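The talk does not describe how the Threshold Recommender picks a threshold. One plausible approach, sketched here purely as an assumption, is to optimize one metric subject to a constraint on another: an automatic reverting tool might want the lowest threshold whose precision still meets a target (so few good edits are reverted), while a patrolling or editing tool might want the highest threshold whose recall meets a target (so most damage is still surfaced). The function names, task labels, and data below are hypothetical.

```python
from typing import List, Optional, Tuple


def precision_recall(scores: List[float], labels: List[bool], threshold: float) -> Tuple[float, float]:
    """Precision and recall when edits scoring above `threshold` are flagged as damaging."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention: nothing flagged means no false alarms
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def recommend_threshold(scores: List[float], labels: List[bool], task: str, target: float) -> Optional[float]:
    """Hypothetical recommender: choose a threshold for a task given a target metric value."""
    candidates = sorted({0.0, *scores})
    if task == "auto_revert":
        # Lowest threshold (catching the most damage) whose precision meets the target.
        for t in candidates:
            if precision_recall(scores, labels, t)[0] >= target:
                return t
    elif task == "patrol":
        # Highest threshold (fewest false alarms) whose recall meets the target.
        for t in reversed(candidates):
            if precision_recall(scores, labels, t)[1] >= target:
                return t
    else:
        raise ValueError(f"unknown task: {task}")
    return None


# Made-up scores and labels for eight edits.
scores = [0.05, 0.20, 0.35, 0.45, 0.55, 0.70, 0.85, 0.95]
labels = [False, False, True, False, True, False, True, True]
print(recommend_threshold(scores, labels, "auto_revert", target=0.9))
print(recommend_threshold(scores, labels, "patrol", target=0.75))
```

The two tasks land on different thresholds for the same model, which mirrors the idea that the "right" setting depends on what the tool built on top of the model is for.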
We plan to recruit participants from the Dutch Wikipedia community and the English Wikipedia community. We designed a deliberation protocol in which participants first complete a pre-survey about their role in the community and their knowledge of the ORES system. Then they explore the ORES Explorer interface and use it to create a model card containing the model they think is best and want to recommend to the community. Next, we organize a group discussion in which participants talk with each other, decide together which model they think is best for the community, and write a proposal justifying their decision. At the end, we ask them to fill out an individual post-survey. We want to see whether participating in this activity changes people's perceptions of the ORES system, and we also collect their overall feedback. We updated ORES Explorer a bit for this. One feature we added is that whenever people identify an acceptable trade-off, they can select it and generate a model card. We believe the model card can help facilitate the discussion because it contains a concise set of the information most relevant to it. People can create a model card with performance metrics across the different editor groups, and they can also provide their justification for why they believe this model is good for the community.
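In this workflow, a model card bundles a chosen threshold, per-group performance, and a written justification. Here is a minimal sketch of such a record, assuming a hypothetical `ModelCard` class and illustrative (not measured) numbers; the actual fields in the updated ORES Explorer may differ.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ModelCard:
    """Hypothetical summary a participant brings to the group discussion."""
    threshold: float
    group_metrics: Dict[str, Dict[str, float]]  # editor group -> {"fpr": ..., "fnr": ...}
    justification: str = ""

    def max_gap(self, metric: str) -> float:
        """Largest between-group difference in one metric, a quick disparity summary."""
        values = [m[metric] for m in self.group_metrics.values()]
        return max(values) - min(values)

    def summary(self) -> str:
        """Plain-text card: threshold, one line per editor group, then the justification."""
        lines = [f"threshold: {self.threshold}"]
        for group, m in sorted(self.group_metrics.items()):
            lines.append(f"  {group}: FPR={m['fpr']:.2f}, FNR={m['fnr']:.2f}")
        lines.append(f"justification: {self.justification}")
        return "\n".join(lines)


# Illustrative numbers only, not ORES measurements.
card = ModelCard(
    threshold=0.7,
    group_metrics={
        "anonymous": {"fpr": 0.25, "fnr": 0.125},
        "experienced": {"fpr": 0.0, "fnr": 0.25},
        "newcomer": {"fpr": 0.125, "fnr": 0.25},
    },
    justification="Illustrative only: a higher threshold so fewer good-faith edits are flagged.",
)
print(card.summary())
print("FPR gap across groups:", card.max_gap("fpr"))
```

Keeping the card to a handful of numbers plus free-text reasoning reflects the point made in the talk: the card should carry only the information most relevant to the group discussion.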
For the discussion, we designed a few prompt questions, like which model you think produces the best outcome, and we also encourage people to think about what a good outcome on Wikipedia even means. For each model that individuals selected, participants can discuss together the pros and cons. In particular, we encourage them to think about their own community: English Wikipedia, for example, is a larger community, while the Dutch community is smaller, so what would be the best model for them? Throughout the whole process, they can always go back to ORES Explorer to generate a new model if they want. At the end, we ask them to write a group proposal for the model they selected. We are still in the process of recruiting participants, so if any of you is interested in participating, and especially if you are a Wikipedia editor, please feel free to email me. We have actually run into a lot of trouble recruiting participants. Even though a lot of people show interest, actually attending these workshops, doing the pre-survey, and discussing this topic turned out to be a bit challenging. We can discuss that more if people are curious. But if you are interested in participating, or if you have ideas on how to better recruit participants, please contact me. Thank you. Overall, with this set of work, we contributed to a broader understanding of the human values related to AI-supported governance in online communities. Our work also has implications for improving Wikipedia's ORES system, as well as broader implications for the design of content moderation systems in other online communities such as Reddit and Twitch.
We also designed a novel approach that combines visualization with deliberation workshops, and we believe it can facilitate greater community control and agency in the design of AI systems that impact the community. So that's the work on Wikipedia. Amy, do you think I have time to talk about another project, or should we start the discussion?

Speaker 1 1:30 – 1:30

Yeah. I think we typically want to reserve the last twenty-ish minutes for discussion. So maybe we'll open the floor to folks who have discussion questions, and then, if there's time, we can come back to the other work you've done.

Speaker 3 1:45 – 1:45

But

Speaker 2 2:00 – 2:00

Okay. Yes, sounds good. I think that makes a lot of sense. Let's start the discussion.

Speaker 1 2:15 – 2:15

Okay. Awesome. I think you've already answered the first question that was in the chat. So does anyone else have a question that they'd like to ask?

Speaker 4 2:30 – 2:30

Well, I can just ask it rather than typing it out. Actually, I just finished; I just put it there. So the quick question is: do you think this particular model for governing AI models, which obviously play a particular role in Wikimedia governance, could provide a template for community control of other relatively complicated technical objects and interfaces inside some complicated technical system that a community is using?

Speaker 2 2:45 – 2:45

Yeah. Great question. I believe so. Maybe I will show some of that. Sorry, the process, okay, you can see it. The process is: first, understanding the community's values with regard to the design and use of the AI system; then figuring out ways to capture the trade-offs and giving the community agency and control over what an acceptable trade-off is for them; and also running community deliberation workshops to explain the trade-offs and engage community stakeholders in decision making. I believe this template, or this process, could be generalized to other communities and other complex systems. But one more thing, while I'm talking: this particular method was developed in the context of a prediction system. It has a clear prediction goal; it's usually a classifier that generates a probability, and then there is a threshold and a trade-off. This kind of prediction task is used in a lot of complicated systems, but not all of them. For example, we have recently been creating a training chatbot for 7 Cups that they can use to help their listeners develop their clinical skills. Some of the specific visualization systems we developed for prediction systems might not apply when we talk about designing a chatbot. But the general ideas of understanding values, capturing the trade-offs, and engaging communities in the discussion: those principles might still apply.
But some of the specific visualizations or tools we designed to facilitate the process might not apply to other settings. Great question, thank you. You also have a related question: who controls ORES now? Do they still control ORES after this, or is this experiment feeding information about community sentiment to the Wikimedia machine learning group, and so on? That's a great question. First of all, ORES is maintained as an open-source project. Its creator, Aaron Halfaker, was a Wikimedia Foundation employee but has now moved to Microsoft, and the Machine Learning Platform team has taken over maintenance of the ORES system. But it's still very much a community-maintained system. For example, to collect their training data, they ran a campaign that allowed anyone to label data, and their code is open source. If you want, you can always check what kind of algorithm they use to develop the models. What we are doing is opening this up to a broader community, so that it's not just a few people with the specialized knowledge to understand how it works; anyone in the community who wants to can engage in the discussion and provide input. We are also talking with the Machine Learning Platform team at the Wikimedia Foundation. On one hand, they really like the whole process, and they even say they want to apply it to any machine learning system that will be developed for the Wikipedia community.
On the other hand, they are really limited in manpower, so they will refuse any significant changes to their infrastructure. So the dynamics are a bit complicated. But it's a good question. As for who controls ORES: it's an open-source project, but as with any other project, there are still a few core contributors, who are often Wikimedia employees. We want to broaden this so that more people can be engaged. Let's see, we have a few more questions. Which one should I answer first?

Speaker 1 3:00 – 3:00

We typically just go in order down the chat, so I think I can go next, although you have already touched on this a little bit. I was asking about the tool's focus on just the threshold value, and whether you think there are other things a community could get involved in when it comes to model work. You alluded to the fact that community members already do data labeling to some degree and can participate in writing code, but perhaps most members of the community aren't going to get involved in that. So would you say it's primarily discussions of the trade-offs between thresholds?

Speaker 2 3:15 – 3:15

In our study, we focus on the trade-offs related to the threshold, that is, false positives versus false negatives, and also the trade-off between overall accuracy and fairness, meaning performance across different groups. We call these outcome trade-offs. That's our focus, but you're right: people can participate at different stages of the whole pipeline, like data collection. These trade-offs really belong to the model selection stage, and we are trying to open up that stage to everyone. But the Wikipedia community also voluntarily creates and submits reports to identify bugs in the system and to report strange behaviors of ORES. They also monitor things like tool developers who misuse ORES and create applications that do not benefit the community. So they also have this reporting system. It's a very interesting dynamic on Wikipedia; it really is a community-maintained machine learning system.

Speaker 1 3:30 – 3:30

Thank you. Looks like we have a question from Max. Did you wanna talk about your question next?

Speaker 5 3:45 – 3:45

Yeah. I really like this idea of helping people understand the importance of trade-offs and values, but I'm wondering how values are conceptualized. More specifically, what types of values might not be captured very well by this particular system? That's the first part. The second part I'm curious about is who even gets to propose what the values are. How do you think about values? Let's start with just the first question.

Speaker 2 4:00 – 4:00

Okay. Both are great questions. First, I completely agree that some values are very hard to operationalize. For fairness, there is already a huge literature from machine learning researchers proposing many different ways to operationalize outcome fairness, but you're right that procedural fairness is really hard to capture. Also, among the five convergent community values in our case, we realized that some values can be mapped to system criteria, but some cannot. The fifth value we identified is called community trust: participants think you should build a system that advances trust in the system itself and trust among community members. That is extremely hard to operationalize; what does it even mean when you want to incorporate that value into the design of the system? We are still exploring this area, and I'd love to hear what all of you think about how to translate some of these values into actual design choices. I think it's a very interesting question. As for who gets to pick the values: again, I think we defer to the community to decide which values are most important to them. If they identify additional values, we show them the trade-offs, and it's up to them to decide what an acceptable trade-off is.

Speaker 5 4:15 – 4:15

If I might just add a little question there. I'm imagining a scenario where people hold values that are particularly hard to translate into this format, and they might say there's a bit of a bias, or, I don't know if bias is the right word, a shift of emphasis toward the types of values that are most legible. They might say, "Our values can't be easily proceduralized, and so we as a group are not captured very well." How might you think about that scenario?

Speaker 2 4:30 – 4:30

Yeah. I think that could definitely happen: some people may feel their own values are not reflected in the system design, partly because those values are really hard to translate. I don't have a very good answer for that; I think it's a very hard question. Maybe it's case by case. If there is a value people believe is extremely important, we could engage community stakeholders in co-design workshops that give them the opportunity to express ways of translating their important values into specific design choices. That could be one way to help, but I don't know. I think it's a very good question.

Speaker 1 4:45 – 4:45

Ben, did you have a follow-up question to that? I saw your chat kind of related to Max's question.

Speaker 6 5:00 – 5:00

Yeah. I suppose I come from a humanities background, so I think about values quite a lot, and perhaps differently. I found the idea of someone saying that effort reduction was a core value they wanted for their governance system striking. It makes perfect sense, but you'd normally talk about it in terms of efficiency. I just checked: I've got a concordance of about ten million words of popular books on tech culture, and "work" is absolutely one of the key terms; something like one in every 500 words is the word "work." So it's a really key term. And then to state effort reduction as a value, maybe it's particular to Wikipedia, right? Because it's voluntary, and people want to get through hundreds of thousands of posts. But calling effort reduction a value seems quite contrary to the prevailing values I recognize. So I just found that a bit interesting and weird.

Speaker 2 5:15 – 5:15

Right. Yeah, I see your point. I actually agree. Maybe it's because this is grounded in the specific context of the Wikipedia community, which is a volunteer-based community. I think "effort reduction" is a shorthand; the complete idea is that they want to reduce the maintenance effort. They don't want to spend all their time on maintenance; they want to spend their limited time and effort on more creative tasks, like actually writing and contributing content to Wikipedia. But currently, a lot of editors have to spend a lot of time just removing damaging edits to maintain the quality of the content. They think this is tedious work, so they want the AI system to come in and help reduce some of the maintenance effort.

Speaker 6 5:30 – 5:30

Yeah. I mean, it makes perfect sense. It's just thinking about it as a value. It just seems... yeah, I don't know.

Speaker 1 5:45 – 5:45

Josh, did you have a follow-up question based on your latest comments in the chat?

Speaker 4 6:00 – 6:00

No, it's not really a question. I was just responding to your mention of Thi. Just really quickly: the speaker last week, how do you pronounce it, "Tie" or "Tee"? I mispronounced it the first time; it was really embarrassing. Anyway, Thi is a philosopher at Utah, and he was talking about value capture, specifically some of the dark sides of constructing or operationalizing these kinds of metrics. But I was just mentioning that in this case, it feels like it's not nearly as bad, given that the metrics you capture are things like fairness. Right? My sense is it's not bad. But who knows? There could be all sorts of unanticipated side effects. I can't imagine.

Speaker 2 6:15 – 6:15

Mhmm. Actually, that's the second project I have on my slides, although we may not have time to go through it. But the idea is exactly that: there are so many ways to operationalize one value, for example fairness. Here, in the context of child welfare, what does fairness even mean? Your definition of fairness might actually lead to even more damage to an already vulnerable population. Even this limited set of fairness definitions defines fairness in very different ways: statistical parity, equalized odds, or fairness through awareness, that is, individual fairness, where similar individuals should be treated similarly. In this study, we first explain all these different fairness definitions to our participants, who are community stakeholders, and then allow them to decide how to operationalize these metrics: what is the most appropriate way to operationalize them in their own community. But one of the challenges appears once we actually explain these different definitions to our participants. Maybe this is the slide... okay, maybe not this one, but you can already see that people really disagree. They cannot reach a consensus even among a limited set of participants, just 12 participants. They already disagree with each other about the right operationalization of fairness in the context of child protection, and about which metrics to use in different scenarios. So I definitely see a lot of opportunities and challenges in operationalizing metrics. And I would love to look at that work. Maybe I can look at your schedule, find your last speaker, and find out about his work.

Speaker 4 6:30 – 6:30

Yeah, he's doing some fabulous work. I'll post it in the chat.

Speaker 2 6:45 – 6:45

Okay. Great. Thank you.

Speaker 1 7:00 – 7:00

Any last questions? We probably have time for one more before we should wrap up. If not, I actually have one more question. You alluded to this in the last bit about what happens when people disagree, and I was going to ask about that in the Wikipedia case. How do you think this kind of community governance of AI fits into Wikipedia's existing governance model? You have multiple stakeholders discussing the thresholds or the values they prefer. What happens when they disagree? How do you eventually get to consensus, or how do you decide who should have the most say in determining the final value?

Speaker 2 7:15 – 7:15

Yeah, a very good question. In these cases, we actually defer to some of Wikipedia's existing conflict resolution mechanisms. Amy, I think you and your collaborators have a paper about all the different deliberation and conflict resolution mechanisms on Wikipedia. I think this is not that different from other kinds of conflicts on Wikipedia. People could go through the protocol, identify a few possible candidate models, and eventually vote to select the model they believe is most appropriate for a given application. So that's one approach: we use the existing conflict resolution mechanisms to allow communities to reach consensus. Another approach is to give people flexibility through what we call distributed decision making. For example, vandalism fighters, the people who are really enthusiastic about catching vandalism, could choose their own threshold to use while they monitor incoming edits. Or you can imagine giving people the opportunity to develop their own personalized systems that help with their own tasks and align with their own workflows. So it's not necessarily the case that we always need exactly the same model for everyone, although in certain cases, like the Recent Changes interface, maybe we do. But even for the Recent Changes interface, you can imagine people setting up their own thresholds.

Speaker 1 7:30 – 7:30

Ed, did you have a last comment you wanted to make? We probably don't have time for a question, though. Ed? Saperia? Oh, I can't hear you. Is your mic on?

Speaker 3 7:45 – 7:45

Can you hear me now?

Speaker 1 8:00 – 8:00

Yeah.

Speaker 3 8:15 – 8:15

Sorry. Okay. Something I noticed, having looked at Wikipedia for a really long time: one of its good properties is that an individual has a limit to how much damage they can do, because it's human-scale work. The community has always been very suspicious of introducing mechanical tools because of the idea that someone could create an enormous amount of damage. For example, you could create hundreds of units and then make many, many edits all across the project, which goes beyond the normal labor that people can do. And I think it's been really interesting thinking about the amount of moderation labor that exists in the system and how you empower people. So is that somehow quite fundamental to the system?

Speaker 1 8:30 – 8:30

Yeah. No question. That's definitely a reason to incorporate all these mechanisms that Haiyi's talking about and to be really careful. Okay, I think we're running out of time, so let's all unmute and applaud our speaker. Three, two, one.

Speaker 2 8:45 – 8:45

Oh, thank you.

Speaker 1 9:00 – 9:00

Thank you so much again, Haiyi. And for everyone here, let us know if you need access to the Slack. We'll post the recording once it's up.

Speaker 2 9:15 – 9:15

Thank you. Thank you for having me. Bye bye. Have a good day.

Speaker 1 9:30 – 9:30

Thanks. Bye.