Choral Data Trust Experiment By Serpentine Galleries Introna
Metagovernance Seminar Archive | 2025-10-21 | Unknown
Speaker 1: Great. So welcome, everyone, to the MetaGov seminar. Thank you all for being here. I'm very, very excited to introduce our speakers today. So it looks like we have both Tommy and Jennifer to share today on the Choral Data Trust. This is a project that I feel like I've heard about for a while. I've heard of y'all's work on this, but really when the white paper came out, I was like,...
Top Keywords
- data 0.014
- choirs 0.010
- serpentine 0.008
- trust 0.007
- dataset 0.007
- rights 0.007
- tommy 0.006
- data intermediary 0.006
- around 0.005
- choir 0.005
- governance 0.005
- holly 0.005
Transcript
Speaker 1
0:00 – 0:00
Great. So welcome, everyone, to the MetaGov seminar. Thank you all for being here. I'm very, very excited to introduce our speakers today. So it looks like we have both Tommy and Jennifer to share today on the Choral Data Trust. This is a project that I feel like I've heard about for a while. I've heard of y'all's work on this, but really when the white paper came out, I was like, wow. This is one of the first real, thorough examples of a data trust that I've seen, and I'm really excited to learn more about the research and the work. I think data governance is obviously a big interest to a lot of folks in our space, so it feels really awesome and right and timely that y'all are here, and I can't wait to learn. So, please take it away, and thanks again for coming.
Speaker 2
0:15 – 0:15
Brilliant. Thank you so much, Val. I don't know, Jen, if you wanna share the screen? Yeah. OK. Brilliant.
Speaker 1
0:30 – 0:30
Oh, yeah. And I'll just throw out there that the presentation will be about thirty, forty minutes, and we'll have the rest of the time for Q&A. If you have questions throughout, feel free to type them in the chat, and then I'll help facilitate the Q&A once the presentation's over. So type your questions in the chat throughout the presentation, and then we'll get to them in the last twenty minutes. Thanks.
Speaker 2
0:45 – 0:45
Perfect. Thank you so much. Yeah, it's brilliant to be here. I've finally joined the MetaGov Slack, so I look forward to being more involved in the future. But just to quickly introduce myself, I'm Tommy Introna. I'm an R&D platform producer at Serpentine. Serpentine is a public art gallery in London. We have a real building, which is where I am now, so if you're ever in London, you should come visit us. Alongside the exhibition program that we're known for, we also have various other programs that are very important to us, such as the ecologies program, the civic program, and also the technology program, which is where I'm based. Arts Technologies has been a department for around twelve years, and we explore different ways that artists are working with technology and different ways that technology is influencing the way artists work. Our department is split between two key areas of work. We have the commissioning side of things, where we're commissioning and producing major new works using advanced technologies. And then we also have a thread, which is our R&D program, which is where I'm mostly situated, and which focuses on building twenty-first-century cultural infrastructure. We've done this through research, through developing prototypes, and through publishing strategic briefings for the sector. I'll drop in some links as we go, so if you wanna follow up on some of our other work, you can. Today, we're talking about the Choral Data Trust Experiment. I'm saying "trust" in inverted commas there, and we'll come to later whether it is a trust or not. But I should, just before we start, acknowledge that the project was developed by a brilliant team that includes my colleagues at Serpentine, Victoria Ivanova, Eva Jäger, Ruth Fortis, but also Mercedes Bunz from King's College London, and the artists Holly Herndon and Mat Dryhurst, obviously.
And, of course, the brilliant Jennifer Ding who is with us today. So maybe, Jen, I'll hand over to you to introduce yourself.
Speaker 3
1:00 – 1:00
Sounds great. Thanks, Tommy, and great to be here today with all of you. So I'm Jen. I work at an AI data infrastructure startup in London, but I was very lucky to join this project as the data steward. Previously, I was on the open science team at the Turing Institute. So hi, Alex, good to see you from there, and, Liz, I've actually heard a lot about you from Stuart Lynn, who I worked very closely with while I was at Turing. So it's really lovely to be here today and to share about this work with the Serpentine on the Choral Data Trust Experiment. To kick us off, Tommy has already introduced a few themes that are really relevant to the project, but really what this project tapped into were two age-old traditions in the UK: community choirs and UK trust law. So we were exploring themes around collective data governance and tapping into different legal frameworks, such as trust law, UK GDPR, and UK performers' rights, to explore how we can give any governance framework some teeth. And as the name implies, it was also very much an experiment. We started out thinking that a trust was going to be our form, and we ended with "trust" in inverted commas, as Tommy mentioned. So trust itself was a concept very important to our experiment, but, as we'll get into later, we ended up exploring some new legal vehicles for our governance framework. So let me jump straight into the plan for today. We'll start with a little bit about the art exhibit and the research project, the Choral Data Trust Experiment, that took place alongside it. Then I'll present the part of the research that I led, which was around building capacity for data governance, then hand over to Tommy to share more on the legal frameworks that we prototyped and some future work for the governance of our dataset. So, The Call was an amazing exhibition that recently closed. It ran from October until February, and it was with the artists Holly Herndon and Mat Dryhurst. And quite a lot of the production that went into the exhibit was itself part of the art, if you will. The artists composed a songbook of hymns that was purpose-built for collecting a training dataset for a choral AI voice model.
So the creation of the music, the recording of the music, the setting up of this collective data governance framework, and all the questions around the governance of the data and the use of the model have all been central to this project. And really what we have focused on is what Tommy's colleague Victoria calls live action role-playing, or LARPing, good-faith AI development. So what does it look like when the builders of the models and the contributors of the data are all interested in shifting the power dynamic or changing the status quo, so that data subjects or data contributors to AI models actually have more opportunities for empowerment? A lot of this was centered around this provocation from Holly, as you can see on the slide: can we develop some new, better, more empowering rituals around what the process of creating AI is like? And just like coordinating voices in a choir, the artists really saw AI as a possible coordination technology as well. So here are a few snapshots from the process itself. You can see some of the songbooks on display from the exhibition. There was also this interesting design of the ambisonic mic recording layout, so we could collect really high-quality data that would future-proof the dataset, if you will, for future model development. And finally, you can see a nice snapshot from one of the recording sessions. Of course, it's important to mention the 15 choirs from across the UK who answered the call and decided to join the recording sessions and contribute to the dataset. I think many of them didn't realize how much they were getting into regarding the governance part of the experiment, but they have been amazing, both on the artistic side and in jumping on board to have some really complex conversations about data governance. So moving on to that section, we had a very tight timeline between when the project started in April and when the exhibit opened in October. There were quite a few different forms of engagement that we explored, from one-on-one conversations to open sessions on Zoom, and then we ended up running a few surveys, one using just a classic Google Forms layout, and the second one using Polis, to try to extract different forms of preferences around things like data sharing, around licensing, and also around governance itself: how the choirs were interested in making decisions about the dataset moving forward. Our goals were really these three. You know, it's not very common that people within the same dataset necessarily think of that as a collective identity, but we wanted to explore what it might look like to enable some of that identity formation. And with that identity formation, explore opportunities to build capacity for collective action among the choirs to make decisions about the dataset.
It was very important to us that we didn't come in choosing a specific form, whether that was a trust, a cooperative, etcetera, but rather that the form itself would be driven by the preferences of the community. And finally, it was also really important for the team that any sort of governance framework would be backed by some sort of legal vehicle. Here are some quotes from our choir members that surfaced from the engagement, about how it feels to even think about their voices as data or raw material for training AI models. Some really interesting themes came out around how they felt about this process to begin with. I think for many of them, as live performers, the idea of transforming voice into something digital, let alone something that becomes part of a model, was interesting, but scary, as you can see on the screen, and also quite unusable, disembodied, or soulless. But at the same time, they recognized that a lot of this development is already happening. So if there was an opportunity to use the experiment as a way to frame a best-in-class example, or to shape what best practice could look like, that was the opportunity that they found. So jumping into some of the surveys that we actually ran, the first one was a data preferences survey. This was really around the collection of the data and the use of the data for different purposes. Some interesting highlights from this. This is really small, but in this first one here on the left, something interesting that came out was that the user of the dataset really mattered to the choirs. There was a lot of understanding of what Holly and Mat's intentions were and what the Serpentine's intentions were, so there was trust that was based in that relationship. But we saw that the comfort level dropped significantly when it came to other potential users of the dataset building other models.
So I think that contextual understanding, that basis of trust in the relationships, and transparency around the users and use cases was really something highlighted by this survey. Another interesting point was around crediting. Many of them were actually quite ambivalent about being individually credited for the contribution, but felt very strongly that they wanted their choir's contribution to be credited. So I think all of this really told us that some sort of blanket approach to crediting or sharing really wasn't the right fit. It was very nuanced and dependent on the context, so any governance form would need to take that into account. Another interesting finding was that we had originally assumed that many of the choir members were more interested in the artistic aspects of the project and wouldn't be interested in getting involved on the decision-making side. But in fact, they were quite interested in getting involved in the decision making. So with that in mind, we ended up setting up a data governance working group, with representatives from each choir, to make decisions going forward. The outcomes of the survey were also quite helpful because we could translate them into targeted interventions in the model-building pipeline. It really raised different priorities that were important for the choirs, and it really centered around transparency: knowing who has the data and where it's stored geographically, getting to be part of defining unacceptable criteria or behavior for the model, and also setting license terms and defining choir recognition and profit sharing. So that actually leads us really smoothly to the second engagement we ran, which was around using the Polis tool to guide the kind of license that would be the best fit for this dataset. We had 20 seed statements. The participants also submitted a few as well.
So I think we ended up with around 25 in the end, and some interesting findings. Overall, we had three clusters that really emerged. Two of them were a bit more risk-averse towards wide sharing of the data, and another, group C, was a bit more open towards open sharing of the data for a wide range of use cases. And sorry, I'm realizing these graphs are really small and probably completely illegible, but the white paper has them in a better form if you're interested in seeing more granularly how people voted. But one great thing was that, although there was more divergence across our three preference groups around for-profit sharing or open sharing, we did find a lot of agreement for many of the questions. Most of them were interested in sharing the data with users who were using the data for educational purposes or nonprofit use cases, those who agreed to comply with the license terms, things like that. So what this really indicated to us is that if and when the dataset is released, and Tommy will share a little bit more about how we're thinking about that now, the right fit that captures the preferences of the group would be a Creative Commons Attribution-NonCommercial-ShareAlike license, CC BY-NC-SA. So that is how we used Polis to guide the selection of a future license. Now I'll hand over to Tommy to share a little bit more about how we then took the outcomes from all of these different forms of engagement and translated them into the different legal frameworks that we experimented with.
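The idea of finding statements that every preference cluster agrees on can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the participants, statement names, votes, and the 0.6 threshold below are all hypothetical, and the cluster labels are supplied by hand, whereas Polis derives its opinion groups from the voting data itself. It is not the project's actual analysis.

```python
from collections import defaultdict

# Vote encoding (hypothetical): agree, disagree, or pass.
AGREE, DISAGREE, PASS = 1, -1, 0


def group_agreement(votes, groups):
    """Return {statement: {group: share of cast (non-pass) votes that agree}}."""
    # tallies[statement][group] = [agree_count, cast_count]
    tallies = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for person, ballot in votes.items():
        group = groups[person]
        for statement, vote in ballot.items():
            if vote == PASS:
                continue  # passes do not count for or against
            tallies[statement][group][1] += 1
            if vote == AGREE:
                tallies[statement][group][0] += 1
    return {
        statement: {g: agree / cast for g, (agree, cast) in by_group.items()}
        for statement, by_group in tallies.items()
    }


def consensus_statements(agreement, all_groups, threshold=0.6):
    """Statements where every group's agreement share meets the threshold."""
    return sorted(
        statement
        for statement, by_group in agreement.items()
        if all(by_group.get(g, 0.0) >= threshold for g in all_groups)
    )


# Hypothetical data: three clusters, three statements.
GROUPS = {"p1": "A", "p2": "B", "p3": "C", "p4": "C"}
VOTES = {
    "p1": {"educational_use": AGREE, "nonprofit_use": AGREE, "open_commercial_use": DISAGREE},
    "p2": {"educational_use": AGREE, "nonprofit_use": AGREE, "open_commercial_use": DISAGREE},
    "p3": {"educational_use": AGREE, "nonprofit_use": PASS, "open_commercial_use": AGREE},
    "p4": {"educational_use": AGREE, "nonprofit_use": AGREE, "open_commercial_use": AGREE},
}

AGREEMENT = group_agreement(VOTES, GROUPS)
CONSENSUS = consensus_statements(AGREEMENT, all_groups={"A", "B", "C"})
print(CONSENSUS)
```

In this toy data the educational and nonprofit statements clear the threshold in all three clusters while open commercial use does not, which is the shape of result that would point towards a non-commercial license.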
Speaker 2
2:15 – 2:15
Thank you, Jen. Yeah, exactly. And I think it's worth maybe just reiterating the ethos of the R&D that we do at Serpentine. I think perhaps there's an assumption that what's possible within an art gallery context is broadly speculative or imaginary in scope. But really what we're interested in here is prototyping something that can have real-world implications and can inform other sectors. And in order to do that, we really needed to delve deep into the robust legal frameworks that we might be able to draw on to build this collective voice. Jen, would you jump straight to the first slide? Sorry, I'm gonna have to keep asking you for that. Yeah. So, exactly which legal mechanisms could enable this collective data governance? Here our aim is to produce the legal basis and a workable infrastructure to enable the governance of the data. And this needs to align with the preferences that Jen has already outlined whilst being legally sound and in some way feasible. I think that's an important limitation that we're working with. And in the UK, and I think this is not unique to the UK, it's not immediately clear what legal frameworks we can leverage to challenge the current standard way of approaching AI development and increase community agency. But in this case, we drew on two concrete aspects of UK law. One is UK performers' rights and the other is the General Data Protection Regulation, which is GDPR. We have UK GDPR because we've left the EU, but we still have GDPR; it's effectively exactly the same provision. So I'm gonna delve a little bit into detail on most of this, and I know this is very UK-specific. But, hopefully, the way that we've tried to approach this and develop a mechanism that works for us will be transferable to wherever you are.
Jen, do you want to jump to the next slide? Okay. So, yeah, performers' rights. Under UK law, performers automatically receive intellectual property rights when their performances are recorded. So this, for example, might be to do with a musician performing on a record. They automatically have IP rights in their performance, and they might license that to a company. But primarily, they have that, and they also have a moral right to be attributed as a performer. So we thought this might be something we can build on. The singers in the choir are performers who are being recorded, so there's a parallel there that's easy for us to jump onto. And so the question then is, okay, who does this right protect? In this case, all of the choristers that took part in the recording sessions had performers' rights. Their consent was needed to record their performances and to reproduce these recordings. And so to utilize this in our experiment, to collectivize this right, we drew up a performers' rights agreement, which grants the exclusive license to these rights to the trusted data intermediary. And I'll speak more about that trusted data intermediary formation in a moment. And this really reflects the desires of the choristers that we'd already established. Can you go to the next slide? Yeah. So in this agreement, there is a template license agreement, which is what we would then use to sublicense the dataset to other developers. There are also details about the governance requirements, which include transparency and the mode of consultation. If a commercial arrangement were agreed, it also details aspects about royalties, and it reasserts their moral right for their group identification to be attributed. Next slide, please.
So the other legal framework that we drew upon was UK GDPR. This comes into play where there is processing of personal data. It's typically used when you're accepting cookies or signing up to a mailing list. We're all familiar with this by now, hopefully. But here, really, personal data only relates to information that can lead to the identification of an individual. It's a very individualized right and only relates to identification. So interestingly, in our experiment, this only protects or relates to a very specific group of the singers, which were the soloists. And apologies, we didn't have time at the beginning to go through the details. But the way that the song was set up was that there were eight soloists who formed an inner circle of the singing, and then there was a group choir on the perimeter who were less identifiable. But the soloists were close-miked, so their voice, their likeness, was much more identifiable, and therefore the GDPR applies. So this is an agreement that's only really with the soloists. And I'm just looking at the details here. But yeah, they signed a data rights mandate, which, again, mandated their GDPR rights to the trusted data intermediary, who could then assert those on their behalf, but crucially they could still exercise those rights individually. Next, please. Thank you. So, yeah, you've got to excuse me. I'm an arts technologies producer and researcher. I'm not a lawyer, so I'm trying to do it justice. But my understanding is that one of the interesting advances, and challenges, of doing this kind of mandating of data rights is that it is untested in law.
Like, there's not necessarily a precedent for it, but it does do some interesting things in terms of collectivizing this right, and it also creates some interesting benefits for the individuals, in that they don't need to manage checking whether the data has been used appropriately. They don't necessarily need to understand the complex technical specifics of data mining, which might not be their interest or skill set. So we as a trusted data intermediary can do this work on their behalf. Yeah. This is an example: say someone came to us wanting to train a generative AI model that produced hate speech. The data intermediary, with these two agreements, can exercise these rights on the choristers' behalf, and it would make it very difficult for the development of that AI voice to take place. Next slide, please, Jen. Okay. This is a crucial slide. There's a lot in this one. But this is really outlining this trusted data intermediary structure and why it's not a trust. So, like Jen said, whilst we were always sure that we wanted to establish an organizational entity, we weren't sure exactly what form it would take. We explored three possible legal mechanisms: an individual trustee, a legal trust, or a company limited by guarantee. And within the UK context, at least, there are problems with the individual trustee model and with the trust model. With the individual trustee, they would be exposed to unlimited personal liability. So, for example, if we had asked Jen to become this trustee, it would have exposed her as an individual to potentially a huge amount of risk that would be unreasonable to expect an individual to take on. And a corporate entity becoming a trust had some other issues, one being that it would require £100,000 in paid-up shares.
This is perhaps not a huge amount of money for larger organizations, but for smaller community-led organizations or charities that might want to set up some kind of data trust, this didn't feel like a reasonable expectation, or realistic in practice. There are additional barriers to setting up a trust, such as extensive customs and tax compliance requirements, and uncertainty over whether these rights, the personal GDPR rights and the performers' rights, could actually be transferred to a trust. So ultimately, we selected Serpentine's existing trading company. This is another key detail, which is that Serpentine is registered as a charity. Simply put, our gallery itself is a charitable organization, but we also have a trading company attached to us, which looks after all of our commercial aspects: selling books in the bookshop, running the cafe, etcetera. So we could use this existing trading entity to implement this company limited by guarantee structure. It's easy to set up. It's limited liability. It's operationally efficient, in that we're effectively the same organization. And it means that the trusted data intermediary remains within the gallery, where we have the knowledge and trust built through this project. So whilst being this trusted data intermediary expands the scope of the Serpentine's mission, it's potentially a role that other galleries, libraries, archives, and museums could play in the future. I know that was the biggest, most dense slide, but we can maybe unpack some of that in the conversation afterwards. So, Jen, could you jump me over to the next steps? Brilliant. So where are we now?
As is typical of these projects, there's a huge amount of administrative complexity in working with over 300 choristers and trying to manage the governance. We're still in the process of collecting all the signatures from the choirs that mandate the Serpentine to exercise these rights. We've got over 60%, which feels like pretty good going so far, but there's still some way to go there. This speaks to some of the complexities of this approach, but it's also an area for further work in terms of streamlining this process or coming up with creative, efficient ways to collect the permissions from the choirs. The next slide, please, Jen. So we've also begun surveying our data representatives. We tried a survey because it was very difficult to organize meetings with a big group of people, and this initial survey was establishing how they want to move forward and also testing some of the initial cases that have come to us. If you go on to the next slide. So we've had a couple of people within our network come to us who are interested in using the dataset. And we posed one of these requests to the data reps to start exploring how this governance process works in practice. Of the choirs that responded, only three wanted to consider each proposal individually. The other six were happy for us to make that decision on their behalf. And here you can see that, when considering this open model that EleutherAI wanted to produce, two of those three were happy for that to happen and one had an issue. And then the other next step that we're dealing with, which is an interesting challenge now, is how do you host a dataset like this?
How can we fit it within the AI pipeline so that it's usable, efficient, and accessible to the people who have a genuine interest in using it, while we maintain our control over it? We've explored a couple of options. I've uploaded a sample repo to Hugging Face. You can find it with that link; I can share that. And we've made use of their gated dataset feature, which requires manual approval and agreement to the template license. This is definitely an area for future work, because it doesn't feel entirely satisfactory; it doesn't feel like a perfect system. We still feel a huge amount of responsibility in uploading this dataset and sharing it with anyone. And how we manage that relationship is a whole new range of questions for us to deal with. But there's some potential here, I think, in using gated datasets. Can we jump to the penultimate slide? I think maybe the last slide. Oh, sorry, I'd already jumped ahead to this slide. This was me talking about the Hugging Face dataset. We can go to the next one. Yeah. So, I guess just to conclude a little bit, there's lots of scope here for more research, but I think what we've tried to prove really is that within the GLAM institutions, the galleries, libraries, archives, and museums, there is a lot of potential for this public interest AI research. The big caveat is that it requires substantial resources and a level of investment for proper data stewardship. And we're hoping that we can keep banging this drum to show the potential of the kind of work that can be done within the cultural sector, and hopefully get some resources to continue this work. I know we're already at quarter to, so maybe I'll wrap up there. Jen, I don't know if you had any final comments before we jump into some questions. Cool.
Thank you.
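The consultation step described above, where some choirs delegate decisions to the intermediary and others review each request themselves, can be sketched as a small Python data model. Everything here is an assumption made for illustration: the choir names, the `AccessRequest` and `summarize` names, and in particular the rule that any objection or missing response flags the request for discussion. The talk did not state how an objection is resolved, and this is not the project's actual tooling.

```python
from dataclasses import dataclass, field

DELEGATE = "delegate"  # choir lets the intermediary decide on its behalf
REVIEW = "review"      # choir wants to consider each request itself


@dataclass
class AccessRequest:
    requester: str
    purpose: str
    # choir name -> True (happy to proceed) or False (has an issue)
    responses: dict = field(default_factory=dict)


def summarize(request, choir_modes):
    """Tally one request across choirs; no approval rule is applied here."""
    reviewers = [c for c, mode in choir_modes.items() if mode == REVIEW]
    delegated = [c for c, mode in choir_modes.items() if mode == DELEGATE]
    approvals = [c for c in reviewers if request.responses.get(c) is True]
    objections = [c for c in reviewers if request.responses.get(c) is False]
    pending = [c for c in reviewers if c not in request.responses]
    return {
        "delegated": delegated,
        "approvals": approvals,
        "objections": objections,
        "pending": pending,
        # Hypothetical rule: any objection or missing reply triggers a conversation.
        "needs_discussion": bool(objections or pending),
    }


# Hypothetical setup mirroring the numbers in the talk: six choirs delegate,
# three review individually; two of those are happy and one has an issue.
CHOIR_MODES = {f"choir_{i}": DELEGATE for i in range(1, 7)}
CHOIR_MODES.update({"choir_7": REVIEW, "choir_8": REVIEW, "choir_9": REVIEW})

REQUEST = AccessRequest(
    requester="example_lab",            # placeholder requester name
    purpose="open model training",
    responses={"choir_7": True, "choir_8": True, "choir_9": False},
)

SUMMARY = summarize(REQUEST, CHOIR_MODES)
print(SUMMARY)
```

Keeping the tally separate from any approval rule leaves the actual decision with the data representatives and the intermediary, which seems closer to the process described than an automatic vote count.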
Speaker 1
2:30 – 2:30
Awesome. Thank you both so much. This was such a rich presentation, and I must say, the chat is popping off. And thank you, Jen, for getting in there and answering so many questions live. My question was answered, and I feel satisfied.
Speaker 2
2:45 – 2:45
I couldn't even see the chat, so I didn't realize how much chat there was going on. This is great.
Speaker 1
3:00 – 3:00
Yeah. Yeah. I'm curious. Maybe, Jen, actually, since you can see what's in the chat even better than I do, maybe there was a question that you wanted to verbalize, and we can discuss.
Speaker 3
3:15 – 3:15
Sounds great. Maybe I'll share a few thoughts now because I actually have to leave in five minutes. But thank you all for such an interesting discussion in the chat. I think there were quite a few questions about how we had those initial conversations about AI. What even is AI? What even is voice AI? And I just wanna share the public deck we ended up making, which is openly available as well. Something that I think was actually quite helpful is that we didn't really come from an angle of saying anything is wrong or right; we just shared the many case studies that exist now and have popped up every month. Some of the participants shared a few as well. So Jacob Collier and what he does in his concerts around creating choirs out of the audience was an interesting case study. We also had OpenAI and the initial release of their voice assistant, which sounded essentially exactly like Scarlett Johansson. So that was an interesting one to react to as well. But some of our participants also raised examples. Like, there was a country singer, I'm forgetting his name now, but he had a stroke ten years ago and essentially was able to release a new single last year using old recordings of his voice. So you can already see across those examples the different levels of empowerment and the different impacts on the original voice contributor to the process, and also how actively they are involved in the process. So with those examples, I feel like we had a really good conversation across the three public conversations we had, where people felt like they had a better sense of what they wanted out of the project. Something that was quite interesting was that I think there was this almost gut reaction, when we started talking about law and rights, that our intention was actually to take away rights or take away power.
So there was a lot we had to do to really explain that actually we were trying to do the opposite, and we were trying to rewrite this tendency to just sign away everything. It was a little sad that in some of the first conversations, some of the initial reactions the choirs had were like, oh, yeah, just take it all. Like, don't worry, take it all. We just wanna join the project. And we're like, no, wait. Actually, we want to do the opposite. So thinking about the kinds of experiences and tools that can help change that default response, to accept all cookies or take all my rights, I think will be an ongoing challenge.
Speaker 5
3:30 – 3:30
I just wanna speak to that a second. I am part of the IEEE P7012 committee, and we are coming up with something we call MyTerms, and it's being finalized right now by the IEEE, so that instead of having to sign away terms of service that get pushed to you, we can proactively push our terms of service at all service providers. And there are similar structures; I'm working with Doc Searls with regards to doing all that. So I'm gonna connect you with so many people, it's ridiculous. So what's the best way to get in contact with you? That's my question.
Speaker 2
3:45 – 3:45
Thank you so much, Steve. That sounds brilliant. We'll drop all our emails in the chat. That sounds fantastic. And I think just to add to what Jen said, I think some of the power of the initial proposal from Holly Herndon and Mat Dryhurst was, you know, using choral singing as effectively a metaphor for AI training. And this is something I experienced while singing in one of the recordings: it does this interesting thing where you can almost sense the significance of this, like, the whole is more than the sum of its parts kind of feeling that's true of AI training. And I think it's a useful entry point for people who are not familiar with AI to enter into understanding what this means, what it means for you as an individual versus the collective, and the value of collectivizing your rights or your bargaining power versus trying to tackle it as an individual. So, yeah, just wanted to add that.
Speaker 5
4:00 – 4:00
And I'll even throw in that I have an alternate, much more benign form of public protest, which would go perfectly for this, for securing the rights, that has a singing theme that's pre themed serpentine already. The Serpentine Street sing, I will include that as well.
Speaker 2
4:15 – 4:15
Brilliant. Thank you.
Speaker 1
4:30 – 4:30
I have a question, and anyone at this point, if your question has not been answered or you have more questions, feel free to keep sharing in the chat or unmute. But I have a question around data trust members and the model as it continues to grow, because I feel like, yeah, AI is constantly growing because datasets are constantly growing. And so models are continuously being built and trained and used in the world. And I would love to hear about how data trust members can express their opinions on the ways that models are being used. Jennifer, I feel like you said you have to hop off, so you're probably saying goodbye. But thank you so much.
Speaker 2
4:45 – 4:45
I think Jen has already left, and they're just frozen. There we go. So sorry, Valerie. Your question is about, I guess, the ongoing updating, the kind of continuous updating of the preferences into the agreements?
Speaker 1
5:00 – 5:00
Yeah, exactly.
Speaker 2
5:15 – 5:15
Yeah. I mean, I think it is a massive challenge with the way that, you know, it's never a one-off process with model development. And this is, you know, preferences one week versus preferences the next, training one model last week and training one the following week. I think that's a massive challenge for us in terms of trying to assert our preferences, and it's something that we're going through. A big debate in the UK at the moment is the AI copyright consultation that the government is doing, and they're talking about opt-outs. And, you know, I think there's something interesting about the challenge of opting out once versus opting out today or tomorrow, last week, the week before, and it's kind of a massive problem. In terms of our project, I mean, communicating and working with over 300 people on a regular basis is a massive challenge. So we set up this data representative group that has a representative from each choir. And the idea is that we periodically check in with them to check either that the governance still meets their desires or to check new edge cases that might come up, or unexpected opportunities, perhaps, you know, something that we haven't considered before. So there's this opportunity for ongoing interaction. But, yeah, I think it is a big challenge because these things shift, and you might change your mind about what you wanted to commit to last week.
Speaker 1
5:30 – 5:30
Totally. Well, thanks. Any other questions for folks?
Speaker 4
5:45 – 5:45
I just wanted to mention there was a talk I saw of Mat talking about Have I Been Trained? So the option we're talking about is people opting in, but this is the opposite: opting out. And he talked about how that was a big project that came out from The Call at the Serpentine, but I don't know the background on how that evolved, and if the Serpentine was directly involved or is that a spin-off? If you want to talk to that.
Speaker 2
6:00 – 6:00
Yes. Have I Been Trained predates the Serpentine collaboration. So that's something that Spawning has been working on for quite a number of years now. I can't remember when they first released it. But, yeah, it's definitely one of the few examples of anyone who's built a kind of workable tool for opting out. Whether opt-out is ultimately the best option for artists is, you know, it seems like now there's a consensus that opting out is our only choice. But, you know, I like to keep the flag flying for opt-in as an option. But no, it's amazing work that they've done. And I think the interesting thing with working with Holly and Mat as collaborators on this project is they are artists, but they're deeply involved in these bigger infrastructural questions around data and how we manage that in and across different domains. So they're kind of like the dream partners for the kind of work we wanna do, which is really not just looking at the creative, artistic, or technological R&D, but really thinking about the infrastructural questions and how we can contribute to that.
Speaker 1
6:15 – 6:15
I have not a question, but a comment or a reflection perhaps, which is just hearing this really broken down, each step in the process. It's so crazy and wild to contrast this process that y'all have created with what is the standard.
Speaker 2
6:30 – 6:30
Yeah. Mhmm. Yeah. I mean, I think, again, this is another interesting thing, I guess, for our department and also Holly and Mat, which is that we're interested in every level of this stack. Like, we're interested in, you know, the data collection and the relationship with choirs and building that trust, building that into that sort of interaction and collaboration, but also interested in the technical development of these models. And so, yeah, it's kind of like trying to intervene at each stage. And I think that comes back to Holly's provocation at the top there, which is, if all media is training data, you know, how do we produce beautiful training data, or how do we produce beautiful processes for it? And, you know, how can we do that? And in order to do that, you need to be involved at every step of the game. You know, it has come up that this was like an experiment in good faith AI model development. And, you know, the reality is probably more adversarial, where there are competing demands and competing interests, where there's more tension and more negotiation needed. This was a very, you know, idealized version, shall we say. So it's interesting to see, yeah, how we could translate this, push this forward, or perhaps work with other people in the future to see how this plays out in a more adversarial kind of context. Yeah. The gold standard. Yeah. Hopefully. Yes.
Speaker 6
6:45 – 6:45
The last point you just made about what it might look like in a more adversarial context
Speaker 2
7:00 – 7:00
Yeah.
Speaker 4
7:15 – 7:15
Can you
Speaker 6
7:30 – 7:30
tell us a little bit more about if any soloists or choirs wanted to really exercise their exit option? Like, they kinda went through the process, or halfway through it, and they realized, oh, man, this is not for me. Like, (a), did that happen? And if so, how did you all navigate that? Is it actually even feasible to extract their data from the broader dataset once the model has been trained?
Speaker 2
7:45 – 7:45
Yeah. I mean, it's a great question. It didn't happen, as far as I understand. And in terms of, yeah, removing yourself from the dataset, I mean, it wouldn't be impossible because each choir was recorded individually. So we could, you know, segment it according to each choir and only include the choirs where we had 100% approval, etcetera, etcetera. It does get more complicated once the model's already trained, with questions about them wanting to remove themselves from that. As I understand it, you know, with the agreements, the kind of trust that they had in the artists, in Holly and Mat, and the way they approached it, meant that we didn't have these kinds of issues. But you saw there, in the data preferences in the survey that we recently did, that one of the choirs wasn't interested in sharing their data with this US startup even though it was a nonprofit, open source kind of model development. So there is, like, dissent, or there are different opinions. Quite how we deal with that, I'm not quite sure yet. But, yeah, it could have happened, but I think we were lucky enough that it didn't.
Speaker 6
8:00 – 8:00
Yeah. No. Thanks for that. I can imagine, like, if we kinda zoom out a little bit from, as Val said, the gold standard model of this. Yes. And just, like, digging into that part. Right? It's like, oh, how do we think through the actual portability, the exit option, in terms of this? But so glad you guys didn't have to deal with it. That's very cool too.
Speaker 2
8:15 – 8:15
No. But I think it's super interesting. I mean, there are huge areas of further research that we haven't done yet or, ultimately, will never have the scope to do. And I think some of these questions would be amazing to dig deeper into and explore in detail. And also, like you say, we only have 60% of the signatures, and I think there are gaps in what we were able to do. Like, it would have been amazing to work with a designer who has that experience of managing preferences and signatures with communities. That's a whole skill set that we didn't have in this project. It would have been amazing to have. So, yeah, there are gaps, but, hopefully, that's exciting for everyone to jump into.
Speaker 6
8:30 – 8:30
Thanks, Tommy.
Speaker 2
8:45 – 8:45
No worries.
Speaker 1
9:00 – 9:00
Awesome. Yeah. Thank you so much, Tommy and Jennifer, for joining us. We will end as we always do by giving our speakers a round of applause, unmuting and showing our love. Thank you. Thank you.
Speaker 4
9:15 – 9:15
So just knowing yourself. And super lastly, the right way no more babies if you're I love
Speaker 1
9:30 – 9:30
the TV in the background.
Speaker 2
9:45 – 9:45
That's great. Thank you.
Speaker 1
10:00 – 10:00
Alright. Y'all, thanks for coming. Have a great rest of your days, and hopefully see you again at another Metagov seminar soon. Bye, everyone.