Introducing KOI: Coco, Miller, Rennie, Zargham

Speaker 1 0:00 – 0:00

Cool. Hi. Hello, everybody. Welcome to another Metagov seminar. It's nice to be back here. It's nice to see so many people here today. I think right now we're about 30 people, with people still trickling in. It's lovely. Lots of new faces, lots of fresh faces, lots of old faces. So good to see so many people here for the first time. Today's seminar is going to be on a project called KOI Pond. We're gonna be discussing KOIs, knowledge organization infrastructures, and how they integrate into Metagov's ecosystem. Today's presenters are Brooke Ann Coco, Luke Miller, Ellie Rennie, and Michael Zargham. And I'm basically working from KOI during this presentation. I'm Senti. I currently work at Metagov as a community manager with Sal. Name's also changing. Senti. And I'm ready. I'm using KOI. I'm trying to get as many people as possible to use it and become malleable and agentic with it. So yeah. Cool. Okay. Good morning. Welcome, everyone. I'm gonna be the host of the session. We're gonna do a thirty-minute presentation by the four presenters today. They're gonna go quick. I'm gonna keep things moving fast. I'm gonna be in the chat, answering questions and working with people. If you have questions as we're going, post them. After they present for thirty minutes, we're gonna do thirty minutes of recorded discussion. I'm gonna try to collate and compress all the questions I get. I might be able to throw them into KOI, see what KOI says, and then pose them to our panel of presenters. Then we're gonna do thirty minutes of unrecorded after-seminar discussion where we can follow up and really do a deep dive into the tank. So with that, who am I passing off to first? I think it's Ellie. Right? Yep. Cool. Ellie, take it away.

Speaker 2 0:15 – 0:15

Sure. So KOI stands for knowledge organization infrastructure, and if Senti is KOI today, then Senti is gonna be hallucinating and saying some pretty strange things, just to warn you, for reasons that we will get to. It's still a very early-stage project.

Speaker 1 0:30 – 0:30

It's true.

Speaker 2 0:45 – 0:45

So yeah. Many months ago, in a GovBase call, Josh Tan was asking the question: would adding an LLM to our Slack, as in the Metagov Slack, make it easier to stay across our organization, to onboard people, to encounter Metagov's rules, those kinds of questions. Many people over many months joined the GovBase call to discuss, and thanks to all of them for their inputs. My own personal interest was whether an LLM with access to a collectively agreed or bounded set of knowledge objects could be used by an ethnographer, or anyone else seeking to understand the dynamics of the group. And around those early days, Michael Zargham, who you're about to hear from, came into the discussion to suggest that we use the KOI architecture, which had been developed by BlockScience, and he will describe it better, and connect that to an LLM as one of the many possible KOI services that we could use and develop, which we will. So on the question of what a KOI would provide that other approaches using LLMs would not, for me the answer was the ability to govern how knowledge is accessed and treated. Typically, a firm will have its system designed so that knowledge processes conform to a system's instructions. But with a KOI, the ontology of the system is local to the organization, and it can be emergent and evolving. So our interest at Metagov is how a group can co-create that ontology, how the ontology of that system changes through our community interactions and decision making, and also how we might be able to connect our knowledge to the knowledge of other groups through these systems. So I'm gonna hand over to Zargham right now.

Speaker 3 1:00 – 1:00

Awesome. Thank you. So Ellie and I are co-PIs on the KOI project at Metagov, with a kind of split between social and technical focus, but we've collaborated for many years, so this has been a really fun project to bring our works together. In particular, I wanna emphasize the fact that the technology that we're experimenting with here is not an LLM technology. It's a technology for managing information. The first version of the software that we have iterated on was actually really about indexing internal knowledge at BlockScience, where we do a lot of engineering and kind of translation work, where we're dealing with requirements, designs, algorithmic policies, rules, all sorts of different knowledge objects, ranging from just messages to documents to formal requirements to simulations to data from operational systems. And so we found ourselves managing lots of different kinds of information and needing to remain across it, and we started building out some systems. We went from indexing and internal search, slapped on an LLM as a sort of UX affordance, realized it fundamentally altered the user experience patterns and the behavior of the team members, and have since continued to iterate on our internal technology as, think of it as, an organizational infrastructure. It's the tool to make what we're already doing work better, with, as a result of our research, a strong requirement that we weren't assuming a kind of absolute ground truth or overly rigid ontology, specifically because we work across a lot of fields, and a rigid ontology actually created lots of issues. It made it very difficult for us to understand or find things. And so we iterated on our tooling.
We shared our technical implementation with the Metagov team, and we started this process of instantiating another version of the same core, let's say, principles and technology, with a different ontology and with a different group of people responsible for what it knows and how it expresses what it knows. And for me, at least, this is a really exciting and compelling direction of research, because we wanna move away from models where there's one-size-fits-all technology or overly rigid right answers, toward systems that can have local sense making and even communication across regions. The longest view here is ways in which these distinct systems can talk to each other. If you're interested in the specifics of the technical motivation and architecture, I would check out the "A Language for Knowledge Networks" piece. But today, rather than talk too much about the way we imagined it, we're gonna focus on letting Luke demonstrate how it works today and the development work that's ongoing. But I believe before we get to that, I need to pass to Brooke to give you an important ethics-related public service announcement.

Speaker 4 1:15 – 1:15

Yeah. Thanks, Zargham. Hi. My name is Brooke. I am a PhD candidate at RMIT University in Melbourne, and I've been engaged with the Metagov community for about a year and a half now. As part of my PhD research, I'll be leading an ethnographic study that documents how Metagov navigates the adoption, integration, and maintenance of its KOI. The research findings are intended to provide empirical feedback to refine the project's digital governance models. So throughout the KOI project, I'll be focusing on digital governance tools and the collective governance frameworks that our community implements to better align our KOI with Metagov's values and strategic objectives. Because my research will examine how social dynamics and digital governance tools mutually interact and shape one another, ethnographic observations will specifically be focused on formal and informal governance interactions. To that end, I will be monitoring different community discussions related to KOI that occur both within Slack and in online meetings, such as this one. If you're interested in participating in the ethnographic research, I'd encourage you to attend the weekly GovBase Labs meetings that occur every Thursday at 9 PM UTC, and any other meetings in which the KOI is the topic of conversation. I'll be attending these meetings and recording ethnographic notes based on what I observe. I won't publish anybody's name or contributions unless I receive explicit consent to do so. During some KOI project meetings, I may also conduct different group exercises. Anybody who wants their participation in these exercises to be included in the ethnographic record, as well as anybody who's participating in interviews, will be asked to complete a consent form. So, very importantly, you can still contribute to the KOI Pond project without participating in the ethnographic research.
And if you wish to be excluded from the ethnographic research, you just have to notify me either through Slack or email. On one last note, to help me manage discussions occurring over Slack, I'll be using the Telescope bot, which is a participatory ethnographic tool that enables you as a community to actively participate in the ethnographic data collection. To use the bot, anybody can tag Slack posts that they think are relevant to the research with a telescope emoji. If the post is relevant, the bot will send a DM to the author of the post requesting permission to archive that post in the ethnographic database. Post authors only need to consent once. After they grant their initial consent, all subsequently tagged posts will be automatically added to the dataset. However, authors will still receive a notification every time one of their posts is tagged, and they retain the option to withhold any specific posts from the study. Authors may also withdraw any or all of their posts from the repository at any time by letting me know. Right now the Slack integration is still being developed for the Telescope bot, so in the interim I'll be doing some of that DMing manually. To review anything that I just discussed, or if you wanna learn any more about the ethnographic research, you can reference this article that I'm putting into the chat now. It's also been bookmarked in the Slack channels that are being observed. If you have any questions, just let me know. But I know we're all excited to see the demo that Luke's gonna give, so I will pass it off to Luke. Thanks.
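
The consent flow Brooke describes for the Telescope bot (tag, request permission once, archive later tags automatically, allow withdrawal) can be sketched as a small state holder. This is only an illustration under stated assumptions: the class, method names, and notification labels are hypothetical and are not taken from the actual bot.

```python
class TelescopeArchive:
    """Hypothetical model of the tag -> consent -> archive flow."""

    def __init__(self):
        self.consented = set()    # authors who have granted initial consent
        self.pending = {}         # post_id -> author still awaiting consent
        self.archive = set()      # post_ids stored in the ethnographic dataset
        self.notifications = []   # (author, post_id, kind) messages sent

    def tag(self, post_id, author):
        """Someone tagged a post with the telescope emoji."""
        if author in self.consented:
            # Consent was given once already: archive, but still notify.
            self.archive.add(post_id)
            self.notifications.append((author, post_id, "notice"))
        else:
            # First tagged post for this author: ask for permission.
            self.pending[post_id] = author
            self.notifications.append((author, post_id, "request"))

    def grant_consent(self, author):
        """Author grants initial consent; their pending posts are archived."""
        self.consented.add(author)
        for post_id, pending_author in list(self.pending.items()):
            if pending_author == author:
                self.archive.add(post_id)
                del self.pending[post_id]

    def withdraw(self, post_id):
        """Author withholds or withdraws one specific post."""
        self.archive.discard(post_id)
        self.pending.pop(post_id, None)


bot = TelescopeArchive()
bot.tag("post-1", "ana")      # triggers a permission request
bot.grant_consent("ana")      # post-1 is archived
bot.tag("post-2", "ana")      # archived automatically, with a notice
bot.withdraw("post-2")        # ana withholds this one
print(sorted(bot.archive))
```

Keeping consent per author and withdrawal per post mirrors the two levels of control mentioned in the talk.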

Speaker 5 1:30 – 1:30

Alrighty. Thanks for the introduction. My name is Luke. I'm a researcher at Metagov and BlockScience, and I'm the technical lead on the KOI Pond project. So let's get started. Can everyone see my screen here?

Speaker 1 1:45 – 1:45

Yeah. Yeah. It's good. Great.

Speaker 5 2:00 – 2:00

So I'm gonna be walking through the systems that power KOI and also giving a demonstration of how you can currently use it. I'm gonna start by breaking down the acronym. KOI stands for knowledge organization infrastructure, so we can talk about each of those components and what they mean. First is knowledge. KOI is concerned with knowledge objects, and we define that as a pretty broad category of things. It's really anything that we can identify and that we want to consider as knowledge. So we have things that you'd more traditionally consider digital objects: Slack messages, blog posts, images, videos, web pages, arbitrary files. But we can also refer to more dynamic things like database queries, services, or even function calls. And we're not restricted to purely digital objects, so we could also refer to physical books, people, and locations. These are all potential knowledge objects that we might be interested in talking about through KOI. Secondly, it's about organization. This looks like a lot of different things. It looks like communal governance. It looks like automation. It looks like manual curation of knowledge that people are interested in. It looks like the ongoing development and configuration of the KOI system. And this kind of organization is really concerned with questions like: how do we represent knowledge, what knowledge do we care about observing, and how do different knowledge objects relate to each other? And finally, KOI is infrastructure. It's a set of interlinked systems enabling the organization of knowledge, connected to sensors, which are the inputs to the system and can observe knowledge objects, and to interfaces, which interact with the knowledge that we have stored in the system. So importantly, this is not just about LLMs. LLMs are our current main interface and integration with the KOI system, but we want to be able to support future tools and integrations.
And so KOI fundamentally is knowledge infrastructure. It's not a specific product or interface. The main way that we deal with knowledge objects is through this protocol we call reference identifiers, or RIDs. If you look at the drawing down here, an RID is a pointer to a knowledge object, which is that class of things I discussed in the previous slide. So the RID is merely an identifier, which provides the means of referencing any range of knowledge objects that we're interested in talking about. And the way that we get from this identifier to the knowledge and underlying data itself is by dereferencing it. The way this looks is we have a dereference function that's associated with each type of RID that we're talking about. That dereference function will take the identifiers that are stored within the RID and input those into a function which can return the data. So I'm gonna break this down a little bit. At the top here, we have the format of how RIDs are constructed. You can see that there are three main components. The first component is the space. This is a wide space in which different types of knowledge objects might exist. On the right-hand side here, you see the diagram of what the Slack space looks like. So spaces are gonna map to larger containers like platforms, such as Slack. Our second component is the format. The format refers to objects that would exist inside of the space. So in the case of Slack, we have channels, workspaces, messages, and users. And then finally, we have the reference component. This is the component of the RID which actually refers to the thing that we're talking about. So in the case of Slack, you can look at this first RID here at the bottom.
The first part, the space and the format together, comprises the means of reference, or the way in which we're referring to a thing. And then the right half of the RID here is the reference itself. The structure of this is going to be determined by the specific means of reference that that RID is using. So in the case of Slack, we have three sub-IDs here, which are broken up into the workspace ID, the channel ID, and the message ID. And when we dereference an object, we have a function that is bound to the means of reference. So slack.message has its own function, and we pass the right-hand component, the reference, into that function. It's able to break down this reference component into the IDs that are needed to make a call to the Slack API and retrieve the underlying data that's associated with that knowledge object. Below that, I have two other examples of types of knowledge objects that exist in our system. We can support Substack posts, and then we have this generic RID handler for web pages, which does a simple GET request to a URL. So RIDs form the basis of our system. It's the language that KOI uses to talk about knowledge and to coordinate interactions between the different subsystems that KOI is made up of. And the most important subsystem, which we're gonna focus on today, is the graph. I'm sure many of you are here because you're interested in knowledge graphs, so this is where KOI ties into that. I'll quickly walk through our implementation of a knowledge graph and the primitives that we use to build relationships between knowledge objects within our system. The first component we have is an object. This is any knowledge object excluding the graph relations.
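
The RID structure and the dereference step described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the `space.format:reference` text syntax, the `/` separator between sub-IDs, and every name in the code are assumptions, and the handler is a stub that does not call the Slack API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class RID:
    space: str      # wide container such as a platform, e.g. "slack"
    format: str     # kind of object inside the space, e.g. "message"
    reference: str  # the part that points at one specific object

    @property
    def means(self) -> str:
        """Means of reference: space and format together."""
        return f"{self.space}.{self.format}"

    @classmethod
    def parse(cls, text: str) -> "RID":
        # Assumed syntax: "<space>.<format>:<reference>"
        means, colon, reference = text.partition(":")
        space, dot, fmt = means.partition(".")
        if not (colon and dot and space and fmt and reference):
            raise ValueError(f"not a well-formed RID: {text!r}")
        return cls(space, fmt, reference)


# One dereference function is bound to each means of reference.
DEREFERENCERS: Dict[str, Callable[[str], dict]] = {}


def dereferencer(means: str):
    def register(fn: Callable[[str], dict]) -> Callable[[str], dict]:
        DEREFERENCERS[means] = fn
        return fn
    return register


@dereferencer("slack.message")
def deref_slack_message(reference: str) -> dict:
    # Break the reference into the three sub-IDs named in the talk.
    workspace_id, channel_id, message_id = reference.split("/")
    # A real handler would use these IDs to call the Slack API;
    # this stub just returns them.
    return {
        "workspace_id": workspace_id,
        "channel_id": channel_id,
        "message_id": message_id,
    }


def dereference(rid: RID) -> dict:
    try:
        handler = DEREFERENCERS[rid.means]
    except KeyError:
        raise LookupError(f"no dereference function for {rid.means}") from None
    return handler(rid.reference)


rid = RID.parse("slack.message:W123/C456/1700000000.000100")
print(rid.means)
print(dereference(rid))
```

Adding support for another kind of knowledge object, such as a Substack post or a generic web page, would then mean registering one more handler under its own means of reference.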
So going back to our last slide, it could be a Slack message, a Substack post, a web page, or any other type of knowledge object that we add to the RID system. Next, we have sets. These are unordered groups of objects that can be empty or contain an arbitrary number of objects. And then our last primitive is a link. This is a named, directed edge, or connection, between objects and sets that live inside of our graph. You'll notice that I said in the first point that objects are any RID objects excluding the graph relations. The reason for that is that these sets and links are actually objects that have RIDs themselves. And these are things that we can consider knowledge objects because they tell us something about the relationship between knowledge. So to look at an example, on the right-hand side we have a rough diagram of what this graph architecture might look like. We have this green node here, which has an edge pointing to the set of blue objects. All these blue nodes have their own individual edges to additional sets, which contain additional objects. In the case of Slack, this is what that might look like. We have the set of object types, or formats, to use the RID language, which is composed of messages, users, channels, and workspaces. And these are the types of relations that we might want to represent within the graph to capture the hierarchy that Slack has. So we might want to map a workspace to the set of channels and the set of users within that workspace. We might wanna map the set of messages within a channel to the channel they're in, and a user to the set of messages that they wrote. I'm just gonna pause here to show some cool graph views from our current system. These are different queries showing subgraphs of all of our Slack data.
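
The three graph primitives (objects, sets, and named directed links, where sets and links carry identifiers of their own) can be sketched as follows. The class names, the `graph.set:` and `graph.link:` identifier prefixes, and the link names are all hypothetical; the talk does not say how the real graph is stored.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class KnowledgeSet:
    rid: str                                   # the set is itself identifiable
    members: Set[str] = field(default_factory=set)


@dataclass(frozen=True)
class Link:
    rid: str      # the link is itself identifiable
    name: str     # label on the directed edge
    source: str   # RID of an object or set
    target: str   # RID of an object or set


class Graph:
    def __init__(self) -> None:
        self.sets: Dict[str, KnowledgeSet] = {}
        self.links: List[Link] = []

    def add_set(self, rid: str, members: Iterable[str] = ()) -> KnowledgeSet:
        self.sets[rid] = KnowledgeSet(rid, set(members))
        return self.sets[rid]

    def add_link(self, name: str, source: str, target: str) -> Link:
        link = Link(f"graph.link:{len(self.links)}", name, source, target)
        self.links.append(link)
        return link

    def follow(self, source: str, name: str) -> List[str]:
        """RIDs reached from `source` over links called `name`.

        A target that is a set is expanded into its (sorted) members.
        """
        found: List[str] = []
        for link in self.links:
            if link.source == source and link.name == name:
                target_set = self.sets.get(link.target)
                if target_set is not None:
                    found.extend(sorted(target_set.members))
                else:
                    found.append(link.target)
        return found


g = Graph()
# A workspace maps to the set of channels inside it.
g.add_set("graph.set:channels-of-W1",
          {"slack.channel:W1/C1", "slack.channel:W1/C2"})
g.add_link("has_channels", "slack.workspace:W1", "graph.set:channels-of-W1")
# A channel maps to the set of messages inside it.
g.add_set("graph.set:messages-in-C1",
          {"slack.message:W1/C1/1", "slack.message:W1/C1/2"})
g.add_link("has_messages", "slack.channel:W1/C1", "graph.set:messages-in-C1")

for channel in g.follow("slack.workspace:W1", "has_channels"):
    print(channel, g.follow(channel, "has_messages"))
```

Because sets and links have identifiers, they can appear as the source or target of further links, which is what lets the graph describe relationships between relationships.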
So the picture here on the left is the large graph, which shows all the connections. It gets very tangled because you have messages that belong to threads, messages that belong to users, messages that belong to channels, and all those connections pull them into these groups. On the right-hand side here, you have an isolated view of individual channels, and the clusters surrounding them are threads that live within those channels. On the left-hand side here, we have a collection of all the individual threads, not associated with any channels. And then on the right-hand side here, this is actually my user, showing all the messages that I've written and their relationship to the channels within Slack that they belong to. So I'm gonna talk about the current capabilities and then do a quick demo. Sensors are the ways that we observe new knowledge objects into the system, and we currently have these four inputs active. The first one is the live Slack sensor. This actually was just turned on right before this call, and it's now live in these four channels down here: KOI Pond, board, attention econ, and GovBase Labs. It's receiving all of the events that are sent out every time a message is sent in one of these channels, and those messages are automatically observed into the system and then added to that graph that I just showed you. The second sensor we have is the Slack backfill sensor. This is a tool that we can run once to generate a huge graph and compile all the historical Slack data that we have for Metagov. And then we have a similar sensor integration for Substack and this catch-all for websites. The Substack sensor is able to pull all of the posts from the Metagov Substack through their API. And for websites, we have a manually curated list of websites of interest that we wanted to enter into the system.
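
The difference between the live sensor and the backfill sensor can be sketched with one shared filtering rule. Everything here is an assumption made for illustration: the event dictionaries are simplified stand-ins rather than Slack's real event payloads, the channel IDs are made up, and the RID text syntax is the same hypothetical one used for illustration only.

```python
from typing import Iterable, List, Optional


class LiveSlackSensor:
    """Turns incoming message events into RID strings, for observed channels only."""

    def __init__(self, observed_channels: Iterable[str]):
        self.observed_channels = set(observed_channels)

    def handle_event(self, event: dict) -> Optional[str]:
        # Ignore anything that is not a message in an observed channel.
        if event.get("type") != "message":
            return None
        if event.get("channel") not in self.observed_channels:
            return None
        return f"slack.message:{event['workspace']}/{event['channel']}/{event['id']}"


def backfill(history: Iterable[dict], sensor: LiveSlackSensor) -> List[str]:
    """One-off pass over stored history, reusing the live sensor's rule."""
    rids = []
    for event in history:
        rid = sensor.handle_event(event)
        if rid is not None:
            rids.append(rid)
    return rids


sensor = LiveSlackSensor(observed_channels={"C-KOI"})
history = [
    {"type": "message", "workspace": "W1", "channel": "C-KOI", "id": "1"},
    {"type": "message", "workspace": "W1", "channel": "C-OTHER", "id": "2"},
    {"type": "reaction", "workspace": "W1", "channel": "C-KOI", "id": "3"},
]
observed = backfill(history, sensor)
print(observed)
```

In this sketch a sensor only produces identifiers; fetching the underlying data is left to a separate dereference step.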
So all these sensors provide the system with the knowledge objects that it uses for a set of interfaces. The interfaces are the way that we interact with the data, with the knowledge that's in KOI. As I talked about in the first slide, KOI is not an LLM, but our main interface to KOI right now is an LLM, so it is an important part of the system. The interface that we have live right now is the KOI chatbot. It's powered by an LLM using a retrieval-augmented generation system to pull knowledge objects from our KOI knowledge set and use that data to inform its response when talking to users in the Slack. And that uses a vector store system. Another subsystem is able to take the knowledge objects within KOI, embed that data into a vector space, and then use semantic search capabilities to help the LLM pull down data that's relevant to the conversation that it's having with the user. So to be clear, the data observation from the live Slack sensor is happening in these four channels listed here, but you can talk to KOI across any public Metagov channel. And if you look on the right side here, it will be able to cite all the sources and knowledge that it pulled down. So let's quickly ask a question. I'm just in the KOI Pond channel here, and we're gonna ask it: tell us about the KOI Pond project. It's still a very basic system, but it has some capabilities. If you have a conversation within a thread and you continue that conversation within the thread, it will retain the context and history of previous chat messages. So it came up with this answer, and we can take a look at the sources it used. It's taking the Metagov KOI Pond spotlight post from the Substack. It's taking the Metagov KOI Pond project page, and it's also taking some Slack messages which were talking about the project.
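
The retrieval step behind the chatbot (embed knowledge objects, find the ones nearest to the question, hand them to the LLM with their sources) can be sketched without any external service. The word-count "embedding" below is a toy stand-in chosen so the example runs on its own; the talk does not name the embedding model or vector store actually used, and the RIDs and texts are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (assumption, not the real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, store: Dict[str, str], k: int = 2) -> List[Tuple[str, str]]:
    """Return the k (rid, text) pairs most similar to the question."""
    query = embed(question)
    ranked = sorted(store.items(),
                    key=lambda item: cosine(query, embed(item[1])),
                    reverse=True)
    return ranked[:k]


def build_prompt(question: str, store: Dict[str, str], k: int = 2) -> str:
    """Assemble the text an LLM would receive, with sources labelled by RID."""
    hits = retrieve(question, store, k)
    context = "\n".join(f"[{rid}] {text}" for rid, text in hits)
    return ("Answer using only these sources and cite their identifiers.\n"
            f"{context}\n\nQuestion: {question}")


store = {
    "substack.post:koi-pond-spotlight":
        "KOI Pond is a Metagov project about knowledge organization infrastructure",
    "slack.message:W1/C1/1":
        "Luke is leading development of the KOI Pond project",
    "web.page:lunch":
        "Lunch menu for the seminar",
}
question = "Who is leading KOI Pond development?"
print(build_prompt(question, store))
```

Because each retrieved passage is labelled with its identifier, the answer can cite its sources, which is the behavior shown in the demo.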
And so to use KOI, you have to specifically mention it, using @koi, even in a thread. It's only viewing the messages in which you specifically mention the bot. So I can follow up and ask, you know, how does Luke Miller connect to the KOI Pond project? And it's going to remember the context of this conversation and pull down some more messages. So, yeah, there we go. I'm leading the development. That's correct. And then we have the sources here. Cool. So, yeah, feel free to try this out during or after the call. You can talk to KOI in any channel and hopefully find out some stuff about Metagov. Who have you got? So this LLM integration is still pretty new. It has a lot of limitations. It's not currently using any of the relational graph data that I talked about a few slides back. It has just very basic RAG functionality. It's just taking the prompt that you give it, retrieving related vectors from the vector store, which map to knowledge objects, and using those in its response. Because of that, and because of the failings and limitations I think we all know about when talking with LLMs, it can misrepresent people and knowledge. Even if the knowledge is in the system, it might not find it. Even if it finds the right knowledge, it still might make mistakes. So keep that in mind. So, yeah, that's the end of the presentation. Before we go, I just wanna shout out a few things. This project is open source. You can check out the repository here. Join the KOI Pond channel if you're interested in talking about this further. There should be a link in the Zoom chat if you're not in the Metagov Slack already. And then one last thing is we're doing a pull-down of a bunch of historical Slack data on Friday.
So to go back to this slide, we're currently running the live Slack sensor, but we're gonna be running the Slack backfill sensor on Friday with the data of everyone who wants to opt in. So if you want your historical messages across those four channels that we talked about to be part of the KOI knowledge base, then you'll have to opt in by Friday, and I can quickly show how to do that. Go to Slack, click on your profile, and scroll down to the bottom. We have this About Me section. If you hit edit, this is where you can change your data export consent status. So if you want to opt in to this historical data, make sure you set this to the green dot, "I consent", by Friday at noon. Thank you.
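
The opt-in rule for the historical backfill can be sketched as a filter: a message is included only if it sits in an observed channel and its author has set their status to consent. The field names, status strings, and channel IDs below are hypothetical; only the opt-in behavior comes from the talk.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Message:
    rid: str
    author: str
    channel: str


def select_for_backfill(messages: Iterable[Message],
                        consent_status: Dict[str, str],
                        observed_channels: Set[str]) -> List[Message]:
    """Keep messages from observed channels whose author has opted in.

    Opt-in means a missing or unknown status is treated as no consent.
    """
    return [
        m for m in messages
        if m.channel in observed_channels
        and consent_status.get(m.author) == "consent"
    ]


messages = [
    Message("slack.message:W1/C-KOI/1", author="ana", channel="C-KOI"),
    Message("slack.message:W1/C-KOI/2", author="ben", channel="C-KOI"),
    Message("slack.message:W1/C-RANDOM/3", author="ana", channel="C-RANDOM"),
    Message("slack.message:W1/C-KOI/4", author="cy", channel="C-KOI"),
]
consent_status = {"ana": "consent", "ben": "no consent"}  # cy never set a status

selected = select_for_backfill(messages, consent_status, {"C-KOI"})
print([m.rid for m in selected])
```

Defaulting to exclusion when no status is set is the design choice that makes this opt-in rather than opt-out.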

Speaker 1 2:15 – 2:15

Nice. They're under thirty minutes. Beautiful stuff. We got a lot of questions here. Okay. I think what I'm gonna do is share my screen, and let's get KOI to put some questions together. So we've got these questions here. "I'm collating many questions from the ongoing Metagov seminar. Please put them into a question." So let's see. Let's get KOI going while I'm doing this. Let's see how it works. So you got this little brain going on, and now we have a response here. Okay. We've got six questions here. Don't know if these are actually really that helpful. But let's go with the first one, and then I'll backfill from there. So, actually, let's do a little curation here. How does the KOI tech stack serve the interests of function users compared to other platforms? Who wants to take on that kind of hallucinated question?

Speaker 5 2:30 – 2:30

I can take a stab at it.

Speaker 6 2:45 – 2:45

Okay. Go ahead.

Speaker 5 3:00 – 3:00

I'm not quite sure what KOI means by function users.

Speaker 1 3:15 – 3:15

Yeah. Neither do I.

Speaker 5 3:30 – 3:30

But I think the main advantage of KOI here, and this is something we're working towards, which the system is not currently capable of, is the communal management of this knowledge set. With a lot of these technologies like LLMs, you have this huge amount of data that's vacuumed up by OpenAI or other training platforms and all goes into the system. And there's very little transparency or control over how that data is used. So I think KOI provides a lot of advantages in that we can have very granular control over which knowledge objects we want to include in the system. And we can do things like manage a specific set of objects that we wanna use for a certain purpose. So we could be able to represent a wide range of consent statuses, or intentions for the use of data, through communal governance processes.

Speaker 1 3:45 – 3:45

Okay. Maybe the next follow-on to that question, which kinda touches on the RID stuff, is: is this format analogous to a type in programming, or is it more flexible? So talk a little bit about RIDs, the idea of type systems, and how you're doing referencing in the system.

Speaker 5 4:00 – 4:00

Yeah. Good question. I would say it's not really analogous to a type. If you're familiar with the URI specification, it's more like a scheme. In the case of URLs, you have the HTTP at the beginning, which tells your browser what kind of protocol to use to get the data associated with that URL. RIDs are similar in that the means of reference, those first two components, the space and the format, tells the system what function to run in order to retrieve the data. But it doesn't necessarily say anything about what type or form the data will be in when it's retrieved from its source.
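
The comparison with URI schemes can be checked with Python's standard `urllib.parse`. The example assumes, purely for illustration, that an RID is written as `space.format:reference`; under that assumption the means of reference occupies the same position as a URL's scheme, and like a scheme it selects how to fetch the data without saying what type the data will have.

```python
from urllib.parse import urlsplit


def scheme_and_rest(identifier: str):
    """Split an identifier into its scheme and the remainder."""
    parts = urlsplit(identifier)
    return parts.scheme, parts.netloc + parts.path


for identifier in (
    "https://example.org/page",
    "slack.message:W1/C1/1700000000.000100",  # hypothetical RID syntax
):
    scheme, rest = scheme_and_rest(identifier)
    print(f"scheme={scheme!r} rest={rest!r}")
```

Dots are legal characters in a URI scheme, which is why a generic URI parser accepts `slack.message` in that position.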

Speaker 3 4:15 – 4:15

Yeah. And I wanna jump in here. Actually, one of the major areas of research, and Luke has been really pushing this ahead, is comparing and contrasting the requirements that gave rise to RIDs in order to build the system with the specified functionality of URIs. What we've observed is that what we need is something that behaves like a URI, but we also wanna equip it with some potentially extra requirements. By satisfying URI specs, we get a high degree of interoperability. But by specifying some additional needs, we can be more specific, whether it's more specific because the organization wants something different, or more specific in the sense that the data service or dataset that's being referenced, or whatever the kind of knowledge object is, might need some specificity in order to make sense of it. And that actually ends up being one of the places where organization-specific stuff comes into existence. So the way that we treat Slack is actually born of the values and preferences of the organization, Metagov. We could, in an extreme case, be like, hey, according to the terms of service of Slack, we get to access all this data, but that's not aligned with the values of Metagov as an organization. So the way in which we are referencing, accessing, and making use of data from Slack is bound by the policies that we are implementing via the software, and the patterns which we can support in our KOI are different from the patterns that are supported in other software because we've designed it with different requirements in mind.

Speaker 1 4:30 – 4:30

Maybe there's someone already asking a question in KOI Pond. I wanna do a kind of unprecedented move here, but I think it's a good question. Why does KOI start with Slack and Substack and not some open source projects? There's an answer here from KOI that I'll post in the chat, but I'm curious to see how the answer that KOI gives aligns with the answers that the panelists give.

Speaker 3 4:45 – 4:45

I'm gonna take a cut, but I think we should actually all answer this question maybe. My opinion is that the development of a technology like KOI is not really a purely academic project. It's actually operational. We have pain points where there's lots of information. We're trying to find it. We're trying to make sense of it. And what ends up happening is that the lowest-hanging fruit comes first. On a day-to-day basis, we are mostly interacting via Slack, and Slack is already very chat based, so it was a natural location for these interactions. And the Substack posts are the most immediately necessary information to distribute from the perspective of this organization, because they contain information about projects. It's the Substack as well as the website which carry project information and information about policies and rules within the org. So you could just look at it from this really, honestly, boring perspective: that's the interface people are using, and that's the most easily accessible data that they actually need and want and are currently benefiting from getting this way. Iteratively extending to other stuff is very much on the roadmap. But, you know, start with something real. Make it work. Try to make it better.

Speaker 1 5:00 – 5:00

Any of the other panelists wanna jump in?

Speaker 2 5:15 – 5:15

Just to say that we often think about very formal interactions as being superior to day-to-day, more conversational interactions and questioning and decision making kinda on the move. And so one thing we're interested in with this project is how those informal governance interactions, which don't even feel like governance, are actually steering an organization, and whether we can have technologies that are responsive to that. So the more ethnographic components of this work, there's Brooke's PhD, Kelsey's doing some work on this on the BlockScience side, and I'm kind of across the project in general, are concerned with those types of questions as well. David?

Speaker 6 5:30 – 5:30

I was just gonna say that the purpose here, in my mind, is to be a tool for an organization. I think Zargham said it to some extent. If GitHub repos are the be-all and end-all of your organization, then by all means, that's the first sort of sensor that you wanna build out. But, you know, Block Science publishes a bunch of papers, so we brought in a bunch of papers, and MediGov talks a lot, so it's the chat interfaces that are most interesting here.

Speaker 5 5:45 – 5:45

Well, I'll jump in at the end here. I agree with everything that was said. I wanna take it from the technical side, though. I think one of the key advantages of Koi and the RID system is that we're really not dependent on any of these platforms. So in this case, Substack and Slack do not form, like, a core part of our infrastructure. They're merely a source for a potential set of knowledge objects. And the sensors that we build to interface with those platforms, and the integrations we build to do kind of automatic observation, those are all open source, and so anyone can kinda take advantage of that. And at the same time, we wanna make it easier for future communities that use Koi to build their own sensors for platforms that have knowledge they're interested in observing that don't exist already. So that would definitely include open source stuff too.
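The sensor idea described in this answer can be sketched roughly as follows. This is a minimal illustration, not Koi's actual code: the names `KnowledgeObject`, `Sensor`, `InMemorySlackSensor`, and `collect`, and the reference string format, are all hypothetical, and the Slack "sensor" reads from an in-memory list rather than calling any real API.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class KnowledgeObject:
    """Hypothetical record: a pointer to where content lives, plus its text."""
    reference: str  # made-up format, e.g. "slack:message/C1/1.0"
    text: str


class Sensor(Protocol):
    """Anything that can observe a platform and yield knowledge objects."""

    def observe(self) -> Iterable[KnowledgeObject]: ...


class InMemorySlackSensor:
    """Stand-in for a Slack integration; reads from a list, not an API."""

    def __init__(self, channel: str, messages: list[tuple[str, str]]):
        self.channel = channel
        self.messages = messages  # (timestamp, text) pairs

    def observe(self) -> Iterable[KnowledgeObject]:
        for ts, text in self.messages:
            yield KnowledgeObject(f"slack:message/{self.channel}/{ts}", text)


def collect(sensors: Iterable[Sensor]) -> dict[str, KnowledgeObject]:
    """Index everything the sensors report, keyed by reference."""
    index: dict[str, KnowledgeObject] = {}
    for sensor in sensors:
        for obj in sensor.observe():
            index[obj.reference] = obj
    return index
```

Under these assumptions, a community that cares about a different platform would only need to write another class with an `observe` method; nothing in `collect` depends on Slack or Substack specifically.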

Speaker 1 6:30 – 6:30

Cool. Great. Thank you. Okay. Let me see if I can try to, like, compound these two things together. There are sort of two questions around the kind of resonance that this project has with the semantic web. I know RFPs also get brought up a lot with this. I mean, the question is whether or not it's an accurate or inaccurate analogy to make between this project and the semantic web. And alongside that, I'm curious if you have any thoughts on ways in which the kind of latent semantic undersurface of this might be able to kind of naively represent itself in a way that actually starts to have an influence on the way that people are interacting with each other in their digital environments. Currently, it seems like the way that the system is set up, I mean, every time it posts, I feel like I'm just getting spammed with, like, 20 links. And it's also incredibly verbose, and it goes so quick, in a way that a human would never actually type. Like, these kinds of frictions that are built into systems that are otherwise automated in order to give them a sense of, not humanity, but to kind of expand the interaction time-space. I'm just curious if you could talk about how the system might be more adeptly situated within its environment so that it becomes a kind of interlocutor or collaborator, a kind of agentic force within an organization rather than just a kind of butler or server. And maybe you can try to make a connection between that question and this idea of the semantic web. A little bit abstract, but you all are very smart. So I'm curious to see where you take it.

Speaker 3 6:45 – 6:45

I wanna react because I kinda oh, go ahead, Ellie.

Speaker 2 7:00 – 7:00

No. I was also just gonna say, for the more technically minded people on our team, add to that whether the way that we're using it and the kinds of responses it's giving are also just a function of it being plugged into ChatGPT at the moment, and whether other, more localized AIs might give us a different kind of interaction that feels more homely for us.

Speaker 3 7:15 – 7:15

Yeah. So the thing I was gonna react to was your comment about it being not just a butler. I think it's important to differentiate: there's sort of a functional architecture going on, with different components filling different functions. And the thing that's just like a butler is the LLM. It's sort of orchestrating. We're not trying to leverage the large language model, the thing that hallucinates, to do kind of complex or specific tasks. We're actually imagining, and in some cases implementing, other services that do the specific tasks, and it just needs to know which things to invoke. So one of the practical use cases that we've discussed with you, Senti, as well is, you know, support for onboarding, and whether we wanna have a specific kind of bot that helps match people with potential mentors. Those would be more traditional microservices, which would have data available to them. They would have specific rules. Even if you accessed them by way of a conversational interface in Slack, it's really just the ability of the chat interface to identify what you're looking for, potentially through an RID, a pointer to it, invoke it, get its response, and return it to you. Some of the verbosity of what we have right now comes from the fact that we predominantly have a chatbot and a bunch of documents and messages, and a big part of what we're doing with the battery of links is just making sure that we're limiting our hallucinations and understanding whether it's answering in a way consistent with the materials versus making stuff up. And this is also part of the earliness of the project. Another piece of the research that hasn't been presented here, but I think is worth alluding to, is the kind of requirements engineering framework side of these kinds of systems.
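The "butler" arrangement described above, where the language model only decides which service to invoke and the service does the actual work, might look something like this. It is a sketch under stated assumptions: the service names, the identifier strings, the mentor data, and the keyword rule standing in for the LLM are all made up for illustration.

```python
from typing import Callable


# Hypothetical services, each a plain function with its own data and rules.
def mentor_match(query: str) -> str:
    mentors = {"governance": "Ada", "simulation": "Grace"}  # made-up data
    for topic, name in mentors.items():
        if topic in query.lower():
            return f"Suggested mentor for {topic}: {name}"
    return "No mentor found for that topic."


def doc_lookup(query: str) -> str:
    return f"Searching indexed documents for: {query}"


# Registry keyed by a made-up identifier for each service.
SERVICES: dict[str, Callable[[str], str]] = {
    "service:mentor-match": mentor_match,
    "service:doc-lookup": doc_lookup,
}


def pick_service(message: str) -> str:
    """Stand-in for the LLM's only job here: choose which service to invoke."""
    if "mentor" in message.lower():
        return "service:mentor-match"
    return "service:doc-lookup"


def handle(message: str) -> str:
    """Route the message to the chosen service and return its response."""
    return SERVICES[pick_service(message)](message)
```

The point of the split is that the part prone to hallucination never produces the answer itself; it only selects an identifier from a fixed registry.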
And so at Block Science, we have a pilot called Blocks of Docs where we actually characterize specific requirements around blocks of text, and we're working on assessing the extent to which LLMs can satisfy them, or which things that are returned by LLMs can be verified to satisfy them. And the important thing about that capability is it ultimately provides us some of the governance surface of these tools. If we don't want it to be so verbose, we need a way of saying that and, technically, infrastructuring it to be less verbose. Right now, we have a lot of what I'll call default behavior, in the sense that we just got it up and running. We're inviting people to participate in its governance, and we actually rather intentionally avoided making too many decisions about how it should behave because we wanted to present what we had, announce it, solicit feedback, and engage the broader community in deciding: how do we want it to behave? What data should it have access to? What kind of data services should be in this mesh? And, ultimately, the RID system is there to help make sure that we can identify those things and assert rules about them, and ultimately both deliver an infrastructure, a knowledge organization infrastructure, to this organization and ensure that the governance surface of it is kinda exposed to the people in the community. And that creates some UX challenges. In the long run, you can't totally govern a thing if you don't spend some time making sense of it, learning about it. And so we're trying really hard to strike the balance between technically building new capabilities, sharing those capabilities, and engaging this broader community in how best to make use of them. And so I'm gonna stop there. I realize that was a little bit long.
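One way to picture requirements on a block of text as a governance surface is a list of named checks that a community can add to or tighten. The sketch below is an assumption-laden illustration, not the pilot mentioned above: `Requirement`, `max_words`, `max_links`, and `failed` are hypothetical names, and word and link counts are just the simplest stand-ins for "less verbose".

```python
import re
from dataclasses import dataclass
from typing import Callable

URL_PATTERN = re.compile(r"https?://\S+")


@dataclass(frozen=True)
class Requirement:
    """A named, checkable rule about a block of text (hypothetical type)."""
    name: str
    is_satisfied: Callable[[str], bool]


def max_words(limit: int) -> Requirement:
    return Requirement(
        f"at most {limit} words",
        lambda text: len(text.split()) <= limit,
    )


def max_links(limit: int) -> Requirement:
    return Requirement(
        f"at most {limit} links",
        lambda text: len(URL_PATTERN.findall(text)) <= limit,
    )


def failed(text: str, requirements: list[Requirement]) -> list[str]:
    """Return the names of the requirements this text does not satisfy."""
    return [r.name for r in requirements if not r.is_satisfied(text)]
```

With something like this, "we don't want it to be so verbose" becomes a concrete setting (the `limit` values) that people can discuss and change, and a response that fails a check can be flagged or regenerated.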

Speaker 1 7:30 – 7:30

Thanks, Zargham. Lee wants to jump in on the semantic web point because they're a nerd. That's their words. And then I'm gonna pass it over to Gregory. Sorry, I wanna get that right. George. Sorry. CS, who's gonna ask a question about argumentation theory. Great.

Speaker 7 7:45 – 7:45

Yeah. So, re: the semantic web stuff. I noticed that question coming in, that a lot of this seems very resonant with semantic web things. Anyone can correct me here because I'm semi-new to Koi. But the thing that's really exciting to me about Koi related to the semantic web is that the semantic web asks us to have basically an ontology for the entire Internet that is interpretable to everyone and to machines. And the problem with that is getting everyone to coordinate and agree on a universal semantic ontology, which is effortful and terrible and clearly still hasn't happened, even though we've been talking about it for a while. But the thing that's interesting and exciting about Koi is it allows a community to kind of define its own semantic understanding of its own artifacts. And then, in addition, individuals can add their own additions to this and their context to it. As a result, it's both responding to the community and individuals in the community as well as structured around a community's needs and directions. The thing that's also exciting about this is, because it has that flexibility for, like, individual and community action, you can also link your community's Koi with somebody else's Koi very easily. Right? So it's like, oh, here's our context. And if we draw a semantic link to your context, in your frame or the words that you're using, then we can almost, like, walk across different community knowledge bases in a way that doesn't require us to have a universal understanding of the Internet. Instead we can just kind of traverse this semantic graph. There will maybe be communities that are more isolated than other communities, but eventually it can add this communicability and machine readability without needing everyone to agree, which I think is really cool. And people can technically correct me if I am wrong.

Speaker 5 8:00 – 8:00

No. I think that was a really good way of putting it. I just wanted to kind of add some more technical background. I think that's a great point, that the thing that Koi really provides is the ability for communities to, like, hone in on what exactly their community benefits from. That being said, we still want some interoperability to enable the type of communication between communities, or between Koi instances, that you mentioned. But that interoperability is not required at a global scale. And to kind of enable that is why we're developing this as an open source project and, as kind of the first stage, as something that MediGov and Block Science are collaboratively working on, to develop standards that can be shared between them. But it allows each community to sort of pick and choose which types of knowledge objects they care about, which types of knowledge objects they want to build integrations for, or implement existing integrations for. And on the graph side, that kind of ties to which knowledge objects you're talking about. So a certain type of semantic relationship might go with, for example, Slack objects. In the graph I showed, there are certain relationships between channels and messages and users, for which we're probably gonna wanna share the same syntax if two groups are talking about Slack. But it's designed to be more extensible, in that global interoperability requires very, like, disciplined standards development and governing bodies to deal with that. So we're sort of shrinking it down to a much more local scale.
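The idea of walking across community knowledge bases through a few explicit links, rather than a shared global ontology, can be illustrated with a small graph walk. Everything here is hypothetical: the identifiers, the relation names such as `contains` and `similar_to`, and the choice to treat edges as undirected are assumptions made for the sketch.

```python
from collections import deque

# Each community keeps its own edges as (subject, relation, object) triples.
# All identifiers and relation names here are made up.
community_a = [
    ("a:channel/koi-pond", "contains", "a:message/1"),
    ("a:message/1", "mentions", "a:topic/sensors"),
]
community_b = [
    ("b:paper/42", "about", "b:concept/observation"),
]
# One explicit link between the two vocabularies, with no global ontology.
bridges = [("a:topic/sensors", "similar_to", "b:concept/observation")]


def reachable(start: str, edges: list[tuple[str, str, str]]) -> set[str]:
    """Breadth-first walk over undirected edges; returns every node reached."""
    neighbours: dict[str, set[str]] = {}
    for subj, _rel, obj in edges:
        neighbours.setdefault(subj, set()).add(obj)
        neighbours.setdefault(obj, set()).add(subj)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Without the bridge, a walk starting in community A never leaves it; adding the single `similar_to` edge is enough to reach community B's paper, which is the "local agreement, not global agreement" point made above.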

Speaker 1 8:15 – 8:15

Alright. Nice. Let's move on to this question about argumentation theory. George?

Speaker 8 8:30 – 8:30

Hello. Thank you for the presentation. I'm not sure if anyone but Senti was over on the previous week's seminar, but Matthew there presented something similar, where they split knowledge into arguments, inquiries, experiments, and some others. Do you have any taxonomy of knowledge? Do you use any epistemic analysis to figure out what is just said and how it's valid? Can't put the question better right now.

Speaker 1 8:45 – 8:45

Thanks. Who wants to take that? David, I see you came off mute.

Speaker 6 9:00 – 9:00

Yeah. I can take a crack at it if nobody else wants to. I think the idea here is to sort of start from a naive position, with as few presuppositions as possible, and then build from there. So to the extent that it would be useful to have sort of a categorization of knowledge, it can be developed. In fact, that was one of the use cases that RIDs were developed to handle. But there isn't one inherent in the system. I don't know if that answers it, but

Speaker 1 9:15 – 9:15

Thank you.

Speaker 6 9:30 – 9:30

I think it also refers back to the semantic web. In fact, in my experience, one of the difficulties in making those systems work is to sort of amass all of the relationships and structures that need to be amassed before you can use the technology. The hope here is that some of the effort involved in doing that can be automated away to some of these systems.

Speaker 1 9:45 – 9:45

There's a little bit of discussion in the chat around interorganizational Koi communication. It's kind of framed in terms of, like, bot battles. But, I mean, I know Orion Reed, who has done some work at Block Science, has set up a kind of tldraw prototype where Kois between organizations can actually interface with each other. I was wondering if maybe you could gesture a little bit in that direction. We've been talking a lot about what Koi means for a knowledge organization, but how do you see this in a network sense? And also, maybe to kind of spice up the question a little bit: Daniel's pointing towards bot battles, but what are some of the unintended side effects or consequences that we should be thinking about in terms of having knowledge organizations interfacing with each other?

Speaker 5 10:00 – 10:00

I can take a stab at this one. So I also just wanna be clear about the current state of the system. We don't currently have any information sharing between Block Science and MediGov. This is part of the research that we're doing in spinning up these two instances. But RIDs are a big part of that. So one of the advantages of them, which I didn't really get to in the presentation, is that because we're not worried about keeping, like, a ground truth or storing all this knowledge we care about in a single place, we're just pointing to where it lives in the world, it becomes much easier to share common reference points to what we're talking about. So let's say, for example, Block Science wants to share a set of RIDs that comprise a subject that they've curated. They could just send that set to MediGov, or even send the RID of that set. MediGov could dereference that once it's entered into their Koi and retrieve the knowledge objects that are within that set. And MediGov may be able to directly dereference all of those knowledge objects on their side without having to go through Block Science. Obviously, there's gonna be cases where organizations are gonna represent knowledge which is more siloed or private, like in the MediGov Slack, which is not technically a public space but has a lot of people in it. And that's kind of a case where we're gonna wanna have infrastructure to allow MediGov to ask Block Science for access to an RID pointing to something that they can't access yet. But that's kind of the direction that we're thinking about.
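The sharing scenario described here, where one organization sends a set of identifiers and the receiver dereferences what it can on its own side, could be sketched like this. The identifier strings, the two in-memory stores, and the function names are hypothetical and stand in for whatever the real RID tooling does; the speaker notes that no such sharing exists between the two organizations yet.

```python
from typing import Callable, Optional

# Hypothetical resolver type: maps an identifier to content it can reach.
Resolver = Callable[[str], Optional[str]]

PUBLIC_WEB = {"web:blog/koi-intro": "Intro post text"}  # made-up data
ORG_A_PRIVATE = {"orga:note/7": "Internal note text"}   # made-up data


def public_resolver(rid: str) -> Optional[str]:
    """What a receiving organization can dereference on its own."""
    return PUBLIC_WEB.get(rid)


def org_a_resolver(rid: str) -> Optional[str]:
    """What the sharing organization can dereference, including private items."""
    return ORG_A_PRIVATE.get(rid) or PUBLIC_WEB.get(rid)


def dereference_set(
    rids: list[str], resolve: Resolver
) -> tuple[dict[str, str], list[str]]:
    """Return (resolved contents, identifiers that need an access request)."""
    resolved: dict[str, str] = {}
    needs_access: list[str] = []
    for rid in rids:
        content = resolve(rid)
        if content is None:
            needs_access.append(rid)
        else:
            resolved[rid] = content
    return resolved, needs_access
```

Only the identifiers travel between organizations. Anything the receiver cannot resolve ends up in the second list, which is where an access request to the sharing organization would come in.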

Speaker 1 10:15 – 10:15

Ellie?

Speaker 2 10:30 – 10:30

Yeah. Just to double-click on that confidentiality issue, I think something that's super interesting in this project is the extent to which you can partition off knowledge objects or sets of knowledge so that they're not accessible by the LLM or another organization or whatever it might be. That's something we're very interested in. But I think the bigger potential here is really for organizations who we might call civil society, or networks, or DAOs, or whatever, to be able to share the things that might enhance the capacity of other organizations so that everybody isn't out there duplicating things, particularly organizations that don't have the kind of scalability or, you know, the kind of efficiencies of large firms. So how do we make small organizations or networks like MediGov more, not really functional, and competitive is also not quite the word, but just enhance their capabilities in a network fashion.

Speaker 1 10:45 – 10:45

Great. Thank you. There's some technical questions in the chat that are getting answered async. I wanna remind everyone that you can ask questions to Koi in the Medigov Slack at Koi pond.

Speaker 3 11:00 – 11:00

Can I share the list of what we curated and invite people, and maybe talk about possibly helping decide?

Speaker 1 11:15 – 11:15

That's where I'm headed. What can people do on this call?

Speaker 3 11:30 – 11:30

Great. Let me first bump to Lee and company about how we want people to share. Do we want them to write directly into the Google Doc we have, or do we wanna use an intake form? I can share what we have to date, but I just wanted to make sure I'm not directing people in the wrong way.

Speaker 4 11:45 – 11:45

Well, I'll jump in. While we're on the chat, I made another form, the same as the one that we have, but for people to suggest things. Because the one that we curated is more just Koi-related objects. So I've created another doc here that I can share where people can suggest other sorts of documents that they'd like to see put up in Koi.

Speaker 3 12:00 – 12:00

Great. So, well, I'm just gonna do a quick overview of what's in there right now. So as a group, we just kinda, like, speed ran through and figured out

Speaker 1 12:15 – 12:15

Can you control plus that maybe two times? It's a little hard to see.

Speaker 3 12:30 – 12:30

Cool. So, you know, we made a first draft of Koi-specific knowledge objects that we wanted to index and then got them out there. And so what you'll see is a couple of Slack channels that were the originally targeted ones here, and a bunch of objects that we called core, which are basically blogs and GitHub repos that were most directly related to this work itself. And, actually, one of the most fun parts about this was recognizing how much of our existing work was, quote, predecessor. So we talked a bit about the telescope bot; this is the paper about the telescope bot that Ellie and Luke and I, along with many other folks at MediGov, had worked on. Josh Tan, who's, I believe, also here, feeds into this research, as well as some work on the Block Science side about what is computer-aided governance and how do we think about governance in terms of its sense-making and decision-making processes that fed into this. There's actually just an incredible mix; if you just look at the graph of authors and objects in this list, it kinda gives you a flavor of where this comes from. But the point here is not for this to be "Koi only knows about Koi." This was here to bootstrap. The goal in the long term is that MediGov itself needs to decide what belongs in the knowledge base, and that could mean a bunch of papers by people who are community members. It could be a bunch of papers that we just refer to or cite frequently. I love citing back to some work from Suchman. I learned about this from Ellie and her group, so I shouldn't talk about it too much. Let her talk about it. But what I'm getting at here, sorry I'm rambling so much, is that what we need next is a process to solicit contributions of what should be indexed, and possibly some annotation. So even my initial heuristic here of core, predecessor, or application Koi work: this is stuff that we learned in the past and built on, versus stuff that uses Koi, which is obviously the minority since this is the newest thing. And where we gotta go with this is decide both what should be included in these databases and how it should be annotated, possibly how it should be prioritized. So I'll stop there, but this is the stuff that's in there now. It's not too long, but it's all pretty relevant to Koi itself.

Speaker 1 12:45 – 12:45

Okay. Great. We have one minute left. Let's go ahead and, like, use the link that Brooke shared in the chat to engage with what Zargham was just sharing. We always like to give our speakers a round of applause. For some reason, we've been doing this for three years. It seems kinda weird, but we do it. So come on camera. If you like, unmute. And on three, two, one, let's give a big round of applause, a cheers for our speakers. Three, two, one.

Speaker 3 13:00 – 13:00

I want a second round of applause for Ellie and Brooke because they're in Australia.

Speaker 1 13:15 – 13:15

Yes. Let's go. 2AM. Well, and applause to all of you for being here. Nice to see so many people here. I wanna point out that we have another seminar happening next week. It's also gonna be very good. It's by Denisea Cara. It's called experimental governance sandboxes and LLM agent based simulations. You definitely do not want to miss that one. Information is there in the chat. So see you next week. I'm gonna close out the recording, and then I'll keep this open for a little bit if anyone wants to stay and chat longer. Where's the stop recording?
