Bluenotes: Community Notes for ATProto

Take it away. Okay. Thank you, everyone, for being here, and thanks to everyone for helping to organize this event. My name is Jonathan Warden. I'm an independent researcher, an engineer, and co-founder of a small research organization called socialprotocols.org. My focus is on the problem of how the design of social platforms and algorithms leads to positive or negative social outcomes. And my big thesis is that if platforms can inadvertently create all these negative social outcomes, couldn't they be intentionally designed to produce positive social outcomes? So instead of misinformation and toxicity and division, couldn't they promote intelligent conversation and finding common ground?

I've been approaching this problem from the point of view of game theory, information theory, argumentation theory, collective intelligence, a whole lot of theory. And the reason I'm working on community notes now is it's one of the only examples I know of where the theory has been put into practice successfully and at scale. Community notes on Twitter has actually helped to decrease the spread of misinformation on Twitter. So I'm going to talk to you briefly about why I think community notes is great, both for fighting misinformation and as a unique and powerful scalable tool for peer moderation.

I'll give you a brief demonstration of Blue Notes, and then I'll talk mainly about the architecture and some of the architectural challenges with bringing community notes to AT Protocol and making it decentralized and composable. Some of the challenges include making annotations work via labeler infrastructure, privacy and anonymity, and of course bots and manipulation.

So 300 years ago, Jonathan Swift said, "Falsehood flies, and truth comes limping after it, so that when men come to be undeceived, it is too late." And he hadn't even heard of Twitter. Falsehood has always spread, and it spreads largely because people think that it's true.

Some people are spreading misinformation maliciously, but it spreads faster because people don't know the truth. And so what community notes does is help increase the velocity of truth in social networks. It helps the truth sometimes catch up to people. Normally people share something because they believe it's true: it looks plausible, and they kind of want to believe it. If somebody just tells them, hey, this is actually not true, this is just a rumor, this image is fake, well then maybe they wouldn't share it. Now, besides being a great tool for combating misinformation, community notes is also a very unique and powerful peer moderation protocol.

It is decentralized in the sense that there are no privileged roles, there are no administrators, there are no editors, there's just the community. And so it is moderation without moderators. That means that instead of trusting fact checkers or editors, instead of trusting people or organizations, you place your trust in the community and in the protocol that the community uses to make decisions. And there are some benefits to placing trust in the community. One is that people trust the results of moderation decisions more that way. People have a lot of trust in community notes across the political spectrum.

And one of the reasons for that is that it's perceived as being fairly neutral. Community notes was developed before Elon Musk took over. The algorithm is open source, all the data is open, and it's still running using that same algorithm. And for the most part, people still trust community notes, even though they don't necessarily trust the person that's running it. And people perceive it as neutral because, in a sense, it really is neutral. It is not just a naive majority-voting-based crowdsourced fact-checking system. It uses a very innovative bridging algorithm.

This algorithm is not just trying to find what people think or what's most popular. It tries to find common ground. One way of looking at that is you can think of the algorithm as looking at people's votes and saying, okay, well, this person upvoted this because they have a left-wing bias, and all these upvotes are because this person has a right-wing bias. And maybe this person just upvotes everything or just downvotes everything, so they just have this upvote bias. But there are some votes that cannot be explained away by bias.

And so this model discovers these latent factors that influence people's votes. And it says, look, this latent factor represents an area of common ground. When a post or a community note has this factor, people are more likely to upvote that post or that community note across the political spectrum, across the spectrum of opinion. So in a sense, it extracts signal from noise. Suppose, for example, you have a group where the majority of people voting on community notes is right wing, which is actually the situation on X right now. If a community note gets a lot of helpful ratings, what does that mean?

Does it mean that it's helpful, or does it mean that it supports a right-wing worldview? Well, it could be either. We don't know just by the fact that it's popular. We don't know why people upvoted it. But the community notes algorithm breaks that down into components, and it says maybe most of these upvotes are just because it supports a right-wing worldview, but there's also a bunch of upvotes that can't be explained away so easily. So in sum, it's a protocol that scales peer moderation in a way that is robust to factional capture.
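
To make the decomposition described above concrete, here is a minimal sketch of a bridging-style model in plain Python. It assumes each rating is approximated as a global mean plus a user intercept, a note intercept, and the product of a one-dimensional user factor and note factor, fitted by regularized gradient descent. The function name, the toy data, and the hyperparameter values are all illustrative assumptions, and this is not the production community notes scorer. The idea it shows is that the factor term absorbs upvotes explained by which side a rater is on, so the note intercept is left to capture helpfulness that both sides agree on.

```python
import random


def fit_bridging_model(ratings, epochs=500, lr=0.05,
                       lam_intercept=0.15, lam_factor=0.03, seed=0):
    """Fit rating ~ mu + user_intercept + note_intercept + user_factor * note_factor
    using per-rating gradient descent with L2 regularization (illustrative values)."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    notes = sorted({n for _, n, _ in ratings})
    mu = 0.0
    user_i = {u: 0.0 for u in users}
    note_i = {n: 0.0 for n in notes}
    # Small random starting factors so the optimizer can break symmetry.
    user_f = {u: rng.uniform(-0.1, 0.1) for u in users}
    note_f = {n: rng.uniform(-0.1, 0.1) for n in notes}
    for _ in range(epochs):
        for u, n, r in ratings:
            fu, fn = user_f[u], note_f[n]
            err = r - (mu + user_i[u] + note_i[n] + fu * fn)
            mu += lr * (err - lam_intercept * mu)
            user_i[u] += lr * (err - lam_intercept * user_i[u])
            note_i[n] += lr * (err - lam_intercept * note_i[n])
            user_f[u] += lr * (err * fn - lam_factor * fu)
            note_f[n] += lr * (err * fu - lam_factor * fn)
    return note_i, note_f


# Toy data (hypothetical): two camps of raters and three notes.
# Each partisan note is rated helpful (1.0) only by its own camp;
# the bridging note is rated helpful by everyone.
left = [f"L{i}" for i in range(4)]
right = [f"R{i}" for i in range(4)]
ratings = []
for u in left:
    ratings += [(u, "partisan_left", 1.0), (u, "partisan_right", 0.0), (u, "bridging", 1.0)]
for u in right:
    ratings += [(u, "partisan_left", 0.0), (u, "partisan_right", 1.0), (u, "bridging", 1.0)]

note_intercept, note_factor = fit_bridging_model(ratings)
for name in sorted(note_intercept):
    print(name, round(note_intercept[name], 2), round(note_factor[name], 2))
```

In this sketch a note would be treated as helpful when its intercept clears some threshold, so a note upvoted by only one camp does not qualify no matter how many raters that camp has.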

And I've written more about that in a blog post, "Understanding Community Notes," on jonathanwarden.com, if you're interested. So my goal is to bring community notes to atproto and to make it decentralized. I say "pretty decentralized" because I don't know what the word decentralized means anymore after talking to all of you. But what I mean is that I want a system where the power of any one individual is limited, so that no one can accumulate the power to censor the results. I want a system that is transparent, so all the data is open, and anyone can look at the data, run the algorithm, and verify that it's doing what it says it's supposed to be doing. And then there's the atproto ethos of algorithmic freedom and composability.

The system I'm proposing allows you to mix and match your moderation apps, your algorithms, your labelers, and so on. I've designed this to be a generalized peer moderation protocol, not just for community notes, but for labels or any sort of moderation decision, and I've designed it so that it can work with other social networks. You can place labels or community notes on a Mastodon post, for example. The Bluesky team, and a lot of people I've talked to here who have their own social apps, have indicated that they want to integrate some sort of community-notes-like feature.

But nobody knows exactly how they want that to work. How exactly do annotations work in atproto? How do you make this decentralized? There are a lot of open questions, but I've gone ahead and just proposed some answers to those questions. I have an informal proposal called open community notes that I've published at this URL. And then I've gone ahead and created an implementation that I call Blue Notes. Blue Notes is based on a fork of the Bluesky Social app. I did this as a fork just to demonstrate how community notes could be integrated into a social app.

I don't necessarily want people to use this except as beta testers, because community notes is not useful unless there's some critical mass of users. There need to be enough people submitting and rating notes for the algorithm to actually produce any output at all. And then of course, people are not going to bother doing that unless the notes are actually going to be shown on posts, because that's the only way they do any good in fighting misinformation. So my plan is to work with Bluesky and other skies, other social apps, to refine this spec into something that they think is acceptable and that they could integrate into their social apps, and then have a launch with a little bit of publicity so that we can build this initial community of contributors and actually start producing some useful community notes.

However, in the meantime, anyone can integrate with this, and it's fairly easy to integrate if you want your app to just show community notes when they start to show up. So I'm going to do a quick demo, if I can make it work. Okay, so in the fork of the Bluesky app here, we have a community notes section where you can browse existing community notes. Here I'll look at some community notes that have been rated helpful. These have been rated helpful because I manually went in and rated them helpful. Again, the algorithm doesn't produce any useful results yet.

So if a post has a helpful community note, you'll see that community note displayed underneath it. This looks just like it looks on X. You can then rate the note. I've already rated this one, but I'm going to delete my rating and resubmit. I'll say yes, this is helpful. I'll say why it's helpful. Right now, the data from these check boxes is not really used by the algorithm, or it is used a little bit, but it's not very important. But I think it could be important in the future. So I've rated a note. Finally, I can write a note.

Actually, I'm going to pull up a post that's fairly recent, and I'm going to add a note to this post. Here's a video of Kash Patel dancing. I'm sure you all wanted to see that, but it's not actually him. It's misinformation. So I've actually copied this community note from the community note that was posted on X on the same post. You often see the same misinformation being repeated on both platforms. Factual error: this is not actually Kash. Community notes require at least one source. You need to put in a URL. It's actually just plain text, but the UI checks that you've actually got a URL.

You need to say, yes, I think this is a source that people would find trustworthy, and now I submit. And now here's a note. You'll see that this is a proposal, so it's just a proposed note, and I'll talk a little bit more about the lexicon in a moment. So that's it for the UI. It's really quite a simple UI. What I think is a little bit more interesting about community notes is the protocol, the process for deciding which notes are found helpful on which posts.

So first I'll talk about annotations. Actually, I'm going to go back and show you one thing that I think will help you see under the hood what's going on here. If you look at this same post in the Bluesky app, what you'll see is that it has a label, because I've subscribed to the Blue Notes labeler. When someone first submits a community note to Blue Notes, this label is emitted. Bluesky doesn't know how to show community notes.

It knows there's a label, but it doesn't know how to show a community note. So in the Blue Notes social app, the only difference is that it recognizes this label, it recognizes the DID of the community notes labeler, and it says, oh, okay, this is a community note, let me go fetch the content of that community note. And it makes an additional call to get proposals. You can actually see that in the network tab, for the nerds here. Let's see, that didn't work. Oh, that's right, because it's not rated helpful.
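
The client-side flow described above can be sketched roughly as follows. This is a minimal illustration, assuming labels arrive as dictionaries with `src` (labeler DID) and `uri` (labeled post) fields as atproto labels do. The labeler DID, the `status` and `text` fields on a proposal, and the injected `fetch_proposals` callable are hypothetical stand-ins for the real notes API, which the talk does not spell out.

```python
# Hypothetical DID for the notes labeler; the real value is not given in the talk.
NOTES_LABELER_DID = "did:example:bluenotes-labeler"


def render_note_for_post(post_uri, labels, fetch_proposals):
    """Decide what to show under a post: a helpful note, a prompt to rate, or nothing."""
    has_note_label = any(
        label["src"] == NOTES_LABELER_DID and label["uri"] == post_uri
        for label in labels
    )
    if not has_note_label:
        return None  # No label from the notes labeler: render the post as usual.

    # The additional network call, made only when the label is present.
    proposals = fetch_proposals(post_uri)
    helpful = [p for p in proposals if p["status"] == "helpful"]
    if helpful:
        return {"kind": "note", "text": helpful[0]["text"]}
    if proposals:
        # A note exists but is not rated helpful yet: invite the user to rate it.
        return {"kind": "rate_prompt", "count": len(proposals)}
    return None


# Usage with an in-memory stand-in for the notes API.
fake_api = {
    "at://did:example:alice/app.bsky.feed.post/1": [
        {"status": "proposed", "text": "This video is not actually him."}
    ]
}
labels = [{"src": NOTES_LABELER_DID, "uri": "at://did:example:alice/app.bsky.feed.post/1"}]
result = render_note_for_post(
    "at://did:example:alice/app.bsky.feed.post/1",
    labels,
    lambda uri: fake_api.get(uri, []),
)
print(result)
```

Passing the fetch function in as a parameter keeps the sketch runnable without a network, and a real client would replace it with its own HTTP call.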

But that's what you would see. If a note has been submitted but has not been rated helpful, you'll just see this prompt to go rate the note. So that's how things work now. It's a bit of a hack to put annotations on posts using the labeler infrastructure. The way it should work, and I think this is kind of obvious, is that you should be able to add generalized labels. Atproto labels are actually a type of annotation. An annotation is just information that one person adds to somebody else's post.

And atproto labels actually map really well to the W3C annotations data model. An annotation in that data model is a target, identified with a URI, and then a body. That body can just be a semantic tag, which is what an atproto label is, or it can be text, or it can be some other things. So it makes sense to generalize labels and to allow the label infrastructure to support other types of annotations. An obvious way to do that is to add a field, a ref field is what I'm proposing here, that points to the URI of an atproto record with the content of the annotation.
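
As a rough illustration of that mapping, the sketch below models a label with the proposed `ref` field and converts it into a dictionary shaped like a W3C Web Annotation. The `src`, `uri`, `val`, and `cts` fields follow the existing atproto label shape, while `ref` is the field proposed in this talk and is not part of atproto today. The conversion function and the example URIs are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Label:
    src: str                   # DID of the labeler that issued the label
    uri: str                   # URI of the post being labeled (the annotation target)
    val: str                   # short label value (a semantic tag)
    cts: str                   # creation timestamp
    ref: Optional[str] = None  # PROPOSED: URI of a record holding the annotation content


def to_w3c_annotation(label: Label) -> dict:
    """Express a label as a W3C-style annotation: a target plus a body."""
    if label.ref is not None:
        # Body is a separate resource, identified by its URI.
        body = label.ref
    else:
        # A plain label is a tagging body whose value is the label value.
        body = {"type": "TextualBody", "value": label.val, "purpose": "tagging"}
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "creator": label.src,
        "created": label.cts,
        "target": label.uri,
        "body": body,
    }


plain = Label(
    src="did:example:labeler",
    uri="at://did:example:alice/app.bsky.feed.post/1",
    val="misleading",
    cts="2025-01-01T00:00:00Z",
)
with_ref = Label(
    src="did:example:labeler",
    uri="at://did:example:alice/app.bsky.feed.post/1",
    val="note",
    cts="2025-01-01T00:00:00Z",
    ref="at://did:example:notes/example.note.record/abc",  # hypothetical record URI
)
print(to_w3c_annotation(plain)["body"])
print(to_w3c_annotation(with_ref)["body"])
```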

And then there's what app views do when they are hydrating posts. Right now, as many of you know, the app view will return a hydrated post that has all the labels for that post, or at least the labels from the labelers that you've subscribed to. Well, it would make sense if the hydrated post also included the actual content of the annotation. This would look a lot like an embedded record, where if you quote a post, you also get the content of the quoted post in the hydrated record.
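
A minimal sketch of what that hydration step could look like is below. It assumes labels carry the proposed `ref` field and that the app view can resolve a record URI to its content. The function name, the `annotations` key on the hydrated post, and the record shape are hypothetical and do not describe how any existing app view behaves.

```python
def hydrate_post(post, labels, get_record):
    """Return a copy of the post with its labels and, where a label has a
    `ref`, the referenced annotation record embedded alongside it."""
    hydrated = dict(post)
    post_labels = [label for label in labels if label["uri"] == post["uri"]]
    hydrated["labels"] = post_labels

    annotations = []
    for label in post_labels:
        ref = label.get("ref")
        if ref is None:
            continue  # An ordinary label with no annotation content.
        record = get_record(ref)
        if record is not None:
            annotations.append({"src": label["src"], "ref": ref, "record": record})
    hydrated["annotations"] = annotations
    return hydrated


# Usage with an in-memory record store standing in for repository lookups.
records = {
    "at://did:example:notes/example.note.record/abc": {
        "text": "This video is not actually him.",
        "sources": ["https://example.com/fact-check"],
    }
}
post = {"uri": "at://did:example:alice/app.bsky.feed.post/1", "text": "Look at this video"}
labels = [
    {"src": "did:example:notes", "uri": post["uri"], "val": "note",
     "ref": "at://did:example:notes/example.note.record/abc"},
    {"src": "did:example:other", "uri": post["uri"], "val": "spam"},
]
hydrated = hydrate_post(post, labels, records.get)
print(hydrated["annotations"])
```

With this shape a client gets the note text in the same response as the post, much like an embedded quoted post, and does not need a second call.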

And then of course that means that the social apps need to know how to display these annotations. There might not be a lot of different types of annotations. Some that I can think of are highlights, alt text, and then this context annotation is another type. And so there are some interesting discussions to be had about how social apps know how to display your annotation, especially if you're using custom lexicons. There are some interesting discussions in the atproto discourse about lexicon embeds and embedding records from arbitrary custom lexicons in your posts.

But I won't say more about that now.

So, the overall architecture. I actually based open community notes on an existing project. Last year at this conference, Drew MacArthur, who was a student at UC Boulder, did a talk about PMsky, pmsky.social. PM is peer moderation. He created this system and this lexicon for peer moderation that was exactly what I was envisioning for open community notes. So I've actually adopted his lexicon, and he extended it a little bit for me. This lexicon has just two record types.

There's a proposal, and that's for a proposed moderation action. It looks a lot like a label record, but it doesn't necessarily have to be a label. It could be a proposed annotation or some other proposed action, such as, I don't know, deleting a user or something like that. And then there's a vote. People can vote on these proposals. They can agree or disagree, upvote or downvote. And that's it for the lexicon. So the overall architecture looks like this. You have social apps that are able to display annotations.
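
Going back to the two record types just described, here is a minimal sketch of how a proposal and a vote might be modeled and tallied. All field names are hypothetical stand-ins, and the real lexicon's field names and identifiers are not given in this talk. The rule that only the latest vote per voter counts is also an assumption made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    uri: str        # URI of this proposal record
    subject: str    # URI of the post (or other target) the action applies to
    action: str     # kind of proposed action, e.g. "label" or "annotation"
    value: str      # label value, or a short type name for the annotation
    note: str = ""  # text of a proposed community note, if any


@dataclass(frozen=True)
class Vote:
    proposal: str   # URI of the proposal being voted on
    value: int      # +1 to agree, -1 to disagree
    voter: str      # identifier of the voter


def tally(votes, proposal_uri):
    """Count agree and disagree votes for one proposal, keeping only the
    most recent vote from each voter (an assumption of this sketch)."""
    latest = {}
    for vote in votes:
        if vote.proposal == proposal_uri:
            latest[vote.voter] = vote.value
    counts = Counter(latest.values())
    return counts[1], counts[-1]


proposal = Proposal(
    uri="at://did:example:submitter/example.proposal/1",
    subject="at://did:example:alice/app.bsky.feed.post/1",
    action="annotation",
    value="note",
    note="This video is not actually him.",
)
votes = [
    Vote(proposal.uri, 1, "voter-a"),
    Vote(proposal.uri, -1, "voter-b"),
    Vote(proposal.uri, 1, "voter-b"),  # voter-b changes their mind
    Vote("at://did:example:submitter/example.proposal/2", -1, "voter-a"),
]
agree, disagree = tally(votes, proposal.uri)
print(agree, disagree)
```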

They receive annotations from labelers, so individuals can subscribe to annotators. You can subscribe to the Blue Notes community notes annotator or some other, or it can be an app labeler, a built-in default app annotator. And then currently, for community notes, you need to make this additional call to the notes API to fetch the content of the annotation. So integrating this with your apps is really quite easy. And I'd like to talk to people who are interested in integrating with their apps about how I could make it easier, for example by creating a React component for these notes or something like that.

The other piece is moderation apps. Their job is to just do those things that I showed you in the demo. You can browse notes, you can submit community notes, and you can rate notes. And of course, moderation apps can also be used to submit and rate proposed labels or other moderation actions. There's a submission service, which is an important piece of the architecture. All the proposal and vote records are submitted through the submission service, and it has the responsibility of anonymizing these records. This is an important part of the architecture that I'll talk about in just a moment.

And then there are scoring services. There's the community notes scoring service, but of course people can create their own algorithms for aggregating these votes and making decisions. When a proposal's score reaches a certain threshold, that proposal becomes a label or an annotation.

Now, privacy and anonymity, or pseudonymity, is a very important component of community notes. Anonymity is a dangerous thing in social networks. If people can create a bunch of fake accounts and post a lot of anonymous crap, they can really destroy an online community.
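
The scoring step described above, where a proposal is promoted once its score crosses a threshold, could be sketched as follows. The scoring function here is a deliberately simple placeholder (net agreement), not the bridging algorithm, and it is passed in as a parameter to reflect the idea that anyone can plug in their own algorithm. The function names, the threshold, the minimum vote count, and the record shapes are all assumptions.

```python
def net_agreement(votes):
    """Placeholder score: mean of +1/-1 votes, from -1.0 to 1.0.
    A real scoring service would use something like the bridging model."""
    if not votes:
        return 0.0
    return sum(votes) / len(votes)


def promote_proposals(proposals, votes_by_proposal, score_fn,
                      threshold=0.4, min_votes=3):
    """Return a label for every proposal whose score reaches the threshold."""
    emitted = []
    for proposal in proposals:
        votes = votes_by_proposal.get(proposal["uri"], [])
        if len(votes) < min_votes:
            continue  # Not enough ratings to decide anything yet.
        score = score_fn(votes)
        if score >= threshold:
            emitted.append({
                "uri": proposal["subject"],  # the post being labeled
                "val": proposal["value"],    # the label value
                "ref": proposal["uri"],      # points back at the annotation content
                "score": score,
            })
    return emitted


proposals = [
    {"uri": "at://did:example:svc/example.proposal/1",
     "subject": "at://did:example:alice/app.bsky.feed.post/1", "value": "note"},
    {"uri": "at://did:example:svc/example.proposal/2",
     "subject": "at://did:example:bob/app.bsky.feed.post/9", "value": "note"},
]
votes_by_proposal = {
    "at://did:example:svc/example.proposal/1": [1, 1, 1, -1],
    "at://did:example:svc/example.proposal/2": [1, -1, -1, 1],
}
emitted = promote_proposals(proposals, votes_by_proposal, net_agreement)
print(emitted)
```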

But in some instances anonymity can be a good thing. Anonymous voting, for example, is a good thing. If everybody knows how you're going to vote, that may influence how you vote and may make it harder for you to vote your conscience. And that's actually very important to the community notes algorithm. It kind of depends on people saying, ah, this time I kind of agree with those people, right? Like, you know, I don't like it, but this really does look like it's misinformation. So I'm going to honestly say, yeah, I think this is a helpful community note.

And so anonymity is important, but the algorithm needs some concept of identity. It needs to know that these votes were made by this user. So my proposed solution is these anonymous IDs. All these records, the vote and proposal records, are created by the submission service. They're created using that submission service's ID. And then the records can contain an additional field, an AID field or anonymous ID field, that corresponds to the user. This field is generated as a hash of the user's actual DID, which the submission service knows, and a secret salt.
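
One way to derive such an anonymous ID is sketched below. The talk only specifies a hash of the user's DID and a secret salt. This example uses HMAC-SHA256 keyed by the salt, which is one standard keyed-hash construction and is my substitution, not something the talk confirms. The function name, the `anon:` prefix, and the truncation length are likewise illustrative assumptions.

```python
import hashlib
import hmac


def anonymous_id(user_did: str, secret_salt: bytes) -> str:
    """Derive a stable pseudonymous ID from a DID and a salt known only
    to the submission service."""
    digest = hmac.new(secret_salt, user_did.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability; the prefix is purely cosmetic.
    return "anon:" + digest[:32]


salt = b"example-secret-kept-by-the-submission-service"
did = "did:example:alice"

aid = anonymous_id(did, salt)
print(aid)
```

The same DID and salt always give the same ID, so the scoring algorithm can link a user's ratings together, while anyone without the salt cannot recompute the ID from a list of known DIDs.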

And that secret salt is critical because it makes it so that you cannot easily figure out somebody's actual DID based on their anonymous ID. Another challenge is, of course, bots. The community notes algorithm is naturally robust to bots. It is not as easy as you might think to manipulate the algorithm just by creating a bunch of fake accounts and voting in one way. The algorithm kind of filters that out. But with enough bots you can manipulate the algorithm. And so for community notes to work, what you need is a contributor base that is mostly real people, mostly acting in good faith.

If you have too many people that really are not trying to rate things as helpful, it won't work. But you can have a lot of bias, right? You don't have to have a group of perfect people. You just need to have mostly real people, mostly acting in good faith. Some possible solutions are proof of personhood, minimum account age requirements, for example, or just general bot filtering tools. There are a lot of solutions that people have been talking about at this conference that we could apply. There's also something called a rating impact score, which is what Twitter does, which requires you to have a certain reputation based on your ratings before you can actually write notes.

Okay, so finally, one of the reasons I'm excited about community notes is that there's so much unrealized potential. Twitter underutilizes it so much. I was inspired by Edith's opening remarks yesterday. One of the things that she said is that social platforms should treat people as subjects with agency and not as objects to be manipulated. And I think one thing that people with agency want is to use these platforms to help discover truth. They want to share truth. Right now, so many social algorithms will show you what you want to see, show you what they think you will believe.

But that's not what I want from a social platform. I don't want a social platform to show me what it thinks I will believe, especially if the platform knows, or if there's a way to know, that I wouldn't believe this if I knew the truth, if I had more information. And so I think there are so many ways we can use community notes as a base for a social platform that has a lot of affordances for helping people share and find accurate information. There's a product called Roundabout that New Public has created, and they have, for example, a feature where, before you submit a post, it says, hey, do you really want to submit this?

This violates some of our community norms. And it's just a prompt. Well, the same thing could happen for community notes. You could write something and the system could say, hey, look, someone else has posted a similar claim, and there is a community note saying that this might not be true. Are you sure you want to post this? I think the algorithm could be faster. Notes could achieve a helpful or unhelpful status much, much faster, which could help increase the velocity of truth and help misinformation not spread as fast.

It can be more integrated with the ranking algorithms: if the algorithm knows that this post is likely misinformation, or that this poster tends to spread a lot of misinformation, it could downrank that content. There's a lot of possibility with correction and retraction: if you've shared misinformation, the system might prompt you, hey, look, you shared this, but now there's this community note, to help undo some of the damage that happens when misinformation has already spread. And then there are a lot of other possibilities, for example when a community note may itself be inaccurate. What is the process for putting a note on a note? I think that opens up some really interesting possibilities for almost a deliberative argumentation type of process for making these decisions.

Okay, I took more time than I thought, and we've only got four minutes and thirty seconds left for questions. So, any questions, anyone?

Thank you very much for a really super detailed and very thoughtful talk. I just want to recognize the amount of thought that you put into this. All right, any questions from the crowd here? Oh, we've got one. All right.

Thank you. Yeah, super interesting. I'm curious if, with the anonymity, you could also integrate something like reputation, because it seems like that also becomes important over time.

Yes. So the algorithm itself has a reputation score kind of built into it. The way it works is every user is assigned two parameters, and these are two factors that kind of represent that person's bias. Then this rating impact score is also a sort of reputation system. And there are some other things, but in summary, yes, I think reputation could really improve the quality of the system, right? Especially if you know that when this person rates a note as helpful, it always ends up being rated as helpful by the community. And so you could say, look, provisionally, we're just going to go ahead and display this community note as a helpful community note, just based on one or two people's ratings that have a high amount of trust.

Thanks. Really cool presentation and great work. It strikes me that some of the stuff that is worth building here that you've alluded to, like the anonymization service, or reputation tracking algorithms, or maybe AI for deduplicating proposals, this kind of thing, is a centralizing vector which is susceptible to potential corruption, right? You could have censorship of proposals during the anonymization phase, or this kind of thing. So do you have any thoughts on how to build resistance to that kind of potential corruption?

Yes. I actually had a slide about that issue and I skipped it because I thought nobody would notice. You've zoomed in right on the weakest piece of the system. The piece that requires the most trust is the submission service. You have to trust the submission service, first of all, not to de-anonymize you, and you have to trust that it's going to submit what it says it's going to submit. Now, some of that is verifiable. You can go and submit something, and then if you don't see it show up in the data, you can say, hey look, these people are not doing what they say they're going to be doing.

But that is the weakest point of the system, and it does require trust. So for example, if anyone were to implement this in their own social app, I would say run your own anonymizer service. That also allows you to control who can contribute, and it can help you control for bots and for general good-faith contribution in that way. In the long run, I think that weakness could possibly be addressed with cryptographic solutions, with zero-knowledge proofs. But I don't think it's necessary, at least not for now.

I think it's important to recognize that weakness, even though I was kind of trying to gloss over it, and to deal with it, maybe in the future.

Excellent. All right, well, thank you once again, Jonathan, for an excellent presentation. We're going to switch.