- From: Evan Prodromou <evan@prodromou.name>
- Date: Thu, 23 Jan 2025 14:32:02 -0500
- To: Adam Sobieski <adamsobieski@hotmail.com>, "Emelia S." <emelia@brandedcode.com>
- Cc: "public-swicg@w3c.org" <public-swicg@w3c.org>
- Message-ID: <913aed72-a137-49b2-8cbb-5151584bedd9@prodromou.name>
One possible interaction flow is this. Let's suppose an actor distributes
the following activity:

    {
      "@context": "https://www.w3.org/ns/activitystreams",
      "id": "https://social.example/user/17/create/337",
      "actor": "https://social.example/user/17",
      "type": "Create",
      "to": "as:Public",
      "object": {
        "id": "https://social.example/user/17/note/1221",
        "type": "Note",
        "to": "as:Public",
        "attributedTo": "https://social.example/user/17",
        "content": "<p>Large Marge had died that very night ten years before!</p>",
        "published": "2025-01-23T20:15:00Z"
      },
      "published": "2025-01-23T20:15:00Z"
    }

This creates a public note (or short text) with some questionable facts
included. Another actor could publish an annotation on that object, to
give further context for readers:

    {
      "@context": [
        "https://www.w3.org/ns/activitystreams",
        "https://annotate.example/ns"
      ],
      "id": "https://factcheck.example/user/338/annotate/47",
      "actor": "https://factcheck.example/user/338",
      "type": "Annotate",
      "to": "as:Public",
      "object": {
        "id": "https://factcheck.example/user/338/annotate/47",
        "type": "Note",
        "to": "as:Public",
        "attributedTo": "https://factcheck.example/user/338",
        "content": "<p>This is a variation of the <a href='https://w.wiki/CpRq'>vanishing hitchhiker</a> urban legend.</p>",
        "published": "2025-01-23T20:45:00Z"
      },
      "target": "https://social.example/user/17/note/1221",
      "published": "2025-01-23T20:45:00Z"
    }

"Annotate" is not a standard activity type in ActivityPub; I added a
fictional "annotations" context document here.
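For illustration, an annotation activity like the one above could be
assembled programmatically. The following is a minimal Python sketch, not
a published API or vocabulary: `build_annotate_activity` and its
parameters are hypothetical names, the second context URL is the
fictional one from the example, and the Note is given its own id (a
made-up URL) on the assumption that an activity and its object should
not share one.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
# Fictional context document taken from the example; not a real vocabulary.
ANNOTATE_CONTEXT = "https://annotate.example/ns"


def build_annotate_activity(activity_id, note_id, actor, target, content, published):
    """Return an Annotate activity whose object is a Note commenting on `target`."""
    return {
        "@context": [AS_CONTEXT, ANNOTATE_CONTEXT],
        "id": activity_id,
        "actor": actor,
        "type": "Annotate",  # hypothetical type, defined by the fictional context
        "to": "as:Public",
        "object": {
            "id": note_id,
            "type": "Note",
            "to": "as:Public",
            "attributedTo": actor,
            "content": content,
            "published": published,
        },
        "target": target,  # the remote object being annotated
        "published": published,
    }


activity = build_annotate_activity(
    activity_id="https://factcheck.example/user/338/annotate/47",
    note_id="https://factcheck.example/user/338/note/47",  # hypothetical id
    actor="https://factcheck.example/user/338",
    target="https://social.example/user/17/note/1221",
    content="<p>This is a variation of the vanishing hitchhiker urban legend.</p>",
    published="2025-01-23T20:45:00Z",
)
print(json.dumps(activity, indent=2))
```

Building the document as a dictionary and serializing it with
`json.dumps` avoids hand-written JSON slips such as a missing comma
between properties.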
Servers that receive this annotation might include it when they
redistribute the Note object to ActivityPub clients:

    {
      "id": "https://social.example/user/17/note/1221",
      "type": "Note",
      "to": "as:Public",
      "attributedTo": "https://social.example/user/17",
      "content": "<p>Large Marge had died that very night ten years before!</p>",
      "published": "2025-01-23T20:15:00Z",
      "annotations": {
        "type": "Collection",
        "id": "https://other.example/system/annotations/social.example/user/17/note/1221",
        "to": "as:Public",
        "items": {
          "id": "https://factcheck.example/user/338/annotate/47",
          "type": "Note",
          "to": "as:Public",
          "attributedTo": "https://factcheck.example/user/338",
          "content": "<p>This is a variation of the <a href='https://w.wiki/CpRq'>vanishing hitchhiker</a> urban legend.</p>",
          "published": "2025-01-23T20:45:00Z"
        }
      }
    }

This is actually kind of a tricky situation, since usually the
properties of the object as defined by the sending server, and available
by fetching `https://social.example/user/17/note/1221`, would be
considered canonical. The `annotations` property is managed by a
different server, without the control or even knowledge of the original
actor or their service.

The annotations here are public; within the AP authorization model, it's
also possible to restrict distribution and access to the annotations
(with a different "to" property).

I think the work needed here would be as follows:

- Define a context doc for `Annotate` and `annotations`
- Write a FEP or another document describing how these can be used

Obviously, this is just the protocol layer; it doesn't even begin to
explore the options for actually setting up a network of fact checkers
or establishing trust in those fact checkers.

Evan

On 2025-01-23 11:33 a.m., Adam Sobieski wrote:
> Evan,
>
> Ok. I will take a look at ActivityPub server-to-server interactions
> and think about methods where fact-checking information, e.g.,
> annotations or community notes, are distributed via ActivityPub.
>
> Best regards,
> Adam
>
> ------------------------------------------------------------------------
> *From:* Evan Prodromou <evan@prodromou.name>
> *Sent:* Thursday, January 23, 2025 10:38 AM
> *To:* Adam Sobieski <adamsobieski@hotmail.com>; Emelia S. <emelia@brandedcode.com>
> *Cc:* public-swicg@w3c.org <public-swicg@w3c.org>
> *Subject:* Re: Fact-checking and community notes on the Fediverse
>
> Hey, Adam. So, I'd prefer to use methods where the fact-checking
> information is distributed via ActivityPub.
>
> Evan
>
> On 2025-01-14 6:31 p.m., Adam Sobieski wrote:
> > Social Web Incubator Community Group,
> >
> > Hello. I am pleased to share some preliminary brainstorming and
> > ideas about decentralized fact-checking and argumentation using
> > P2P filesharing networks.
> >
> > Hopefully some of the following ideas can be of use for the
> > Fediverse, e.g., for the discovery of existing annotations.
> >
> > Introduction
> >
> > With respect to sharing Web Annotations, uses of P2P networks
> > have been previously explored (Segawa, 2006). Providing users
> > with access to these kinds of networks from their Web browsers,
> > today, is possible with WebRTC (Werner & Vogt, 2014; Ersson &
> > Persson, 2015).
> >
> > P2P filesharing networks could be of use for decentralized
> > fact-checking and argumentation. Facts or claims could be stored
> > in entries, a special kind of file resource.
> >
> > By creating and sharing digitally-signed user feedback, notes,
> > comments, or annotations with respect to those facts or claims in
> > entries, users could express their determinations with respect to
> > the veracity of facts or claims and could also present arguments
> > for or against them (Bex, Snaith, Lawrence, & Reed, 2014).
> >
> > Entries could contain one or more references to paraphrases of
> > content from locations on the Fediverse (see: Appendix A).
> >
> > Annotation objects from the Fediverse could be indexed and
> > redundantly stored on P2P filesharing networks.
> >
> > Uses of Embedding Vectors
> >
> > Instead of, or in addition to, using cryptographic hashes to index
> > and address content on P2P networks, digitally-signed entries for
> > facts or claims could be indexed and addressed using embedding
> > vectors (Zaarour & Curry, 2022).
> >
> > As considered, entries would be a special kind of file resource
> > where their embedding vectors, which are verifiably those of
> > selections of other resources' contents, would be stored inside of
> > them (see: Appendix A) rather than obtained from processing them
> > with AI models.
> >
> > Indexing and addressing entries thusly would allow them to be
> > merged or wrapped, e.g., to add paraphrases, digitally signing
> > them at each step, without having to reindex them. Modifications,
> > however, would result in changes to entries' cryptographic hashes.
> >
> > Deep learning can be used to detect and identify sentential
> > paraphrases (Zhou, Qiu, Liang, & Acuna, 2022). More elaborate uses
> > of language models could be utilized for inquiring and reasoning
> > about whether sentences occurring in contexts were paraphrases.
> >
> > With respect to fact-checking on the Web, scenarios to consider
> > include both fact-checking content which was expressly indicated
> > to be a fact or claim by its authors, e.g., using custom
> > elements, and fact-checking arbitrary selections of documents'
> > content.
> >
> > Explorations with respect to fact-checking arbitrary selections of
> > content include the open-source Citation Needed project by the
> > Future Audiences team of the Wikimedia Foundation.
> >
> > The Prompt API
> >
> > Exploration is underway into providing APIs for accessing language
> > models in Web browsers; the Web Machine Learning Working Group is
> > developing the Prompt API.
> >
> > With access to language models in Web browsers, users might be
> > able to obtain embedding vectors for portions of content in Web
> > documents. These embedding vectors could be used to search for
> > other content, e.g., annotations, including on P2P networks.
> >
> > Custom Elements
> >
> > HTML5 custom elements could allow facts or claims to be
> > expressed in documents, e.g., to add visual indicators near them
> > or enable special context menus for them, while specifying
> > values for embedding vectors computed for them using AI models
> > (see: Appendix C).
> >
> > Appendices
> >
> > Appendix A shows a markup sketch for an entry, a created entry
> > wrapped to add a paraphrase to it.
> >
> > Appendix B shows that embedding vectors could be added to Magnet
> > URIs and Metalinks.
> >
> > Appendix C shows that HTML5 custom elements could be used for
> > asserted facts or claims which refer to entries on P2P networks by
> > means of one or more embedding vectors.
> >
> > Appendix D shows an approach involving shortcodes for authors
> > using content-management systems to be able to easily add facts or
> > claims to their content.
> >
> > Bibliography
> >
> > Bex, Floris, Mark Snaith, John Lawrence, and Chris Reed.
> > "ArguBlogging: An application for the argument web." /Journal of
> > Web Semantics/ 25 (2014): 9-15.
> > https://www.sciencedirect.com/science/article/pii/S1570826814000079
> >
> > Ersson, Kerstin, and Siri Persson. "Peer-to-peer distribution of
> > web content using WebRTC within a web browser." (2015).
> > https://www.diva-portal.org/smash/get/diva2:819420/FULLTEXT01.pdf
> >
> > Segawa, Osamu. "Web annotation sharing using P2P." In /Proceedings
> > of the 15th international conference on World Wide Web/, pp.
> > 851-852. 2006.
> > http://ra.ethz.ch/CDstore/www2006/devel-www2006.ecs.soton.ac.uk/programme/files/pdf/p45.pdf
> >
> > Werner, Max Jonas, and Christian Vogt. "Implementation of a
> > browser-based P2P network using WebRTC." /Hamburg/ (2014).
> > https://inet.haw-hamburg.de/teaching/ws-2013-14/master-project/Prj1-report-werner-vogt.pdf
> >
> > Zaarour, Tarek, and Edward Curry. "SemanticPeer: A distributional
> > semantic peer-to-peer lookup protocol for large content spaces at
> > internet-scale." /Future Generation Computer Systems/ 132 (2022):
> > 239-253.
> > https://www.sciencedirect.com/science/article/pii/S0167739X22000590
> >
> > Zhou, Chao, Cheng Qiu, Lizhen Liang, and Daniel E. Acuna.
> > "Paraphrase identification with deep learning: A review of
> > datasets and methods." /arXiv preprint arXiv:2212.06933/ (2022).
> > https://arxiv.org/pdf/2212.06933
> >
> > Appendix A: Sketch of an Entry for a Fact or Claim
> >
> > <action kind="add-paraphrase">
> >   <base>
> >     <action kind="create">
> >       <base />
> >       <time>2024-01-14T00:01:00Z</time>
> >       <v id="v-1" model="urn:ai:model:llama:3.2:90B">...</v>
> >       <metalink id="source-1">
> >         <file name="article1.html">
> >           <url>https://www.example1.com/user1/article1.html</url>
> >         </file>
> >       </metalink>
> >       <selection source="source-1">
> >         ... <select v="v-1">A sentence.</select> ...
> >       </selection>
> >       <signature>...</signature>
> >     </action>
> >   </base>
> >   <time>2024-01-14T00:00:00Z</time>
> >   <v id="v-2" model="urn:ai:model:llama:3.3:70B">...</v>
> >   <metalink id="source-2">
> >     <file name="article2.html">
> >       <url>https://www.example2.com/user2/article2.html</url>
> >     </file>
> >   </metalink>
> >   <selection source="source-2">
> >     ... <select v="v-1 v-2">A paraphrase.</select> ...
> >   </selection>
> >   <signature>...</signature>
> > </action>
> >
> > Appendix B: Adding Embedding Vectors to Magnet URIs and Metalinks
> >
> > Embedding vectors could be added to Magnet URIs by means of
> > adding a key: xv.
> >
> > Embedding vectors could be new components of metalinks.
> >
> > <metalink xmlns="urn:ietf:params:xml:ns:metalink">
> >   <published>2009-05-15T12:23:23Z</published>
> >   <file name="example.txt">
> >     <url>http://www.example.com/example.txt</url>
> >     <vector model="urn:ai:model:llama:3.3:70B">...</vector>
> >   </file>
> > </metalink>
> >
> > Appendix C: Custom Elements for Facts or Claims
> >
> > A custom element could be used to signify an asserted fact or
> > claim, referring to an entry on a P2P network by means of
> > embedding vectors alongside other information. Via a JavaScript
> > library, and perhaps WebRTC, clients could participate in P2P
> > networks and retrieve entries, feedback on entries, or both.
> >
> > Notice that, for the special file type of entries, those embedding
> > vectors within them and not of the XML file itself are utilized
> > with respect to storing and addressing the resource on P2P networks.
> >
> > <verifiable-claim see="magnet:?xv=...">Ut enim ad minim veniam,
> > quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
> > commodo consequat.</verifiable-claim>
> >
> > Appendix D: Content Authoring with Shortcodes
> >
> > How might authors easily add facts or claims to their content?
> > With respect to popular content-management systems, the syntax
> > for so doing could resemble that of existing shortcodes like
> > [quote].
> >
> > [claim]Ut enim ad minim veniam, quis nostrud exercitation ullamco
> > laboris nisi ut aliquip ex ea commodo consequat.[/claim]
> >
> > During content-publishing processes, authors' content-management
> > systems (e.g., Drupal, WordPress) – or configurable plugins or
> > extensions for these systems – could handle searching for existing
> > paraphrases, adding new facts or claims (if needed) to P2P
> > filesharing networks, obtaining the data for use in the
> > see attributes, caching these data, and generating markup.
> >
> > ------------------------------------------------------------------------
> > *From:* Emelia S. <emelia@brandedcode.com>
> > *Sent:* Monday, January 13, 2025 11:21 AM
> > *To:* Evan Prodromou <evan@prodromou.name>
> > *Cc:* public-swicg@w3c.org <public-swicg@w3c.org>
> > *Subject:* Re: Fact-checking and community notes on the Fediverse
> >
> > This is already something on the list of things that the
> > ActivityPub Trust & Safety Taskforce is working on:
> >
> > Idea: Annotations / Labeling of content · Issue #4 ·
> > swicg/activitypub-trust-and-safety
> > https://github.com/swicg/activitypub-trust-and-safety/issues/4
> >
> > The Web Annotations model could work, but the discovery of
> > annotations that exist is the hardest part, I've started solving
> > that in https://github.com/ThisIsMissEm/annotations-service where
> > I use the sha256 hash of the Object ID as the annotation
> > collection ID, giving a very simple way to fetch all annotations
> > for a given object.
> >
> > I do want to investigate what an Annotate activity would look
> > like, but I suspect this would just be an announcement of sorts
> > "hey, there's this web annotation over here for this target"
> >
> > Yours,
> > Emelia
> >
> > On 13 Jan 2025, at 04:23, Evan Prodromou <evan@prodromou.name> wrote:
> > > We don't have an easy way for remote actors to annotate
> > > content on the Fediverse.
> > >
> > > The biggest use case for this is to have permissionless
> > > fact-checking or community notes. A fact-checking service
> > > could annotate a remote content object like a Note or a Video
> > > with additional fact-checking information, and compliant
> > > clients or servers could show the fact-checking information
> > > when showing the Note.
> > >
> > > I think there are some tricky parts to this structure, which I
> > > believe suggests that we should start working on it.
> > >
> > > Evan
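
As a rough illustration of the receiving-server step discussed in this
thread, the following Python sketch stores incoming annotations by target
and attaches them as an `annotations` collection when an object is
redistributed. Everything here is an assumption made for the example:
`AnnotationStore`, `collection_id`, and the `ANNOTATIONS_BASE` URL are
hypothetical names, and naming the collection after the SHA-256 hex
digest of the object ID is only one reading of the hashing idea quoted
above, not the behaviour of any existing service.

```python
import hashlib
from collections import defaultdict

# Hypothetical base URL of the server that manages annotation collections.
ANNOTATIONS_BASE = "https://other.example/system/annotations/"


def collection_id(object_id: str) -> str:
    """Derive a collection URL from the SHA-256 hex digest of the object ID."""
    digest = hashlib.sha256(object_id.encode("utf-8")).hexdigest()
    return ANNOTATIONS_BASE + digest


class AnnotationStore:
    """In-memory index of annotation Notes, keyed by the annotated object's ID."""

    def __init__(self):
        self._by_target = defaultdict(list)

    def receive(self, activity: dict) -> bool:
        """Record the object of an Annotate activity; ignore anything else."""
        if activity.get("type") != "Annotate":
            return False
        target = activity.get("target")
        note = activity.get("object")
        if not isinstance(target, str) or not isinstance(note, dict):
            return False
        self._by_target[target].append(note)
        return True

    def with_annotations(self, obj: dict) -> dict:
        """Return a copy of `obj` with an `annotations` collection, if any exist."""
        notes = self._by_target.get(obj["id"], [])
        if not notes:
            return obj
        enriched = dict(obj)  # shallow copy; the stored original stays untouched
        enriched["annotations"] = {
            "type": "Collection",
            "id": collection_id(obj["id"]),
            "to": "as:Public",
            "items": list(notes),
        }
        return enriched


store = AnnotationStore()
store.receive({
    "type": "Annotate",
    "actor": "https://factcheck.example/user/338",
    "object": {
        "id": "https://factcheck.example/user/338/annotate/47",
        "type": "Note",
        "attributedTo": "https://factcheck.example/user/338",
    },
    "target": "https://social.example/user/17/note/1221",
})
note = {"id": "https://social.example/user/17/note/1221", "type": "Note"}
enriched = store.with_annotations(note)
```

Returning a copy keeps the object as published by its origin server
separate from the locally added `annotations` property, which matters
because that property is not under the original actor's control. This
sketch does no trust or authorization checks on the annotating actor.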
Received on Thursday, 23 January 2025 19:32:17 UTC