Re: Fact-checking and community notes on the Fediverse from Evan Prodromou on 2025-01-23 (public-swicg@w3.org from January 2025)

From: Evan Prodromou <evan@prodromou.name>
Date: Thu, 23 Jan 2025 10:38:55 -0500
To: Adam Sobieski <adamsobieski@hotmail.com>, "Emelia S." <emelia@brandedcode.com>
Cc: "public-swicg@w3c.org" <public-swicg@w3c.org>
Message-ID: <48b90625-1f77-45ef-b3a3-d31d7ebe9624@prodromou.name>
Hey, Adam. So, I'd prefer to use methods where the fact-checking 
information is distributed via ActivityPub.


Evan


On 2025-01-14 6:31 p.m., Adam Sobieski wrote:
> Social Web Incubator Community Group,
>
> Hello. I am pleased to share some preliminary brainstorming and ideas 
> about decentralized fact-checking and argumentation using P2P 
> filesharing networks.
> Hopefully some of the following ideas can be of use for the Fediverse, 
> e.g., for the discovery of existing annotations.
>
>
>   Introduction
>
>
>   With respect to sharing Web Annotations, uses of P2P networks have
>   been previously explored (Segawa, 2006). Providing users with access
>   to these kinds of networks from their Web browsers, today, is
>   possible with WebRTC (Werner & Vogt, 2014; Ersson & Persson, 2015).
>
> P2P filesharing networks could be of use for decentralized 
> fact-checking and argumentation. Facts or claims could be stored in 
> entries, a special kind of file resource.
> By creating and sharing digitally-signed user feedback, notes, 
> comments, or annotations with respect to those facts or claims in 
> entries, users could express their determinations with respect to the 
> veracity of facts or claims and could also present arguments for or 
> against them (Bex, Snaith, Lawrence, & Reed, 2014).
> Entries could contain one or more references to paraphrases of content 
> from locations on the Fediverse (see: Appendix A). Annotation objects 
> from the Fediverse could be indexed and redundantly stored on P2P 
> filesharing networks.
>
>
>   Uses of Embedding Vectors
>
> Instead of, or in addition to, using cryptographic hashes to index and 
> address content on P2P networks, digitally-signed entries for facts or 
> claims could be indexed and addressed using embedding vectors (Zaarour 
> & Curry, 2022).
> As considered, entries would be a special kind of file resource: their 
> embedding vectors, which are verifiably computed from selections of 
> other resources' contents, would be stored inside of them (see: 
> Appendix A) rather than obtained by processing the entries with AI models.
> Indexing and addressing entries thusly would allow them to be merged 
> or wrapped, e.g., to add paraphrases, digitally signing them at each 
> step, without having to reindex them. Modifications, however, would 
> result in changes to entries' cryptographic hashes.
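
A minimal Python sketch of the point above, under loud assumptions: the three-dimensional vectors are toy values rather than real model output, `VectorIndex` is a hypothetical in-memory stand-in for a P2P lookup, and cosine similarity is only one possible distance measure. It shows that wrapping an entry changes its SHA-256 hash while the stored vector, and therefore its address in the vector index, stays the same.

```python
import hashlib
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def content_hash(entry_xml):
    """Conventional content address: SHA-256 over the serialized entry."""
    return hashlib.sha256(entry_xml.encode("utf-8")).hexdigest()


class VectorIndex:
    """Hypothetical in-memory index keyed by the vector stored in an entry."""

    def __init__(self):
        self._entries = {}  # stored vector (as tuple) -> latest entry XML

    def put(self, vector, entry_xml):
        self._entries[tuple(vector)] = entry_xml

    def nearest(self, query):
        best = max(self._entries, key=lambda v: cosine(v, query))
        return self._entries[best]


# Toy vector standing in for the value carried inside the entry.
claim_vector = [0.9, 0.1, 0.0]
created = '<action kind="create">A sentence.</action>'
wrapped = (
    '<action kind="add-paraphrase"><base>' + created + "</base>A paraphrase.</action>"
)

index = VectorIndex()
index.put(claim_vector, created)
index.put([0.0, 0.2, 0.9], '<action kind="create">An unrelated claim.</action>')
index.put(claim_vector, wrapped)  # same vector key, so no re-indexing is needed

hash_changed = content_hash(created) != content_hash(wrapped)
found = index.nearest([0.8, 0.2, 0.1])  # a nearby query vector
```

Here `hash_changed` is true, while `found` is the wrapped entry: the query reaches the newest version through the unchanged vector key.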
> Deep learning can be used to detect and identify sentential 
> paraphrases (Zhou, Qiu, Liang, & Acuna, 2022). More elaborate uses of 
> language models could be utilized for inquiring and reasoning about 
> whether sentences occurring in contexts were paraphrases.
> With respect to fact-checking on the Web, scenarios to consider 
> include both fact-checking content which was expressly indicated to be 
> a fact or claim by their authors, e.g., using custom elements, and 
> fact-checking arbitrary selections of documents' content.
> Explorations with respect to fact-checking arbitrary selections of 
> content include the open-source Citation Needed project by the Future 
> Audiences team of the Wikimedia Foundation.
>
>
>   The Prompt API
>
> Exploration is underway into providing APIs for accessing language 
> models in Web browsers; the Web Machine Learning Working Group is 
> developing the Prompt API.
> With access to language models in Web browsers, users might be able to 
> obtain embedding vectors for portions of content in Web documents. 
> These embedding vectors could be used to search for other content, 
> e.g., annotations, including on P2P networks.
>
>
>   Custom Elements
>
>
>   HTML5 custom elements could allow facts or claims to be expressed in
>   documents, e.g., to add visual indicators near them or enable special
>   context menus for them, while specifying values for embedding
>   vectors computed for them using AI models (see: Appendix C).
>
>
>   Appendices
>
> Appendix A shows a markup sketch for an entry, a created entry wrapped 
> to add a paraphrase to it.
> Appendix B shows that embedding vectors could be added to Magnet URIs 
> and Metalinks.
> Appendix C shows that HTML5 custom elements could be used for asserted 
> facts or claims which refer to entries on P2P networks by means of one 
> or more embedding vectors.
> Appendix D shows an approach involving shortcodes for authors using 
> content-management systems to be able to easily add facts or claims to 
> their content.
>
>
>   Bibliography
>
> Bex, Floris, Mark Snaith, John Lawrence, and Chris Reed. 
> "ArguBlogging: An application for the argument web." /Journal of Web 
> Semantics/ 25 (2014): 9-15. 
> https://www.sciencedirect.com/science/article/pii/S1570826814000079
> Ersson, Kerstin, and Siri Persson. "Peer-to-peer distribution of web 
> content using WebRTC within a web browser." (2015). 
> https://www.diva-portal.org/smash/get/diva2:819420/FULLTEXT01.pdf
> Segawa, Osamu. "Web annotation sharing using P2P." In /Proceedings of 
> the 15th international conference on World Wide Web/, pp. 851-852. 
> 2006. 
> http://ra.ethz.ch/CDstore/www2006/devel-www2006.ecs.soton.ac.uk/programme/files/pdf/p45.pdf
> Werner, Max Jonas, and Christian Vogt. "Implementation of a 
> browser-based P2P network using WebRTC." /Hamburg/ (2014). 
> https://inet.haw-hamburg.de/teaching/ws-2013-14/master-project/Prj1-report-werner-vogt.pdf
> Zaarour, Tarek, and Edward Curry. "SemanticPeer: A distributional 
> semantic peer-to-peer lookup protocol for large content spaces at 
> internet-scale." /Future Generation Computer Systems/ 132 (2022): 
> 239-253. 
> https://www.sciencedirect.com/science/article/pii/S0167739X22000590
> Zhou, Chao, Cheng Qiu, Lizhen Liang, and Daniel E. Acuna. "Paraphrase 
> identification with deep learning: A review of datasets and methods." 
> /arXiv preprint arXiv:2212.06933/ (2022). 
> https://arxiv.org/pdf/2212.06933
>
>
>   Appendix A: Sketch of an Entry for a Fact or Claim
>
> <action kind="add-paraphrase">
>   <base>
>     <action kind="create">
>       <base />
>       <time>2024-01-14T00:01:00Z</time>
>       <v id="v-1" model="urn:ai:model:llama:3.2:90B">...</v>
>       <metalink id="source-1">
>         <file name="article1.html">
>           <url>https://www.example1.com/user1/article1.html</url>
>         </file>
>       </metalink>
>       <selection source="source-1">
>         ... <select v="v-1">A sentence.</select> ...
>       </selection>
>       <signature>...</signature>
>     </action>
>   </base>
>   <time>2024-01-14T00:00:00Z</time>
>   <v id="v-2" model="urn:ai:model:llama:3.3:70B">...</v>
>   <metalink id="source-2">
>     <file name="article2.html">
>       <url>https://www.example2.com/user2/article2.html</url>
>     </file>
>   </metalink>
>   <selection source="source-2">
>     ... <select v="v-1 v-2">A paraphrase.</select> ...
>   </selection>
>   <signature>...</signature>
> </action>
>
>
>   Appendix B: Adding Embedding Vectors to Magnet URIs and Metalinks
>
>
>   Embedding vectors could be added to Magnet URIs by means of adding a
>   key: xv.
>
> Embedding vectors could be new components of metalinks.
> <metalink xmlns="urn:ietf:params:xml:ns:metalink">
>   <published>2009-05-15T12:23:23Z</published>
>   <file name="example.txt">
>     <url>http://www.example.com/example.txt</url>
>     <vector model="urn:ai:model:llama:3.3:70B">...</vector>
>   </file>
> </metalink>
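
As a rough illustration of the `xv` key, the Python sketch below builds a Magnet URI that carries an embedding vector and reads it back. The proposal does not say how a vector would be serialized, so the encoding here (little-endian float32 values, URL-safe Base64 without padding) is purely an assumption, as are the helper names `encode_vector`, `decode_vector`, and `magnet_with_vector`. The `dn` (display name) key is the usual Magnet parameter.

```python
import base64
import struct
from urllib.parse import parse_qs, urlencode


def encode_vector(vector):
    """Assumed serialization: little-endian float32, URL-safe Base64, no padding."""
    raw = struct.pack("<%df" % len(vector), *vector)
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def decode_vector(text):
    """Inverse of encode_vector."""
    padded = text + "=" * (-len(text) % 4)
    raw = base64.urlsafe_b64decode(padded)
    return list(struct.unpack("<%df" % (len(raw) // 4), raw))


def magnet_with_vector(vector, display_name):
    """Build a Magnet URI whose xv key carries the encoded vector."""
    query = urlencode({"xv": encode_vector(vector), "dn": display_name})
    return "magnet:?" + query


# Toy vector; values chosen to be exactly representable as float32.
uri = magnet_with_vector([0.5, -0.25, 1.0], "example.txt")
params = parse_qs(uri.split("?", 1)[1])
roundtrip = decode_vector(params["xv"][0])
```

A real embedding has hundreds or thousands of dimensions, so a production scheme would likely need a more compact representation than this one.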
>
>
>   Appendix C: Custom Elements for Facts or Claims
>
>
>   A custom element could be used to signify an asserted fact or claim,
>   referring to an entry on a P2P network by means of embedding vectors
>   alongside other information. Via a JavaScript library, and perhaps
>   WebRTC, clients could participate in P2P networks and retrieve
>   entries, feedback on entries, or both.
>
> Notice that, for the special file type of entries, the embedding 
> vectors stored within them, and not vectors computed from the XML file 
> itself, are used to store and address the resource on P2P networks.
> <verifiable-claim see="magnet:?xv=...">Ut enim ad minim veniam, quis 
> nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo 
> consequat.</verifiable-claim>
>
>
>   Appendix D: Content Authoring with Shortcodes
>
>
>   How might authors easily add facts or claims to their content? With
>   respect to popular content-management systems, the syntax for so
>   doing could resemble that of existing shortcodes like [quote].
>
> [claim]Ut enim ad minim veniam, quis nostrud exercitation ullamco 
> laboris nisi ut aliquip ex ea commodo consequat.[/claim]
> During content-publishing processes, authors' content-management 
> systems (e.g., Drupal, WordPress) – or configurable plugins or 
> extensions for these systems – could handle searching for existing 
> paraphrases, adding new facts or claims (if needed) to P2P filesharing 
> networks, obtaining the data for use in the see attributes, caching 
> these data, and generating markup.
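
The markup-generation step could be sketched in Python roughly as follows. This is not a real Drupal or WordPress plugin API: `render_claims` and `lookup_see` are hypothetical names, the lookup is a stub that returns the elided Magnet URI from Appendix C unchanged, and the shortcode body is assumed to be plain text that needs HTML escaping.

```python
import html
import re

# Matches [claim]...[/claim], including bodies that span several lines.
CLAIM_SHORTCODE = re.compile(r"\[claim\](.*?)\[/claim\]", re.DOTALL)


def lookup_see(claim_text):
    """Hypothetical stub for searching for paraphrases or publishing a new entry.

    A real implementation would query the P2P network; here the elided
    Magnet URI from Appendix C is returned as a placeholder.
    """
    return "magnet:?xv=..."


def render_claims(content, resolve=lookup_see):
    """Replace each [claim] shortcode with a <verifiable-claim> element."""

    def replace(match):
        text = match.group(1)
        see = html.escape(resolve(text), quote=True)
        return '<verifiable-claim see="%s">%s</verifiable-claim>' % (
            see,
            html.escape(text),
        )

    return CLAIM_SHORTCODE.sub(replace, content)


rendered = render_claims("Intro. [claim]Ut enim ad minim veniam.[/claim] Outro.")
```

Passing a different `resolve` callable is where caching or a real network lookup would plug in.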
>
> ------------------------------------------------------------------------
> *From:* Emelia S. <emelia@brandedcode.com>
> *Sent:* Monday, January 13, 2025 11:21 AM
> *To:* Evan Prodromou <evan@prodromou.name>
> *Cc:* public-swicg@w3c.org <public-swicg@w3c.org>
> *Subject:* Re: Fact-checking and community notes on the Fediverse
> This is already something on the list of things that the ActivityPub 
> Trust & Safety Taskforce is working on:
>
> Idea: Annotations / Labeling of content · Issue #4 · 
> swicg/activitypub-trust-and-safety 
> https://github.com/swicg/activitypub-trust-and-safety/issues/4
>
> The Web Annotations model could work, but the discovery of annotations 
> that exist is the hardest part. I've started solving that in 
> https://github.com/ThisIsMissEm/annotations-service, where I use the 
> sha256 hash of the Object ID as the annotation collection ID, giving a 
> very simple way to fetch all annotations for a given object.
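
The hashing idea can be sketched in a few lines of Python. This is not the annotations-service code: the UTF-8 encoding of the Object ID, the hex digest form, and the `/collections/` URL layout are all assumptions made for illustration.

```python
import hashlib


def annotation_collection_id(object_id):
    """Derive a collection ID from an object's ID (assumed: UTF-8, hex digest)."""
    return hashlib.sha256(object_id.encode("utf-8")).hexdigest()


def annotation_collection_url(service_base, object_id):
    """Hypothetical URL layout for fetching all annotations of one object."""
    return (
        service_base.rstrip("/")
        + "/collections/"
        + annotation_collection_id(object_id)
    )


note_id = "https://social.example/users/alice/notes/1"
collection_id = annotation_collection_id(note_id)
collection_url = annotation_collection_url("https://annotations.example/", note_id)
```

Because the ID is derived deterministically from the Object ID, any client that knows the object's ID can compute where its annotations live without a prior lookup.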
>
> I do want to investigate what an Annotate activity would look like, 
> but I suspect this would just be an announcement of sorts: "hey, 
> there's this web annotation over here for this target."
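
One speculative shape for such an announcement, sketched in Python: the annotation body follows the W3C Web Annotation Data Model (its `assessing` motivation and `TextualBody` type), but the `Annotate` activity type is hypothetical and not part of the Activity Streams 2.0 vocabulary. All IDs are example.com-style placeholders.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def build_annotation(annotation_id, target_id, note_text):
    """A Web Annotation that assesses a remote object such as a Note."""
    return {
        "@context": ANNO_CONTEXT,
        "id": annotation_id,
        "type": "Annotation",
        "motivation": "assessing",
        "body": {"type": "TextualBody", "value": note_text, "format": "text/plain"},
        "target": target_id,
    }


def build_annotate_activity(actor_id, annotation):
    """Hypothetical 'Annotate' activity announcing where the annotation lives."""
    return {
        "@context": AS_CONTEXT,
        "type": "Annotate",  # hypothetical type, not defined by Activity Streams 2.0
        "actor": actor_id,
        "object": annotation["id"],
        "target": annotation["target"],
    }


annotation = build_annotation(
    "https://factcheck.example/annotations/1",
    "https://social.example/users/alice/notes/1",
    "This claim lacks a cited source.",
)
activity = build_annotate_activity("https://factcheck.example/actor", annotation)
payload = json.dumps(activity, indent=2)
```

The activity carries only the annotation's ID and target, which matches the "announcement of sorts" framing: receivers would fetch the full annotation if they care about it.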
>
> Yours,
> Emelia
>
>> On 13 Jan 2025, at 04:23, Evan Prodromou <evan@prodromou.name> wrote:
>>
>> We don't have an easy way for remote actors to annotate content on 
>> the Fediverse.
>>
>> The biggest use case for this is to have permissionless fact-checking 
>> or community notes. A fact-checking service could annotate a remote 
>> content object like a Note or a Video with additional fact-checking 
>> information, and compliant clients or servers could show the 
>> fact-checking information when showing the Note.
>>
>> I think there are some tricky parts to this structure, which I 
>> believe suggests that we should start working on it.
>>
>> Evan
>>
>>
>
Received on Thursday, 23 January 2025 15:39:10 UTC