Re: Fact-checking and community notes on the Fediverse from Evan Prodromou on 2025-01-23 (public-swicg@w3.org from January 2025)

From: Evan Prodromou <evan@prodromou.name>
Date: Thu, 23 Jan 2025 14:32:02 -0500
To: Adam Sobieski <adamsobieski@hotmail.com>, "Emelia S." <emelia@brandedcode.com>
Cc: "public-swicg@w3c.org" <public-swicg@w3c.org>
Message-ID: <913aed72-a137-49b2-8cbb-5151584bedd9@prodromou.name>
One possible interaction flow is this.

Let's suppose an actor distributes the following activity:

{

     "@context": "https://www.w3.org/ns/activitystreams",
     "id": "https://social.example/user/17/create/337",
     "actor": "https://social.example/user/17",

     "type": "Create",
     "to": "as:Public",

     "object": {
        "id": "https://social.example/user/17/note/1221",
        "type": "Note",
        "to": "as:Public",
        "attributedTo": "https://social.example/user/17",
        "content": "<p>Large Marge had died that very night ten years before!</p>",
        "published": "2025-01-23T20:15:00Z"
     },
     "published": "2025-01-23T20:15:00Z"

}


This creates a public note (or short text) with some questionable facts 
included.
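For anyone generating this payload programmatically, here is a minimal Python sketch that builds the same Create activity as a dictionary. The function name `make_public_note_create` is hypothetical, and the code only constructs the JSON. It does not sign or deliver anything to an inbox.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def make_public_note_create(actor: str, activity_id: str, note_id: str,
                            content: str, published: str) -> dict:
    """Wrap a public Note in a Create activity, mirroring the example above."""
    note = {
        "id": note_id,
        "type": "Note",
        "to": "as:Public",          # compact IRI for the Public collection
        "attributedTo": actor,      # the author is also the activity's actor
        "content": content,
        "published": published,
    }
    return {
        "@context": AS_CONTEXT,
        "id": activity_id,
        "actor": actor,
        "type": "Create",
        "to": "as:Public",
        "object": note,
        "published": published,
    }


if __name__ == "__main__":
    activity = make_public_note_create(
        "https://social.example/user/17",
        "https://social.example/user/17/create/337",
        "https://social.example/user/17/note/1221",
        "<p>Large Marge had died that very night ten years before!</p>",
        "2025-01-23T20:15:00Z",
    )
    print(json.dumps(activity, indent=2))
```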

Another actor could publish an annotation on that object, to give 
further context for readers:

{

     "@context": [
        "https://www.w3.org/ns/activitystreams",
        "https://annotate.example/ns"
     ],
     "id": "https://factcheck.example/user/338/annotate/47",
     "actor": "https://factcheck.example/user/338",

     "type": "Annotate",
     "to": "as:Public",

     "object": {
        "id": "https://factcheck.example/user/338/annotate/47",
        "type": "Note",
        "to": "as:Public",
        "attributedTo": "https://factcheck.example/user/338",
        "content": "<p>This is a variation of the <a href='https://w.wiki/CpRq'>vanishing hitchhiker</a> urban legend.</p>",
        "published": "2025-01-23T20:45:00Z"
     },
     "target": "https://social.example/user/17/note/1221",
     "published": "2025-01-23T20:45:00Z"

}

"Annotate" is not a standard activity type in ActivityPub; I added a 
fictional "annotations" context document here.
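Because `Annotate` is not standardized, any receiving logic is speculative. The following minimal Python sketch shows how a server might index incoming `Annotate` activities by their `target`, assuming the shape shown above: the target is a bare IRI and the annotation is embedded as `object`. The function name and the attribution check are assumptions, not part of any specification. HTTP signature verification is assumed to have happened before this point.

```python
def handle_annotate(activity: dict, store: dict) -> bool:
    """Record an incoming Annotate activity under the id of its target.

    Returns True if the annotation was stored, False if it was rejected.
    """
    if activity.get("type") != "Annotate":
        return False
    target = activity.get("target")
    annotation = activity.get("object")
    # Expect the target as a bare IRI and the annotation as an embedded object.
    if not isinstance(target, str) or not isinstance(annotation, dict):
        return False
    # Assumed sanity check: the annotation should be attributed to the
    # actor who sent the activity.
    if annotation.get("attributedTo") != activity.get("actor"):
        return False
    store.setdefault(target, []).append(annotation)
    return True


# In-memory index: target IRI -> list of annotation objects (illustrative only).
annotations_by_target: dict = {}
```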

Servers that receive this annotation might include it when they 
redistribute the Note object to ActivityPub clients:

{
     "id": "https://social.example/user/17/note/1221",
     "type": "Note",
     "to": "as:Public",
     "attributedTo": "https://social.example/user/17",
     "content": "<p>Large Marge had died that very night ten years before!</p>",
     "published": "2025-01-23T20:15:00Z",
     "annotations": {
          "type": "Collection",
          "id": "https://other.example/system/annotations/social.example/user/17/note/1221",
          "to": "as:Public",
          "items": {
             "id": "https://factcheck.example/user/338/annotate/47",
             "type": "Note",
             "to": "as:Public",
             "attributedTo": "https://factcheck.example/user/338",
             "content": "<p>This is a variation of the <a href='https://w.wiki/CpRq'>vanishing hitchhiker</a> urban legend.</p>",
             "published": "2025-01-23T20:45:00Z"
          }
     }
}
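A redistributing server's side of this could look like the following Python sketch. The collection id scheme (a base URL followed by the note's IRI without its scheme) is inferred from the example id above and is only an assumption. The sketch always emits `items` as an array and adds `totalItems`, which the example omits.

```python
import copy

ANNOTATIONS_BASE = "https://other.example/system/annotations/"  # hypothetical


def annotation_collection_id(note_id: str) -> str:
    """Derive a per-object collection id: base URL + note IRI minus its scheme."""
    return ANNOTATIONS_BASE + note_id.split("://", 1)[-1]


def with_annotations(note: dict, annotations: list) -> dict:
    """Return a copy of `note` with an inline `annotations` Collection."""
    result = copy.deepcopy(note)  # leave the stored canonical copy untouched
    result["annotations"] = {
        "type": "Collection",
        "id": annotation_collection_id(note["id"]),
        "to": "as:Public",
        "totalItems": len(annotations),
        "items": list(annotations),
    }
    return result
```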


This is actually kind of a tricky situation, since usually the 
properties of the object as defined by the sending server, and available 
by fetching `https://social.example/user/17/note/1221`, would be 
considered canonical. The `annotations` property is managed by a 
different server, without the control or even knowledge of the original 
actor or their service.
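One way a client could respect that distinction is to fetch the Note from its own `id` and check that everything except the server-added `annotations` property agrees. Here is a Python sketch, assuming the client already holds both copies as parsed JSON. The function name is hypothetical. Plain dictionary equality also ignores JSON-LD equivalences (compact versus full IRIs, a single value versus a one-element array) that a careful implementation would need to normalize first.

```python
def matches_canonical(received: dict, canonical: dict,
                      foreign_keys: frozenset = frozenset({"annotations"})) -> bool:
    """Check that a redistributed object agrees with the origin's copy.

    Properties in `foreign_keys` are added by other servers, so they are
    excluded from the comparison; every other property must match exactly.
    """
    def strip(obj: dict) -> dict:
        return {k: v for k, v in obj.items() if k not in foreign_keys}

    return strip(received) == strip(canonical)
```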

The annotations here are public; within the AP authorization model, it's 
also possible to restrict distribution and access to the annotations 
(with a different "to" property).
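The following Python sketch filters annotations by their addressing before showing them to a given viewer. It is deliberately simplified. It checks only `to` and `cc` for the Public collection or the viewer's own IRI, and it ignores followers collections, `bto`/`bcc`, and `audience`. The three spellings of the Public collection reflect the ActivityPub recommendation that JSON-only implementations accept all of them.

```python
from typing import List, Optional

# Equivalent representations of the Public collection.
PUBLIC_FORMS = {
    "as:Public",
    "Public",
    "https://www.w3.org/ns/activitystreams#Public",
}


def _as_list(value) -> list:
    """Addressing properties may hold a single IRI or an array of IRIs."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def visible_annotations(annotations: List[dict], viewer: Optional[str]) -> List[dict]:
    """Keep annotations addressed to the public or directly to `viewer`."""
    visible = []
    for item in annotations:
        audience = set(_as_list(item.get("to"))) | set(_as_list(item.get("cc")))
        if audience & PUBLIC_FORMS or (viewer is not None and viewer in audience):
            visible.append(item)
    return visible
```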

I think the work needed here would be as follows:

- Define a context doc for `Annotate` and `annotations`
- Write a FEP (Fediverse Enhancement Proposal) or other document describing how these can be used

Obviously, this is just the protocol layer; it doesn't even begin to 
explore the options for actually setting up a network of fact checkers 
or establishing trust in those fact checkers.

Evan

On 2025-01-23 11:33 a.m., Adam Sobieski wrote:
> Evan,
>
> Ok. I will take a look at ActivityPub server-to-server interactions 
> and think about methods where fact-checking information, e.g., 
> annotations or community notes, is distributed via ActivityPub.
>
>
> Best regards,
> Adam
>
> ------------------------------------------------------------------------
> *From:* Evan Prodromou <evan@prodromou.name>
> *Sent:* Thursday, January 23, 2025 10:38 AM
> *To:* Adam Sobieski <adamsobieski@hotmail.com>; Emelia S. 
> <emelia@brandedcode.com>
> *Cc:* public-swicg@w3c.org <public-swicg@w3c.org>
> *Subject:* Re: Fact-checking and community notes on the Fediverse
>
> Hey, Adam. So, I'd prefer to use methods where the fact-checking 
> information is distributed via ActivityPub.
>
>
> Evan
>
>
> On 2025-01-14 6:31 p.m., Adam Sobieski wrote:
>
>     Social Web Incubator Community Group,
>
>     Hello. I am pleased to share some preliminary brainstorming and
>     ideas about decentralized fact-checking and argumentation using
>     P2P filesharing networks.
>     Hopefully some of the following ideas can be of use for the
>     Fediverse, e.g., for the discovery of existing annotations.
>
>
>       Introduction
>
>
>     With respect to sharing Web Annotations, uses of P2P networks
>     have been previously explored (Segawa, 2006). Today, WebRTC makes
>     it possible to give users access to these kinds of networks from
>     their Web browsers (Werner & Vogt, 2014; Ersson & Siri, 2015).
>
>     P2P filesharing networks could be of use for decentralized
>     fact-checking and argumentation. Facts or claims could be stored
>     in entries, a special kind of file resource.
>     By creating and sharing digitally-signed user feedback, notes,
>     comments, or annotations with respect to those facts or claims in
>     entries, users could express their determinations with respect to
>     the veracity of facts or claims and could also present arguments
>     for or against them (Bex, Snaith, Lawrence, & Reed, 2014).
>     Entries could contain one or more references to paraphrases of
>     content from locations on the Fediverse (see: Appendix A).
>     Annotation objects from the Fediverse could be indexed and
>     redundantly stored on P2P filesharing networks.
>
>
>       Uses of Embedding Vectors
>
>     Instead of, or in addition to, using cryptographic hashes to index
>     and address content on P2P networks, digitally-signed entries for
>     facts or claims could be indexed and addressed using embedding
>     vectors (Zaarour & Curry, 2022).
>     As envisioned, entries would be a special kind of file resource
>     that stores its embedding vectors, verifiably computed for
>     selections of other resources' contents, inside itself (see:
>     Appendix A), rather than having them obtained by processing the
>     entry with AI models.
>     Indexing and addressing entries in this way would allow them to be
>     merged or wrapped, e.g., to add paraphrases, with a digital
>     signature at each step, without having to reindex them. Such
>     modifications would, however, change the entries' cryptographic
>     hashes.
>     Deep learning can be used to detect and identify sentential
>     paraphrases (Zhou, Qiu, Liang, & Acuna, 2022). More elaborate uses
>     of language models could help determine whether sentences, in
>     their contexts, are paraphrases of one another.
>     With respect to fact-checking on the Web, scenarios to consider
>     include both fact-checking content that its authors have
>     expressly marked as a fact or claim, e.g., using custom elements,
>     and fact-checking arbitrary selections of documents' content.
>     Explorations with respect to fact-checking arbitrary selections of
>     content include the open-source Citation Needed project by the
>     Future Audiences team of the Wikimedia Foundation.
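The embedding-based lookup described above can be illustrated with a short Python sketch. A brute-force scan with cosine similarity stands in for the distributed lookup a system like SemanticPeer would perform. The 0.9 threshold and the function names are arbitrary assumptions. Comparing vectors is presumably only meaningful when both come from the same model, which may be why the entry sketch records a `model` on each vector.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_entry(query: Sequence[float], entries: Dict[str, List[float]],
                  threshold: float = 0.9) -> Optional[str]:
    """Return the id of the most similar entry, or None if none reaches the threshold."""
    best_id, best_score = None, threshold
    for entry_id, vector in entries.items():
        score = cosine_similarity(query, vector)
        if score >= best_score:
            best_id, best_score = entry_id, score
    return best_id
```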
>
>
>       The Prompt API
>
>     Work is underway on APIs for accessing language models in Web
>     browsers; the Prompt API is being incubated in the Web Machine
>     Learning Community Group.
>     With access to language models in Web browsers, users might be
>     able to obtain embedding vectors for portions of content in Web
>     documents. These embedding vectors could be used to search for
>     other content, e.g., annotations, including on P2P networks.
>
>
>       Custom Elements
>
>
>     HTML5 custom elements could allow facts or claims to be
>     expressed in documents, e.g., to add visual indicators near them
>     or enable special context menus for them, while specifying
>     values for embedding vectors computed for them using AI models
>     (see: Appendix C).
>
>
>       Appendices
>
>     Appendix A shows a markup sketch for an entry, a created entry
>     wrapped to add a paraphrase to it.
>     Appendix B shows that embedding vectors could be added to Magnet
>     URIs and Metalinks.
>     Appendix C shows that HTML5 custom elements could be used for
>     asserted facts or claims which refer to entries on P2P networks by
>     means of one or more embedding vectors.
>     Appendix D shows an approach involving shortcodes for authors
>     using content-management systems to be able to easily add facts or
>     claims to their content.
>
>
>       Bibliography
>
>     Bex, Floris, Mark Snaith, John Lawrence, and Chris Reed.
>     "ArguBlogging: An application for the argument web." /Journal of
>     Web Semantics/ 25 (2014): 9-15.
>     https://www.sciencedirect.com/science/article/pii/S1570826814000079
>
>     Ersson, Kerstin, and Persson Siri. "Peer-to-peer distribution of
>     web content using WebRTC within a web browser." (2015).
>     https://www.diva-portal.org/smash/get/diva2:819420/FULLTEXT01.pdf
>
>     Segawa, Osamu. "Web annotation sharing using P2P." In /Proceedings
>     of the 15th international conference on World Wide Web/, pp.
>     851-852. 2006.
>     http://ra.ethz.ch/CDstore/www2006/devel-www2006.ecs.soton.ac.uk/programme/files/pdf/p45.pdf
>
>     Werner, Max Jonas, and Christian Vogt. "Implementation of a
>     browser-based P2P network using WebRTC." /Hamburg/ (2014).
>     https://inet.haw-hamburg.de/teaching/ws-2013-14/master-project/Prj1-report-werner-vogt.pdf
>
>     Zaarour, Tarek, and Edward Curry. "SemanticPeer: A distributional
>     semantic peer-to-peer lookup protocol for large content spaces at
>     internet-scale." /Future Generation Computer Systems/ 132 (2022):
>     239-253.
>     https://www.sciencedirect.com/science/article/pii/S0167739X22000590
>
>     Zhou, Chao, Cheng Qiu, Lizhen Liang, and Daniel E. Acuna.
>     "Paraphrase identification with deep learning: A review of
>     datasets and methods." /arXiv preprint arXiv:2212.06933/ (2022).
>     https://arxiv.org/pdf/2212.06933
>
>
>       Appendix A: Sketch of an Entry for a Fact or Claim
>
>     <action kind="add-paraphrase">
>       <base>
>         <action kind="create">
>           <base />
>           <time>2024-01-14T00:01:00Z</time>
>           <v id="v-1" model="urn:ai:model:llama:3.2:90B">...</v>
>           <metalink id="source-1">
>             <file name="article1.html">
>               <url>https://www.example1.com/user1/article1.html</url>
>             </file>
>           </metalink>
>           <selection source="source-1">
>             ... <select v="v-1">A sentence.</select> ...
>           </selection>
>           <signature>...</signature>
>         </action>
>       </base>
>       <time>2024-01-14T00:00:00Z</time>
>       <v id="v-2" model="urn:ai:model:llama:3.3:70B">...</v>
>       <metalink id="source-2">
>         <file name="article2.html">
>           <url>https://www.example2.com/user2/article2.html</url>
>         </file>
>       </metalink>
>       <selection source="source-2">
>         ... <select v="v-1 v-2">A paraphrase.</select> ...
>       </selection>
>       <signature>...</signature>
>     </action>
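Assuming the element names used in the sketch above (which is itself only a sketch, with no schema or namespace defined), a consumer could unwrap an entry's history with Python's standard ElementTree. The following code walks the nested `<action>`/`<base>` chain from the outermost wrapper down to the original `create` action. The function name is hypothetical, and signature verification is out of scope.

```python
import xml.etree.ElementTree as ET
from typing import List


def action_chain(xml_text: str) -> List[dict]:
    """Walk an entry's nested <action>/<base> structure, outermost first.

    For each action, report its kind, timestamp, and the embedding vectors
    (id -> model) declared directly on that action.
    """
    chain = []
    action = ET.fromstring(xml_text)
    while action is not None and action.tag == "action":
        chain.append({
            "kind": action.get("kind"),
            "time": action.findtext("time"),
            "vectors": {v.get("id"): v.get("model") for v in action.findall("v")},
        })
        base = action.find("base")
        # An empty <base /> marks the original "create" action.
        action = base.find("action") if base is not None else None
    return chain
```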
>
>
>       Appendix B: Adding Embedding Vectors to Magnet URIs and Metalinks
>
>
>     Embedding vectors could be added to Magnet URIs by adding a new
>     key, xv.
>
>     Embedding vectors could also be new components of metalinks:
>
>     <metalink xmlns="urn:ietf:params:xml:ns:metalink">
>       <published>2009-05-15T12:23:23Z</published>
>       <file name="example.txt">
>         <url>http://www.example.com/example.txt</url>
>         <vector model="urn:ai:model:llama:3.3:70B">...</vector>
>       </file>
>     </metalink>
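The proposal does not specify how a vector would be serialized into the `xv` value. The following Python sketch assumes one possible encoding: little-endian float32 values, base64url-encoded. Only the `xv` key comes from the proposal. The encoding and helper names are hypothetical, and the sketch does not carry the model identifier, which a real design would need to convey, as the `model` attribute on the Metalink `<vector>` element does.

```python
import base64
import struct
from typing import List, Optional
from urllib.parse import parse_qs, urlencode


def encode_vector(vector: List[float]) -> str:
    """Pack floats as little-endian float32, then base64url-encode them."""
    raw = struct.pack(f"<{len(vector)}f", *vector)
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_vector(text: str) -> List[float]:
    """Inverse of encode_vector."""
    raw = base64.urlsafe_b64decode(text)
    if len(raw) % 4:
        raise ValueError("payload is not a whole number of float32 values")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def magnet_with_vector(vector: List[float], display_name: Optional[str] = None) -> str:
    """Build a magnet URI carrying an embedding vector in the proposed xv key."""
    params = [("xv", encode_vector(vector))]
    if display_name:
        params.append(("dn", display_name))
    return "magnet:?" + urlencode(params)


def vector_from_magnet(uri: str) -> List[float]:
    """Extract and decode the xv parameter from a magnet URI."""
    query = parse_qs(uri.partition("?")[2])
    return decode_vector(query["xv"][0])
```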
>
>
>       Appendix C: Custom Elements for Facts or Claims
>
>
>     A custom element could be used to signify an asserted fact or
>     claim, referring to an entry on a P2P network by means of
>     embedding vectors alongside other information. Via a JavaScript
>     library, and perhaps WebRTC, clients could participate in P2P
>     networks and retrieve entries, feedback on entries, or both.
>
>     Note that, for entries, it is the embedding vectors stored within
>     them, rather than vectors derived from the XML file itself, that
>     are used to store and address the resource on P2P networks.
>
>     <verifiable-claim see="magnet:?xv=...">Ut enim ad minim veniam,
>     quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
>     commodo consequat.</verifiable-claim>
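A consumer such as a crawler or browser extension could extract claims and their `see` references from a page. The following Python sketch uses the standard `html.parser` module and assumes claims are not nested. Resolving the `see` value against a P2P network is out of scope, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ClaimExtractor(HTMLParser):
    """Collect (see, text) pairs for each <verifiable-claim> element."""

    def __init__(self) -> None:
        super().__init__()
        self.claims: List[Tuple[Optional[str], str]] = []
        self._see: Optional[str] = None
        self._buffer: Optional[List[str]] = None  # non-None while inside a claim

    def handle_starttag(self, tag, attrs):
        if tag == "verifiable-claim":
            self._see = dict(attrs).get("see")
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "verifiable-claim" and self._buffer is not None:
            self.claims.append((self._see, "".join(self._buffer).strip()))
            self._buffer = None


def extract_claims(html_text: str) -> List[Tuple[Optional[str], str]]:
    parser = ClaimExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.claims
```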
>
>
>       Appendix D: Content Authoring with Shortcodes
>
>
>     How might authors easily add facts or claims to their content?
>     With respect to popular content-management systems, the syntax
>     for doing so could resemble that of existing shortcodes like
>     [quote].
>
>     [claim]Ut enim ad minim veniam, quis nostrud exercitation ullamco
>     laboris nisi ut aliquip ex ea commodo consequat.[/claim]
>
>     During content publishing, authors' content-management systems
>     (e.g., Drupal, WordPress), or configurable plugins or extensions
>     for these systems, could handle searching for existing
>     paraphrases, adding new facts or claims (if needed) to P2P
>     filesharing networks, obtaining the data for the "see"
>     attributes, caching these data, and generating markup.
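The markup-generation step of that pipeline might look roughly like the following Python sketch. The regular expression assumes non-nested shortcodes. `resolve_see` is a hypothetical placeholder for the search, publish, and caching steps, which are not specified. A real plugin would use its platform's own shortcode facility (WordPress, for example, registers shortcodes in PHP) rather than a regex.

```python
import html
import re
from typing import Callable

CLAIM_SHORTCODE = re.compile(r"\[claim\](.*?)\[/claim\]", re.DOTALL)


def render_claims(content: str, resolve_see: Callable[[str], str]) -> str:
    """Replace [claim]...[/claim] shortcodes with <verifiable-claim> markup.

    `resolve_see` stands in for whatever step finds or publishes the entry
    and returns the value for the see attribute (e.g. a magnet URI).
    """
    def replace(match):
        text = match.group(1)
        see = html.escape(resolve_see(text), quote=True)
        # The claim body is left as-is: it is already the author's HTML.
        return f'<verifiable-claim see="{see}">{text}</verifiable-claim>'

    return CLAIM_SHORTCODE.sub(replace, content)
```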
>
>     ------------------------------------------------------------------------
>     *From:* Emelia S. <emelia@brandedcode.com>
>     *Sent:* Monday, January 13, 2025 11:21 AM
>     *To:* Evan Prodromou <evan@prodromou.name>
>     *Cc:* public-swicg@w3c.org
>     *Subject:* Re: Fact-checking and community notes on the Fediverse
>     This is already something on the list of things that the
>     ActivityPub Trust & Safety Taskforce is working on:
>
>     Idea: Annotations / Labeling of content (Issue #4,
>     swicg/activitypub-trust-and-safety)
>     https://github.com/swicg/activitypub-trust-and-safety/issues/4
>
>
>     The Web Annotations model could work, but discovering which
>     annotations exist is the hardest part. I've started solving
>     that in https://github.com/ThisIsMissEm/annotations-service ,
>     where I use the sha256 hash of the Object ID as the annotation
>     collection ID, giving a very simple way to fetch all annotations
>     for a given object.
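The hashing idea can be illustrated in a few lines of Python. The base URL, UTF-8 encoding of the id, and hex encoding of the digest are assumptions for illustration. The annotations-service repository may make different choices. Because the mapping is deterministic, anyone who knows an object's id can compute where to look for its annotations without a prior lookup, provided both sides hash byte-identical ids.

```python
import hashlib

COLLECTIONS_BASE = "https://annotations.example/collections/"  # hypothetical host


def annotation_collection_url(object_id: str) -> str:
    """Map an object's id to its annotation collection via SHA-256."""
    digest = hashlib.sha256(object_id.encode("utf-8")).hexdigest()
    return COLLECTIONS_BASE + digest
```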
>
>     I do want to investigate what an Annotate activity would look
>     like, but I suspect it would just be an announcement of sorts:
>     "hey, there's this web annotation over here for this target".
>
>     Yours,
>     Emelia
>
>         On 13 Jan 2025, at 04:23, Evan Prodromou <evan@prodromou.name>
>         wrote:
>
>         We don't have an easy way for remote actors to annotate
>         content on the Fediverse.
>
>         The biggest use case for this is to have permissionless
>         fact-checking or community notes. A fact-checking service
>         could annotate a remote content object like a Note or a Video
>         with additional fact-checking information, and compliant
>         clients or servers could show the fact-checking information
>         when showing the Note.
>
>         I think there are some tricky parts to this structure, which I
>         believe suggests that we should start working on it.
>
>         Evan
>
>
>
Received on Thursday, 23 January 2025 19:32:17 UTC