- From: Evan Prodromou <evan@prodromou.name>
- Date: Thu, 23 Jan 2025 14:32:02 -0500
- To: Adam Sobieski <adamsobieski@hotmail.com>, "Emelia S." <emelia@brandedcode.com>
- Cc: "public-swicg@w3c.org" <public-swicg@w3c.org>
- Message-ID: <913aed72-a137-49b2-8cbb-5151584bedd9@prodromou.name>
One possible interaction flow is this.
Let's suppose an actor distributes the following activity:
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "id": "https://social.example/user/17/create/337",
  "actor": "https://social.example/user/17",
  "type": "Create",
  "to": "as:Public",
  "object": {
    "id": "https://social.example/user/17/note/1221",
    "type": "Note",
    "to": "as:Public",
    "attributedTo": "https://social.example/user/17",
    "content": "<p>Large Marge had died that very night ten years before!</p>",
    "published": "2025-01-23T20:15:00Z"
  },
  "published": "2025-01-23T20:15:00Z"
}
This creates a public note (or short text) with some questionable facts
included.
Another actor could publish an annotation on that object, to give
further context for readers:
{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    "https://annotate.example/ns"
  ],
  "id": "https://factcheck.example/user/338/annotate/47",
  "actor": "https://factcheck.example/user/338",
  "type": "Annotate",
  "to": "as:Public",
  "object": {
    "id": "https://factcheck.example/user/338/annotate/47",
    "type": "Note",
    "to": "as:Public",
    "attributedTo": "https://factcheck.example/user/338",
    "content": "<p>This is a variation of the <a href='https://w.wiki/CpRq'>vanishing hitchhiker</a> urban legend.</p>",
    "published": "2025-01-23T20:45:00Z"
  },
  "target": "https://social.example/user/17/note/1221",
  "published": "2025-01-23T20:45:00Z"
}
"Annotate" is not a standard activity type in ActivityPub; I added a
fictional "annotations" context document here.
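As a rough illustration, a fact-checking service might assemble that
activity along these lines. This is only a sketch under assumptions:
`build_annotate_activity` and its parameters are hypothetical names, the
`https://annotate.example/ns` context is the fictional one from the
example, and the separate `note_id` used in the usage section is made
up for illustration.

```python
import json
from datetime import datetime, timezone

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
# Fictional context document from the example; nothing is published there.
ANNOTATE_CONTEXT = "https://annotate.example/ns"


def build_annotate_activity(activity_id, note_id, actor, target,
                            content_html, published):
    """Build an Annotate activity whose object is a public Note about `target`."""
    # Format as UTC with a trailing "Z", matching the timestamps in the example.
    stamp = published.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    note = {
        "id": note_id,
        "type": "Note",
        "to": "as:Public",
        "attributedTo": actor,
        "content": content_html,
        "published": stamp,
    }
    return {
        "@context": [AS_CONTEXT, ANNOTATE_CONTEXT],
        "id": activity_id,
        "actor": actor,
        "type": "Annotate",  # hypothetical type, not part of Activity Streams 2.0
        "to": "as:Public",
        "object": note,
        "target": target,
        "published": stamp,
    }


if __name__ == "__main__":
    activity = build_annotate_activity(
        activity_id="https://factcheck.example/user/338/annotate/47",
        # Hypothetical: a separate id for the Note itself.
        note_id="https://factcheck.example/user/338/note/47",
        actor="https://factcheck.example/user/338",
        target="https://social.example/user/17/note/1221",
        content_html="<p>This is a variation of the vanishing hitchhiker "
                     "urban legend.</p>",
        published=datetime(2025, 1, 23, 20, 45, tzinfo=timezone.utc),
    )
    print(json.dumps(activity, indent=2))
```

Building the structure as a dictionary and serializing it with
`json.dumps` avoids hand-written JSON mistakes such as a missing comma.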
Servers that receive this annotation might include it when they
redistribute the Note object to ActivityPub clients:
{
  "id": "https://social.example/user/17/note/1221",
  "type": "Note",
  "to": "as:Public",
  "attributedTo": "https://social.example/user/17",
  "content": "<p>Large Marge had died that very night ten years before!</p>",
  "published": "2025-01-23T20:15:00Z",
  "annotations": {
    "type": "Collection",
    "id": "https://other.example/system/annotations/social.example/user/17/note/1221",
    "to": "as:Public",
    "items": {
      "id": "https://factcheck.example/user/338/annotate/47",
      "type": "Note",
      "to": "as:Public",
      "attributedTo": "https://factcheck.example/user/338",
      "content": "<p>This is a variation of the <a href='https://w.wiki/CpRq'>vanishing hitchhiker</a> urban legend.</p>",
      "published": "2025-01-23T20:45:00Z"
    }
  }
}
This is a tricky situation, since the properties of an object as
defined by the sending server, and available by fetching
`https://social.example/user/17/note/1221`, would usually be
considered canonical. Here, the `annotations` property is managed by a
different server, without the control or even knowledge of the original
actor or their service.
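One way a redistributing server might do this without touching the
canonical object is to attach the collection to a copy. The sketch
below assumes hypothetical helper names (`annotations_collection_id`,
`with_annotations`) and a collection id layout copied from the example
above; none of this is specified anywhere.

```python
import copy
from urllib.parse import urlsplit

# Base URL for annotation collections on the redistributing server. The
# layout mirrors the example id above and is an assumption, not a standard.
ANNOTATIONS_BASE = "https://other.example/system/annotations/"


def annotations_collection_id(object_id):
    """Derive a collection id from the annotated object's host and path."""
    parts = urlsplit(object_id)
    return ANNOTATIONS_BASE + parts.netloc + parts.path


def with_annotations(note, received_activities):
    """Return a copy of `note` carrying the annotations that target it."""
    items = [
        activity["object"]
        for activity in received_activities
        if activity.get("type") == "Annotate"
        and activity.get("target") == note["id"]
    ]
    enriched = copy.deepcopy(note)  # never mutate the origin server's copy
    if items:
        enriched["annotations"] = {
            "type": "Collection",
            "id": annotations_collection_id(note["id"]),
            "to": "as:Public",
            "items": items,  # a list here; a single object is also valid AS2
        }
    return enriched
```

Keeping the original object unmodified makes it easy to tell what the
origin server said apart from what the local server added.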
The annotations here are public; within the AP authorization model, it's
also possible to restrict distribution and access to the annotations
(with a different "to" property).
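For instance, a server could filter annotations by audience before
including them. This is a deliberately simplified sketch: `addressees`
and `can_view` are hypothetical names, only `to` and `cc` are
consulted, and collections such as a followers list are not expanded,
which a real implementation would need to do.

```python
# Identifiers that may denote the public collection, depending on how the
# JSON-LD was compacted.
PUBLIC_IDS = {
    "https://www.w3.org/ns/activitystreams#Public",
    "as:Public",
    "Public",
}


def addressees(obj):
    """Collect the ids listed in an object's `to` and `cc` properties."""
    ids = set()
    for prop in ("to", "cc"):
        value = obj.get(prop, [])
        if not isinstance(value, list):
            value = [value]  # a single value is allowed instead of a list
        for entry in value:
            ids.add(entry.get("id") if isinstance(entry, dict) else entry)
    return ids


def can_view(obj, viewer_id=None):
    """Return True if `obj` is public or directly addressed to `viewer_id`."""
    targets = addressees(obj)
    if targets & PUBLIC_IDS:
        return True
    return viewer_id is not None and viewer_id in targets
```

A server would then include an annotation in the `annotations`
collection only when `can_view(annotation, viewer_id)` holds for the
requesting actor.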
I think the work needed here would be as follows:
- Define a context doc for `Annotate` and `annotations`
- Write a FEP or another document describing how these can be used
Obviously, this is just the protocol layer; it doesn't even begin to
explore the options for actually setting up a network of fact checkers
or establishing trust in those fact checkers.
Evan
On 2025-01-23 11:33 a.m., Adam Sobieski wrote:
> Evan,
>
> Ok. I will take a look at ActivityPub server-to-server interactions
> and think about methods where fact-checking information, e.g.,
> annotations or community notes, are distributed via ActivityPub.
>
>
> Best regards,
> Adam
>
> ------------------------------------------------------------------------
> *From:* Evan Prodromou <evan@prodromou.name>
> *Sent:* Thursday, January 23, 2025 10:38 AM
> *To:* Adam Sobieski <adamsobieski@hotmail.com>; Emelia S.
> <emelia@brandedcode.com>
> *Cc:* public-swicg@w3c.org <public-swicg@w3c.org>
> *Subject:* Re: Fact-checking and community notes on the Fediverse
>
> Hey, Adam. So, I'd prefer to use methods where the fact-checking
> information is distributed via ActivityPub.
>
>
> Evan
>
>
> On 2025-01-14 6:31 p.m., Adam Sobieski wrote:
>
> Social Web Incubator Community Group,
>
> Hello. I am pleased to share some preliminary brainstorming and
> ideas about decentralized fact-checking and argumentation using
> P2P filesharing networks.
> Hopefully some of the following ideas can be of use for the
> Fediverse, e.g., for the discovery of existing annotations.
>
>
> Introduction
>
>
> With respect to sharing Web Annotations, uses of P2P networks
> have been previously explored (Segawa, 2006). Providing users
> with access to these kinds of networks from their Web browsers,
> today, is possible with WebRTC (Werner & Vogt, 2014; Ersson &
> Siri, 2015).
>
> P2P filesharing networks could be of use for decentralized
> fact-checking and argumentation. Facts or claims could be stored
> in entries, a special kind of file resource.
> By creating and sharing digitally-signed user feedback, notes,
> comments, or annotations with respect to those facts or claims in
> entries, users could express their determinations with respect to
> the veracity of facts or claims and could also present arguments
> for or against them (Bex, Snaith, Lawrence, & Reed, 2014).
> Entries could contain one or more references to paraphrases of
> content from locations on the Fediverse (see: Appendix A).
> Annotation objects from the Fediverse could be indexed and
> redundantly stored on P2P filesharing networks.
>
>
> Uses of Embedding Vectors
>
> Instead of, or in addition to, using cryptographic hashes to index
> and address content on P2P networks, digitally-signed entries for
> facts or claims could be indexed and addressed using embedding
> vectors (Zaarour & Curry, 2022).
> As considered, entries would be a special kind of file resource in
> which embedding vectors, verifiably computed for selections of other
> resources' contents, are stored inside the entries themselves (see:
> Appendix A) rather than obtained by processing the entries with AI
> models.
> Indexing and addressing entries thusly would allow them to be
> merged or wrapped, e.g., to add paraphrases, digitally signing
> them at each step, without having to reindex them. Modifications,
> however, would result in changes to entries' cryptographic hashes.
> Deep learning can be used to detect and identify sentential
> paraphrases (Zhou, Qiu, Liang, & Acuna, 2022). More elaborate uses
> of language models could be utilized for inquiring and reasoning
> about whether sentences occurring in contexts were paraphrases.
> With respect to fact-checking on the Web, scenarios to consider
> include both fact-checking content which was expressly indicated
> to be a fact or claim by its authors, e.g., using custom
> elements, and fact-checking arbitrary selections of documents'
> content.
> Explorations with respect to fact-checking arbitrary selections of
> content include the open-source Citation Needed project by the
> Future Audiences team of the Wikimedia Foundation.
>
>
> The Prompt API
>
> Exploration is underway into providing APIs for accessing language
> models in Web browsers; the Web Machine Learning Working Group is
> developing the Prompt API.
> With access to language models in Web browsers, users might be
> able to obtain embedding vectors for portions of content in Web
> documents. These embedding vectors could be used to search for
> other content, e.g., annotations, including on P2P networks.
>
>
> Custom Elements
>
>
> HTML5 custom elements could allow facts or claims to be
> expressed in documents, e.g., to add visual indicators near them
> or enable special context menus for them, while specifying
> values for embedding vectors computed for them using AI models
> (see: Appendix C).
>
>
> Appendices
>
> Appendix A shows a markup sketch for an entry, a created entry
> wrapped to add a paraphrase to it.
> Appendix B shows that embedding vectors could be added to Magnet
> URIs and Metalinks.
> Appendix C shows that HTML5 custom elements could be used for
> asserted facts or claims which refer to entries on P2P networks by
> means of one or more embedding vectors.
> Appendix D shows an approach involving shortcodes for authors
> using content-management systems to be able to easily add facts or
> claims to their content.
>
>
> Bibliography
>
> Bex, Floris, Mark Snaith, John Lawrence, and Chris Reed.
> "ArguBlogging: An application for the argument web." /Journal of
> Web Semantics/ 25 (2014): 9-15.
> https://www.sciencedirect.com/science/article/pii/S1570826814000079
> Ersson, Kerstin, and Persson Siri. "Peer-to-peer distribution of
> web content using WebRTC within a web browser." (2015).
> https://www.diva-portal.org/smash/get/diva2:819420/FULLTEXT01.pdf
> Segawa, Osamu. "Web annotation sharing using P2P." In /Proceedings
> of the 15th international conference on World Wide Web/, pp.
> 851-852. 2006.
> http://ra.ethz.ch/CDstore/www2006/devel-www2006.ecs.soton.ac.uk/programme/files/pdf/p45.pdf
> Werner, Max Jonas, and Christian Vogt. "Implementation of a
> browser-based P2P network using WebRTC." /Hamburg/ (2014).
> https://inet.haw-hamburg.de/teaching/ws-2013-14/master-project/Prj1-report-werner-vogt.pdf
> Zaarour, Tarek, and Edward Curry. "SemanticPeer: A distributional
> semantic peer-to-peer lookup protocol for large content spaces at
> internet-scale." /Future Generation Computer Systems/ 132 (2022):
> 239-253.
> https://www.sciencedirect.com/science/article/pii/S0167739X22000590
> Zhou, Chao, Cheng Qiu, Lizhen Liang, and Daniel E. Acuna.
> "Paraphrase identification with deep learning: A review of
> datasets and methods." /arXiv preprint arXiv:2212.06933/ (2022).
> https://arxiv.org/pdf/2212.06933
>
>
> Appendix A: Sketch of an Entry for a Fact or Claim
>
> <action kind="add-paraphrase">
>   <base>
>     <action kind="create">
>       <base />
>       <time>2024-01-14T00:01:00Z</time>
>       <v id="v-1" model="urn:ai:model:llama:3.2:90B">...</v>
>       <metalink id="source-1">
>         <file name="article1.html">
>           <url>https://www.example1.com/user1/article1.html</url>
>         </file>
>       </metalink>
>       <selection source="source-1">
>         ... <select v="v-1">A sentence.</select> ...
>       </selection>
>       <signature>...</signature>
>     </action>
>   </base>
>   <time>2024-01-14T00:00:00Z</time>
>   <v id="v-2" model="urn:ai:model:llama:3.3:70B">...</v>
>   <metalink id="source-2">
>     <file name="article2.html">
>       <url>https://www.example2.com/user2/article2.html</url>
>     </file>
>   </metalink>
>   <selection source="source-2">
>     ... <select v="v-1 v-2">A paraphrase.</select> ...
>   </selection>
>   <signature>...</signature>
> </action>
>
>
> Appendix B: Adding Embedding Vectors to Magnet URIs and Metalinks
>
>
> Embedding vectors could be added to Magnet URIs by means of
> adding a key: xv.
>
> Embedding vectors could be new components of metalinks.
> <metalink xmlns="urn:ietf:params:xml:ns:metalink">
>   <published>2009-05-15T12:23:23Z</published>
>   <file name="example.txt">
>     <url>http://www.example.com/example.txt</url>
>     <vector model="urn:ai:model:llama:3.3:70B">...</vector>
>   </file>
> </metalink>
>
>
> Appendix C: Custom Elements for Facts or Claims
>
>
> A custom element could be used to signify an asserted fact or
> claim, referring to an entry on a P2P network by means of
> embedding vectors alongside other information. Via a JavaScript
> library, and perhaps WebRTC, clients could participate in P2P
> networks and retrieve entries, feedback on entries, or both.
>
> Notice that, for the special file type of entries, the embedding
> vectors stored within them, and not an embedding of the XML file
> itself, are used to store and address the resource on P2P networks.
> <verifiable-claim see="magnet:?xv=...">Ut enim ad minim veniam,
> quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
> commodo consequat.</verifiable-claim>
>
>
> Appendix D: Content Authoring with Shortcodes
>
>
> How might authors easily add facts or claims to their content?
> With respect to popular content-management systems, the syntax
> for so doing could resemble that of existing shortcodes like
> [quote].
>
> [claim]Ut enim ad minim veniam, quis nostrud exercitation ullamco
> laboris nisi ut aliquip ex ea commodo consequat.[/claim]
> During content-publishing processes, authors' content-management
> systems (e.g., Drupal, WordPress) – or configurable plugins or
> extensions for these systems – could handle searching for existing
> paraphrases, adding new facts or claims (if needed) to P2P
> filesharing networks, obtaining the data for use in the
> see attributes, caching these data, and generating markup.
>
> ------------------------------------------------------------------------
> *From:* Emelia S. <emelia@brandedcode.com>
> *Sent:* Monday, January 13, 2025 11:21 AM
> *To:* Evan Prodromou <evan@prodromou.name>
> *Cc:* public-swicg@w3c.org
> *Subject:* Re: Fact-checking and community notes on the Fediverse
> This is already something on the list of things that the
> ActivityPub Trust & Safety Taskforce is working on:
>
> Idea: Annotations / Labeling of content · Issue #4 ·
> swicg/activitypub-trust-and-safety
> <https://github.com/swicg/activitypub-trust-and-safety/issues/4>
>
>
> The Web Annotations model could work, but the discovery of
> annotations that exist is the hardest part. I've started solving
> that in https://github.com/ThisIsMissEm/annotations-service where I
> use the sha256 hash of the Object ID as the annotation collection ID,
> giving a very simple way to fetch all annotations for a given object.
>
> I do want to investigate what an Annotate activity would look
> like, but I suspect this would just be an announcement of sorts
> "hey, there's this web annotation over here for this target"
>
> Yours,
> Emelia
>
> On 13 Jan 2025, at 04:23, Evan Prodromou <evan@prodromou.name> wrote:
>
> We don't have an easy way for remote actors to annotate
> content on the Fediverse.
>
> The biggest use case for this is to have permissionless
> fact-checking or community notes. A fact-checking service
> could annotate a remote content object like a Note or a Video
> with additional fact-checking information, and compliant
> clients or servers could show the fact-checking information
> when showing the Note.
>
> I think there are some tricky parts to this structure, which I
> believe suggests that we should start working on it.
>
> Evan
>
>
>
Received on Thursday, 23 January 2025 19:32:17 UTC