Re: Fact-checking and community notes on the Fediverse from Aaron Gray on 2025-01-20 (public-swicg@w3.org from January 2025)

From: Aaron Gray <aaronngray@gmail.com>
Date: Mon, 20 Jan 2025 14:53:28 +0000
To: Adam Sobieski <adamsobieski@hotmail.com>
Cc: "Emelia S." <emelia@brandedcode.com>, Evan Prodromou <evan@prodromou.name>, "public-swicg@w3c.org" <public-swicg@w3c.org>
Message-ID: <CAKXmGHAb2robs+VGqHn1eMfRxHWVZBiiAX=NHev8wYhG7Kk7tw@mail.gmail.com>
On Mon, 20 Jan 2025 at 13:45, Adam Sobieski <adamsobieski@hotmail.com>
wrote:

> Aaron,
> All,
>
> Should these broader topics interest you, there is a new *Decentralized
> Fact-checking & Provenance Organization (DeFacto) Community Group: *
> https://www.w3.org/community/defacto/ . The Chair of the new Group is
> interested in blockchain-based solutions [1].
>

Adam,

I have to say that, being green and concerned about energy usage and the
environment, I am strongly against the use and proliferation of blockchain
technology as an answer, and I really feel it should not be encouraged at
all. As it grows along with Bitcoin, it will have a larger carbon footprint
than most smaller countries. We saw the chaos that Bitcoin mining caused.

Basically, repeatedly and concurrently computing Merkle trees of double
SHA-256 hashes for each block is insanity. Using Proof of Work in a
concurrent race is like using a sledgehammer to crack a nut, when we could
instead have a system like the DNS root server system in place, providing a
trusted system built on verified procedures. A chain of RSA signatures over
the SHA-512 hashes of blocks should suffice as the base mechanism and, done
by dedicated hardware, would have a very small carbon footprint that could
be offset by using a green electricity source, as Google does.
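
As a rough illustration of the signed chain described above, here is a
minimal Python sketch. It is not a definitive design. The names Block,
append_block and verify_chain are hypothetical, and because the Python
standard library has no RSA implementation, an HMAC-SHA512 with a demo key
stands in for the RSA signature; a real system would swap in an RSA (or
similar) signature produced by the trusted signer.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Assumption: this demo key stands in for the signer's RSA private key.
DEMO_KEY = b"demo-key-not-for-production"
# Assumption: the first block links to a fixed all-zero value.
GENESIS = b"\x00" * 64


def sign(digest: bytes) -> bytes:
    """Placeholder signature (HMAC-SHA512), NOT RSA."""
    return hmac.new(DEMO_KEY, digest, hashlib.sha512).digest()


def block_digest(payload: bytes, prev_signature: bytes) -> bytes:
    """SHA-512 over the previous signature followed by the payload."""
    return hashlib.sha512(prev_signature + payload).digest()


@dataclass(frozen=True)
class Block:
    payload: bytes
    prev_signature: bytes  # links this block to its predecessor
    signature: bytes       # signature over block_digest(...)


def append_block(chain: list[Block], payload: bytes) -> list[Block]:
    """Return a new chain with one more signed block at the end."""
    prev = chain[-1].signature if chain else GENESIS
    signature = sign(block_digest(payload, prev))
    return chain + [Block(payload, prev, signature)]


def verify_chain(chain: list[Block]) -> bool:
    """Check every link and every signature, front to back."""
    prev = GENESIS
    for block in chain:
        if block.prev_signature != prev:
            return False
        expected = sign(block_digest(block.payload, prev))
        if not hmac.compare_digest(block.signature, expected):
            return False
        prev = block.signature
    return True
```

Each block costs one hash and one signature, with no mining race; changing
any earlier payload invalidates that block's signature and every later link.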

Getting back to the fact-checking issue at hand: yes, employing a
decentralized approach is good, and putting the standards, protocols,
tools, and procedures in place is a step in the right direction.

However, I can see one very simple way of fact checking, and that is the
use of links to subsuming articles or scientific papers. By subsuming, or
subsumption in the formal sense of the word, I mean that the subsuming item
or items cover the material that the original post or article does, but
have a wider and more formal basis in terms of the area of concern. This
reduces the problem to simply matching either a URL to one or more URLs, or
a cut-and-pasted piece of text, identified by its SHA-256 hash, to one or
more URLs. This can be automated.
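
The matching step above could be sketched in Python roughly as follows.
This is only an illustration under stated assumptions: the class name
SubsumptionIndex and its methods are hypothetical, and the normalisation of
pasted text (collapsing whitespace and lower-casing before hashing) is an
assumption added so that trivially different copies of the same text map to
the same key.

```python
import hashlib


def text_key(text: str) -> str:
    """SHA-256 hex digest of the text after simple normalisation.

    Assumption: whitespace is collapsed and case is ignored.
    """
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SubsumptionIndex:
    """Maps a URL, or the hash of a pasted text, to subsuming URLs."""

    def __init__(self) -> None:
        self._by_url: dict[str, list[str]] = {}
        self._by_text: dict[str, list[str]] = {}

    def add_for_url(self, url: str, subsuming_urls: list[str]) -> None:
        self._by_url.setdefault(url, []).extend(subsuming_urls)

    def add_for_text(self, text: str, subsuming_urls: list[str]) -> None:
        self._by_text.setdefault(text_key(text), []).extend(subsuming_urls)

    def lookup_url(self, url: str) -> list[str]:
        return list(self._by_url.get(url, []))

    def lookup_text(self, text: str) -> list[str]:
        return list(self._by_text.get(text_key(text), []))
```

An exact hash only matches text that is identical after normalisation, so a
reworded claim would need a separate entry.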

Ah, but the devil 'time' rears his head! There is always the possibility of
new information coming to light, so this system has to be dynamic and not
simply a deposit of recorded URLs. Ideally the check is done at serving
time, or against a timestamp that can be checked to see whether the
fact-checked URLs no longer suffice.
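
A serving-time check along these lines might look like the following Python
sketch. The names FactCheckLink and is_stale are hypothetical, and the
30-day maximum age is an arbitrary assumption chosen purely for
illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class FactCheckLink:
    url: str
    checked_at: datetime  # when the link was last confirmed to suffice


def is_stale(
    link: FactCheckLink,
    now: datetime,
    max_age: timedelta = timedelta(days=30),  # assumption: arbitrary limit
    source_modified_at: Optional[datetime] = None,
) -> bool:
    """Return True if the link should be re-checked before being served."""
    # The linked article changed after it was last checked.
    if source_modified_at is not None and source_modified_at > link.checked_at:
        return True
    # The last check is simply too old.
    return now - link.checked_at > max_age
```

For example, a link checked on 1 January would be served unchanged on
10 January, but would be flagged for re-checking if the linked article was
modified on 5 January.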

I see this as a possible mechanism, and the URLs could always point to
constructed, fact-checked articles that are kept up to date and have a
recorded modification history too.

Kind regards,

Aaron


>
> Best regards,
> Adam
>
> [1] https://fact.technology/
>
> ------------------------------
> *From:* Aaron Gray <aaronngray@gmail.com>
> *Sent:* Monday, January 20, 2025 12:39 AM
> *To:* Adam Sobieski <adamsobieski@hotmail.com>
> *Cc:* Emelia S. <emelia@brandedcode.com>; Evan Prodromou <
> evan@prodromou.name>; public-swicg@w3c.org <public-swicg@w3c.org>
> *Subject:* Re: Fact-checking and community notes on the Fediverse
>
>
> Adam,
>
> I am seriously of the opinion in any complex domain it takes an expert to
> make the real value judgements. Having said that detailed dissection of a
> text or text from audio and analysis by some form of grounding may allow
> analysis.
>
> But you have to remember everything is only a construct and even if all
> the facts are correct,  we have to remember that even science does not have
> facts, only theories that are checked against experiment. Given this,
> therefore, what we are actually dealing with are hypothetical constructs,
> to borrow science's way of analysing the world and applying that.
>
> This puts us in a situation where something might "have all the facts
> correct" but may not be correct in itself, it's a construct, and it may
> have been constructed to mislead or may be constructed by someone who is
> not aligned with reality or suffers from the alignment problem, to borrow
> from AI. Or they might quite simply not have all the facts.
>
> Now does the fact checker have all the facts, can we even check all the
> facts, and who delineates the truth in the end. If we claim the ultimate
> truth and we are not aligned with reality then we are only misleading.
>
> To reiterate, I am seriously of the opinion in any complex domain it takes
> an expert. And if an expert system like science and scientists makes the
> wrong call, whether because they are owned, bought, or influenced by
> politics or circumstance, then the whole system may be devalued by the
> general public, whoever they are now.
>
> I rest my case: this thing is really complicated and we need to tread
> carefully, as tools can be misused and are a double-edged sword.
>
> Sorry I did not answer your question but stepped back a bit into science
> and the edge of philosophy, but I think we need to bear in mind the wider
> context before and as we step forward.
>
> Regards,
>
> Aaron
>
> On Mon, 20 Jan 2025, 02:15 Adam Sobieski, <adamsobieski@hotmail.com>
> wrote:
>
> Aaron,
>
> Yes, the pandemic did trigger much interest in fact-checking. I don't know
> whether interest is waning or not or, for that matter, in which situations
> end-users would choose to make use of any of these features that we're
> brainstorming and discussing.
>
> Beyond the pandemic and the related topics of the accuracy of information
> during crises and emergencies, interesting use cases include assuring the
> accuracy of public-sector speeches, debates, and meetings.
>
> Maybe, someday, there will be real-time fact-checking for orators'
> debates? Maybe, someday, legislators or their staffers will be able to make
> use of real-time fact-checking technologies using their smartphones?
>
> P2P-based approaches for annotations might answer some questions that were
> presented (searching for annotations) while creating yet more questions.
> For instance, with respect to fact-checking, I'm not yet sure about what
> the UX would be when a fact or claim were contested, when there were
> thousands of annotations supporting a fact or claim and thousands opposing
> it simultaneously. This might display, instead of a green checkmark or a
> red x, a yellow warning indicator. Mindful of the pandemic and the points
> that you raised, what sorts of dashboards can be envisioned for end-users to
> explore contested or disputed facts or claims?
>
> Meanwhile, the *Citation Needed* project [1] presents an entirely
> different approach to fact-checking, one involving AI and Wikipedia. Which
> kinds of responses should such a system provide to end-users, I wonder,
> when it can find content both supporting and opposing facts or claims on
> Wikipedia? This might segue from fact-checking to argumentation and to
> hedging, listing alternatives (e.g., true, false) and providing support for
> each alternative.
>
> Thank you. Any thoughts on these points?
>
>
> Best regards,
> Adam
>
> [1]
> https://meta.wikimedia.org/wiki/Future_Audiences/Experiment:Citation_Needed
>
> ------------------------------
> *From:* Aaron Gray <aaronngray@gmail.com>
> *Sent:* Sunday, January 19, 2025 6:57 PM
> *To:* Adam Sobieski <adamsobieski@hotmail.com>
> *Cc:* Emelia S. <emelia@brandedcode.com>; Evan Prodromou <
> evan@prodromou.name>; public-swicg@w3c.org <public-swicg@w3c.org>
> *Subject:* Re: Fact-checking and community notes on the Fediverse
>
> I think a lot of the issues we are dealing with need to be addressed at
> source and are educational, social, political, nutritional, and drug
> related.
>
> Putting fact checking on things means :-
>
> a) your fact checking has to be correct, which often it's not.
> b) it has to be objective and not opinionated.
> c) it has to be well researched and well presented to _any_ audience.
> d) it has to be read, understood, and accepted.
>
> All of these are subject to cognitive biases. Wikipedia gives a good long
> list that all need to be considered :-
>
> https://en.m.wikipedia.org/wiki/List_of_cognitive_biases
>
> Quite frankly, I think you are wasting your time; most people don't read the
> stuff and it's got a reputation for being incorrect whether it is or not.
> So most of your target audience are either already educated and aware
> anyway or are not and just ignore it anyway. Most people on social media
> use emotions over intellect to judge things anyway and are subject to both
> confirmation bias and an echo chambered existence.
>
> The problems with COVID-19 for example were :-
> a) most people did not have sufficiently high enough levels of Vitamin D.
> b) the authorities wanted us to stay in and not get enough sunlight and
> fresh air
> c) most people drink milk and animal fats. Lactic and animal fats
> harbour Coronavirus.
> d) most people in ICU's had either  comorbidities, were overweight, or had
> genetic disposition with hACE2 receptors.
> e) were black or Hispanic nurses pushed to the attack surface in ICU's in
> hospitals on their feet for excessive periods dealing with COVID-19
> patients with airborne SARS-CoV-2 virii in close conditions with
> insufficient PPE.
> f) the people we were trying to protect were the elderly, people with
> comorbidities, people with immune conditions, or on immunosuppressants, or
> had genetic predispositions like the black population with hACE2 alleles.
> g) There are simple ways to help combat mRNA virii, like being young and
> having lots of siRNA's in your cell cytoplasm, having sex often and having
> lots of siRNA in your cellular cytoplasm, taking Vitamin C, D, Alpha Lipoic
> Acid and Quercetin if you have COVID-19.
>
> Now fact check that for example, you would not have found out this
> information without having run a COVID-19 group and/or read all the
> scientific literature on COVID-19 and SARS-CoV2. BTW this list is actually
> a lot lot longer but you get the idea. Now if you post that list you will
> get fact checked incorrectly despite it all being well researched mainly
> from PubMed accessible leading peer reviewed papers.
>
> This is what triggered all the fact checking in the first place.
>
> My 2 cents worth.
>
> Aaron
>
> On Tue, 14 Jan 2025, 23:32 Adam Sobieski, <adamsobieski@hotmail.com>
> wrote:
>
> Social Web Incubator Community Group,
>
> Hello. I am pleased to share some preliminary brainstorming and ideas
> about decentralized fact-checking and argumentation using P2P filesharing
> networks.
> Hopefully some of the following ideas can be of use for the Fediverse,
> e.g., for the discovery of existing annotations.
>
> Introduction
> With respect to sharing Web Annotations, uses of P2P
> networks have been previously explored (Segawa, 2006). Providing users with
> access to these kinds of networks from their Web browsers, today, is
> possible with WebRTC (Werner & Vogt, 2014; Ersson & Siri, 2015).
> P2P filesharing networks could be of use for decentralized fact-checking
> and argumentation. Facts or claims could be stored in entries, a special
> kind of file resource.
> By creating and sharing digitally-signed user feedback, notes, comments,
> or annotations with respect to those facts or claims in entries, users
> could express their determinations with respect to the veracity of facts or
> claims and could also present arguments for or against them (Bex, Snaith,
> Lawrence, & Reed, 2014).
> Entries could contain one or more references to paraphrases of content
> from locations on the Fediverse (see: Appendix A). Annotation objects from
> the Fediverse could be indexed and redundantly stored on P2P filesharing
> networks.
> Uses of Embedding Vectors
> Instead of, or in addition to, using cryptographic hashes to index and
> address content on P2P networks, digitally-signed entries for facts or
> claims could be indexed and addressed using embedding vectors (Zaarour &
> Curry, 2022).
> As considered, entries would be a special kind of file resource where
> their embedding vectors, embedding vectors verifiably for selections of
> other resources' contents, would be stored inside of them (see: Appendix A)
> rather than obtained from processing them with AI models.
> Indexing and addressing entries thusly would allow them to be merged or
> wrapped, e.g., to add paraphrases, digitally signing them at each step,
> without having to reindex them. Modifications, however, would result in
> changes to entries' cryptographic hashes.
> Deep learning can be used to detect and identify sentential paraphrases
> (Zhou, Qiu, Liang, & Acuna, 2022). More elaborate uses of language models
> could be utilized for inquiring and reasoning about whether sentences
> occurring in contexts were paraphrases.
> With respect to fact-checking on the Web, scenarios to consider include
> both fact-checking content which was expressly indicated to be a fact or
> claim by their authors, e.g., using custom elements, and fact-checking
> arbitrary selections of documents' content.
> Explorations with respect to fact-checking arbitrary selections of content
> include the open-source Citation Needed project by the Future Audiences
> team of the Wikimedia Foundation.
> The Prompt API
> Exploration is underway into providing APIs for accessing language models
> in Web browsers; the Web Machine Learning Working Group is developing the
> Prompt API.
> With access to language models in Web browsers, users might be able to
> obtain embedding vectors for portions of content in Web documents. These
> embedding vectors could be used to search for other content, e.g.,
> annotations, including on P2P networks.
> Custom Elements
> HTML5 custom elements could allow facts or claims to be
> expressed in documents, e.g., to add visual indicators near them or enable
> special context menus for them, while specifying values for embedding
> vectors computed for them using AI models (see: Appendix C).
> Appendices
> Appendix A shows a markup sketch for an entry, a created entry wrapped to
> add a paraphrase to it.
> Appendix B shows that embedding vectors could be added to Magnet URIs and
> Metalinks.
> Appendix C shows that HTML5 custom elements could be used for asserted
> facts or claims which refer to entries on P2P networks by means of one or
> more embedding vectors.
> Appendix D shows an approach involving shortcodes for authors using
> content-management systems to be able to easily add facts or claims to
> their content.
> Bibliography
> Bex, Floris, Mark Snaith, John Lawrence, and Chris Reed. "ArguBlogging: An
> application for the argument web." *Journal of Web Semantics* 25 (2014):
> 9-15. https://www.sciencedirect.com/science/article/pii/S1570826814000079
> Ersson, Kerstin, and Persson Siri. "Peer-to-peer distribution of web
> content using WebRTC within a web browser." (2015).
> https://www.diva-portal.org/smash/get/diva2:819420/FULLTEXT01.pdf
> Segawa, Osamu. "Web annotation sharing using P2P." In *Proceedings of the
> 15th international conference on World Wide Web*, pp. 851-852. 2006.
> http://ra.ethz.ch/CDstore/www2006/devel-www2006.ecs.soton.ac.uk/programme/files/pdf/p45.pdf
> Werner, Max Jonas, and Christian Vogt. "Implementation of a browser-based
> P2P network using WebRTC." *Hamburg* (2014).
> https://inet.haw-hamburg.de/teaching/ws-2013-14/master-project/Prj1-report-werner-vogt.pdf
> Zaarour, Tarek, and Edward Curry. "SemanticPeer: A distributional semantic
> peer-to-peer lookup protocol for large content spaces at internet-scale." *Future
> Generation Computer Systems* 132 (2022): 239-253.
> https://www.sciencedirect.com/science/article/pii/S0167739X22000590
> Zhou, Chao, Cheng Qiu, Lizhen Liang, and Daniel E. Acuna. "Paraphrase
> identification with deep learning: A review of datasets and methods." *arXiv
> preprint arXiv:2212.06933* (2022). https://arxiv.org/pdf/2212.06933
>
>
> Appendix A: Sketch of an Entry for a Fact or Claim
>
> <action kind="add-paraphrase">
>   <base>
>     <action kind="create">
>       <base />
>       <time>2024-01-14T00:01:00Z</time>
>       <v id="v-1" model="urn:ai:model:llama:3.2:90B">...</v>
>       <metalink id="source-1">
>         <file name="article1.html">
>           <url>https://www.example1.com/user1/article1.html</url>
>         </file>
>       </metalink>
>       <selection source="source-1">
>         ... <select v="v-1">A sentence.</select> ...
>       </selection>
>       <signature>...</signature>
>     </action>
>   </base>
>   <time>2024-01-14T00:00:00Z</time>
>   <v id="v-2" model="urn:ai:model:llama:3.3:70B">...</v>
>   <metalink id="source-2">
>     <file name="article2.html">
>       <url>https://www.example2.com/user2/article2.html</url>
>     </file>
>   </metalink>
>   <selection source="source-2">
>     ... <select v="v-1 v-2">A paraphrase.</select> ...
>   </selection>
>   <signature>...</signature>
> </action>
>
>
> Appendix B: Adding Embedding Vectors to Magnet URIs and Metalinks
> Embedding vectors could be added to Magnet URIs by means of adding a key: xv.
> Embedding vectors could be new components of metalinks.
> <metalink xmlns="urn:ietf:params:xml:ns:metalink">
>   <published>2009-05-15T12:23:23Z</published>
>   <file name="example.txt">
>     <url>http://www.example.com/example.txt</url>
>     <vector model="urn:ai:model:llama:3.3:70B">...</vector>
>   </file>
> </metalink>
>
> Appendix C: Custom Elements for Facts or Claims
> A custom element could be
> used to signify an asserted fact or claim, referring to an entry on a P2P
> network by means of embedding vectors alongside other information. Via a
> JavaScript library, and perhaps WebRTC, clients could participate in P2P
> networks and retrieve entries, feedback on entries, or both.
> Notice that, for the special file type of entries, those embedding vectors
> within them and not of the XML file itself are utilized with respect to
> storing and addressing the resource on P2P networks.
> <verifiable-claim see="magnet:?xv=...">Ut enim ad minim veniam, quis
> nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
> consequat.</verifiable-claim>
> Appendix D: Content Authoring with Shortcodes
> How might authors easily
> add facts or claims to their content? With respect to popular
> content-management systems, the syntax for so doing could resemble that of
> existing shortcodes like [quote].
> [claim]Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
> nisi ut aliquip ex ea commodo consequat.[/claim]
> During content-publishing processes, authors' content-management systems
> (e.g., Drupal, WordPress) – or configurable plugins or extensions for these
> systems – could handle searching for existing paraphrases, adding new facts
> or claims (if needed) to P2P filesharing networks, obtaining the data for
> use in the see attributes, caching these data, and generating markup.
>
> ------------------------------
> *From:* Emelia S. <emelia@brandedcode.com>
> *Sent:* Monday, January 13, 2025 11:21 AM
> *To:* Evan Prodromou <evan@prodromou.name>
> *Cc:* public-swicg@w3c.org <public-swicg@w3c.org>
> *Subject:* Re: Fact-checking and community notes on the Fediverse
>
> This is already something on the list of things that the ActivityPub Trust
>  & Safety Taskforce is working on:
>
> Idea: Annotations / Labeling of content · Issue #4 ·
> swicg/activitypub-trust-and-safety
> <https://github.com/swicg/activitypub-trust-and-safety/issues/4>
>
> The Web Annotations model could work, but the discovery of annotations
> that exist is the hardest part, I've started solving that in
> https://github.com/ThisIsMissEm/annotations-service where I use the
> sha256 hash of the Object ID as the annotation collection ID, giving a very
> simple way to fetch all annotations for a given object.
>
> I do want to investigate what an Annotate activity would look like, but I
> suspect this would just be an announcement of sorts "hey, there's this web
> annotation over here for this target"
>
> Yours,
> Emelia
>
> On 13 Jan 2025, at 04:23, Evan Prodromou <evan@prodromou.name> wrote:
>
> We don't have an easy way for remote actors to annotate content on the
> Fediverse.
>
> The biggest use case for this is to have permissionless fact-checking or
> community notes. A fact-checking service could annotate a remote content
> object like a Note or a Video with additional fact-checking information,
> and compliant clients or servers could show the fact-checking information
> when showing the Note.
>
> I think there are some tricky parts to this structure, which I believe
> suggests that we should start working on it.
>
> Evan
>
>
>
>

-- 
Aaron Gray - @AaronNGray@fosstodon.org | @aaronngray@threads.net |
@AaronNGray@Twitter.com

Independent Open Source Software Engineer, Computer Language Researcher and
Designer, Amateur Type Theorist, Amateur Computer Scientist,
Environmentalist and Climate Science Researcher and Disseminator.
Received on Monday, 20 January 2025 14:53:51 UTC