Author, content and data authentication

After reading dozens of articles, reports and commentaries on the
deteriorating state of scientific and academic publishing, caused by the
use of artificial intelligence, paper mills and the increasing overload of
peer reviewers, I wondered whether currently available technologies could
be customized to solve (parts of) the problem.

I printed the full page of current and proposed W3C standards and found
three main angles from which to tackle this problem: digital IDs for
authors and documents, credentialing, and enhanced documents (in casu
ePub) with semantic tagging.

But the current crisis around intellectual property, non-human works,
authenticity of content and verifiability of data affects scientific
publications, text-based publishing, images, audiovisual works, online
news and streaming media, including social media and text-based messaging
(email, SMS etc.). Because this crisis is driven by the utilization of
artificial intelligence, it calls for a universal approach to author,
content and data authentication.

For scientific publishing and for online repositories and libraries this
is most urgent, but the entertainment industry, news outlets and digital
print publishing also stand to benefit greatly from introducing such a
general, universal approach.

When content is submitted, the question of authorship is an obvious one,
but in scientific publishing the authentication of research data is
equally important.
Most quality scientific publishers publish articles online together with
additional files, e.g. research data, but without proper authentication.
And articles produced by paper mills regularly list alleged co-authors or
cite non-existent references.

When content is submitted for review (e.g. an article for scientific
publication, an online news item, a post of text or audiovisual content,
or a link to a live event for streaming), the author and the form of
content generation should be made explicit. The latter should indicate
whether, and which, artificial-intelligence-based tools were used to
produce which aspects of the content; for data, it should state the
ownership, origin and authenticity of the data.
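As a rough sketch only, not a proposed standard, such a declaration could take the form of a signed machine-readable manifest attached to the submission. All field names below are illustrative assumptions, and the HMAC stands in for a real public-key signature scheme (such as the proofs used in W3C Verifiable Credentials):

```python
import hashlib
import hmac
import json

# Hypothetical submission manifest; every field name here is illustrative.
manifest = {
    "authors": [{"name": "A. Researcher", "orcid": "0000-0000-0000-0000"}],
    "content_generation": {
        "ai_assisted": True,
        "ai_tools": [{"name": "example-llm", "used_for": "language editing"}],
    },
    "data": {
        "origin": "institutional repository",
        # Digest of the raw dataset binds the manifest to the actual data.
        "sha256": hashlib.sha256(b"raw dataset bytes").hexdigest(),
    },
}

# Canonical serialization so every verifier hashes identical bytes.
canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()

# Placeholder key; a real scheme would use an issuer's public-key signature.
secret_key = b"publisher-issued-key"
tag = hmac.new(secret_key, canonical, hashlib.sha256).hexdigest()

def verify(received: bytes, received_tag: str, key: bytes) -> bool:
    """Check that the manifest bytes were not altered after signing."""
    expected = hmac.new(key, received, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received_tag)

print(verify(canonical, tag, secret_key))  # True for an untampered manifest
```

The point of the sketch is that both the authorship/AI-use declaration and the data digest sit inside one signed object, so a reviewer or repository can verify all of it in a single step.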

Because artificial intelligence is blamed for creating low-quality content
whose authorship, origin and authenticity are often questionable, such a
universal scheme, if adopted, could serve to separate quality content from
so-called "slop", which would, among other things, improve the quality of
training data for large language models.

Adding credentials, authenticator tags, digital watermarks and digital
fingerprints to content, covering authorship, content-generation details
and data authentication, could resolve a lot of problems.
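Of the mechanisms listed, the digital fingerprint is the simplest to illustrate. A minimal sketch, assuming a plain SHA-256 digest of the content bytes serves as the fingerprint (real watermarking and credentialing schemes are considerably richer):

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Content-derived digital fingerprint: hex-encoded SHA-256 digest."""
    return hashlib.sha256(content).hexdigest()

# Illustrative article bytes; in practice this is the published file.
article = b"Final accepted manuscript text ..."
published_fp = fingerprint(article)  # recorded alongside the publication

# Any later change to the content, however small, changes the fingerprint,
# so readers and indexers can detect post-publication tampering.
tampered = article + b" (one altered sentence)"
print(published_fp == fingerprint(article))   # True
print(published_fp == fingerprint(tampered))  # False
```

Publishing the fingerprint through an authenticated channel (e.g. alongside the author's credential) is what turns this simple digest into a tamper-evidence mechanism.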

And with the current arms race between traditional search engines and
AI-enabled or AI-native browsers centering exactly on the quality of the
results retrieved and presented to users, an author, content and data
authentication set of standards could improve the use of the current
Internet: poor-quality AI-generated content could in the extreme case be
quarantined, or preferably ignored, and its very production for use on the
Internet discouraged.

I wonder whether we could, within the W3C constellation, assume the task
of proposing such a framework and create the appropriate standards in
collaboration with other actors in the Internet ecosystem and related
industry sectors.

Milton Ponson
Rainbow Warriors Core Foundation
CIAMSD Institute-ICT4D Program
+2977459312
PO Box 1154, Oranjestad
Aruba, Dutch Caribbean

Received on Friday, 9 January 2026 20:40:18 UTC