Re: Trust.txt: Why another random .txt when we've got WebFinger and well-known URIs?

Hello,

Why not take the credibility indicators and credibility signals into a
schema.org extension, so that they can be used as an enrichment source for
web pages? This may require further refinement of the specification;
personally, I prefer thoroughness over conciseness in specification
coverage, so I liked the credibility indicators a lot more. Following on
from that, it would help to collaborate with other W3C community efforts so
there is synergy towards a credibility standard. The greater the uptake,
the more feedback the specification can receive from the wider community,
and the better the abstractions that can be attained through further
refinement. That could be followed by a SKOS/JSON-LD based implementation.
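
As a rough sketch of what such a schema.org extension could look like (in
Python, emitting JSON-LD; the "cred:" vocabulary and its property names are
hypothetical, not an existing extension):

    import json

    # Hypothetical credibility signals attached to a schema.org Article as
    # JSON-LD. The cred: vocabulary is illustrative only; nothing like it
    # has been defined or registered.
    article = {
        "@context": {
            "@vocab": "https://schema.org/",
            "cred": "https://example.org/credibility#",
        },
        "@type": "Article",
        "headline": "Example story",
        "publisher": {"@type": "NewsMediaOrganization", "name": "Example Times"},
        "cred:correctionsPolicy": "https://example.com/corrections",
        "cred:dateFirstArchived": "2001-05-14",
    }
    print(json.dumps(article, indent=2))
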
From the original discussions, this could possibly lead towards a browser
extension, although a browser extension, as with ad blocking, may be
tricky: users could set filters to avoid it altogether. Alternatively, the
approach could be split between a sparse general credibility case and a
dense domain-specific credibility case. Perhaps something similar to Media
Cloud could be built, where the credibility indicators and signals add a
mechanism for analytics, discourse analysis, fact checking, collaborative
annotations, and a cumulative credibility trust score. Later down the line,
a blockchain record of trust authorship could map out the entire timeline
of a credibility narrative for a given event, topic, or, as schema.org
defines it, a "CreativeWork", of which Article is one type. That would
further assist anyone who needs to do a social network analysis on a given
credibility case, or to use it with something like PageRank. A distinction
also needs to be made between disinformation and misinformation.

As this is a W3C community group, the credibility efforts should refer back
to W3C standards when creating a specification and a reference
implementation, so that they are extensible and easier to integrate across
multiple use cases. I already find ads.txt and robots.txt annoying, and I
certainly wouldn't want another .txt file added into the mix. They also get
tricky with embedded linkage when one looks at them from the point of view
of a crawler - inlinks and outlinks.

Thanks,

Adeel

On Mon, 2 Aug 2021 at 00:20, Bob Wyman <bob@wyman.us> wrote:

> You wrote:
>
>> "On page 8 of the spec., we encourage the use of "/well-known" so we are
>> clearly not against that." (link added)
>
> The trust.txt spec
> <https://journallist.net/reference-document-for-trust-txt-specifications>
> says, on page 8:
>
>> "In addition to the access method noted above, use of the “Well Known
>> Uniform Resource Identifiers” is recommended."
>
> So, the spec says that providing the file with a "/.well-known/" prefix is
> optional and should only be done if the file has also been provided without
> a prefix. As a result, there is absolutely no utility in having a copy of
> the file prefixed by "/.well-known/." Any smart coder would simply ignore
> that there might be a second copy of the file. In fact, one might argue
> that if a "well-known" file is found, but an unprefixed one is not found,
> the prefixed copy should be ignored since it may be that the site's intent
> was to delete the file, and they simply forgot to delete its copy. In any
> case, it is generally not a good idea, when defining protocols, to require
> or even recommend that data be provided in more than one place. The typical
> statement is something like: "If data is found in more than one place, it
> is probably wrong in all of them..."
>
> It would be very useful if the spec could be updated to *require* that
> only one copy of the file should be provided and that it should be provided
> with the "/.well-known/" prefix.
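>
> To make the client behavior I'm arguing for concrete, here is a minimal
> sketch in Python (the rule being: fetch only from the single well-known
> location, with no fallback to an unprefixed copy):
>
>     from urllib.request import urlopen
>     from urllib.error import URLError
>
>     def fetch_trust_txt(domain):
>         # One source of truth: only /.well-known/trust.txt is consulted,
>         # so a stale unprefixed copy can never contradict it.
>         try:
>             with urlopen(f"https://{domain}/.well-known/trust.txt") as resp:
>                 return resp.read().decode("utf-8")
>         except URLError:
>             return None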
>
> Also, you wrote:
>
>> "The short answer to why we went with a text file is that we are working
>> with some extremely unsophisticated publishers."
>
> I sympathize with your concern for the unsophisticated publisher. However,
> any difficulty that might exist in the production of a more complex file
> would be easily overcome by providing a trivial web form that allowed
> "fill-in-the-blank" simplicity. The produced file could then be simply
> copied to the appropriate location. After all, we've moved beyond the time
> when everyone was expected to be able to edit files manually. Publishers
> deal daily with xml, html, css, js, pdf, etc. files that only a masochist
> would seek to edit by hand. Anyone with enough capacity to maintain the Hays
> Free Press <https://haysfreepress.com/> site is savvy enough to either
> produce a WebFinger file on their own or to copy the output from a simple
> web form.
>
> Allowing protocols to be limited to the low bar of "unsophisticated" users
> means that we're not able to provide "sophisticated" solutions when they
> are needed. Over decades of experience with protocol and data format
> design, we've learned that simple approaches inevitably lose their charm
> after they have been in the field for some time. Users inevitably discover
> new capabilities that they want to support. Requirements that were once
> quite simple and well understood tend to become more complex and subtle as
> time passes. Rather than waiting to discover the inadequacies of simple
> formats, it makes a great deal of sense to initially rely on well-known
> standard formats that allow extension, versioning, etc. Most "protocol
> definers" should be focused on how to extend or exploit existing formats
> while leaving the job of format definition to others who specialize in such
> problems.
>
> For instance, the W3C Credible Web Community Group
> <https://www.w3.org/community/credibility/> has defined a number of
> signals that, I assume, a site might wish to self-assert in a discoverable,
> well-known location. However, none of these signals are supported by the
> trust.txt format. It seems to me that these signals could be usefully
> included in a WebFinger file. These signals include:
>
>    - Date Website First Archived
>    - Corrections Policy
>    - Any Award
>    - Pulitzer Prize Recognition
>    - RNG Awards
>
> The choice here is: Should we call for trust.txt to be updated to include
> these signals, and any others that might be defined in the future, or,
> should we simply provide definitions of the JSON Resource Descriptors
> (JRDs) or other encodings and thus, by implication, enable those signals to
> be supported in any format that supports those encodings? I suggest that
> the more useful approach is to define what the signals mean and how they
> should be encoded, and then rely on others to find the various places where
> those encodings would be most useful. If this approach had been used in
> defining trust.txt, then all the various signals supported there, which are
> not defined by CredWeb, would be easily used by anyone who is also using
> CredWeb signals. (Being able to say: "I control the website xxx.xxx." is
> useful in more contexts than just that defined by trust.txt.)
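>
> To make this concrete, a JRD carrying such signals might look like the
> following Python sketch (the property and relation URIs are hypothetical
> placeholders, not registered identifiers):
>
>     import json
>
>     # Hypothetical WebFinger JSON Resource Descriptor (RFC 7033) in which
>     # CredWeb-style signals appear as properties and a trust.txt-style
>     # "belongto" assertion appears as a link relation.
>     jrd = {
>         "subject": "https://example.com/",
>         "properties": {
>             "https://credweb.example/date-first-archived": "1999-03-02",
>             "https://credweb.example/corrections-policy":
>                 "https://example.com/corrections",
>         },
>         "links": [
>             {"rel": "https://trust.example/rel/belongto",
>              "href": "https://association.example/"},
>         ],
>     }
>     print(json.dumps(jrd, indent=2))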
>
> bob wyman
>
>
> On Sun, Aug 1, 2021 at 2:58 PM Scott Yates <scott@journallist.net> wrote:
>
>> Bob, and the group...
>>
>> Just to be clear, I am not running on a platform of trust.txt.
>>
>> On page 8 of the spec., we encourage the use of "/well-known" so we are
>> clearly not against that.
>>
>> The short answer to why we went with a text file is that we are working
>> with some extremely unsophisticated publishers. Take, for instance, the
>> publisher of the Hays Free Press, whom I met recently in Texas. She prints
>> news from her town on paper once a week, and maintains a website. As we all
>> know, when local news dies, news consumers fill in that vacuum with crap.
>> If she stops publishing, well, it would be bad, so we want to make things
>> as easy as possible for her and those like her doing the estimable work of
>> keeping local journalism alive.
>>
>> In my conversation with her, she was willing to post a file
>> <https://haysfreepress.com/trust.txt> in part because she already knew
>> about ads.txt, and so this was familiar to her. If I had tried to start
>> telling her about RFC 7033, I would have lost her for sure. You are certainly right
>> that JRDs would be technically superior, but robots.txt has been around for
>> 20+ years and even the most entry-level web publisher knows how it works.
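>>
>> For anyone who hasn't seen one, the whole file is just attribute=value
>> lines. A sketch of generating one in Python (the directives and URLs are
>> illustrative, patterned on the spec's examples):
>>
>>     # Hypothetical trust.txt contents; attribute=value lines in the
>>     # spec's pattern, with placeholder URLs.
>>     lines = [
>>         "# trust.txt for example.com",
>>         "belongto=https://association.example/",
>>         "social=https://twitter.com/example",
>>     ]
>>     with open("trust.txt", "w") as f:
>>         f.write("\n".join(lines) + "\n")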
>>
>>
>> Thank you for looking into trust.txt, and while I don't want people to
>> vote for me based on what they think of trust.txt, I think your question
>> serves as a useful model of why I am running. If you think that any new
>> proposal that is working to fix disinformation should follow, for example,
>> the most current standardized systems, you should voice that to the group.
>> If the group agrees, then that will be a part of how trust.txt -- and every
>> other effort out there -- will be evaluated.
>>
>> -Scott Yates
>> Founder
>> JournalList.net, caretaker of the trust.txt framework
>> 202-742-6842
>> Short Video Explanation of trust.txt <https://youtu.be/lunOBapQxpU>
>>
>>
>> On Sun, Aug 1, 2021 at 11:53 AM Bob Wyman <bob@wyman.us> wrote:
>>
>>> Scott Yates, in his statement of candidacy
>>> <https://lists.w3.org/Archives/Public/public-credibility/2021Aug/0000.html>,
>>> includes a description of the trust.txt file
>>> <https://journallist.net/reference-document-for-trust-txt-specifications>
>>> .
>>>
>>> Please explain why it makes sense to introduce yet-another .txt file (in
>>> addition to robots.txt and ads.txt) when we have established procedures to
>>> allow those who control URIs to make statements supported by that control.
>>> For instance, RFC 5785 <https://datatracker.ietf.org/doc/html/rfc5785> defines
>>> the "/.well-known/" path prefix for "well-known locations" which are
>>> accessed via URIs. It seems to me that if one were to publish a trust.txt
>>> file, then it should be at the location "/.well-known/trust.txt". That does
>>> not seem to be the current proposal. Why are existing standards not being
>>> followed?
>>>
>>> It also seems to me that the proposed file format is an unnecessary
>>> departure from existing standards such as RFC 7033
>>> <https://datatracker.ietf.org/doc/html/rfc7033>, which defined
>>> WebFinger, a mechanism that could be easily used to carry the data which
>>> the proponents of trust.txt seek to make available. To make WebFinger do
>>> what trust.txt intends, it would only be necessary to register a few new
>>> JSON Resource Descriptors (JRDs), properties, or link-relations (e.g.
>>> belong-to, control, social, member, etc.). This sort of extension is
>>> provided for in the definition of RFC 7033 and in RFC 5988
>>> <https://datatracker.ietf.org/doc/html/rfc5988>, which defines "Web
>>> Linking" mechanisms. Note: The existing set of defined link-relations can
>>> be found in the IANA maintained link-relations registry
>>> <https://www.iana.org/assignments/link-relations/link-relations.xhtml>.
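>>>
>>> To illustrate, discovery under RFC 7033 is a query against the standard
>>> WebFinger endpoint, optionally filtered by a link relation (Python
>>> sketch; the rel value is one of the hypothetical relations above):
>>>
>>>     from urllib.parse import urlencode
>>>
>>>     # RFC 7033 discovery: request the descriptor for a whole site from
>>>     # /.well-known/webfinger, filtering to one link relation.
>>>     query = urlencode({
>>>         "resource": "https://example.com/",
>>>         "rel": "belong-to",  # hypothetical link relation
>>>     })
>>>     print(f"https://example.com/.well-known/webfinger?{query}")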
>>>
>>> While there will be a never-ending need to add support for new kinds of
>>> standardized statements, discoverable in well-known locations, I think we
>>> should be careful to ensure that new kinds of statements make use of
>>> existing standards rather than define entirely new mechanisms. I can't see
>>> anything in the trust.txt specification that actually requires a unique,
>>> non-standard approach that is not already supported by the various
>>> standards referenced above.
>>>
>>> bob wyman
>>>
>>>
