Re: Trust.txt: Why another random .txt when we've got WebFinger and well-known URIs?

Adeel,
You wrote:

> "try to collaborate more with other W3C community efforts so there is
> synergy for a credibility standard."

I think you are right that greater cooperation with other W3C efforts would
be useful. I'm sure that there has been some such cooperation, but I'm new
to the group and thus don't know the full history. Hopefully, any such
cooperation will continue and grow in the future. What other efforts do you
think this group should work with? I suggest that the Credentials, Social,
and Annotation groups would all be important.

I've been somewhat concerned about the current Credibility Signals document
since it defines a number of signals that seem better communicated by
reference to the sort of verifiable credential being defined by the
Credentials group
<https://www.w3.org/community/credentials/>. I may be missing something,
but it doesn't seem obvious how one would use the Award, Pulitzer Prize, or
RNG Prize signals to reference a Credential if some organization had issued
one. (I see that one could do a search of the files referenced in the
document's examples, e.g.
https://rngfoundation.com/awards/pastawards/2018.html, but that is rather
crude.) It would be much
more elegant to link to or embed a credential if available.
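
To make this concrete, an Award signal could point at, or embed, a
machine-verifiable credential rather than an HTML page. Here is a rough
sketch using the Verifiable Credentials data model (the "AwardCredential"
type, the "award" property, and the subject URL are hypothetical, not
anything the Credentials group has defined):

    {
      "@context": ["https://www.w3.org/2018/credentials/v1"],
      "type": ["VerifiableCredential", "AwardCredential"],
      "issuer": "https://rngfoundation.com",
      "issuanceDate": "2018-11-01T00:00:00Z",
      "credentialSubject": {
        "id": "https://newsroom.example",
        "award": "RNG Award, 2018"
      },
      "proof": { "...": "signature from the issuing organization" }
    }

A consumer could then check the issuer's signature rather than scrape an
awards page.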

Another example of the use of credentials comes from studying the trust.txt
specification. It is fine to allow a site to declare that it controls some
other site, but that claim is not really credible if the controlled site
hasn't published a "controlledby" property to confirm the statement. An
alternative method for handling the statement of this relationship would be
to have the sites publish credentials that express that relationship. Such
credentials would be more useful than the trust.txt mechanism since those
credentials would still have informative value when copied outside the
well-known locations where they might initially be found. (e.g., I could
write an email saying: "I believe site X controls site Y because I found
the enclosed credential on 1 Aug 2021.")
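
For comparison, the mutual assertion under trust.txt itself would be a
pair of entries like these (URLs are illustrative):

    # In https://parent.example/trust.txt
    control=https://local-paper.example/

    # In https://local-paper.example/trust.txt
    controlledby=https://parent.example/

A pair of signed credentials expressing the same relationship would carry
that confirmation with them wherever the documents travel.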

There are many other groups that can contribute to addressing the
credibility problem. You're right to stress that point.

bob wyman


On Sun, Aug 1, 2021 at 8:27 PM Adeel <aahmad1811@gmail.com> wrote:

> Hello,
>
> Why not take the credibility indicators and credibility signals into a
> schema.org extension? That way they can be used as an enrichment source
> for web pages (see the sketch at the end of this note). Obviously, this
> may require further refinement of the specification. Personally, I prefer
> thoroughness over conciseness in a specification's coverage, so I liked
> the credibility indicators a lot more.
>
> Following on from that, try to collaborate more with other W3C community
> efforts so there is synergy for a credibility standard. The greater the
> uptake, the more feedback it can receive from the wider community, and the
> better the abstractions that can be attained for further refinement. From
> there, further efforts could go toward a SKOS/JSON-LD based implementation
> and, possibly, following the original discussions, toward a browser
> extension. A browser extension, though, like those for ads, may be tricky,
> as users could set filters to avoid it altogether. Alternatively, separate
> the approach between a sparse credibility case and a dense,
> domain-specific credibility case.
>
> Perhaps something could be built similar to MediaCloud, where the
> credibility indicators and signals add a mechanism for analytics,
> discourse analysis, fact checking, collaborative annotations, and a
> cumulative credibility trust score. Later down the line, adding in a
> blockchain record of trusted authorship could map out the entire timeline
> of a credibility narrative for a given event, topic, or, as schema.org
> defines it, a "CreativeWork," of which article is one type. That would
> further assist anyone who needs to do a social network analysis on a given
> credibility case, or to use it with something like PageRank. There also
> needs to be a distinction made between disinformation and misinformation.
>
> As this is a W3C community group, the efforts in the credibility cases
> should refer back to utilizing W3C standards when creating a specification
> and a reference implementation, so they are extensible and easier to
> integrate for multiple use cases. I already find ads.txt and robots.txt
> annoying, and I certainly wouldn't want another .txt file added into the
> mix. Plus, such files get tricky with embedded linkage when one looks at
> them from the point of view of a crawler - inlinks and outlinks.
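>
> For instance, a page-level enrichment along those lines might look like
> this (the "credibilitySignal" property and "CredibilitySignal" type are
> hypothetical; no such schema.org terms exist today):
>
>     {
>       "@context": "https://schema.org",
>       "@type": "NewsArticle",
>       "headline": "Example article",
>       "credibilitySignal": {
>         "@type": "CredibilitySignal",
>         "name": "Corrections Policy",
>         "url": "https://news-site.example/corrections"
>       }
>     }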
>
> Thanks,
>
> Adeel
>
> On Mon, 2 Aug 2021 at 00:20, Bob Wyman <bob@wyman.us> wrote:
>
>> You wrote:
>>
>>> "On page 8 of the spec., we encourage the use of "/well-known" so we are
>>> clearly not against that." (link added)
>>
>> The trust.txt spec
>> <https://journallist.net/reference-document-for-trust-txt-specifications>
>> says, on page 8:
>>
>>> "In addition to the access method noted above, use of the “Well Known
>>> Uniform Resource Identifiers” is recommended."
>>
>> So, the spec says that providing the file with a "/.well-known/" prefix
>> is optional and should only be done if the file has also been provided
>> without a prefix. As a result, there is absolutely no utility in having a
>> copy of the file prefixed by "/.well-known/." Any smart coder would simply
>> ignore that there might be a second copy of the file. In fact, one might
>> argue that if a "well-known" file is found, but an unprefixed one is not
>> found, the prefixed copy should be ignored since it may be that the site's
>> intent was to delete the file, and they simply forgot to delete its copy.
>> In any case, it is generally not a good idea, when defining protocols, to
>> require or even recommend that data be provided in more than one place. The
>> typical statement is something like: "If data is found in more than one
>> place, it is probably wrong in all of them..."
>>
>> It would be very useful if the spec could be updated to *require* that
>> only one copy of the file be provided, and that it be provided with the
>> "/.well-known/" prefix.
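>>
>> A consumer following that rule would be trivial to write; a minimal
>> sketch in Python, assuming the single "/.well-known/" location:
>>
>>     import urllib.request
>>
>>     def fetch_trust_txt(domain):
>>         # The one canonical location; any other copy is ignored.
>>         url = f"https://{domain}/.well-known/trust.txt"
>>         try:
>>             with urllib.request.urlopen(url, timeout=10) as resp:
>>                 return resp.read().decode("utf-8")
>>         except OSError:
>>             return None  # No file means no assertions made.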
>>
>> Also, you wrote:
>>
>>> "The short answer to why we went with a text file is that we are working
>>> with some extremely unsophisticated publishers."
>>
>> I sympathize with your concern for the unsophisticated publisher.
>> However, any difficulty that might exist in the production of a more
>> complex file would be easily overcome by providing a trivial web form that
>> allowed "fill-in-the-blank" simplicity. The produced file could then be
>> simply copied to the appropriate location. After all, we've moved beyond
>> the time when everyone was expected to be able to edit files manually.
>> Publishers deal daily with xml, html, css, js, pdf, etc. files that only a
>> masochist would seek to edit by hand. Anyone with enough capacity to
>> maintain the Hays Free Press <https://haysfreepress.com/> site is savvy
>> enough to either produce a WebFinger file on their own or to copy
>> the output from a simple web form.
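>>
>> To illustrate how little machinery such a form would need, here is a
>> sketch in Python that turns two form fields into a WebFinger-style JRD
>> (the "control" link-relation is one of the relations I proposed
>> registering in my earlier message; it is not in the IANA registry today):
>>
>>     import json
>>
>>     def make_jrd(subject, controlled_site):
>>         # Build a JSON Resource Descriptor (RFC 7033) from form input.
>>         jrd = {
>>             "subject": subject,
>>             "links": [{"rel": "control", "href": controlled_site}],
>>         }
>>         return json.dumps(jrd, indent=2)
>>
>>     print(make_jrd("https://parent.example", "https://local-paper.example"))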
>>
>> Allowing protocols to be limited to the low bar of "unsophisticated"
>> users means that we're not able to provide "sophisticated" solutions when
>> they are needed. Over decades of experience with protocol and data format
>> design, we've learned that simple approaches inevitably lose their charm
>> after they have been in the field for some time. Users inevitably discover
>> new capabilities that they want to support. Requirements that were once
>> quite simple and well understood tend to become more complex and subtle as
>> time passes. Rather than waiting to discover the inadequacies of simple
>> formats, it makes a great deal of sense to initially rely on well-known
>> standard formats that allow extension, versioning, etc. Most "protocol
>> definers" should be focused on how to extend or exploit existing formats
>> while leaving the job of format definition to others who specialize in such
>> problems.
>>
>> For instance, the W3C Credible Web Community Group
>> <https://www.w3.org/community/credibility/> has defined a number of
>> signals that, I assume, a site might wish to self-assert in a discoverable,
>> well-known location. However, none of these signals are supported by the
>> trust.txt format. It seems to me that these signals could be usefully
>> included in a WebFinger file (see the sketch after the list below). These
>> signals include:
>>
>>    - Date Website First Archived
>>    - Corrections Policy
>>    - Any Award
>>    - Pulitzer Prize Recognition
>>    - RNG Awards
>>
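>> A JRD carrying those signals as properties might look like this (the
>> property URIs are hypothetical placeholders; CredWeb has not registered
>> any):
>>
>>     {
>>       "subject": "https://news-site.example",
>>       "properties": {
>>         "https://credweb.example/date-first-archived": "1999-03-04",
>>         "https://credweb.example/corrections-policy": "https://news-site.example/corrections",
>>         "https://credweb.example/award": "RNG Award, 2018"
>>       }
>>     }
>>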
>> The choice here is: Should we call for trust.txt to be updated to include
>> these signals, and any others that might be defined in the future, or
>> should we simply provide definitions of the JSON Resource Descriptors
>> (JRDs) or other encodings and thus, by implication, enable those signals to
>> be supported in any format that supports those encodings? I suggest that
>> the more useful approach is to define what the signals mean and how they
>> should be encoded, and then rely on others to find the various places where
>> those encodings would be most useful. If this approach had been used in
>> defining trust.txt, then all the various signals supported there, which are
>> not defined by CredWeb, would be easily used by anyone who is also using
>> CredWeb signals. (Being able to say: "I control the website xxx.xxx." is
>> useful in more contexts than just that defined by trust.txt.)
>>
>> bob wyman
>>
>>
>> On Sun, Aug 1, 2021 at 2:58 PM Scott Yates <scott@journallist.net> wrote:
>>
>>> Bob, and the group...
>>>
>>> Just to be clear, I am not running on a platform of trust.txt.
>>>
>>> On page 8 of the spec., we encourage the use of "/well-known" so we are
>>> clearly not against that.
>>>
>>> The short answer to why we went with a text file is that we are working
>>> with some extremely unsophisticated publishers. Take, for instance, the
>>> publisher of the Hays Free Press, whom I met recently in Texas. She prints
>>> news from her town on paper once a week, and maintains a website. As we all
>>> know, when local news dies, news consumers fill in that vacuum with crap.
>>> If she stops publishing, well, it would be bad, so we want to make things
>>> as easy as possible for her and those like her doing the estimable work of
>>> keeping local journalism alive.
>>>
>>> In my conversation with her, she was willing to post a file
>>> <https://haysfreepress.com/trust.txt> in part because she already knew
>>> about ads.txt, and so this was familiar to her. If I had tried to start
>>> telling her about RFC 7033, I would have lost her for sure. You are
>>> certainly right that JRDs would be technically superior, but robots.txt has
>>> been around for 20+ years, and even the most entry-level web publisher
>>> knows how it works.
>>>
>>>
>>> Thank you for looking into trust.txt, and while I don't want people to
>>> vote for me based on what they think of trust.txt, I think your question
>>> serves as a useful model of why I am running. If you think that any new
>>> proposal that is working to fix disinformation should follow, for example,
>>> the most current standardized systems, you should voice that to the group.
>>> If the group agrees, then that will be a part of how trust.txt -- and every
>>> other effort out there -- will be evaluated.
>>>
>>> -Scott Yates
>>> Founder
>>> JournalList.net, caretaker of the trust.txt framework
>>> 202-742-6842
>>> Short Video Explanation of trust.txt <https://youtu.be/lunOBapQxpU>
>>>
>>>
>>> On Sun, Aug 1, 2021 at 11:53 AM Bob Wyman <bob@wyman.us> wrote:
>>>
>>>> Scott Yates, in his statement of candidacy
>>>> <https://lists.w3.org/Archives/Public/public-credibility/2021Aug/0000.html>,
>>>> includes a description of the trust.txt file
>>>> <https://journallist.net/reference-document-for-trust-txt-specifications>.
>>>>
>>>> Please explain why it makes sense to introduce yet-another .txt file
>>>> (in addition to robots.txt and ads.txt) when we have established procedures
>>>> to allow those who control URIs to make statements supported by that
>>>> control. For instance, RFC 5785
>>>> <https://datatracker.ietf.org/doc/html/rfc5785> defines the
>>>> "/.well-known/" path prefix for "well-known locations" which are accessed
>>>> via URIs. It seems to me that if one were to publish a trust.txt file, then
>>>> it should be at the location "/.well-known/trust.txt". That does not seem to
>>>> be the current proposal. Why are existing standards not being followed?
>>>>
>>>> It also seems to me that the proposed file format is an unnecessary
>>>> departure from existing standards such as RFC 7033
>>>> <https://datatracker.ietf.org/doc/html/rfc7033>, which defined
>>>> WebFinger, a mechanism that could be easily used to carry the data which
>>>> the proponents of trust.txt seek to make available. To make WebFinger do
>>>> what trust.txt intends, it would only be necessary to register a few new
>>>> JSON Resource Descriptors (JRDs), properties, or link-relations (i.e.
>>>> belong-to, control, social, member, etc.). This sort of extension is
>>>> provided for in the definition of RFC 7033 and in RFC 5988
>>>> <https://datatracker.ietf.org/doc/html/rfc5988>, which defines "Web
>>>> Linking" mechanisms. Note: The existing set of defined link-relations can
>>>> be found in the IANA maintained link-relations registry
>>>> <https://www.iana.org/assignments/link-relations/link-relations.xhtml>.
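>>>>
>>>> Concretely, a consumer would then discover such statements with an
>>>> ordinary WebFinger query, whose response could carry the new relations
>>>> (the link-relation names are the proposed ones, not yet registered;
>>>> URLs are illustrative):
>>>>
>>>>     GET https://news-site.example/.well-known/webfinger?resource=https://news-site.example
>>>>
>>>>     {
>>>>       "subject": "https://news-site.example",
>>>>       "links": [
>>>>         {"rel": "belong-to", "href": "https://press-association.example"},
>>>>         {"rel": "control", "href": "https://sister-site.example"}
>>>>       ]
>>>>     }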
>>>>
>>>> While there will be a never-ending need to add support for new kinds of
>>>> standardized statements, discoverable in well-known locations, I think we
>>>> should be careful to ensure that new kinds of statements make use of
>>>> existing standards rather than define entirely new mechanisms. I can't see
>>>> anything in the trust.txt specification that actually requires a unique,
>>>> non-standard approach that is not already supported by the various
>>>> standards referenced above.
>>>>
>>>> bob wyman
>>>>
>>>>
