what do we mean by "signal", was Re: CredWeb Plans, meeting tomorrow from Sandro Hawke on 2020-01-20 (public-credibility@w3.org from January 2020)

From: Sandro Hawke <sandro@w3.org>
Date: Mon, 20 Jan 2020 15:44:18 -0500
To: Leonard Rosenthol <lrosenth@adobe.com>, Greg Mcverry <jgregmcverry@gmail.com>
Cc: Credible Web CG <public-credibility@w3.org>
Message-ID: <e6ba83af-88cd-2f13-1193-33eb45a52ec3@w3.org>
Mostly we've just been using the email list for announcements, but in 
the interest of getting more discussion going, I'm going to offer a 
substantive reply, below.  Hopefully the Subject line above lets you 
know if you want to read this.

On 1/20/20 2:32 PM, Leonard Rosenthol wrote:
>
> > that "signal" has a long history in credibility research as a 
> "marker" to the consumer to indicate if a source is credible or not.
>
> >
>
> So that sounds like an alternative term for what I have been calling a 
> “claim” (based on the terminology from the Verifiable Claims (now 
> Credentials) WG at the W3C - https://www.w3.org/2017/vc/WG/). Yes?
>
> Does a “signal” use any sort of technology to ensure authenticity (eg. 
> hashes or signatures)?  Or is that out of scope or TBD??
>

I think the term "signal" is used somewhat loosely, a bit like 
"information".  Here's draft text from Credibility Signals: 
<https://credweb.org/signals/#h.94xsck7qz3ho>

    Our basic model is that an entity (human and/or machine) is
    attempting to make a credibility assessment — to predict whether
    something will mislead them or others — by carefully examining many
    different observable features of that thing and things connected
    with it, as well as information provided by various related or
    trusted sources.

    To simplify and unify this complex situation, with its many
    different roles, we model the situation as a set of observers, each
    using imperfect instruments to learn about the situation and then
    recording their observations using simple declarative statements
    agreed upon in advance. Because those statements are inputs to a
    credibility assessment process, we call them *credibility signals.* 
    (The term *credibility indicators* is sometimes also used.)

That's just draft text, and I don't know that everyone agrees with it. 
It also intentionally glosses over several levels of meaning that could 
have more precise terms.  Different levels will speak to different 
kinds of people (#2 uses TCP, #4 uses RDF). For example, consider:

1. The age of an internet domain can be related to its credibility. So we 
can talk about this being a signal, maybe called "age of domain". This 
kind of information, this "signal", is used informally in practice now, 
and perhaps in some automated systems. It's a signal that's moderately 
useful because it stops someone from trivially setting up hundreds of 
websites, but it's also not all that useful because (a) attackers can 
buy old domains on the aftermarket or buy them in advance and let them 
sit before using them, and (b) it could penalize legitimate sources. In 
this sense, a "signal" is a general concept of a type of information.

2. There are some protocols for finding out the age of a domain. For 
instance, I can open a TCP connection to port 43 of 
whois.verisign-grs.com, send "nytimes.com<CRLF>" and get back text that 
includes the line "   Creation Date: 1994-01-18T05:00:00Z". We might 
call this "credibility data", "signal instance data", or perhaps "an 
observation". Loosely, we could call this a signal, leaving implicit 
that it's a signal about nytimes.com.

3. Somewhere between these two, we might define the creation date of a 
domain as the isodate timestamp of the moment the domain registrar 
originally recorded as the creation of the domain, when it was most 
recently created. Or something like that. Trying to be precise. Call 
this a signal definition, maybe, or a signal specification. If we want 
interoperability between multiple systems producing and consuming 
credibility data, I think we need to get to this level. In practice, for 
age of domain, we'd want this to line up with the available data 
sources. No point in specifying something we can't have.

4. Finally, there is some kind of standardized format for this data. In 
JSON, maybe it's something like

{"@context": "https://www.w3.org/ns/cred",
   "domain": "nytimes.com",
   "created": "1994-01-18T05:00:00Z"
}

which might come out in Turtle as

[] cred:domain "nytimes.com"; cred:created "1994-01-18T05:00:00Z"^^xsd:dateTimeStamp.


Alternatively, one might define the RDF mapping so we get something like:

<https://nytimes.com> cred:domainCreated "1994-01-18T05:00:00Z"^^xsd:dateTimeStamp.

although to me that feels like bad modeling.

I lean towards something like:

[] cred:domain "nytimes.com"; cred:created "1994-01-18T05:00:00Z"^^xsd:dateTimeStamp;
    a cred:Observation, cred:DomainAgeObservation.
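
As a rough illustration, levels 2 through 4 could be chained together in 
Python along these lines. This is only a sketch under assumptions: the 
function names are made up for the example, the cred: namespace IRI is a 
guess (only the JSON context URL appears above), and the domain string is 
assumed to need no escaping.

```python
import re
import socket
from datetime import datetime, timezone

WHOIS_HOST = "whois.verisign-grs.com"  # server named in level 2
WHOIS_PORT = 43
# Assumed namespace IRI; the text only shows the JSON "@context" URL.
CRED_NS = "https://www.w3.org/ns/cred#"
XSD_NS = "http://www.w3.org/2001/XMLSchema#"


def whois_query(domain, host=WHOIS_HOST, port=WHOIS_PORT, timeout=10.0):
    """Level 2: send one WHOIS query over TCP and return the raw reply.

    Needs network access; the server closes the connection when done.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_creation_date(reply):
    """Level 3: extract 'Creation Date' as a timezone-aware datetime.

    Returns None when the reply has no such line.
    """
    match = re.search(r"^\s*Creation Date:\s*(\S+)\s*$", reply, re.MULTILINE)
    if match is None:
        return None
    raw = match.group(1)
    # Older datetime.fromisoformat() rejects a trailing 'Z', so normalise it.
    parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        raise ValueError("creation date has no timezone: " + raw)
    return parsed


def to_turtle(domain, created):
    """Level 4: emit the typed-observation Turtle form."""
    stamp = created.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"@prefix cred: <{CRED_NS}> .\n"
        f"@prefix xsd: <{XSD_NS}> .\n"
        "\n"
        f'[] cred:domain "{domain}"; cred:created "{stamp}"^^xsd:dateTimeStamp;\n'
        "    a cred:Observation, cred:DomainAgeObservation.\n"
    )
```

Passing the reply from whois_query("nytimes.com") through 
parse_creation_date and then to_turtle would give a typed observation 
like the one I lean towards: a blank node that carries both the domain 
and its creation time, typed as an observation.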


So, that last bit brings us back to sense 1.  I'm suggesting the object 
which connects a domain name and its creation time is a kind of 
observation, a domain name observation. We can also attach to it who did 
this observation, and when, and how. Then we can say "credibility 
signal" is a class of observations relevant to credibility assessments. 
The observations can then be encoded in data formats like Turtle, or 
CSV, or whatever.
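
To make the "who, when, and how" part concrete, here is a minimal Python 
sketch of one observation record with that provenance attached, encoded 
as JSON and as CSV. The field names (observer, observed_at, method) and 
the example values for them are hypothetical; nothing here is an agreed 
vocabulary.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class DomainAgeObservation:
    domain: str        # what was observed
    created: str       # observed creation time, as a timestamp string
    observer: str      # who made the observation (hypothetical field)
    observed_at: str   # when the observation was made (hypothetical field)
    method: str        # how it was made, e.g. "whois" (hypothetical field)


def to_json(obs):
    """Encode one observation as a JSON object with sorted keys."""
    return json.dumps(asdict(obs), sort_keys=True)


def to_csv(observations):
    """Encode observations as CSV: a header row, then one row each."""
    names = [f.name for f in fields(DomainAgeObservation)]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for obs in observations:
        writer.writerow(asdict(obs))
    return buffer.getvalue()


# Illustrative values only; the observer and observation time are made up.
example = DomainAgeObservation(
    domain="nytimes.com",
    created="1994-01-18T05:00:00Z",
    observer="example-observer",
    observed_at="2020-01-20T20:44:18Z",
    method="whois",
)
```

The point of keeping the record separate from its encodings is that the 
same observation can be written out in whichever format a consumer 
wants, without changing what was observed or by whom.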

I haven't thought through exactly how the observations relate to 
Verifiable Credentials. VCs go much deeper into a narrower use case. 
There may also be some impedance mismatch with VC's crypto-centric view 
of the world. I'm reminded of the Knuth quote, "Beware of bugs in the 
above code; I have only proved it correct, not tried it." That is, 
crypto won't help if the institutions behind it are corrupt. CredWeb is 
aiming at a social notion of trustworthiness. (I'm by no means 
dismissing crypto, which is incredibly useful at fending off a whole 
range of threats. There are others it's not so useful against, however.)

Hoping this made things more clear, not less,

      -- Sandro

> Leonard
>
> *From: *Greg Mcverry <jgregmcverry@gmail.com>
> *Date: *Monday, January 20, 2020 at 2:08 PM
> *To: *Leonard Rosenthol <lrosenth@adobe.com>
> *Cc: *Sandro Hawke <sandro@w3.org>, Credible Web CG 
> <public-credibility@w3.org>
> *Subject: *Re: CredWeb Plans, meeting tomorrow
>
> I believe, though class schedules kept me from meetings last semester 
> and I am not a developer, that "signal" has a long history in 
> credibility research as a "marker" to the consumer to indicate if a 
> source is credible or not.
>
> So the RDF would be a bit of machine readable data as a vocabulary of 
> traditional human readable and disagreeable signals of credibility. 
> For example you may have a currency "signal" that equates to an RDF 
> vocabulary for publication date.
>
> Please others correct any misconceptions I may have.
>
> Should be there tomorrow, first day of semester, never know what 
> random meetings pop up.
>
> On Mon, Jan 20, 2020 at 12:32 PM Leonard Rosenthol <lrosenth@adobe.com> wrote:
>
>     I apologize in advance if this is explained elsewhere – but I
>     don’t understand the difference you are making between a “signal”
>     and the “data format” that an API would use (or might be embedded
>     in an asset).
>
>     I realize that I am coming at this from the side of assets (image,
>     audio, video, documents) as opposed to web pages – but to me they
>     are one and the same.
>
>     Thanks,
>
>     Leonard
>
>     *From: *Sandro Hawke <sandro@w3.org>
>     *Date: *Monday, January 20, 2020 at 11:31 AM
>     *To: *Credible Web CG <public-credibility@w3.org>
>     *Subject: *CredWeb Plans, meeting tomorrow
>     *Resent-From: *<public-credibility@w3.org>
>     *Resent-Date: *Monday, January 20, 2020 at 11:30 AM
>
>     Hey folks,
>
>     It's a new year, and we've had some quiet weeks.  I'm trying to
>     settle on some next steps for the group. Here's what I'm thinking:
>
>     1. Let's not try to update the report right now. Let's just
>     convert it to a "final report", to make it properly archival, with
>     a clear note that it was written in 2018. Maybe a short name like
>     "Credibility Tech 2018
>     <https://credweb.org/report/20181011>".
>     If there's sufficient interest in a revision or new reports that
>     are more focused later, that's fine, but I don't think it's the
>     best use of group time right now.
>
>     2. Instead of Credibility Signals
>     <https://credweb.org/signals-beta/>
>     trying to include everything about signals while also highlighting
>     the good stuff, let's split it into three different resources:
>
>     * *Credibility APIs*, a technical guide for how computers should
>     talk to other computers to exchange credibility data. Includes
>     data formats, protocols, RESTful APIs, browser APIs, etc. Not a
>     spec for any of these, but an overview of options that are
>     specified elsewhere. I'm thinking we can publish a small draft and
>     start to gather input.
>
>     * A *Credibility Data Exchange*, a website for exploring all the
>     signal definitions and signal instance data people are willing to
>     make public, with clear attribution back to the sources and no
>     endorsement from us. I've made a few prototypes over the years
>     (like https://data.credweb.org)
>     but none I was happy with, yet. Maybe this should just be my
>     thing, not the group's; that's a topic for discussion. (It might
>     help if someone wanted to fund this.)
>
>     * *Endorsed Credibility Signals*.  This would be a relatively
>     small document, describing 5-20 signals where we have consensus
>     within the group that they are pretty good. I'd expect it to
>     change over time with new data. The RDF schema for these signals
>     would be published on w3.org.
>     It would intentionally be kept small enough to be manageable,
>     unlike the Exchange or past "Signals" drafts. I think some of the
>     NewsQ highlight signals
>     <https://credweb.org/signals-beta/#newsq-highlight>
>     are good options here, and there are also some that are doable by
>     hand (like these
>     <https://docs.google.com/document/d/1ADJX57-xMHIIHrnzEycFrn4fUGQ63SD8hyEHqScYnTY/edit>).
>
>     So, agenda for tomorrow is to talk about this plan, and if there's
>     time, talk about the actual signals we might be ready to endorse.
>
>     If you can't make it to the meeting and have thoughts on all this,
>     email could be helpful.
>
>     Meeting, as usual: 21 January 2020 1pm ET
>     <https://www.timeanddate.com/worldclock/fixedtime.html?msg=CredWeb&iso=20200121T13&p1=43&ah=1>,
>     https://zoom.us/j/706868147, agenda/record
>     <https://docs.google.com/document/d/1Zegy2ASbsRtkz8vNVYUXHopZjjXbZweJ5Co8TEW_8w0/edit#>
>
>          -- Sandro
>
>
>
> -- 
>
> J. Gregory McVerry, PhD
> Assistant Professor
> Southern Connecticut State University
> twitter: jgmac1106
>
>
Received on Monday, 20 January 2020 20:44:23 UTC