Re: what do we mean by "signal", was Re: CredWeb Plans, meeting tomorrow from Leonard Rosenthol on 2020-01-20 (public-credibility@w3.org from January 2020)

From: Leonard Rosenthol <lrosenth@adobe.com>
Date: Mon, 20 Jan 2020 21:45:40 +0000
To: Sandro Hawke <sandro@w3.org>, Greg Mcverry <jgregmcverry@gmail.com>
CC: Credible Web CG <public-credibility@w3.org>
Message-ID: <D0465454-077A-447E-A5CC-94CD061CE0C3@adobe.com>
> CredWeb is aiming at a social notion of trustworthiness
>
Interesting.  While that might work for web sites/pages – it won’t work for the assets contained on those pages (or found elsewhere on the Web).

Is that direction locked in stone for CredWeb?   Would the group consider expanding its definition of “Credible” and “Web”?


Concerning VC – I think if you forget about the specifics, the idea that your claim/signal/whatever *must be* “integrity protected” is key.  This is especially true if/when it is separate from the people (and their trust/reputations) involved.

Leonard

From: Sandro Hawke <sandro@w3.org>
Date: Monday, January 20, 2020 at 3:44 PM
To: Leonard Rosenthol <lrosenth@adobe.com>, Greg Mcverry <jgregmcverry@gmail.com>
Cc: Credible Web CG <public-credibility@w3.org>
Subject: what do we mean by "signal", was Re: CredWeb Plans, meeting tomorrow

Mostly we've just been using the email list for announcements, but in the interest of getting more discussion going, I'm going to offer a substantive reply, below.  Hopefully the Subject line above lets you know if you want to read this.

On 1/20/20 2:32 PM, Leonard Rosenthol wrote:

> that "signal" has a long history in credibility research as a "marker" to the consumer to indicate if a source is credible or not.
>
So that sounds like an alternative term for what I have been calling a “claim” (based on the terminology from the Verifiable Claims (now Credentials) WG at the W3C - https://www.w3.org/2017/vc/WG/).  Yes?

Does a “signal” use any sort of technology to ensure authenticity (e.g., hashes or signatures)?  Or is that out of scope or TBD?

I think the term "signal" is used somewhat loosely, a bit like "information".  Here's draft text from Credibility Signals <https://credweb.org/signals/#h.94xsck7qz3ho>:
Our basic model is that an entity (human and/or machine) is attempting to make a credibility assessment — to predict whether something will mislead them or others — by carefully examining many different observable features of that thing and things connected with it, as well as information provided by various related or trusted sources.

To simplify and unify this complex situation, with its many different roles, we model the situation as a set of observers, each using imperfect instruments to learn about the situation and then recording their observations using simple declarative statements agreed upon in advance. Because those statements are inputs to a credibility assessment process, we call them credibility signals.  (The term credibility indicators is sometimes also used.)
That's just draft text, and I don't know that everyone agrees with it. It also intentionally glosses over several levels of meaning that could have more precise terms.  Some of these levels will speak more to different kinds of people than others (#2 uses TCP, #4 uses RDF). For example, consider:

1. The age of an internet domain can be related to its credibility. So we can talk about this being a signal, maybe called "age of domain". This kind of information, this "signal", is used informally in practice now, and perhaps in some automated systems. It's a signal that's moderately useful because it stops someone from trivially setting up hundreds of websites, but it's also not all that useful because (a) attackers can buy old domains on the aftermarket or buy them in advance and let them sit before using them, and (b) it could penalize legitimate sources. In this sense, a "signal" is a general concept of a type of information.

2. There are some protocols for finding out the age of a domain.  For instance, I can open a TCP connection to port 43 of whois.verisign-grs.com, send "nytimes.com<CRLF>" and get back text that includes the line "   Creation Date: 1994-01-18T05:00:00Z". We might call this "credibility data", "signal instance data", or perhaps "an observation". Loosely, we could call this a signal, leaving implicit that it's a signal about nytimes.com.
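
As a rough illustration of sense 2, the exchange could be sketched in Python along these lines. The server name, port, and "Creation Date:" line come from the description above; the function names and the assumption that the timestamp always has the shape shown are illustrative only, and real WHOIS replies vary by registry.

```python
import re
import socket
from datetime import datetime, timezone
from typing import Optional

WHOIS_HOST = "whois.verisign-grs.com"  # server named in the message
WHOIS_PORT = 43


def whois_query(domain: str, host: str = WHOIS_HOST, timeout: float = 10.0) -> str:
    """Send '<domain>CRLF' over TCP port 43 and return the raw text reply."""
    with socket.create_connection((host, WHOIS_PORT), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_creation_date(whois_text: str) -> Optional[datetime]:
    """Extract the 'Creation Date:' value, or None if no such line exists."""
    match = re.search(r"^\s*Creation Date:\s*(\S+)\s*$", whois_text, re.MULTILINE)
    if match is None:
        return None
    # Assumption: the value looks like '1994-01-18T05:00:00Z', as in the message.
    parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# The one reply line quoted in the message, used here instead of a live query.
SAMPLE_LINE = "   Creation Date: 1994-01-18T05:00:00Z"
created = parse_creation_date(SAMPLE_LINE)
```

With network access, parse_creation_date(whois_query("nytimes.com")) would perform the live lookup; the parsing step is kept separate so it can be checked without one.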

3. Somewhere between these two, we might define the creation date of a domain as the ISO 8601 timestamp that the domain registrar recorded as the moment the domain was created, counting only its most recent creation. Or something like that. Trying to be precise. Call this a signal definition, maybe, or a signal specification. If we want interoperability between multiple systems producing and consuming credibility data, I think we need to get to this level. In practice, for age of domain, we'd want this to line up with the available data sources. No point in specifying something we can't have.

4. Finally, there is some kind of standardized format for this data. In JSON, maybe it's something like

{"@context": "https://www.w3.org/ns/cred",
  "domain": "nytimes.com",
  "created": "1994-01-18T05:00:00Z"
}
which might come out in Turtle as

[] cred:domain "nytimes.com"; cred:created "1994-01-18T05:00:00Z"^^xsd:dateTimeStamp.
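
A minimal sketch of that JSON-to-Turtle step, assuming a fixed two-field record and leaving out the prefix declarations just as the line above does. The function names are hypothetical, and a real mapping would presumably be driven by the JSON-LD context rather than hard-coded.

```python
import json


def turtle_string(value: str) -> str:
    """Quote a Python string as a Turtle string literal (basic escapes only)."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def domain_record_to_turtle(record: dict) -> str:
    """Render a {'domain', 'created'} record as one blank-node Turtle statement."""
    domain = turtle_string(record["domain"])
    created = turtle_string(record["created"])
    return f"[] cred:domain {domain}; cred:created {created}^^xsd:dateTimeStamp."


# The JSON document from the message, parsed and converted.
SAMPLE_JSON = """
{"@context": "https://www.w3.org/ns/cred",
  "domain": "nytimes.com",
  "created": "1994-01-18T05:00:00Z"
}
"""
turtle_line = domain_record_to_turtle(json.loads(SAMPLE_JSON))
```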

Alternatively, one might define the RDF mapping so we get something like:

<https://nytimes.com> cred:domainCreated "1994-01-18T05:00:00Z"^^xsd:dateTimeStamp.
although to me that feels like bad modeling.

I lean towards something like:

[] cred:domain "nytimes.com"; cred:created "1994-01-18T05:00:00Z"^^xsd:dateTimeStamp;
   a cred:Observation, cred:DomainAgeObservation.

So, that last bit brings us back to sense 1.  I'm suggesting the object which connects a domain name and its creation time is a kind of observation, a domain name observation. We can also attach to it who did this observation, and when, and how. Then we can say "credibility signal" is a class of observations relevant to credibility assessments. The observations can then be encoded in data formats like Turtle, or CSV, or whatever.
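
To make the "who, when, and how" idea concrete, here is a small sketch of an observation record encoded as CSV and as JSON. The field names observer, observed_at, and method, and the example values given for them, are placeholders invented for illustration; nothing here is an agreed vocabulary.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields
from typing import Iterable


@dataclass(frozen=True)
class DomainAgeObservation:
    domain: str        # the thing observed
    created: str       # the observed creation timestamp
    observer: str      # hypothetical field: who made the observation
    observed_at: str   # hypothetical field: when it was made
    method: str        # hypothetical field: how it was made


def to_json(obs: DomainAgeObservation) -> str:
    """Encode one observation as a JSON object with sorted keys."""
    return json.dumps(asdict(obs), sort_keys=True)


def to_csv(observations: Iterable[DomainAgeObservation]) -> str:
    """Encode observations as CSV with a header row, one row per observation."""
    buffer = io.StringIO()
    names = [f.name for f in fields(DomainAgeObservation)]
    writer = csv.DictWriter(buffer, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for obs in observations:
        writer.writerow(asdict(obs))
    return buffer.getvalue()


example = DomainAgeObservation(
    domain="nytimes.com",
    created="1994-01-18T05:00:00Z",
    observer="https://example.org/observers/alice",  # placeholder identity
    observed_at="2020-01-20T21:00:00Z",              # placeholder time
    method="whois",
)
csv_text = to_csv([example])
json_text = to_json(example)
```

The same record could be written out as Turtle instead; the point is only that the observation is the unit, and the serialization is interchangeable.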

I haven't thought through exactly how the observations relate to Verifiable Credentials. VCs go much deeper into a narrower use case. There may also be some impedance mismatch with VC's crypto-centric view of the world. I'm reminded of the Knuth quote, "Beware of bugs in the above code; I have only proved it correct, not tried it." That is, crypto won't help if the institutions behind it are corrupt. CredWeb is aiming at a social notion of trustworthiness. (I'm by no means dismissing crypto, which is incredibly useful at fending off a whole range of threats. There are others it's not so useful against, however.)

Hoping this made things more clear, not less,

     -- Sandro


Leonard

From: Greg Mcverry <jgregmcverry@gmail.com>
Date: Monday, January 20, 2020 at 2:08 PM
To: Leonard Rosenthol <lrosenth@adobe.com>
Cc: Sandro Hawke <sandro@w3.org>, Credible Web CG <public-credibility@w3.org>
Subject: Re: CredWeb Plans, meeting tomorrow

I believe, though class schedules kept me from meetings last semester and I am not a developer, that "signal" has a long history in credibility research as a "marker" to the consumer to indicate if a source is credible or not.

So the RDF would be a bit of machine readable data as a vocabulary of traditional human readable and disagreeable signals of credibility. For example, you may have a currency "signal" that equates to an RDF vocabulary for publication date.

Please others correct any misconceptions I may have.

Should be there tomorrow, first day of semester, never know what random meetings pop up.


On Mon, Jan 20, 2020 at 12:32 PM Leonard Rosenthol <lrosenth@adobe.com> wrote:
I apologize in advance if this is explained elsewhere – but I don’t understand the difference you are making between a “signal” and the “data format” that an API would use (or might be embedded in an asset).

I realize that I am coming at this from the side of assets (image, audio, video, documents) as opposed to web pages – but to me they are one and the same.

Thanks,
Leonard

From: Sandro Hawke <sandro@w3.org>
Date: Monday, January 20, 2020 at 11:31 AM
To: Credible Web CG <public-credibility@w3.org>
Subject: CredWeb Plans, meeting tomorrow
Resent-From: <public-credibility@w3.org>
Resent-Date: Monday, January 20, 2020 at 11:30 AM

Hey folks,

It's a new year, and we've had some quiet weeks.  I'm trying to settle on some next steps for the group. Here's what I'm thinking:

1. Let's not try to update the report right now. Let's just convert it to a "final report", to make it properly archival, with a clear note that it was written in 2018. Maybe a short name like "Credibility Tech 2018" <https://credweb.org/report/20181011>. If there's sufficient interest in a revision or new reports that are more focused later, that's fine, but I don't think it's the best use of group time right now.

2. Instead of Credibility Signals <https://credweb.org/signals-beta/> trying to include everything about signals while also highlighting the good stuff, let's split it into three different resources:

* Credibility APIs, a technical guide for how computers should talk to other computers to exchange credibility data. Includes data formats, protocols, RESTful APIs, browser APIs, etc. Not a spec for any of these, but an overview of options that are specified elsewhere. I'm thinking we can publish a small draft and start to gather input.

* A Credibility Data Exchange, a website for exploring all the signal definitions and signal instance data people are willing to make public, with clear attribution back to the sources and no endorsement from us. I've made a few prototypes over the years (like https://data.credweb.org) but none I was happy with, yet. Maybe this should just be my thing, not the group's; that's a topic for discussion. (It might help if someone wanted to fund this.)

* Endorsed Credibility Signals.  This would be a relatively small document, describing 5-20 signals where we have consensus within the group that they are pretty good. I'd expect it to change over time with new data. The RDF schema for these signals would be published on w3.org. It would intentionally be kept small enough to be manageable, unlike the Exchange and past "Signals" drafts. I think some of the NewsQ highlight signals <https://credweb.org/signals-beta/#newsq-highlight> are good options here, and there are also some that are doable by hand (like these <https://docs.google.com/document/d/1ADJX57-xMHIIHrnzEycFrn4fUGQ63SD8hyEHqScYnTY/edit>).

So, agenda for tomorrow is to talk about this plan, and if there's time, talk about the actual signals we might be ready to endorse.

If you can't make it to the meeting and have thoughts on all this, email could be helpful.

Meeting, as usual: 21 January 2020 1pm ET <https://www.timeanddate.com/worldclock/fixedtime.html?msg=CredWeb&iso=20200121T13&p1=43&ah=1>,
https://zoom.us/j/706868147, agenda/record <https://docs.google.com/document/d/1Zegy2ASbsRtkz8vNVYUXHopZjjXbZweJ5Co8TEW_8w0/edit#>

     -- Sandro


--
J. Gregory McVerry, PhD
Assistant Professor
Southern Connecticut State University
twitter: jgmac1106
Received on Monday, 20 January 2020 21:45:47 UTC