Re: what do we mean by "signal", was Re: CredWeb Plans, meeting tomorrow from Sandro Hawke on 2020-01-20 (public-credibility@w3.org from January 2020)

From: Sandro Hawke <sandro@w3.org>
Date: Mon, 20 Jan 2020 18:30:23 -0500
To: Leonard Rosenthol <lrosenth@adobe.com>,Greg Mcverry <jgregmcverry@gmail.com>
CC: Credible Web CG <public-credibility@w3.org>
Message-ID: <658572F5-A423-4190-AF26-7E42D7ED194B@w3.org>
On January 20, 2020 4:45:40 PM EST, Leonard Rosenthol <lrosenth@adobe.com> wrote:
>> CredWeb is aiming at a social notion of trustworthiness
>>
>Interesting.  While that might work for web sites/pages – it won’t
>work for the assets contained on those pages (or found elsewhere on the
>Web).
>
>Is that direction locked in stone for CredWeb?   Would the group
>consider expanding its definition of “Credible” and “Web”?
>

We don't have a formally approved charter, so nothing is locked in stone.

Can you give me a concrete example of the kind of functionality you're looking for?  I'm not quite sure what it means for a digital asset to be trustworthy. 

What comes to mind for me is digital cameras attaching time and location and camera ID information to photos in a way that's potentially quite hard to forge (and hopefully also doesn't reveal private information). Is that the kind of thing you're talking about? 

     - Sandro


>
>Concerning VC – I think if you forget about the specifics, the idea
>that your claim/signal/whatever *must be* “integrity protected” is key.
>This is especially true if/when it is separate from the people (and
>their trust/reputations) involved.
>
>Leonard
>
>From: Sandro Hawke <sandro@w3.org>
>Date: Monday, January 20, 2020 at 3:44 PM
>To: Leonard Rosenthol <lrosenth@adobe.com>, Greg Mcverry
><jgregmcverry@gmail.com>
>Cc: Credible Web CG <public-credibility@w3.org>
>Subject: what do we mean by "signal", was Re: CredWeb Plans, meeting
>tomorrow
>
>Mostly we've just been using the email list for announcements, but in
>the interest of getting more discussion going, I'm going to offer a
>substantive reply, below.  Hopefully the Subject line above lets you
>know if you want to read this.
>
>On 1/20/20 2:32 PM, Leonard Rosenthol wrote:
>
>> that "signal" has a long history in credibility research as a
>> "marker" to the consumer to indicate if a source is credible or not.
>>
>So that sounds like an alternative term for what I have been calling a
>“claim” (based on the terminology from the Verifiable Claims (now
>Credentials) WG at the W3C -
>https://www.w3.org/2017/vc/WG/).
> Yes?
>
>Does a “signal” use any sort of technology to ensure authenticity (e.g.,
>hashes or signatures)?  Or is that out of scope or TBD?
>
>I think the term "signal" is used somewhat loosely, a bit like
>"information".  Here's draft text from Credibility Signals
><https://credweb.org/signals/#h.94xsck7qz3ho>:
>
>Our basic model is that an entity (human and/or machine) is attempting
>to make a credibility assessment — to predict whether something will
>mislead them or others — by carefully examining many different
>observable features of that thing and things connected with it, as well
>as information provided by various related or trusted sources.
>
>To simplify and unify this complex situation, with its many different
>roles, we model the situation as a set of observers, each using
>imperfect instruments to learn about the situation and then recording
>their observations using simple declarative statements agreed upon in
>advance. Because those statements are inputs to a credibility
>assessment process, we call them credibility signals.  (The term
>credibility indicators is sometimes also used.)
>
>That's just draft text, and I don't know that everyone agrees with it.
>It also intentionally glosses over several levels of meaning that could
>have more precise terms.  Some of these levels will speak more to
>different kinds of people than others (#2 uses TCP, #4 uses RDF). For
>example, consider:
>
>1. The age of an internet domain can be related to its credibility. So we
>can talk about this being a signal, maybe called "age of domain". This
>kind of information, this "signal", is used informally in practice now,
>and perhaps in some automated systems. It's a signal that's moderately
>useful because it stops someone from trivially setting up hundreds of
>websites, but it's also not all that useful because (a) attackers can
>buy old domains on the aftermarket or buy them in advance and let them
>sit before using them, and (b) it could penalize legitimate sources. In
>this sense, a "signal" is a general concept of a type of information.
>
>2. There are some protocols for finding out the age of a domain.  For
>instance, I can open a TCP connection to port 43 of
>whois.verisign-grs.com, send "nytimes.com<CRLF>" and get back text that
>includes the line "   Creation Date: 1994-01-18T05:00:00Z". We might
>call this "credibility data", "signal instance data", or perhaps "an
>observation". Loosely, we could call this a signal, leaving implicit
>that it's a signal about nytimes.com.
>
>3. Somewhere between these two, we might define the creation date of a
>domain as the isodate timestamp of the moment the domain registrar
>originally recorded as the creation of the domain, when it was most
>recently created. Or something like that. Trying to be precise. Call
>this a signal definition, maybe, or a signal specification. If we want
>interoperability between multiple systems producing and consuming
>credibility data, I think we need to get to this level. In practice,
>for age of domain, we'd want this to line up with the available data
>sources. No point in specifying something we can't have.
>
>4. Finally, there is some kind of standardized format for this data. In
>JSON, maybe it's something like
>
>{"@context": "https://www.w3.org/ns/cred",
>  "domain": "nytimes.com",
>  "created": "1994-01-18T05:00:00Z"
>}
>
>which might come out in Turtle as
>
>[] cred:domain "nytimes.com"; cred:created
>"1994-01-18T05:00:00Z"^^xsd:dateTimeStamp.
>
>Alternatively, one might define the RDF mapping so we get something
>like:
>
><https://nytimes.com>
>cred:domainCreated "1994-01-18T05:00:00Z"^^xsd:dateTimeStamp
>although to me that feels like bad modeling.
>
>I lean towards something like:
>
>[] cred:domain "nytimes.com"; cred:created
>"1994-01-18T05:00:00Z"^^xsd:dateTimeStamp;
>
>   a cred:Observation, a cred:DomainAgeObservation.
>
>So, that last bit brings us back to sense 1.  I'm suggesting the object
>which connects a domain name and its creation time is a kind of
>observation, a domain name observation. We can also attach to it who
>did this observation, and when, and how. Then we can say "credibility
>signal" is a class of observations relevant to credibility assessments.
>The observations can then be encoded in data formats like Turtle, or
>CSV, or whatever.
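
To make senses 2 through 4 above a little more concrete, here is a rough Python sketch, not a proposal for the actual format. The WHOIS server, port 43, and the "Creation Date:" line come from the example above. Everything else is an assumption made up for illustration: the function names, the "type", "observer", "observedAt" and "method" keys, the example.org observer URL, and the observation timestamp. The demo at the bottom parses a canned reply, so it does not touch the network.

```python
import json
import re
import socket
from datetime import datetime, timezone
from typing import Optional


def whois_query(domain: str, server: str = "whois.verisign-grs.com", port: int = 43) -> str:
    """Sense 2: send "<domain>CRLF" over TCP to a WHOIS server and return the raw reply."""
    with socket.create_connection((server, port), timeout=10) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Matches a line such as "   Creation Date: 1994-01-18T05:00:00Z"
CREATION_RE = re.compile(r"^\s*Creation Date:\s*(\S+)\s*$", re.MULTILINE)


def parse_creation_date(whois_text: str) -> Optional[str]:
    """Sense 3: pull out the registrar-recorded creation timestamp, if present."""
    match = CREATION_RE.search(whois_text)
    return match.group(1) if match else None


def make_observation(domain: str, created: str, observer: str, method: str,
                     observed_at: Optional[str] = None) -> dict:
    """Sense 4: wrap the value in an observation record.

    The "type", "observer", "observedAt" and "method" keys are hypothetical;
    they stand in for "who did this observation, and when, and how".
    """
    if observed_at is None:
        observed_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "@context": "https://www.w3.org/ns/cred",
        "type": ["Observation", "DomainAgeObservation"],
        "domain": domain,
        "created": created,
        "observer": observer,
        "observedAt": observed_at,
        "method": method,
    }


# Demo on a canned reply; call whois_query("nytimes.com") to fetch a live one.
sample_reply = "   Domain Name: NYTIMES.COM\r\n   Creation Date: 1994-01-18T05:00:00Z\r\n"
observation = make_observation(
    domain="nytimes.com",
    created=parse_creation_date(sample_reply),
    observer="https://example.org/some-observer",  # hypothetical observer
    method="whois",
    observed_at="2020-01-20T00:00:00Z",  # illustrative timestamp
)
print(json.dumps(observation, indent=2))
```

Keeping the parsing step separate from the record-building step mirrors the split between "observation" and "data format" above: the same observation could just as well be written out as Turtle or CSV.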
>
>I haven't thought through exactly how the observations relate to
>Verifiable Credentials. VCs go much deeper into a narrower use case.
>There may also be some impedance mismatch with VC's crypto-centric view
>of the world. I'm reminded of the Knuth quote, "Beware of bugs in the
>above code; I have only proved it correct, not tried it." That is,
>crypto won't help if the institutions behind it are corrupt. CredWeb is
>aiming at a social notion of trustworthiness. (I'm by no means
>dismissing crypto, which is incredibly useful at fending off a whole
>range of threats. There are others it's not so useful against,
>however.)
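
As a small illustration of what "integrity protected" can and cannot buy, here is a hedged Python sketch. It is not how Verifiable Credentials proofs work: an HMAC over a shared secret stands in for a real digital signature, and the envelope layout, the function names, and the key are all invented for this example.

```python
import hashlib
import hmac
import json


def canonical_bytes(observation: dict) -> bytes:
    """Serialize with sorted keys so the same data always yields the same bytes."""
    return json.dumps(observation, sort_keys=True, separators=(",", ":")).encode("utf-8")


def protect(observation: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 tag to the observation (hypothetical envelope layout)."""
    tag = hmac.new(key, canonical_bytes(observation), hashlib.sha256).hexdigest()
    return {"observation": observation, "mac": tag}


def verify(envelope: dict, key: bytes) -> bool:
    """Return True only if the observation is unchanged since protect() was called."""
    expected = hmac.new(key, canonical_bytes(envelope["observation"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["mac"])


key = b"demo-shared-secret"  # illustrative only
envelope = protect({"domain": "nytimes.com", "created": "1994-01-18T05:00:00Z"}, key)

# Altering the recorded value after the fact breaks verification.
tampered = {
    "observation": dict(envelope["observation"], created="2019-06-01T00:00:00Z"),
    "mac": envelope["mac"],
}
print(verify(envelope, key), verify(tampered, key))
```

The check only shows that the record was not altered after it was tagged. It says nothing about whether whoever holds the key recorded the date honestly, which is the social part of the problem.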
>
>Hoping this made things more clear, not less,
>
>     -- Sandro
>
>
>Leonard
>
>From: Greg Mcverry <jgregmcverry@gmail.com>
>Date: Monday, January 20, 2020 at 2:08 PM
>To: Leonard Rosenthol <lrosenth@adobe.com>
>Cc: Sandro Hawke <sandro@w3.org>, Credible Web CG
><public-credibility@w3.org>
>Subject: Re: CredWeb Plans, meeting tomorrow
>
>I believe, though class schedules kept me from meetings last semester
>and I am not a developer, that "signal" has a long history in
>credibility research as a "marker" to the consumer to indicate if a
>source is credible or not.
>
>So the RDF would be a bit of machine-readable data as a vocabulary of
>traditional human-readable and disagreeable signals of credibility.
>For example you may have a currency "signal" that equates to an RDF
>vocabulary for publication date.
>
>Please others correct any misconceptions I may have.
>
>Should be there tomorrow, first day of semester, never know what random
>meetings pop up.
>
>
>On Mon, Jan 20, 2020 at 12:32 PM Leonard Rosenthol
><lrosenth@adobe.com> wrote:
>I apologize in advance if this is explained elsewhere – but I don’t
>understand the difference you are making between a “signal” and the
>“data format” that an API would use (or might be embedded in an asset).
>
>I realize that I am coming at this from the side of assets (image,
>audio, video, documents) as opposed to web pages – but to me they are
>one and the same.
>
>Thanks,
>Leonard
>
>From: Sandro Hawke <sandro@w3.org>
>Date: Monday, January 20, 2020 at 11:31 AM
>To: Credible Web CG <public-credibility@w3.org>
>Subject: CredWeb Plans, meeting tomorrow
>Resent-From: <public-credibility@w3.org>
>Resent-Date: Monday, January 20, 2020 at 11:30 AM
>
>Hey folks,
>
>It's a new year, and we've had some quiet weeks.  I'm trying to settle
>on some next steps for the group. Here's what I'm thinking:
>
>1. Let's not try to update the report right now. Let's just convert it
>to a "final report", to make it properly archival, with a clear note
>that it was written in 2018. Maybe a short name like "Credibility Tech
>2018" <https://credweb.org/report/20181011>.
>If there's sufficient interest in a revision or new reports that are
>more focused later, that's fine, but I don't think it's the best use of
>group time right now.
>
>2. Instead of Credibility
>Signals <https://credweb.org/signals-beta/>
>trying to include everything about signals while also highlighting the
>good stuff, let's split it into three different resources:
>
>* Credibility APIs, a technical guide for how computers should talk to
>other computers to exchange credibility data. Includes data formats,
>protocols, RESTful APIs, browser APIs, etc. Not a spec for any of
>these, but an overview of options that are specified elsewhere. I'm
>thinking we can publish a small draft and start to gather input.
>
>* A Credibility Data Exchange, a website for exploring all the signal
>definitions and signal instance data people are willing to make public,
>with clear attribution back to the sources and no endorsement from us.
>I've made a few prototypes over the years (like
>https://data.credweb.org)
>but none I was happy with, yet. Maybe this should just be my thing, not
>the group's; that's a topic for discussion. (It might help if someone
>wanted to fund this.)
>
>* Endorsed Credibility Signals.  This would be a relatively small
>document, describing 5-20 signals where we have consensus within the
>group that they are pretty good. I'd expect it to change over time with
>new data. The RDF schema for these signals would be published on
>w3.org.
>It would intentionally be kept small enough to be manageable, unlike
>the Exchange or past "Signals" drafts. I think some of the NewsQ
>highlight signals <https://credweb.org/signals-beta/#newsq-highlight>
>are good options here, and there are also some that are doable by hand
>(like these
><https://docs.google.com/document/d/1ADJX57-xMHIIHrnzEycFrn4fUGQ63SD8hyEHqScYnTY/edit>).
>
>So, agenda for tomorrow is to talk about this plan, and if there's
>time, talk about the actual signals we might be ready to endorse.
>
>If you can't make it to the meeting and have thoughts on all this,
>email could be helpful.
>
>Meeting, as usual:
>21 January 2020 1pm ET
><https://www.timeanddate.com/worldclock/fixedtime.html?msg=CredWeb&iso=20200121T13&p1=43&ah=1>,
>https://zoom.us/j/706868147,
>agenda/record
><https://docs.google.com/document/d/1Zegy2ASbsRtkz8vNVYUXHopZjjXbZweJ5Co8TEW_8w0/edit#>
>
>     -- Sandro
>
>
>--
>J. Gregory McVerry, PhD
>Assistant Professor
>Southern Connecticut State University
>twitter: jgmac1106
Received on Monday, 20 January 2020 23:30:29 UTC