Re: what do we mean by "signal", was Re: CredWeb Plans, meeting tomorrow from Leonard Rosenthol on 2020-01-21 (public-credibility@w3.org from January 2020)

From: Leonard Rosenthol <lrosenth@adobe.com>
Date: Tue, 21 Jan 2020 00:04:43 +0000
To: Sandro Hawke <sandro@w3.org>, Greg Mcverry <jgregmcverry@gmail.com>
CC: Credible Web CG <public-credibility@w3.org>
Message-ID: <7FCC12EC-61D1-4978-A68B-671E3059DA68@adobe.com>
> I'm not quite sure what it means for a digital asset to be trustworthy
>
Well, as part of the Content Authenticity Initiative that Adobe, Twitter and the NYTimes announced, we're trying to figure that out.  Right now we're focused more on "authenticity" than trustworthiness, since trust introduces a whole host of things that we're not (yet) ready to take on...


> What comes to mind for me is digital cameras attaching time and location and camera ID information to photos 
>in a way that's potentially quite hard to forge (and hopefully also doesn't reveal private information).
>
This concept - called Secure Capture - is definitely the starting point...but it's just that.  All aspects of the flow of the image from that camera to the final publication (be it on a web page, social media feed, etc.) need to be considered (tracked/managed/etc.) in order for someone to know whether it is "credible"...

I got involved here to see how we can align our efforts with those of this group (and others like RWOT) which are looking at other parts of the larger ecosystem necessary.

Leonard


On 1/20/20, 6:30 PM, "Sandro Hawke" <sandro@w3.org> wrote:

    
    
    On January 20, 2020 4:45:40 PM EST, Leonard Rosenthol <lrosenth@adobe.com> wrote:
    >> CredWeb is aiming at a social notion of trustworthiness
    >>
    >Interesting.  While that might work for web sites/pages – it won’t
    >work for the assets contained on those pages (or found elsewhere on
    >the Web).
    >
    >Is that direction locked in stone for CredWeb?   Would the group
    >consider expanding its definition of “Credible” and “Web”?
    >
    
    We don't have a formally approved charter,  so nothing is locked in stone. 
    
    Can you give me a concrete example of the kind of functionality you're looking for?  I'm not quite sure what it means for a digital asset to be trustworthy. 
    
    What comes to mind for me is digital cameras attaching time and location and camera ID information to photos in a way that's potentially quite hard to forge (and hopefully also doesn't reveal private information). Is that the kind of thing you're talking about? 
    
         - Sandro
    
    
    >
    >Concerning VC – I think if you forget about the specifics, the idea
    >that your claim/signal/whatever *must be* “integrity protected” is key.
    >This is especially true if/when it is separate from the people (and
    >their trust/reputations) involved.
    >
    >Leonard
    >
    >From: Sandro Hawke <sandro@w3.org>
    >Date: Monday, January 20, 2020 at 3:44 PM
    >To: Leonard Rosenthol <lrosenth@adobe.com>, Greg Mcverry
    ><jgregmcverry@gmail.com>
    >Cc: Credible Web CG <public-credibility@w3.org>
    >Subject: what do we mean by "signal", was Re: CredWeb Plans, meeting
    >tomorrow
    >
    >Mostly we've just been using the email list for announcements, but in
    >the interest of getting more discussion going, I'm going to offer a
    >substantive reply, below.  Hopefully the Subject line above lets you
    >know if you want to read this.
    >
    >On 1/20/20 2:32 PM, Leonard Rosenthol wrote:
    >
    >> that "signal" has a long history in credibility research as a
    >"marker" to the consumer to indicate if a source is credible or not.
    >>
    >So that sounds like an alternative term for what I have been calling a
    >“claim” (based on the terminology from the Verifiable Claims (now
    >Credentials) WG at the W3C -
    >https://www.w3.org/2017/vc/WG/).
    > Yes?
    >
    >Does a “signal” use any sort of technology to ensure authenticity (eg.
    >hashes or signatures)?  Or is that out of scope or TBD??
    >
    >I think the term "signal" is used somewhat loosely, a bit like
    >"information".  Here's draft text from Credibility
    >Signals <https://credweb.org/signals/#h.94xsck7qz3ho>:
    >Our basic model is that an entity (human and/or machine) is attempting
    >to make a credibility assessment — to predict whether something will
    >mislead them or others — by carefully examining many different
    >observable features of that thing and things connected with it, as well
    >as information provided by various related or trusted sources.
    >
    >To simplify and unify this complex situation, with its many different
    >roles, we model the situation as a set of observers, each using
    >imperfect instruments to learn about the situation and then recording
    >their observations using simple declarative statements agreed upon in
    >advance. Because those statements are inputs to a credibility
    >assessment process, we call them credibility signals.  (The term
    >credibility indicators is sometimes also used.)
    >That's just draft text, and I don't know that everyone agrees with it.
    >It also intentionally glosses over several levels of meaning that could
    >have more precise terms.  Some of these levels will speak more to
    >different kinds of people than others (#2 uses TCP, #4 uses RDF). For
    >example, consider:
    >
    >1. The age of an internet domain can be related to its credibility. So we
    >can talk about this being a signal, maybe called "age of domain". This
    >kind of information, this "signal", is used informally in practice now,
    >and perhaps in some automated systems. It's a signal that's moderately
    >useful because it stops someone from trivially setting up hundreds of
    >websites, but it's also not all that useful because (a) attackers can
    >buy old domains on the aftermarket or buy them in advance and let them
    >sit before using them, and (b) it could penalize legitimate sources. In
    >this sense, a "signal" is a general concept of a type of information.
    >
    >2. There are some protocols for finding out the age of a domain.  For
    >instance, I can open a TCP connection to port 43 of
    >whois.verisign-grs.com, send "nytimes.com<CRLF>" and get back text that
    >includes the line "   Creation Date: 1994-01-18T05:00:00Z". We might
    >call this "credibility data", "signal instance data", or perhaps "an
    >observation". Loosely, we could call this a signal, leaving implicit
    >that it's a signal about nytimes.com.
    >
    >3. Somewhere between these two, we might define the creation date of a
    >domain as the isodate timestamp of the moment the domain registrar
    >originally recorded as the creation of the domain, when it was most
    >recently created. Or something like that. Trying to be precise. Call
    >this a signal definition, maybe, or a signal specification. If we want
    >interoperability between multiple systems producing and consuming
    >credibility data, I think we need to get to this level. In practice,
    >for age of domain, we'd want this to line up with the available data
    >sources. No point in specifying something we can't have.
    >
    >4. Finally, there is some kind of standardized format for this data. In
    >JSON, maybe it's something like
    >
    >{"@context": "https://www.w3.org/ns/cred",
    >  "domain": "nytimes.com",
    >  "created": "1994-01-18T05:00:00Z"
    >}
    >which might come out in Turtle as
    >
    >[] cred:domain "nytimes.com"; cred:created
    >"1994-01-18T05:00:00Z"^^xsd:dateTimeStamp.
    >
    >Alternatively, one might define the RDF mapping so we get something
    >like:
    >
    ><https://nytimes.com>
    >cred:domainCreated "1994-01-18T05:00:00Z"^^xsd:dateTimeStamp
    >although to me that feels like bad modeling.
    >
    >I lean towards something like:
    >
    >[] cred:domain "nytimes.com"; cred:created
    >"1994-01-18T05:00:00Z"^^xsd:dateTimeStamp;
    >
    >   a cred:Observation, a cred:DomainAgeObservation.
    >
    >So, that last bit brings us back to sense 1.  I'm suggesting the object
    >which connects a domain name and its creation time is a kind of
    >observation, a domain name observation. We can also attach to it who
    >did this observation, and when, and how. Then we can say "credibility
    >signal" is a class of observations relevant to credibility assessments.
    >The observations can then be encoded in data formats like Turtle, or
    >CSV, or whatever.
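
As a rough illustration of senses 2 and 4 above, here is a minimal Python sketch, not a definitive implementation. It assumes the WHOIS server and the "Creation Date:" line format quoted in the message, and it reuses the draft `https://www.w3.org/ns/cred` context from the JSON example. The function names (`query_whois`, `parse_creation_date`, `domain_age_observation`) are hypothetical and not part of any CredWeb specification.

```python
import json
import re
import socket

# Server and port as described in the message; other registries differ.
WHOIS_HOST = "whois.verisign-grs.com"
WHOIS_PORT = 43

# Matches a line such as "   Creation Date: 1994-01-18T05:00:00Z".
CREATION_RE = re.compile(r"^\s*Creation Date:\s*(\S+)\s*$", re.MULTILINE)


def query_whois(domain, host=WHOIS_HOST, port=WHOIS_PORT, timeout=10.0):
    """Open a TCP connection, send '<domain>CRLF', and return the reply text."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_creation_date(whois_text):
    """Return the creation timestamp string, or None if no such line exists."""
    match = CREATION_RE.search(whois_text)
    return match.group(1) if match else None


def domain_age_observation(domain, created):
    """Build a dict shaped like the draft JSON example (assumed vocabulary)."""
    return {
        "@context": "https://www.w3.org/ns/cred",
        "domain": domain,
        "created": created,
    }


if __name__ == "__main__":
    # Offline demonstration on canned text; swap in query_whois("nytimes.com")
    # to perform a live lookup.
    sample = "   Domain Name: NYTIMES.COM\r\n   Creation Date: 1994-01-18T05:00:00Z\r\n"
    created = parse_creation_date(sample)
    print(json.dumps(domain_age_observation("nytimes.com", created), indent=2))
```

Keeping the network call separate from the parsing step means the "observation" can be rebuilt from stored WHOIS text, and details such as who made the observation and when could be added as extra fields.
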
    >
    >I haven't thought through exactly how the observations relate to
    >Verifiable Credentials. VCs go much deeper into a narrower use case.
    >There may also be some impedance mismatch with VC's crypto-centric view
    >of the world. I'm reminded of the Knuth quote, "Beware of bugs in the
    >above code; I have only proved it correct, not tried it." That is,
    >crypto won't help if the institutions behind it are corrupt. CredWeb is
    >aiming at a social notion of trustworthiness. (I'm by no means
    >dismissing crypto, which is incredibly useful at fending off a whole
    >range of threats. There are others it's not so useful against,
    >however.)
    >
    >Hoping this made things more clear, not less,
    >
    >     -- Sandro
    >
    >
    >Leonard
    >
    >From: Greg Mcverry <jgregmcverry@gmail.com>
    >Date: Monday, January 20, 2020 at 2:08 PM
    >To: Leonard Rosenthol <lrosenth@adobe.com>
    >Cc: Sandro Hawke <sandro@w3.org>, Credible Web CG
    ><public-credibility@w3.org>
    >Subject: Re: CredWeb Plans, meeting tomorrow
    >
    >I believe, though class schedules kept me from meetings last semester
    >and I am not a developer, that "signal" has a long history in
    >credibility research as a "marker" to the consumer to indicate if a
    >source is credible or not.
    >
    >So the RDF would be a bit of machine readable data as a vocabulary of
    >traditional human readable and disagreeable signals of credibility.
    >For example you may have a currency "signal" that equates to an RDF
    >vocabulary for publication date.
    >
    >Please others correct any misconceptions I may have.
    >
    >Should be there tomorrow, first day of semester, never know what random
    >meetings pop up.
    >
    >
    >On Mon, Jan 20, 2020 at 12:32 PM Leonard Rosenthol
    ><lrosenth@adobe.com> wrote:
    >I apologize in advance if this is explained elsewhere – but I don’t
    >understand the difference you are making between a “signal” and the
    >“data format” that an API would use (or might be embedded in an asset).
    >
    >I realize that I am coming at this from the side of assets (image,
    >audio, video, documents) as opposed to web pages – but to me they are
    >one and the same.
    >
    >Thanks,
    >Leonard
    >
    >From: Sandro Hawke <sandro@w3.org>
    >Date: Monday, January 20, 2020 at 11:31 AM
    >To: Credible Web CG <public-credibility@w3.org>
    >Subject: CredWeb Plans, meeting tomorrow
    >Resent-From: <public-credibility@w3.org>
    >Resent-Date: Monday, January 20, 2020 at 11:30 AM
    >
    >Hey folks,
    >
    >It's a new year, and we've had some quiet weeks.  I'm trying to settle
    >on some next steps for the group. Here's what I'm thinking:
    >
    >1. Let's not try to update the report right now. Let's just convert it
    >to a "final report", to make it properly archival, with a clear note
    >that it was written in 2018. Maybe a short name like "Credibility Tech
    >2018" <https://credweb.org/report/20181011>.
    >If there's sufficient interest in a revision or new reports that are
    >more focused later, that's fine, but I don't think it's the best use of
    >group time right now.
    >
    >2. Instead of Credibility
    >Signals <https://credweb.org/signals-beta/>
    >trying to include everything about signals while also highlighting the
    >good stuff, let's split it into three different resources:
    >
    >* Credibility APIs, a technical guide for how computers should talk to
    >other computers to exchange credibility data. Includes data formats,
    >protocols, RESTful APIs, browser APIs, etc. Not a spec for any of
    >these, but an overview of options that are specified elsewhere. I'm
    >thinking we can publish a small draft and start to gather input.
    >
    >* A Credibility Data Exchange, a website for exploring all the signal
    >definitions and signal instance data people are willing to make public,
    >with clear attribution back to the sources and no endorsement from us.
    >I've made a few prototypes over the years (like
    >https://data.credweb.org)
    >but none I was happy with, yet. Maybe this should just be my thing, not
    >the group's; that's a topic for discussion. (It might help if someone
    >wanted to fund this.)
    >
    >* Endorsed Credibility Signals.  This would be a relatively small
    >document, describing 5-20 signals where we have consensus within the
    >group that they are pretty good. I'd expect it to change over time with
    >new data. The RDF schema for these signals would be published on
    >w3.org.
    >It would intentionally be kept small enough to be manageable, unlike
    >the Exchange or past "Signals" drafts. I think some of the NewsQ
    >highlight signals <https://credweb.org/signals-beta/#newsq-highlight>
    >are good options here, and there are also some that are doable by hand
    >(like these
    ><https://docs.google.com/document/d/1ADJX57-xMHIIHrnzEycFrn4fUGQ63SD8hyEHqScYnTY/edit>).
    >
    >So, agenda for tomorrow is to talk about this plan, and if there's
    >time, talk about the actual signals we might be ready to endorse.
    >
    >If you can't make it to the meeting and have thoughts on all this,
    >email could be helpful.
    >
    >Meeting, as usual:
    >21 January 2020 1pm ET
    ><https://www.timeanddate.com/worldclock/fixedtime.html?msg=CredWeb&iso=20200121T13&p1=43&ah=1>,
    >https://zoom.us/j/706868147,
    >agenda/record
    ><https://docs.google.com/document/d/1Zegy2ASbsRtkz8vNVYUXHopZjjXbZweJ5Co8TEW_8w0/edit#>
    >
    >     -- Sandro
    >
    >
    >--
    >J. Gregory McVerry, PhD
    >Assistant Professor
    >Southern Connecticut State University
    >twitter: jgmac1106
Received on Tuesday, 21 January 2020 00:04:50 UTC