- From: Sandro Hawke <sandro@w3.org>
- Date: Fri, 7 Feb 2020 10:22:12 -0500
- To: Pat Hayes <phayes@ihmc.us>
- Cc: Credible Web CG <public-credibility@w3.org>
- Message-ID: <977d7417-eeff-c5ce-1846-84176ed28aed@w3.org>
[Moving a gdoc thread to email]
On 2/6/20 12:41 PM, Sandro Hawke wrote:
> Today's initial meeting of the Data Access Task Force was not a great
> success, since no one else showed up. (I'm interested in hearing
> (perhaps privately) why this was. Perhaps just too-short notice or
> otherwise poorly announced. Ten people confirmed that time slot in
> general.)
>
> On the upside, I took the time to write up the issue, and maybe this
> is better done in writing anyway.
>
> I'm starting with what basic graph shape to use to represent the n-ary
> relation inherent in the first signal.
>
> See Options for RDF Expression of “Date Website First Archived”
> <https://docs.google.com/document/d/1f7hWNybjYcSFFRT48LcLdmH3KBBoYUeH-lKJypu6zQw/edit#heading=h.cn7f188h2acs>.
> Comments in email or the doc welcome. I think this might just come
> down to taste, but if there are any actual problems with any of the
> options (beyond zero), it would be good to highlight those before
> making a decision.
>
> -- Sandro
>
On 2/6/20 2:26AM Pat Hayes wrote (in a Google Docs comment
<https://docs.google.com/document/d/1f7hWNybjYcSFFRT48LcLdmH3KBBoYUeH-lKJypu6zQw/edit?disco=AAAAGMpCFhg>):
> OK, here's my 2c on this.
:-) <scared look>
> First, it's not clear that this actually is an example of an N-ary
> relation. It /could/ be described that way, but semantically it is
> more like a single triple plus a comment about it (citing evidence for
> it, in effect).
Indeed, I was simplifying a bit. The references I cited (On Nary
Relations <https://www.w3.org/2004/08/12-Yoshio/onNaryRelations.html> and
Defining N-ary Relations on the Semantic Web
<https://www.w3.org/TR/swbp-n-aryRelations/>) both use the term in their
title but then go on to cover many patterns, like ones about evidence,
which are not n-ary relations in this strict sense. Is there a better
term for the broader problem?
> This matters because there will be cases which are genuinely N>2-ary
> relations, and you might want to not get them confused with this case,
> by keeping distinct encodings from day one.
Perhaps. Or it might be simpler to have a general n-ary-ish model that
works across the sphere. Not sure where the "as simple as possible but
no simpler" line falls here.
> Second, several of your alternatives don't make first base. Shape 4
> fails because asserting the reification doesn't actually assert the
> reified triple, it just says it exists. (See RDF semantics.)
Ah, true, I wasn't seriously proposing that one, so I missed that bit.
That's "easy" to fix by adding another arc, "a cred:Assertion". I've
added a note to that effect in the doc, but not modified the example
(yet). One could also make being-asserted a part of the semantics of
cred:evidence, but I wouldn't do that. I put quotes around "easy"
because of my sense that truth predicates are dangerous.
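To make the reification point concrete, here is a minimal sketch in plain Python (triples as tuples, no RDF library) of a Shape-4-style graph. The site IRI, the date, and the cred:/credAbout: terms are hypothetical placeholders, not the exact triples from the doc. The base triple is simply absent from the reified form; a consumer only recovers it by applying an extra rule ("materialize reifications typed cred:Assertion"), and that rule is a convention we would have to define ourselves, not something RDF semantics provides.

```python
# Triples are (subject, predicate, object) tuples of strings.
# All cred:/credAbout: terms and example IRIs are hypothetical placeholders.
RDF_TYPE = "rdf:type"

# Shape 4 style: only the reification and the evidence are asserted.
reified = {
    ("_:r", RDF_TYPE, "rdf:Statement"),
    ("_:r", "rdf:subject", "<https://site.example/>"),
    ("_:r", "rdf:predicate", "credAbout:operationalAsEarlyAs"),
    ("_:r", "rdf:object", "2005-03-01"),
    ("_:r", "cred:evidence", "<https://archive.example/snapshot>"),
}

BASE = ("<https://site.example/>", "credAbout:operationalAsEarlyAs", "2005-03-01")


def materialize_assertions(graph):
    """Hypothetical consumer rule: for every node typed cred:Assertion that is
    also a complete reification, add the reified triple itself."""
    out = set(graph)
    for s, p, o in graph:
        if p == RDF_TYPE and o == "cred:Assertion":
            parts = {p2: o2 for s2, p2, o2 in graph
                     if s2 == s and p2 in ("rdf:subject", "rdf:predicate", "rdf:object")}
            if len(parts) == 3:
                out.add((parts["rdf:subject"], parts["rdf:predicate"], parts["rdf:object"]))
    return out


# Without the extra arc, the base triple is not in the graph at all.
print(BASE in materialize_assertions(reified))   # False

# With "_:r a cred:Assertion" plus the agreed rule, it is recovered.
asserted = reified | {("_:r", RDF_TYPE, "cred:Assertion")}
print(BASE in materialize_assertions(asserted))  # True
```

The rule lives entirely in the consumer, which is exactly why it amounts to a truth predicate we'd have to specify carefully.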
We probably don't need to talk about this option more, unless someone is
actually advocating for it.
> Shape 1 fails because what does _:a denote? What kind of thing is it?
> I can't find any interpretation that makes sense. It seems to be both
> a date and a 'credibility signal', whatever that is; but it is also
> the subject of operationalAsEarlyAs. Can a date be operational at a date?
[ Aside for the audience: Pat and I have been having this kind of debate
for about 19 years now. As a newbie, I used to find the tone and
strength of his arguments unsettling. Actually, I still do. ]
Sorry, it looks like my class naming turned out to be misleading. I was
assuming the context for people seeing this model would be the planned
new signals document. Here's the current title and abstract of that
document, to give that context:
Community-Approved Credibility Signals
Abstract: Credibility signals are observations, made by humans or
machines, which are used in deciding how much to trust some
information. This document specifies some types of these
observations which seem particularly useful in online credibility
assessments, especially when assisted by machine processing and a
network of people and systems making related observations. It also
includes some guidance on how credibility data (that is, data
expressing these observations) can be exchanged online. The choice
of which signals to include was made by the W3C Credible Web
Community Group and is expected to be revised periodically in light
of new information.
With that in mind, perhaps it's more clear that _:a denotes an
observation, a "credibility signal". Perhaps a signal isn't exactly "an
observation", but is more a record of the observation, or it's the
information obtained by making an observation, but I doubt that level of
semantic detail will be beneficial.
In this example, _:a represents an observation of the date a website is
first archived. So perhaps that class name should be
"cred:ObservationOfDateWebsiteFirstArchived", but in the context of
hundreds of other such classes, all starting "ObservationOf..." I
thought it best to drop that part. But maybe that's bad, since some
people (like you in this case) will be coming in without that context
and somehow thinking that when we call it a date it's a date. :-)
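Read that way, _:a is an observation node, and the site, the date, and the evidence all hang off it. Here is a minimal Python sketch of that reading; the property names (cred:website, cred:date, cred:evidence) and the data are hypothetical, not the doc's actual Shape 1 triples. Because each observation is its own node, two observers can report different dates for the same site without conflict, and a consumer can still ask for the earliest one.

```python
from collections import defaultdict

# Hypothetical observation-node reading of Shape 1: _:a and _:x are
# observations ("credibility signals"), not dates. Names are placeholders.
CLS = "cred:DateWebsiteFirstArchived"
graph = [
    ("_:a", "rdf:type", CLS),
    ("_:a", "cred:website", "<https://site.example/>"),
    ("_:a", "cred:date", "2005-03-01"),
    ("_:a", "cred:evidence", "<https://archive.example/snapshot>"),
    ("_:x", "rdf:type", CLS),
    ("_:x", "cred:website", "<https://site.example/>"),
    ("_:x", "cred:date", "2006-07-15"),
]


def observations(graph, cls):
    """One dict of properties per node typed `cls`.
    Simplification: keeps only the last value of each predicate."""
    props = defaultdict(dict)
    for s, p, o in graph:
        props[s][p] = o
    return [d for d in props.values() if d.get("rdf:type") == cls]


def earliest_archive_date(graph, site):
    """Earliest date reported by any observation about `site`, or None."""
    dates = [obs["cred:date"] for obs in observations(graph, CLS)
             if obs.get("cred:website") == site and "cred:date" in obs]
    return min(dates) if dates else None  # ISO 8601 dates sort lexically
```

For example, `earliest_archive_date(graph, "<https://site.example/>")` returns `"2005-03-01"`.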
> Shape 2 kind of makes sense except for the relation
> credAbout:operationalAsEarlyAs which is misnamed, since (as you point
> out) the _:b node can have other information attached to it so is not
> particularly connected to anything to do with being as early as. What
> you need is something like credAbout:avatar since _:b is playing the
> role of a surrogate for the main subject here.
Yeah, I was just trying to follow the naming conventions of WikiData.
They view it as annotations on an arc, where they split the arc with
this blank node in the middle, and have the inbound and outbound parts
retain the name of the original arc, just in different namespaces. At
least, that's what I recall. I'm having trouble finding a simple and
clear explanation of their approach. One reference is Reifying RDF: What
Works Well With Wikidata?
<http://aidanhogan.com/docs/reification-wikidata-rdf-sparql.pdf> but
hopefully someone knows something more recent and/or simpler.
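For comparison, here is a minimal Python sketch of the split-arc pattern. The prefixes mirror Wikidata's p: (subject to statement node), ps: (statement node to value), and pq: (qualifier) convention; the property name and the data are hypothetical. The original arc `<site> operationalAsEarlyAs "date"` becomes `<site> p:X _:b . _:b ps:X "date"`, with the evidence attached to _:b as a qualifier.

```python
# Hypothetical split arc, loosely following Wikidata's p:/ps:/pq: prefixes.
PROP = "operationalAsEarlyAs"
SITE = "<https://site.example/>"

graph = {
    (SITE, "p:" + PROP, "_:b"),
    ("_:b", "ps:" + PROP, "2005-03-01"),
    ("_:b", "pq:evidence", "<https://archive.example/snapshot>"),
}


def statements(graph, subject, prop):
    """Return (value, qualifiers) for each statement node reached from
    `subject` via p:prop."""
    results = []
    for s, p, node in graph:
        if s == subject and p == "p:" + prop:
            value, qualifiers = None, {}
            for s2, p2, o2 in graph:
                if s2 != node:
                    continue
                if p2 == "ps:" + prop:
                    value = o2
                elif p2.startswith("pq:"):
                    qualifiers[p2[len("pq:"):]] = o2
            results.append((value, qualifiers))
    return results


def collapse(graph, prop):
    """Rebuild the plain triple (subject, prop, value) from each split arc,
    dropping the qualifiers."""
    return {(s, prop, v)
            for s, p, _ in graph if p == "p:" + prop
            for v, _q in statements(graph, s, prop)}
```

The traversal depends only on the inbound arc's name, so renaming it to something like credAbout:avatar, as Pat suggests, would change one string rather than the shape.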
> But the one that works best is shape #3. That captures the intended
> meaning perfectly: it asserts the main statement directly, and adds a
> comment ABOUT it. This correctly keeps data and metadata separated
> without either of them distorting the other or requiring some kind of
> hard-to-remember artifactual encoding. So it has my enthusiastic vote.
It was also my favorite, until I started actually writing software to
use this stuff. In practice, I find that I want to use named graphs for
multiple things, but using them for more than one thing at once is a
problem. Like, alice.example and bob.example are each providing data
streams with these website-date observations. I want to use the graph
name to keep track of what came from Alice and what came from Bob, but
that somewhat conflicts with using it inside the data. How do I record
that a quad came from Alice? I think there are some techniques, but they
seem to get fairly complicated, and they do this in the neighborhood of
a security layer (since I might trust triples from one source more than
from another), which increases some risks. After a while, I found my
preference shifting away from this.
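Here is a minimal Python sketch of one way around the Alice/Bob conflict, assuming Shape 3 puts the main statement in a named graph and the evidence about that graph in the default graph (the terms and data are hypothetical). It steps outside RDF: the consumer keeps a fifth "source" column taken from the transport it fetched over, never from the data, and it renames blank graph labels per source so that both feeds using "_:c" cannot collide when merged. It is a simplification, and it hints at why these techniques get complicated.

```python
# Quads are (subject, predicate, object, graph); graph None = default graph.
# Hypothetical Shape 3 feeds; both happen to label their graph _:c.
alice_feed = [
    ("<https://site.example/>", "credAbout:operationalAsEarlyAs", "2005-03-01", "_:c"),
    ("_:c", "cred:evidence", "<https://archive.example/a>", None),
]
bob_feed = [
    ("<https://other.example/>", "credAbout:operationalAsEarlyAs", "2010-01-01", "_:c"),
    ("_:c", "cred:evidence", "<https://archive.example/b>", None),
]


def scope(term, source):
    """Rename blank nodes per source so labels cannot collide across feeds."""
    return f"_:{source}.{term[2:]}" if term and term.startswith("_:") else term


def ingest(store, source, quads):
    """Append 5-tuples to `store`. `source` comes from the connection the
    consumer used, not from any claim inside the data."""
    for s, p, o, g in quads:
        store.append((scope(s, source), p, scope(o, source), scope(g, source), source))


def evidence_for(store, site):
    """(evidence, source) pairs for graphs containing a statement about `site`."""
    graphs = {(g, src) for s, p, o, g, src in store if s == site and g is not None}
    return sorted((o, src) for s, p, o, g, src in store
                  if p == "cred:evidence" and g is None and (s, src) in graphs)


store = []
ingest(store, "alice", alice_feed)
ingest(store, "bob", bob_feed)
```

Here `evidence_for(store, "<https://site.example/>")` yields only Alice's evidence. A naive merge of the two feeds would have attached Bob's evidence to the same "_:c" graph as well.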
There's also an issue which I'm surprised doesn't bother you: the
semantics of RDF datasets. There are two aspects here: (1) the statement
in the named graph isn't exactly asserted by the dataset; and (2) the
graph name (_:c) does not actually denote the graph, it is merely paired
with it. These are both issues that you and I talked about in the 2011
RDF 1.1 Working Group, but as I recall were never settled. I think the
relevant docs are RDF 1.1 Concepts and Abstract Syntax
<https://www.w3.org/TR/rdf11-concepts/> and RDF 1.1: On Semantics of RDF
Datasets <https://www.w3.org/TR/2014/NOTE-rdf11-datasets-20140225/>.
That said, I don't think those semantic issues are necessarily
real-world problems. I think in practice it's entirely possible to
publish and interact with datasets with whatever semantics we actually
want, with all the potential problems avoided by flagging the dataset in
some metadata as having our chosen semantics. Maybe.
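As a rough illustration of the flagging idea, here is a minimal Python sketch that addresses only issue (1). By default the consumer takes the cautious reading in which only the default graph is asserted. If the default graph carries a flag triple, named-graph contents count as asserted too. The property cred:namedGraphsAsserted is a hypothetical placeholder, not an existing term, and the Shape 3 data is invented for the example.

```python
# Quads are (subject, predicate, object, graph); graph None = default graph.
# "cred:namedGraphsAsserted" is a hypothetical flag term; "<>" stands for
# "this document".
FLAG = ("<>", "cred:namedGraphsAsserted", "true")

shape3 = [
    ("<https://site.example/>", "credAbout:operationalAsEarlyAs", "2005-03-01", "_:c"),
    ("_:c", "cred:evidence", "<https://archive.example/snapshot>", None),
]
BASE = ("<https://site.example/>", "credAbout:operationalAsEarlyAs", "2005-03-01")


def asserted_triples(quads):
    """Triples the consumer treats as asserted: the default graph only,
    unless the default graph contains FLAG, in which case named-graph
    contents are included as well."""
    default = {(s, p, o) for s, p, o, g in quads if g is None}
    if FLAG in default:
        return default | {(s, p, o) for s, p, o, g in quads if g is not None}
    return default


flagged = shape3 + [FLAG + (None,)]
```

Without the flag, `BASE` is not among `asserted_triples(shape3)`; with it, `BASE` is in `asserted_triples(flagged)`. Issue (2), whether _:c denotes the graph, would need a similar declared convention.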
> <End of 2c rant>
Thank you so much for your thoughts on this. I'm curious if I've missed
any of your points, or if I've swayed you at all.
-- Sandro
Received on Friday, 7 February 2020 15:22:17 UTC