- From: Sandro Hawke <sandro@w3.org>
- Date: Fri, 7 Feb 2020 10:22:12 -0500
- To: Pat Hayes <phayes@ihmc.us>
- Cc: Credible Web CG <public-credibility@w3.org>
- Message-ID: <977d7417-eeff-c5ce-1846-84176ed28aed@w3.org>
[Moving a gdoc thread to email]

On 2/6/20 12:41 PM, Sandro Hawke wrote:

> Today's initial meeting of the Data Access Task Force was not a great
> success, since no one else showed up. (I'm interested in hearing
> (perhaps privately) why this was. Perhaps just too-short notice or
> otherwise poorly announced. Ten people confirmed that time slot in
> general.)
>
> On the upside, I took the time to write up the issue, and maybe this
> is better done in writing anyway.
>
> I'm starting with what basic graph shape to use to represent the n-ary
> relation inherent in the first signal.
>
> See Options for RDF Expression of “Date Website First Archived”
> <https://docs.google.com/document/d/1f7hWNybjYcSFFRT48LcLdmH3KBBoYUeH-lKJypu6zQw/edit#heading=h.cn7f188h2acs>.
> Comments in email or the doc welcome. I think this might just come
> down to taste, but if there are any actual problems with any of the
> options (beyond zero), it would be good to highlight those before
> making a decision.
>
> -- Sandro

On 2/6/20 2:26AM Pat Hayes wrote (in a google docs comment <https://docs.google.com/document/d/1f7hWNybjYcSFFRT48LcLdmH3KBBoYUeH-lKJypu6zQw/edit?disco=AAAAGMpCFhg>):

> OK, here's my 2c on this. :-)

<scared look>

> First, it's not clear that this actually is an example of an N-ary
> relation. It /could/ be described that way, but semantically it is
> more like a single triple plus a comment about it (citing evidence for
> it, in effect).

Indeed, I was simplifying a bit. The references I cited (On Nary Relations <https://www.w3.org/2004/08/12-Yoshio/onNaryRelations.html> and Defining N-ary Relations on the Semantic Web <https://www.w3.org/TR/swbp-n-aryRelations/>) both use the term in their title but then go on to cover many patterns, like ones about evidence, which are not n-ary relations in this strict sense. Is there a better term for the broader problem?

> This matters because there will be cases which are genuinely N>2-ary
> relations, and you might want to not get them confused with this case,
> by keeping distinct encodings from day one.

Perhaps. Or it might be simpler to have a general n-ary-ish model that works across the sphere. Not sure where the "as simple as possible but no simpler" line falls here.

> Second, several of your alternatives don't make first base. Shape 4
> fails because asserting the reification doesn't actually assert the
> reified triple, it just says it exists. (See RDF semantics.)

Ah, true, I wasn't seriously proposing that one, so I missed that bit. That's "easy" to fix by adding another arc, "a cred:Assertion". I've added a note to that effect in the doc, but not modified the example (yet). One could also make being-asserted a part of the semantics of cred:evidence, but I wouldn't do that. I put quotes around "easy" because of my sense that truth predicates are dangerous. We probably don't need to talk about this option more, unless someone is actually advocating for it.

> Shape 1 fails because what does _:a denote? What kind of thing is it?
> I can't find any interpretation that makes sense. It seems to be both
> a date and a 'credibility signal', whatever that is; but it is also
> the subject of operationalAsEarlyAs. Can a date be operational at a date?

[ Aside for the audience: Pat and I have been having this kind of debate for about 19 years now. As a newbie, I used to find the tone and strength of his arguments unsettling. Actually, I still do. ]

Sorry, it looks like my class naming turned out to be misleading. I was assuming the context for people seeing this model would be the planned new signals document. Here's the current title and abstract of that document, to give that context:

Community-Approved Credibility Signals

Abstract: Credibility signals are observations, made by humans or machines, which are used in deciding how much to trust some information.
This document specifies some types of these observations which seem particularly useful in online credibility assessments, especially when assisted by machine processing and a network of people and systems making related observations. It also includes some guidance on how credibility data (that is, data expressing these observations) can be exchanged online. The choice of which signals to include was made by the W3C Credible Web Community Group and is expected to be revised periodically in light of new information.

With that in mind, perhaps it's more clear that _:a denotes an observation, a "credibility signal". Perhaps a signal isn't exactly "an observation", but is more a record of the observation, or it's the information obtained by making an observation, but I doubt that level of semantic detail will be beneficial.

In this example, _:a represents an observation of the date a website is first archived. So perhaps that class name should be "cred:ObservationOfDateWebsiteFirstArchived", but in the context of hundreds of other such classes, all starting "ObservationOf..." I thought it best to drop that part. But maybe that's bad, since some people (like you in this case) will be coming in without that context and somehow thinking that when we call it a date it's a date. :-)

> Shape 2 kind of makes sense except for the relation
> credAbout:operationalAsEarlyAs which is misnamed, since (as you point
> out) the _:b node can have other information attached to it so is not
> particularly connected to anything to do with being as early as. What
> you need is something like credAbout:avatar since _:b is playing the
> role of a surrogate for the main subject here.

Yeah, I was just trying to follow the naming conventions of Wikidata. They view it as annotations on an arc, where they split the arc with this blank node in the middle, and have the inbound and outbound parts retain the name of the original arc, just in different namespaces. At least, that's what I recall.
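
A minimal sketch of the arc-splitting pattern described above, in Python with triples as plain tuples. It is an illustration under assumptions, not the encoding from the doc: the direction of the `credAbout:` arc, the `cred:` namespace on the outbound half, the `site:example` subject, the date value, and the evidence URL are all hypothetical. Only the names `credAbout:operationalAsEarlyAs`, `cred:evidence`, and `_:b` come from the thread.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


def split_arc(triple, node, annotations,
              inbound_ns="credAbout:", outbound_ns="cred:"):
    """Split one arc into subject -> node -> object.

    Both halves keep the local name of the original predicate but use
    different namespaces (assumed here, not taken from the doc). Extra
    annotations are attached to the middle node.
    """
    local_name = triple.p.split(":", 1)[1]
    result = [
        Triple(triple.s, inbound_ns + local_name, node),
        Triple(node, outbound_ns + local_name, triple.o),
    ]
    result += [Triple(node, p, o) for p, o in annotations.items()]
    return result


# Hypothetical direct statement and one annotation on it.
direct = Triple("site:example", "cred:operationalAsEarlyAs", '"2005-01-01"')
for t in split_arc(direct, "_:b",
                   {"cred:evidence": "<https://archive.example/snapshot>"}):
    print(t.s, t.p, t.o, ".")
```

The middle node `_:b` ends up carrying both the value and any annotations, which is the property Pat's naming objection is about: the inbound arc's name says nothing about what else hangs off that node.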

I'm having trouble finding a simple and clear explanation of their approach. One reference is Reifying RDF: What Works Well With Wikidata? <http://aidanhogan.com/docs/reification-wikidata-rdf-sparql.pdf> but hopefully someone knows something more recent and/or simpler.

> But the one that works best is shape# 3. That captures the intended
> meaning perfectly: it asserts the main statement directly, and adds a
> comment ABOUT it. This correctly keeps data and metadata separated
> without either of them distorting the other or requiring some kind of
> hard-to-remember artifactual encoding. So it has my enthusiastic vote.

It was also my favorite, until I started actually writing software to use this stuff. In practice, I find that I want to use named graphs for multiple things, but using them for more than one thing at once is a problem. Like, alice.example and bob.example are each providing data streams with these website-date observations. I want to use the graph name to keep track of what came from Alice and what came from Bob, but that somewhat conflicts with using it inside the data. How do I record that a quad came from Alice? I think there are some techniques, but they seem to get fairly complicated, and they do this in the neighborhood of a security layer (since I might trust triples from one source more than from another), which increases some risks. After a while, I found my preference shifting away from this.

There's also an issue which I'm surprised doesn't bother you: the semantics of RDF datasets. There are two aspects here: (1) the statement in the named graph isn't exactly asserted by the dataset; and (2) the graph name (_:c) does not actually denote the graph, it is merely paired with it. These are both issues that you and I talked about in the 2011 RDF 1.1 Working Group, but as I recall were never settled.
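
The earlier conflict between using the graph name for provenance and using it inside the data can be sketched in Python with quads as plain tuples. This is a toy model under assumptions: the subject, date, evidence URL, and the empty-string stand-in for the default graph are hypothetical, and the side table at the end is only one possible technique, not one the thread settles on.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: str  # graph name: the single slot both uses compete for


DEFAULT = ""  # stand-in for the default graph (an assumption of this sketch)

# Shape-3 style data from Alice: the statement sits in graph _:c,
# and a comment about _:c sits in the default graph.
alice = [
    Quad("site:example", "cred:operationalAsEarlyAs", '"2005-01-01"', "_:c"),
    Quad("_:c", "cred:evidence", "<https://archive.example/snapshot>", DEFAULT),
]

# Naive provenance: overwrite the graph slot with the source.
naive = [q._replace(g="alice.example") for q in alice]

# The evidence quad still talks about _:c, but no graph is named _:c any more,
# so the link between the comment and the statement is lost.
annotated = {q.s for q in naive if q.p == "cred:evidence"}
graph_names = {q.g for q in naive}
dangling = annotated - graph_names

# One alternative: leave the quads untouched and keep provenance in a
# side table keyed by the whole quad.
provenance = {}
for q in alice:
    provenance.setdefault(q, set()).add("alice.example")
```

Here `dangling` contains `_:c`, which is the sense in which the two uses of the graph name collide. The side table avoids that, at the cost of a second structure that sits next to whatever decides how much each source is trusted.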

I think the relevant docs are RDF 1.1 Concepts and Abstract Syntax <https://www.w3.org/TR/rdf11-concepts/> and RDF 1.1: On Semantics of RDF Datasets <https://www.w3.org/TR/2014/NOTE-rdf11-datasets-20140225/>.

That said, I don't think those semantic issues are necessarily real-world problems. I think in practice it's entirely possible to publish and interact with datasets with whatever semantics we actually want, with all the potential problems avoided by flagging the dataset in some metadata as having our chosen semantics. Maybe.

> <End of 2c rant>

Thank you so much for your thoughts on this. I'm curious if I've missed any of your points, or if I've swayed you at all.

-- Sandro

Received on Friday, 7 February 2020 15:22:17 UTC