Re: What <<>> means from Ivan Herman on 2020-12-08 (public-rdf-star@w3.org from December 2020)

From: Ivan Herman <ivan@w3.org>
Date: Tue, 8 Dec 2020 08:52:48 +0100
To: Pat Hayes <phayes@ihmc.us>
Cc: public-rdf-star@w3.org, Manu Sporny <msporny@digitalbazaar.com>, Dave Longley <dlongley@digitalbazaar.com>, Aidan Hogan <aidhog@gmail.com>
Message-Id: <66A2EA73-9B99-4848-A128-B9D1F7F85204@w3.org>
> On 7 Dec 2020, at 18:14, Patrick J Hayes <phayes@ihmc.us> wrote:
> 
> Hi Ivan
> 
> Yes, that sounds better than my nth-in-lexical-order idea (which was a somewhat desperate attempt to find /some/ way to identify one triple in a set of them), provided that (and this may be obvious, but I am not sufficiently familiar with the subtleties of hash-codes) one can easily get back from the hash to the particular instance of that triple in a particular RDF source (which is what I understand is the intended main use case), ie rather than just the SPO pattern itself.

I am not sure I understand exactly what you are asking. If I _only_ have the hash, then I cannot reproduce the SPO triple (or the SPOG quad) from it: the hash function is not invertible. But if I have a graph, the hash of each triple in it is, in practice, unique, so by calculating the hash of an incoming SPO(G) I can identify its number, because the calculation is deterministic (that is obvious in the case of a blank-node-less SPO(G), but relies on the graph canonicalization otherwise). Is this what you were referring to?
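
As a minimal sketch of that lookup, assuming a blank-node-free graph whose terms are already written in N-Triples form, SHA-256 as the hash, and plain lexical sorting as a stand-in for a real canonical ordering (the function names and example.org IRIs are made up for illustration):

```python
import hashlib

def triple_hash(s, p, o):
    """Hash one blank-node-free triple given as N-Triples terms (assumed format)."""
    line = f"{s} {p} {o} .\n"
    return hashlib.sha256(line.encode("utf-8")).hexdigest()

def index_graph(triples):
    """Map each triple's hash to (position, triple) in a sorted copy of the graph.

    Lexical sorting is only an illustrative ordering; a graph with blank
    nodes would first need a canonical relabeling, which is not done here.
    """
    ordered = sorted(set(triples))
    return {triple_hash(*t): (i, t) for i, t in enumerate(ordered, start=1)}

graph = [
    ("<http://example.org/b>", "<http://example.org/name>", '"Bob"'),
    ("<http://example.org/a>", "<http://example.org/knows>", "<http://example.org/b>"),
]
index = index_graph(graph)

# The hash alone cannot be turned back into S, P, O, but with the graph at
# hand an incoming triple is located by recomputing its hash.
incoming = ("<http://example.org/b>", "<http://example.org/name>", '"Bob"')
position, found = index[triple_hash(*incoming)]
```

The dictionary is the "graph in hand": without it the hash string is opaque, with it the hash resolves to one triple and its number.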

(Whether the numeration approach works for the practical use cases: I do not know either. But I agree with you that, instead of trying to push things down the throat of reification, using this for some sort of semantics may be better…)

> I guess from your message that might involve quads…?

The canonicalization algorithm depends on a specific graph, of course. What I was referring to is that it is also possible to canonicalize a dataset, ie, a collection of named graphs or, equivalently, a collection of quads (where even the graph identifier may be a blank node).
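
To illustrate the quad case under the same assumptions as before (terms already in N-Quads form, SHA-256, hypothetical function name and example.org IRIs), including the graph name in the hashed line makes the identifier depend on the graph in which the triple occurs:

```python
import hashlib

def quad_hash(s, p, o, g=None):
    """Hash an SPO triple, or an SPOG quad when a graph name is given.

    A blank-node graph name would need its canonical label first;
    that step is assumed to have happened and is not shown here.
    """
    terms = [s, p, o] + ([g] if g is not None else [])
    line = " ".join(terms) + " .\n"
    return hashlib.sha256(line.encode("utf-8")).hexdigest()

spo = ("<http://example.org/a>", "<http://example.org/p>", '"v"')
in_g1 = quad_hash(*spo, "<http://example.org/g1>")
in_g2 = quad_hash(*spo, "<http://example.org/g2>")
no_graph = quad_hash(*spo)
```

The same SPO pattern yields three different identifiers here, one per graph context.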

Cheers

Ivan


> 
> Pat
> 
>> On Dec 7, 2020, at 2:41 AM, Ivan Herman <ivan@w3.org> wrote:
>> 
>> Hi Pat,
>> 
>> you are of course allowed to come out of 'the shadows of retirement';  let me be allowed to do the same from the shadows of retirement from the Semantic Web Activity:-)
>> 
>> Let us suppose we had a standardized scheme for an RDF canonicalization. As you well know:-) this would involve a canonical, deterministic relabeling of blank nodes. Because it would be done on the RDF data, not its serialization, it would be serialization independent.
>> 
>> *If* we have that, we could 'simply' (details to be filled in) generate the hash of the _canonical_ triple in, say, N-Triples syntax and use that as the identification of the triple. Wouldn't that simplify your approach? (Ie, you would not have to find a way to order the triples in RDF/XML :-) What it would mean is that the triple below would 'simply' be:
>> 
>> "an ugly hash string"^^http://tripleID/ P O .
>> 
>> The added value would also be that if we have an <S P O> without a blank node, its hash becomes unique, ie, not necessarily dataset dependent. Also, a canonicalization algorithm (see below) may be defined for quads, ie, the dependency of an <S P O> on a specific graph could also be taken into account.
>> 
>> WDYT?
>> 
>> Ivan
>> 
>> P.S. Of course, this relies on a standard for RDF canonicalization. For other reasons, this is very much needed in practical RDF applications. There are a few algorithms out there by now (Aidan Hogan has a published algorithm, Dave Longley has a different one which is also deployed in some areas) and we are actually considering reconciling those two and creating a W3C standard for it. Just do not ask me when that would happen...
>> 
>> 
>>> On 6 Dec 2020, at 10:04, Patrick J Hayes <phayes@ihmc.us> wrote:
>>> 
>>> (I had vowed to not get involved in this business, but I cannot resist putting my 2c into the discussion, so here goes…)
>>> 
>>> Some observations.
>>> 
>>> 1. From reading the various contributions, the primary goal seems to be to provide a compact and semantically coherent way to both assert a triple and to say something about it, where ‘it’ here means the instance or token of the triple in a particular document, either the same one that contained the asserted triple or one closely associated with it. And to do all this with minimal damage to the basic RDF model of graphs, triples, etc.
>>> 
>>> 2. Linking this to RDF reification was seen as one way to keep the RDF model intact, but in retrospect that might have been a mistake. For a variety of reasons, mostly historical, RDF reification has little to do with the primary goal. (In fact, I still have no clear idea what RDF reification is for, even after working on two RDF working groups for several years.)
>>> 
>>> 3. Putting aside reification as irrelevant, therefore, and focussing instead on the primary goal of annotating triples, there are basically two ways to do this. Either we somehow provide (invent) a way to give names to triples, so that we can use that name in other triples to make the assertions comprising the annotations; or, we attach the annotations to the triples by ostension, that is by directly attaching them to the triple. Which is a bit like pointing to the triple and saying “this triple” instead of inventing a name for it and using the name. This was, I believe, the original idea of the <<s p o>> P O . notation, which was unfortunately somewhat confused by linking it to RDF reification.
>>> 
>>> 4. The problem with this, however, is that it requires extending RDF syntax to allow a new kind of node. Which naturally suggests that we should seek a way to treat this as a shorthand for a construct using more conventional RDF. Not reification, so what? The ‘outer’ triple would be a RDF triple if its subject were a bnode, URI or literal (well, it would be a generalised RDF triple with a literal subject). Of these, the semantically most sensible would be a literal, because literals, unlike URIs, are considered to have fixed denotations; and we want these triple names to have exactly this quality, to rigidly identify a particular syntactic object in a particular RDF source (https://www.w3.org/TR/rdf11-concepts/#change-over-time).
>>> 
>>> 5. So, let me suggest a literal scheme for identifying triples in documents. The datatype name is ‘http://tripleID/’ and it recognizes strings of the form A+B+C where A is either the URI of the document containing the triple token, or the empty string, indicating the document in which the literal occurs; B is one of a set of predefined strings defining the various RDF surface syntaxes, eg ‘RDF-XML’, ‘TURTLE’, etc.; and C is a numeral which identifies the particular triple following a convention defined for documents of that syntactic type. This requires standardizing a few of these conventions, but this should not be an impossibly large task (though I admit not having any idea how to approach ordering triples in RDF-XML). Thus for example the literal value of ‘+NTRIPLES+17’^^http://tripleID/ is the 17th triple in the linear ordering of the triples in the document in which that literal occurs.
>>> 
>>> 6. With this datatype defined, we can treat 
>>> <<s p o>> P O .
>>> in a N-triples document as shorthand for the pair of triples
>>> s p o .
>>> ‘+NTRIPLES+1’^^http://tripleID/ P O .
>>> which admittedly has a literal subject, but in every other respect is conventional RDF, albeit recognizing a rather unconventional datatype.
>>> 
>>> 7. I suggest that using some such literal scheme (perhaps more elegant than this one) for generating triple-token identifiers is the simplest and semantically least objectionable way to map starred notations back into conventional RDF. It does not change the core of RDF by requiring new kinds of node, or extending the RDF semantics. It keeps all the oddity inside the description of the datatype. It allows metadata triples to annotate triples in other documents, even in a different RDF dialect, but it is more compact when this is not needed. It also can be fairly simply extended, if required, to allow annotations of multiple triples, ie of subgraphs, eg by allowing the strings to be extended by adding more ‘+numeral’ phrases, without changing the basic model.
>>> 
>>> 8. Of the other alternatives for the annotation triple subject, bnodes obviously do not cut it as they don't act as names or identifiers; and IRIs require a convention for naming triples with IRIs, which is so close to the named-graph idea that it hardly seems worth inventing something new to do it, but in any case requires extending the RDF model to include such a naming convention.
>>> 
>>> OK, that's my 2c. If I have just re-invented someone else’s wheel, please forgive my failure of scholarship from the shadows of retirement.
>>> 
>>> Pat Hayes
>>> 
>>> 
>> 
>> 
>> ----
>> Ivan Herman, W3C 
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +33 6 52 46 00 43
>> ORCID ID: https://orcid.org/0000-0003-0782-2704
>> 
> 


----
Ivan Herman, W3C 
Home: http://www.w3.org/People/Ivan/
mobile: +33 6 52 46 00 43
ORCID ID: https://orcid.org/0000-0003-0782-2704
Received on Tuesday, 8 December 2020 07:52:57 UTC