Re: What <<>> means from Ivan Herman on 2020-12-07 (public-rdf-star@w3.org from December 2020)

From: Ivan Herman <ivan@w3.org>
Date: Mon, 7 Dec 2020 09:41:24 +0100
To: Pat Hayes <phayes@ihmc.us>
Cc: public-rdf-star@w3.org, Manu Sporny <msporny@digitalbazaar.com>, Dave Longley <dlongley@digitalbazaar.com>, Aidan Hogan <aidhog@gmail.com>
Message-Id: <00430363-D7F8-4AAD-91F4-1FBC2B159E54@w3.org>
Hi Pat,

you are of course allowed to come out of 'the shadows of retirement';  let me be allowed to do the same from the shadows of retirement from the Semantic Web Activity:-)

Let us suppose we had a standardized scheme for RDF canonicalization. As you well know:-) this would involve a canonical, deterministic relabeling of blank nodes. Because it would be done on the RDF data, not its serialization, it would be serialization independent.

*If* we have that, we could 'simply' (details to be filled in) generate the hash of the _canonical_ triple in, say, N-Triples syntax and use that as the identification of the triple. Wouldn't that simplify your approach? (I.e., you would not have to find a way to order the triples in RDF/XML :-) What it would mean is that the triple below would 'simply' be:

"an ugly hash string"^^http://tripleID/ P O .

The added value would also be that if we have a <S P O> without a blank node, its hash becomes unique, i.e., not necessarily dataset dependent. Also, a canonicalization algorithm (see below) may also be defined for quads, i.e., the dependency of an <S P O> on a specific graph could also be taken into account.
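
To make the idea a bit more concrete, here is a rough sketch, not a proposal for the actual algorithm. It assumes the triple has no blank nodes (or that they have already been relabeled by some canonicalization algorithm) and that it is already written as a single canonical N-Triples line. The choice of SHA-256, the function name triple_id, and the example.org IRIs are assumptions made only for this illustration; the http://tripleID/ datatype is the one from Pat's mail.

```python
import hashlib

TRIPLE_ID_DATATYPE = "http://tripleID/"  # datatype IRI taken from Pat's mail


def triple_id(canonical_ntriple: str) -> str:
    """Return a literal naming a triple by the hash of its canonical form.

    Assumption: the input is already a canonical N-Triples line. This
    function does NOT canonicalize; it only strips surrounding whitespace.
    SHA-256 is an arbitrary choice for the sketch.
    """
    line = canonical_ntriple.strip()
    digest = hashlib.sha256(line.encode("utf-8")).hexdigest()
    return f'"{digest}"^^<{TRIPLE_ID_DATATYPE}>'


# Hypothetical triple without blank nodes.
s_p_o = "<http://example.org/s> <http://example.org/p> <http://example.org/o> ."

# The annotation triple, with the hash literal in subject position
# (a generalised RDF triple, as in Pat's scheme).
annotation = f"{triple_id(s_p_o)} <http://example.org/P> <http://example.org/O> ."
```

Since the hash depends only on the canonical form of the triple, the same <S P O> gets the same identifier in any document and in any serialization.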

WDYT?

Ivan

P.S. Of course, this relies on a standard for RDF canonicalization. For other reasons, this is very much needed in practical RDF applications. There are a few algorithms out there by now (Aidan Hogan has a published algorithm, Dave Longley has a different one which is also deployed in some areas) and we are actually considering reconciling those two and creating a W3C standard for it. Just do not ask me when that would happen...


> On 6 Dec 2020, at 10:04, Patrick J Hayes <phayes@ihmc.us> wrote:
> 
> (I had vowed to not get involved in this business, but I cannot resist putting my 2c into the discussion, so here goes…)
> 
> Some observations.
> 
> 1. From reading the various contributions, the primary goal seems to be to provide a compact and semantically coherent way to both assert a triple and to say something about it, where ‘it’ here means the instance or token of the triple in a particular document, either the same one that contained the asserted triple or one closely associated with it. And to do all this with minimal damage to the basic RDF model of graphs, triples, etc.
> 
> 2. Linking this to RDF reification was seen as one way to keep the RDF model intact, but in retrospect that might have been a mistake. For a variety of reasons, mostly historical, RDF reification has little to do with the primary goal. (In fact, I still have no clear idea what RDF reification is for, even after working on two RDF working groups for several years.)
> 
> 3. Putting aside reification as irrelevant, therefore, and focussing instead on the primary goal of annotating triples, there are basically two ways to do this. Either we somehow provide (invent) a way to give names to triples, so that we can use that name in other triples to make the assertions comprising the annotations; or, we attach the annotations to the triples by ostension, that is by directly attaching them to the triple. Which is a bit like pointing to the triple and saying “this triple” instead of inventing a name for it and using the name. This was, I believe, the original idea of the <<s p o>> P O . notation, which was unfortunately somewhat confused by linking it to RDF reification.
> 
> 4. The problem with this, however, is that it requires extending RDF syntax to allow a new kind of node. Which naturally suggests that we should seek a way to treat this as a shorthand for a construct using more conventional RDF. Not reification, so what? The ‘outer’ triple would be an RDF triple if its subject were a bnode, URI or literal (well, it would be a generalised RDF triple with a literal subject). Of these, the semantically most sensible would be a literal, because literals, unlike URIs, are considered to have fixed denotations; and we want these triple names to have exactly this quality, to rigidly identify a particular syntactic object in a particular RDF source (https://www.w3.org/TR/rdf11-concepts/#change-over-time).
> 
> 5. So, let me suggest a literal scheme for identifying triples in documents. The datatype name is ‘http://tripleID/’ and it recognizes strings of the form A+B+C where A is either the URI of the document containing the triple token, or the empty string, indicating the document in which the literal occurs; B is one of a set of predefined strings defining the various RDF surface syntaxes, e.g. ‘RDF-XML’, ‘TURTLE’, etc.; and C is a numeral which identifies the particular triple following a convention defined for documents of that syntactic type. This requires standardizing a few of these conventions, but this should not be an impossibly large task (though I admit not having any idea how to approach ordering triples in RDF-XML.) Thus for example the literal value of ‘+NTRIPLES+17’^^http://tripleID/ is the 17th triple in the linear ordering of the triples in the document in which that literal occurs.
> 
> 6. With this datatype defined, we can treat 
> <<s p o>> P O .
> in a N-triples document as shorthand for the pair of triples
> s p o .
> ‘+NTRIPLES+1’^^http://tripleID/ P O .
> which admittedly has a literal subject, but in every other respect is conventional RDF, albeit recognizing a rather unconventional datatype.
> 
> 7. I suggest that using some such literal scheme (perhaps more elegant than this one) for generating triple-token identifiers is the simplest and semantically least objectionable way to map starred notations back into conventional RDF. It does not change the core of RDF by requiring new kinds of node, or extending the RDF semantics. It keeps all the oddity inside the description of the datatype. It allows metadata triples to annotate triples in other documents, even in a different RDF dialect, but it is more compact when this is not needed. It also can be fairly simply extended, if required, to allow annotations of multiple triples, i.e. of subgraphs, e.g. by allowing the strings to be extended by adding more ‘+numeral’ phrases, without changing the basic model.
> 
> 8. Of the other alternatives for the annotation triple subject, bnodes obviously do not cut it as they don't act as names or identifiers; and IRIs require a convention for naming triples with IRIs, which is so close to the named-graph idea that it hardly seems worth inventing something new to do it, but in any case requires extending the RDF model to include such a naming convention.
> 
> OK, that's my 2c. If I have just re-invented someone else’s wheel, please forgive my failure of scholarship from the shadows of retirement.
> 
> Pat Hayes
> 
> 
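
For comparison, the positional scheme in points 5 and 6 of the quoted mail could be sketched roughly as below. This is only an illustration under stated assumptions: the names TripleRef, parse_triple_ref and resolve_in_ntriples are hypothetical, the example.org IRIs are made up, and the "linear ordering" for N-Triples is assumed to be the order of the non-blank, non-comment lines, counted from 1. The lexical form is split from the right because a document URI may itself contain a '+'.

```python
from typing import NamedTuple, Optional


class TripleRef(NamedTuple):
    document: Optional[str]  # None: the document in which the literal occurs
    syntax: str              # e.g. "NTRIPLES", "TURTLE", "RDF-XML"
    index: int               # 1-based position of the triple


def parse_triple_ref(lexical: str) -> TripleRef:
    """Parse the A+B+C lexical form of a tripleID literal."""
    # Split from the right: only the last two '+' are separators.
    parts = lexical.rsplit("+", 2)
    if len(parts) != 3:
        raise ValueError(f"not of the form A+B+C: {lexical!r}")
    document, syntax, numeral = parts
    if not (numeral.isascii() and numeral.isdigit()) or int(numeral) < 1:
        raise ValueError(f"C must be a positive numeral: {numeral!r}")
    return TripleRef(document or None, syntax, int(numeral))


def resolve_in_ntriples(ref: TripleRef, ntriples_text: str) -> str:
    """Return the triple line that ref points to in an N-Triples document.

    Assumption: one triple per line; blank and comment lines are not counted.
    """
    if ref.syntax != "NTRIPLES":
        raise ValueError(f"unsupported syntax in this sketch: {ref.syntax}")
    triples = [
        line.strip()
        for line in ntriples_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    if ref.index > len(triples):
        raise IndexError(f"document has only {len(triples)} triples")
    return triples[ref.index - 1]


# The pair of triples from point 6, with hypothetical IRIs.
doc = (
    "<http://example.org/s> <http://example.org/p> <http://example.org/o> .\n"
    '"+NTRIPLES+1"^^<http://tripleID/> <http://example.org/P> <http://example.org/O> .\n'
)
annotated = resolve_in_ntriples(parse_triple_ref("+NTRIPLES+1"), doc)
```

Here annotated is the first line of doc, i.e. the asserted triple s p o. Note that the identifier changes if the triples in the document are reordered, which is the dependency the hash-based variant would avoid.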


----
Ivan Herman, W3C 
Home: http://www.w3.org/People/Ivan/
mobile: +33 6 52 46 00 43
ORCID ID: https://orcid.org/0000-0003-0782-2704
Received on Monday, 7 December 2020 08:41:31 UTC