Re: Blank Node Identifiers and RDF Dataset Normalization from Pat Hayes on 2013-02-25 (public-linked-json@w3.org from February 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 25 Feb 2013 17:27:20 -0600
To: Steve Harris <steve.harris@garlik.com>
Cc: Markus Lanthaler <markus.lanthaler@gmx.net>, "'William Waites'" <ww@styx.org>, <msporny@digitalbazaar.com>, <public-rdf-wg@w3.org>, <public-linked-json@w3.org>
Message-Id: <D2837560-34C2-438B-B399-4CEADB035A5F@ihmc.us>
On Feb 25, 2013, at 9:45 AM, Steve Harris wrote:

> On 2013-02-25, at 13:00, Markus Lanthaler <markus.lanthaler@gmx.net> wrote:
> 
>>> For example:
>>> 
>>> SELECT * WHERE {
>>>  ?g dc:date ?d .
>>>  GRAPH ?g { ?x a foaf:Person }
>>> }
>> 
>> Given that it has been decided that graph labels do *not* denote the graph,
> 
> I believe it would be more correct to say that graph labels do not HAVE to denote the graph; they're allowed to if you want them to.

True, but we have no way to convey such a "want to" in RDF syntax. So whatever it is that the writer wanted, the reader has no way to know that. If the Web were telepathic, we would not need information transmission standards at all, as you could mind-project your desired meaning of all your byte streams. In the real world, however, we usually have to rely on specifications to give us a clue as to how to interpret the things we read. According to our current specifications, when you read some RDF in a dataset which uses an IRI which is also used as a graph label, you have no way to know whether or not the first use of the IRI is supposed to be related in meaning to the second use. 

> Regardless, the example is valid whatever graph labelling semantics are being used - within some system with a known relationship between graph labels and metadata.

But the entire point of RDF, why it was invented in the first place, was to allow information to be conveyed across the Web and used at the point of reading, without having to know any conventions in use at its point of creation. If we have an RDF convention that depends on the RDF being used "within some system", then we are misusing RDF. We have created a design that cannot be used when RDF is serving its primary purpose, and in so doing, have destroyed any possibility of having a coherent semantics for the basic SPARQL construct. This is an epic failure, especially when we were chartered to provide a semantics for datasets. 

> 
> If the graph label refers to the document which was parsed, and the metadata refers to the parsing (which is a very common situation), then the example is equally valid.

I have no problem with that. But what if it refers to a person or a time, and not to the graph/graph-source/g-box/document at all? 
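
A minimal sketch of that question, in Python with plain tuples rather than a real SPARQL engine. Every IRI, date, and the run_query helper below is hypothetical and chosen only for illustration; the helper hand-codes the pattern "?g dc:date ?d . GRAPH ?g { ?x a foaf:Person }" from the quoted query. It suggests that the join succeeds on IRI equality alone, so a date attached to a person comes back looking exactly like a date attached to a parsed document.

```python
# Hypothetical data: none of these IRIs or dates come from the thread.
DC_DATE = "dc:date"
RDF_TYPE = "rdf:type"
FOAF_PERSON = "foaf:Person"

default_graph = [
    (":doc1", DC_DATE, "2013-02-25"),   # writer's intent: when the document was parsed
    (":carol", DC_DATE, "1970-01-01"),  # writer's intent: a date about a person
]
named_graphs = {
    ":doc1": [(":x1", RDF_TYPE, FOAF_PERSON)],   # label meant to denote the document
    ":carol": [(":x2", RDF_TYPE, FOAF_PERSON)],  # label is an IRI denoting a person
}

def run_query(default_graph, named_graphs):
    """Hand-coded match for: ?g dc:date ?d . GRAPH ?g { ?x a foaf:Person }"""
    results = []
    for g, p, d in default_graph:
        # The only link between the two uses of ?g is that the IRIs are equal.
        if p != DC_DATE or g not in named_graphs:
            continue
        for x, p2, o in named_graphs[g]:
            if p2 == RDF_TYPE and o == FOAF_PERSON:
                results.append({"g": g, "d": d, "x": x})
    return results

rows = run_query(default_graph, named_graphs)
for row in rows:
    print(row)
```

Both rows have the same shape, and nothing in the data tells the reader which date describes a graph source and which describes a person.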

> I think you may be attaching too much importance to the idea of denoting.

Denoting is simply a synonym for "naming" or "referring to". It's not an exotic idea. If you are using names (IRIs) in RDF, you are using them to denote. 

> 
>> I find such examples especially confusing. You use the same variable (?g) in
>> the subject position and as a graph label, knowing that they do not refer to
>> the same thing. Semantically, the two have nothing in common at all. ?g could
>> denote a person, a document, an event, whatever. The graph ?g is a
>> completely different "thing". Effectively you could say they use the same
>> IRI by coincidence. I think it is this kind of example that led to the
>> current situation. Contrast that with a query like the following, assuming
>> the IRI denotes the graph:
>> 
>> SELECT * WHERE {
>>  ?someone_thing :stated ?g .
>>  GRAPH ?g { ?x a foaf:Person }
>> }
>> 
>> 
>> I think at the very least, the effects of the decision that graph labels do
>> not denote the graph should be made clearer in RDF Concepts. I don't know
>> how but maybe an example helps to illustrate the problem. That information
>> also shouldn't be put in a non-normative note IMHO.
> 
> Well, first we'd have to find a problem with it…

Imagine a scenario where information from a number of sources is being integrated into one datastore, all about authorship of RDF graphs. The goal is to have a dataset with a default graph recording authorship information using triples

:personIRI :authorOf :graphLabel .

where :graphLabel identifies a graph in the dataset using the graph label convention. But suppose one of the sources being mashed has cleverly taken advantage of the graph-label-denoting-something-else freedom to simply label each graph with an IRI denoting its author. Then we will get triples like

:personIRI :authorOf :personIRI .

which can only be interpreted as something (be it a person or graph) authoring itself. Which is nonsense, and probably will cause an inconsistency with some data model or ontology defining :authorOf. 
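
The scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the IRIs (:alice, :bob, :graph1), the two sources, and the authorship_triples helper are all hypothetical, and plain tuples stand in for a real RDF store.

```python
# Hypothetical names throughout; nothing here is taken from an actual dataset.
AUTHOR_OF = ":authorOf"

def authorship_triples(source):
    """Build default-graph triples from (author IRI, graph label) pairs."""
    return [(author, AUTHOR_OF, label) for author, label in source]

# Source A labels each graph with an IRI intended to denote the graph.
source_a = [(":alice", ":graph1")]
# Source B uses the permitted freedom and labels each graph with its author's IRI.
source_b = [(":bob", ":bob")]

# The integrator cannot see the difference in intent, so it treats both alike.
merged = authorship_triples(source_a) + authorship_triples(source_b)

# A triple with the same IRI as subject and object asserts self-authorship.
self_authored = [t for t in merged if t[0] == t[2]]
print(self_authored)
```

The integration step is identical for both sources; the self-authoring triple arises only because the second source attached a different meaning to its graph labels.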

You can work a similar problem with graph labels referring to just about anything other than the graph (or graph document).

> I suspect a world where graph labels always denote graphs would be much more confusing and counter-intuitive to the average developer.

Why would it be confusing and counter-intuitive for something called a "graph label" to be the name of the thing it is labelling? Isn't it normal, even for the average developer, to think of identifiers as identifying something, and to feel a slight frisson of concern when they are obliged to use the same identifier to mean two different things at the same time? 

Pat

> 
> - Steve
> 
> -- 
> Steve Harris
> Experian
> +44 20 3042 4132
> Registered in England and Wales 653331 VAT # 887 1335 93
> 80 Victoria Street, London, SW1E 5JL
> 
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 25 February 2013 23:28:01 UTC