Re: Blank Node Identifiers and RDF Dataset Normalization

On 02/24/2013 05:54 AM, Andy Seaborne wrote:
>> http://manu.sporny.org/2013/rdf-identifiers/
> 
> Do you agree or disagree that it is developer friendly to be able to
>  read the document at URL <http://example/foo>, get the triples and 
> put those triples into a dataset as graph labelled 
> <http://example/foo>?

I realize that you've asked questions similar to this one several times
now without the two of us being able to get into the details. I tried to
explain during the last telecon that this question is independent of the
"allow blank node identifiers as graph labels" decision that I was
asking for last week.

It does have a bearing on the "blank node identifiers and IRIs should
denote the graph" comment I made at some point, which is a parallel
concern, but it matters far less to me than you may think. I also tried
to explain that the first decision can be made independently of the second.
We can allow blank node identifiers for graph names without saying that
they must denote the graph. The RDF 1.1 Concepts document can continue
to not take a strong position either way.

That said, let me try to answer your question more directly:

I agree that it is developer friendly to be able to read a document from
a URL and associate the triples generated with a graph labeled with the
same URL /as long as/ the developer doesn't use that URL to refer to
anything else. In that case, I view the URL as denoting the graph:

<http://example.com/foo> a rdf:Graph .

I disagree that it is a best practice for a developer to then use that
URL and associate it with a Person that isn't described at all in the graph:

<http://example.com/foo> a rdf:Graph, foaf:Person .

If something is a bad practice, I don't view it as developer-friendly.

The developer-friendly guidance should be: "If you use a URL to label a
graph, don't use that URL for anything else."

This concept is familiar to developers, because they are often told to
not re-use variables for entirely different purposes while programming:

var numPeople = 12;
...
numPeople = "Carrot"; // this is bad programming practice

I'd like to see us formalize this in the specification with something like:

"A graph name, whether it is an IRI or a blank node, SHOULD denote the
graph."

I do realize that there is legacy data where the developers sucked in
data and used the file/HTTP URL as the name of the graph without
understanding whether or not that same URL was used for other data. In
this case, they were lazy and should have labeled the graph with
something that didn't conflict.
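
One way to spot such a conflict after the fact can be sketched as
follows. This is only an illustration: the quad shape ({ subject,
predicate, object, graph }), the labelsTypedAsNonGraph helper, and the
use of prefixed names as plain strings are assumptions made for this
example, not part of any RDF library or specification.

```javascript
// Hypothetical quad shape: { subject, predicate, object, graph }.
// Prefixed names are kept as plain strings, as in the examples above.

// Returns every graph label that is also typed as something other than
// rdf:Graph somewhere in the dataset.
function labelsTypedAsNonGraph(quads) {
  const labels = new Set(quads.map((q) => q.graph));
  const flagged = new Set();
  for (const q of quads) {
    const isTypeStatement = q.predicate === "rdf:type";
    if (labels.has(q.subject) && isTypeStatement && q.object !== "rdf:Graph") {
      flagged.add(q.subject);
    }
  }
  return [...flagged];
}

const quads = [
  { subject: "http://example.com/foo", predicate: "rdf:type",
    object: "rdf:Graph", graph: "http://example.com/foo" },
  { subject: "http://example.com/foo", predicate: "rdf:type",
    object: "foaf:Person", graph: "http://example.com/foo" },
  { subject: "http://example.com/bar", predicate: "rdf:type",
    object: "rdf:Graph", graph: "http://example.com/bar" },
];

// Only http://example.com/foo is flagged: it labels a graph and is also
// typed as a foaf:Person.
const flagged = labelsTypedAsNonGraph(quads);
```

A check like this only catches explicit type statements; it cannot tell
what a publisher intended a URL to denote.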

I also realize that one might use the same URL to label the graph and
the main subject of the graph. While this may seem harmless on the
surface, it will confuse software once you start making statements about
the graph (such as provenance statements) using the same URL that you
also use for non-graph-related statements. For example, if you assert

<http://example.com/foo> foaf:name "Andy" .

and http://example.com/foo also labels the graph, you end up saying that
the name of the graph is "Andy", which is probably wrong.
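
A minimal sketch of why software gets confused here, assuming
statements are plain { subject, predicate, object } objects. The
describe helper, the dc:creator provenance statement, and the crawler
URL are hypothetical and used only for illustration.

```javascript
// Hypothetical statement shape: { subject, predicate, object }.
const GRAPH_URL = "http://example.com/foo";

const statements = [
  // Provenance statement, intended to be about the graph.
  { subject: GRAPH_URL, predicate: "dc:creator",
    object: "http://example.com/crawler" },
  // Statement intended to be about a person.
  { subject: GRAPH_URL, predicate: "foaf:name", object: "Andy" },
];

// Collects everything said about one URL into a single description.
// There is nothing in the data that separates "about the graph" from
// "about the person", so both end up in the same object.
function describe(url, stmts) {
  const description = {};
  for (const s of stmts) {
    if (s.subject === url) {
      description[s.predicate] = s.object;
    }
  }
  return description;
}

// The result carries both dc:creator and foaf:name for the same resource.
const description = describe(GRAPH_URL, statements);
```

With a separate label for the graph, the two statements would land in
two different descriptions instead of one.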

Two questions for you:

1. Why are you interested in getting an answer to the question that you
asked, since it's secondary to what we're discussing?

2. Why is what I outline above not a workable solution for everyone that
uses RDF?

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny, G+: +Manu Sporny)
Founder/CEO - Digital Bazaar, Inc.
blog: Aaron Swartz, PaySwarm, and Academic Journals
http://manu.sporny.org/2013/payswarm-journals/

Received on Sunday, 24 February 2013 19:53:28 UTC