Re: Well Behaved RDF - Taming Blank Nodes, etc. from Hugh Glaser on 2012-12-13 (semantic-web@w3.org from December 2012)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Thu, 13 Dec 2012 20:50:58 +0000
To: Pat Hayes <phayes@ihmc.us>
CC: David Booth <david@dbooth.org>, semantic-web Web <semantic-web@w3.org>
Message-ID: <387E72E216DF1247A2F8ED4819C93BA71E389B23@UOS-MSG00041-SI.soton.ac.uk>
Hi again Pat,
On 13 Dec 2012, at 07:16, Pat Hayes <phayes@ihmc.us> wrote:

> 
> On Dec 12, 2012, at 10:10 AM, Hugh Glaser wrote:
> 
>> Nice idea David.
>> And I like that your example is about *consumption*, not production - I would hope that this exercise can use ease of consumption as the metric.
>> 
>> Your example of blank nodes is very important.
>> Another problem of blank nodes is that no-one else can make statements about the resource.
>> I won't get into the discussion about whether blank nodes should be banned :-)
>> But this argument against their use leads me to a rule of thumb:
>> 
>> A string literal should only ever appear as the object of rdfs:label (or something that could naturally be made a sub-property of rdfs:label, such as foaf:name).
> 
> What complete nonsense. At least there is some practical reason to restrict bnodes, but none whatsoever for this.
> 
>> Hopefully the reason for this is clear:- if it is somewhere else, then it is essentially identifying a resource; and if it is a resource, someone else (or you, one day) may want to refer to it. And of course, if it is a string, you can't do that.
> 
> Wherever it occurs, it is identifying a string. String literals ALWAYS identify strings, as a matter of logical necessity. So of course you can refer to it: you use the string literal. (Duh.)
You are of course right.
There is more than one resource involved - sorry it wasn't clear.
It is just that there is no resource for the thing that the string is labelling.

I tend to see things through examples of where I have had problems:
The biggest example of where huge amounts of work had to be done to fix things (and still causes trouble) is Dublin Core. If DCTerms had been the starting position, life would have been much easier.

Example 1:
http://data.open.ac.uk/person/0e5d4257051894026ea74b7ed55557e7
contains
<http://data.open.ac.uk/person/0e5d4257051894026ea74b7ed55557e7> <http://xmlns.com/foaf/0.1/topic_interest> "Watson" .
That is fine from the publisher's point of view - they just want to list Mathieu's interests, and it satisfies that requirement.
But essentially that is all it is good for.
No-one can provide a translation of the term and attach it to the interest.
No-one can do the sort of interesting smarts that we routinely do to find out who shares interests.
You couldn't even ask the system who else is also interested in the same topic.
Well you could do a query for "Watson", but that is like doing a natural language query, and a big selling point of the Semantic Web is that it lifts you up from natural language to proper identifiers.
Incidentally (pace bnodes), you will find the following in http://data.open.ac.uk/person/0e5d4257051894026ea74b7ed55557e7:
<http://data.open.ac.uk/person/0e5d4257051894026ea74b7ed55557e7> <http://xmlns.com/foaf/0.1/based_near> _:node17082 .
As far as I can tell, this tells me that Mathieu is based near somewhere - that's great stuff and really valuable?
Had they used URIs, as I load these documents I can then find out who is near the same place - not possible as it stands.
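
A minimal sketch of the difference, in Python with plain tuples standing in for triples. Only Mathieu's URI and foaf:topic_interest come from the data above; the other people, the topic URIs under example.org, and the two readings of "Watson" are hypothetical, made up for illustration.

```python
from collections import namedtuple

# A URI is wrapped in a namedtuple; a plain Python str stands for a string literal.
URI = namedtuple("URI", ["value"])

TOPIC_INTEREST = URI("http://xmlns.com/foaf/0.1/topic_interest")
MATHIEU = URI("http://data.open.ac.uk/person/0e5d4257051894026ea74b7ed55557e7")

# Everything under example.org is hypothetical, invented for this sketch.
COLLEAGUE = URI("http://example.org/person/colleague")
FAN = URI("http://example.org/person/holmes-fan")
WATSON_SYSTEM = URI("http://example.org/topic/watson-system")
DR_WATSON = URI("http://example.org/topic/dr-watson")


def sharers(triples, person):
    """Return everyone else whose topic_interest object equals one of `person`'s."""
    mine = {o for s, p, o in triples if s == person and p == TOPIC_INTEREST}
    return {s for s, p, o in triples
            if p == TOPIC_INTEREST and o in mine and s != person}


# As published: the interest is a bare string, so two different meanings collide.
literal_data = {
    (MATHIEU, TOPIC_INTEREST, "Watson"),
    (FAN, TOPIC_INTEREST, "Watson"),  # assumed to mean the fictional doctor
}

# With identifiers: the same query separates the two meanings.
uri_data = {
    (MATHIEU, TOPIC_INTEREST, WATSON_SYSTEM),
    (COLLEAGUE, TOPIC_INTEREST, WATSON_SYSTEM),
    (FAN, TOPIC_INTEREST, DR_WATSON),
}

print(sharers(literal_data, MATHIEU))  # matches the fan, purely on spelling
print(sharers(uri_data, MATHIEU))      # matches only the colleague
```

The query code is identical in both cases; what changes is whether equality of the object means "same spelling" or "same thing".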

Example 2:
http://id.southampton.ac.uk/building/25 used to have the architect as a string
<http://id.southampton.ac.uk/building/25> <http://id.southampton.ac.uk/ns/buildingArchitect> "Gutteridge & Gutteridge" .
That's all Chris wanted to say - he knew the architect's name - fair enough.
But when I tried to do something interesting, I pointed out that people can't then process this to present extra interesting information, for example from DBpedia.
And Chris can't then modify his system in response to a request to be able to see all the buildings by the same architect.
I'm pleased to say that Chris now publishes
<http://id.southampton.ac.uk/building/25> <http://id.southampton.ac.uk/ns/buildingArchitect> <http://id.southampton.ac.uk/building/25#architect> .
<http://id.southampton.ac.uk/building/25#architect> <http://www.w3.org/2000/01/rdf-schema#label> "Gutteridge & Gutteridge" .
and this data is now much more valuable.
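
The change from the first form to the second is mechanical, and could be sketched roughly as follows. This is an illustration in Python, not how the Southampton data was actually produced; the function name `promote_literals` and the tuple representation are assumptions of the sketch.

```python
from collections import namedtuple

# A URI is wrapped in a namedtuple; a plain Python str stands for a string literal.
URI = namedtuple("URI", ["value"])

RDFS_LABEL = URI("http://www.w3.org/2000/01/rdf-schema#label")
ARCHITECT = URI("http://id.southampton.ac.uk/ns/buildingArchitect")
BUILDING_25 = URI("http://id.southampton.ac.uk/building/25")


def promote_literals(triples, predicate, fragment):
    """Hypothetical helper: replace each string object of `predicate` with a
    minted URI (subject URI + '#' + fragment) and label that URI with the string."""
    out = set()
    for s, p, o in triples:
        if p == predicate and isinstance(o, str):
            node = URI(s.value + "#" + fragment)
            out.add((s, p, node))           # the property now points at a resource
            out.add((node, RDFS_LABEL, o))  # the string survives as its label
        else:
            out.add((s, p, o))              # everything else is left untouched
    return out


before = {(BUILDING_25, ARCHITECT, "Gutteridge & Gutteridge")}
after = promote_literals(before, ARCHITECT, "architect")
for triple in sorted(after, key=str):
    print(triple)
```

Minting one URI per building mirrors the pattern in the published triples. Listing all buildings by the same architect would additionally need the same architect URI to be reused, or the per-building URIs to be linked, which this sketch does not attempt.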

> 
>> Yes, I know it is an extra URI (not blank, of course!) and a triple, but what is the point of RDF if you can't refer to resources?
>> I have to say that whenever I have been tempted to put a string in somewhere else I have been able to work out very real use cases where it would have been a bad thing to do.
>> Strings are labels, and that's it.
>> 
>> So what about other literals?
>> I actually wonder whether the same is often true here, but perhaps not as strongly. Should they only be used as the object of rdf:value or similar?
>> Consider if you have made a statement publishing your age using foaf:age. The age is an integer string.
>> This means that I can't make a comment on your age :-)
>> Or rather, if I do comment on your age, it will have to be more complex, and involve a URI for you.
> 
> Just as the data you are commenting on does.  You seem to be saying that in order to comment on data, you have to use the same conventions as the data already uses, to refer to the things that the data refers to. A good point, but I don't see what it has to do with literals. 
> 
>> So I can't for example say that your age is the same as my age (with some predicate).
> 
> Pat sameAgeAs Hugh .  Looks OK to me. Or if you want to do it in Owl, you could use a restriction class of people with the same value of the hasAge property as mine, and say you are in it. 
> 
>> And I can't give you a URI that is my age, that you could give to someone else who might know what it means, even though you don't.
> 
> How could a URI identify my age? I will invent one that does, here goes: http://www.ihmc.us/groups/phayes/frongle. OK, that denotes my age. Good luck with finding out how old I am, though.
I don't want to know how old you are; I want to be able to refer to your age.
Perhaps I want to ask a service whether the unnamed person with age frongle is old enough to buy alcohol.
> To do that, you would probably have to find some RDF that said something like
> 
> http://www.ihmc.us/groups/phayes  :age http://www.ihmc.us/groups/phayes/frongle .
> http://www.ihmc.us/groups/phayes/frongle :valueInYears "68"^^xsd:number .
> 
> I fail to see how this is more useful than
> 
> http://www.ihmc.us/groups/phayes foaf:age "68"^^xsd:number .
> 
> and it doesn't have any fewer literals in it. 
It is about having names for the things that you want to allow people to talk about.
As I said, I am not so sure about numbers.
But the distinction here is that I have an identifier for your age.
So I can make statements about it.
Otherwise I can only make statements about you.
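
A rough sketch of the alcohol question in Python, assuming a service that privately holds the two triples from Pat's example. The predicate URIs under example.org, the function `old_enough`, and the threshold of 21 are all assumptions of the sketch; the frongle URI and the value 68 are from the thread.

```python
from collections import namedtuple

# A URI is wrapped in a namedtuple; other Python values stand for literals.
URI = namedtuple("URI", ["value"])

# Hypothetical predicates, standing in for ":age" and ":valueInYears".
AGE = URI("http://example.org/ns/age")
VALUE_IN_YEARS = URI("http://example.org/ns/valueInYears")

PAT = URI("http://www.ihmc.us/groups/phayes")
FRONGLE = URI("http://www.ihmc.us/groups/phayes/frongle")

# Held by the service; the caller only ever sees the frongle URI.
service_store = {
    (PAT, AGE, FRONGLE),
    (FRONGLE, VALUE_IN_YEARS, 68),
}


def old_enough(store, age_uri, minimum_years):
    """Answer yes/no about an age resource without revealing the number
    or the person. Returns None when the store knows nothing about it."""
    for s, p, o in store:
        if s == age_uri and p == VALUE_IN_YEARS:
            return o >= minimum_years
    return None


print(old_enough(service_store, FRONGLE, 21))
print(old_enough(service_store, URI("http://example.org/unknown-age"), 21))
```

The caller passes around an identifier for the age and never needs the person's URI or the number itself.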

Best
Hugh
> 
> Pat
> 
> 
>> 
>> I'm not sure if this is the sort of thing you were asking for, but that's my 2 cents worth.
>> 
>> Best
>> Hugh
>> On 12 Dec 2012, at 17:01, David Booth <david@dbooth.org> wrote:
>> 
>>> I'm writing a paper to propose a profile of RDF that would enable
>>> simpler tools to process RDF, and I'm wondering if others have
>>> suggestions of constraints that may be helpful to include.  The idea is
>>> not to change the RDF standard, but to define a useful, voluntary subset
>>> -- Well Behaved RDF -- that is sufficient for most RDF applications but
>>> simplifies their development.
>>> 
>>> For example, one key limitation would be in the use of blank nodes,
>>> which severely complicate what could otherwise be simple tasks, such as
>>> comparing two RDF graphs for equality.  With unrestricted blank nodes,
>>> this becomes a difficult graph isomorphism problem instead of a simple
>>> text comparison.  Some have suggested eliminating blank nodes entirely,
>>> but a more modest restriction would be to limit them to common idioms
>>> that do not cause such complexity problems:
>>> 
>>> A Well-Behaved RDF graph is an RDF graph that can be serialized 
>>> as Turtle without the use of explicit blank node identifiers. 
>>> I.e., only blank nodes that are implicitly created by the 
>>> bracket "[ ... ]" or list "( ... )" notations are permitted. 
>>> 
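
A small illustration of the equality point above, in Python with tuples standing in for triples. The `BNode` type and the function `same_ground_graph` are assumptions of this sketch; it only covers graphs with no blank nodes at all, and does not implement the bracket/list restriction in the definition.

```python
from collections import namedtuple

URI = namedtuple("URI", ["value"])
BNode = namedtuple("BNode", ["label"])  # label is local to one document

BASED_NEAR = URI("http://xmlns.com/foaf/0.1/based_near")
MATHIEU = URI("http://data.open.ac.uk/person/0e5d4257051894026ea74b7ed55557e7")
PLACE = URI("http://example.org/place/somewhere")  # hypothetical place URI


def same_ground_graph(g1, g2):
    """Equality for graphs without blank nodes: a plain set comparison.
    Raises ValueError if either graph contains a blank node."""
    for graph in (g1, g2):
        if any(isinstance(term, BNode) for triple in graph for term in triple):
            raise ValueError("blank nodes present: set comparison is not enough")
    return g1 == g2


# Ground graphs: comparing sets of triples is all that is needed.
print(same_ground_graph({(MATHIEU, BASED_NEAR, PLACE)},
                        {(MATHIEU, BASED_NEAR, PLACE)}))

# The same statement loaded twice with different blank node labels:
# the graphs say the same thing, yet the naive comparison calls them different.
first_load = {(MATHIEU, BASED_NEAR, BNode("node17082"))}
second_load = {(MATHIEU, BASED_NEAR, BNode("b0"))}
print(first_load == second_load)
```

Deciding that `first_load` and `second_load` are the same graph means finding a mapping between the blank node labels, which is where the isomorphism problem comes from.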
>>> Are there other restrictions that would be helpful to have in a Well
>>> Behaved RDF profile, which would simplify our lives as RDF developers,
>>> but still meet the needs of most RDF applications?  For example, what if
>>> anything might be said about non-lean RDF graphs?  Should typed literals
>>> be required to be well formed per their type semantics?  Should the use
>>> of rdf:first and rdf:rest be limited to well-formed, rdf:nil-terminated
>>> lists, i.e., those that can be serialized as Turtle lists “( . . . )”?
>>> Etc.  What do others think?
>>> 
>>> 
>>> -- 
>>> David Booth, Ph.D.
>>> http://dbooth.org/
>>> 
>>> Opinions expressed herein are those of the author and do not necessarily
>>> reflect those of his employer.
>>> 
>>> 
>> 
>> 
>> 
> 
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973   
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 13 December 2012 20:51:52 UTC