Re: Well Behaved RDF - Taming Blank Nodes, etc.

( For the philoweb group: an extract from a debate on the semantic-web mailing list
that involves theories of reference )

On 19 Dec 2012, at 17:36, Lee Feigenbaum <lee@thefigtrees.net> wrote:

> On 12/19/2012 10:47 AM, Henry Story wrote:
>> On 19 Dec 2012, at 16:42, Lee Feigenbaum <lee@thefigtrees.net> wrote:
>> 
>>> On 12/19/2012 9:50 AM, Henry Story wrote:
>>>> On 12 Dec 2012, at 18:43, Pat Hayes <phayes@ihmc.us> wrote:
>>>> 
>>>>> On Dec 12, 2012, at 9:01 AM, David Booth wrote:
>>>>> 
>>>>>> I'm writing a paper to propose a profile of RDF that would enable
>>>>>> simpler tools to process RDF, and I'm wondering if others have
>>>>>> suggestions of constraints that may be helpful to include.  The idea is
>>>>>> not to change the RDF standard, but to define a useful, voluntary subset
>>>>>> -- Well Behaved RDF -- that is sufficient for most RDF applications but
>>>>>> simplifies their development.
>>>>>> 
>>>>>> For example, one key limitation would be in the use of blank nodes,
>>>>>> which severely complicate what could otherwise be simple tasks, such as
>>>>>> comparing two RDF graphs for equality.  With unrestricted blank nodes,
>>>>>> this becomes a difficult graph isomorphism problem instead of a simple
>>>>>> text comparison.  Some have suggested eliminating blank nodes entirely,
>>>>>> but a more modest restriction would be to limit them to common idioms
>>>>>> that do not cause such complexity problems:
>>>>>> 
>>>>>> A Well-Behaved RDF graph is an RDF graph that can be serialized
>>>>>> as Turtle without the use of explicit blank node identifiers.
>>>>>> I.e., only blank nodes that are implicitly created by the
>>>>>> bracket "[ ... ]" or list "( ... )" notations are permitted.
>>>>> That is too restrictive. There is a real need to be able to describe things such as "Joe's father" or "a woman in a red dress" which are naturally phrased as bnodes with identifying descriptors attached to them.
>>>> Yes, and here is an HTTP-related example:
>>>> 
>>>> The web creates such resources all the time: whenever you POST a form
>>>> and the server does not give you a URI for the returned result, you have in
>>>> fact created a resource that has not got a name. This resource will need to be
>>>> described using a blank node. It could be described well enough as a creation of
>>>> that form, and with date and time, but giving it a name on that server would be
>>>> an error, and forcing oneself to name a remote resource, when the remote owner
>>>> did not want to do that is more work than you may want.
>>> What's wrong with generating a UUID-based URI for a POST request?
>> There's nothing wrong per se. But it is more prone to mistakes
>> than having a blank node. The blank node is the easiest way to deal with
>> things that don't have names.
> What sort of mistakes do you have in mind with URIs? With blank nodes, the mistakes are well-understood (see the modelling mistakes elsewhere in this thread and also the common mistake of attempting/expecting to reuse blank node identifiers).

If you use a UUID you could accidentally mint one that someone else has already used.
Since you are making a claim to a global name, that risk comes with it.
Writing [] carries no such risk.

UUIDs are especially problematic. You can create a UUID, but how do you prove that you
created it, and that your definition has priority over someone else's?
If there is a clash between statements about a UUID, who is right? There
is no way to tell, because there is no dereferencing mechanism to find the meaning of the
term. And even if you find a way to trace the name back to its namer, unless there
is a document that defines it, how do you know - to ask a Wittgensteinian question - that
you meant the same thing last time you used it as this time?
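
To make the dereferencing point concrete, here is a minimal Python sketch. The
function names and the http://my.example/people#joe URI are made up for
illustration. It only checks whether a URI names a host one could ask about the
term; it does not fetch anything.

```python
import uuid
from urllib.parse import urlsplit


def mint_uuid_uri() -> str:
    """Mint a global name as a urn:uuid URI (hypothetical helper)."""
    return uuid.uuid4().urn


def has_dereference_authority(uri: str) -> bool:
    """True if the URI is an http(s) URI naming a host one could ask
    for a defining document. This is a syntactic check only."""
    parts = urlsplit(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


minted = mint_uuid_uri()
# The minted name is global, but nothing in it says where to look it up.
print(minted, has_dereference_authority(minted))
# An http URI at least names a server that can be asked (illustrative URI).
print(has_dereference_authority("http://my.example/people#joe"))
```

The urn:uuid name carries no authority component, so two parties making
conflicting statements about it have nowhere to go to settle the matter.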

> 
>> 
>> Here's another example: Imagine you have a rdf store and your URI index breaks.
>> You may have a bunch of things you can say about those URIs such as the
>> domain name and the protocol, but not the full path. There again it would be
>> easiest to describe this with blank nodes.
> I must admit to not understanding the scenario you're describing here. Generally speaking, I'm not very motivated by design decisions based on buggy pieces of software, but likely I'm misunderstanding.

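@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix :     <#> .   # illustrative local terms, not a published vocabulary

[] :uri [ :protocol "http";
          :domain "my.example" ];
   foaf:knows :joe, :jack, :jim .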

So here we have some partial information about a thing because our index broke down. With
some thought we can slowly work out what the URIs were, and piece them back together.
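
One way the "working out" could go, sketched in Python under assumptions of my
own: the partial description is held as a protocol and a domain, and the
candidate URIs (invented here) come from somewhere else, such as logs or
backups. The sketch just keeps the candidates that fit the description.

```python
from urllib.parse import urlsplit


def matches_partial(candidate: str, protocol: str, domain: str) -> bool:
    """True if the candidate URI agrees with what we still know
    about the lost URI: its protocol and its domain."""
    parts = urlsplit(candidate)
    return parts.scheme == protocol and parts.hostname == domain


# What survived the broken index (mirrors the blank node description).
partial = {"protocol": "http", "domain": "my.example"}

# Hypothetical candidates recovered from elsewhere.
candidates = [
    "http://my.example/people/henry#me",
    "https://my.example/people/henry#me",
    "http://other.example/people/henry#me",
]

survivors = [c for c in candidates if matches_partial(c, **partial)]
print(survivors)
```

Until only one candidate is left, the thing stays a blank node with a
description attached, which is exactly what the Turtle above says.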


> 
>> 
>> Otherwise you end up with names that are just complicated blank nodes, and you
>> then have exactly the same problem as blank nodes, except you just end up growing and
>> growing your names as you go along.
> 
> Well, except they don't have the same problems as blank nodes: UUID URIs are stable from one query to the next and can be linked to and referenced across document/database-context.

You gain in some places, you lose elsewhere.
I think it is for the same reason that relative URLs need to be taken more seriously
in the RDF space.  You need to allow people to say "I am hungry" even if they don't know who
they are. So I am also in favour of the RDF people developing a notion of a Relative URI Graph.

I think a lot of this is a question of construction from minimal bases:

 - bnodes: you know there is something but you can only refer to it by description
 - relative URIs: you can speak of something but you don't yet know the full publication context
 - full URIs: there is agreement between different people on how to resolve its meaning.
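
The middle case can be illustrated with a few lines of Python. This is only a
sketch: the two base URIs are invented, and "#me" stands in for the "I" of
"I am hungry". The same relative reference becomes a different full URI
depending on where the graph ends up being published.

```python
from urllib.parse import urljoin

# A relative reference: "me", whoever and wherever that turns out to be.
relative = "#me"

# Two hypothetical publication contexts for the same graph.
bases = ["http://my.example/henry", "http://other.example/joe"]

# Resolving against each base yields a different global name.
resolved = [urljoin(base, relative) for base in bases]
print(resolved)
```

Only once the publication context is fixed does the relative reference turn
into a name that others can agree on how to resolve.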

There are, furthermore, good arguments in Gareth Evans' "The Varieties of Reference" that you
cannot name something if you cannot distinguish it from some other thing that fits the description.
It is something one would need to look at carefully again. He gives an example of two identical
balls rolling quickly in a container. You can say there are two balls in the container,
but you cannot name them, he argues, since you would not be able to give your name a
distinguishing description. (He argues this from a theory of reference that spans the thinking
on this subject for the whole of the 20th century; it's a 500 page book.) And since names are
meant to be shared with other people, you could have something that would look like a name,
but not function like one.

So on this theory, just minting a URI does not a name make.

> 
> 
>> 
>> But this is another http-range-14 like debate I think. I'd better not get involved....
> 
> Your choice; as I said, I don't feel religious on this topic, but I'm trying to understand "the other side" and see what's general feeling versus concrete issues. I'm also trying to learn for my own nefarious purposes: to understand clearly when I'd want to use blank nodes.
> 
> Lee
> 
>> 
>>> Lee
>>> 
>>>>>> Are there other restrictions that would be helpful to have in a Well
>>>>>> Behaved RDF profile, which would simplify our lives as RDF developers,
>>>>>> but still meet the needs of most RDF applications?  For example, what if
>>>>>> anything might be said about non-lean RDF graphs? Should typed literals
>>>>>> be required to be well formed per their type semantics?
>>>>> FWIW, the RDF2 WG may well decide to make graphs with illformed literals inconsistent, which is equivalent to saying that the type semantics is imposed by the simple use of the literal.
>>>> I hope that does not make it impossible to create new literals.
>>>> 
>>>>>> Should the use
>>>>>> of rdf:first and rdf:rest be limited to well-formed, rdf:nil-terminated
>>>>>> lists, i.e., those that can be serialized as Turtle lists “( . . . )”?
>>>>> That makes perfect sense. The 2004 specs suggest this, in fact: they permit applications to treat 'silly' list descriptions as errors.
>>>>> 
>>>>> Pat
>>>>> 
>>>>> 
>>>>>> Etc.  What do others think?
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> David Booth, Ph.D.
>>>>>> http://dbooth.org/
>>>>>> 
>>>>>> Opinions expressed herein are those of the author and do not necessarily
>>>>>> reflect those of his employer.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> ------------------------------------------------------------
>>>>> IHMC                                     (850)434 8903 or (650)494 3973
>>>>> 40 South Alcaniz St.           (850)202 4416   office
>>>>> Pensacola                            (850)202 4440   fax
>>>>> FL 32502                              (850)291 0667   mobile
>>>>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> A short message from my sponsors: Vive la France!
>>>> Social Web Architect
>>>> http://bblfish.net/
>>>> 
> 

A short message from my sponsors: Vive la France!
Social Web Architect
http://bblfish.net/

Received on Wednesday, 19 December 2012 17:24:04 UTC