Re: Well Behaved RDF - Taming Blank Nodes, etc.

On 19 Dec 2012, at 20:14, Steve Harris <steve.harris@garlik.com> wrote:

> On 19 Dec 2012, at 18:27, Henry Story wrote:
>> 
>> On 19 Dec 2012, at 19:10, Steve Harris <steve.harris@garlik.com> wrote:
>> 
>>> On 2012-12-19, at 17:50, Henry Story wrote:
>>>> 
>>>> On 19 Dec 2012, at 18:43, Steve Harris <steve.harris@garlik.com> wrote:
>>>> 
>>>>> On 2012-12-19, at 16:36, Lee Feigenbaum wrote:
>>>>>>> Henry Story Wrote:
>>>>>>> In any case otherwise you end up with names that are just complicated blank nodes, and you
>>>>>>> then have exactly the same problem as blank nodes, except you just end up growing and
>>>>>>> growing your names as you go along.
>>>>>> 
>>>>>> Well, except they don't have the same problems as blank nodes: UUID URIs are stable from one query to the next and can be linked to and referenced across document/database-context.
>>>>> 
>>>>> Yes, this is the key problem with bNodes, which means you have to be /really/ careful about how and when you use them.
>>>> 
>>>> No, it's the opposite. This is a key problem with UUIDs, as I argued in my later mail
>>>> http://lists.w3.org/Archives/Public/semantic-web/2012Dec/0097.html
>>> 
>>> Yes, but I don't buy your arguments.
>>> 
>>> You can't "prove" that you "created" some http: URI either, unless the document is signed by an unrevoked key, and that works just as well for any kind of URI.
>> 
>> The point is that there is a way one can come to agree what the definition of a term means for http
>> URIs. You GET it.
> 
> As Lee says below, you can GET some UUID-based URIs too.

Yes, and in that case there is a very clear difference between a bnode and an http URI containing
a UUID. With an http URI the client may be tempted to dereference it in order
to find its meaning. With a bnode that is not possible.

A urn:uuid:... cannot be dereferenced either, but then one has another problem....

> 
>> There really is no way to do so for a UUID. If two people dispute the meaning of the term, there is no
>> way you can come to decide on who was right to use it that way, since either could have come
>> to mint it. But next, even if you really worked hard on it, how would you know what the meaning
>> of the term was? 
>> 
>> And all of that needs to be put into context of what a machine can do reasonably easily. Whatever
>> the proof procedure for finding the meaning of a UUID is it's not something that is going to be doable
>> automatically. It would require expert police officers, inquisitions, highly specialised teams to work
>> out what is what in there, with access to hardware etc…
> 
> I agree with this bit, but I don't think a machine can reasonably easily resolve a dispute about the meaning of a dereferenceable URI, just by dereferencing it, and doing some computation on the result.

That is what WebID is based on. It is *because* you can dereference an HTTP URI that the proof procedure shown in http://webid.info/spec/ works: you can find the public key of the user, and you can then prove that the user is indeed who he claims to be.
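
A minimal sketch of the idea, not the procedure as specified: it assumes the client has already shown it holds the private key (for example during a TLS handshake), and it replaces the real HTTP GET with a hypothetical in-memory table. FAKE_WEB, dereference, webid_claim_holds, the example.org URIs and the key values are all invented for illustration.

```python
# Hypothetical stand-in for the web: document URL -> {WebID -> set of public keys}.
# Keys are modelled as (exponent, modulus) string pairs purely for illustration.
FAKE_WEB = {
    "https://example.org/people/alice": {
        "https://example.org/people/alice#me": {("65537", "a1b2c3")},
    },
}


def dereference(webid):
    """Simulate a GET on the document that defines the WebID (URI minus fragment)."""
    document_url = webid.split("#", 1)[0]
    # Anything that is not published in our fake web yields an empty profile.
    return FAKE_WEB.get(document_url, {})


def webid_claim_holds(webid, presented_key):
    """Return True if the profile found at the WebID lists the presented key."""
    profile = dereference(webid)
    return presented_key in profile.get(webid, set())


alice = "https://example.org/people/alice#me"
print(webid_claim_holds(alice, ("65537", "a1b2c3")))   # key is listed in the profile
print(webid_claim_holds(alice, ("65537", "ffffff")))   # key is not listed
# A urn:uuid: name has no document to fetch, so there is nothing to check against.
print(webid_claim_holds("urn:uuid:00000000-0000-0000-0000-000000000000",
                        ("65537", "a1b2c3")))
```

The point of the sketch is only that the check has somewhere to look: the name itself says where the defining document lives.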

> I'd love to be proved wrong though. The signed doc case is reasonably easy - as long as you trust the veracity of the private key (it's all degrees of trust). It's still just a claim though.

I am not sure how a signed doc is related to the urn:uuid case. You could use .onion URLs with Tor, or .garlic URLs, which I think contain a public key in the URL. In that case you have a proof procedure that works without DNS.

> 
> Nothing stops someone (well, some legal and technical issues!) from publishing data from your domain, using "your" URIs, and it would be very hard for a machine to tell that someone had done that. It's unlikely in 2012, but far more likely to happen than a UUID clash.

Yes, but say you copied Tim Berners-Lee's profile word for word, with relative URLs, to your space:
then the URLs would be different. If you copied it over in N-Triples format, keeping the URIs the
same, then people would dereference the document on the W3C web site.

See the WebID definition doc for a picture of this

http://dvcs.w3.org/hg/WebID/raw-file/tip/spec/identity-respec.html
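
To make the relative-URL point concrete, here is a small sketch using Python's standard urllib.parse.urljoin. The two base URLs and the "#me" fragment are made-up examples, not the real profile locations.

```python
from urllib.parse import urljoin

# Hypothetical relative reference used inside the profile document.
relative_ref = "#me"

# Hypothetical locations: where the profile was published, and where a copy lives.
original_base = "https://original.example/profile/card"
copy_base = "https://copycat.example/stolen/card"

# The same relative reference resolves against whichever document it is read from.
original_id = urljoin(original_base, relative_ref)
copied_id = urljoin(copy_base, relative_ref)

print(original_id)
print(copied_id)
print(original_id == copied_id)
```

A word-for-word copy therefore names a different thing, while a copy that spells out the absolute URIs still points readers back at the original document.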

> 
>> Don't forget that I am responding to the following:
>> "UUID URIs are stable from one query to the next and can be linked to and referenced across document/database-context."
>> 
>> The name is stable yes, and there are advantages to that, but the meaning is not going to 
>> be understood, since you have no clear way of telling two divergent meanings apart. So they
>> are not really as linkable as you think. 
> 
> I don't /think/ that's different for any other kind of URI though.
> 
>>> Also, you say "If you use a UUID you could accidentally make a UUID that someone else has already used." well, it's either not a UUID (e.g. a bogus implementation) or there's some statistically insignificant chance http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates
>>> neither of those cases is very relevant.
>> 
>> Well in one case you have no chance of making a mistake (bnodes), in the other you have what you think is
>> a statistically small chance, but you are not taking into account bad faith. Those are not at all the same
>> thing. It's the difference between a mathematical truth that is necessarily true, and one that is contingent. 
> 
> There's a gap between a mathematical definition, and the actions of humans.
> 
> If we have a TriG document like:
> 
> <A> {
>   _:x543543df a <Foo> .
>   ...
> }
> …
> <B> {
>   _:x543543df a <Bar> .
>   …
> }
> 
> (suppose it was generated by some buggy process, or a typo, or whatever)
> 
> Then those two bNodes become conflated in the dataset.

No, because bnodes are not merged just like that. There is a renaming that has to go on in a merge process.
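
A rough sketch of that renaming step, assuming the two graphs arrive as separately parsed sources and modelling triples as plain string tuples. The merge and is_bnode helpers are hypothetical and not taken from any RDF library.

```python
from itertools import count


def is_bnode(term):
    """Treat any term written with the '_:' prefix as a blank node label."""
    return term.startswith("_:")


def merge(*graphs):
    """Merge graphs, giving each source graph's blank nodes fresh labels first."""
    fresh = count()
    merged = set()
    for graph in graphs:
        renamed = {}  # per-graph mapping: original label -> fresh label

        def rename(term):
            if not is_bnode(term):
                return term  # URIs and literals are global, so they pass through
            if term not in renamed:
                renamed[term] = f"_:b{next(fresh)}"
            return renamed[term]

        for s, p, o in graph:
            merged.add((rename(s), p, rename(o)))
    return merged


# The same local label appears in both sources, as in the example above.
graph_a = {("_:x543543df", "rdf:type", "<Foo>")}
graph_b = {("_:x543543df", "rdf:type", "<Bar>")}

merged = merge(graph_a, graph_b)
subjects = {s for s, _, _ in merged}
print(len(subjects))  # the two blank nodes stay distinct after the merge
```

Because the label is only meaningful inside its source, the clash in spelling never turns into a clash in identity.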

> The mathematical definition doesn't enter into it, it's just human error - or malicious - or whatever.
> 
> If you only use [], then it can only happen because of typos, or bugs, but it can still happen.

Keep the RDF stack non-buggy. With bnodes you don't make a mistake; with URNs you can, just because someone could have software mistakenly generating a urn:uuid: that someone else is already using. There is no way to verify it, as you would have to look at all the documents in the world to make sure you had not made a mistake. Plus, someone can reuse the URNs maliciously. Those errors cannot appear with bnodes.

> 
>>>> Not every thing that looks like a URI really works like one. For example file:///... URIs 
>>>> usually are not global identifiers, and even though software accepts it, it's just a hack 
>>>> people use to get around software that forces them into this kind of situation. 
>>> 
>>> There are valid uses for file: URIs, but yes, you have to be careful.
>>> 
>>>> UUIDs are not a good way to go. They make it look like there is agreement, when in fact
>>>> conceptually things are just as broken.
>>> 
>>> How does that relate to bNodes? Software doesn't [typically :)] have opinions about the appearance of identifiers.
>> 
>> The point is that people use things that look like global identifiers, because some software
>> shoe-horns them into providing things that look like they are identifiers, even though
>> they don't really function in the right way.  So if you forced people to use URIs instead
>> of bnodes, they'd end up using URIs that were not global identifiers, but that just looked
>> like them.
> 
> Perhaps.
> 
> FWIW I'm not arguing that bNodes should be banned, just that the definition of them is not very useful. I would like to see them as indicators for the processor to replace them by some globally unique identifier - UUIDs is one candidate.

ok. :-)

I think a document that summarises some of the different pragmatic uses of bnodes, URLs and relative URLs still needs to be written.


> 
> - Steve

A short message from my sponsors: Vive la France!
Social Web Architect
http://bblfish.net/

Received on Wednesday, 19 December 2012 19:55:39 UTC