Re: Well Behaved RDF - Taming Blank Nodes, etc. from Steve Harris on 2012-12-20 (semantic-web@w3.org from December 2012)

From: Steve Harris <steve.harris@garlik.com>
Date: Thu, 20 Dec 2012 12:00:19 +0000
To: Henry Story <henry.story@bblfish.net>
Cc: Lee Feigenbaum <lee@thefigtrees.net>, Pat Hayes <phayes@ihmc.us>, David Booth <david@dbooth.org>, semantic-web <semantic-web@w3.org>
Message-Id: <6EC7098D-EE66-4752-891A-0E84B33985AA@garlik.com>
On 2012-12-19, at 19:54, Henry Story wrote:

> 
> On 19 Dec 2012, at 20:14, Steve Harris <steve.harris@garlik.com> wrote:
> 
>> On 19 Dec 2012, at 18:27, Henry Story wrote:
>>> 
>>> On 19 Dec 2012, at 19:10, Steve Harris <steve.harris@garlik.com> wrote:
>>> 
>>>> On 2012-12-19, at 17:50, Henry Story wrote:
>>>>> 
>>>>> On 19 Dec 2012, at 18:43, Steve Harris <steve.harris@garlik.com> wrote:
>>>>> 
>>>>>> On 2012-12-19, at 16:36, Lee Feigenbaum wrote:
>>>>>>>> Henry Story Wrote:
>>>>>>>> In any case otherwise you end up with names that are just complicated blank nodes, and you
>>>>>>>> then have exactly the same problem as blank nodes, except you just end up growing and
>>>>>>>> growing your names as you go along.
>>>>>>> 
>>>>>>> Well, except they don't have the same problems as blank nodes: UUID URIs are stable from one query to the next and can be linked to and referenced across document/database-context.
>>>>>> 
>>>>>> Yes, this is the key problem with bNodes, which means you have to be /really/ careful about how and when you use them.
>>>>> 
>>>>> No, it's the opposite. This is a key problem with UUIDs, as I argued in my later mail
>>>>> http://lists.w3.org/Archives/Public/semantic-web/2012Dec/0097.html
>>>> 
>>>> Yes, but I don't buy your arguments.
>>>> 
>>>> You can't "prove" that you "created" some http: URI either, unless the document is signed by an unrevoked key, and that works just as well for any kind of URI.
>>> 
>>> The point is that there is a way one can come to agree what the definition of a term means for http
>>> URIs. You GET it.
>> 
>> As Lee says below, you can GET some UUID-based URIs too.
> 
> yes, and in that case there is a very clear difference between a bnode and an http URI containing
> a UUID. In the case of an http URI the client may be tempted to dereference the http URI in order
> to find its meaning. In the case of bnodes that idea would not be possible.
> 
> In the urn:uuid:... case it is not possible to dereference it either, but then one has another problem…

Yeah, but no one's forcing anyone to use URNs if they don't want to.

>>> There really is no way to do so for a UUID. If two people dispute the meaning of the term, there is no
>>> way you can come to decide on who was right to use it that way, since either could have come
>>> to mint it. But next, even if you really worked hard on it, how would you know what the meaning
>>> of the term was? 
>>> 
>>> And all of that needs to be put into context of what a machine can do reasonably easily. Whatever
>>> the proof procedure for finding the meaning of a UUID is, it's not something that is going to be doable
>>> automatically. It would require expert police officers, inquisitions, highly specialised teams to work
>>> out what is what in there, with access to hardware etc…
>> 
>> I agree with this bit, but I don't think a machine can reasonably easily resolve a dispute about the meaning of a dereferenceable URI just by dereferencing it and doing some computation on the result.
> 
> That is what WebID is based on. It is *because* you can dereference an HTTP URI that the proof procedure shown in http://webid.info/spec/ works: you can find the public key of the user, and you can then prove that the user is indeed who he claims to be.

"Prove" is a word with mathematical connotations, it's more of a strong indicator IMHO.

Same applies to all signing/crypto procedures, to varying degrees, based on the paranoia of the signer, and the security of their environment, and the resources of the attackers.

But yes, OK.
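A minimal sketch of the final comparison step in the WebID check Henry describes, under stated assumptions: the TLS handshake has already shown that the client holds the private key for its certificate, and the profile document has already been fetched by GETting the WebID and parsed into a dict of keys. `RSAPublicKey`, `webid_key_matches` and the example WebID are hypothetical names for illustration, not taken from the spec. The lookup keyed on `webid` is the step that relies on the identifier being dereferenceable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RSAPublicKey:
    """An RSA public key as a profile might publish it (modulus and exponent)."""
    modulus: int
    exponent: int


def webid_key_matches(webid: str, cert_key: RSAPublicKey,
                      profile_keys: dict[str, set[RSAPublicKey]]) -> bool:
    """Return True if the key from the client certificate is among the keys that
    the profile document, obtained by dereferencing `webid`, lists for that WebID.

    Fetching and parsing the profile are out of scope: `profile_keys` stands in
    for the result of GETting the WebID and extracting its keys (an assumption).
    """
    return cert_key in profile_keys.get(webid, set())


# Hypothetical data: one WebID whose profile publishes a single key.
webid = "https://alice.example/profile#me"
profile_keys = {webid: {RSAPublicKey(modulus=0xC0FFEE, exponent=65537)}}

print(webid_key_matches(webid, RSAPublicKey(0xC0FFEE, 65537), profile_keys))  # True
print(webid_key_matches(webid, RSAPublicKey(0xBEEF, 65537), profile_keys))    # False
```

As Steve notes, a match is a strong indicator rather than a mathematical proof: it is only as good as the security of the private key and of the server hosting the profile.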

>> I'd love to be proved wrong though. The signed doc case is reasonably easy - as long as you trust the veracity of the private key (it's all degrees of trust). It's still just a claim though.
> 
> I am not sure how a signed doc is related to the urn:uuid. You could use .onion URLs with Tor, or .garlic URLs, which I think contain a public key in the URL. In that case you have a proof procedure that works without DNS.

If I say

<http://thing.com/something> dc:subject "Turtles" .

In a document signed by me, then that's a much stronger indicator that I believe that to be true (or would like others to believe it…) than anything that can be done with any combination of DNS and bNodes, IMHO.

I don't really see what you think bNodes win here?

>> Nothing stops someone (well, some legal and technical issues!) from publishing data from your domain using "your" URIs, and it would be very hard for a machine to tell that someone had done that. It's unlikely in 2012, but far more likely to happen than a UUID clash.
> 
> yes, but say you copied Tim Berners-Lee's profile word for word, with relative URLs, to your space;
> then the URLs would be different. If you copied it over in N-Triples format, keeping the URIs the
> same, then people would dereference the document on the W3C web site.
> 
> See the WebID definition doc for a picture of this
> 
> http://dvcs.w3.org/hg/WebID/raw-file/tip/spec/identity-respec.html

Yeah, but it's possible (easy actually) to copy it somewhere on w3.org, then it gets a bit messy.
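The difference between copying relative and absolute URIs comes down to reference resolution against the document's own URL. A short sketch using Python's standard `urllib.parse.urljoin`; the profile location and both copy locations are assumptions chosen for illustration. It also shows Steve's case: a copy placed elsewhere on w3.org still resolves to a w3.org URI.

```python
from urllib.parse import urljoin, urlsplit


def resolve(base: str, ref: str) -> str:
    """Resolve a reference from a document against that document's own URL."""
    return urljoin(base, ref)


original = "https://www.w3.org/People/Berners-Lee/card"  # assumed profile location
copy_elsewhere = "https://copy.example/card"              # hypothetical copy on another host
copy_same_host = "https://www.w3.org/tmp/card"            # hypothetical copy on w3.org

# A relative reference such as <#i> resolves against wherever the copy lives...
for base in (original, copy_elsewhere, copy_same_host):
    uri = resolve(base, "#i")
    print(uri, "host:", urlsplit(uri).netloc)

# ...while an absolute URI, as written out in N-Triples, is unchanged by copying.
print(resolve(copy_elsewhere, original + "#i"))
```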

>>> Don't forget that I am responding to the following:
>>> "UUID URIs are stable from one query to the next and can be linked to and referenced across document/database-context."
>>> 
>>> The name is stable yes, and there are advantages to that, but the meaning is not going to 
>>> be understood, since you have no clear way of telling two divergent meanings apart. So they
>>> are not really as linkable as you think. 
>> 
>> I don't /think/ that's different for any other kind of URI though.
>> 
>>>> Also, you say "If you use a UUID you could accidentally make a UUID that someone else has already used." well, it's either not a UUID (e.g. a bogus implementation) or there's some statistically insignificant chance http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates
>>>> Neither of those cases is very relevant.
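The "statistically insignificant chance" can be made concrete with the birthday-bound approximation for random (version 4) UUIDs, which carry 122 random bits. A minimal sketch, assuming a correctly implemented random generator; it models accidental collisions only and says nothing about deliberate reuse of someone else's UUID, which is the bad-faith case Henry raises.

```python
import math

# A version-4 (random) UUID has 128 bits, 6 of which are fixed (version and variant),
# leaving 122 random bits.
RANDOM_BITS = 122
SPACE = 2 ** RANDOM_BITS


def collision_probability(n: int) -> float:
    """Approximate chance that at least two of n independently generated
    version-4 UUIDs are equal, using the birthday bound p = 1 - exp(-n^2 / 2N)."""
    # expm1 keeps precision when the exponent is tiny and p is close to zero.
    return -math.expm1(-(n * n) / (2 * SPACE))


for n in (10**9, 10**12, 2_710_000_000_000_000_000):
    print(f"{n:.3e} UUIDs -> p ~ {collision_probability(n):.3g}")
```

Roughly 2.7 × 10^18 UUIDs are needed before the chance of any accidental clash reaches one half.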
>>> 
>>> Well in one case you have no chance of making a mistake (bnodes), in the other you have what you think is
>>> a statistically small chance, but you are not taking into account bad faith. Those are not at all the same
>>> thing. It's the difference between a mathematical truth that is necessarily true, and one that is contingent. 
>> 
>> There's a gap between a mathematical definition, and the actions of humans.
>> 
>> If we have a TriG document like:
>> 
>> <A> {
>>  _:x543543df a <Foo> .
>>  ...
>> }
>> …
>> <B> {
>>  _:x543543df a <Bar> .
>>  …
>> }
>> 
>> (suppose it was generated by some buggy process, or a typo, or whatever)
>> 
>> Then those two bNodes become conflated in the dataset.
> 
> No, because bnodes are not merged just like that. There is a renaming that has to go on in a merge process.

Yes they are. There's a resolution of the RDF WG that says bNode labels are scoped to the document, not the graph, i.e. if you use the label in two places in a document, it's the same bNode.

I think it's a questionable decision myself, but people had use cases for it.
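Both positions can be seen in a toy loader: labels are mapped to internal bNodes once per document (so the same label in graphs A and B of one TriG document is one bNode), while merging separately loaded documents renames their bNodes apart. A minimal sketch; the quad representation and the names `load_document` and `merge` are hypothetical, not any RDF library's API.

```python
import itertools

# Toy representation (hypothetical, not a library API): a document is a list of
# (graph, subject, predicate, object) quads; terms starting with "_:" are bNode labels.
Quad = tuple[str, str, str, str]

_fresh_ids = itertools.count()


def load_document(quads: list[Quad]) -> list[Quad]:
    """Give each bNode label a fresh internal id. The label map is shared by the
    whole document, so one label used in two graphs yields one bNode."""
    labels: dict[str, str] = {}

    def node(term: str) -> str:
        if term.startswith("_:"):
            if term not in labels:
                labels[term] = f"_:b{next(_fresh_ids)}"
            return labels[term]
        return term

    return [(g, node(s), p, node(o)) for g, s, p, o in quads]


def merge(*documents: list[Quad]) -> list[Quad]:
    """Merge documents. Each is loaded with its own label map, so bNodes from
    different documents are renamed apart even when their labels coincide."""
    merged: list[Quad] = []
    for doc in documents:
        merged.extend(load_document(doc))
    return merged


# The TriG example above, as one document: the label is shared across graphs A and B.
one_doc = [("A", "_:x543543df", "rdf:type", "Foo"),
           ("B", "_:x543543df", "rdf:type", "Bar")]
loaded = load_document(one_doc)
print(loaded[0][1] == loaded[1][1])  # True: one bNode typed both Foo and Bar

# The same quads split into two documents and then merged.
merged = merge(one_doc[:1], one_doc[1:])
print(merged[0][1] == merged[1][1])  # False: two distinct bNodes
```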

>> The mathematical definition doesn't enter into it, it's just human error - or malicious - or whatever.
>> 
>> If you only use [], then it can only happen because of typos, or bugs, but it can still happen.
> 
> Keep the RDF stack non-buggy. With bnodes you don't make a mistake; with URNs you can, just because someone could have software mistakenly generating urn:uuid:s that someone else is using. There is no way to verify it, as you would have to look at all the documents in the world to make sure you had not made a mistake. Plus someone can reuse the URNs maliciously. Those errors cannot appear with bnodes.

URNs are a blind alley - you can use UUIDs in many ways.

Those errors can appear with bNodes, as above.
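To illustrate "you can use UUIDs in many ways": the same random UUID can be written as a urn:uuid: URN, or placed inside an http URI under a domain the publisher controls, which can then be served and dereferenced. A minimal sketch using Python's standard `uuid` module; `mint` and the base `http://data.example/id/` are hypothetical.

```python
import uuid
from typing import Optional


def mint(base: Optional[str] = None) -> str:
    """Mint a new identifier from a random (version 4) UUID.

    With no base, return the urn:uuid: form, which cannot be dereferenced over HTTP.
    With a base such as the hypothetical "http://data.example/id/", return an http URI
    at which the owner of that domain could choose to serve a description.
    """
    u = uuid.uuid4()
    if base is None:
        return u.urn  # "urn:uuid:" followed by the canonical hex form
    return base.rstrip("/") + "/" + str(u)


print(mint())                           # e.g. urn:uuid:...
print(mint("http://data.example/id/"))  # e.g. http://data.example/id/...
```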

>>>>> Not everything that looks like a URI really works like one. For example file:///... URIs 
>>>>> usually are not global identifiers, and even though software accepts it, it's just a hack 
>>>>> people use to get around software that forces them into this kind of situation. 
>>>> 
>>>> There are valid uses for file: URIs, but yes, you have to be careful.
>>>> 
>>>>> UUIDs are not a good way to go. They make it look like there is agreement, when in fact
>>>>> conceptually things are just as broken.
>>>> 
>>>> How does that relate to bNodes? Software doesn't [typically :)] have opinions about the appearance of identifiers.
>>> 
>>> The point is that people use things that look like global identifiers, because some software
>>> shoe-horns them into providing things that look like they are identifiers, even though
>>> they don't really function in the right way.  So if you forced people to use URIs instead
>>> of bnodes, they'd end up using URIs that were not global identifiers, but that just looked
>>> like them.
>> 
>> Perhaps.
>> 
>> FWIW I'm not arguing that bNodes should be banned, just that the definition of them is not very useful. I would like to see them as indicators for the processor to replace them with some globally unique identifier; UUIDs are one candidate.
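Treating bNodes as indicators to be replaced by globally unique identifiers (often called skolemization) can be sketched in a few lines. The triple representation, the function name `skolemize` and the default `urn:uuid:` base are assumptions for illustration; an http base under the publisher's own domain would work the same way.

```python
import uuid


def skolemize(triples, base="urn:uuid:"):
    """Replace each distinct bNode label with a fresh UUID-based identifier.

    `triples` is a list of (subject, predicate, object) string tuples in which
    terms starting with "_:" are bNode labels (a toy representation). Every
    occurrence of a label in this input maps to the same new identifier.
    """
    minted = {}

    def term(t):
        if t.startswith("_:"):
            if t not in minted:
                minted[t] = base + str(uuid.uuid4())
            return minted[t]
        return t

    return [(term(s), p, term(o)) for s, p, o in triples]


triples = [("_:a", "foaf:knows", "_:b"),
           ("_:a", "foaf:name", '"Alice"')]
for t in skolemize(triples):
    print(t)
```

Once replaced, the identifiers are stable from one query to the next, which is the property Lee points to; whether their meaning can be settled is the separate question Henry raises.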
> 
> ok. :-)
> 
> I think a document that summarises some of the different pragmatic uses of bnodes, URLs, and relative URLs still needs to be written. 

Very true.

- Steve
Received on Thursday, 20 December 2012 12:00:52 UTC