Re: genid example from RDF1.1 is bad from David Booth on 2014-09-24 (semantic-web@w3.org from September 2014)

From: David Booth <david@dbooth.org>
Date: Tue, 23 Sep 2014 21:21:24 -0400
To: "henry.story@bblfish.net" <henry.story@bblfish.net>
CC: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>, Semantic Web <semantic-web@w3.org>
Message-ID: <54221C94.9070109@dbooth.org>
On 09/23/2014 04:45 PM, henry.story@bblfish.net wrote:
>
> On 23 Sep 2014, at 21:46, David Booth <david@dbooth.org> wrote:
>
>> Hi Henry,
>>
>> On 09/23/2014 02:59 AM, henry.story@bblfish.net wrote:
>>> I just noticed the section on using ".well-known" URIs for
>>> skolemisation in the RDF1.1 spec. This led to the following
>>> extract of a conversation on the Linked Data Protocol mailing
>>> list. I am 100% against that and believe it should be removed for
>>> the next version of the RDF spec. I also propose a path to an
>>> improvement for it.
>>>
>>> On 23 Sep 2014, at 00:40, Pierre-Antoine Champin
>>> <pierre-antoine.champin@liris.cnrs.fr
>>> <mailto:pierre-antoine.champin@liris.cnrs.fr>> wrote:
>>>
>>>> Hi Henry,
>>>>
>>>> On Mon, Sep 22, 2014 at 4:06 PM, henry.story@bblfish.net
>>>> <mailto:henry.story@bblfish.net> <henry.story@bblfish.net
>>>> <mailto:henry.story@bblfish.net>> wrote:
>>>>
>>>>
>>>> I find genids a pretty hackish part of the rdf1.1 spec frankly.
>>>> Genids are recognised apparently by analysing the schema of the
>>>> URI, which is pretty much against web architecture.
>>>> http://www.w3.org/TR/rdf11-concepts/#section-skolemization
>>
>> I assume you are referring to the principle of URI opacity: that
>> one should not make unlicensed assumptions about the nature of a
>> URI-identified resource based on the syntax of the URI.
>
> yes.
>
>> But RDF genids are based on the .well-known path prefix in
>> conformance with RFC 5785 http://tools.ietf.org/html/rfc5785 so this
>> use of /.well-known/genid/ as a path prefix to indicate a skolemized
>> blank node URI *is* licensed and does *not* violate the principle of
>> URI opacity.
>
>
> There is nothing in that rfc about this use for non-retrievable
> genids. Rather, RFC5785 contains a lot of text about a _retrievable_
> metadata space:
>
> [[ When this happens, it is common to designate a "well-known
> location" for such data, so that it can be easily located. ]]
>
> [[ Rather, they are designed to facilitate discovery of information
> on a site when it isn't practical to use other mechanisms; for
> example, when discovering policy that needs to be evaluated before a
> resource is accessed, or when using multiple round-trips is judged
> detrimental to performance. ]]
>
> Ie, you should be able to de-reference those URIs.

Good point.

> It is clear from
> Pierre Antoine Champin's interpretation of the RDF1.1 spec that
> genids are not dereferenceable when in .well-known space.

AFAICT, the RDF 1.1 spec says nothing either way about whether Skolem 
URIs should be dereferenceable:
http://www.w3.org/TR/rdf11-concepts/#h3_section-skolemization
So I think Pierre's interpretation is reflecting his own assumptions and 
perhaps common experience, but not the intent of the RDF 1.1 spec.

>
> So my take is that /.well-known/ URLs are meant to be a place you
> "look" for information by default when you don't know where else to
> look.

I think that was the original intent of RFC5785.  But when the RDF 
working group wanted to define a way to Skolemize blank nodes, 
.well-known was proposed as an existing mechanism that would fit the bill.

> Eg in the case of the web-finger protocol you start off with an
> e-mail address and wish to find the home page of that user, so you
> find the e-mail address domain and look up a file on that domain's
> .well-known position. Ideally you should always find something
> there....
>
> http://tools.ietf.org/html/rfc7033
>
> It's really not the right way to go about that: the correct way would
> have been to query the e-mail server for the e-mail address for that
> info. But you know: it's not because it's an RFC from the IETF that
> it's well thought through. And there are numerous problems with
> .well-known URLs just by themselves, without this weird and unnecessary
> twist given by the RDF1.1 spec on top of it.
>
>>
>>>>
>>>> So now every RDF linked data client would need to look at each
>>>> URI to see if it contains a ".well-known/genid" string to know
>>>> if it should follow it or not. That's pretty
>>>> un-linked-data-ish. Frankly I am quite surprised it made its way
>>>> through to the spec. The people supporting it must have made a
>>>> lot of noise.
>>>>
>>>>
>>>> Not everything is about your particular use case, Henry ;-)
>>>
>>> The arguments I am relying upon, which I will make explicit to
>>> you below, go way beyond my particular use case, and don't just
>>> take into account one spec, but the whole ecosystem of the web.
>>>
>>>>
>>>> RDF does not equate to linked data. It does not mandate URIs to
>>>> be dereferenceable. In that regard, genid URIs are no special
>>>> case, so they do not need the special treatment that you
>>>> suggest above. If you try to dereference them, you will get a
>>>> 404, that's all. It's not ideal in a Linked Data perspective
>>>> (though not lethal either), but it is perfectly acceptable from
>>>> the point of view of RDF.
>>>
>>> RDF 1.1 is part of a series of specifications, where each
>>> specification does its job. RDF is specified at the logical layer,
>>> so all it requires is the concept of an IRI. That is the concept of
>>> a name with a referent. It's not part of the mandate of RDF to
>>> specify how IRIs are meant to work.
>>>
>>> But the IRI specs on the other hand do have something to say on
>>> the issues, and so does the overriding habit of use on the web.
>>> That is that http, https, ftp and ftps URIs without a # fragment
>>> refer to resources on the web which can be accessed by making an
>>> HTTP GET on that resource. Minting http URIs with the aim that
>>> they would return a 404 is just extremely bad practice.
>>
>> Agreed.
>>
>>> A bit like a web site that had links that lead nowhere. Your web
>>> site would very soon be placed on the list of abandoned web
>>> sites, your ranking would fall dramatically in search engines,
>>> your user experience would be lousy, etc... ( And note that the
>>> RDF1.1 spec says nothing about this type of user experience
>>> either, but that does not mean it does not exist ).
>>>
>>> So I don't of course have anything against skolemisation, which
>>> makes perfect sense, but the example of a skolemisation URI used
>>> in RDF1.1 is absolutely repulsive, and SHOULD be removed as soon
>>> as possible.
>>
>> I assume the example you mean at
>> http://www.w3.org/TR/rdf11-concepts/#h3_section-skolemization is
>> the URI
>> http://example.com/.well-known/genid/d26a2d0e98334696f4ad70a677abc1f6
>>
>>
>> Are you objecting to that URI because it uses example.com instead of an
>> actual domain name, and therefore it is not dereferenceable?
>
> No, of course not.
>
>> Avoiding example.com for that reason would seem to me to defeat its
>> purpose.
>
> yes.
>
>> Or are you objecting to it because it is an http: URI, and you are
>> assuming that skolemized URIs will not be dereferenceable?
>> Servers can be set up to make them dereferenceable, but probably
>> not with as much value as normal URIs in Linked Data.
>
> The way the example has set it up, every genid will have its own
> information resource URL, which means that clients that don't know
> about this .well-known convention could end up making endless
> connections to a server.
>
> If it were at least a #Url such as
>
> http://example.com/.well-known/genid#d26a2d0e98334696f4ad70a677abc1f6
>
>  then the
>
> http://example.com/.well-known/genid
>
> resource could at least return a document describing itself as a
> genid document
>
> <> a genid:Document .

That's a reasonable idea, though I believe there should be a slash at 
the end of the path prefix: .well-known/genid/
So a #URL constructed that way would be:
http://example.com/.well-known/genid/#d26a2d0e98334696f4ad70a677abc1f6

AFAICT that *can* be done with RDF genids, even if the example does not 
show that syntax.  The RDF 1.1 spec only specifies the path prefix; the 
rest of the URI is up to the generator, and the usual trade-offs between 
slash and hash URIs would apply.
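
To make the slash and hash variants concrete, here is a minimal sketch of a generator. The function name mint_skolem_iri, its parameters, and the choice of a random UUID for the unique part are illustrative assumptions; the only part taken from the RDF 1.1 text is the /.well-known/genid/ path prefix.

```python
import uuid

# Path prefix from the RDF 1.1 Concepts section on Skolemization.
GENID_PREFIX = "/.well-known/genid/"


def mint_skolem_iri(authority: str, use_hash: bool = False, scheme: str = "http") -> str:
    """Return a fresh Skolem IRI under the .well-known/genid/ prefix.

    Hypothetical helper: the unique part is a random UUID in hex, which is
    only one of many ways a generator could fill in the rest of the URI.
    """
    unique = uuid.uuid4().hex
    separator = "#" if use_hash else ""
    return f"{scheme}://{authority}{GENID_PREFIX}{separator}{unique}"


if __name__ == "__main__":
    print(mint_skolem_iri("example.com"))                 # slash form
    print(mint_skolem_iri("example.com", use_hash=True))  # hash form
```

With the hash form, every Skolem IRI minted for a host shares the single document at .well-known/genid/, so a server would only need to serve one resource.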

>
> but in that case the .well-known location would be completely
> unimportant.

True, *if* the URI is dereferenceable to such a document *and* the 
client is online and wishes to spend the time to dereference it.  The 
big benefit of a .well-known/genid/ URI is that a client can reliably 
determine, by simple syntactic inspection of the URI, that the URI 
represents a Skolemized blank node.  This would be very important to 
tools that automatically perform Skolemization or de-Skolemization.
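
As a rough illustration of that syntactic inspection, the check could be as small as the following sketch. The function name is_skolem_iri is hypothetical, and limiting the test to http and https is an assumption on my part, based on RFC 5785 defining well-known URIs for those schemes.

```python
from urllib.parse import urlsplit

# Path prefix from the RDF 1.1 Concepts section on Skolemization.
GENID_PREFIX = "/.well-known/genid/"


def is_skolem_iri(iri: str) -> bool:
    """Return True if the IRI path starts with the .well-known/genid/ prefix.

    Assumption: only http and https IRIs are considered.
    """
    parts = urlsplit(iri)
    return parts.scheme in ("http", "https") and parts.path.startswith(GENID_PREFIX)


if __name__ == "__main__":
    candidates = [
        "http://example.com/.well-known/genid/d26a2d0e98334696f4ad70a677abc1f6",
        "http://example.com/.well-known/genid/#d26a2d0e98334696f4ad70a677abc1f6",
        "http://example.com/joe#me",
    ]
    for iri in candidates:
        print(iri, is_skolem_iri(iri))
```

Note that this check requires the trailing slash, so a hash URI built directly on .well-known/genid (without the slash) would not be recognized by it.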

>
> As a result of this broken "convention" that breaks web architecture,

As I explained, it does *not* violate the web architectural principle of 
URI opacity.

> all clients would now have to make a test for every URL they want to
> dereference to see if it does not start with .well-known/genid !!!
> This is unacceptable.

I think of it the other way around: .well-known/genid/ enables clients 
to *avoid* network accesses if they don't care about Skolem URIs.

I agree that minting lots of 404 URIs is bad: it is friendlier to make 
the URIs dereferenceable to useful information.  So maybe the question 
is: how useful should the returned information be, to justify the use of 
an http: URI instead of a URN?  If a generic document were returned like 
the one you showed above, or a human-oriented document were returned 
that pointed to the relevant section in the RDF 1.1 spec, would that be 
considered useful enough to justify the use of a dereferenceable http: URI?

>
> As I pointed out the correct way to deal with this is to create a
> bnode URN. No client would try to dereference it, so it would work
> correctly out of the box.

Okay if you never want those URIs to be dereferenceable.  But what if 
you want them to be dereferenceable, as cool URIs should be?

>
>>
>>>
>>> Instead they should choose a URN that does this or create a bnode
>>> URN type such as bnode:{domain}:{path}:{etag}:{identifier}
>>>
>>> where it is explicit that this URN cannot be dereferenced
>>
>> That might be better for skolem URIs that are not intended to be
>> dereferenceable.
>
> It is not just better, it is perfect for that situation. (Well, a
> specced-out and thought-through version of it would be perfect.)
>
>> But what if a decision is made later to make them dereferenceable?
>
> That's the clever thing about these skolem URIs I proposed. You can
> find the original document they are linked to,  by analysing the
> skolem URI.
>
> So imagine a client somehow has
> <bnodes:example.com/profile:etag1:bn20343>, then it can look up the
> document <https://example.com/profile> to find if the bnode has been
> given a dereferenceable URI. That document could contain a statement
> of the form
>
> <bnodes:example.com/profile:etag1:bn20343> owl:sameAs
> <https://example.com/joe#me> .
>
> to link the (secure) bnode to a dereferenceable URI.

Oh, I see.  Sorry I didn't pick up on that aspect before.

So you are proposing the use of a URN prefix, which is not normally 
dereferenceable, but *may* be usefully dereferenceable by a well-defined 
process of syntactic URI manipulation.  In contrast, the RDF WG adopted 
the use of an http: prefix, which is normally dereferenceable, but may 
*not* be usefully dereferenceable, as determined by a well-defined 
process of syntactic URI inspection.  In essence, you are proposing the 
opposite default from what the RDF WG adopted: instead of requiring the 
client to recognize the URI pattern to know that the URI is a Skolemized 
blank node and might *not* be dereferenceable, the client would be 
required to recognize the URN pattern to know that the (modified) URN 
*might* be dereferenceable.
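
To spell out what that URN manipulation might look like, here is a sketch based only on the single example in Henry's message, bnodes:example.com/profile:etag1:bn20343. No such URN scheme is specified anywhere, so the field layout (document, etag, identifier, split from the right on ":"), the names parse_bnode_urn and source_document_url, and the choice of https for the lookup are all assumptions.

```python
from typing import NamedTuple


class BnodeUrn(NamedTuple):
    document: str    # authority plus path of the source document
    etag: str        # version marker of that document
    identifier: str  # blank node label within the document


def parse_bnode_urn(urn: str) -> BnodeUrn:
    """Split a hypothetical bnodes: URN into its three parts."""
    scheme, sep, rest = urn.partition(":")
    if sep != ":" or scheme != "bnodes":
        raise ValueError(f"not a bnodes URN: {urn!r}")
    # Split from the right so that a ':' inside the document part survives.
    pieces = rest.rsplit(":", 2)
    if len(pieces) != 3 or not all(pieces):
        raise ValueError(f"malformed bnodes URN: {urn!r}")
    return BnodeUrn(*pieces)


def source_document_url(urn: str) -> str:
    """Return the URL of the document in which the blank node originated."""
    return "https://" + parse_bnode_urn(urn).document


if __name__ == "__main__":
    print(source_document_url("bnodes:example.com/profile:etag1:bn20343"))
```

A client would then fetch that document and look for an owl:sameAs statement about the URN, as in Henry's example.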

That's an interesting proposal.  I guess if the vast majority of Skolem 
URIs are not intended to be dereferenceable then that may be an improvement.

>
>> It would be bad to have to change them all.  It seems to me that a
>> better balance would be to make them http: URIs, but configure
>> servers to return a generic message each time any
>> .well-known/genid/ URI is dereferenced, pointing to the above
>> section of the RDF specs.
>
> Certainly better than a 404, but then you still have the issue that
> you are creating a dereferenceable  URI for something that most
> likely you don't want to be dereferenced, messing up the rest of the
> expectations on the Web. Furthermore if it is dereferenceable with a
> description of it as a bnode URL, then the .well-known URL is
> unnecessary.
>
> Again I think it would be better to have a URN with a method to find a
> good dereferenceable version of that URN. But at the very least a
> dereferenceable bnode URL should describe itself as being such a
> thing, and then we can avoid the .well-known answer.
>
>> OTOH, a client seeing an http: .well-known/genid URI could also
>> have different expectations about whether such URIs are likely to
>> be dereferenceable.
>
> That is what I don't want people to have to add to their Linked Data
> code. It is unnecessary. There are better answers that don't break
> expectations of good behavior of URLs.

But it does seem that you are trading that for requiring the client to 
recognize a special URN pattern instead, right?

David

>
>
>
> Henry
>
>>
>> David
>
> Social Web Architect http://bblfish.net/
>
>
>
>
Received on Wednesday, 24 September 2014 01:21:53 UTC