Re: genid example from RDF1.1 is bad

On 23 Sep 2014, at 21:46, David Booth <david@dbooth.org> wrote:

> Hi Henry,
> 
> On 09/23/2014 02:59 AM, henry.story@bblfish.net wrote:
>> I just noticed the section on using ".well-known" URIs for skolemisation
>> in the RDF1.1 spec.
>> This led to the following extract of a conversation on the Linked
>> Data Protocol mailing list.
>> I am 100% against that and believe it should be removed for the next
>> version of the RDF spec.
>> I also propose a path to an improvement for it.
>> 
>> On 23 Sep 2014, at 00:40, Pierre-Antoine Champin
>> <pierre-antoine.champin@liris.cnrs.fr> wrote:
>> 
>>> Hi Henry,
>>> 
>>> On Mon, Sep 22, 2014 at 4:06 PM, henry.story@bblfish.net wrote:
>>> 
>>> 
>>>    I find genids a pretty hackish part of the rdf1.1 spec frankly.
>>>    Genids are recognised apparently by analysing the schema
>>>    of the URI, which is pretty much against web architecture.
>>>    http://www.w3.org/TR/rdf11-concepts/#section-skolemization
> 
> I assume you are referring to the principle of URI opacity: that one should not make unlicensed assumptions about the nature of a URI-identified resource based on the syntax of the URI.

yes.

> But RDF genids are based on the .well-known path prefix in conformance with RFC 5785
> http://tools.ietf.org/html/rfc5785
> so this use of /.well-known/genid/ as a path prefix to indicate a skolemized blank node URI *is* licensed and does *not* violate the principle of URI opacity.


There is nothing in that RFC about this use for non-retrievable genids.
Rather, RFC 5785 contains a lot of text about a space for _retrievable_
metadata:

[[
   When this happens, it is common to designate a "well-known location"
   for such data, so that it can be easily located.
]]

[[
   Rather, they are designed to facilitate
   discovery of information on a site when it isn't practical to use
   other mechanisms; for example, when discovering policy that needs to
   be evaluated before a resource is accessed, or when using multiple
   round-trips is judged detrimental to performance.
]]

I.e., you should be able to dereference those URIs. It is clear from
Pierre-Antoine Champin's interpretation of the RDF1.1 spec that genids are not
dereferenceable when in .well-known space.

So my take is that .well-known URLs are meant to be a place you "look" for information by
default when you don't know where else to look. E.g. in the case of the WebFinger
protocol you start off with an e-mail address and wish to find the home page of
that user, so you take the domain of the e-mail address and look up a resource at that
domain's .well-known location. Ideally you should always find something there....

   http://tools.ietf.org/html/rfc7033
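
To make the "look in a default place" pattern concrete, here is a rough sketch
of how a client would build the WebFinger lookup URL from an e-mail address.
The function name is mine, and the sketch only builds the URL; it makes no
request and does not handle anything beyond plain user@domain addresses.

```python
from urllib.parse import urlencode


def webfinger_url(email: str) -> str:
    """Build the /.well-known/webfinger lookup URL for an e-mail style account."""
    # Split on the last "@": the left side is the user, the right side the domain.
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an e-mail address: {email!r}")
    # RFC 7033 passes the account as a percent-encoded "resource" query parameter.
    query = urlencode({"resource": f"acct:{email}"})
    return f"https://{domain}/.well-known/webfinger?{query}"


print(webfinger_url("joe@example.com"))
```

The point is that the client expects to GET something useful at that URL,
which is the opposite of what a non-retrievable genid offers.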

That is really not the right way to go about it: the correct way would have been to query
the e-mail server for the e-mail address for that info. But you know: just because it's
an RFC from the IETF does not mean it's well thought through. And there are numerous problems
with .well-known URLs just by themselves, without this weird and unnecessary twist given by
the RDF1.1 spec on top of it.

> 
>>> 
>>>    So now every RDF linked data client would need to look at each URI
>>>    to see if it contains a ".well-known/genid" string to know if it
>>>    should follow it
>>>    or not. That's pretty un linked-data-ish. Frankly I am quite
>>>    surprised it made its way through to the spec. The people
>>>    supporting it
>>>    must have made a lot of noise.
>>> 
>>> 
>>> Not everything is about your particular use case, Henry ;-)
>> 
>> The arguments I am relying upon, which I will make explicit to you
>> below, go way beyond my particular use case,
>> and don't just take into account one spec, but the whole ecosystem of
>> the web.
>> 
>>> 
>>> RDF does not equate linked data. It does not mandate URIs to be
>>> dereferenceable. In that regard, genid URIs are no special case, so they
>>> do not need the special treatment that you suggest above. If you try
>>> to dereference them, you will get a 404, that's all. It's not ideal in
>>> a Linked Data perspective (though not lethal either), but it is
>>> perfectly acceptable from the point of view of RDF.
>> 
>> RDF 1.1 is part of a series of specifications, where each specification
>> does its job. RDF is specified at the logical layer, so all it requires
>> is the concept of an IRI, that is, the concept of a name with a referent.
>> It's not part of the mandate of RDF to specify how IRIs are meant to work.
>> 
>> But the IRI specs on the other hand do have something to say on the
>> issue, and so does the overriding habit of use on the web. That is
>> that http, https, ftp and ftps URIs without a # fragment refer to
>> resources on the web which can be accessed by making an HTTP GET on that
>> resource. Minting http URIs with the aim that they would return a 404 is
>> just extremely bad practice.
> 
> Agreed.
> 
>> A bit like a web site that had links that
>> lead nowhere. Your web site would very soon be placed on the list of
>> abandoned web sites, your ranking would fall dramatically in
>> search engines, your user experience would be lousy, etc... ( And note
>> that the RDF1.1 spec says nothing about this type of user experience
>> either, but that does not mean it does not exist ).
>> 
>> So I don't of course have anything against skolemisation, which makes
>> perfect sense, but the example of a skolemisation URI
>> used in RDF1.1 is absolutely repulsive, and SHOULD be removed as soon as
>> possible.
> 
> I assume the example you mean at
> http://www.w3.org/TR/rdf11-concepts/#h3_section-skolemization
> is the URI
> http://example.com/.well-known/genid/d26a2d0e98334696f4ad70a677abc1f6
> 
> Are you objecting to that URI because it uses example.com instead of an actual
> domain name, and therefore it is not dereferenceable?  

No, of course not.

> Avoiding example.com for that reason would seem to me to defeat its purpose.  

yes.

> Or are you objecting to it because it is an http: URI, and you are assuming that skolemized
> URIs will not be dereferenceable?   Servers can be set up to make them dereferenceable, but
> probably not with as much value as normal URIs in Linked Data.

The way the example has set it up, every genid will have its own information 
resource URL, which means that clients that don't know about this .well-known 
convention could end up making endless connections to a server.

If it were at least a hash (#) URL such as

  http://example.com/.well-known/genid#d26a2d0e98334696f4ad70a677abc1f6

then the 

  http://example.com/.well-known/genid

resource could at least return a document describing itself as a genid document

  <> a genid:Document .

but in that case the .well-known location would be completely unimportant.
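
A small sketch of the difference, using made-up genid suffixes (a1, b2, c3):
since the fragment is never sent to the server, a client that strips it before
fetching ends up with one document for all the hash genids, but one document
per genid for the slash form.

```python
from urllib.parse import urldefrag


def documents_to_fetch(uris):
    """Return the distinct documents a client would GET for the given URIs."""
    # Drop the fragment first: only the part before "#" goes on the wire.
    return {urldefrag(uri).url for uri in uris}


ids = ("a1", "b2", "c3")  # illustrative genid suffixes, not from any spec
slash_genids = [f"http://example.com/.well-known/genid/{i}" for i in ids]
hash_genids = [f"http://example.com/.well-known/genid#{i}" for i in ids]

print(len(documents_to_fetch(slash_genids)))  # three separate documents
print(len(documents_to_fetch(hash_genids)))   # a single shared document
```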

As a result of this broken "convention" that breaks web architecture, all clients would
now have to test every URL they want to dereference to see
whether its path starts with /.well-known/genid/ !!! This is unacceptable.
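
For illustration, this is roughly the special case every Linked Data client
would have to carry around. It is only a sketch: the function names are mine,
and I am assuming the client only ever dereferences http and https URIs.

```python
from urllib.parse import urlsplit

# Path prefix the RDF 1.1 skolemization section reserves for skolem IRIs.
GENID_PREFIX = "/.well-known/genid/"


def is_skolem_genid(uri: str) -> bool:
    """True if the URI looks like an RDF 1.1 well-known skolem IRI."""
    parts = urlsplit(uri)
    return parts.scheme in ("http", "https") and parts.path.startswith(GENID_PREFIX)


def should_dereference(uri: str) -> bool:
    """Decide whether a client should attempt a GET on this URI."""
    if urlsplit(uri).scheme not in ("http", "https"):
        return False  # e.g. a URN: nothing to fetch, no special case needed
    # The extra test that the genid convention forces on every client.
    return not is_skolem_genid(uri)


print(should_dereference("https://example.com/joe#me"))
print(should_dereference(
    "http://example.com/.well-known/genid/d26a2d0e98334696f4ad70a677abc1f6"))
```

Note that the non-http branch needs no knowledge of any convention at all,
which is the whole argument for a URN.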

As I pointed out the correct way to deal with this is to create a bnode URN. No client
would try to dereference it, so it would work correctly out of the box.

> 
>> 
>> Instead they should choose a URN that does this or create a bnode URN
>> type such as
>>   bnode:{domain}:{path}:{etag}:{identifier}
>> 
>> where it is explicit that  this URN cannot be dereferenced
> 
> That might be better for skolem URIs that are not intended to be dereferenceable.

It is not just better, it is perfect for that situation (well, a specced-out
and thought-through version of it would be perfect).

>  But what if a decision is made later to make them dereferenceable?

That's the clever thing about these skolem URIs I proposed. You can find the original
document they are linked to by analysing the skolem URI.

So imagine a client somehow has <bnodes:example.com/profile:etag1:bn20343>, then
it can look up the document <https://example.com/profile> to find if the bnode has
been given a dereferenceable URI. That document could contain a statement of the form

   <bnodes:example.com/profile:etag1:bn20343> owl:sameAs <https://example.com/joe#me> .

to link the (secure) bnode to a dereferenceable URI.
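
As a very rough sketch of that lookup, assuming the unspecified "bnodes:"
form used in the example above (location, etag and identifier separated by the
last two colons) and assuming the source document is served over https. None
of this is specced anywhere; the names below are purely illustrative.

```python
from typing import NamedTuple


class BnodeRef(NamedTuple):
    document: str    # URL of the document the bnode came from
    etag: str        # version of that document
    identifier: str  # bnode label within that version


def parse_bnode_uri(uri: str, scheme: str = "bnodes") -> BnodeRef:
    """Split a hypothetical bnode URI into its source document, etag and label."""
    prefix = scheme + ":"
    if not uri.startswith(prefix):
        raise ValueError(f"not a {scheme} URI: {uri!r}")
    # Split from the right so a port in the location ("host:8080/...") survives.
    fields = uri[len(prefix):].rsplit(":", 2)
    if len(fields) != 3 or not all(fields):
        raise ValueError(f"malformed {scheme} URI: {uri!r}")
    location, etag, identifier = fields
    return BnodeRef(f"https://{location}", etag, identifier)


ref = parse_bnode_uri("bnodes:example.com/profile:etag1:bn20343")
print(ref.document)  # the document to fetch and search for an owl:sameAs
```

A client would then GET ref.document and look for an owl:sameAs statement
whose subject is the original bnode URI.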

>  It would be bad to have to change them all.  It seems to me that a better balance
> would be to make them http: URIs, but configure servers to return a generic message
> each time any .well-known/genid/ URI is dereferenced, pointing to the above section
> of the RDF specs.  

Certainly better than a 404, but then you still have the issue that you are creating a
dereferenceable URI for something that most likely you don't want to be dereferenced, messing
up the rest of the expectations on the Web. Furthermore if it is dereferenceable
with a description of it as a bnode URL, then the .well-known URL is unnecessary.

Again I think it would be better to have a URN with a method to find a good dereferenceable
version of that URN. But at the very least a dereferenceable bnode URL should describe
itself as being such a thing, and then we can avoid the .well-known answer.

> OTOH, a client seeing an http: .well-known/genid URI could also have different 
> expectations about whether such URIs are likely to be dereferenceable.

That is what I don't want people to have to add to their Linked Data code. It is 
unnecessary. There are better answers that don't break expectations of good behavior 
of URLs.



Henry

> 
> David

Social Web Architect
http://bblfish.net/

Received on Tuesday, 23 September 2014 20:46:21 UTC