Re: genid example from RDF1.1 is bad from henry.story@bblfish.net on 2014-09-24 (semantic-web@w3.org from September 2014)

From: <henry.story@bblfish.net>
Date: Wed, 24 Sep 2014 07:34:49 +0200
To: David Booth <david@dbooth.org>
Cc: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>, Semantic Web <semantic-web@w3.org>
Message-Id: <DD947C73-C641-4AAF-97FB-8EF67619A8E2@bblfish.net>
On 24 Sep 2014, at 03:21, David Booth <david@dbooth.org> wrote:

> On 09/23/2014 04:45 PM, henry.story@bblfish.net wrote:
>> 
>> On 23 Sep 2014, at 21:46, David Booth <david@dbooth.org> wrote:
>> 
>>> Hi Henry,
>>> 
>>> On 09/23/2014 02:59 AM, henry.story@bblfish.net wrote:
>>>> I just noticed the section on using ".well-known" URIs for
>>>> skolemisation in the RDF1.1 spec. This led to the following
>>>> extract of a conversation on the Linked Data Protocol mailing
>>>> list. I am 100% against that and believe it should be removed for
>>>> the next version of the RDF spec. I also propose a path to an
>>>> improvement for it.
>>>> 
>>>> On 23 Sep 2014, at 00:40, Pierre-Antoine Champin
>>>> <pierre-antoine.champin@liris.cnrs.fr
>>>> <mailto:pierre-antoine.champin@liris.cnrs.fr>> wrote:
>>>> 
>>>>> Hi Henry,
>>>>> 
>>>>> On Mon, Sep 22, 2014 at 4:06 PM, henry.story@bblfish.net
>>>>> <mailto:henry.story@bblfish.net> <henry.story@bblfish.net
>>>>> <mailto:henry.story@bblfish.net>> wrote:
>>>>> 
>>>>> 
>>>>> I find genids a pretty hackish part of the rdf1.1 spec frankly.
>>>>> Genids are recognised apparently by analysing the schema of the
>>>>> URI, which is pretty much against web architecture.
>>>>> http://www.w3.org/TR/rdf11-concepts/#section-skolemization
>>> 
>>> I assume you are referring to the principle of URI opacity: that
>>> one should not make unlicensed assumptions about the nature of a
>>> URI-identified resource based on the syntax of the URI.
>> 
>> yes.
>> 
>>> But RDF genids are based on the .well-known path prefix in conformance
>>> with RFC 5785 http://tools.ietf.org/html/rfc5785 so this use of
>>> .well-known/genid as a path prefix to indicate a skolemized blank
>>> node URI *is* licensed and does *not* violate the principle of URI
>>> opacity.
>> 
>> 
>> There is nothing in that RFC about this use for non-retrievable
>> genids. Rather, RFC5785 contains a lot of text about a _retrievable_
>> information metadata space:
>> 
>> [[ When this happens, it is common to designate a "well-known
>> location" for such data, so that it can be easily located. ]]
>> 
>> [[ Rather, they are designed to facilitate discovery of information
>> on a site when it isn't practical to use other mechanisms; for
>> example, when discovering policy that needs to be evaluated before a
>> resource is accessed, or when using multiple round-trips is judged
>> detrimental to performance. ]]
>> 
>> I.e., you should be able to dereference those URIs.
> 
> Good point.
> 
>> It is clear from
>> Pierre Antoine Champin's interpretation of the RDF1.1 spec that
>> genids are not dereferenceable when in .well-known space.
> 
> AFAICT, the RDF 1.1 spec says nothing either way about whether Skolem URIs should be dereferenceable:
> http://www.w3.org/TR/rdf11-concepts/#h3_section-skolemization
> So I think Pierre's interpretation is reflecting his own assumptions and perhaps common experience, but not the intent of the RDF 1.1 spec.

If it is not the intent of the RDF 1.1 spec, then it is incomplete, because it says nothing
about what should be found at such a location, which is a reasonable expectation to have
for something that is dereferenceable. For example, how could such a document say: any #URI
referred to via this document is a bnode (even if we don't describe it)?

> 
>> 
>> So my take is that .well-known URLs are meant to be a place you
>> "look" for information by default when you don't know where else to
>> look.
> 
> I think that was the original intent of RFC5785.  But when the RDF working group wanted to define a way to Skolemize blank nodes, .well-known was proposed as an existing mechanism that would fit the bill.


yes, it seems like it was done in a hurry, and other options that would have
been more appropriate were not considered in enough depth.

Consider that RFC5785 states in the FAQ at the end of the document:

[[
 1. Aren't well-known locations bad for the Web?

      They are, but for various reasons -- both technical and social --
      they are commonly used and their use is increasing.  This memo
      defines a "sandbox" for them, to reduce the risks of collision and
      to minimise the impact upon pre-existing URIs on sites.
]]



> 
>> E.g. in the case of the WebFinger protocol you start off with an
>> e-mail address and wish to find the home page of that user, so you
>> find the e-mail address domain and look up a file on that domain's
>> .well-known position. Ideally you should always find something
>> there....
>> 
>> http://tools.ietf.org/html/rfc7033
>> 
>> It's really not the right way to go about that: the correct way would
>> have been to query the e-mail server for the e-mail address for that
>> info. But you know: it's not because it's an RFC from the IETF that
>> it's well thought through. And there are numerous problems with
>> .well-known URLs just by themselves, without this weird and unnecessary
>> twist given by the RDF1.1 spec on top of it.
>> 
>>> 
>>>>> 
>>>>> So now every RDF linked data client would need to look at each
>>>>> URI to see if it contains a ".well-known/genid" string to know
>>>>> if it should follow it or not. That's pretty
>>>>> un-linked-data-ish. Frankly I am quite surprised it made its way
>>>>> through to the spec. The people supporting it must have made a
>>>>> lot of noise.
>>>>> 
>>>>> 
>>>>> Not everything is about your particular use case, Henry ;-)
>>>> 
>>>> The arguments I am relying upon, which I will make explicit to
>>>> you below, go way beyond my particular use case, and don't just
>>>> take into account one spec, but the whole ecosystem of the web.
>>>> 
>>>>> 
>>>>> RDF does not equate linked data. It does not mandate URIs to
>>>>> be dereferenceable. In that regard, genid URIs are no special
>>>>> case, so they do not need the special treatment that you
>>>>> suggest above. If you try to dereference them, you will get a
>>>>> 404, that's all. It's not ideal in a Linked Data perspective
>>>>> (though not lethal either), but it is perfectly acceptable from
>>>>> the point of view of RDF.
>>>> 
>>>> RDF 1.1 is part of a series of specifications, where each
>>>> specification does its job. RDF is specified at the logical layer, so
>>>> all it requires is the concept of an IRI. That is the concept of a
>>>> name with a referent. It's not part of the mandate of RDF to
>>>> specify how IRIs are meant to work.
>>>> 
>>>> But the IRI specs on the other hand do have something to say on
>>>> the issue, and so does the overriding habit of use on the web.
>>>> That is that http, https, ftp and ftps URIs without a # fragment
>>>> refer to resources on the web which can be accessed by making an
>>>> HTTP GET on that resource. Minting http URIs with the aim that
>>>> they would return a 404 is just extremely bad practice.
>>> 
>>> Agreed.
>>> 
>>>> A bit like a web site that had links that lead nowhere. Your web
>>>> site would very soon be placed on the list of abandoned web
>>>> sites, your ranking would fall dramatically in search engines,
>>>> your user experience would be lousy, etc... ( And note that the
>>>> RDF1.1 spec says nothing about this type of user experience
>>>> either, but that does not mean it does not exist ).
>>>> 
>>>> So I don't of course have anything against skolemisation, which
>>>> makes perfect sense, but the example of a skolemisation URI used
>>>> in RDF1.1 is absolutely repulsive, and SHOULD be removed as soon
>>>> as possible.
>>> 
>>> I assume the example you mean at
>>> http://www.w3.org/TR/rdf11-concepts/#h3_section-skolemization is
>>> the URI
>>> http://example.com/.well-known/genid/d26a2d0e98334696f4ad70a677abc1f6
>>> 
>>> 
>>> 
>>> Are you objecting to that URI because it uses example.com instead of an actual
>>> domain name, and therefore it is not dereferenceable?
>> 
>> No, of course not.
>> 
>>> Avoiding example.com for that reason would seem to me to defeat its
>>> purpose.
>> 
>> yes.
>> 
>>> Or are you objecting to it because it is an http: URI, and you are
>>> assuming that skolemized URIs will not be dereferenceable?
>>> Servers can be set up to make them dereferenceable, but probably
>>> not with as much value as normal URIs in Linked Data.
>> 
>> The way the example has set it up, every genid will have its own
>> information resource URL, which means that clients that don't know
>> about this .well-known convention could end up making endless
>> connections to a server.
>> 
>> If it were at least a #URL such as
>> 
>> http://example.com/.well-known/genid#d26a2d0e98334696f4ad70a677abc1f6
>> 
>> then the
>> 
>> http://example.com/.well-known/genid
>> 
>> resource could at least return a document describing itself as a
>> genid document
>> 
>> <> a genid:Document .
> 
> That's a reasonable idea, though I believe there should be a slash at the end of the path prefix: .well-known/genid/
> So a #URL constructed that way would be:
> http://example.com/.well-known/genid/#d26a2d0e98334696f4ad70a677abc1f6
> 
> AFAICT that *can* be done with RDF genids, even if the example does not show that syntax.  The RDF 1.1 spec only specifies the path prefix; the rest of the URI is up to the generator, and the usual trade-offs between slash and hash URIs would apply.
> 
>> 
>> but in that case the .well-known location would be completely
>> unimportant.
> 
> True, *if* the URI is dereferenceable to such a document *and* the client is online and wishes to spend the time to dereference it.  The big benefit of a .well-known/genid/ URI is that a client can reliably determine, by simple syntactic inspection of the URI, that the URI represents a Skolemized blank node.  This would be very important to tools that automatically perform Skolemization or de-Skolemization.

yes, it is this interest in skolemization/de-skolemization that is the important use case.
And it is also why the "dereferenceability" of bnode URIs is a minor minor minor use case.
It is also why it was so important to have a syntactic method to determine if a URI is
a skolemized bnode, for otherwise a dereferenceable document that described itself as a
repository of bnodes would have

 - been the easy solution
 - not required registering a path in the .well-known space
 - been usable by people who don't have access to the .well-known URI subdirectory
 - not been a patch-up solution, as RFC5785 itself officially explains
 - struck intelligent people like Pierre-Antoine Champin as the obvious solution, instead
   of them declaring on a Linked Data mailing list that the default 404 was the obvious interpretation
 - ...
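
A minimal sketch of the syntactic check discussed above, in Python. The only
thing taken from the RDF 1.1 text is the /.well-known/genid/ path prefix; the
function name and the restriction to http/https are my own assumptions.

```python
from urllib.parse import urlsplit

# Path prefix named in the RDF 1.1 skolemization section.
GENID_PREFIX = "/.well-known/genid/"


def is_skolem_iri(iri: str) -> bool:
    """Return True if the IRI looks like a skolemized blank node.

    Hypothetical helper: it only inspects the scheme and the path,
    and never touches the network.
    """
    parts = urlsplit(iri)
    return parts.scheme in ("http", "https") and parts.path.startswith(GENID_PREFIX)


if __name__ == "__main__":
    for iri in (
        "http://example.com/.well-known/genid/d26a2d0e98334696f4ad70a677abc1f6",
        "http://example.com/.well-known/genid/#d26a2d0e98334696f4ad70a677abc1f6",
        "http://example.com/.well-known/genid#d26a2d0e98334696f4ad70a677abc1f6",
        "http://example.com/profile",
    ):
        print(iri, is_skolem_iri(iri))
```

Note that with the trailing slash in the prefix, the slash form and David's
"genid/#..." form are both recognised, while the "genid#..." form without the
slash is not.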

> 
>> 
>> As a result of this broken "convention" that breaks web architecture,
> 
> As I explained, it does *not* violate the web architectural principle of URI opacity.
> 
>> all clients would now  have to make a test for every URL they want to
>> dereference to see if it does not start  with .well-known/genid !!!
>> This is unacceptable.
> 
> I think of it the other way around: .well-known/genid/ enables clients to *avoid* network accesses if they don't care about Skolem URIs.
> 
> I agree that minting lots of 404 URIs is bad: it is friendlier to make the URIs dereferenceable to useful information.  So maybe the question is: how useful should the returned information be, to justify the use of an http: URI instead of a URN?  If a generic document were returned like the one you showed above, or a human-oriented document were returned that pointed to the relevant section in the RDF 1.1 spec, would that be considered useful enough to justify the use of a dereferenceable http: URI?
> 
>> 
>> As I pointed out the correct way to deal with this is to create a
>> bnode URN. No client would try to dereference it, so it would work
>> correctly out of the box.
> 
> Okay if you never want those URIs to be dereferenceable.  But what if you want them to be dereferenceable, as cool URIs should be?
> 
>> 
>>> 
>>>> 
>>>> Instead they should choose a URN that does this or create a bnode
>>>> URN type such as bnode:{domain}:{path}:{etag}:{identifier}
>>>> 
>>>> where it is explicit that  this URN cannot be dereferenced
>>> 
>>> That might be better for skolem URIs that are not intended to be
>>> dereferenceable.
>> 
>> it is not just better, it is perfect for that situation (well, a
>> specced-out and thought-through version of it would be perfect).
>> 
>>> But what if a decision is made later to make them dereferenceable?
>> 
>> That's the clever thing about these skolem URIs I proposed. You can
>> find the original document they are linked to,  by analysing the
>> skolem URI.
>> 
>> So imagine a client somehow has
>> <bnodes:example.com/profile:etag1:bn20343>, then it can look up the
>> document <https://example.com/profile> to find if the bnode has been
>> given a dereferenceable URI. That document could contain a statement
>> of the form
>> 
>> <bnodes:example.com/profile:etag1:bn20343> owl:sameAs
>> <https://example.com/joe#me> .
>> 
>> to link the (secure) bnode to a dereferenceable URI.
> 
> Oh, I see.  Sorry I didn't pick up on that aspect before.

:-)
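
To make the lookup step concrete, here is a rough sketch in Python. The bnode
URN syntax is not specified anywhere yet, so everything here is an assumption:
I take the shape of the example quoted above (bnodes:{domain/path}:{etag}:{identifier}),
split from the right, and assume the document lives at https. The names
BnodeUrn and parse_bnode_urn are made up for illustration.

```python
from typing import NamedTuple


class BnodeUrn(NamedTuple):
    document_url: str  # where a client could look for an owl:sameAs statement
    etag: str
    identifier: str


def parse_bnode_urn(urn: str, scheme: str = "bnodes") -> BnodeUrn:
    """Split a (hypothetical) bnode URN into its parts.

    Assumed form: bnodes:{domain/path}:{etag}:{identifier}
    """
    prefix = scheme + ":"
    if not urn.startswith(prefix):
        raise ValueError(f"not a {scheme}: URN: {urn!r}")
    rest = urn[len(prefix):]
    # Split from the right so that colons in the location part survive.
    pieces = rest.rsplit(":", 2)
    if len(pieces) != 3 or not all(pieces):
        raise ValueError(f"malformed bnode URN: {urn!r}")
    location, etag, identifier = pieces
    # Assumption: the originating document is served over https.
    return BnodeUrn("https://" + location, etag, identifier)


if __name__ == "__main__":
    print(parse_bnode_urn("bnodes:example.com/profile:etag1:bn20343"))
```

A client would then GET the document_url and look there for a statement linking
the bnode URN to a dereferenceable URI.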

> 
> So you are proposing the use of a URN prefix, which is not normally dereferenceable, but *may* be usefully dereferenceable by a well-defined process of syntactic URI manipulation.  In contrast, the RDF WG adopted the use of an http: prefix, which is normally dereferenceable, but may *not* be usefully dereferenceable, as determined by a well-defined process of syntactic URI inspection.  In essence, you are proposing the opposite default from what the RDF WG adopted: instead of requiring the client to recognize the URI pattern to know that the URI is a Skolemized blank node and might *not* be dereferenceable, the client would be required to recognize the URN pattern to know that the (modified) URN *might* be dereferenceable.
> 
> That's an interesting proposal.  I guess if the vast majority of Skolem URIs are not intended to be dereferenceable then that may be an improvement.

yes, and I think it is pretty clear that they are not meant for that. The key use cases
that I have heard of were:

  1. to be able to easily patch a document 
     strong bnodes make it exceedingly easy to support a PATCH format 
     eg: http://lists.w3.org/Archives/Public/public-ldp-wg/2014Sep/0034.html

  2. to make it easy to do queries from a particular bnode.
     eg: I explored a graph with a sparql query and I receive a bnode, but I'd like
     to ask another query about that particular bnode

> 
>> 
>>> It would be bad to have to change them all.  It seems to me that a
>>> better balance would be to make them http: URIs, but configure
>>> servers to return a generic message each time any
>>> .well-known/genid/ URI is dereferenced, pointing to the above
>>> section of the RDF specs.
>> 
>> Certainly better than a 404, but then you still have the issue that
>> you are creating a dereferenceable  URI for something that most
>> likely you don't want to be dereferenced, messing up the rest of the
>> expectations on the Web. Furthermore if it is dereferenceable with a
>> description of it as a bnode URL, then the .well-known URL is
>> unnecessary.
>> 
>> Again I think it would be better to have a URN with a method to find a
>> good dereferenceable version of that URN. But at the very least a
>> dereferenceable bnode URL should describe itself as being such a
>> thing, and then we can avoid the .well-known answer.
>> 
>>> OTOH, a client seeing an http: .well-known/genid URI could also
>>> have different expectations about whether such URIs are likely to
>>> be dereferenceable.
>> 
>> That is what I don't want people to have to add to their Linked Data
>> code. It is unnecessary. There are better answers that don't break
>> expectations of good behavior of URLs.
> 
> But it does seem that you are trading that for requiring the client to recognize a special URN pattern instead, right?

All linked data clients need to look at the protocol part of a URI before deciding to dereference it.
All existing code that does this just needs to look at the first few characters of the string of the URI.
All that code will work fine.

Code that needs to deal with bnodes will need to have special functionality, and so it is up to
that code to learn about the to-be-developed bnode URN. All other code will treat URNs and bnodes
pretty much equivalently - which is exactly the feature looked for.
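
As a rough illustration of that scheme check, in Python. The set of schemes a
client is willing to fetch and the function name are assumptions of mine, not
taken from any spec or library.

```python
# Schemes this (hypothetical) linked data client knows how to fetch.
DEREFERENCEABLE_SCHEMES = frozenset({"http", "https"})


def should_dereference(uri: str) -> bool:
    """Decide from the scheme alone whether a client would try to fetch the URI."""
    scheme, colon, _rest = uri.partition(":")
    return bool(colon) and scheme.lower() in DEREFERENCEABLE_SCHEMES


if __name__ == "__main__":
    for uri in (
        "https://example.com/joe#me",
        "urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66",
        "bnodes:example.com/profile:etag1:bn20343",
        "http://example.com/.well-known/genid/d26a2d0e98334696f4ad70a677abc1f6",
    ):
        print(uri, should_dereference(uri))
```

A client written this way skips the urn: and bnodes: identifiers without any
extra knowledge, but it would still try to fetch the .well-known/genid URI
unless it is also taught about that path prefix.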


> 
> David
> 
>> 
>> 
>> 
>> Henry
>> 
>>> 
>>> David
>> 
>> Social Web Architect http://bblfish.net/

Social Web Architect
http://bblfish.net/
Received on Wednesday, 24 September 2014 05:35:20 UTC