- From: David Booth <david@dbooth.org>
- Date: Tue, 23 Sep 2014 21:21:24 -0400
- To: "henry.story@bblfish.net" <henry.story@bblfish.net>
- CC: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>, Semantic Web <semantic-web@w3.org>
On 09/23/2014 04:45 PM, henry.story@bblfish.net wrote:
>
> On 23 Sep 2014, at 21:46, David Booth <david@dbooth.org> wrote:
>
>> Hi Henry,
>>
>> On 09/23/2014 02:59 AM, henry.story@bblfish.net wrote:
>>> I just noticed the section on using ".well-known" URIs for
>>> skolemisation in the RDF1.1 spec. This led to the following
>>> extract of a conversation on the Linked Data Protocol mailing
>>> list. I am 100% against that and believe it should be removed for
>>> the next version of the RDF spec. I also propose a path to an
>>> improvement for it.
>>>
>>> On 23 Sep 2014, at 00:40, Pierre-Antoine Champin
>>> <pierre-antoine.champin@liris.cnrs.fr> wrote:
>>>
>>>> Hi Henry,
>>>>
>>>> On Mon, Sep 22, 2014 at 4:06 PM, henry.story@bblfish.net wrote:
>>>>
>>>>
>>>> I find genids a pretty hackish part of the RDF1.1 spec, frankly.
>>>> Genids are recognised apparently by analysing the schema of the
>>>> URI, which is pretty much against web architecture.
>>>> http://www.w3.org/TR/rdf11-concepts/#section-skolemization
>>
>> I assume you are referring to the principle of URI opacity: that
>> one should not make unlicensed assumptions about the nature of a
>> URI-identified resource based on the syntax of the URI.
>
> yes.
>
>> But RDF genids are based on the .well-known path prefix in
>> conformance with RFC 5785 http://tools.ietf.org/html/rfc5785 so
>> this use of .well-known/genid as a path prefix to indicate a
>> skolemized blank node URI *is* licensed and does *not* violate the
>> principle of URI opacity.
>
>
> There is nothing in that rfc about this use for non-retrievable
> genids.
> Rather RFC5785 contains a lot of text for _retrievable_
> information metadata space
>
> [[ When this happens, it is common to designate a "well-known
> location" for such data, so that it can be easily located ]]
>
> [[ Rather, they are designed to facilitate discovery of information
> on a site when it isn't practical to use other mechanisms; for
> example, when discovering policy that needs to be evaluated before a
> resource is accessed, or when using multiple round-trips is judged
> detrimental to performance. ]]
>
> Ie, you should be able to de-reference those URIs.

Good point.

> It is clear from
> Pierre Antoine Champin's interpretation of the RDF1.1 spec that
> genids are not dereferenceable when in .well-known space.

AFAICT, the RDF 1.1 spec says nothing either way about whether Skolem
URIs should be dereferenceable:
http://www.w3.org/TR/rdf11-concepts/#h3_section-skolemization
So I think Pierre's interpretation is reflecting his own assumptions
and perhaps common experience, but not the intent of the RDF 1.1 spec.

>
> So my take is that .well-known URLs are meant to be a place you
> "look" for information by default when you don't know where else to
> look.

I think that was the original intent of RFC5785. But when the RDF
working group wanted to define a way to Skolemize blank nodes,
.well-known was proposed as an existing mechanism that would fit the
bill.

> Eg in the case of the web-finger protocol you start off with an
> e-mail address and wish to find the home page of that user, so you
> find the e-mail address domain and look up a file on that domain's
> .well-known position. Ideally you should always find something
> there....
>
> http://tools.ietf.org/html/rfc7033
>
> It's really not the right way to go about that: the correct way would
> have been to query the e-mail server for the e-mail address for that
> info. But you know: it's not because it's an RFC from the IETF that
> it's well thought through.
> And there are numerous problems with
> .well-known URLs just by themselves, without this weird and
> unnecessary twist given by the RDF1.1 spec on top of it.
>
>>
>>>>
>>>> So now every RDF linked data client would need to look at each
>>>> URI to see if it contains a ".well-known/genid" string to know
>>>> if it should follow it or not. That's pretty
>>>> un-linked-data-ish. Frankly I am quite surprised it made its way
>>>> through to the spec. The people supporting it must have made a
>>>> lot of noise.
>>>>
>>>>
>>>> Not everything is about your particular use case, Henry ;-)
>>>
>>> The arguments I am relying upon, which I will make explicit to
>>> you below, go way beyond my particular use case, and don't just
>>> take into account one spec, but the whole ecosystem of the web.
>>>
>>>>
>>>> RDF does not equate to linked data. It does not mandate URIs to
>>>> be dereferenceable. In that regard, genid URIs are no special
>>>> case, so they do not need the special treatment that you
>>>> suggest above. If you try to dereference them, you will get a
>>>> 404, that's all. It's not ideal in a Linked Data perspective
>>>> (though not lethal either), but it is perfectly acceptable from
>>>> the point of view of RDF.
>>>
>>> RDF 1.1 is part of a series of specifications, where each
>>> specification does its job. It is specified at the logical layer,
>>> so all it requires is the concept of an IRI. That is the concept
>>> of a name with a referent. It's not part of the mandate of RDF to
>>> specify how IRIs are meant to work.
>>>
>>> But the IRI specs on the other hand do have something to say on
>>> the issues, and so does the overriding habit of use on the web.
>>> That is that http, https, ftp, ftps URIs without a # fragment
>>> refer to resources on the web which can be accessed by making an
>>> HTTP GET on that resource. Minting http URIs with the aim that
>>> they would return a 404 is just extremely bad practice.
>>
>> Agreed.
>>
>>> A bit like a web site that had links that lead nowhere. Your web
>>> site would very soon be placed on the list of abandoned web
>>> sites, your ranking would fall dramatically in search engines,
>>> your user experience would be lousy, etc... ( And note that the
>>> RDF1.1 spec says nothing about this type of user experience
>>> either, but that does not mean it does not exist ).
>>>
>>> So I don't of course have anything against skolemisation, which
>>> makes perfect sense, but the example of a skolemisation URI used
>>> in RDF1.1 is absolutely repulsive, and SHOULD be removed as soon
>>> as possible.
>>
>> I assume the example you mean at
>> http://www.w3.org/TR/rdf11-concepts/#h3_section-skolemization is
>> the URI
>> http://example.com/.well-known/genid/d26a2d0e98334696f4ad70a677abc1f6
>>
>>
>> Are you objecting to that URI because it uses example.com instead
>> of an actual domain name, and therefore it is not dereferenceable?
>
> No, of course not.
>
>> Avoiding example.com for that reason would seem to me to defeat its
>> purpose.
>
> yes.
>
>> Or are you objecting to it because it is an http: URI, and you are
>> assuming that skolemized URIs will not be dereferenceable?
>> Servers can be set up to make them dereferenceable, but probably
>> not with as much value as normal URIs in Linked Data.
>
> The way the example has set it up, every genid will have its own
> information resource URL, which means that clients that don't know
> about this .well-known convention could end up making endless
> connections to a server.
>
> If it were at least a #Url such as
>
> http://example.com/.well-known/genid#d26a2d0e98334696f4ad70a677abc1f6
>
> then the
>
> http://example.com/.well-known/genid
>
> resource could at least return a document describing itself as a
> genid document
>
> <> a genid:Document .

That's a reasonable idea, though I believe there should be a slash at
the end of the path prefix: .well-known/genid/ So a #URL constructed
that way would be:

http://example.com/.well-known/genid/#d26a2d0e98334696f4ad70a677abc1f6

AFAICT that *can* be done with RDF genids, even if the example does
not show that syntax. The RDF 1.1 spec only specifies the path prefix;
the rest of the URI is up to the generator, and the usual trade-offs
between slash and hash URIs would apply.

>
> but in that case the .well-known location would be completely
> unimportant.

True, *if* the URI is dereferenceable to such a document *and* the
client is online and wishes to spend the time to dereference it. The
big benefit of a .well-known/genid/ URI is that a client can reliably
determine, by simple syntactic inspection of the URI, that the URI
represents a Skolemized blank node. This would be very important to
tools that automatically perform Skolemization or de-Skolemization.

>
> As a result of this broken "convention" that breaks web architecture,

As I explained, it does *not* violate the web architectural principle
of URI opacity.

> all clients would now have to make a test for every URL they want to
> dereference to see if it does not start with .well-known/genid !!!
> This is unacceptable.

I think of it the other way around: .well-known/genid/ enables clients
to *avoid* network accesses if they don't care about Skolem URIs.

I agree that minting lots of 404 URIs is bad: it is friendlier to make
the URIs dereferenceable to useful information. So maybe the question
is: how useful should the returned information be, to justify the use
of an http: URI instead of a URN? If a generic document were returned
like the one you showed above, or a human-oriented document were
returned that pointed to the relevant section in the RDF 1.1 spec,
would that be considered useful enough to justify the use of a
dereferenceable http: URI?
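
The "simple syntactic inspection" mentioned above could look roughly
like the sketch below. It assumes the RDF 1.1 rule that a Skolem IRI
uses the http or https scheme and has a path starting with
/.well-known/genid/ ; the function name is_skolem_iri is made up for
illustration and is not taken from any RDF library.

```python
from urllib.parse import urlsplit

# Path prefix that RDF 1.1 reserves for Skolem IRIs (note the trailing slash).
GENID_PREFIX = "/.well-known/genid/"


def is_skolem_iri(iri: str) -> bool:
    """Return True if the IRI looks like a Skolemized blank node.

    Hypothetical helper: purely syntactic, no network access.
    """
    parts = urlsplit(iri)
    return parts.scheme in ("http", "https") and parts.path.startswith(GENID_PREFIX)


if __name__ == "__main__":
    label = "d26a2d0e98334696f4ad70a677abc1f6"
    for iri in (
        "http://example.com/.well-known/genid/" + label,   # slash form
        "http://example.com/.well-known/genid/#" + label,  # hash form, with slash
        "http://example.com/.well-known/genid#" + label,   # hash form, no slash
        "https://example.com/joe#me",                      # ordinary URI
    ):
        print(iri, is_skolem_iri(iri))
```

Under that reading of the spec, the hash form only matches when the
trailing slash is kept before the "#", which is why the slash in the
path prefix matters.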
>
> As I pointed out the correct way to deal with this is to create a
> bnode URN. No client would try to dereference it, so it would work
> correctly out of the box.

Okay if you never want those URIs to be dereferenceable. But what if
you want them to be dereferenceable, as cool URIs should be?

>
>>
>>>
>>> Instead they should choose a URN that does this or create a bnode
>>> URN type such as bnode:{domain}:{path}:{etag}:{identifier}
>>>
>>> where it is explicit that this URN cannot be dereferenced
>>
>> That might be better for skolem URIs that are not intended to be
>> dereferenceable.
>
> it is not just better it is perfect for that situation. (well a
> specced out and thought over version of it would be perfect ).
>
>> But what if a decision is made later to make them dereferenceable?
>
> That's the clever thing about these skolem URIs I proposed. You can
> find the original document they are linked to, by analysing the
> skolem URI.
>
> So imagine a client somehow has
> <bnodes:example.com/profile:etag1:bn20343>, then it can look up the
> document <https://example.com/profile> to find if the bnode has been
> given a dereferenceable URI. That document could contain a statement
> of the form
>
> <bnodes:example.com/profile:etag1:bn20343> owl:sameAs
> <https://example.com/joe#me> .
>
> to link the (secure) bnode to a dereferenceable URI.

Oh, I see. Sorry I didn't pick up on that aspect before. So you are
proposing the use of a URN prefix, which is not normally
dereferenceable, but *may* be usefully dereferenceable by a
well-defined process of syntactic URI manipulation. In contrast, the
RDF WG adopted the use of an http: prefix, which is normally
dereferenceable, but may *not* be usefully dereferenceable, as
determined by a well-defined process of syntactic URI inspection.

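
The "syntactic URI manipulation" that Henry's proposal would need can
be sketched as follows. Everything here is an assumption: the bnode URN
scheme was never specified, so the layout is inferred from the single
example <bnodes:example.com/profile:etag1:bn20343>, the https scheme
for the source document is taken from that same example, and the names
BnodeUrn and parse_bnode_urn are hypothetical.

```python
from typing import NamedTuple


class BnodeUrn(NamedTuple):
    """Parts of a hypothetical bnode URN (the scheme is not standardized)."""
    document_url: str  # document where the blank node originated
    etag: str          # version of that document
    identifier: str    # blank node label within the document


def parse_bnode_urn(urn: str) -> BnodeUrn:
    """Split a URN shaped like bnodes:{domain}/{path}:{etag}:{identifier}."""
    scheme, sep, rest = urn.partition(":")
    if not sep or scheme not in ("bnode", "bnodes"):
        raise ValueError(f"not a bnode URN: {urn!r}")
    # Split from the right so a port in the domain would not confuse parsing;
    # this assumes the etag and the identifier contain no colons.
    fields = rest.rsplit(":", 2)
    if len(fields) != 3 or not all(fields):
        raise ValueError(f"malformed bnode URN: {urn!r}")
    location, etag, identifier = fields
    # Assumption: the source document is served over https.
    return BnodeUrn("https://" + location, etag, identifier)


if __name__ == "__main__":
    print(parse_bnode_urn("bnodes:example.com/profile:etag1:bn20343"))
```

A client would then fetch document_url and look for an owl:sameAs
statement about the URN; that lookup step is left out of the sketch.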
In essence, you are proposing the opposite default from what the RDF
WG adopted: instead of requiring the client to recognize the URI
pattern to know that the URI is a Skolemized blank node and might
*not* be dereferenceable, the client would be required to recognize
the URN pattern to know that the (modified) URN *might* be
dereferenceable.

That's an interesting proposal. I guess if the vast majority of Skolem
URIs are not intended to be dereferenceable then that may be an
improvement.

>
>> It would be bad to have to change them all. It seems to me that a
>> better balance would be to make them http: URIs, but configure
>> servers to return a generic message each time any
>> .well-known/genid/ URI is dereferenced, pointing to the above
>> section of the RDF specs.
>
> Certainly better than a 404, but then you still have the issue that
> you are creating a dereferenceable URI for something that most
> likely you don't want to be dereferenced, messing up the rest of the
> expectations on the Web. Furthermore if it is dereferenceable with a
> description of it as a bnode URL, then the .well-known URL is
> unnecessary.
>
> Again I think it would be better to have a URN with a method to find
> a good dereferenceable version of that URN. But at the very least a
> dereferenceable bnode URL should describe itself as being such a
> thing, and then we can avoid the .well-known answer.
>
>> OTOH, a client seeing an http: .well-known/genid URI could also
>> have different expectations about whether such URIs are likely to
>> be dereferenceable.
>
> That is what I don't want people to have to add to their Linked Data
> code. It is unnecessary. There are better answers that don't break
> expectations of good behavior of URLs.

But it does seem that you are trading that for requiring the client to
recognize a special URN pattern instead, right?

David

>
> Henry
>
>>
>> David
>
> Social Web Architect http://bblfish.net/

Received on Wednesday, 24 September 2014 01:21:53 UTC