Re: URI Comparisons: RFC 2616 vs. RDF

David Wood wrote:
> On Jan 19, 2011, at 10:59, Nathan wrote:
>> ps: as an illustration of how engrained URI normalization is, I've capitalized the domain names in the to: and cc: fields, I do hope the mail still comes through, and hope that you'll accept this email as being sent to you. Hopefully we'll also find this mail in the archives shortly at htTp://lists.W3.org/Archives/Public/public-lod/2011Jan/ - Personally I'd hope that any statements made using these URIs (asserted by man or machine) would remain valid regardless of the (incorrect?-)casing.
> 
> Heh.  OK, I'll bite.  Domain names in email addressing are defined in IETF RFC 2822 (and its predecessor RFC 822), which defers the interpretation to RFC 1035 ("Domain names - implementation and specification").  RFC 1035 section 2.3.3 states that domain names in DNS, and therefore in (E)SMTP, are to be compared in a case-insensitive manner.
> 
> As far as I know, the W3C specs do not so refer to RFC 1035.

And I'll bite in the other direction: why not treat URIs as URIs? Why go 
against both the RDF specification [1] and the URI specification when 
they say /not/ to encode permitted US-ASCII characters (like ~ as %7E)? 
Why force case-sensitive matching on the scheme and domain of URIs 
matching the generic syntax when the specs say they must be compared 
case insensitively? And so on and so forth.
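
To make the comparison concrete, here is a minimal sketch (Python, 
standard library only) of the kind of normalization I mean: lowercase 
the scheme and host, decode %-escapes of unreserved characters, and 
uppercase the hex digits of any escapes that remain. The name 
normalize_uri is my own, and the sketch is a simplification: it 
lowercases the whole authority and does not touch dot segments or 
default ports.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Unreserved characters of the generic URI syntax: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def _fix_escape(match):
    """Decode an escape of an unreserved character, else uppercase its hex digits."""
    ch = chr(int(match.group(1), 16))
    return ch if ch in UNRESERVED else "%" + match.group(1).upper()


def normalize_uri(uri):
    """Hypothetical helper: case and %-encoding normalization only."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    # Simplification: lowercases the whole authority, including any userinfo.
    netloc = parts.netloc.lower()
    path = _PCT.sub(_fix_escape, parts.path)
    query = _PCT.sub(_fix_escape, parts.query)
    fragment = _PCT.sub(_fix_escape, parts.fragment)
    return urlunsplit((scheme, netloc, path, query, fragment))


a = "htTp://EXAMPLE.org/%7efoo"
b = "http://example.org/~foo"
print(a == b)                                # False: plain string comparison
print(normalize_uri(a) == normalize_uri(b))  # True: equal after normalization
```

Escapes of reserved characters (%2F and friends) are deliberately left 
encoded, since decoding them would change the meaning of the URI.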

I have to be honest: I can't see what good this is doing anybody. In 
fact it does the complete opposite: data is being junked and scrapped 
because we are ignoring the specifications which are designed to enable 
interoperability and limit unexpected behaviour.

I'm currently preparing a list of errors I'm finding in RDF, RDFa and 
linked data tooling to do with this, and I have to admit even I'm 
surprised at the sheer number of tools which are affected.

Additionally there's a very nasty, and common, use case which I can't 
test fully, so would appreciate people taking the time to check their 
own libraries/clients, as follows:

If you find some data with the following setup (example):

   @base <htTp://EXAMPLE.org/foo/bar> .
   <#t> x:rel <../baz> .

and then you "follow your nose" to <htTp://EXAMPLE.org/baz>, will you 
find any triples about it? (problem 1) and if there's no base on the 
second resource, and it uses relative URIs, then the base you'll be 
using is <htTp://EXAMPLE.org/baz>, and thus, you'll effectively create a 
new set of statements which the author never wrote, or intended (problem 2).

In other words, in this scenario, no matter what you do you're either 
going to get no data (even though it's there) or get a set of statements 
which were never said by the author (because the casing is different).
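
If you want to check your own tooling, the scenario can be sketched 
roughly as below (Python, standard library only). Everything in it is 
made up for illustration: the store dict, the x:label triple, the #me 
fragment and the lower_scheme_and_host helper are my assumptions, not 
anybody's actual API. The store stands in for any triple store that 
matches URIs by plain string comparison.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def lower_scheme_and_host(uri):
    """Hypothetical helper: lowercase only the scheme and authority."""
    p = urlsplit(uri)
    return urlunsplit((p.scheme.lower(), p.netloc.lower(), p.path, p.query, p.fragment))


base = "htTp://EXAMPLE.org/foo/bar"
target = urljoin(base, "../baz")  # the path resolves to /baz

# Made-up store keyed by plain string, as the publisher of /baz wrote the URI.
store = {
    "http://example.org/baz": [("http://example.org/baz", "x:label", "baz")],
}

# Problem 1: a case-sensitive lookup finds nothing, although the data is there.
found_exact = store.get(target, [])
found_normalized = store.get(lower_scheme_and_host(target), [])
print(len(found_exact), len(found_normalized))

# Problem 2: the second document has no base, so the URI we followed becomes
# the base, and its relative references resolve to URIs the author never wrote.
as_followed = urljoin(target, "#me")
as_published = urljoin("http://example.org/baz", "#me")
print(as_followed == as_published)
```

The exact casing of the resolved URI depends on how much your library 
normalizes during resolution, which is precisely the inconsistency I'm 
complaining about.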

Further, in essentially all RDFa ever encountered by a browser, the 
casing of every URI in href and src, and of every URI that gets 
resolved, is automatically normalized. So even if you set the base to 
<htTp://EXAMPLE.org/> or use it in a URI, browser tools, extensions, and 
JS-based libraries will only ever see the normalized URIs (and thus be 
incompatible with the rest of the RDF world).

I'll continue gathering the specific examples for current RDF tooling 
and resources and put them on the wiki, but I'll say now that almost 
every tool I've encountered so far "does it wrong" in inconsistent, 
incompatible ways.

Finally, I'll ask again, if anybody has any use case which benefits from 
<htTp://EXAMPLE.org/%7efoo> and <http://example.org/~foo> being classed 
as different RDF URIs, I'd love to hear it.

[1] """The encoding consists of: ... 2. %-escaping octets that do not 
correspond to permitted US-ASCII characters."""
  - http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref

Best,

Nathan

Received on Wednesday, 19 January 2011 21:46:21 UTC