Re: IRI vs. URI Reference in RDFa from Steve Harris on 2011-05-19 (public-rdf-wg@w3.org from May 2011)

From: Steve Harris <steve.harris@garlik.com>
Date: Thu, 19 May 2011 09:53:20 +0100
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: RDF Working Group <public-rdf-wg@w3.org>
Message-Id: <B0509B11-C414-4C38-AB14-AC1F5620C6D9@garlik.com>

I think that's the correct approach too.

In theory there are security issues (which is one reason Punycode exists) arising from URLs that look similar to other ones, but as RDF is designed for consumption by machines I don't see this as a serious issue:

<http://garlık.com/data.rdf> {   # n.b. not an "i"
   <falseThing> a <Fact> .
}

Is not going to fool any machines into believing that garlik.com asserted that data.
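
A minimal Python sketch of that point, assuming a consumer compares graph names code point by code point. The helper names `same_iri` and `ascii_host` are hypothetical and do not come from any RDF library; `ascii_host` relies on Python's built-in "idna" codec, which implements the older IDNA 2003 rules.

```python
import unicodedata

# Both strings are illustrative; the second uses U+0131 in place of "i".
REAL = "http://garlik.com/data.rdf"
SPOOF = "http://garl\u0131k.com/data.rdf"


def same_iri(a: str, b: str) -> bool:
    """Assumed equality rule: plain code-point-by-code-point comparison."""
    return a == b


def ascii_host(host: str) -> str:
    """ASCII (Punycode) form of a host name via Python's 'idna' codec."""
    return host.encode("idna").decode("ascii")


def differing_chars(a: str, b: str) -> list[tuple[str, str]]:
    """Unicode names of the characters that differ at the same position."""
    return [
        (unicodedata.name(x), unicodedata.name(y))
        for x, y in zip(a, b)
        if x != y
    ]


if __name__ == "__main__":
    print(same_iri(REAL, SPOOF))          # the two names do not match
    print(differing_chars(REAL, SPOOF))   # shows which character was swapped
    print(ascii_host("garl\u0131k.com"))  # an xn-- label, unlike garlik.com
```

A person may misread the two names, but neither the string comparison nor the Punycode form treats them as the same host.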

- Steve

On 2011-05-19, at 06:33, Manu Sporny wrote:

> BCC-cross-posted to: SWCG, RDFa WG
> 
> There is an RDF/RDF Web Apps/RDFa coordination issue that the RDF Web
> Apps group needs to have resolved in order to take RDFa Core 1.1 and
> XHTML+RDFa 1.1 into Candidate Recommendation. We are requesting input
> from RDF WG and coordination help from SW CG.
> 
> The basic question is what should an RDFa processor do when it comes
> across a value in an HTML document that looks like this:
> 
> <a rel="foaf:homepage"
>   href="http://www.schweizer-küche.de/">Schweizer Küche</a>
> 
> The issue is being tracked here (raised by Mischa):
> 
> http://www.w3.org/2010/02/rdfa/track/issues/87
> 
> We had a very long conversation about it on the telecon last week:
> 
> http://www.w3.org/2010/02/rdfa/meetings/2011-05-12#ISSUE__2d_87__3a__IRI_vs__2e__URI_References
> 
> So the question is whether or not the markup above should generate this:
> 
> <> foaf:homepage <http://www.schweizer-küche.de/> .
> 
> or should generate this:
> 
> <> foaf:homepage <http://www.xn--schweizer-kche-qsb.de/> .
> 
> There were good arguments both ways, but I believe that the RDFa WG
> settled on the RDFa processor not modifying the URL value when
> generating the triples for the following reasons:
> 
> 1) Punycoding URLs could change the meaning of the triple such that
>   matching rules written by the author would no longer match.
> 2) Punycoding URLs is culturally imperialistic - most of the world's
>   primary languages cannot be expressed in ASCII, we shouldn't
>   force punycoding on all languages "other than English".
> 3) Modifying URLs away from the author's intent, or away from well-known
>   transforms like relative-IRI to absolute-IRIs or normalized
>   IRIs, is bad. We shouldn't attempt to guess what the author meant.
> 4) IDN is a hack and should be dragged into the street and shot
>   (ok, so this is just my opinion :P)
> 
> So the general assertion is that RDFa Processors should only perform the
> following transformation on IRIs:
> 
> 1. Relative to Absolute IRI transformation.
> 
> That is, they shouldn't punycode and they shouldn't attempt to do any
> other processing on the IRI output by the processor. In other words,
> RDFa Processors shouldn't second-guess the document author. Thoughts?
> 
> -- manu
> 
> PS: This also, tangentially, re-opens the can of worms on equivalence
> testing for IRIs in RDF. Is http://example.com/rosé the same as
> HTTP://example.com/ros%C3%A9 for equivalence testing in RDF?
> 
> -- 
> Manu Sporny (skype: msporny, twitter: manusporny)
> President/CEO - Digital Bazaar, Inc.
> blog: PaySwarm Developer Tools and Demo Released
> http://digitalbazaar.com/2011/05/05/payswarm-sandbox/
> 
> 

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Thursday, 19 May 2011 08:53:51 UTC