Re: unicode escapes in prefix names

* Richard Cyganiak <richard@cyganiak.de> [2011-11-23 15:15+0000]
> On 23 Nov 2011, at 14:50, Eric Prud'hommeaux wrote:
> > Of course, URL minters can mint whatever they want, but the mapping to URI (for that popular GET protocol) *loses* '%'s. So a reason to avoid excessive %-ification is that, when you push it through the standard processing at the far end, say, Apache's mapping to a filename, those lost '%'s don't come back. As an example, <http://example.com/R&D> and <http://example.com/R%26D> map to the same URL (Apache will look for <server root>/R&D).
> 
> +1
> 
> > I've seen short exemplars bandied about, but the ones I deal with realistically are IRIs mapped from protein identifiers which have ':'s in them. I have a nice syntax for writing most of my queries and most of my data, nicely categorized by namespace prefixes which helps me visually distinguish proteins from mechanisms from drugs. But if I'm unlucky enough to need to reference one with a ':' in it, I'm not allowed to use the obvious escaping syntax? Instead I have to throw all that away and have a big opaque IRI in the middle of some otherwise organized data or query?
> 
> Yup, you need to use a full IRI. On the plus side, you don't need to look up unicode code points or do hexadecimal arithmetic.
> 
> I think the average user would rather use a full IRI than figure out how to turn the dot at the end of the IRI into a unicode escape sequence.
> 
> In many ways, expanding the prefix and wrapping everything into <…> is a friendlier escaping mechanism than looking up unicode code points.

I see it as momentarily easier to author, but much harder to read, debug or maintain; a large community would exploit escapes in prefixed names.

I'm not sure I see the reasoning against including escapes in the grammar for prefixed names. It's a minimal grammar delta from allowing them in IRIs and literals (I added <U_CHAR> to <PN_CHARS_BASE> in <http://www.w3.org/2005/01/yacker/uploads/turtleEsc?lang=perl&markup=html#term-turtleEsc-UCHAR>). It doesn't allow any more invalid IRI forms than does <IRI_REF> (and we can always demand that implementors validate against [^<>\"{}|^`\\] - [#0000-#20]), it's closer to syntactic compatibility with SPARQL 1.0 escapes, and it's trivial for an implementor to call the same un-escaping code for prefixed name components as they call for literals and IRIs.
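
A minimal sketch of that shared un-escaping path, in Python. It is an illustration under assumptions, not code from any existing parser: the function names `unescape` and `expand`, the `prot` prefix and the example.org namespace are all hypothetical. It assumes escapes take the \uXXXX and \UXXXXXXXX forms and that the expanded IRI is checked against the character set mentioned above.

```python
import re

# \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

# Characters to reject after un-escaping: <>"{}|^`\ and #x00-#x20.
FORBIDDEN = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def unescape(text: str) -> str:
    """Replace each escape with the character it denotes.

    The same function could serve literals, IRIs and prefixed names.
    """
    return UCHAR.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)


def expand(prefixes: dict, pname: str) -> str:
    """Expand 'prefix:local' to a full IRI, un-escaping the local part."""
    # Split at the first ':' only; an escaped colon (\u003A) stays in local.
    prefix, _, local = pname.partition(":")
    iri = prefixes[prefix] + unescape(local)
    if FORBIDDEN.search(iri):
        raise ValueError(f"character not allowed in an IRI: {iri!r}")
    return iri


# Hypothetical namespace for protein identifiers that contain ':'.
prefixes = {"prot": "http://example.org/protein/"}
print(expand(prefixes, r"prot:P12345\u003A2"))
```

With this approach a local name containing an escaped colon still expands under its prefix, while an escape that decodes to a forbidden character (a space, say) is rejected at validation time rather than in the grammar.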


> Not everyone is a Unicode geek with an obsession for orderly query layout ;-)

I may agree with your second point, but I'm pretty sure that the 7 billionth happy unicode geek was born at the end of October.
  http://a57.foxnews.com/static/managed/img/Scitech/396/223/Peru%207%20Billionth%20Person.jpg


> Richard

-- 
-ericP

Received on Wednesday, 23 November 2011 15:49:52 UTC