Re: unicode escapes in prefix names

On 23 Nov 2011, at 15:49, Eric Prud'hommeaux wrote:
>> In many ways, expanding the prefix and wrapping everything into <…> is a friendlier escaping mechanism than looking up unicode code points.
> 
> I see it as momentarily easier to author, but much harder to read

So you find “\u00C9” easier to read than “É”?

You find “United\u002520Kingdom” easier to mentally parse than “United%20Kingdom” (ugly as it is)?

You find “ts16\u003A44\u003A28Z” easier to mentally parse than “ts16:44:28Z”?

You're arguing that prefixed names with the former forms are easier to use than full IRIs with the latter forms. I don't believe that for one second. At best you're moving a turd from one pocket to another.

> , debug or maintain;

I'm not sure that a query littered with Unicode escapes is easy to debug or maintain. Surely, if a query doesn't work, one of the things you need to check is whether all the colons, full stops, commas, percent signs and plus signs that the query authors elected to Unicode-escape, in order to squeeze IRIs into prefixed names, are indeed correct, or whether they mixed up a \u0025 with a \u002d somewhere. You gain neat layout, shorter tokens and hopefully less duplication, but you introduce a new source of potential errors that cannot be found by eyeballing the query and instead require Unicode lookup tables. This is not a debug/maintenance win.
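To make the "lookup table" point concrete, here is a minimal Python sketch that decodes the \uXXXX / \UXXXXXXXX escape forms used in SPARQL strings and applies them to the examples above. `unescape_uchar` is a hypothetical helper written for illustration, not code from any spec or library; it assumes escapes are decoded before the name is otherwise interpreted.

```python
import re

# \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits), case-sensitive on u/U.
UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_uchar(text: str) -> str:
    """Replace each \\uXXXX or \\UXXXXXXXX escape with the code point it names."""
    def repl(m: re.Match) -> str:
        hex_digits = m.group(1) or m.group(2)
        return chr(int(hex_digits, 16))
    return UCHAR.sub(repl, text)


examples = [
    r"\u00C9",
    r"United\u002520Kingdom",
    r"ts16\u003A44\u003A28Z",
    r"a\u0025b",  # intended: a percent sign
    r"a\u002db",  # last hex digit mistyped: now a hyphen
]
for e in examples:
    # Left column is what the author reads; right column is what the parser sees.
    print(f"{e:28} -> {unescape_uchar(e)}")
```

The last two inputs differ in a single hex digit and decode to different characters, which is exactly the kind of mistake that only shows up once you consult a code-point table.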

> a large community would exploit escapes in prefixed names.

My crystal ball disagrees with your crystal ball here.

For perspective, let's keep in mind that an *actually* large community is currently adopting an RDF-derived syntax that abolished prefixed names altogether (microdata).

> I'm not sure I see the reasoning against including escapes in the grammar for prefixed names. It's a minimal grammar delta from allowing them in IRIs and literals (I added <U_CHAR> to <PN_CHARS_BASE> in <http://www.w3.org/2005/01/yacker/uploads/turtleEsc?lang=perl&markup=html#term-turtleEsc-UCHAR>). It doesn't allow any more invalid IRI forms than does <IRI_REF> (and we can always demand implementors validate against [^<>\"{}|^`\\] - [#0000-#20]), it's trivial for an implementor to call the same un-escaping code for prefixed name components as they call for literals and IRIs.

I don't dispute that it's an easy enough change for the spec editors and for implementers. I say that it would be a bad change because it doesn't result in benefits for users, authors, or implementers.
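For illustration only, a minimal Python sketch of the shared un-escaping step described in the quoted message. It assumes a prefixed name expands to the namespace IRI concatenated with the unescaped local part, and then checks the result against the exclusions quoted above (<>"{}|^`\ and U+0000 to U+0020). `expand_prefixed_name` and the `ex:` mapping are hypothetical; this is not the grammar from the linked yacker page.

```python
import re

# \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

# Characters excluded from IRIs per the quoted message: <>"{}|^`\ and U+0000..U+0020.
FORBIDDEN = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def unescape_uchar(text: str) -> str:
    """Same routine an implementation would already use for literals and IRIs."""
    def repl(m: re.Match) -> str:
        return chr(int(m.group(1) or m.group(2), 16))
    return UCHAR.sub(repl, text)


def expand_prefixed_name(pname: str, prefixes: dict) -> str:
    """Hypothetical expansion: namespace IRI + unescaped local part, then validate."""
    # The prefix cannot contain ':', so the first raw colon separates prefix from local part.
    prefix, _, local = pname.partition(":")
    iri = prefixes[prefix] + unescape_uchar(local)
    if FORBIDDEN.search(iri):
        raise ValueError(f"character not allowed in IRI: {iri!r}")
    return iri


PREFIXES = {"ex": "http://example.org/"}  # hypothetical prefix mapping
print(expand_prefixed_name(r"ex:ts16\u003A44\u003A28Z", PREFIXES))
# http://example.org/ts16:44:28Z
```

The code really is short; the cost lands on whoever has to read `ex:ts16\u003A44\u003A28Z` rather than on whoever has to parse it.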

(You claim that some author benefits would result; I say that they would materialize only for a small subset of authors – those who have memorized Unicode tables – while making life more difficult for the rest.)

> it's closer to syntactic compatibility with SPARQL 1.0 escapes

Which I argue is a broken design.

>> Not everyone is a Unicode geek with an obsession for orderly query layout ;-)
> 
> I may agree with your second point, but I'm pretty sure that the 7 billionth happy unicode geek was born at the end of October.
>  http://a57.foxnews.com/static/managed/img/Scitech/396/223/Peru%207%20Billionth%20Person.jpg

Not sure what you're trying to get at here. I don't think she can tell an É from a \u00C9 yet. Most of the 7 billion should never have to.

Best,
Richard

Received on Wednesday, 23 November 2011 17:01:45 UTC