Re: unicode escapes in prefix names

Hi Andy,

On 22 Nov 2011, at 21:04, Andy Seaborne wrote:
> With a goal of maximising compatibility between Turtle and SPARQL, maximising compatibility from both heritages is important.
> 
> SPARQL 1.0 allows \u in prefix names (and in fact uniformly)

Allowing escapes everywhere was a design mistake in SPARQL 1.0 IMHO. I was looking forward to seeing this fixed in 1.1.

> SPARQL is already changing to accommodate Turtle in a major way for implementers

I would argue that SPARQL is changing to avoid a security risk in SPARQL Update:
http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2011Aug/0010.html

> Turtle can make a smaller change to accommodate SPARQL.
> (smaller because it does not change the design of a Turtle parser as it does to a SPARQL one)

Turtle would thereby acquire the same security risk.

Transmitting potentially sensitive information in a format that supports obfuscation of element boundaries is not a good idea. I'm not aware of a single other format besides SPARQL 1.0 that has this “feature”. Turtle should not support obfuscation of element boundaries.
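To make "obfuscation of element boundaries" concrete, here is a minimal sketch. It assumes SPARQL 1.0-style processing, where \u escapes are decoded before the text is tokenized. The helper names and the naive quote-matching "tokenizer" are hypothetical and for illustration only; no real parser works this way.

```python
import re

def decode_uescapes(text):
    # Replace each \uXXXX (4-hex-digit form only) with the character it names.
    return re.sub(r"\\u([0-9A-Fa-f]{4})",
                  lambda m: chr(int(m.group(1), 16)), text)

def count_string_literals(text):
    # Naive scan: count double-quoted runs that contain no double quote.
    return len(re.findall(r'"[^"]*"', text))

# To a filter that inspects the raw text, this looks like one string literal.
query = r'INSERT DATA { <s> <p> "a\u0022 . <s> <q> \u0022b" }'

raw_count = count_string_literals(query)
decoded_count = count_string_literals(decode_uescapes(query))
```

A check that runs on the raw text sees a single literal, while a parser that decodes escapes first sees two literals and an extra triple between them. That mismatch is the risk.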

> More inline - some of your examples are about %-encoding in prefixed names and not about unicode escapes.

They are real-world examples taken from DBpedia, the poster child of RDF datasets. Yes, of course they are about %-encoding, and about the fact that DBpedia uses %-encoded characters in IRIs where it doesn't have to. Such is the state of deployed reality. The examples illustrate the complexity that users already have to deal with regarding a rather simple question – how to deal with the “É” in “Éire” when figuring out how to query DBpedia. This is why any additional complexity should be motivated by benefits to users and document authors, not by dependencies between specs or modest implementation issues.

>> As it stands, none of the following IRIs can be written as prefixed
>> names – they all have to be written as full IRIs:
>> 
>> 1. <%C3%89ire>
> 
> This isn't about encoding.

Right – it's about the complexity that authors already face in this area.

>> 2. <search?q=eire>
>> 3. <Galway,_Ireland>
>> 4. <Éire> if you don't know how to type É but know that you can use \u00C9 instead
> 
> Aside from the fact it's relative, why not?

Because xxx:\u00C9ire is not a valid prefixed name (in Turtle – it is in SPARQL 1.0).

>> 5. <U.S.>
> 
> What have trailing dots got to do with unicode escapes?

They are both examples of stuff that prevents prefixed names from being an *all-purpose* IRI abbreviation mechanism.

>> 6. <United%20Kingdom>
> 
> use of % - not about unicode escapes.

Ditto. The point is that only a very limited range of IRIs can be abbreviated using prefixed names (I gave six examples where a reasonable person might expect it to work but it doesn't), and users don't actually benefit from a change that makes *one* of those many cases work.
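The six examples can be checked against a deliberately simplified approximation of the local-name rule. The pattern below is an assumption made for illustration, not the actual Turtle or SPARQL grammar: start with a letter or underscore, continue with word characters or hyphens, and allow dots only between such runs.

```python
import re

# Rough stand-in for the local part of a prefixed name (NOT the real grammar).
LOCAL_NAME = re.compile(r"[^\W\d][\w\-]*(?:\.[\w\-]+)*")

def fits_local_name(text):
    # True if the whole string matches the simplified pattern.
    return LOCAL_NAME.fullmatch(text) is not None

examples = [
    "%C3%89ire",         # 1: percent-encoding
    "search?q=eire",     # 2: query string
    "Galway,_Ireland",   # 3: comma
    r"\u00C9ire",        # 4: escape typed literally
    "U.S.",              # 5: trailing dot
    "United%20Kingdom",  # 6: percent-encoding
]

results = {name: fits_local_name(name) for name in examples}
```

Under this approximation none of the six fits, while a plain name such as "Eire", or "Éire" typed directly, does.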

Without a benefit to users, I don't see the case for a backwards-incompatible change to Turtle.

> My suggestion is not expanding the range of characters that are, or are not, allowed in a prefix name but I'm open to adding %xx.

This would make a second example work, while the four others still don't.

As long as most IRIs can't be usefully abbreviated with prefixed names, it's a fundamental mistake to think of prefixed names as an all-purpose IRI abbreviation mechanism. It just isn't. It's a feature for abbreviating IRIs that have been designed with the feature in mind. (I may be refuting a point here that you didn't make but others did when asking for the same feature.)

>> The proposal adds a whole bunch of complexity to the story that one
>> needs to tell to explain how the hell prefixed names work, and what
>> we get in return is a solution for the case that matters least –
>> number 4 – while all the others still don't work and require falling
>> back to full IRIs.
> 
> What about compatibility?

Compatibility? Between what and what? SPARQL and Turtle? That can be achieved by SPARQL 1.1 matching Turtle's (Team Submission) behaviour.

Best,
Richard

Received on Tuesday, 22 November 2011 23:42:17 UTC