Re: unicode escapes in prefix names from Richard Cyganiak on 2011-11-23 (public-rdf-wg@w3.org from November 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Wed, 23 Nov 2011 13:36:02 +0000
To: Gavin Carothers <gavin@carothers.name>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, RDF-WG <public-rdf-wg@w3.org>
Message-Id: <95EB9487-B609-4C9B-9D42-DB3C96D22C39@cyganiak.de>

On 23 Nov 2011, at 01:20, Gavin Carothers wrote:
>> I would argue that SPARQL is changing to avoid a security risk in SPARQL Update:
>> http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2011Aug/0010.html
> 
> Obfuscated comments are not really a security risk.

The problem is obfuscated DELETE statements, not obfuscated comments.
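
To make the concern concrete, here is a rough Python sketch of how an escape that is decoded before tokenizing could turn comment text into a live statement. It assumes a processor that applies \u escapes to the whole update string first, as SPARQL 1.0 specifies for queries. The helper names and the sample update are made up for illustration, and the comment stripper is deliberately naive.

```python
import re

# \uXXXX or \UXXXXXXXX codepoint escapes
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

def unescape_everywhere(text):
    """Decode escapes across the whole text, before any tokenizing (assumed SPARQL 1.0 style)."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)

def strip_comments(text):
    """Naive comment removal: drop everything from '#' to end of line.
    Ignores '#' inside IRIs and strings; good enough for this illustration."""
    return "\n".join(line.split("#", 1)[0].rstrip() for line in text.splitlines())

# Hypothetical update text: on screen it reads as a single comment line.
update = r"# tidy up \u000ADELETE WHERE { ?s ?p ?o }"

# What a human reviewer (or an escape-unaware filter) sees: only a comment.
seen_by_reviewer = strip_comments(update)

# What the processor sees once \u000A has become a real newline.
seen_by_parser = strip_comments(unescape_everywhere(update))

print(repr(seen_by_reviewer))
print(repr(seen_by_parser))
```

The reviewer's view is empty, while the parser's view contains the DELETE.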

> SPARQL 1.0 allows for escaping sequences in all tokens
> effectively. The previous decision was to allow escaping in as many
> tokens as seemed reasonable.

I'd say it should be as few tokens as is reasonable.

>>> This isn't about encoding.
>> 
>> Right – it's about the complexity that authors already face in this area.
> 
> Yeah, it's about IRI normalisation. Depending on which IRI
> normalisation one is expecting
> <Éire> and <%C3%89ire> could be the same.

Not in RDF.

[[
IRI equality: Two IRIs are equal if and only if they are equivalent under Simple String Comparison according to section 5.1 of [IRI]. Further normalization must not be performed when comparing IRIs for equality.
]]
http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#section-IRIs
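
A small Python sketch of what that rule means in practice. The example.org IRIs and the function name are placeholders of mine, not from any spec; the comparison itself is just character-by-character string equality.

```python
from urllib.parse import unquote

def rdf_iri_equal(a, b):
    """IRI equality as quoted above: plain string comparison, no normalization."""
    return a == b

plain = "http://example.org/\u00C9ire"      # contains the character É directly
escaped = "http://example.org/%C3%89ire"    # percent-encoded UTF-8 bytes of É

# Different character sequences, so different IRIs as far as RDF is concerned.
print(rdf_iri_equal(plain, escaped))

# Only an extra percent-decoding step, which RDF forbids for equality, makes them match.
print(unquote(escaped) == plain)
```

So a store holding the %-escaped form and a query using the plain form will simply not match, whatever one thinks of the practice.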

> I believe DBpedia is "wrong" in storing the % escaped form.

It's bad practice, but it's what they do and it's what users of our specs have to deal with in the real world.

[[
Interoperability problems can be avoided by minting only IRIs that are normalized according to Section 5 of [IRI]. Non-normalized forms that should be avoided include:

	• Percent-encoding of characters where it is not required by IRI syntax
]]
http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#section-IRIs

>>>> 4. <Éire> if you don't know how to type É but know that you can use \u00C9 instead
>>> 
>>> Aside from the fact it's relative, why not?
>> 
>> Because xxx:\u00C9ire is not a valid prefixed name (in Turtle – it is in SPARQL 1.0).
> 
> xxx:Éire is valid in RDFa 1.0, RDFa 1.1, SPARQL 1.0, XML 1.0, XML 1.1,
> and Turtle (TS, WD).

Sure. Andy made the case that some people can't type “É” and hence have to use \u00C9.

> xxx:\u00C9ire is valid in RDFa 1.0, RDFa 1.1, SPARQL 1.0, and Turtle (WD)

\u escapes in RDFa? I sincerely hope not! And the Turtle WD says so without WG consensus.

> Comes down mostly to do we follow XML

(and Java and SQL)

> in not allowing escaping in
> names or not? But a bit more complicated by the fact we of course DO
> already allow escaping in some names (<\u00C9ire>) just not all
> names.

Yes, so anyone who needs unicode escapes can already use them in <…>.
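
As a rough Python sketch of escape handling that is scoped to the delimited <…> token only: the function name and the regular expressions are my own simplification, not the Turtle grammar, and the pattern would also trip over a bare "<" used as an operator.

```python
import re

# \uXXXX or \UXXXXXXXX codepoint escapes
_UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")
# Simplified stand-in for an IRI reference: anything between < and > without spaces
_IRIREF = re.compile(r"<([^<>\s]*)>")

def decode_escapes_in_irirefs(text):
    """Decode codepoint escapes inside <...> only; everything else is left as written."""
    def decode_iri(match):
        body = _UCHAR.sub(
            lambda m: chr(int(m.group(1) or m.group(2), 16)),
            match.group(1),
        )
        return "<" + body + ">"
    return _IRIREF.sub(decode_iri, text)

line = r"<\u00C9ire> xxx:\u00C9ire"
decoded = decode_escapes_in_irirefs(line)
print(decoded)
```

The escape inside the angle brackets becomes É, while the one in the prefixed name stays untouched for the parser to reject.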

> Prefixed names are an all-purpose IRI abbreviation mechanism in RDFa.

RDFa uses CURIEs which are a different story. CURIEs work where there are delimiters around them. Prefixed names in SPARQL and Turtle don't have delimiters around them. When you have delimiters, it's easy to go wild with what you allow to happen between them.

> Which thanks to Facebook Open Graph has far more deployed data than
> SPARQL does.

Hardly a good example. Facebook requires that the “ogp” prefix be used and ignores prefix declarations. Also, RDFa's design managed to piss off some other group so much that they made their own competing standard, which is now winning the adoption war. The thing they hated most about RDFa was its use of prefixed names. Their standard has no prefixed names at all.

>> Compatibility? Between what and what? SPARQL and Turtle? That can be achieved by SPARQL 1.1 matching Turtle's (Team Submission) behaviour.
> 
> The Team Submission of course has issues as well. Which behaviour? Not
> allowing escapes in prefix names?

Yes.

> Using QNames not SPARQL PNames? Not allowing numbers to start prefix names? 

No.

> Our early decisions seem to be coming unstuck.

I don't recall this group making a decision to allow unicode escapes in prefixed names.

> SPARQL 1.1 isn't really this group's job. Not to mention it seeming rather obvious that the thing SPARQL 1.1 needs to be most compatible with is SPARQL 1.0

Unicode escape handling in SPARQL 1.0 is broken. The old SPARQL WG broke it – they copied the general syntax from Turtle, but in the process decided to change the sane escape handling they could have inherited from Turtle to something bizarre that's pretty much unique in the computing world. (They made other departures from Turtle too, and some of those were major improvements.) Now the SPARQL 1.1 WG has to deal with that, and yes, that's their job. Unicode escape handling in Turtle is *not* broken – it's been sane ever since January 2004, so why mess with it?

Best,
Richard
Received on Wednesday, 23 November 2011 13:36:35 UTC