- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Mon, 15 Aug 2011 18:15:10 +0100
- To: RDF-WG <public-rdf-wg@w3.org>
The argument for the \u-escape proposal is the ability to put characters
that occur in some sources of existing data (e.g. lifescience) into
prefix names.
Gavin has proposed another issue with "Prefixed names and slashes" which
is related but not identical.
People have expressed a desire for maximum compatibility between Turtle
and SPARQL. We have to start from where we are, not take a clean-slate
approach and discount all switching costs. See ISSUE-1.
For the proposal:
1/ For the existing data, why does the original character have to be used
and not "_", "." or "-"?
2/ Prefix names are about readability. \u003D is not a readable form of "=".
3/ There has been no analysis of alternatives
e.g. expand the range of chars allowed as Gavin suggests.
e.g. reuse delimiting by <> and overload scheme/prefix which has
deployed experience. It's even a recurring user expectation albeit not
common.
4/ The current proposal also introduces \u-escapes into the prefix part
and blank node labels but the argument only applies to the local part of
prefixed names.
What we need is a design principle and a set of possible ways to achieve
the objective. \u-escapes are not the only option.
Comments on certain assertions about the current situation inline.
Andy
On 15/08/11 13:11, Eric Prud'hommeaux wrote:
> * Richard Cyganiak <richard@cyganiak.de> [2011-08-15 11:24+0100]
...
>> I think there is consensus in the group that we should not add extensions to Turtle at this point. We should just standardize it as it is already implemented (modulo SPARQL alignment).
>>
>> Personally I am very strongly opposed to extending Turtle in ways that are incompatible with SPARQL.
>
> I agree with keeping SPARQL and Turtle compatible so I'll address
> SPARQL:
>
> SPARQL has already changed from processing escape sequences before
> lexing to after lexing. Previously legal SPARQL strings like
> PREFIX ex: <http://example.org/>
> ASK { ?s ex:ab\u0063d ?o }
> become illegal if SPARQL doesn't accept escaping in prefixed names.
> Do such strings exist in the wild? Probably not as they weren't useful
> to utter (because they were unescaped before lexing). But the argument
> about backward compatibility swings in favor of SPARQL allowing them
> in prefixed names.
I don't follow this argument - there is no backwards compatibility here
for data or queries.
Turtle, up to and including the start point for this WG, did not allow \u
escapes in qnames at all. Existing software and data do not use the
feature if they follow that doc.
SPARQL, up to and including 1.1 LC, allowed escaping of hard-to-type
characters like α (Unicode code point 03B1, Greek small letter alpha).
SPARQL-WG has an issue box in the LC to signal the possible change. The
LC period got no feedback on the matter.
Every Turtle and SPARQL parser has to change if these changes become
permanent. That is a backwards compatibility argument.
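To make the distinction concrete, here is a minimal sketch of escape processing done on the input stream before lexing. It is an illustration only, not code from any actual parser; the function name `unescape_stream` and the regular expression are assumptions made for this example.

```python
import re

# Matches \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_stream(text: str) -> str:
    """Replace every \\u / \\U escape in the raw input with its character.

    Hypothetical helper: it runs over the whole input before any
    tokenizing, so the lexer never sees an escape sequence.
    No validation of the resulting code point is attempted here.
    """
    def _decode(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))

    return _ESCAPE.sub(_decode, text)


query = r"ASK { ?s ex:ab\u0063d ?o }"
print(unescape_stream(query))
```

Under this model the lexer receives `ex:abcd`, so the escape works in a prefixed name only because it decodes to a character that is already legal there. Escapes handled after lexing need the grammar itself to say where they may appear.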
> Another argument for escaping is that identifier names (e.g. in
> biology) have things like ':' and '$' in them. Prefixes add a huge
> amount to the readability of SPARQL and Turtle. Forcing a query or
> data writer to abandon the logical prefix because there's an illegal
> localname character is an equally huge impediment to usability.
>
> Escape sequences in strings and IRIs are of limited use as one can
> embed all the legal IRI and string chars in those productions with the
> help of the specialized escapes \\[nrtb].
Surely the use case which led to the currently deployed systems is
foreign-language characters, and that is well supported currently.
> Disallowing escape sequences
> in SPARQL
Who is proposing that? SPARQL-WG isn't.
SPARQL-LC published with the existing escaping (done on the input stream
before parsing).
> (and maintaining the status quo in Turtle) means we have to
> justify to the users why the escaping rules they can apply to strings
> and IRIs aren't applicable to prefixed names where they'd be most useful.
The use case which led to the currently deployed systems is
foreign-language characters, and that is well supported currently.
...
Received on Monday, 15 August 2011 17:15:43 UTC