Re: unicode escapes in prefix names from Richard Cyganiak on 2011-11-24 (public-rdf-wg@w3.org from November 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 24 Nov 2011 20:43:45 +0000
To: Eric Prud'hommeaux <eric@w3.org>
Cc: Gavin Carothers <gavin@carothers.name>, Andy Seaborne <andy.seaborne@epimorphics.com>, RDF-WG <public-rdf-wg@w3.org>
Message-Id: <30AB8585-49D2-419A-99A9-0F5C0E60C14E@cyganiak.de>
On 24 Nov 2011, at 18:39, Eric Prud'hommeaux wrote:
>> Prefixed names are for shortening appropriately designed IRIs. You want to (ab)use them for something else – as a means of inserting documentation into your query, and then find that it doesn't work very well. SPARQL has comments!
> 
> I've not seen anyone rely on comments when they can rely on namespace prefixes. For example
>  PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> 
>  SELECT DISTINCT ?name
>  WHERE { 
>      ?x rdf:type foaf:Person . 
>      ?x foaf:name ?name
>  }
> needs no documentation and
>  PREFIX foaf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>  PREFIX rdf: <http://xmlns.com/foaf/0.1/>
> 
>  SELECT DISTINCT ?name
>  WHERE { 
>      ?x foaf:type rdf:Person . # for everyone of RDF type FOAF Person
>      ?x rdf:name ?name         #     get their FOAF name
>  }
> is downright antisocial.

That's a different case – the rdf: and foaf: prefixes are fixed by convention and practically everybody knows them. That's not the kind of namespace we are talking about here – all the terms in these namespaces can already be prefix-abbreviated because they were designed for this.

We are talking about instance-level namespaces that you currently can't prefix-abbreviate.

People commonly just write out the full IRIs. They manage. See DBpedia. Comments are available in SPARQL for documentation in cases where it's needed.

>> You allege that users expect to be able to get around syntax constraints using unicode escapes. I don't think that's well-founded. Most languages don't work that way – you can't get around the syntax constraints imposed on identifiers using unicode escapes in any of XML, SQL, Java, Javascript, SPARQL 1.0, CSV, ASN.1 or just about any other language I can think of. What makes you believe that users expect to be able to avoid constraints on identifier tokens using unicode escapes in Turtle, when this isn't possible in other languages?
> 
> Most of these languages have pretty conventional escaping for the parts where someone is dealing with arbitrary text:

We are not talking about unicode escaping in strings. We are talking about unicode escaping in restricted-syntax tokens. Some languages allow unicode escapes in identifiers, some don't. None, as far as I can see, allows expanding the range of identifier characters using unicode escapes.

> In all of these, you can generate literals to e.g match a given input or generate a particular output. In SPARQL, the range of things we must match includes IRIs.

Yes, and SPARQL allows writing any character in any IRI using as many unicode escapes as you like. What it doesn't do is allow expanding the range of legal characters in restricted-syntax tokens using unicode escapes.
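
To make the distinction concrete, here is a minimal Python sketch. It is an illustration only, not taken from the SPARQL or Turtle grammars: LOCAL_NAME is a deliberately simplified stand-in for the local part of a prefixed name, and the function names are made up for this example. Escapes are decoded first and the token rule is applied afterwards, so an escape can spell a character that was already legal but cannot add a new one.

```python
import re

# Simplified stand-in for the local part of a prefixed name: word characters,
# with dots and hyphens allowed inside. This is an assumption for illustration,
# not the real grammar production.
LOCAL_NAME = re.compile(r"\w(?:[\w.-]*[\w-])?")

# Matches \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_unicode_escapes(text):
    """Replace each unicode escape with the character it denotes."""
    return UNICODE_ESCAPE.sub(
        lambda m: chr(int(m.group(1) or m.group(2), 16)), text
    )


def is_legal_local_name(token):
    """Decode escapes first, then apply the (simplified) token rule."""
    return LOCAL_NAME.fullmatch(decode_unicode_escapes(token)) is not None
```

Under these assumptions, caf\u00E9 passes because "é" is already a legal name character, while Cyclin_D\u002FCdk4 fails because the decoded "/" is not.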

I repeat my question: What makes you believe that users expect to be able to avoid syntax constraints on these tokens using unicode escapes in Turtle, when this isn't possible in other languages?

> I'm presuming that *some* people use well-thought-out namespace prefixes.

What's wrong with expecting these same people to comment their queries?

>> Do you expect average SPARQL query authors (perhaps a domain expert or DBA-type person with some RDF background) to hand-write those queries with unicode escapes? If not, then who is writing them?
> 
> Yes, I mean that some SPARQL authors will choose to use escaped prefix names instead of full IRIs. (I find it trivial in emacs because I can write the character and use a macro to expand it to a \u code.)

Yeah but you're the 1%.

The average SPARQL author doesn't use emacs macros. The average SPARQL author is a second-year student in India who can't set up their classpath in Eclipse. If we're lucky, in the future the average SPARQL author will be more like the average SQL author – who still doesn't use emacs, and doesn't have a clue what Unicode is.

>>> Some day, tools requiring varying levels of expertise may hide users from some to all of this via various semaphores
>> 
>> Yes – if we were at that stage already then this wouldn't be a big issue.
>> 
>> I still don't understand your reasoning at all. If you want to write “Cyclin_D/Cdk4” in a prefixed name, then why are you pushing for a half-assed non-solution like kinease:Cyclin_D\u002FCdk4 instead of an actually useful and readable approach that has precedent, like regex-style kinease:Cyclin_D\/Cdk4 ?
> 
> Two reasons:
>  I pushed a bit for CURIEs. That was killed because we couldn't get 100% coverage of what's escaped and what's not. I still want to be able to use prefixes.

I wasn't asking about CURIEs. CURIEs can't work in SPARQL. I asked about regex-style backslash escaping. That's readable (compared to unicode escapes), useful and has precedent. Why are you not pushing for that?
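
For comparison, regex-style escaping could be handled roughly as in the following Python sketch. The namespace IRI bound to kinease: is hypothetical, and treating a backslash before any single character as an escape is a simplifying assumption, not a rule from any spec.

```python
import re

# Hypothetical namespace binding, invented for this example.
PREFIXES = {"kinease": "http://example.org/kinease/"}

# Assumption: a backslash followed by any one character stands for that character.
BACKSLASH_ESCAPE = re.compile(r"\\(.)")


def expand_prefixed_name(name, prefixes):
    """Split at the first colon, unescape the local part, append it to the namespace."""
    prefix, local = name.split(":", 1)
    return prefixes[prefix] + BACKSLASH_ESCAPE.sub(r"\1", local)


print(expand_prefixed_name(r"kinease:Cyclin_D\/Cdk4", PREFIXES))
# prints http://example.org/kinease/Cyclin_D/Cdk4
```

The escaped form stays close to the text it stands for, which is the readability argument made above.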

>  I think that current SPARQL and Turtle are less intuitive to programmers who are used to writing escapes when they need them.

You mean yourself?

> Either get rid of them or make them logical.

That's what I want too. What you propose isn't logical. Unicode escape sequences are for transmitting a larger set of characters using a smaller set of characters, not for dealing with limited character ranges in a grammar and delimiter collisions.
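
As a rough illustration of the transmission use case, the Python sketch below pushes arbitrary text through an ASCII-only channel. The function name is made up for this example, and the \uXXXX / \UXXXXXXXX output format is assumed to follow the escape style discussed in this thread.

```python
def to_ascii_with_escapes(text):
    """Keep ASCII characters as-is and spell everything else as a unicode escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                 # already in the smaller character set
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")     # 4-hex-digit form
        else:
            out.append(f"\\U{cp:08X}")     # 8-hex-digit form for higher code points
    return "".join(out)
```

Note that "Cyclin_D/Cdk4" comes out unchanged: the slash is plain ASCII, so the difficulty with it in a prefixed name is a delimiter collision rather than a character-set limitation.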
Received on Thursday, 24 November 2011 20:44:16 UTC