Character escapes in prefix names from Andy Seaborne on 2011-11-25 (public-rdf-wg@w3.org from November 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 25 Nov 2011 10:55:46 +0000
To: RDF-WG <public-rdf-wg@w3.org>
Message-ID: <4ECF7432.20104@epimorphics.com>

1/ If we want to allow extra characters in prefixed names
(extra characters meaning ones not allowed by the current syntax for
PN_LOCAL), then it seems better to use the character escape mechanism.

Character escapes turn off the special meaning of a character in that
context (e.g. turning " into a character in the string rather than the
string delimiter). Here, the current meaning of these characters is to
end the prefixed name; escaping one makes it part of the local name
instead.

Using character escapes is also (vaguely) readable.

     og:audio\:title
     dbpedia:\%C3\%89ire
     db:employee.id\=123
     kinase:Cyclin_D\/Cdk4
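A minimal Python sketch of what these escapes would mean, assuming the
local part has already been cut out of the input: each \c, with c drawn
from the set proposed below, simply becomes c. The name unescape_local is
hypothetical, and no other PN_LOCAL rule is checked. Note that \% keeps
the literal %, so dbpedia:\%C3\%89ire still expands to an IRI containing
%C3%89 rather than a decoded character.

```python
# Characters proposed in this message as escapable in the local part
# of a prefixed name (hypothetical constant name).
ESCAPABLE = set("~.-!$&'()*+,;=:/?#@%")


def unescape_local(local: str) -> str:
    """Drop the backslash from each \\c escape, where c is in ESCAPABLE.

    Sketch only: assumes the local part is already isolated and does
    not enforce any other PN_LOCAL rule.
    """
    out = []
    i = 0
    while i < len(local):
        ch = local[i]
        if ch == "\\":
            # A backslash must be followed by one of the escapable chars.
            if i + 1 >= len(local) or local[i + 1] not in ESCAPABLE:
                raise ValueError(f"bad escape at offset {i} in {local!r}")
            out.append(local[i + 1])  # keep the escaped char literally
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


for name in [r"audio\:title", r"\%C3\%89ire", r"employee.id\=123", r"Cyclin_D\/Cdk4"]:
    print(name, "->", unescape_local(name))
```

Any backslash not followed by a character from the set is rejected, so
the escape mechanism stays closed rather than passing arbitrary text.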

A possible set is:

    ~.-!$&'()*+,;=:/?#@%

From RFC 3986:

A/ unreserved extras, where "-" and "." currently have positional
restrictions (no leading "-", no trailing ".")  ~.-

B/ sub-delims   !$&'()*+,;=

C/ gen-delims without []   :/?#@

D/ %
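As a cross-check, this Python snippet rebuilds the proposed set from the
RFC 3986 section 2 character classes. It takes A/ to be the unreserved
characters other than ALPHA, DIGIT and "_" (an assumption about which
unreserved characters PN_LOCAL already permits).

```python
# Character classes from RFC 3986, section 2.
UNRESERVED_EXTRAS = set("~.-")        # unreserved minus ALPHA, DIGIT, "_"
SUB_DELIMS = set("!$&'()*+,;=")
GEN_DELIMS = set(":/?#[]@")

# A/ + B/ + C/ (gen-delims without "[" and "]") + D/ "%"
proposed = UNRESERVED_EXTRAS | SUB_DELIMS | (GEN_DELIMS - set("[]")) | {"%"}

assert proposed == set("~.-!$&'()*+,;=:/?#@%")
print("".join(sorted(proposed)))  # prints !#$%&'()*+,-./:;=?@~
```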

The IRI obtained by expanding the prefixed name is still required to be
a valid IRI.

(I haven't gone through all these chars in detail, but they are legal
IRI chars and not ones marked "unwise", I think.)

2/ Variant: add %XX as a token rule (so the parser checks that % is
followed by two hex digits); otherwise, keep \% in the character escapes
as above.
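A rough Python sketch of the difference for variant 2, under a
simplified assumption: the regex below stands in for the real PN_LOCAL
rule and only allows letters, digits, "_" and %XX. The names PERCENT,
LOCAL and is_local_name are hypothetical. With a %XX token the lexer
itself rejects a malformed sequence, whereas \% as a plain escape would
let \%G1 through unchecked.

```python
import re

# Hypothetical token rule: "%" is accepted only as part of %XX,
# with exactly two hex digits.
PERCENT = r"%[0-9A-Fa-f]{2}"
# Simplified stand-in for the other local-name characters (assumption).
PLAIN = r"[A-Za-z0-9_]"
LOCAL = re.compile(rf"(?:{PLAIN}|{PERCENT})+")


def is_local_name(text: str) -> bool:
    """True if the whole of `text` lexes as a local name under this rule."""
    return LOCAL.fullmatch(text) is not None


for text in ["%C3%89ire", "%C3%8", "%G1ire"]:
    print(text, is_local_name(text))
```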


3/ Variant: one that we haven't discussed much is #, which is sometimes
mentioned as a nuisance. Allowing unescaped # is also possible without
major risk of breaking things. To be affected, you'd have to have
written a comment with no immediately preceding whitespace (e.g.
ex:o#comment) in the middle of a "triples" block. I don't recall ever
seeing such a thing.
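To make the breaking case concrete, here is a Python sketch with two
deliberately simplified prefixed-name lexers (assumptions, much narrower
than the real grammar): one where # ends the name, and one where
unescaped # may continue it. Only input with no whitespace before the #
lexes differently.

```python
import re

# Simplified prefixed-name rules (assumptions, not the Turtle grammar).
CURRENT = re.compile(r"[A-Za-z]*:[A-Za-z0-9_]*")     # "#" ends the name
VARIANT_3 = re.compile(r"[A-Za-z]*:[A-Za-z0-9_#]*")  # unescaped "#" allowed


def lex_name(rule: re.Pattern, text: str) -> str:
    """Return the prefixed-name token found at the start of `text`."""
    m = rule.match(text)
    return m.group(0) if m else ""


for text in [":o #note", ":o#note"]:
    print(repr(text), lex_name(CURRENT, text), lex_name(VARIANT_3, text))
```

With a space before the #, both rules give :o and the rest is a comment;
without it, the variant swallows #note into the name.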

 Andy

Received on Friday, 25 November 2011 10:56:26 UTC