Re: Character escapes in prefix names from David Wood on 2011-11-25 (public-rdf-wg@w3.org from November 2011)

From: David Wood <david@3roundstones.com>
Date: Fri, 25 Nov 2011 08:58:20 -0500
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: RDF-WG <public-rdf-wg@w3.org>
Message-Id: <89E89395-ABFE-4FC0-B2A4-045A25D47485@3roundstones.com>

Good idea.  The normal sort of issues that come up include:

- Does an escaped character that is not in the list stand for itself, e.g. \a == a?  I think so.

- The above implies that a literal backslash must always be escaped: \\

String escapes work this way in several programming languages.
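
A minimal sketch of that rule in Python. The name unescape is only for illustration, and rejecting a lone trailing backslash is my assumption, not something we have decided:

```python
def unescape(text: str) -> str:
    """Drop each backslash and keep the character after it literally."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # The escaped character stands for itself, whatever it is,
            # so "\\a" gives "a" and an escaped backslash gives a backslash.
            nxt = next(chars, None)
            if nxt is None:
                # Assumption: a backslash with nothing after it is an error.
                raise ValueError("dangling backslash at end of input")
            out.append(nxt)
        else:
            out.append(ch)
    return "".join(out)
```

With this reading, unescape(r"og:audio\:title") gives og:audio:title, and the only way to get a literal backslash is to write two of them.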

Regards,
Dave


On Nov 25, 2011, at 5:55 AM, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:

> 1/ If we want to have extra characters in prefixed names
> (extra characters means ones not allowed by the current syntax for PN_LOCAL)
> then it seems better to use the character escape mechanism.
> 
> Character escapes turn off the special meaning of a character in that context (e.g. turning " into a char in the string, not the delimiter).  The current meaning of these characters is to end the prefixed name.
> 
> Using character escapes is also (vaguely) readable.
> 
>    og:audio\:title
>    dbpedia:\%C3\%89ire
>    db:employee.id\=123
>    kinase:Cyclin_D\/Cdk4
> 
> A possible set is:
> 
>   ~.-!$&'()*+,;=:/?#@%
> 
> From RFC 3986
> 
> A/ unreserved extras which have positional restrictions (leading "-" and trailing ".")  ~.-
> 
> B/ sub-delims   !$&'()*+,;=
> 
> C/ gen-delims without []   :/?#@
> 
> D/ %
> 
> The prefixed name is still required to be a valid IRI.
> 
> (I haven't gone through all these chars in detail but they are legal IRI chars and not ones marked "unwise", I think)
> 
> 2/ Variant: Adding %XX as a token rule (so the parser will check it's two hex digits), otherwise have \% in the character escapes as above.
> 
> 
> 3/ Variant: One that we haven't discussed much is #, which is sometimes mentioned as a nuisance.  Unescaped # is also possible without major risk of breaking things.  To be affected, you'd have to write a comment with no immediately preceding whitespace in the middle of a "triples" block.  I don't recall ever seeing such a thing.
> 
>    Andy
>
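
For reference, a rough Python sketch of option 1 with the hex-digit check from variant 2 applied to the result. The names ESCAPABLE, PERCENT_TRIPLE and local_part are hypothetical, and rejecting escapes outside the proposed set is just the strict reading (an assumption); whether such an escape should instead stand for the character itself is still open:

```python
import re

# The escapable set proposed in the quoted message (RFC 3986 groups A to D).
ESCAPABLE = set("~.-!$&'()*+,;=:/?#@%")
# A percent sign followed by exactly two hex digits.
PERCENT_TRIPLE = re.compile(r"%[0-9A-Fa-f]{2}")


def local_part(escaped: str) -> str:
    """Unescape the local part of a prefixed name under the strict reading."""
    out = []
    i = 0
    while i < len(escaped):
        ch = escaped[i]
        if ch == "\\":
            # Assumption: only characters in the proposed set may be escaped.
            if i + 1 >= len(escaped) or escaped[i + 1] not in ESCAPABLE:
                raise ValueError(f"bad escape at offset {i}")
            ch = escaped[i + 1]
            i += 1
        out.append(ch)
        i += 1
    result = "".join(out)
    # Variant 2 style check: every % must be followed by two hex digits.
    for m in re.finditer("%", result):
        if not PERCENT_TRIPLE.match(result, m.start()):
            raise ValueError(f"% at offset {m.start()} is not followed by two hex digits")
    return result
```

The examples in the quoted message come out as expected under this sketch, e.g. \%C3\%89ire becomes %C3%89ire and employee.id\=123 becomes employee.id=123. It does not check that the final IRI is valid, which the proposal still requires.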

Received on Friday, 25 November 2011 13:59:05 UTC