Re: working through some details on "Just Works" escapes

On 30/11/11 14:24, Eric Prud'hommeaux wrote:
> * Richard Cyganiak <richard@cyganiak.de> [2011-11-25 14:59+0000]
>> On 25 Nov 2011, at 14:24, Andy Seaborne wrote:
>>>> - Is an escaped character not in the list its normal value, e.g. \a == a?  I think so.
>>>
>>> Some languages do indeed have \X be X for undefined X but (wild claim!) it can get a bit mysterious.  Your example shows this (:-). In C, \a is the "audible bell", Unicode code point U+0007 (BELL), not 'a'.
>>>
>>> We already have that, in a string, \t is a tab, not a "t".
>>>
>>> So I prefer to identify the characters that are allowed without defaulting to pass-through.
>>
>> +1, for consistency. It would be weird to have \t be U+0009 in "strings", forbidden in <IRIs>, and “t” in prefixed:names.
>>
>> Then there's the question whether "\-" should equal "-" in prefixed:names, and "\_" == "_" and "\." == "." and so forth. Authors are likely to be unsure whether a particular punctuation character needs backslash-escaping or not, so they might be tempted to escape them just in case, and it would be good if it Just Worked anyways. This uncertainty is unlikely to occur for alphanumeric characters.

OK, that's something that can be explained nicely.

Add "_" to the set of characters that the escapes cover.

The legal non-alphanumeric, non-escaped chars are "." "-" and "_" in
SPARQL and the Turtle ED.

>
> We can write Just Works into a grammar in a way which communicates the *required* escape chars:
>
>    <identifier_char>: [a-zA-Z0-9-] | \\[~.-!$&'()*+,;=:/?#@%] || \\. # added to permit e.g. "\x", which will be transformed to "x"

That risks the \t confusion.

Is escaping the URI-legal, non-alphanumerics as character escapes 
acceptable to you?
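
To make the two options concrete, here is a rough Python sketch, not a proposal for spec text. The escapable set is an assumption: the punctuation listed in the quoted grammar plus "_". The function names are made up for illustration. The strict version rejects any escape outside that set; the pass-through version treats \X as X for every X.

```python
import re

# Assumed escape set: the punctuation from the quoted grammar, plus "_".
ESCAPABLE = set("~.-!$&'()*+,;=:/?#@%_")


def unescape_strict(local: str) -> str:
    """Unescape a local name; reject any backslash escape not in ESCAPABLE."""
    out = []
    i = 0
    while i < len(local):
        ch = local[i]
        if ch == "\\":
            if i + 1 >= len(local):
                raise ValueError("dangling backslash")
            nxt = local[i + 1]
            if nxt not in ESCAPABLE:
                raise ValueError("illegal escape: \\" + nxt)
            out.append(nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def unescape_passthrough(local: str) -> str:
    """The alternative policy: \\X is X for every character X."""
    return re.sub(r"\\(.)", r"\1", local)
```

With these, "a\-b" and "x\_y\.z" unescape to "a-b" and "x_y.z" under both policies. "a\tb" is an error under the strict one and silently becomes "atb" under pass-through, while the same two characters mean U+0009 in a string.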

>
> (That presumes first longest lexing. Unordered longest lexing would require \\[~.-…@%] || \\[~.-…@%] .)
> We can't, however, use the grammar to validate the unescaped form if the escaping is written into the grammar.
>
> I think Just Works works as long as we never add language features for which the escaped version of a character acquires a special meaning. An example of where that rule wasn't observed is in some regex dialects in which e.g. "\(\)" and "[]" are meta-characters (presumably because capture was added to the language and they wanted backward compatibility with old patterns which didn't have a special meaning for "()"s). Anyone who used the Just Works feature may have escaped "()"s just in case, which would break when "\(\)" became meta-characters.
>
> We're largely safe from that antipattern as we already have our reserved set of characters, so even if we add a special meaning for '@' in path expressions, anything with an (unescaped) '@' would be an error anyways.

 Andy

Received on Wednesday, 30 November 2011 15:52:06 UTC