Re: proposed clarifications to the SPARQL grammar

On Thu, 2006-03-09 at 17:35 -0500, Eric Prud'hommeaux wrote:
> [...]I propose the following change to

I reviewed this first change, and I support it:

> [[
>  ... For compatibility with future
> versions of Unicode, the characters in this string may include unassigned
> Unicode codepoints (see Identifier and Pattern Syntax [UNIID] section 4
> Pattern Syntax). ...
> ]]

But that's all the energy I have for this sort of thing for today.

I leave it to others to check the other change:

> Further, I would like to address Bjoern's comments on escape sequences by
> modifying
> [[
> A.5 Escape sequences in strings
> 
> Strings are used for the lexical form of RDF terms and in expressions.
> Within a string, the following escape sequences apply. The escape
> character is backslash "\" (#x5C). No other escape sequences are defined
> for strings.  Names for characters given are the common names.
> 
> These escape sequences apply to all rules making up the rule for string
> (rules: STRING_LITERAL1, STRING_LITERAL2, STRING_LITERAL_LONG1,
> STRING_LITERAL_LONG2).
> 
> <table>
> 
> where HEX  is a hexadecimal character
> 
>     HEX ::= [0-9] | [A-F] | [a-f]
> 
> Examples:
> ...
> ]]
> to
> [[
> A.5 Escape sequences in strings
> 
> The following escape sequences may be used in any string production
> (e.g. STRING_LITERAL1, STRING_LITERAL2, STRING_LITERAL_LONG1,
> STRING_LITERAL_LONG2):
> 
> <table>
> 
> Any escaped character in the range #x00 - #xEFFFFF may appear in any
> string production. For instance, "\n" may appear in a STRING_LITERAL1 even
> though the unescaped form is not valid in that production.
> ]]
> 
> This clarifies four points:
>   - parsers must be able to process currently unassigned Unicode characters.
>   - SPARQL strings include the character #x00.
>   - which codepoints can be produced through \uU escape sequences.
>   - there *is* a difference between escaped characters in strings and
>     escaped characters in variable names and IRI references.
> 
> I specify the range to be #x00 - #xEFFFFF while XML 1.1 uses #x01 -
> #xEFFFFF, citing "Due to potential problems with APIs, #x0 is still
> forbidden both directly and as a character reference." I read our LC
> document as allowing #x00 - #xEFFFFF and am trying to avoid any
> changes to the language at this late date. I don't think the
> liberalization will hurt us.
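
For anyone checking the second change, here is a minimal sketch of how the
proposed A.5 wording might be exercised. It decodes escape sequences in the
body of a string (delimiters already stripped) and rejects codepoints above a
configurable bound. Several things here are assumptions, not spec text: the
table is elided above, so the set of single-character escapes is a guess; the
names unescape, SIMPLE_ESCAPES, ESCAPE_RE and MAX_CODEPOINT are made up for
illustration; and the default bound is Unicode's maximum #x10FFFF, not the
#xEFFFFF quoted above, which is larger than any Unicode codepoint.

```python
import re

# ASSUMPTION: the escape table is elided in the quoted text, so this set of
# single-character escapes is illustrative only.
SIMPLE_ESCAPES = {
    "t": "\t", "n": "\n", "r": "\r", "b": "\b", "f": "\f",
    '"': '"', "'": "'", "\\": "\\",
}

# HEX ::= [0-9] | [A-F] | [a-f], as in the quoted grammar.
# \uXXXX takes four HEX digits and \UXXXXXXXX takes eight (assumed lengths).
# The last alternative catches any other character, or nothing at the end of
# the input, so that malformed escapes are reported instead of passed through.
ESCAPE_RE = re.compile(
    r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.?))", re.DOTALL
)

# ASSUMPTION: Unicode's highest codepoint, used as the default upper bound.
MAX_CODEPOINT = 0x10FFFF


def unescape(body, max_codepoint=MAX_CODEPOINT):
    """Decode escape sequences in a string body; raise ValueError if invalid."""

    def replace(match):
        hex4, hex8, simple = match.groups()
        if simple is not None:
            if simple not in SIMPLE_ESCAPES:
                raise ValueError("undefined escape at offset %d" % match.start())
            return SIMPLE_ESCAPES[simple]
        codepoint = int(hex4 or hex8, 16)
        if codepoint > max_codepoint:
            raise ValueError("codepoint #x%X is out of range" % codepoint)
        return chr(codepoint)

    # re.sub scans left to right, so an escaped backslash is consumed before
    # the character that follows it can be mistaken for an escape.
    return ESCAPE_RE.sub(replace, body)
```

With this sketch, unescape(r"a\nb") yields a string containing a newline, and
unescape(r"\u0000") yields the #x00 character that the proposed text allows
and XML 1.1 forbids. Passing max_codepoint lets a reader try whichever upper
bound the group settles on.
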
-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E

Received on Thursday, 9 March 2006 23:39:36 UTC