- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Fri, 10 Mar 2006 16:04:46 +0000
- To: Eric Prud'hommeaux <eric@w3.org>
- CC: public-rdf-dawg@w3.org
Eric Prud'hommeaux wrote:
> I addressed the "SPARQL and Unicode versions" comment with some text
> proposed in
> http://www.w3.org/mid/20060126021444.GZ17752@w3.org
> Bjoern Hoehrmann pointed out several remaining shortcomings in
> http://www.w3.org/mid/90vnt1dqjg0d74lfe4j21f69bpofniafea@hive.bjoern.hoehrmann.de
> To address these issues, I propose the following change to
> http://www.w3.org/2001/sw/DataAccess/rq23/#grammar
>
> I would like to change A. SPARQL Grammar from
> [[
> A SPARQL query string is a Unicode character string (c.f. section 6.1
> String concepts of [CHARMOD]) in the language defined by the following
> grammar, starting with the Query production. The EBNF format is the same
> as that used in the XML 1.1 specification[XML11]. Please see the
> "Notation" section of that specification for specific information about
> the notation.
>
> In addition, the following sections apply.
> ]]
> to
> [[
> A SPARQL query string is a Unicode character string (c.f. section 6.1
> String concepts of [CHARMOD]) in the language defined by the following
> grammar, starting with the Query production. For compatibility with future
> versions of Unicode, the characters in this string may include unassigned
> Unicode codepoints (see Identifier and Pattern Syntax [UNIID] section 4
> Pattern Syntax). For productions with excluded character classes (for
> example "[^<>'{}|^`]"), the characters are excluded from the range #x00 -
> #xEFFFFF.
>
> The EBNF notation used in the grammar is defined in Extensible Markup
> Language (XML) 1.1 [XML11] section 6 Notation.
>
> In addition, rules A.1 to A.5 apply.
> ]]
Content-wise that seems like a good change.
Editorially, I wonder if it would be clearer to:
+ have a Unicode section (as A.1, bumping the rest up by one)
+ move the EBNF text to A.7.
Or just move the EBNF text and put the Unicode stuff in as a separate paragraph.
>
> and add an informative reference to
>
> [UNIID] Identifier and Pattern Syntax 4.1.0, Mark Davis, Unicode Standard
> Annex #31, 25 March 2005, http://www.unicode.org/reports/tr31/tr31-5.html .
> Latest version available at http://www.unicode.org/reports/tr31/ .
>
>
>
> Further, I would like to address Bjoern's comments on escape sequences by
> modifying
> [[
> A.5 Escape sequences in strings
>
> Strings are used for the lexical form of RDF terms and in expressions.
> Within a string, the following escape sequences apply. The escape
> character is backslash "\" (#x5C). No other escape sequences are defined
> for strings. Names for characters given are the common names.
>
> These escape sequences apply to all rules making up the rule for string
> (rules: STRING_LITERAL1, STRING_LITERAL2, STRING_LITERAL_LONG1,
> STRING_LITERAL_LONG2).
>
> <table>
>
> where HEX is a hexadecimal character
>
> HEX ::= [0-9] | [A-F] | [a-f]
>
> Examples:
> ...
> ]]
> to
> [[
> A.5 Escape sequences in strings
>
> The following escape sequences may be used in any string production
> (e.g. STRING_LITERAL1, STRING_LITERAL2, STRING_LITERAL_LONG1,
> STRING_LITERAL_LONG2):
>
> <table>
What happens to the HEX bit (the definition of HEX)?
>
> Any escaped character in the range #x00 - #xEFFFFF may appear in any
> string production. For instance, "\n" may appear in a STRING_LITERAL1 even
> though the unescaped form is not valid in that production.
> ]]
>
I think this is the right direction for string literals. The \n illustration
is good.
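
To make the "\n" case concrete, here is a minimal Python sketch of decoding escapes in the body of a string literal. It is only an illustration under assumptions: the name unescape, the two-entry table of simple escapes, and the 4 and 8 hex-digit lengths for \u and \U are mine, and the draft's actual table is not reproduced. It only decodes; it does not check which raw characters a given production allows.

```python
import re

# HEX ::= [0-9] | [A-F] | [a-f]   (from the current A.5 text)
# Assumed shapes: \uXXXX (4 HEX), \UXXXXXXXX (8 HEX), or a one-character escape.
_ESCAPE = re.compile(
    r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.?))", re.DOTALL
)

# Illustrative subset only: "\n" is the example in the proposed text and
# backslash is the escape character; the real table has more entries.
_SIMPLE = {"n": "\n", "\\": "\\"}


def unescape(body: str) -> str:
    """Decode escapes in a string literal body (surrounding quotes removed)."""

    def repl(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        if hex_digits is not None:
            # Note: Python's chr() accepts code points up to 0x10FFFF only
            # and raises ValueError above that.
            return chr(int(hex_digits, 16))
        ch = match.group(3)
        if ch not in _SIMPLE:
            raise ValueError(f"unknown or incomplete escape: \\{ch}")
        return _SIMPLE[ch]

    return _ESCAPE.sub(repl, body)
```

With this sketch, the two-character source text \n inside a STRING_LITERAL1 decodes to a real newline, even though a raw newline could not be written there, and \u0000 decodes to #x00.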
> This clarifies four points:
> - parsers must be able to process currently unassigned Unicode characters.
> - SPARQL strings include the character #x00.
> - which codepoints can be produced through \uU escape sequences.
> - there *is* a difference between escaped characters in strings and
> escaped characters in variable names and IRI references.
>
> I specify the range to be #x00 - #xEFFFFF while XML 1.1 uses #x01 -
> #xEFFFFF, citing "Due to potential problems with APIs, #x0 is still
> forbidden both directly and as a character reference." I read our LC
> document as allowing #x00 - #xEFFFFF and am trying to avoid any
> changes to the language at this late date. I don't think the
> liberalization will hurt us.
It is only the #x00 that I can't judge. XML left it out for a reason - I'm
happy to include it in SPARQL but would prefer a positive reason.
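
For what it's worth, the kind of API problem the XML text alludes to can be sketched in Python. This is only an illustration of NUL-terminated string handling in general, not of any particular SPARQL implementation; the helper name through_c_string is hypothetical.

```python
import ctypes


def through_c_string(s: str) -> str:
    """Round-trip a string through a NUL-terminated C string (illustrative)."""
    encoded = s.encode("utf-8")
    # c_char_p(...).value reads bytes up to the first NUL, as C string APIs do.
    return ctypes.c_char_p(encoded).value.decode("utf-8")
```

Here through_c_string("a\x00b") comes back as just "a": everything from the #x00 onwards is silently lost, while strings without #x00 survive unchanged.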
Andy
Received on Friday, 10 March 2006 16:05:08 UTC