- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Fri, 10 Mar 2006 16:04:46 +0000
- To: Eric Prud'hommeaux <eric@w3.org>
- CC: public-rdf-dawg@w3.org
Eric Prud'hommeaux wrote:
> I addressed the "SPARQL and Unicode versions" comment with some text
> proposed in
> http://www.w3.org/mid/20060126021444.GZ17752@w3.org
> Bjoern Hoehrmann pointed out several remaining shortcomings in
> http://www.w3.org/mid/90vnt1dqjg0d74lfe4j21f69bpofniafea@hive.bjoern.hoehrmann.de
> To address these issues, I propose the following change to
> http://www.w3.org/2001/sw/DataAccess/rq23/#grammar
>
> I would like to change A. SPARQL Grammar from
> [[
> A SPARQL query string is a Unicode character string (c.f. section 6.1
> String concepts of [CHARMOD]) in the language defined by the following
> grammar, starting with the Query production. The EBNF format is the same
> as that used in the XML 1.1 specification[XML11]. Please see the
> "Notation" section of that specification for specific information about
> the notation.
>
> In addition, the following sections apply.
> ]]
> to
> [[
> A SPARQL query string is a Unicode character string (c.f. section 6.1
> String concepts of [CHARMOD]) in the language defined by the following
> grammar, starting with the Query production. For compatibility with future
> versions of Unicode, the characters in this string may include unassigned
> Unicode codepoints (see Identifier and Pattern Syntax [UNIID] section 4
> Pattern Syntax). For productions with excluded character classes (for
> example "[^<>'{}|^`]"), the characters are excluded from the range #x00 -
> #xEFFFFF.
>
> The EBNF notation used in the grammar is defined in Extensible Markup
> Language (XML) 1.1 [XML11] section 6 Notation.
>
> In addition, rules A.1 to A.5 apply.
> ]]

Content-wise that seems like a good change.

Editorially, I wonder if it would be clearer to
+ have a Unicode section (A.1 and bump the rest all up one)
+ Move the EBNF text to A.7.

Or just move the EBNF text and put the Unicode stuff in as a separate paragraph.
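To make the proposed wording concrete, here is a minimal Python sketch of how a lexer might test a single character against the example excluded class "[^<>'{}|^`]" restricted to the stated range. The helper name `in_negated_class` is hypothetical. It looks only at the code point value and never consults the Unicode character database, which is what lets it accept code points that are currently unassigned. Python strings cannot hold anything above U+10FFFF, the largest Unicode code point, so in this sketch the quoted upper bound #xEFFFFF never rejects anything.

```python
import unicodedata

# Characters excluded by the example class [^<>'{}|^`] in the proposed text.
EXCLUDED = frozenset("<>'{}|^`")

# Range as written in the proposed text (#x00 - #xEFFFFF).
RANGE_MIN, RANGE_MAX = 0x00, 0xEFFFFF


def in_negated_class(ch: str) -> bool:
    """Hypothetical helper: does one character match [^<>'{}|^`]?

    Only the code point value is checked; no Unicode property lookup is
    made, so unassigned code points are accepted like any other.
    """
    cp = ord(ch)
    return RANGE_MIN <= cp <= RANGE_MAX and ch not in EXCLUDED


if __name__ == "__main__":
    # U+0378 is unassigned (general category Cn) but still matches the class.
    print(unicodedata.category("\u0378"), in_negated_class("\u0378"))
    print(in_negated_class("<"), in_negated_class("a"))
```

A parser built this way needs no update when a new Unicode version assigns U+0378, which is the compatibility property the proposed paragraph asks for.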
>
> and add an informative reference to
>
> [UNIID] Identifier and Pattern Syntax 4.1.0, Mark Davis, Unicode Standard
> Annex #31, 25 March 2005, http://www.unicode.org/reports/tr31/tr31-5.html .
> Latest version available at http://www.unicode.org/reports/tr31/ .
>
> Further, I would like to address Bjoern's comments on escape sequences by
> modifying
> [[
> A.5 Escape sequences in strings
>
> Strings are used for the lexical form of RDF terms and in expressions.
> Within a string, the following escape sequences apply. The escape
> character is backslash "\" (#x5C). No other escape sequences are defined
> for strings. Names for characters given are the common names.
>
> These escape sequences apply to all rules making up the rule for string
> (rules: STRING_LITERAL1, STRING_LITERAL2, STRING_LITERAL_LONG1,
> STRING_LITERAL_LONG2).
>
> <table>
>
> where HEX is a hexadecimal character
>
> HEX ::= [0-9] | [A-F] | [a-f]
>
> Examples:
> ...
> ]]
> to
> [[
> A.5 Escape sequences in strings
>
> The following escape sequences may be used in any string production
> (e.g. STRING_LITERAL1, STRING_LITERAL2, STRING_LITERAL_LONG1,
> STRING_LITERAL_LONG2):
>
> <table>

HEX bit?

>
> Any escaped character in the range #x00 - #xEFFFFF may appear in any
> string production. For instance, "\n" may appear in a STRING_LITERAL1 even
> though the unescaped form is not valid in that production.
> ]]
>

I think this is the right direction for string literals. The \n illustration is good.

> This clarifies n points:
> - parsers must be able to process currently unassigned Unicode characters.
> - SPARQL strings include the character #x00.
> - which codepoints can be produced through \uU escape sequences.
> - there *is* a difference between escaped characters in strings and
> escaped characters in variable names and IRI references.
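The "\n in a STRING_LITERAL1" point can be sketched as a decoder for the body of a STRING_LITERAL1 (the text between the single quotes). The escape table itself is elided as <table> in the message, so this sketch assumes the eight single-character escapes of SPARQL's ECHAR production (\t \b \n \r \f \" \' \\) plus the \u and \U codepoint escapes mentioned above, and it assumes STRING_LITERAL1 forbids raw ', \, LF and CR. `decode_string_literal1` is a hypothetical name. Hex digits are checked against HEX ::= [0-9] | [A-F] | [a-f] explicitly, because Python's int(..., 16) on its own would also accept signs, whitespace and underscores.

```python
# Assumed single-character escapes (SPARQL's ECHAR: \t \b \n \r \f \" \' \\).
ECHAR = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
         '"': '"', "'": "'", "\\": "\\"}

# HEX ::= [0-9] | [A-F] | [a-f]
HEX_DIGITS = frozenset("0123456789ABCDEFabcdef")

# Assumed: characters that STRING_LITERAL1 does not allow unescaped.
RAW_FORBIDDEN = frozenset("'\\\n\r")


def decode_string_literal1(body: str) -> str:
    """Hypothetical decoder for the text between the quotes of a STRING_LITERAL1."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            # Raw newline, CR, quote or backslash is invalid unescaped ...
            if ch in RAW_FORBIDDEN:
                raise ValueError(f"unescaped {ch!r} is not allowed here")
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(body):
            raise ValueError("backslash at end of string")
        esc = body[i + 1]
        if esc in ECHAR:
            # ... but its escaped form is fine, e.g. \n yields a real LF.
            out.append(ECHAR[esc])
            i += 2
        elif esc in ("u", "U"):
            width = 4 if esc == "u" else 8
            digits = body[i + 2:i + 2 + width]
            if len(digits) != width or not set(digits) <= HEX_DIGITS:
                raise ValueError(f"malformed \\{esc} escape")
            cp = int(digits, 16)
            if cp > 0x10FFFF:
                # chr() cannot build anything past the last Unicode code point.
                raise ValueError(f"{cp:#x} is beyond U+10FFFF")
            out.append(chr(cp))
            i += 2 + width
        else:
            # "No other escape sequences are defined for strings."
            raise ValueError(f"unknown escape \\{esc}")
    return "".join(out)
```

For example, `decode_string_literal1(r"line1\nline2")` returns a two-line string, while passing a body that contains a raw line break raises `ValueError`. `decode_string_literal1(r"\u0000")` yields the single character #x00, matching the second bullet above.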
>
> I specify the range to be #x00 - #xEFFFFF while XML 1.1 uses #x01 -
> #xEFFFFF, citing "Due to potential problems with APIs, #x0 is still
> forbidden both directly and as a character reference." I read our LC
> document as allowing #x00 - #xEFFFFF and am trying to avoid any
> changes to the language at this late date. I don't think the
> liberalization will hurt us.

It is only the #x00 that I can't judge. XML left it out for a reason - I'm happy to include it in SPARQL but would prefer a positive reason.

Andy
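The difference between the two ranges can be checked mechanically. The sketch below encodes the XML 1.1 Char production ([#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]) next to the range proposed for SPARQL and lists which sample code points only the SPARQL range admits. It also shows one plausible form of the "potential problems with APIs" behind XML's rule: an interface that treats its argument as a NUL-terminated C string silently truncates at #x00. The function names are hypothetical, and ctypes is used only as a convenient stand-in for a C API.

```python
import ctypes


def is_xml11_char(cp: int) -> bool:
    # XML 1.1 Char production: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (0x1 <= cp <= 0xD7FF) or (0xE000 <= cp <= 0xFFFD) or (0x10000 <= cp <= 0x10FFFF)


def in_proposed_range(cp: int) -> bool:
    # Range as written in the proposal: #x00 - #xEFFFFF
    return 0x00 <= cp <= 0xEFFFFF


samples = [0x0, 0x41, 0xD800, 0xFFFE, 0x10FFFF]
only_sparql = [cp for cp in samples if in_proposed_range(cp) and not is_xml11_char(cp)]
print([hex(cp) for cp in only_sparql])  # ['0x0', '0xd800', '0xfffe']

# A C-style API reading a NUL-terminated char* stops at the first #x00.
text = "a\x00b"
print(len(text), ctypes.c_char_p(text.encode("utf-8")).value)  # 3 b'a'
```

The samples suggest that #x00 is not the only code point the proposed range admits beyond XML 1.1 Char: surrogates and #xFFFE/#xFFFF also fall inside it.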
Received on Friday, 10 March 2006 16:05:08 UTC