- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Thu, 9 Mar 2006 17:35:11 -0500
- To: public-rdf-dawg@w3.org
- Message-ID: <20060309223511.GT412@w3.org>
I addressed the "SPARQL and Unicode versions" comment with some text
proposed in
  http://www.w3.org/mid/20060126021444.GZ17752@w3.org
Bjoern Hoehrmann pointed out several remaining shortcomings in
  http://www.w3.org/mid/90vnt1dqjg0d74lfe4j21f69bpofniafea@hive.bjoern.hoehrmann.de

To address these issues, I propose the following change to
  http://www.w3.org/2001/sw/DataAccess/rq23/#grammar

I would like to change A. SPARQL Grammar from
[[
A SPARQL query string is a Unicode character string (c.f. section 6.1
String concepts of [CHARMOD]) in the language defined by the following
grammar, starting with the Query production.

The EBNF format is the same as that used in the XML 1.1
specification[XML11]. Please see the "Notation" section of that
specification for specific information about the notation.

In addition, the following sections apply.
]]
to
[[
A SPARQL query string is a Unicode character string (c.f. section 6.1
String concepts of [CHARMOD]) in the language defined by the following
grammar, starting with the Query production. For compatibility with
future versions of Unicode, the characters in this string may include
unassigned Unicode codepoints (see Identifier and Pattern Syntax
[UNIID] section 4 Pattern Syntax). For productions with excluded
character classes (for example "[^<>'{}|^`]"), the characters are
excluded from the range #x00 - #xEFFFFF.

The EBNF notation used in the grammar is defined in Extensible Markup
Language (XML) 1.1 [XML11] section 6 Notation.

In addition, rules A.1 to A.5 apply.
]]
and add an informative reference to

[UNIID] Identifier and Pattern Syntax 4.1.0, Mark Davis, Unicode
        Standard Annex #31, 25 March 2005,
        http://www.unicode.org/reports/tr31/tr31-5.html .
        Latest version available at
        http://www.unicode.org/reports/tr31/ .

Further, I would like to address Bjoern's comments on escape sequences
by modifying
[[
A.5 Escape sequences in strings

Strings are used for the lexical form of RDF terms and in expressions.
Within a string, the following escape sequences apply. The escape
character is backslash "\" (#x5C). No other escape sequences are
defined for strings.

Names for characters given are the common names.

These escape sequences apply to all rules making up the rule for
string (rules: STRING_LITERAL1, STRING_LITERAL2, STRING_LITERAL_LONG1,
STRING_LITERAL_LONG2).

<table>

where HEX is a hexadecimal character

  HEX ::= [0-9] | [A-F] | [a-f]

Examples: ...
]]
to
[[
A.5 Escape sequences in strings

The following escape sequences may be used in any string production
(e.g. STRING_LITERAL1, STRING_LITERAL2, STRING_LITERAL_LONG1,
STRING_LITERAL_LONG2):

<table>

Any escaped character in the range #x00 - #xEFFFFF may appear in any
string production. For instance, "\n" may appear in a STRING_LITERAL1
even though the unescaped form is not valid in that production.
]]

This clarifies four points:
- parsers must be able to process currently unassigned Unicode
  characters.
- SPARQL strings include the character #x00.
- which codepoints can be produced through \uU escape sequences.
- there *is* a difference between escaped characters in strings and
  escaped characters in variable names and IRI references.

I specify the range to be #x00 - #xEFFFFF while XML 1.1 uses
#x01 - #xEFFFFF, citing "Due to potential problems with APIs, #x0 is
still forbidden both directly and as a character reference." I read
our LC document as allowing #x00 - #xEFFFFF and am trying to avoid any
changes to the language at this late date. I don't think the
liberalization will hurt us.

-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
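
For illustration only, here is a minimal Python sketch of the proposed
A.5 rule that an escaped character may appear in a string even when its
unescaped form may not. The escape table is elided in the message above,
so the table used here (\t \n \r \b \f \" \' \\ plus \uXXXX and
\UXXXXXXXX) is an assumption, as are the names SIMPLE_ESCAPES and
unescape_string_body. Rejecting unknown escapes is also a choice made
for the sketch, not something the proposed text states.

```python
import re

# ASSUMPTION: the message elides the escape table as "<table>"; this
# mapping is a plausible stand-in, not a copy of the proposed text.
SIMPLE_ESCAPES = {
    "t": "\t", "n": "\n", "r": "\r", "b": "\b", "f": "\f",
    '"': '"', "'": "'", "\\": "\\",
}

# A backslash followed by \uXXXX, \UXXXXXXXX, or any single character
# (or nothing, for a dangling backslash at the end of the string).
_ESCAPE = re.compile(r"\\(u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8}|.?)", re.DOTALL)


def unescape_string_body(body: str) -> str:
    """Decode escape sequences in the text between a string's quotes.

    Hypothetical helper. Raises ValueError for an unknown or dangling
    escape, and for \\U values above #x10FFFF, which Python's chr()
    cannot represent.
    """
    def replace(match: "re.Match[str]") -> str:
        esc = match.group(1)
        if len(esc) > 1:
            # \uXXXX or \UXXXXXXXX: decode the hexadecimal codepoint.
            return chr(int(esc[1:], 16))
        if esc in SIMPLE_ESCAPES:
            return SIMPLE_ESCAPES[esc]
        raise ValueError(f"unknown escape sequence: \\{esc}")

    return _ESCAPE.sub(replace, body)
```

With this sketch, the body of a STRING_LITERAL1 written as line1\nline2
decodes to two lines, and \u0000 decodes to the character #x00.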
Received on Thursday, 9 March 2006 22:35:28 UTC