Proposal to NOT address I18N-ISSUE-193: define when escapes are evaluated from Eric Prud'hommeaux on 2012-10-02 (www-international@w3.org from October to December 2012)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Tue, 2 Oct 2012 07:22:07 -0400
To: RDF-WG WG <public-rdf-wg@w3.org>, Internationalization Core Working Group <www-international@w3.org>
Cc: Gavin Carothers <gavin@carothers.name>
Message-ID: <20121002112204.GG27799@w3.org>

Proposal to NOT address I18N-ISSUE-193: define when escapes are evaluated
===============================================================

Issue: Section 6.4, both forms of Unicode escape sequence: The spec doesn't say at what stage the escape sequences are converted to their corresponding characters. Can \u0022 start or end a string literal (as it does in, for example, Java)? Appendix B implies that escapes are replaced with their character equivalents before document processing, but it doesn't appear to say that explicitly anywhere.

Don't sections [http://www.w3.org/TR/2012/WD-turtle-20120710/#sec-parsing-terms 7.2] and [http://www.w3.org/TR/2012/WD-turtle-20120710/#sec-iri-references 6.3] cover that?

The table in <http://www.w3.org/TR/2012/WD-turtle-20120710/#term2escape>
[[
              Context where each kind of escape sequence can be used
                                                  numeric   string    reserved character
                                                  escapes   escapes   escapes
  IRIs, used as RDF terms or as in @prefix or     yes       no        no
  @base declarations
  local names                                     no        no        yes
  Strings                                         yes       yes       no
]]
provides an overview of where the different escapes may be used. For
excruciating detail, §7 RDF Term Constructors
<http://www.w3.org/TR/2012/WD-turtle-20120710/#sec-parsing-terms> provides a
mapping from grammatical productions to unicode strings, e.g. for IRIs:
[[
           production              type                procedure              
                                          The characters between "<" and ">"  
                                          are unescaped¹ to form the unicode  
IRIREF                           IRI      string of the IRI. Relative IRI     
                                          resolution is performed per section 
                                          6.3 IRI References.                 
                                          The potentially empty unicode string
]]
The "unescaped¹" link refers to this text:
[[
¹ section 6.4 Escape Sequences defines a mapping from escaped unicode strings
to unicode strings. The following lexical tokens are unescaped to produce
unicode strings: IRIREF, STRING_LITERAL_SINGLE_QUOTE, STRING_LITERAL_QUOTE, 
STRING_LITERAL_LONG_SINGLE_QUOTE and STRING_LITERAL_LONG_QUOTE .
]]
I think this covers exactly what to do to map from a string of characters in a Turtle document to the lexical form of an IRI, RDF literal or blank node in the RDF abstract syntax.
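
To illustrate that reading, here is a minimal Python sketch (not text from the spec, and not a full Turtle lexer). It assumes, per the footnote quoted above, that a token is matched first and only then unescaped, so \u0022 cannot open or close a string literal. The regular expressions are simplified approximations of the IRIREF, STRING_LITERAL_QUOTE, UCHAR and ECHAR productions, and the function names (unescape_iri, unescape_string, lexical_form) are hypothetical.

```python
import re

# Simplified approximations of the Turtle productions (assumption: enough
# for illustration, not a complete or validated grammar).
UCHAR = r'\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}'   # numeric escapes
ECHAR = r'\\[tbnrf"\'\\]'                        # string escapes
IRIREF = re.compile(r'<(?:[^\x00-\x20<>"{}|^`\\]|' + UCHAR + r')*>')
STRING_LITERAL_QUOTE = re.compile(
    r'"(?:[^"\\\n\r]|' + ECHAR + '|' + UCHAR + r')*"')

ECHAR_MAP = {'t': '\t', 'b': '\b', 'n': '\n', 'r': '\r', 'f': '\f',
             '"': '"', "'": "'", '\\': '\\'}


def _numeric(match):
    # Drop the leading backslash and 'u'/'U', then decode the hex digits.
    return chr(int(match.group(0)[2:], 16))


def unescape_iri(body):
    # IRIs: numeric escapes only (first row of the table).
    return re.sub(UCHAR, _numeric, body)


def unescape_string(body):
    # Strings: numeric and string escapes (last row of the table).
    def repl(match):
        esc = match.group(0)
        return _numeric(match) if esc[1] in 'uU' else ECHAR_MAP[esc[1]]
    return re.sub(UCHAR + '|' + ECHAR, repl, body)


def lexical_form(token):
    # Step 1: the raw characters must already form a complete token.
    # Step 2: only the text between the delimiters is unescaped.
    if IRIREF.fullmatch(token):
        return unescape_iri(token[1:-1])
    if STRING_LITERAL_QUOTE.fullmatch(token):
        return unescape_string(token[1:-1])
    raise ValueError('not an IRIREF or STRING_LITERAL_QUOTE token')


# An escaped quote inside a string becomes part of the lexical form.
print(repr(lexical_form(r'"a\u0022b"')))
# An escaped quote cannot act as a delimiter, so this is rejected.
try:
    lexical_form(r'\u0022abc\u0022')
except ValueError as err:
    print('rejected:', err)
```

Under these assumptions the input "a\u0022b" is a single string token whose lexical form contains a literal quotation mark, while \u0022abc\u0022 is not recognised as a string at all.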

Proposal: no change


Please indicate whether this addresses the stated issue.
-- 
-ericP

Received on Tuesday, 2 October 2012 11:22:44 UTC