Re: I18N-ISSUE-193: define when escapes are evaluated [TURTLE] from Martin J. Dürst on 2012-09-08 (public-rdf-comments@w3.org from September 2012)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Sat, 08 Sep 2012 13:34:59 +0900
To: Gavin Carothers <gavin@carothers.name>
CC: Internationalization Core Working Group <www-international@w3.org>, public-rdf-comments@w3.org
Message-ID: <504ACAF3.5040702@it.aoyama.ac.jp>

On 2012/09/08 1:34, Gavin Carothers wrote:
> On Fri, Sep 7, 2012 at 9:25 AM, Internationalization Core Working
> Group Issue Tracker <sysbot+tracker@w3.org> wrote:
>> I18N-ISSUE-193: define when escapes are evaluated [TURTLE]
>>
>> http://www.w3.org/International/track/issues/193
>>
>> Raised by: Norbert Lindenberg
>> On product: TURTLE
>>
>> http://www.w3.org/2012/08/15-i18n-minutes.html
>>
>> Section 6.4, both forms of Unicode escape sequence: The spec doesn't say at what stage the escape sequences are converted to their corresponding characters. Can \u0022 start or end a string literal (as it does in, for example, Java)?
>
> No. Escape sequences occur inside literals.

Hello Gavin,

Does that mean literals in the RDF sense (the strings that are usually
put in rectangular boxes when drawing RDF graphs) only? What about the
use of this notation in other places (IRIs, i.e. references in RDF)?

If it's indeed only RDF literals, then this is very similar to XML,
where element/attribute names and related constructs cannot contain any
escapes. In XML, this makes it impossible to encode an arbitrary XML
document in US-ASCII, iso-8859-1, or a similar encoding. Maybe this is
less of a problem for TURTLE, because it's always UTF-8, but we had
better make sure.

Regards,    Martin.

> There is a table in 6.4
> showing when they can be used. The normative processing requirements
> for when things are escaped are expressed in 7.2 RDF Term Constructors
> for example:
>
> STRING_LITERAL_SINGLE_QUOTE  lexical form The characters between the
> outermost "'"s are unescaped¹ to form the unicode string of a lexical
> form.
>
> and the footnote to that table:
>
> ¹ section 6.4 Escape Sequences defines a mapping from escaped unicode
> strings to unicode strings. The following lexical tokens are unescaped
> to produce unicode strings: IRIREF, STRING_LITERAL_SINGLE_QUOTE,
> STRING_LITERAL_QUOTE, STRING_LITERAL_LONG_SINGLE_QUOTE and
> STRING_LITERAL_LONG_QUOTE .
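
As a rough illustration of the order described in the footnote above
(tokenize first, then unescape what lies between the quotes), here is a
minimal Python sketch. The names STRING_TOKEN, ESCAPE, unescape and
lexical_form are made up for this illustration and do not come from the
Turtle draft, and the pattern is only a simplified stand-in for
STRING_LITERAL_QUOTE. Under these assumptions, \u0022 inside a string
yields a quote character in the lexical form, but it cannot open or
close the token.

```python
import re

# Simplified stand-in for STRING_LITERAL_QUOTE (an assumption, not the
# draft's exact grammar): only a raw '"' delimits the token.
STRING_TOKEN = re.compile(
    r'"((?:[^"\\\n\r]|\\[tbnrf"\'\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*)"'
)

# One pattern for all escapes, so that an escaped backslash followed by
# "u0041" is not mistaken for a numeric escape.
ESCAPE = re.compile(r'\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|([tbnrf"\'\\]))')

SIMPLE = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
          '"': '"', "'": "'", "\\": "\\"}


def unescape(body: str) -> str:
    # Map each escape sequence to the character it denotes, in one pass.
    def replace(match: re.Match) -> str:
        if match.group(3) is not None:
            return SIMPLE[match.group(3)]
        return chr(int(match.group(1) or match.group(2), 16))

    return ESCAPE.sub(replace, body)


def lexical_form(source: str) -> str:
    # Step 1: recognize the token on the raw, still-escaped text.
    match = STRING_TOKEN.match(source)
    if match is None:
        raise ValueError("not a double-quoted string token")
    # Step 2: unescape only the characters between the outermost quotes.
    return unescape(match.group(1))
```

With this sketch, lexical_form(r'"a\u0022b"') gives the three characters
a, ", b, while lexical_form(r'\u0022abc\u0022') raises ValueError because
no token starts there.
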
>
> Perhaps some additional language could be used in the 6.4 section
> introducing escapes rather than relying on interpretation of the
> table.
>
>
>> Appendix B implies that escapes are replaced with their character equivalents before document processing, but it doesn't appear to say that explicitly anywhere.
>
> Appendix B may not be clear enough in its Encoding considerations:
> section. It may also simply be using old language. Thanks!

Received on Saturday, 8 September 2012 04:35:36 UTC