Re: I18N-ISSUE-193: define when escapes are evaluated [TURTLE] from Gavin Carothers on 2012-09-08 (www-international@w3.org from July to September 2012)

From: Gavin Carothers <gavin@carothers.name>
Date: Fri, 7 Sep 2012 22:05:48 -0700
To: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Cc: Internationalization Core Working Group <www-international@w3.org>, public-rdf-comments@w3.org
Message-ID: <CAPqY83zENoiP4EZroF7ZU0zAnpuQN7gRud+NUyFUxLzHgBYmLA@mail.gmail.com>

On Fri, Sep 7, 2012 at 9:34 PM, "Martin J. Dürst"
<duerst@it.aoyama.ac.jp> wrote:
> On 2012/09/08 1:34, Gavin Carothers wrote:
>>
>> On Fri, Sep 7, 2012 at 9:25 AM, Internationalization Core Working
>> Group Issue Tracker <sysbot+tracker@w3.org> wrote:
>>>
>>> I18N-ISSUE-193: define when escapes are evaluated [TURTLE]
>>>
>>> http://www.w3.org/International/track/issues/193
>>>
>>> Raised by: Norbert Lindenberg
>>> On product: TURTLE
>>>
>>> http://www.w3.org/2012/08/15-i18n-minutes.html
>>>
>>> Section 6.4, both forms of Unicode escape sequence: The spec doesn't say
>>> at what stage the escape sequences are converted to their corresponding
>>> characters. Can \u0022 start or end a string literal (as it does in, for
>>> example, Java)?
>>
>>
>> No. Escape sequences occur inside literals.
>
>
> Hello Gavin,
>
> Does that mean literals in the RDF sense (the strings that are usually put
> in rectangular boxes when drawing RDF graphs) only? What about the use of
> this notation in other places (IRIs, i.e. reference in RDF)?
>
> If it's indeed only RDF literals, then this is very similar to XML, where
> element/attribute names and related stuff cannot contain any escapes. In
> XML, this makes it impossible to encode an arbitrary XML document in
> US-ASCII or iso-8859-1 or so. Maybe this is less of a problem for TURTLE,
> because it's always UTF-8, but we had better make sure.

Both "Strings" and IRIs can contain \u style escapes. See the table in
http://www.w3.org/TR/turtle/#sec-escapes captioned "Context where each
kind of escape sequence can be used".
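
As a rough illustration of the processing order (a sketch only, not
text from the spec): the escapes are resolved on the body of a token
that has already been recognized, so \u0022 ends up as a character of
the lexical form and never acts as a delimiter. The function name
unescape_uchar and the example.org IRI are made up for this example,
and the sketch covers only the numeric \u and \U forms, not the
string-only escapes such as \n or \\.

```python
import re

# Matches the two numeric escape forms:
# \uXXXX (4 hex digits) and \UXXXXXXXX (8 hex digits).
_UCHAR = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_uchar(token_body: str) -> str:
    """Resolve numeric escapes in the body of an already-tokenized
    IRIREF or string literal (hypothetical helper, not a spec API)."""
    def repl(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return _UCHAR.sub(repl, token_body)


# The tokenizer has already found the delimiters, so the escaped quote
# becomes an ordinary character and cannot start or end the literal.
print(unescape_uchar(r"say \u0022hi\u0022"))

# The same mapping applies to the text between < and > of an IRIREF.
print(unescape_uchar(r"http://example.org/I\u00F1t\u00EBrn"))
```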

However, prefixed names do NOT allow the use of \u escapes. This does
not limit the ability to produce a Turtle document that is ASCII-only
but still has Iñtërnâtiônàlizætiøn as either an IRI or a String
literal. It just means that instances of Iñtërnâtiônàlizætiøn in an
IRI cannot be shortened using the prefix mechanism in a theoretical
ASCII-only document (which is not required, recommended or even
suggested by Turtle).
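
To make the ASCII-only case concrete, here is a minimal sketch of how
a serializer might write such an IRI in full. The helper name
to_ascii_uchar and the example.org IRI are assumptions for
illustration, not anything defined by Turtle.

```python
def to_ascii_uchar(text: str) -> str:
    # Rewrite every non-ASCII character as a 4-digit (backslash-u) or
    # 8-digit (backslash-U) numeric escape; ASCII passes through as-is.
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            out.append("\\U%08X" % cp)
    return "".join(out)


# Hypothetical IRI; it has to be written out in full as an IRIREF,
# because the escaped form is not allowed inside a prefixed name.
iri = "http://example.org/Iñtërnâtiônàlizætiøn"
print("<" + to_ascii_uchar(iri) + ">")
```

The result can be used between < and > or inside a quoted string, but
not as the local part of a prefixed name.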

>
> Regards,    Martin.
>
>
>> There is a table in 6.4
>> showing when they can be used. The normative processing requirements
>> for when things are escaped are expressed in 7.2 RDF Term Constructors
>> for example:
>>
>> STRING_LITERAL_SINGLE_QUOTE     lexical form    The characters between the
>> outermost "'"s are unescaped¹ to form the unicode string of a lexical
>> form.
>>
>> and the footnote to that table:
>>
>> ¹ section 6.4 Escape Sequences defines a mapping from escaped unicode
>> strings to unicode strings. The following lexical tokens are unescaped
>> to produce unicode strings: IRIREF, STRING_LITERAL_SINGLE_QUOTE,
>> STRING_LITERAL_QUOTE, STRING_LITERAL_LONG_SINGLE_QUOTE and
>> STRING_LITERAL_LONG_QUOTE .
>>
>> Perhaps some additional language could be used in the 6.4 section
>> introducing escapes rather than relying on interpretation of the
>> table.
>>
>>
>>> Appendix B implies that escapes are replaced with their character
>>> equivalents before document processing, but it doesn't appear to say that
>>> explicitly anywhere.
>>
>>
>> Appendix B may not be clear enough in its Encoding considerations:
>> section. It may also simply be using old language. Thanks!
>>
>>>
>>>
>>>
>>
>>
>

Received on Saturday, 8 September 2012 05:06:16 UTC