Re: N-Triples V1.6 Character Encoding and Escaping

At 12:22 PM 7/25/01 +0100, Dave Beckett wrote:
>One way to escape a character.  Check.
>
>    [[* Explicit end delimiters MUST be provided. Escapes such as
>    \uABCD where the end delimiter is a space or any character other
>    than [01-9A-F] SHOULD be avoided: it is not clear visually, and it
>    can cause an editor to insert spurious line-breaks when
>    word-wrapping on spaces. Forms like SPREAD's &UABCD; [SPREAD] or
>    XML's &#xhhhh;, where the escape is explicitly terminated by a
>    semicolon, are much better. Escaped characters SHOULD be
>    acceptable wherever unescaped characters are. In particular, they
>    SHOULD be acceptable in identifiers and comments.
>    ]]
>
>Oh dear; the python style things \uABCD are mentioned as should be
>avoided.  This is only a recommendation though.
>
>So I propose we provide one way to escape:
>    '\u' [A-Fa-f0-9]{1,8} ';'
>which generates the appropriate Unicode code point from 1-8 hex digits.

Which falls foul of another rule: don't invent a new escaping
mechanism.  (I assert that adding the ';' terminator changes the escape
mechanism.)

It is not clear to me that a fixed-length form like \uxxxx or \Uxxxxxxxx 
actually breaks the rule given above:  it depends on one's interpretation 
of "delimiter":  counting is a well-established way of delimiting values in 
some kinds of structure.
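
To make the two readings concrete, here is a rough Python sketch of both
decoders.  It is an illustration only, not anything taken from the
N-Triples draft: the names FIXED, TERMINATED, decode_fixed and
decode_terminated are made up for this message, and the sketch ignores
other escapes (such as an escaped backslash) that a real parser would have
to handle.

```python
import re

# Fixed-length reading: the digit count is the delimiter.
# \u takes exactly 4 hex digits, \U takes exactly 8.
FIXED = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

# Proposed reading: \u, then 1 to 8 hex digits, then an explicit ';'.
TERMINATED = re.compile(r"\\u([0-9A-Fa-f]{1,8});")


def decode_fixed(text):
    """Replace fixed-length \\uXXXX and \\UXXXXXXXX escapes with their characters."""
    # chr() raises ValueError for values above 0x10FFFF; left unhandled here.
    return FIXED.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)


def decode_terminated(text):
    """Replace semicolon-terminated \\u...; escapes with their characters."""
    return TERMINATED.sub(lambda m: chr(int(m.group(1), 16)), text)
```

With the fixed-length reading, an escape followed directly by another hex
digit is still unambiguous, because the decoder stops after four digits.
The same input can decode differently under the two readings, though:
\u00E9; yields the character followed by a literal ';' in the first case
and just the character in the second, which is the sense in which the
terminator changes the mechanism.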

Just an observation:  I don't really mind which way we go (but also note
that one compelling reason for the current N-Triples design is that
existing N3 processors can read it).

#g


------------------------------------------------------------
Graham Klyne                    Baltimore Technologies
Strategic Research              Content Security Group
<Graham.Klyne@Baltimore.com>    <http://www.mimesweeper.com>
                                 <http://www.baltimore.com>
------------------------------------------------------------

Received on Wednesday, 25 July 2001 14:20:55 UTC