fully normalized unicode [was: Agenda for XML Core WG telcon of 2013 March 20] from Paul Grosso on 2013-03-20 (public-xml-core-wg@w3.org from March 2013)

From: Paul Grosso <paul@paulgrosso.name>
Date: Wed, 20 Mar 2013 11:34:08 -0500
To: public-xml-core-wg@w3.org
Message-ID: <5149E500.5080204@paulgrosso.name>

I was happily editing my response to Roger when I came across
the following passage in it (one with which everyone agreed):

> >
> > 2. This element:
> >
> > <comment>&#x338;</comment>
> >
> > is not fully normalized, right? (Since the content of the <comment>
> > element begins with a combining character and "content" is defined
> > to be a "relevant construct.") Note: hex 338 is the combining solidus
> > overlay character.
>
> That element is fully normalized--see below.

where "below" we say:

> An application that produces
>
> <comment>&#x0338;</comment>
>
> has produced fully normalized output. There's nothing that isn't
> Unicode normalized about that sequence of 27 characters.
>
> An application that produced
>
> <comment>X</comment>
>
> where "X" is a single U+0338 character would not be producing
> normalized output.
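
The distinction drawn in the quoted reply, between the serialized character reference and a literal U+0338, can be checked with Python's standard unicodedata module. This is a minimal sketch that tests Unicode Normalization Form C (NFC) only. It does not implement the XML 1.1 "fully normalized" rules, and is_nfc is a hypothetical helper name.

```python
import unicodedata

def is_nfc(text: str) -> bool:
    """Return True if text is already in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text

# What the application writes: the combining solidus overlay as a character
# reference, so every one of the 27 characters is plain ASCII.
escaped = "<comment>&#x0338;</comment>"

# The same element with U+0338 written as a literal character ("X" above).
literal = "<comment>\u0338</comment>"

print(len(escaped), is_nfc(escaped))  # 27 True
print(len(literal), is_nfc(literal))  # 20 False
# NFC composes the ">" of the start tag with U+0338 into U+226F NOT GREATER-THAN.
print(ascii(unicodedata.normalize("NFC", literal)))
```

The literal form fails the NFC test because normalization fuses the start tag's closing ">" with the combining overlay, which illustrates why a leading combining character right after markup is a problem.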


But then I see that Henry gave RXP essentially that input (the one we
said was fully normalized), and RXP said it was not normalized.

So I'm stuck again on how to answer Roger's basic question of
whether <comment>&#x338;</comment> is fully normalized or not.
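
Roger's question seems to turn on whether "begins with a composing character" is judged on the characters as written or after character references are expanded. The sketch below shows the two readings giving opposite answers. It assumes that a non-zero canonical combining class is an adequate approximation of "composing character" (the XML 1.1 definition is somewhat broader), and expand_char_refs and begins_with_combining are hypothetical helper names.

```python
import re
import unicodedata

# Numeric character references: &#xHHHH; (hex) or &#DDDD; (decimal).
CHAR_REF = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);")

def expand_char_refs(text: str) -> str:
    """Replace numeric character references with the characters they denote.
    (Sketch only: named entity references such as &amp; are left alone.)"""
    def repl(m):
        hex_digits, dec_digits = m.group(1), m.group(2)
        return chr(int(hex_digits, 16) if hex_digits else int(dec_digits))
    return CHAR_REF.sub(repl, text)

def begins_with_combining(content: str) -> bool:
    """True if the first character has a non-zero canonical combining class."""
    return bool(content) and unicodedata.combining(content[0]) != 0

raw = "&#x338;"  # the content of <comment>&#x338;</comment>, as written

print(begins_with_combining(raw))                    # False: first character is "&"
print(begins_with_combining(expand_char_refs(raw)))  # True: U+0338 has class 1
```

RXP's verdict on Henry's test file is consistent with the second reading, in which references are expanded before the leading character is examined.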

paul


On 2013-03-20 10:30, Henry S. Thompson wrote:
> Paul Grosso writes:
>
>> Paul asked Henry:
>>   Henry, at http://www.w3.org/XML/2002/09/xml11-implementation
>>   it says that RXP "incorporates code from Martin Duerst to
>>   optionally check for Unicode character normalization." Is
>>   there something we can say to Roger about this?
>>
>> ACTION to Henry: Find out if RXP does optionally check for
>> Unicode character normalization.
> Indeed it does.
>
> Given
>
> c.xml:
> <c>&#x338;</c>
>
> you can do the following:
>
>> (echo "<?xml version='1.1'?>" ; rxp c.xml)|rxp  -U 1
> <?xml version="1.1" encoding="UTF-8"?>
> Error: pcdata not normalized
>   in unnamed entity at line 2 char 6 of <stdin>
>
> ht
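
Henry's check can be roughly approximated with Python's standard library. This is not RXP's code; it is a hypothetical sketch that lets Python's expat-based parser (an XML 1.0 parser, so the version 1.1 declaration is left out) expand the character references, then flags each text run that begins with a character of non-zero canonical combining class.

```python
import unicodedata
import xml.etree.ElementTree as ET

def text_runs_starting_with_combining(xml_text):
    """Return (tag, "text" | "tail") pairs for every text run that begins with a
    combining character after the parser has expanded character references."""
    found = []
    for elem in ET.fromstring(xml_text).iter():
        # elem.text precedes the first child; elem.tail follows the end tag.
        for kind, text in (("text", elem.text), ("tail", elem.tail)):
            if text and unicodedata.combining(text[0]):
                found.append((elem.tag, kind))
    return found

# Same content as Henry's c.xml, without the XML declaration.
print(text_runs_starting_with_combining("<c>&#x338;</c>"))  # [('c', 'text')]
```

This checks only the leading-character condition; it does not test the document for NFC, which would need a separate unicodedata.normalize comparison.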

Received on Wednesday, 20 March 2013 16:34:33 UTC