Re: IBM's XML 1.1 tests

> > > (1) ibm-valid-P02-ibm02v01.xml
> > > The UTF-8 code for LSEP (2028) in this file seems to be wrong.
> > > I believe it should be e2 80 a8, the file has e0 9f ac which is
> > > a non-shortest UTF-8 sequence for something else.
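
The arithmetic behind the quoted claim can be checked with a small Python sketch. A 3-byte UTF-8 sequence has the bit pattern 1110xxxx 10xxxxxx 10xxxxxx. The helper below, `decode_3byte` (a hypothetical name), simply concatenates the x bits. It deliberately performs no well-formedness checks, so it shows what each byte sequence would map to if overlong forms were not rejected.

```python
def decode_3byte(b0: int, b1: int, b2: int) -> int:
    """Assemble the code point from the pattern 1110xxxx 10xxxxxx 10xxxxxx.

    Hypothetical helper: no range or shortest-form checks are done here.
    """
    assert b0 >> 4 == 0b1110, "not a 3-byte lead byte"
    assert b1 >> 6 == 0b10 and b2 >> 6 == 0b10, "not continuation bytes"
    # 4 payload bits from the lead byte, 6 from each continuation byte.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)

print(hex(decode_3byte(0xE2, 0x80, 0xA8)))  # 0x2028 (LSEP)
print(hex(decode_3byte(0xE0, 0x9F, 0xAC)))  # 0x7ec
```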

I am no expert on Unicode, but I sometimes need to understand it.
According to Table 3.1B of Unicode 3.2, the sequence e0 9f ac is not
a valid UTF-8 sequence. That much I understand. I had assumed that this
table lets one check whether a sequence is both valid and shortest-form.
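
A sketch of a Table 3.1B lookup in Python follows. The row boundaries are transcribed from the layout of that table as recalled here and should be checked against the table itself. It illustrates that the lead byte selects the row. The U+1000..U+CFFF row admits only lead bytes E1..EC. A sequence starting with E0 must match the U+0800..U+0FFF row, whose second byte must lie in A0..BF, so e0 9f ac matches no row. `ROWS` and `matching_row` are hypothetical names.

```python
# Each row: (first code point, last code point, allowed range for each byte).
# Assumed transcription of the Table 3.1B layout; verify against the spec.
ROWS = [
    (0x0000, 0x007F, [(0x00, 0x7F)]),
    (0x0080, 0x07FF, [(0xC2, 0xDF), (0x80, 0xBF)]),
    (0x0800, 0x0FFF, [(0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)]),
    (0x1000, 0xCFFF, [(0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)]),
    (0xD000, 0xD7FF, [(0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)]),
    (0xE000, 0xFFFF, [(0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)]),
    (0x10000, 0x3FFFF, [(0xF0, 0xF0), (0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    (0x40000, 0xFFFFF, [(0xF1, 0xF3), (0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    (0x100000, 0x10FFFF, [(0xF4, 0xF4), (0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)]),
]

def matching_row(seq: bytes):
    """Return the code point range of the row seq matches, or None if ill-formed."""
    for first, last, ranges in ROWS:
        if len(seq) == len(ranges) and all(
            lo <= b <= hi for b, (lo, hi) in zip(seq, ranges)
        ):
            return f"U+{first:04X}..U+{last:04X}"
    return None

print(matching_row(bytes([0xE2, 0x80, 0xA8])))  # U+1000..U+CFFF
print(matching_row(bytes([0xE0, 0x9F, 0xAC])))  # None
```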

> > > [GM] Agree, a typo, the byte sequence corresponds to the character #x7EC
> > > and should be changed to e2 80 a8, but it's still a valid document.
> >
> >It's not the shortest sequence for 7EC, so it's a UTF-8 error and
> >therefore not well-formed.
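
The shortest-form point in the reply above can be checked directly. U+07EC is below 0x800, so its shortest UTF-8 encoding takes two bytes, and any 3-byte sequence for it is overlong. The sketch below uses Python's strict `utf-8` codec only as a convenient reference decoder. That choice is an assumption for illustration; it is not the IBM test harness or any XML parser. That codec rejects overlong sequences. The helper names are hypothetical.

```python
def shortest_utf8_length(cp: int) -> int:
    """Byte count of the shortest UTF-8 encoding of code point cp."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4

def strict_decode_ok(seq: bytes) -> bool:
    """True if Python's strict UTF-8 codec accepts seq."""
    try:
        seq.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(shortest_utf8_length(0x7EC))                  # 2
print(chr(0x7EC).encode("utf-8").hex())             # dfac
print(strict_decode_ok(bytes([0xE0, 0x9F, 0xAC])))  # False
print(strict_decode_ok(bytes([0xE2, 0x80, 0xA8])))  # True
```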

Now this confuses me. As I read the UTF-8 table, it allows this sequence,
but then it cannot map to 7EC; it must map to something in the range
1000 to CFFF. So is it a sequence for 7EC or not, and if it is, where am I wrong?

Karl

Received on Friday, 7 November 2003 16:20:29 UTC