RE: Weekly Call for Agenda Items

> >And that in the definition of the RDF graph we use MUST language,
> >whereas in the discussion of RDF/XML we indicate that parsers SHOULD
> >use normalizing transcoders (with a reference to a CHARMOD WD).
> >
> >Issue: do we want a note saying that non-normalized Unicode input
> >MUST NOT be normalized? (This is one of the safeguards in charmod, but
> >it assumes that specs are defining a processing model, and we are not.)
>
> I'm confused here.  Why do we say that a parser should use a normalizing
> transcoder whilst at the same time saying it MUST NOT normalize
> non-normalized Unicode?
>
> Dumbo of Bristol
> Brian
>

I've got bigger ears than you! (also of Bristol)

The early uniform normalization approach in charmod is that the first
producer of *Unicode* must normalize (and nobody else).

I am an RDF/XML parser.

If I read an RDF/XML document in UTF-8 and it is not normalised, that is
an error.

If I read Shift-JIS (no idea whether there is a normalization problem) then
I turn it into Unicode, and it's my job to make sure that Unicode is
normalised. To perform this job, I ask Xerces to do it, and it either has
or doesn't have appropriate normalizing transcoders. Once charmod is deployed
it will have these things. Since charmod is not deployed, I prefer SHOULD
language.
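
A rough sketch of what I mean by a normalizing transcoder, in Python rather
than anything Xerces actually ships. The function name is made up for
illustration, and I am assuming NFC is the normalization form in question:

```python
import codecs
import unicodedata


def transcode_to_unicode(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: decode raw bytes into a Unicode string.

    If the source is a legacy (non-Unicode) encoding, we are the first
    producer of Unicode, so we normalize (NFC assumed here).
    If the source is already a Unicode encoding, we leave it alone:
    normalizing was the original producer's job, not ours.
    """
    text = raw.decode(encoding)
    # codecs.lookup() gives the canonical codec name, e.g. "utf-8".
    if codecs.lookup(encoding).name.startswith("utf-"):
        return text
    return unicodedata.normalize("NFC", text)
```

The same call would be made for Shift-JIS input, e.g.
transcode_to_unicode(data, "shift_jis"); whether that ever changes anything
is exactly the bit I have no idea about.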

The motivation for this is to ensure maximum compatibility with the lazy
implementor who does nothing. He has probably never heard of Shift-JIS and
so rejects it. A normalized Unicode document is processed identically by me
and Mr Lazy. A non-normalized Unicode document is rejected by me (in strict
conformance mode). In default mode, a non-normalized Unicode document
generates a warning from me, but I do not normalize it; I proceed
anyway, like Mr Lazy.

So when we both produce an answer we produce the same answer.
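
The two modes could look something like this. Again only a sketch: the
names are invented for illustration, and NFC is my assumption for what
"normalized" means:

```python
import unicodedata
import warnings


class NotNormalizedError(ValueError):
    """Hypothetical error raised in strict conformance mode."""


def check_unicode_input(text: str, strict: bool = False) -> str:
    """Check Unicode input for normalization without ever changing it.

    strict=True : non-normalized input is rejected.
    strict=False: non-normalized input produces a warning, then is
                  passed through untouched, just as Mr Lazy would do.
    """
    if unicodedata.normalize("NFC", text) != text:
        if strict:
            raise NotNormalizedError("input is not NFC-normalized")
        warnings.warn("input is not NFC-normalized; not normalizing it")
    return text  # returned exactly as received in every case
```

In neither mode does the text come back changed, which is the point: any
answer I do produce is the one Mr Lazy produces.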

Jeremy

Received on Friday, 15 March 2002 05:10:26 UTC