Re: XML literals, canonical form, and normal form C problem

As there was no explicit mention of Production 7.2.16 in the proposed
changes, I did not connect the second change to this production.

I agree that the proposed changes eliminate the technical problem.

I would still prefer that RDF could handle unnormalized Unicode in
literals, however.

More comments in-line below.


From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Subject: Re: XML literals, canonical form, and normal form C problem
Date: Thu, 18 Sep 2003 15:44:59 +0200

> 
> Peter:
>  > I believe that this response does not adequately address the issue as it
>  > does not address the case of typed literals in Production 7.2.16. (Untyped
>  > literals are handled correctly in this production.)
> 
> We agreed with this which is why we proposed a rewording of that 
> production: i.e.
> 
> replace
>  > > [[
>  > >
>  > > If the rdf:datatype attribute d is given then o :=
>  > > typed-literal(literal-value := t.string-value, literal-datatype :=
>  > > d.string-value) otherwise t.string-value MUST be a Unicode[UNICODE]
>  > > string in Normal Form C[NFC], o := literal(literal-value :=
>  > > t.string-value, literal-language := e.language) and the
>  > > ]]
>  > >
>  > > with
>  > > [[
>  > > The Unicode [UNICODE] string t.string-value MUST be in Normal Form C[NFC].
>  > > If the rdf:datatype attribute d is given then o :=
>  > > typed-literal(literal-value := t.string-value, literal-datatype :=
>  > > d.string-value), otherwise o := literal(literal-value := t.string-value,
>  > > literal-language := e.language). The
>  > > ..
>  > > ]]
> 
> 
> 
> 
> Peter:
>  >I think that there needs to be some text somewhere in the RDF documents
>  >indicating which portions of an RDF/XML document must be in Normal Form C.
> 
> *needs* seems quite strong for this issue.
> The formal grammar serves this function.
> The best advice to document authors is to read charmod, and avoid 
> non-NFC text. The best advice to implementors is the formal grammar.

One problem is that the restrictions to NFC are not in the grammar itself
but instead in the accompanying text.  This makes it hard to determine
where NFC is required.  One solution would thus be to include the NFC
requirements in the grammar itself.

>  >I believe that it is possible to have a valid RDF/XML document which when
>  >interpreted as a Unicode string is not in Normal Form C.
> 
> Correct. For example, within XML Comments, and XML processing 
> instructions. However, such documents will not be valid XML 1.1.

Also within URI references, I believe, which makes this issue much more important.
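
A small sketch of such a document, in Python.  It assumes (as I believe, but
have not verified against every production) that nothing requires the
rdf:about value to be in NFC; the example.org names are made up.

```python
import unicodedata
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def is_nfc(s: str) -> bool:
    """Return True when s is already in Unicode Normal Form C."""
    return unicodedata.normalize("NFC", s) == s


# The literal uses precomposed e-acute (NFC); the comment and the URI
# reference use 'e' + COMBINING ACUTE ACCENT (not NFC).
doc = (
    '<rdf:RDF xmlns:rdf="' + RDF_NS + '" xmlns:ex="http://example.org/ns#">'
    "<!-- cafe\u0301 -->"
    '<rdf:Description rdf:about="http://example.org/cafe\u0301">'
    "<ex:name>caf\u00e9</ex:name>"
    "</rdf:Description>"
    "</rdf:RDF>"
)

root = ET.fromstring(doc)   # the default parser discards the comment
description = root[0]
about = description.get("{" + RDF_NS + "}about")
literal = description[0].text

print("whole document NFC:", is_nfc(doc))
print("literal NFC:       ", is_nfc(literal))
print("URI reference NFC: ", is_nfc(about))
```

Here the literal satisfies the NFC requirement while the document as a
whole, and the URI reference in particular, do not.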

> I do not believe further changes are needed on this issue.
> 
> Are you dissatisfied? If so, I will propose that the WG records your 
> objection and moves on.

My view is that the proposed changes to the documents, while they fix the
technical problem, are nonetheless insufficient.  More information is needed
on where NFC is required in (the meaningful portions of) RDF/XML documents.

> Jeremy

Peter F. Patel-Schneider

Received on Thursday, 18 September 2003 10:26:07 UTC