RE: 2001-09-07#5 Literals

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Mon, 24 Sep 2001 16:51:21 +0100
To: "Graham Klyne" <Graham.Klyne@MIMEsweeper.com>
Cc: <w3c-rdfcore-wg@w3.org>
Message-ID: <JAEBJCLMIFLKLOJGMELDOEDCCCAA.jjc@hplb.hpl.hp.com>
Thanks Graham for your detailed comments.
Much appreciated.

> >[12]
> >RDF/XML documents SHOULD be W3C-normalized as specified in
> >[CHARMOD]. Moreover, after the stripping of comments and
> >processing instructions an RDF/XML document SHOULD still be
> >W3C-normalized. It is the responsibility of the document
> >creator to fulfil this requirement. RDF/XML processors MUST NOT
> >correct input that is not W3C-normalized.

> I'm not sure what is the value of saying this.
> It seems to me that this would be an application concern,
> if an RDF processor is still expected to accept non-normalized XML
> as a literal.  Hence I'd rather say nothing here.

My reading of CHARMOD was that the principle of early normalization was
important. To be in accord with this, we should prohibit later
normalization.

I read "Without a precise specification, it is not possible to determine
reliably whether or not two strings are identical. Such a specification must
take into account character encoding, the way to perform normalization and
where or when to perform it."

Since I see defining literal equality as an important part of the
literal specification, I thought it worthwhile to spend some effort on
specifying precise normalization behaviour.

There is a requirement on document authors (to produce W3C-normalized
documents) and on document processors (not to normalize). My understanding
is that this is a robust requirement, in the sense that it only fails if both
parties screw up; if only one does, it still succeeds.

> >[13]
> >RDF/XML processors MAY detect lack of W3C-normalization in
> >an input document, and issue a diagnostic.
> Similarly, I don't think this has any place in the RDF specification,
> other than perhaps as a non-normative implementation recommendation
> (hence not using RFC 2119 form - may rather than MAY).

Agreed: 'may', not 'MAY'. It was really only a concession after the rather
harsh MUST NOT in para [12], and was intended to be read in the context of
that MUST NOT. Perhaps putting it in brackets and appending it to the end of
[12] would convey that better.
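
As a rough illustration of "detect and diagnose, but do not correct", here is a minimal Python sketch. It assumes that an NFC check is an acceptable stand-in for W3C-normalization (CHARMOD's definition includes further conditions that are not checked here), and the function name `check_normalized` is hypothetical.

```python
import unicodedata
import warnings


def check_normalized(text: str) -> bool:
    """Return True if text is already in NFC; otherwise warn and return False.

    The input is never modified: the caller keeps using `text` as received.
    NFC is only an approximation of W3C-normalization (assumption).
    """
    is_nfc = unicodedata.normalize("NFC", text) == text
    if not is_nfc:
        # Diagnostic only; no correction is applied to the input.
        warnings.warn("input is not NFC-normalized; passing it through unchanged")
    return is_nfc
```

A processor written this way can report the problem while still handing the original, uncorrected string on to the application.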

> >[14]
> >Summary of text normalization for RDF/XML processors.
> >RDF/XML processors MUST use a normalizing transcoder
> >from non-UCS-based encodings.
> >RDF/XML processors MUST NOT do any other text normalization.
> What's a normalizing transcoder in this context?  (I think this means
> conversion to character-normalized UCS/Unicode.)
> [Later:  now I see --
> http://www.w3.org/TR/charmod/#def-normalizing-transcoder;
> citation at this point would be helpful.]
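
For readers unfamiliar with the term, a normalizing transcoder can be sketched roughly as below. This is an illustration only: it assumes that NFC approximates the normalization CHARMOD asks for, and `normalizing_transcode` is a hypothetical name, not something defined by CHARMOD or RDF.

```python
import unicodedata


def normalizing_transcode(data: bytes, encoding: str) -> str:
    """Decode bytes from a legacy (non-UCS) encoding and return NFC text."""
    # Step 1: transcode from the legacy encoding into Unicode code points.
    text = data.decode(encoding)
    # Step 2: normalize the result, because a legacy-to-Unicode mapping
    # is not guaranteed to produce normalized text by itself.
    return unicodedata.normalize("NFC", text)
```

The second step matters for encodings such as Windows-1258, which represents some accented Vietnamese letters with combining marks, so a plain decode can yield text that is not in NFC.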


> >[15]
> >Unicode string equality within Literals is given by binary
> >equality.
> >(cf. http://www.w3.org/TR/charmod/#sec-IdentityMatching )
> I think the expression "binary equality" here is an over-simplification
> of the cited identity matching algorithm;  I'd suggest describing it as
> "String identity matching, per [citation]".

Again quoting from charmod:

"1. Early uniform normalization to W3C-normalized form, as defined in 4.2.2
W3C-normalized Text
2. Conversion to a common encoding of UCS, if necessary
3. Expansion of all escapes
4. Binary comparison"

Since the text has already ensured steps 1, 2 and 3, all that remains is
step 4. I am happy to substitute "binary comparison" for "binary equality"
or even "binary comparison as given by
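
To make the consequence of step 4 concrete, here is a small Python sketch. It assumes that steps 1 to 3 are someone else's responsibility, so the comparison itself does no normalization; `literals_equal` is a hypothetical name, and comparing UTF-8 bytes is just one way to model "binary comparison".

```python
import unicodedata

PRECOMPOSED = "\u00e9"   # e-acute as a single code point
DECOMPOSED = "e\u0301"   # "e" followed by a combining acute accent


def literals_equal(a: str, b: str) -> bool:
    """Step 4 only: compare the encoded forms byte for byte."""
    # No normalization happens here; both strings are encoded to a common
    # UCS encoding (UTF-8) and the resulting bytes are compared.
    return a.encode("utf-8") == b.encode("utf-8")


# The two spellings render alike but differ under binary comparison.
different = not literals_equal(PRECOMPOSED, DECOMPOSED)

# If the author had normalized early (step 1), the comparison succeeds.
same_after_nfc = literals_equal(
    unicodedata.normalize("NFC", DECOMPOSED), PRECOMPOSED
)
```

This is why the requirement on authors matters: a processor that is forbidden to normalize will treat the unnormalized spelling as a different literal.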

> >[33]
> >RDF Literals arising from the propertyElt production with
> >rdf:parseType="Literal" attribute (using the [n]th production
> >of 6.12):
> I think the indicated transformations are legitimate for *any* value of
> parseType;  if parseType='Resource' that would not affect the resultant
> RDF, and for other values of parseType, if they are not recognized then
> they may be treated as 'Literal' per spec.

Hmm, I'll think about this one. Personally I am not sure I would want the
spec to continue to say what to do with unrecognised values of parseType.
The old spec has the strange behaviour of saying that parseType must be
"Resource" or "Literal", but if it isn't, here is what you do ....

Once again, thanks a lot for a thorough read. Later in the week I'll
incorporate comments and send out a new version of the text.

Received on Monday, 24 September 2001 11:51:56 UTC
