Re: Fwd "a comment on NFC" from Brian McBride on 2003-10-02 (w3c-rdfcore-wg@w3.org from October 2003)

From: Brian McBride <bwm@hplb.hpl.hp.com>
Date: Thu, 02 Oct 2003 11:00:32 +0100
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Cc: w3c-i18n-ig@w3.org, w3c-rdfcore-wg@w3.org
Message-ID: <3F7BF740.4070500@hplb.hpl.hp.com>

Well, the overnight developments I had hoped for aren't going to happen.

First a context-setting ramble, then two concrete suggestions.

I believe that it is Peter's intention to formally object to the current
RDF handling of normal form C.

I feel I don't really understand the issue very well, but I'll try to 
summarize my understanding, such as it is.  Please correct my 
misunderstandings.

RDFCore is following CharMod and I18N advice in requiring literals to be
in normal form C.  XML 1.0 and XSD datatypes do not require this.
Thus there are legal fragments of XML 1.0, legal xsd:strings and legal
xsd:anyURIs that are not in normal form C, and these cannot be used in
an RDF graph.
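
To make the constraint concrete, here is a minimal sketch of an NFC
check.  It assumes Python's standard unicodedata module; the helper name
is_nfc is purely illustrative and does not come from any RDF
specification or toolkit.

```python
import unicodedata


def is_nfc(s: str) -> bool:
    """Return True if s is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", s) == s


# 'e' followed by U+0301 COMBINING ACUTE ACCENT: a legal xsd:string,
# but not in normal form C.
decomposed = "e\u0301"
# The precomposed character U+00E9, which is in normal form C.
composed = "\u00e9"

print(is_nfc(decomposed))  # False
print(is_nfc(composed))    # True
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

Under the current wording the first string could not appear as a literal
in an RDF graph, while the second could.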

I think that the issue arises rarely in practice, e.g. when a string or
XML fragment contains a combining character with nothing to combine with.

One issue of concern to Peter is that the current specs prohibit us from
saying, in OWL say, that some string (which is not in normal form C) is
not in normal form C.  I think this concern is mistaken, in that it is
possible to invent a datatype whose lexical space consists of strings in
normal form C, but whose value space is not so restricted, that would
allow the representation of all strings.  The same could be done for XML
fragments, though one would then lose the benefit of the
parseType="Literal" convenience syntax.
Thus whilst the RDF specs would not be providing a standard way of
representing non-NFC strings, they would not be preventing their expression.
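
A rough sketch of what such a datatype might look like follows.  The
escape syntax \u{HEX} and the function names to_lexical and to_value are
entirely hypothetical, invented here for illustration.  The idea is just
that every lexical form is plain ASCII (and so trivially in normal form
C), while the lexical-to-value mapping can reach any string.

```python
import re
import unicodedata

# Hypothetical escape syntax: \u{HEX} stands for the code point HEX.
_ESCAPE = re.compile(r"\\u\{([0-9A-Fa-f]+)\}")


def to_lexical(value: str) -> str:
    """Map any string to an ASCII-only lexical form."""
    out = []
    for ch in value:
        # Escape the backslash itself and every non-ASCII character.
        if ch == "\\" or ord(ch) > 0x7F:
            out.append("\\u{%X}" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def to_value(lexical: str) -> str:
    """Lexical-to-value mapping: expand each escape to its character."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), lexical)


non_nfc = "e\u0301"            # not in normal form C
lexical = to_lexical(non_nfc)  # the ASCII string e\u{301}

# The lexical form is in normal form C, yet the value is recovered intact.
print(unicodedata.normalize("NFC", lexical) == lexical)
print(to_value(lexical) == non_nfc)
```

The price is readability of the lexical form, which is the same kind of
convenience lost when giving up parseType="Literal".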

That said, it does seem odd to me that we are precluding RDF from 
representing some legal fragments of XML 1.0 as XML Literals.  Please 
interpret "odd" as massive English understatement.

This situation has arisen because we have been striving to be good 
citizens, especially with respect to internationalization and have 
adopted good practice earlier than some other specs.  This does not play 
well when we embed fragments of language conforming to those other specs 
in our language.  This is a situation in which one has to consider the
wisdom of trying to be "ahead of the pack".

I am tempted by an idea I will attribute to pfps, though I'm not sure he
is advocating it: that we should report to I18N the difficulties we have
encountered trying to deploy Charmod, and seek their advice on managing
the transition, specifically given that we embed fragments of
non-conforming languages in ours.

I also wonder whether this issue might be addressed by toning down the
language from MUST to SHOULD, e.g.

[...]

> which includes the additional following para:
> 
> [[
> The string in both plain and typed literals is required to
> be in Unicode Normal Form C [NFC]. This requirement is motivated
> by [Charmod] particularly section 4 Early Uniform Normalization.
> ]]

becomes something like

[[
The string in both plain and typed literals SHOULD be in Unicode Normal 
Form C [NFC].  This is motivated by anticipation that [Charmod], 
particularly section 4 Early Uniform Normalization, will become
standardized practice.  Implementations SHOULD accept strings which are 
not in Normal Form C and MAY issue a warning in such circumstances.
]]
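
As a rough illustration of what the SHOULD/MAY wording would ask of an
implementation, here is a minimal sketch.  The function name
accept_literal is hypothetical, and using Python's warnings module is
just one assumed way of issuing the warning.

```python
import unicodedata
import warnings


def accept_literal(s: str) -> str:
    """Accept any literal string; warn if it is not in normal form C."""
    if unicodedata.normalize("NFC", s) != s:
        # MAY issue a warning; the string is still accepted unchanged.
        warnings.warn("literal is not in Unicode Normal Form C", UserWarning)
    return s


# Accepted as-is, with a warning.
accept_literal("e\u0301")
# Accepted silently.
accept_literal("\u00e9")
```

Note that the sketch never normalizes or rejects the input, so a non-NFC
xsd:string or XML fragment survives intact in the graph.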

Brian

Received on Thursday, 2 October 2003 06:02:38 UTC