- From: Brian McBride <bwm@hplb.hpl.hp.com>
- Date: Thu, 02 Oct 2003 11:00:32 +0100
- To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Cc: w3c-i18n-ig@w3.org, w3c-rdfcore-wg@w3.org
Well, the overnight developments I had hoped for aren't going to happen. First a context-setting ramble, then two concrete suggestions.

I believe that it is Peter's intention to formally object to the current RDF handling of normal form C. I feel I don't really understand the issue very well, but I'll try to summarize my understanding, such as it is. Please correct my misunderstandings.

RDFCore is following CharMod and I18N advice in requiring literals to be in normal form C. XML 1.0 and XSD datatypes do not require this. Thus there are legal fragments of XML 1.0, legal xsd:strings and legal xsd:anyURIs that are not in normal form C, and these cannot be used in an RDF graph. I think that the issue arises rarely in practice, e.g. when a string or XML fragment contains a combining character with nothing to combine with.

One issue of concern to Peter is that the current specs prevent us from saying in, say, OWL that some string (which is not in normal form C) is not in normal form C. I think this is wrong, in that it is possible to invent a datatype whose lexical space consists of strings in normal form C, but whose value space is not restricted to them; such a datatype would allow the representation of all strings. The same could be done for XML fragments, though we would then lose the benefit of the parseType="Literal" convenience syntax. Thus, whilst the RDF specs would not be providing a standard way of representing non-NFC strings, they would not be preventing their expression.

That said, it does seem odd to me that we are precluding RDF from representing some legal fragments of XML 1.0 as XML Literals. Please interpret "odd" as massive English understatement. This situation has arisen because we have been striving to be good citizens, especially with respect to internationalization, and have adopted good practice earlier than some other specs.
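A minimal Python sketch of the datatype argument, under stated assumptions: the escape scheme (`\UXXXXXXXX` for every non-ASCII character and for the backslash itself) and the names `to_lexical`/`to_value` are hypothetical, chosen only for illustration, and are not defined by RDF, XSD or OWL. Because every lexical form produced is pure ASCII, and ASCII text is always in NFC, the lexical space stays inside NFC while the value space covers arbitrary strings, including a decomposed `"e\u0301"` (e followed by U+0301 COMBINING ACUTE ACCENT), which is a legal xsd:string but not in NFC. It needs Python 3.8+ for `unicodedata.is_normalized`.

```python
import unicodedata

HEX_DIGITS = "0123456789ABCDEFabcdef"


def to_lexical(value: str) -> str:
    """Hypothetical value-to-lexical mapping: escape every non-ASCII
    character (and the backslash) as \\UXXXXXXXX, so the result is
    pure ASCII and therefore always in Normal Form C."""
    out = []
    for ch in value:
        if ch == "\\" or ord(ch) > 0x7F:
            out.append("\\U%08X" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def to_value(lexical: str) -> str:
    """Hypothetical lexical-to-value mapping: expand \\UXXXXXXXX escapes.
    The resulting value need not be in NFC."""
    out = []
    i = 0
    while i < len(lexical):
        if lexical.startswith("\\U", i):
            hex_digits = lexical[i + 2:i + 10]
            if len(hex_digits) != 8 or not all(c in HEX_DIGITS for c in hex_digits):
                raise ValueError("malformed escape at position %d" % i)
            out.append(chr(int(hex_digits, 16)))
            i += 10
        elif lexical[i] == "\\":
            raise ValueError("stray backslash at position %d" % i)
        else:
            out.append(lexical[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    decomposed = "e\u0301"  # legal xsd:string, not in NFC
    lex = to_lexical(decomposed)
    print(unicodedata.is_normalized("NFC", decomposed))  # False
    print(lex)                                           # e\U00000301
    print(unicodedata.is_normalized("NFC", lex))         # True
    print(to_value(lex) == decomposed)                   # True
```

The mapping is a bijection between ASCII lexical forms and arbitrary strings, which is all the argument needs; a real datatype would also have to specify how malformed lexical forms are treated.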
Adopting practice early does not play well when we embed fragments of languages conforming to those other specs in our own language. This is a situation where one has to consider the wisdom of trying to be "ahead of the pack".

I am tempted by an idea I will attribute to pfps, though I'm not sure he is advocating it: that we should report to I18N the difficulties we have encountered in trying to deploy Charmod, and seek their advice on managing the transition, specifically given that we embed fragments of non-conforming languages in ours.

I also wonder whether this issue might be addressed by toning down the language from MUST to SHOULD, e.g.

[...]
> which includes the additional following para:
>
> [[
> The string in both plain and typed literals is required to
> be in Unicode Normal Form C [NFC]. This requirement is motivated
> by [Charmod] particularly section 4 Early Uniform Normalization.
> ]]

becomes something like

[[
The string in both plain and typed literals SHOULD be in Unicode Normal Form C [NFC]. This is motivated by anticipation that [Charmod], particularly section 4, Early Uniform Normalization, will become standard practice. Implementations SHOULD accept strings which are not in Normal Form C and MAY issue a warning in such circumstances.
]]

Brian
Received on Thursday, 2 October 2003 06:02:38 UTC