- From: Brian McBride <bwm@hplb.hpl.hp.com>
- Date: Thu, 02 Oct 2003 15:43:13 +0100
- To: Martin Duerst <duerst@w3.org>, "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>
- Cc: Jeremy Carroll <jjc@hplb.hpl.hp.com>, w3c-i18n-ig@w3.org, w3c-rdfcore-wg@w3.org
Martin Duerst wrote: > > At 11:00 03/10/02 +0100, Brian McBride wrote: > >> Well, the overnight developments I had hoped for aren't going to happen. > > > If that's referring to a response from our side, sorry. It wasn't. I'm sorry that wasn't clear Martin. I was not casting any aspertions about timeliness in your direction, and in fact, I'm grateful for this quick response. [...] > > > I agree with 'rarely in practice'. NFC was designed to align with current > practice where possible. > > The example you give is not exactly typical, here are some: > - A string contains a base character and a combining character, and there > is a precomposed character for this combination > - A string contains a character (e.g. Angstrom) that is a canonical > equivalent (i.e. exact copy) of another (A-ring). > - A string starts with a combining character with nothing to combine with Thanks Martin, thats helpful. > > >> One issue of concern to Peter is that the current specs prohibit us >> saying in say Owl that some string (which is not in normal form C) is >> not in normal form C. I think this is wrong, in that it is possible >> to invent a datatype whose lexical space consists of strings in normal >> form C, but whose value space is not, that would allow the >> representation of all strings. The same could be done for XML >> fragments, though would then loose the benefit of the >> parseType="Literal" convenience syntax. >> Thus whilst the RDF specs would not be providing a standard way of >> representing non-NFC strings, it would not be preventing their >> expression. > > > I'm a bit confused here, but I'll try to use my own words. > > RDF would always be able to represent non-NFC strings, e.g. by > defining them as a collection/sequence of integers represented > by a graph. There is in my understanding nothing one can or should > do or be able to do to prevent that if somebody really wants to > do that. Right. I think that's the essential point - there are other ways of representing non-nfc strings if you really want to. > > What you propose above seems to be somewhat different, i.e. > normalized strings would represent unnormalized strings. > But this would run into all kinds of problems, Then I won't pursue it - I merely meant it as an example of one approach. If that doesn't work no matter, as the one you have suggested does. [...] > >> That said, it does seem odd to me that we are precluding RDF from >> representing some legal fragments of XML 1.0 as XML Literals. Please >> interpret "odd" as massive English understatement. >> >> This situation has arisen because we have been striving to be good >> citizens, especially with respect to internationalization and have >> adopted good practice earlier than some other specs. This does not >> play well when we embed fragments of language conforming to those >> other specs in our language. This is a situation when one has to >> consider the wisdom of trying to be "ahead of the pack". > > > I think for RDF specifically, doing matching to build up the graph, > and having this very clearly defined, was one of the basic requirements, > and reasons for 'early' adoption. > From a user point of view, different normalizations should match. > But from a machine point of view, this can be a lot of work. > Requiring clean data to start with is what we have proposed, and > you have adopted. I see. A possible alternative would be to not strictly > require clean data, but to clearly blame any responsibility for > matching problems on the side providing the dirty data. That looks like a possible compromise - language of the form "SHOULD be in NFC" rather than "MUST be in NFC, as I suggested later in my email: [...] >> >> I also wonder whether this issue might be addressed by toning down the >> language from MUST to SHOULD e.g. >> >> [...] >> >>> which includes the additional following para: >>> [[ >>> The string in both plain and typed literals is required to >>> be in Unicode Normal Form C [NFC]. This requirement is motivated >>> by [Charmod] particularly section 4 Early Uniform Normalization. >>> ]] >> >> >> becomes something like >> >> [[ >> The string in both plain and typed literals SHOULD be in Unicode >> Normal Form C [NFC]. This is motivated by anticipation that >> [Charmod], particularly section 4 Early Uniform Normalization will >> become standardized practice. Implementations SHOULD accept strings >> which are not in Normal Form C and MAY issue a warning in such >> circumstances. >> ]] I think I heard you say that you think such an approach would be acceptable to I18N. Right? Peter, would it work for you? Brian
Received on Thursday, 2 October 2003 11:06:35 UTC