- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Thu, 02 Oct 2003 16:51:27 +0100
- To: Brian McBride <bwm@hplb.hpl.hp.com>
- Cc: Martin Duerst <duerst@w3.org>, "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, w3c-i18n-ig@w3.org, w3c-rdfcore-wg@w3.org
I will try to propose what it would take, in about 18 hours' time (i.e. tomorrow morning in Europe). It seems that Martin wanted a bit more than Brian's text - discussion of the matching problem etc. I suspect a MUST NOT normalize might help too.

Jeremy

Brian McBride wrote:

> Martin Duerst wrote:
>
>> At 11:00 03/10/02 +0100, Brian McBride wrote:
>>
>>> Well, the overnight developments I had hoped for aren't going to happen.
>>
>> If that's referring to a response from our side, sorry.
>
> It wasn't. I'm sorry that wasn't clear, Martin. I was not casting any
> aspersions about timeliness in your direction, and in fact I'm grateful
> for this quick response.
>
> [...]
>
>> I agree with 'rarely in practice'. NFC was designed to align with current
>> practice where possible.
>>
>> The example you give is not exactly typical; here are some:
>> - A string contains a base character and a combining character, and there
>>   is a precomposed character for this combination
>> - A string contains a character (e.g. Angstrom) that is a canonical
>>   equivalent (i.e. exact copy) of another (A-ring).
>> - A string starts with a combining character with nothing to combine with
>
> Thanks Martin, that's helpful.
>
>>> One issue of concern to Peter is that the current specs prohibit us
>>> saying in, say, OWL that some string (which is not in normal form C) is
>>> not in normal form C. I think this is wrong, in that it is possible
>>> to invent a datatype whose lexical space consists of strings in
>>> normal form C, but whose value space is not, which would allow the
>>> representation of all strings. The same could be done for XML
>>> fragments, though one would then lose the benefit of the
>>> parseType="Literal" convenience syntax.
>>> Thus whilst the RDF specs would not be providing a standard way of
>>> representing non-NFC strings, they would not be preventing their
>>> expression.
>>
>> I'm a bit confused here, but I'll try to use my own words.
>>
>> RDF would always be able to represent non-NFC strings, e.g. by
>> defining them as a collection/sequence of integers represented
>> by a graph. There is in my understanding nothing one can or should
>> do, or be able to do, to prevent that if somebody really wants to
>> do that.
>
> Right. I think that's the essential point - there are other ways of
> representing non-NFC strings if you really want to.
>
>> What you propose above seems to be somewhat different, i.e.
>> normalized strings would represent unnormalized strings.
>> But this would run into all kinds of problems,
>
> Then I won't pursue it - I merely meant it as an example of one
> approach. If that doesn't work, no matter, as the one you have
> suggested does.
>
> [...]
>
>>> That said, it does seem odd to me that we are precluding RDF from
>>> representing some legal fragments of XML 1.0 as XML Literals. Please
>>> interpret "odd" as massive English understatement.
>>>
>>> This situation has arisen because we have been striving to be good
>>> citizens, especially with respect to internationalization, and have
>>> adopted good practice earlier than some other specs. This does not
>>> play well when we embed fragments of language conforming to those
>>> other specs in our language. This is a situation where one has to
>>> consider the wisdom of trying to be "ahead of the pack".
>>
>> I think for RDF specifically, doing matching to build up the graph,
>> and having this very clearly defined, was one of the basic requirements,
>> and reasons for 'early' adoption.
>>
>> From a user point of view, different normalizations should match.
>> But from a machine point of view, this can be a lot of work.
>> Requiring clean data to start with is what we have proposed, and
>> you have adopted.
>
> I see.
>
>> A possible alternative would be to not strictly
>> require clean data, but to clearly blame any responsibility for
>> matching problems on the side providing the dirty data.
>
> That looks like a possible compromise - language of the form "SHOULD be
> in NFC" rather than "MUST be in NFC", as I suggested later in my email:
>
> [...]
>
>>> I also wonder whether this issue might be addressed by toning down
>>> the language from MUST to SHOULD, e.g.
>>>
>>> [...]
>>>
>>>> which includes the additional following para:
>>>> [[
>>>> The string in both plain and typed literals is required to
>>>> be in Unicode Normal Form C [NFC]. This requirement is motivated
>>>> by [Charmod], particularly section 4 Early Uniform Normalization.
>>>> ]]
>>>
>>> becomes something like
>>>
>>> [[
>>> The string in both plain and typed literals SHOULD be in Unicode
>>> Normal Form C [NFC]. This is motivated by anticipation that
>>> [Charmod], particularly section 4 Early Uniform Normalization, will
>>> become standardized practice. Implementations SHOULD accept strings
>>> which are not in Normal Form C and MAY issue a warning in such
>>> circumstances.
>>> ]]
>
> I think I heard you say that you think such an approach would be
> acceptable to I18N. Right?
>
> Peter, would it work for you?
>
> Brian
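[Editorial note, not part of the original thread: a minimal Python sketch of the normalization cases Martin lists and of the SHOULD-accept / MAY-warn behaviour in the proposed wording. It uses only the standard unicodedata and warnings modules; the specific code points and the accept_literal helper are illustrative assumptions, not anything from the thread.]

    import unicodedata
    import warnings

    # The three cases Martin lists, with code points chosen here for illustration:
    examples = [
        "e\u0301",             # base letter + combining acute; NFC composes it to U+00E9
        "\u212Bngstr\u00F6m",  # U+212B ANGSTROM SIGN is canonically equivalent to U+00C5 (A-ring)
        "\u0301abc",           # starts with a combining character; NFC leaves it unchanged
    ]

    for s in examples:
        nfc = unicodedata.normalize("NFC", s)
        print(repr(s), "->", repr(nfc), "| already NFC:", s == nfc)

    # One reason matching "can be a lot of work": concatenating two NFC strings
    # can still yield a non-NFC result when the second starts with a combining character.
    joined = "A" + "\u030A"    # LATIN CAPITAL LETTER A + COMBINING RING ABOVE
    print(unicodedata.normalize("NFC", joined) == joined)   # False

    def accept_literal(lexical_form):
        # Hypothetical helper mirroring the proposed wording: accept the string
        # as given (SHOULD accept), but warn if it is not in NFC (MAY warn).
        if unicodedata.normalize("NFC", lexical_form) != lexical_form:
            warnings.warn("literal is not in Unicode Normal Form C")
        return lexical_form

Running this prints the NFC form of each example (the first two change, the third does not) and False for the concatenation check.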
Received on Thursday, 2 October 2003 12:06:23 UTC