- From: Peter F. Patel-Schneider <pfps@research.bell-labs.com>
- Date: Thu, 02 Oct 2003 17:52:18 -0400 (EDT)
- To: bwm@hplb.hpl.hp.com
- Cc: duerst@w3.org, jjc@hplb.hpl.hp.com, w3c-i18n-ig@w3.org, w3c-rdfcore-wg@w3.org
From: Brian McBride <bwm@hplb.hpl.hp.com>
Subject: Re: Fwd "a comment on NFC"
Date: Thu, 02 Oct 2003 15:43:13 +0100

> Martin Duerst wrote:
>
> >
> > At 11:00 03/10/02 +0100, Brian McBride wrote:

[...]

> >> One issue of concern to Peter is that the current specs prohibit us
> >> saying in say Owl that some string (which is not in normal form C) is
> >> not in normal form C.  I think this is wrong, in that it is possible
> >> to invent a datatype whose lexical space consists of strings in normal
> >> form C, but whose value space is not, that would allow the
> >> representation of all strings.  The same could be done for XML
> >> fragments, though would then lose the benefit of the
> >> parseType="Literal" convenience syntax.
> >> Thus whilst the RDF specs would not be providing a standard way of
> >> representing non-NFC strings, it would not be preventing their
> >> expression.
> >
> >
> > I'm a bit confused here, but I'll try to use my own words.
> >
> > RDF would always be able to represent non-NFC strings, e.g. by
> > defining them as a collection/sequence of integers represented
> > by a graph. There is in my understanding nothing one can or should
> > do or be able to do to prevent that if somebody really wants to
> > do that.
>
> Right.  I think that's the essential point - there are other ways of
> representing non-NFC strings if you really want to.

Hmm.  I am not aware of any other way of representing strings in RDF
besides untyped literals and typed literals with datatypes related to
xsd:string.  None of these methods provides any way of representing
non-NFC strings.

In any case, this seems to be a rather silly way of representing non-NFC
strings.  Why should anyone who wants to represent a non-NFC Unicode
string be forbidden to do so?  Yes, in many circumstances this is a bad
thing, but RDF is not about forbidding people from doing such bad things.

[...]

> > A possible alternative would be to not strictly
> > require clean data, but to clearly blame any responsibility for
> > matching problems on the side providing the dirty data.
>
> That looks like a possible compromise - language of the form "SHOULD be
> in NFC" rather than "MUST be in NFC", as I suggested later in my email:
>
> [...]
>
> >>
> >> I also wonder whether this issue might be addressed by toning down the
> >> language from MUST to SHOULD e.g.
> >>
> >> [...]
> >>
> >>> which includes the additional following para:
> >>> [[
> >>> The string in both plain and typed literals is required to
> >>> be in Unicode Normal Form C [NFC]. This requirement is motivated
> >>> by [Charmod] particularly section 4 Early Uniform Normalization.
> >>> ]]
> >>
> >>
> >> becomes something like
> >>
> >> [[
> >> The string in both plain and typed literals SHOULD be in Unicode
> >> Normal Form C [NFC].  This is motivated by anticipation that
> >> [Charmod], particularly section 4 Early Uniform Normalization will
> >> become standardized practice.  Implementations SHOULD accept strings
> >> which are not in Normal Form C and MAY issue a warning in such
> >> circumstances.
> >> ]]
>
> I think I heard you say that you think such an approach would be
> acceptable to I18N.  Right?
>
> Peter, would it work for you?

I see no problems with this approach.

> Brian

peter
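
For illustration only, the proposed wording ("Implementations SHOULD accept
strings which are not in Normal Form C and MAY issue a warning") could be
sketched roughly as below.  This is a minimal sketch, not anything taken from
the RDF specs: the function name accept_literal is hypothetical, and the NFC
check relies on Python's standard unicodedata module.

```python
import unicodedata
import warnings


def accept_literal(text: str) -> str:
    """Hypothetical helper: accept any string, warn if it is not in NFC."""
    # A string is in NFC exactly when NFC normalization leaves it unchanged.
    if unicodedata.normalize("NFC", text) != text:
        warnings.warn(
            "literal string is not in Unicode Normal Form C", stacklevel=2
        )
    # The input is returned unchanged: non-NFC data is accepted, not rejected
    # and not silently rewritten.
    return text


composed = "\u00e9"     # U+00E9, precomposed e-acute: already in NFC
decomposed = "e\u0301"  # 'e' followed by U+0301 combining acute: not in NFC

accept_literal(composed)    # no warning
accept_literal(decomposed)  # warns, but still returns the string as given
```

Under the earlier "MUST" wording the second call would have to be treated as
an error; under the "SHOULD" wording the data goes through and responsibility
for any matching problems stays with whoever supplied it.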
Received on Thursday, 2 October 2003 17:52:31 UTC