- From: Brian McBride <bwm@hplb.hpl.hp.com>
- Date: 25 Jul 2003 09:35:58 +0100
- To: Martin Duerst <duerst@w3.org>
- Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, jjc@hplb.hpl.hp.com, Pat Hayes <phayes@ai.uwf.edu>, www-rdf-comments@w3.org, i18n <w3c-i18n-ig@w3.org>
Thank you Martin, particularly for the specific answer to the question I
asked and the references. I tried searching for the answer in the specs
myself, but wasn't sure I'd uncovered enough evidence to convince Peter.

I also note your broader concerns. However, I think we were trying to
nail down precisely the formal semantics of the present design, rather
than debate the merits of that design.

Thanks again.

Brian

On Thu, 2003-07-24 at 21:06, Martin Duerst wrote:
> Hello Brian, others,
>
> At 16:54 03/07/24 +0100, Brian McBride wrote:
> >On Thu, 2003-07-24 at 16:31, Peter F. Patel-Schneider wrote:
> >
> > > So the question boils down to whether octets and Unicode characters are
> > > disjoint.
> >
> >I believe they are. From
> >
> > http://www.unicode.org/book/uc20ch1.html
> >
> >[[
> >The character identified by a Unicode code value is an abstract entity,
> >such as "LATIN CAPITAL LETTER A" or "BENGALI DIGIT 5".
> >]]
> >
> >i.e. characters are distinct from their encodings.
> >
> >Martin, Jeremy: confirm?
>
>
> I have looked at
> http://www.w3.org/2001/sw/RDFCore/20030123-issues/#pfps-04
> http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0091.html
>
> and wasn't sure why the question below is relevant for addressing issue pfps-04.
>
> Based on a conversation with Brian that I had a week or two ago,
> I suspect that it may be related to some technical issue of how
> to distinguish between the values of plain literals, string, and
> XML literals. Looking at
> http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0064.html
> seems to confirm this suspicion:
>
> >>>>>>>>
> Peter:
> > > > Therefore for the RDF entailment rules to be complete, no XML Literal can
> > > > have a character string as its denotation.
> Brian:
> > > Right. The denotation of an XML Literal is an octet sequence, as
> > > defined by the xml canonicalization spec, see the note in:
> > >
> > > http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-XMLLiteral
> Peter:
> > Unfortunately this does not answer the question. Octet sequence is
> > undefined in http://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/. At
> > least some places in this document appear to indicate that an octet
> > sequence is just a sequence of (Unicode?) characters.
> >>>>>>>>
>
> (the short and simple summary of the above discussion is:
> "In order to be able to say that there is a difference between
> plain text and XML, can we claim that plain text is sequences
> of characters and XML is sequences of octets?")
>
>
> My answer to the question that Brian asked is: Yes, octets and
> Unicode characters are different. The Unicode standard certainly
> explains that, as does the Character Model:
> http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-Storage
>
> But this is the wrong question to ask. It is totally inappropriate
> to use different layers of an encoding model to make semantic
> distinctions that are not related to this encoding model.
> Although such a statement is not explicitly made in the Character
> Model (because, frankly speaking, we didn't imagine that anybody
> would come up with such an idea), it should be quite clear from
> Section 3.5 Reference Processing Model
> (http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-RefProcModel)
> that this is very inappropriate.
>
> It seems that the encoding to UTF-8, inherited by Exclusive XML
> Canonicalization from Canonical XML, and very suitable as a
> preparation for digital signing and encryption or for parser
> testing, is confusing. I will request a clarification to that
> specification and will cc the RDF Core WG on that request.
>
> I am sure that a different and more appropriate way to make the
> distinction can be found.
>
>
> Regards, Martin.
>
>
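
For readers who want to see the character/octet distinction concretely, here is a minimal sketch. It is an illustration only and not part of the original exchange. It assumes Python 3, where `str` holds Unicode characters and `bytes` holds octets; the variable names are hypothetical. It takes the two characters named in the Unicode quotation above and shows that the same character sequence maps to different octet sequences depending on the encoding chosen.

```python
import unicodedata

# Two abstract characters, identified by code point (illustrative choice,
# taken from the Unicode quotation in the thread).
text = "A\u09EB"  # LATIN CAPITAL LETTER A, BENGALI DIGIT FIVE

# The character level: names and count are independent of any encoding.
names = [unicodedata.name(ch) for ch in text]
char_count = len(text)

# The octet level: the same characters serialized under two encodings.
utf8_octets = text.encode("utf-8")
utf16_octets = text.encode("utf-16-be")

print(names)
print(char_count, "characters")
print(list(utf8_octets), len(utf8_octets), "octets in UTF-8")
print(list(utf16_octets), len(utf16_octets), "octets in UTF-16BE")

# Decoding either octet sequence with the matching encoding recovers
# the same characters, which is why the two layers are kept distinct.
round_trip_ok = (
    utf8_octets.decode("utf-8") == text
    and utf16_octets.decode("utf-16-be") == text
)
print(round_trip_ok)
```

The sketch only demonstrates that characters and octets are different layers. It does not bear on whether that difference is a sound basis for distinguishing plain literals from XML literals, which is the separate concern Martin raises.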
Received on Friday, 25 July 2003 04:37:11 UTC