Re: pfps-04

Thank you Martin, particularly for the specific answer to the question I
asked and the references.  I tried searching for the answer in the specs
myself, but wasn't sure I'd uncovered enough evidence to convince Peter.

I also note your broader concerns.  However, I think we were trying to
nail down precisely the formal semantics of the present design, rather
than debate the merits of that design.

Thanks again.

Brian


On Thu, 2003-07-24 at 21:06, Martin Duerst wrote:
> Hello Brian, others,
> 
> At 16:54 03/07/24 +0100, Brian McBride wrote:
> >On Thu, 2003-07-24 at 16:31, Peter F. Patel-Schneider wrote:
> 
> > > So the question boils down to whether octets and Unicode characters are
> > > disjoint.
> >
> >I believe they are.  From
> >
> >   http://www.unicode.org/book/uc20ch1.html
> >
> >[[
> >The character identified by a Unicode code value is an abstract entity,
> >such as "LATIN CAPITAL LETTER A" or "BENGALI DIGIT 5".
> >]]
> >
> >i.e. characters are distinct from their encodings.
> >
> >Martin, Jeremy: confirm?
> 
> 
> I have looked at
> http://www.w3.org/2001/sw/RDFCore/20030123-issues/#pfps-04
> http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0091.html
> 
> and wasn't sure why the question below is relevant for addressing issue pfps-04.
> 
> Based on a conversation with Brian that I had a week or two ago,
> I suspect that it may be related to some technical issue of how
> to distinguish between the values of plain literals, string, and
> XML literals. Looking at
> http://lists.w3.org/Archives/Public/www-rdf-comments/2003JulSep/0064.html
> seems to confirm this suspicion:
> 
>  >>>>>>>>
> Peter:
>  > > > Therefore for the RDF entailment rules to be complete, no XML Literal
>  > > > can have a character string as its denotation.
> 
> Brian:
>  > > Right.  The denotation of an XML Literal is an octet sequence, as
>  > > defined by the xml canonicalization spec, see the note in:
>  > >
>  > >
>  > > http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-XMLLiteral
> 
> Peter:
>  > Unfortunately this does not answer the question.  Octet sequence is
>  > undefined in http://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/.  At
>  > least some places in this document appear to indicate that an octet
>  > sequence is just a sequence of (Unicode?) characters.
>  >>>>>>>>
> 
> (The short and simple summary of the above discussion is:
> "In order to be able to say that there is a difference between
> plain text and XML, can we claim that plain text is sequences
> of characters and XML is sequences of octets?")
> 
> 
> My answer to the question that Brian asked is: Yes, octets and
> Unicode characters are different. The Unicode standard certainly
> explains that, as does the Character Model:
> http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-Storage
> 
> But this is the wrong question to ask. It is totally inappropriate
> to use different layers of an encoding model to make semantic
> distinctions that are not related to this encoding model.
> Although such a statement is not explicitly made in the Character
> Model (because, frankly speaking, we didn't imagine that anybody
> would come up with such an idea), it should be quite clear from
> Section 3.5 Reference Processing Model
> (http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-RefProcModel)
> that this is very inappropriate.
> 
> It seems that the encoding to UTF-8, inherited by Exclusive XML
> Canonicalization from Canonical XML, and very suitable as a
> preparation for digital signing and encryption or for parser
> testing, is confusing. I will request a clarification to that
> specification and will cc the RDF Core WG on that request.
> 
> I am sure that a different and more appropriate way to make the
> distinction can be found.
> 
> 
> Regards,    Martin.
> 
> 
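
To illustrate the point Martin confirms above, that octets and Unicode
characters are different things, here is a minimal Python sketch. It is not
taken from the thread or from any of the cited specifications; the sample
string and the variable names are assumptions chosen for illustration, using
the two characters named in the Unicode quotation.

```python
import unicodedata

# Hypothetical sample: the two abstract characters named in the Unicode
# quotation above, LATIN CAPITAL LETTER A and BENGALI DIGIT FIVE.
text = "A\u09eb"

# Characters are abstract entities, identified here by their Unicode names.
names = [unicodedata.name(ch) for ch in text]

# Octets only appear once an encoding is chosen. The same two characters
# yield different octet sequences under different encodings.
utf8_octets = text.encode("utf-8")
utf16_octets = text.encode("utf-16-be")

print(names)
print(list(utf8_octets))
print(list(utf16_octets))
print(len(text), len(utf8_octets), len(utf16_octets))
```

The character sequence stays the same whichever encoding is used; only the
octets change. This shows that the two are distinct, which is a separate
matter from whether that distinction should carry semantic weight.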

Received on Friday, 25 July 2003 04:37:11 UTC