- From: Martin J. Duerst <duerst@w3.org>
- Date: Tue, 21 Mar 2000 18:40:15 +0900
- To: "John Boyer" <jboyer@PureEdge.com>
- Cc: "IETF/W3C XML-DSig WG" <w3c-ietf-xmldsig@w3.org>, <w3c-xsl-wg@w3.org>
At 00/03/16 10:26 -0800, John Boyer wrote: >-----Original Message----- >From: Martin J. Duerst [mailto:duerst@w3.org] >The i18n WG/IG was also rather lost on seeing $exprEncoding and $exprBOM. ><John> >The problem that this lowly piece of dark matter saw was simply that the >transform input could be in a different encoding than the document >containing the signature (and hence the XPath expression). > >Suppose a signature in a UTF-16 document contains a URI to an XML document >that is encoded in UTF-8. The result of the URI dereference is a UTF-8 >document, whose tag names, attributes, etc. are incomparable to the >conditions set forth in the XPath expression. Unless you convert the XPath >expression to the same encoding as the XML document, or convert the XML >document to the same encoding as the expression, then you will not be able >to evaluate the expression. > >At a minimum, a particular implementation could throw an exception if it >couldn't handle the conversion. > >The alternative to this solution is to standardize on a particular encoding, >which would imply that every XML document would have to be converted before >running XPath, which is not a very efficient solution. ></John> It's not very inefficient either, and it's more or less what XPath is requiring anyway. Trying to implement it with anything else that 'Unicode inside' will hit various problems sooner or later. >As for the BOM, the same arguments apply. In addition, the XPath >expression is element (or attribute?) content, so there is never >a BOM in front of it. To distinguish between BE and LE, just use >UTF-16-BE and UTF-16-LE if needed. > ><John> >The XML document we receive as input to the transform MUST have a BOM in >front of it if it is UTF-16, according to the XML spec. So, obviously >exprBOM doesn't refer to that. > >You are exactly right that there is no BOM in front of the XPath expression >string, which is exactly why the context needed to be told what the BOM >should be Well, yes, but please distinguish between the evocation context of the XPath engine, and the evaluation context of the XPath expression. The former needs to know details about encoding, the later should not have to know them. >(unless you weren't planning to solve the encoding problem). The encoding problem would be solved by using UTF-16BE or UTF-16LE where necessary. >The problem is that if one application reads a UTF-8 document and leaves it >in UTF-8, then the output will be UTF-8, which implies one digest value. If >another tool reads the UTF-8 then converts to UTF-16 because of some >limitation on their XPath expression engine, then the output will be UTF-16 >(unless they take the special effort of converting back to UTF-8 (???) to >overcome the limitation of their toolset). So, a signature created by the >first product would not verify in the second product. I guess the only thing that makes sense here is to define that the XPath serializer produce output in a single specific encoding. I guess that would most probably be UTF-8. Regards, Martin.
Received on Tuesday, 21 March 2000 20:32:10 UTC