Xpath transform changes and questions from John Boyer on 2000-03-17 (w3c-ietf-xmldsig@w3.org from January to March 2000)

From: John Boyer <jboyer@PureEdge.com>
Date: Fri, 17 Mar 2000 12:47:44 -0800
To: "TAMURA Kent" <kent@trl.ibm.co.jp>, "IETF/W3C XML-DSig WG" <w3c-ietf-xmldsig@w3.org>
Cc: "Jonathan Marsh" <jmarsh@microsoft.com>, "Martin J. Duerst" <duerst@w3.org>, "Christopher R. Maden" <crism@exemplary.net>, "James Clark" <jjc@jclark.com>
Message-ID: <BFEDKCINEPLBDLODCODKCEPLCBAA.jboyer@PureEdge.com>
Hi,

I have a few concerns that I think can be worked out, so I am requesting
feedback on the information below.  If everything works out, then yes, we
could remove parse() and exact order.

1) On parse(),

I would be in favor of dumping parse() if we can solve ALL of the
implementation problems it solves.  It  solved some interesting, but not too
important problems for XPath transform users, but it also specified how
implementers were to solve certain problems.

Please have a closer look at the output expectations of serialize().  The
serialize() function cannot operate without several features of parse().  In
particular,

i) Serialization of the root node requires that we output the byte order
mark and xmldecl read by parse() on input.  If parse() is not under our
control, we cannot specify that it retains this information.  This would
seem to suggest that root node serialization should result in the empty
string, which in turn suggests that serialize should output in UTF-8
regardless of the input encoding.  That would be OK with me.

ii) Attribute and namespace serialization require a namespace prefix.  Based
on a new read of XPath I believe this information must be available, but I
want to be sure.

iii) If everything else checks out, we can get rid of exact order and just
use lex order provided that lex ordering in UTF-16 results in the same order
as lex ordering in UTF-8 (which is Christopher Maden's claim).

Also, parse() has an additional feature that would need to be dealt with in
some other way:

iv) If the parser used to implement parse() is non-validating, then parse()
is required to throw an exception if it encounters an external reference
that would cause it to interpret the document differently than a validating
parser.  This exception is necessary since an unverifiable signature is
different than an invalid signature.


2) On eliminating exprBOM and exprEncoding.

Sounds fine.  Sounds like any difference of encoding between the Xpath
expression and the transform input will be handled implicitly by the XPath
transform implementation.


3) On automatic serialization,

There was some concern that serialization should be automatic ("why call
serialize() when that's always what we want to do").  Please see the first
paragraph of section 6.6.3.4, which already includes this feature.

Thus if we remove parse(), then there is no *need* to start expressions with
a function call.


4) On providing an initial namespace context

We can provide an initial namespace context as is done in XPointer.

I was reviewing how XPath handles namespaces, and realized that it is
different than what I had previously understood, and it seems broken (so
there must be some good reason why it is done the way its done, or maybe I'm
just misreading the spec).

Section 2.3 says "A QName in the node test is expanded into an expanded-name
using the namespace declarations from the expression context. This is the
same way expansion is done for element type names in start and end-tags
except that the default namespace declared with xmlns is not used: if the
QName does not have a prefix, then the namespace URI is null (this is the
same way attribute names are expanded). It is an error if the QName has a
prefix for which there is no namespace declaration in the expression
context".

This seems to indicate that the input XML document's namespace declarations
are ignored and the expression context's namespace declarations are used
solely.

When XPath claims to be XML namespace compliant, I thought that meant it
would interpret a node's namespace in the context of the namespace
declarations in the document, but that appears not to be true.  To clear
this up, suppose I have an element x:E, and the document containing this
element associates x with www.w3.org, but the expression context associates
x with www.ietf.org, then which value will the namespace-uri() function
applied to x:E return?

If it returns www.ieft.org, then although it seems weird to me that we
aren't using the namespace declarations in the document, it would at least
be good in the sense that it implies XPath implementations have the
namespace prefix kicking around for serialize().

It also would mean that Jonathan Marsh is correct in requiring an initial
namespace context since we could not do ANY namespace comparisons without it
(the XPath seems to say that it is an error to use a QName containing a
namespace prefix in an expression if that namespace prefix is not defined in
the expression context).

I would appreciate your feedback, esp. from those who have sent prior emails
and therefore seem to be most interested in how this turns out.  If you
would please give this some extra priority, I will prepare an alternative
document for consideration before or during the meeting in Adelaide (I
regret that I will not be at that meeting, but I will be at the following
one in Victoria ;-).

John Boyer
Software Development Manager
PureEdge Solutions, Inc. (formerly UWI.Com)
jboyer@PureEdge.com


-----Original Message-----
From: w3c-ietf-xmldsig-request@w3.org
[mailto:w3c-ietf-xmldsig-request@w3.org]On Behalf Of TAMURA Kent
Sent: Friday, March 17, 2000 12:59 AM
To: IETF/W3C XML-DSig WG
Subject: Re: XSL WG comments on XML Signatures



> <John>
> XPath filtering will not be substantially rewritten.  Based on Clark's
> feedback, we can remove the parse function and instead simply assert that
> the transform input is parsed and provided to XPath as a node set.  The
> notions of lex and exact order will be removed (since we cannot directly
> specify the parse).

That's good!  It would be easy to understand, easy to implement,
easy to use.

--
TAMURA Kent @ Tokyo Research Laboratory, IBM
Received on Friday, 17 March 2000 15:45:48 UTC