- From: Jonathan Marsh <jmarsh@microsoft.com>
- Date: Fri, 17 Mar 2000 15:43:18 -0800
- To: "'John Boyer'" <jboyer@PureEdge.com>, TAMURA Kent <kent@trl.ibm.co.jp>, IETF/W3C XML-DSig WG <w3c-ietf-xmldsig@w3.org>
- Cc: "Martin J. Duerst" <duerst@w3.org>, "Christopher R. Maden" <crism@exemplary.net>, James Clark <jjc@jclark.com>
> -----Original Message-----
> From: John Boyer [mailto:jboyer@PureEdge.com]
>
> Hi Jonathan,
>
> Perhaps I've missed something huge here. You said that node-sets are
> automatically serialized. Could you please point out where
> in the XPath
> spec it says anything about this?
I was referring to the XML Sig spec section 6.6.3.4: "If the result of the
XPath expression is a node-set, then the XPath transform output is the
string result of calling serialize() on the node-set."
So as a user I don't have to call serialize() every time myself; it gets
invoked automatically if my expression returns a node-set. Or am I missing
something?
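
The rule quoted from section 6.6.3.4 could be sketched roughly as follows.
This is only an illustration: xpath_transform_output is a hypothetical name,
a Python list of elements stands in for a node-set, ElementTree's tostring()
stands in for the spec's serialize(), and the str() fallback for other
result types is an assumption rather than anything the spec text above says.

```python
import xml.etree.ElementTree as ET


def xpath_transform_output(result):
    """Return the transform output for an XPath evaluation result."""
    if isinstance(result, list):
        # Node-set: serialization is applied automatically, so the
        # expression author never has to call serialize() explicitly.
        # ET.tostring is a stand-in, not the spec's serialize().
        return "".join(ET.tostring(node, encoding="unicode") for node in result)
    # Assumption: any other result type is simply turned into a string.
    return str(result)


doc = ET.fromstring("<doc><a>1</a><b>2</b></doc>")
one_node = xpath_transform_output(doc.findall("a"))
all_children = xpath_transform_output(doc.findall("*"))
plain_string = xpath_transform_output("already a string")
```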
>
> Thanks,
> John Boyer
> Software Development Manager
> PureEdge Solutions, Inc. (formerly UWI.Com)
> jboyer@PureEdge.com
>
>
> -----Original Message-----
> From: Jonathan Marsh [mailto:jmarsh@microsoft.com]
> Sent: Friday, March 17, 2000 3:06 PM
> To: 'John Boyer'; TAMURA Kent; IETF/W3C XML-DSig WG
> Cc: Martin J. Duerst; Christopher R. Maden; James Clark
> Subject: RE: Xpath transform changes and questions
>
>
>
>
> > -----Original Message-----
> > From: John Boyer [mailto:jboyer@PureEdge.com]
> > Sent: Friday, March 17, 2000 12:48 PM
> > To: TAMURA Kent; IETF/W3C XML-DSig WG
> > Cc: Jonathan Marsh; Martin J. Duerst; Christopher R. Maden;
> > James Clark
> > Subject: Xpath transform changes and questions
> >
> >
> > Hi,
> >
> > I have a few concerns that I think can be worked out, so I am
> > requesting
> > feedback on the information below. If everything works out,
> > then yes, we
> > could remove parse() and exact order.
> >
> > 1) On parse(),
> >
> > I would be in favor of dumping parse() if we can solve ALL of the
> > implementation problems it solves. It solved some
> > interesting, but not too
> > important problems for XPath transform users, but it also
> > specified how
> > implementers were to solve certain problems.
> >
> > Please have a closer look at the output expectations of
> > serialize(). The
> > serialize() function cannot operate without several features
> > of parse(). In
> > particular,
> >
> > i) Serialization of the root node requires that we output the
> > byte order
> > mark and xmldecl read by parse() on input. If parse() is not
> > under our
> > control, we cannot specify that it retains this information.
>
> It seems useful that the BOM and encoding are preserved through
> re-serializing the document. If these are inputs to the serialization
> mechanism, this is added incentive not to expose the
> serialization mechanism
> as a function. Otherwise these would have to be passed
> through as variables
> (as you are doing) and users must not forget to use them, and
> must use them
> appropriately (e.g. not change them), if they want correct results.
>
> If serialization is not exposed as a function, but is performed
> automatically, these problems are avoided. Since it seems that this
> capability already exists (nodesets are automatically
> serialized), at least
> the simple cases are already handled. Thus an explicit
> serialize() and BOM
> and encoding variables seem to be an advanced feature. Its
> necessity in
> solving important problems should be weighed against its potential for
> abuse. I haven't seen you justify your design yet. Mainly
> I'm curious
> here, not trying to kill serialize(). Of course, if you
> can't justify it,
> drop it.
>
> > This would
> > seem to suggest that root node serialization should result in
> > the empty
> > string, which in turn suggests that serialize should output in UTF-8
> > regardless of the input encoding. That would be OK with me.
> >
> > ii) Attribute and namespace serialization require a namespace
> > prefix. Based
> > on a new read of XPath I believe this information must be
> > available, but I
> > want to be sure.
>
> Note in XPath that Namespace Nodes are different in quantity
> and position
> than the attributes used to declare namespaces. Your current
> serialization
> does not take this into account.
>
> On the other hand, serialization does not necessarily have to
> be limited to
> the XPath Data Model. This model of a document is used when
> locating the
> nodes, but given a document and a set of locations, your
> serializer can
> describe what to do on its own terms. Specifically, any namespace
> attributes in the source are copied through unchanged, and namespace
> attributes are added to elements taken out of scope (e.g.
> parents trimmed)
> to represent the namespace nodes of that element. Retain the
> prefixes in
> all cases - otherwise QNames in content (e.g. XPaths) will
> break under the
> transformation. Maybe the experts will have some comments on
> this idea...
>
> > iii) If everything else checks out, we can get rid of exact
> > order and just
> > use lex order provided that lex ordering in UTF-16 results in
> > the same order
> > as lex ordering in UTF-8 (which is Christopher Maden's claim).
> >
> > Also, parse() has an additional feature that would need to be
> > dealt with in
> > some other way:
> >
> > iv) If the parser used to implement parse() is
> > non-validating, then parse()
> > is required to throw an exception if it encounters an
> > external reference
> > that would cause it to interpret the document differently
> > than a validating
> > parser. This exception is necessary since an unverifiable
> > signature is
> > different than an invalid signature.
>
> I don't see that moving parsing out of parse() changes this. The
> restriction still applies, and needs to be stated. An
> implementation would
> either need to initialize their parser to provide such an
> exception during
> parsing, or do some post-parsing pre-filtering checks.
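
On point iii above (whether lex ordering in UTF-16 gives the same order as
lex ordering in UTF-8), a small check may be useful. UTF-8 byte order follows
code point order, while UTF-16 code unit order places surrogate pairs (used
for characters above U+FFFF) before U+E000..U+FFFF. The sketch below compares
the two orderings; the sample strings and helper names are illustrative
assumptions, and big-endian bytes are used so that byte comparison matches
16-bit code unit comparison.

```python
def utf8_key(s):
    """Sort key: the UTF-8 byte sequence of s."""
    return s.encode("utf-8")


def utf16_key(s):
    """Sort key: the UTF-16 code units of s, as big-endian bytes."""
    return s.encode("utf-16-be")


# Characters from the Basic Multilingual Plane below the surrogate range.
bmp_only = ["b", "a", "\u00e9", "\u3042"]
# U+FF5E is in the BMP above the surrogates; U+10000 needs a surrogate pair.
mixed = ["\uff5e", "\U00010000"]

bmp_same = sorted(bmp_only, key=utf8_key) == sorted(bmp_only, key=utf16_key)
mixed_same = sorted(mixed, key=utf8_key) == sorted(mixed, key=utf16_key)

print("BMP-only sample sorts identically:", bmp_same)
print("Sample with a supplementary character sorts identically:", mixed_same)
```

So the two orders agree for the first sample but not for the second, which
suggests the claim needs a qualifier if characters outside the BMP can occur.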
>
> > 2) On eliminating exprBOM and exprEncoding.
> >
> > Sounds fine. Sounds like any difference of encoding
> > between the XPath
> > expression and the transform input will be handled implicitly
> > by the XPath
> > transform implementation.
> >
> >
> > 3) On automatic serialization,
> >
> > There was some concern that serialization should be automatic
> > ("why call
> > serialize() when that's always what we want to do"). Please
> > see the first
> > paragraph of section 6.6.3.4, which already includes this feature.
>
> The question perhaps is better stated as "why do we need an
> explicit way to
> serialize, instead of always relying on the default that nodesets are
> serialized?"
>
> For one thing, this design deeply entwines serialization with
> pruning of the
> tree, which makes it difficult to use off-the-shelf
> serialization components
> in an implementation.
>
> For another, I would not in general expect XPath
> implementations to handle
> large strings such as serialize() generates particularly
> efficiently, since
> such strings are at this point pretty rare. For example, I
> can't really
> imagine returning strings from XPath asynchronously, which I
> would expect
> from a serializing component.
>
> > Thus if we remove parse(), then there is no *need* to start
> > expressions with
> > a function call.
> >
> >
> > 4) On providing an initial namespace context
> >
> > We can provide an initial namespace context as is done in XPointer.
> >
> > I was reviewing how XPath handles namespaces, and realized
> > that it is
> > different than what I had previously understood, and it seems
> > broken (so
> > there must be some good reason why it is done the way it's
> > done, or maybe I'm
> > just misreading the spec).
> >
> > Section 2.3 says "A QName in the node test is expanded into
> > an expanded-name
> > using the namespace declarations from the expression context.
> > This is the
> > same way expansion is done for element type names in start
> > and end-tags
> > except that the default namespace declared with xmlns is not
> > used: if the
> > QName does not have a prefix, then the namespace URI is null
> > (this is the
> > same way attribute names are expanded). It is an error if the
> > QName has a
> > prefix for which there is no namespace declaration in the expression
> > context".
> >
> > This seems to indicate that the input XML document's
> > namespace declarations
> > are ignored and the expression context's namespace
> > declarations are used
> > solely.
>
> Yes. Prefixes are scoped, and can change throughout the
> document, so it is
> not possible to use these declarations in a global context
> such as an XPath.
> Also, the namespace rec implies that prefixes can be changed without
> changing the underlying names. Since XPath has its own namespace
> declarations, it is unaffected by prefix changes in the
> source document.
>
> > When XPath claims to be XML namespace compliant, I thought
> > that meant it
> > would interpret a node's namespace in the context of the namespace
> > declarations in the document, but that appears not to be
> > true. To clear
> > this up, suppose I have an element x:E, and the document
> > containing this
> > element associates x with www.w3.org, but the expression
> > context associates
> > x with www.ietf.org, then which value will the
> > namespace-uri() function
> > applied to x:E return?
>
> namespace-uri(x:E) will either return the namespace declared
> in the context
> (www.ietf.org) or an empty string if no element {www.ietf.org
> : E} exists in
> the document, which appears to be the case in your question.
> In short,
> XPath cannot talk about an element without knowing its full
> name, including
> the namespace uri. This is an essential component to its namespace
> awareness.
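
A minimal Python sketch of the lookup described above, using the standard
library's xml.etree.ElementTree as a stand-in for an XPath engine. That is an
assumption for illustration: ElementTree supports only a subset of XPath and
has no namespace-uri() function. The prefix map passed to findall() plays the
role of the expression context, and the namespace names are the ones from
the question.

```python
import xml.etree.ElementTree as ET

# The document binds prefix x to www.w3.org.
DOC = '<root xmlns:x="www.w3.org"><x:E/></root>'
root = ET.fromstring(DOC)

# The expression context binds the same prefix x to www.ietf.org, so x:E
# names {www.ietf.org}E, which does not exist in the document.
ietf_context = {"x": "www.ietf.org"}
no_match = root.findall("x:E", ietf_context)

# A context that binds a different prefix to www.w3.org does find the
# element: only the namespace URI matters, not the prefix in the source.
w3_context = {"w": "www.w3.org"}
match = root.findall("w:E", w3_context)

print(len(no_match), len(match))  # prints: 0 1
print(match[0].tag)               # prints: {www.w3.org}E
```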
>
> > If it returns www.ietf.org, then although it seems weird to
> > me that we
> > aren't using the namespace declarations in the document, it
> > would at least
> > be good in the sense that it implies XPath implementations have the
> > namespace prefix kicking around for serialize().
> >
> > It also would mean that Jonathan Marsh is correct in
> > requiring an initial
> > namespace context since we could not do ANY namespace
> > comparisons without it
> > (the XPath seems to say that it is an error to use a QName
> > containing a
> > namespace prefix in an expression if that namespace prefix is
> > not defined in
> > the expression context).
>
> Yep. This really isn't a problem for XPaths appearing in XML
> documents,
> since there is a ready set of namespace declarations (and an
> existing syntax
> for declaring them) to pass into XPath. We couldn't actually
> make this part
> of XPath because certain uses of XPath do not appear in an
> XML document
> context - namely XPointers embedded in URIs.
>
> I don't view this issue as a conceptual mistake, but as a
> cheap fix with
> large author simplicity benefits. It's virtually free (no new syntax
> needed) - just add one line saying that the namespaces are
> initialized to
> the namespaces in scope on the <xpath> element.
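
Collecting the namespaces in scope on the <xpath> element might look roughly
like the sketch below. It relies on ElementTree's iterparse() "start-ns" and
"end-ns" events; in_scope_namespaces is a hypothetical helper, and the sample
element names other than xpath, as well as the urn:example:sig namespace, are
made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def in_scope_namespaces(xml_text, local_name):
    """Return {prefix: uri} in scope on the first element named local_name."""
    declared = []  # stack of (prefix, uri) pairs currently in scope
    events = ("start-ns", "end-ns", "start")
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=events):
        if event == "start-ns":
            # Reported before the start event of the declaring element.
            declared.append(payload)
        elif event == "end-ns":
            # Reported when the declaring element closes.
            declared.pop()
        elif payload.tag.rsplit("}", 1)[-1] == local_name:
            # Inner declarations come later in the stack, so dict() lets
            # them override outer bindings of the same prefix.
            return dict(declared)
    return {}


SAMPLE = (
    '<Signature xmlns="urn:example:sig" xmlns:x="www.w3.org">'
    '<Transform xmlns:y="www.ietf.org"><xpath>//x:E</xpath></Transform>'
    '<Other xmlns:z="urn:example:unrelated"/>'
    '</Signature>'
)
context = in_scope_namespaces(SAMPLE, "xpath")
```

The resulting map could then be handed to the XPath evaluator as its initial
namespace context, so expression authors would not need a separate
declaration syntax.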
>
> > I would appreciate your feedback, esp. from those who have
> > sent prior emails
> > and therefore seem to be most interested in how this turns
> > out. If you
> > would please give this some extra priority, I will prepare an
> > alternative
> > document for consideration before or during the meeting in
> > Adelaide (I
> > regret that I will not be at that meeting, but I will be at
> > the following
> > one in Victoria ;-).
> >
> > John Boyer
> > Software Development Manager
> > PureEdge Solutions, Inc. (formerly UWI.Com)
> > jboyer@PureEdge.com
> >
> >
> > -----Original Message-----
> > From: w3c-ietf-xmldsig-request@w3.org
> > [mailto:w3c-ietf-xmldsig-request@w3.org]On Behalf Of TAMURA Kent
> > Sent: Friday, March 17, 2000 12:59 AM
> > To: IETF/W3C XML-DSig WG
> > Subject: Re: XSL WG comments on XML Signatures
> >
> >
> >
> > > <John>
> > > XPath filtering will not be substantially rewritten. Based
> > on Clark's
> > > feedback, we can remove the parse function and instead
> > simply assert that
> > > the transform input is parsed and provided to XPath as a
> > node set. The
> > > notions of lex and exact order will be removed (since we
> > cannot directly
> > > specify the parse).
> >
> > That's good! It would be easy to understand, easy to implement,
> > easy to use.
> >
> > --
> > TAMURA Kent @ Tokyo Research Laboratory, IBM
> >
>
Received on Friday, 17 March 2000 18:45:05 UTC