RE: Xpath transform changes and questions from Jonathan Marsh on 2000-03-18 (w3c-ietf-xmldsig@w3.org from January to March 2000)

From: Jonathan Marsh <jmarsh@microsoft.com>
Date: Fri, 17 Mar 2000 16:41:10 -0800
To: "'John Boyer'" <jboyer@PureEdge.com>, IETF/W3C XML-DSig WG <w3c-ietf-xmldsig@w3.org>
Message-ID: <5F68209F7E4BD111A5F500805FFE35B91D3FDE7E@RED-MSG-54>
> -----Original Message-----
> From: John Boyer [mailto:jboyer@PureEdge.com]
> 
> Hi Jonathan,
> 
> Right, section 6.6.3.4 is referring to the serialize() 
> function defined in
> the XPath transform spec in section 6.6.3.3.  You suggested 
> that I should
> take serialize() out because I haven't justified keeping it, yet your
> rationale for removing it is based on its existence.
> 
> So, I'm deducing that what you mean is that we should keep 
> the description
> of how we will serialize a node-set, but don't add it to the function
> library.  The justifications I gave for putting it in the 
> function library
> (and hence making it available to users) were
> 
> 1) The method defined by XPath for adding functionality to 
> XPath is adding
> to the function library.  The serialize functionality was 
> therefore added as
> a function.

Yes, but not solely as a function, so the benefits of using the XPath
mechanism are dubious.  An implementor would have to do extra work to make
it available in both places.  A user would have to do extra work to figure
out when to use each.

> 2) If a user decides to use XPath string function to modify 
> the result, then
> they can do so with an explicit serialize function.

But the scenarios where a user could profitably grunge around arbitrarily in
serialized XML without breaking something severe are dubious.  I can't see
that anything of value can be concatenated to or trimmed from this string
without breaking well-formedness, and mucking around in strings of markup
really needs a compelling user scenario beyond, "they might want to".  If
they really want to do string manipulations, some kind of subsequent
perl/regexp transform seems better suited.  I ask again, can you provide a
real example of a scenario that benefits or requires this functionality?
The more I think about it the less I see any usefulness.

> Finally, whether serialize() stays as a function or a 
> description is not a
> big deal to me, but I do not believe that your desire to hide 
> serialize()
> has anything to do with my question.  Whether implicit or explicit,
> serialize will either generate the BOM and XMLDecl or it will 
> not.  It can
> only generate the BOM and XMLDecl if it was previously 
> preserved by the
> parse. This has nothing to do with whether serialize() is implicit or
> explicit.

Yes, my concern is not with the need to serialize, only with the redundancy
of providing this functionality twice, the potential for the unwary user to
screw things up, and minimaly constraining the implementation strategies.

> John Boyer
> Software Development Manager
> PureEdge Solutions, Inc. (formerly UWI.Com)
> jboyer@PureEdge.com
> 
> 
> -----Original Message-----
> From: w3c-ietf-xmldsig-request@w3.org
> [mailto:w3c-ietf-xmldsig-request@w3.org]On Behalf Of Jonathan Marsh
> Sent: Friday, March 17, 2000 3:43 PM
> To: 'John Boyer'; TAMURA Kent; IETF/W3C XML-DSig WG
> Cc: Martin J. Duerst; Christopher R. Maden; James Clark
> Subject: RE: Xpath transform changes and questions
> 
> 
> > -----Original Message-----
> > From: John Boyer [mailto:jboyer@PureEdge.com]
> >
> > Hi Jonathan,
> >
> > Perhaps I've missed something huge here.  You said that 
> node-sets are
> > automatically serialized.  Could you please point out where
> > in the XPath
> > spec it says anything about this?
> 
> I was referring to the XML Sig spec section 6.6.3.4: "If the 
> result of the
> XPath expression is a node-set, then the XPath transform output is the
> string result of calling serialize() on the node-set."
> 
> So as a user I don't have to call serialize() every time 
> myself, it gets
> invoked automatically if my expression returns a nodeset.  Or 
> am I missing
> something?
> 
> >
> > Thanks,
> > John Boyer
> > Software Development Manager
> > PureEdge Solutions, Inc. (formerly UWI.Com)
> > jboyer@PureEdge.com
> >
> >
> > -----Original Message-----
> > From: Jonathan Marsh [mailto:jmarsh@microsoft.com]
> > Sent: Friday, March 17, 2000 3:06 PM
> > To: 'John Boyer'; TAMURA Kent; IETF/W3C XML-DSig WG
> > Cc: Martin J. Duerst; Christopher R. Maden; James Clark
> > Subject: RE: Xpath transform changes and questions
> >
> >
> >
> >
> > > -----Original Message-----
> > > From: John Boyer [mailto:jboyer@PureEdge.com]
> > > Sent: Friday, March 17, 2000 12:48 PM
> > > To: TAMURA Kent; IETF/W3C XML-DSig WG
> > > Cc: Jonathan Marsh; Martin J. Duerst; Christopher R. Maden;
> > > James Clark
> > > Subject: Xpath transform changes and questions
> > >
> > >
> > > Hi,
> > >
> > > I have a few concerns that I think can be worked out, so I am
> > > requesting
> > > feedback on the information below.  If everything works out,
> > > then yes, we
> > > could remove parse() and exact order.
> > >
> > > 1) On parse(),
> > >
> > > I would be in favor of dumping parse() if we can solve ALL of the
> > > implementation problems it solves.  It  solved some
> > > interesting, but not too
> > > important problems for XPath transform users, but it also
> > > specified how
> > > implementers were to solve certain problems.
> > >
> > > Please have a closer look at the output expectations of
> > > serialize().  The
> > > serialize() function cannot operate without several features
> > > of parse().  In
> > > particular,
> > >
> > > i) Serialization of the root node requires that we output the
> > > byte order
> > > mark and xmldecl read by parse() on input.  If parse() is not
> > > under our
> > > control, we cannot specify that it retains this information.
> >
> > It seems useful that the BOM and encoding are preserved through
> > re-serializing the document.  If these are inputs to the 
> serialization
> > mechanism, this is added incentive not to expose the
> > serialization mechanism
> > as a function.  Otherwise these would have to be passed
> > through as variables
> > (as you are doing) and users must not forget to use them, and
> > must use them
> > appropriately (e.g. not change them), if they want correct results.
> >
> > If serialization is not exposed as a function, but is performed
> > automatically, these problems are avoided.  Since it seems that this
> > capability already exists (nodesets are automatically
> > serialized), at least
> > the simple cases are already handled.  Thus an explicit
> > serialize() and BOM
> > and encoding variables seems to be an advanced feature.  It's
> > necessity in
> > solving important problems should be weighed against its 
> potential for
> > abuse.  I haven't seen you justify your design yet.  Mainly
> > I'm curious
> > here, not trying to kill serialize().  Of course, if you
> > can't justify it,
> > drop it.
> >
> > > This would
> > > seem to suggest that root node serialization should result in
> > > the empty
> > > string, which in turn suggests that serialize should 
> output in UTF-8
> > > regardless of the input encoding.  That would be OK with me.
> > >
> > > ii) Attribute and namespace serialization require a namespace
> > > prefix.  Based
> > > on a new read of XPath I believe this information must be
> > > available, but I
> > > want to be sure.
> >
> > Note in XPath that Namepace Nodes are different in quantity
> > and position
> > than the attributes used to declare namespaces.  Your current
> > serialization
> > does not take this into account.
> >
> > On the other hand, serialization does not necessarily have to
> > be limited to
> > the XPath Data Model.  This model of a document is used when
> > locating the
> > nodes, but given a document and a set of locations, your
> > serializer can
> > describe what to do on it's own terms.  Specifically, any namespace
> > attributes in the source are copied through unchanged, and namepace
> > attributes are added to elements taken out of scope (e.g.
> > parents trimmed)
> > to represent the namespace nodes of that element.  Retain the
> > prefixes in
> > all cases - otherwise QNames in content (e.g. XPaths) will
> > break under the
> > transformation.  Maybe the experts will have some comments on
> > this idea...
> >
> > > iii) If everything else checks out, we can get rid of exact
> > > order and just
> > > use lex order provided that lex ordering in UTF-16 results in
> > > the same order
> > > as lex ordering in UTF-8 (which is Christopher Maden's claim).
> > >
> > > Also, parse() has an additional feature that would need to be
> > > dealt with in
> > > some other way:
> > >
> > > iv) If the parser used to implement parse() is
> > > non-validating, then parse()
> > > is required to throw an exception if it encounters an
> > > external reference
> > > that would cause it to interpret the document differently
> > > than a validating
> > > parser.  This exception is necessary since an unverifiable
> > > signature is
> > > different than an invalid signature.
> >
> > I don't see that moving parsing out of parse() changes this.  The
> > restriction still applies, and needs to be stated.  An
> > implementation would
> > either need to initialize their parser to provide such an
> > exception during
> > parsing, or do some post-parsing pre-filtering checks.
> >
> > > 2) On eliminating exprBOM and exprEncoding.
> > >
> > > Sounds fine.  Sounds like any difference of encoding
> > between the Xpath
> > > expression and the transform input will be handled implicitly
> > > by the XPath
> > > transform implementation.
> > >
> > >
> > > 3) On automatic serialization,
> > >
> > > There was some concern that serialization should be automatic
> > > ("why call
> > > serialize() when that's always what we want to do").  Please
> > > see the first
> > > paragraph of section 6.6.3.4, which already includes this feature.
> >
> > The question perhaps is better stated as "why do we need an
> > explicit way to
> > serialize, instead of always relying on the default that 
> nodesets are
> > serialized?
> >
> > For one thing, this design deeply entwines serialization with
> > pruning of the
> > tree, which makes it difficult to use off-the-shelf
> > serialization components
> > in an implementation.
> >
> > For another, I would not in general expect XPath
> > implementations to handle
> > large strings such serialize() generates particularly
> > efficiently, since
> > such strings are at this point pretty rare.  For example, I
> > can't really
> > imagine returning strings from XPath asynchronously, which I
> > would expect
> > from a serializing component.
> >
> > > Thus if we remove parse(), then there is no *need* to start
> > > expressions with
> > > a function call.
> > >
> > >
> > > 4) On providing an initial namespace context
> > >
> > > We can provide an initial namespace context as is done in 
> XPointer.
> > >
> > > I was reviewing how XPath handles namespaces, and realized
> > that it is
> > > different than what I had previously understood, and it seems
> > > broken (so
> > > there must be some good reason why it is done the way its
> > > done, or maybe I'm
> > > just misreading the spec).
> > >
> > > Section 2.3 says "A QName in the node test is expanded into
> > > an expanded-name
> > > using the namespace declarations from the expression context.
> > > This is the
> > > same way expansion is done for element type names in start
> > > and end-tags
> > > except that the default namespace declared with xmlns is not
> > > used: if the
> > > QName does not have a prefix, then the namespace URI is null
> > > (this is the
> > > same way attribute names are expanded). It is an error if the
> > > QName has a
> > > prefix for which there is no namespace declaration in the 
> expression
> > > context".
> > >
> > > This seems to indicate that the input XML document's
> > > namespace declarations
> > > are ignored and the expression context's namespace
> > > declarations are used
> > > solely.
> >
> > Yes.  Prefixes are scoped, and can change throughout the
> > document, so it is
> > not possible to use these declarations in a global context
> > such as an XPath.
> > Also, the namespace rec implies that prefixes can be changed without
> > changing the underlying names.  Since XPath has it's own namespace
> > declarations, it is unaffected by prefix changes in the
> > source document.
> >
> > > When XPath claims to be XML namespace compliant, I thought
> > > that meant it
> > > would interpret a node's namespace in the context of the namespace
> > > declarations in the document, but that appears not to be
> > > true.  To clear
> > > this up, suppose I have an element x:E, and the document
> > > containing this
> > > element associates x with www.w3.org, but the expression
> > > context associates
> > > x with www.ietf.org, then which value will the
> > > namespace-uri() function
> > > applied to x:E return?
> >
> > namespace-uri(x:E) will either return the namespace declared
> > in the context
> > (www.ietf.org) or an empty string if no element {www.ietf.org
> > : E) exists in
> > the document, which appears to be the case in your question.
> > In short,
> > XPath cannot talk about an element without knowing it's full
> > name, including
> > the namespace uri.  This is an essential component to it's namespace
> > awareness.
> >
> > > If it returns www.ieft.org, then although it seems weird to
> > me that we
> > > aren't using the namespace declarations in the document, it
> > > would at least
> > > be good in the sense that it implies XPath 
> implementations have the
> > > namespace prefix kicking around for serialize().
> > >
> > > It also would mean that Jonathan Marsh is correct in
> > > requiring an initial
> > > namespace context since we could not do ANY namespace
> > > comparisons without it
> > > (the XPath seems to say that it is an error to use a QName
> > > containing a
> > > namespace prefix in an expression if that namespace prefix is
> > > not defined in
> > > the expression context).
> >
> > Yep.  This really isn't a problem for XPaths appearing in XML
> > documents,
> > since there is a ready set of namespace declarations (and an
> > existing syntax
> > for declaring them) to pass into XPath.  We couldn't actually
> > make this part
> > of XPath because certain uses of XPath do not appear in an
> > XML document
> > context - namely XPointers embedded in URIs.
> >
> > I don't veiw this issue as a conceptual mistake, but as a
> > cheap fix with
> > large author simplicity benefits.  It's virtually free (no 
> new syntax
> > needed) - just add one line saying that the namespaces are
> > initialized to
> > the namespaces in scope on the <xpath> element.
> >
> > > I would appreciate your feedback, esp. from those who have
> > > sent prior emails
> > > and therefore seem to be most interested in how this turns
> > > out.  If you
> > > would please give this some extra priority, I will prepare an
> > > alternative
> > > document for consideration before or during the meeting in
> > Adelaide (I
> > > regret that I will not be at that meeting, but I will be at
> > > the following
> > > one in Victoria ;-).
> > >
> > > John Boyer
> > > Software Development Manager
> > > PureEdge Solutions, Inc. (formerly UWI.Com)
> > > jboyer@PureEdge.com
> > >
> > >
> > > -----Original Message-----
> > > From: w3c-ietf-xmldsig-request@w3.org
> > > [mailto:w3c-ietf-xmldsig-request@w3.org]On Behalf Of TAMURA Kent
> > > Sent: Friday, March 17, 2000 12:59 AM
> > > To: IETF/W3C XML-DSig WG
> > > Subject: Re: XSL WG comments on XML Signatures
> > >
> > >
> > >
> > > > <John>
> > > > XPath filtering will not be substantially rewritten.  Based
> > > on Clark's
> > > > feedback, we can remove the parse function and instead
> > > simply assert that
> > > > the transform input is parsed and provided to XPath as a
> > > node set.  The
> > > > notions of lex and exact order will be removed (since we
> > > cannot directly
> > > > specify the parse).
> > >
> > > That's good!  It would be easy to understand, easy to implement,
> > > easy to use.
> > >
> > > --
> > > TAMURA Kent @ Tokyo Research Laboratory, IBM
> > >
> >
>
Received on Friday, 17 March 2000 19:42:04 UTC