
Re: Entity Expansion

From: Norman Walsh <ndw@nwalsh.com>
Date: Sat, 27 Dec 2008 11:58:04 -0500
To: XProc Dev <xproc-dev@w3.org>
Message-ID: <m2y6y1v9lv.fsf@nwalsh.com>
"David A. Lee" <dlee@calldei.com> writes:
> Which leads to another question which I don't expect an answer (but
> would love to have one)...
> What technology exists which preserves all required infoset properties,
> and could be used as the implementation of the "pipe" in xproc ?
> I've thought about StAX, which I believe Calabash uses (some of?), but
> I've discovered, and you've commented, that it's not good enough ... (it
> loses the base URIs ... )

I think it could be done with either SAX or StAX, but both would
require some extension.
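To make "some extension" concrete for SAX: plain SAX start-element events carry no base-URI property, so one possible extension is a handler that keeps a stack of base URIs and attaches the current one to every event. The sketch below uses Python's xml.sax (a port of the SAX2 API) rather than Java; `BaseURITracker` and `parse_with_base_uris` are hypothetical names, and the sketch only resolves `xml:base` attributes. Base-URI changes at external-entity boundaries, which is exactly where entity expansion matters, are not handled here and would need a further hook.

```python
import io
import xml.sax
from urllib.parse import urljoin
from xml.sax.handler import ContentHandler, feature_namespaces

XML_NS = "http://www.w3.org/XML/1998/namespace"


class BaseURITracker(ContentHandler):
    """SAX handler that attaches a base URI to every element event.

    Hypothetical extension: keeps a stack of base URIs and resolves each
    xml:base attribute against the parent element's base URI.
    """

    def __init__(self, document_uri):
        super().__init__()
        self.stack = [document_uri]
        self.events = []  # (event kind, local name, base URI)

    def startElementNS(self, name, qname, attrs):
        base = self.stack[-1]
        xml_base = attrs.get((XML_NS, "base"))
        if xml_base is not None:
            base = urljoin(base, xml_base)
        self.stack.append(base)
        self.events.append(("start", name[1], base))

    def endElementNS(self, name, qname):
        # The end event reports the same base URI as its start event.
        self.events.append(("end", name[1], self.stack.pop()))


def parse_with_base_uris(data, document_uri):
    """Parse bytes `data`, treating `document_uri` as the document's base URI."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = BaseURITracker(document_uri)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.events
```

For example, parsing `<doc xml:base="http://example.com/a/"><p xml:base="b/"/></doc>` gives the `p` events a base URI of `http://example.com/a/b/` without anything being added to the document.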

> I believe an *early intent* of the spec was that text serialization
> *could* be used by an implementation ... but I've yet to figure out a
> text serialization that preserves the base URIs without adding extra
> attributes (which is in violation of the spec).

It's not absolutely clear to me that adding xml:base attributes is a
violation of the spec, but...

Just because you serialize between steps doesn't mean that the
serialization used has to be the standard XML serialization. Since it's
only for communication between steps, it could be completely different:

  <xproc:serialization>
    <xproc:docsequence>
      <xproc:document>
        <xproc:element name="qname" base-uri="...">
          <xproc:attribute name="qname2" value="..."/>
          <xproc:element name="other-qname" base-uri="...">
            ...
          </xproc:element>
        </xproc:element>
      </xproc:document>
      ...
    </xproc:docsequence>
  </xproc:serialization>
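A rough sketch of a writer for such a private inter-step format, assuming Python's xml.etree.ElementTree as the in-memory tree. The element names follow the example above; the `xproc:text` element for character data and the `urn:example:xproc-serialization` namespace URI are hypothetical additions, and because ElementTree does not keep prefixes, names are written in its `{uri}local` notation rather than as QNames.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin
from xml.sax.saxutils import escape, quoteattr

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
# Hypothetical namespace URI; the message does not give one for "xproc:".
SER_NS = "urn:example:xproc-serialization"


def write_element(elem, parent_base, out, depth):
    """Append one element (recursively) in the ad hoc format to `out`."""
    xml_base = elem.get(XML_BASE)
    base = urljoin(parent_base, xml_base) if xml_base is not None else parent_base
    pad = "  " * depth
    out.append(f"{pad}<xproc:element name={quoteattr(elem.tag)} base-uri={quoteattr(base)}>")
    for name, value in elem.attrib.items():
        out.append(f"{pad}  <xproc:attribute name={quoteattr(name)} value={quoteattr(value)}/>")
    if elem.text:
        out.append(f"{pad}  <xproc:text>{escape(elem.text)}</xproc:text>")
    for child in elem:
        write_element(child, base, out, depth + 1)
        if child.tail:  # text that follows the child inside this element
            out.append(f"{pad}  <xproc:text>{escape(child.tail)}</xproc:text>")
    out.append(f"{pad}</xproc:element>")


def serialize_sequence(docs):
    """Serialize a sequence of (root Element, document base URI) pairs."""
    out = [f'<xproc:serialization xmlns:xproc="{SER_NS}">', "  <xproc:docsequence>"]
    for root, doc_uri in docs:
        out.append("    <xproc:document>")
        write_element(root, doc_uri, out, 3)
        out.append("    </xproc:document>")
    out.append("  </xproc:docsequence>")
    out.append("</xproc:serialization>")
    return "\n".join(out)


doc = ET.fromstring('<doc xml:base="http://example.com/a/"><p xml:base="b/">hi</p></doc>')
print(serialize_sequence([(doc, "file:///tmp/in.xml")]))
```

Because each `xproc:element` records its own resolved base URI, the receiving step does not have to infer it from `xml:base` attributes, and none are added to the documents themselves.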

> I did some research on "Binary XML", but I don't think the spec is
> quite mature enough yet, though I could be wrong ... Clearly a totally
> in-memory structure (such as DOM or Saxon trees) could be made to
> work. And there are various proprietary things (I think some of
> Oracle's streaming APIs might work ... but I haven't really dug into
> those).

Binary XML is just an alternate serialization of the infoset; I don't
think that's relevant at all.

> It probably comes down to "roll your own".  Other suggestions are very
> much welcome !

I'd probably look at extending StAX if I were doing it that way again.

But I also might try to roll my own. I'm not really happy with any of
the existing streaming APIs and this would be an interesting
opportunity to try to "get it right".

> Quote NW:
> ---------------
> If your implementation doesn't expand entities, I think you could
> argue that you pass that test if the results you give are consistent
> with unexpanded entities.
> --------------
>
> This would lead me to think that perhaps a more complex (yuck) test
> format is necessary, one that allows for the variances implementations
> are permitted.

Nah. I don't actually think that's going to come up very often (at
all) and if it does, I'm content to deal with those cases by hand.

If that becomes a burden, then I might look into making the test suite
more flexible.
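One way a "more flexible" test could work, sketched under stated assumptions: each test lists one or more acceptable outputs (a hypothetical format, not a description of the actual XProc test suite), and a result passes if it is canonically equal to any of them. Canonicalizing first (Canonical XML 2.0 via Python's `xml.etree.ElementTree.canonicalize`, Python 3.8+) removes purely lexical differences such as attribute order, quoting and empty-element syntax, so only the listed alternatives count as permitted variance. `canonical` and `passes` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def canonical(xml_text):
    """Canonical XML 2.0 form: normalizes attribute order, quoting,
    empty-element syntax and whitespace inside tags (needs Python 3.8+)."""
    return ET.canonicalize(xml_text)


def passes(actual, allowed_outputs):
    """Hypothetical rule: a result passes if it is canonically equal to
    any one of the outputs the test lists as acceptable."""
    got = canonical(actual)
    return any(got == canonical(expected) for expected in allowed_outputs)


# A test that lists two acceptable results.
allowed = ['<doc a="1" b="2"/>', '<doc a="1" b="2"><note/></doc>']
print(passes('<doc b="2"  a="1"></doc>', allowed))
```

Keeping the alternatives explicit in the test, rather than loosening the comparison itself, leaves the question of which variances are allowed as a reviewable decision.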

> I'm not going to push hard for this one case, but I suspect more will
> arise where test cases are coded with implicit assumptions that in
> fact are not the only allowed result. 

Hopefully that won't be very common, but we'll have to wait and see.

> Ultimately this will have to be
> addressed, as there needs to be an "objective" determination of what
> "passes a test".
> I could provide my own test suite with my own output that I claim
> "passes", but who knows ... who decides if my interpretation of "pass"
> is actually correct?

For the purposes of getting to Recommendation, the test suite is input
to an approval process by the W3C Director. The Chair and the Staff
Contact (Henry Thompson) will have to present the case to the
Director, so if you submit your own test suite and results, he and I
will have to review it and decide whether we can, in good conscience,
assert that your implementation is conformant. (So please don't :-)

                                        Be seeing you,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com> | A hen is only an egg's way of making
http://nwalsh.com/            | another egg.--Samuel Butler (II)

Received on Saturday, 27 December 2008 16:58:45 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Saturday, 27 December 2008 16:58:45 GMT