Re: ACTION-512 - Research getting line and column information from XPath

Hi all,
Here's the latest on line number info in Java:

We've decided to implement a static method in DOMUtils that, given any node,
returns the line number of that node in the original document. To do
this, we're building on Saxon's DocumentBuilderFactoryImpl. When
certain properties of this class are set, each Node is wrapped with
information that includes the original line number. The corresponding
DOMUtils method, getNodeLineNumber(Node node), unwraps this
information and returns the line number.

We settled on this implementation after a few dead ends with Xerces
and Saxon's TinyTree. Both offer an ElementImpl class (or a variation of
that name) that declares getLineNumber() and getColumnNumber() methods.
However, after trying unsuccessfully to use these methods, we learned
that they are inherited from an interface and were never implemented.
After some more searching, it seems there are no other methods we can
use to get the column number without some intense hacking on our side,
possibly by wrapping the nodes ourselves with this information.
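
For reference, here is a minimal sketch of what that "wrapping the nodes
ourselves" hack could look like using only the JDK. It is not what we are
committing, just an illustration under a few assumptions: the class name
PositionalDom, the user-data keys, and getNodeColumnNumber() are all
hypothetical, and instead of a true wrapper class it attaches the position
to each element with DOM Level 3 setUserData(). It builds the DOM from SAX
events, so it only handles elements, attributes, and text (no namespaces,
comments, or processing instructions). Note also that a SAX Locator reports
roughly where the start tag ends, not where it begins.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: builds a DOM from SAX events and records the
// parser's reported position on every element as DOM user data.
class PositionalDom {
    static final String LINE_KEY = "lineNumber";     // assumed key name
    static final String COLUMN_KEY = "columnNumber"; // assumed key name

    static Document parse(String xml) throws Exception {
        final Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        // Stack of open nodes; the Document itself sits at the bottom.
        final Deque<Node> open = new ArrayDeque<Node>();
        open.push(doc);

        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                Element el = doc.createElement(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    el.setAttribute(atts.getQName(i), atts.getValue(i));
                }
                // Position of the end of the start tag, as reported by SAX.
                el.setUserData(LINE_KEY,
                        Integer.valueOf(locator.getLineNumber()), null);
                el.setUserData(COLUMN_KEY,
                        Integer.valueOf(locator.getColumnNumber()), null);
                open.peek().appendChild(el);
                open.push(el);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.pop();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text is only legal inside an element, never under Document.
                if (open.peek() != doc) {
                    open.peek().appendChild(
                            doc.createTextNode(new String(ch, start, length)));
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return doc;
    }

    // Returns the recorded line, or -1 if the node carries no position.
    static int getNodeLineNumber(Node node) {
        Object line = node.getUserData(LINE_KEY);
        return (line instanceof Integer) ? ((Integer) line).intValue() : -1;
    }

    // Returns the recorded column, or -1 if the node carries no position.
    static int getNodeColumnNumber(Node node) {
        Object column = node.getUserData(COLUMN_KEY);
        return (column instanceof Integer) ? ((Integer) column).intValue() : -1;
    }
}
```

With something like this, PositionalDom.parse(xml) followed by
getNodeLineNumber(node) on any element would give a line, and
getNodeColumnNumber(node) a column, at the cost of owning the DOM
construction ourselves.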

So, in the end, we have line number but no column number. I'm going to open
this up for input and get some thoughts. If no one has any objections, I'll
commit the changes soon.

Cheers,
Laura

On 6/19/07, Sean Owen <srowen@google.com> wrote:
>
> On 6/19/07, Roland Gülle <roland@7val.com> wrote:
> > After thinking a bit more about returning line and column directly
> > from XSLT, I think we need more than just the Saxon line-number() or
> > Xerces function.
> >
> > The document we transform with the XSLT is the moki document,
> > so we will get the position in the moki document.
> > We can return the URI (or internal ID) and (original) XPath and solve
> > this in a separate process,
> > or call our own function with the URI and XPath that gets the line and
> > column information from the original document.
> >
> > So... XSLT alone (1.0 or 2.0) will not solve this problem; Java code
> > is needed (and you know - I'm not the Java guy).
> > Maybe with a small XSLT (and a Xerces process?),
> > where the XSLT gets the XPath dynamically from the function and the
> > source document is identified by the URI.
>
>
> Yes, ideally the underlying Document / DOM representing the original
> source document includes line number information.
>
> It looked like Xerces provided this, but it actually doesn't. DOM
> Level 3 doesn't say this is coming as part of the spec either.
>
> I agree with Roland, Saxon's line number support only helps XSLT
> transforms, not parsing. Maybe we can "parse" documents with an
> identity transform to get line info.
>
> Or maybe Saxon's counterpart parser, Aelfred, also has this magic.
>
> Laura is looking at these.
>
> Beyond that... I'm not sure what we can do except concoct some kind of
> SAX parser that finds nodes and positions, locates the matching DOM
> node and somehow annotates it by replacing it with a wrapper
> implementation that also includes position info. Not impossible, but
> not nice. I think we should pursue this if Aelfred can't help us out.
>
> Sean
>

Received on Wednesday, 20 June 2007 18:01:33 UTC