Re: Streaming ITS processor from Asgeir Frimannsson on 2008-06-16 (public-i18n-its-ig@w3.org from June 2008)

From: Asgeir Frimannsson <asgeirf@redhat.com>
Date: Mon, 16 Jun 2008 17:47:43 +1000
To: Felix Sasaki <fsasaki@w3.org>
Cc: public-i18n-its-ig@w3.org
Message-Id: <200806161747.43941.asgeirf@redhat.com>

On Sunday 15 June 2008 00:35:07 Felix Sasaki wrote:
> Asgeir Frimannsson wrote:
> > Felix, Jirka, all,
> >
> > On Saturday 14 June 2008 01:43:02 Felix Sasaki wrote:
> >>> Reading through the ITS spec, it seems like ITS only uses a subset of
> >>> xpath, limited to the child and attribute axes (same as xslt patterns).
> >>
> >> in XSLT patterns you can have predicates, like "*[predicate]" , which
> >> can make use of any axis. Would you limit the content of these too?
> >
> > This would make streaming implementations slightly more complicated, yes
> > :) Thank you both for pointing out this issue.
>
> sorry for the overlap in replies, I did not see Jirka's mail while
> sending mine.
>
> > I guess for many (if not most) formats, limiting the content of
> > predicates would be feasible, and this would also speed up the xpath
> > processing. Creating a streaming-like ITS processor that could handle
> > "most documents" in a more efficient manner could perhaps be a useful
> > alternative to a memory-intensive processor that can handle all
> > documents...
>
> I've created a Wiki page
> http://www.w3.org/International/its/wiki/ITS_Simplified_XPath
> linked from
> http://www.w3.org/International/its/wiki/ITS_Processing
> which contains a proposal for a simplified EBNF. Asgeir or others: Could
> you see if it fits your needs, and edit the page accordingly? If you
> have problems with the Wiki account please tell me.

This blog post by Jeni Tennison gives a very good overview of the
streamability problem:
http://www.jenitennison.com/blog/node/61

Quote: "there is no clear line that can be drawn between a streamable XPath 
and an unstreamable one, only a scale between “buffering nothing” and “buffering 
everything” (building an object model). Second, you can’t judge the 
streamability of an XPath expression on its own: there are multiple other 
factors that affect how streamable a given XPath expression is for a particular 
processor."

I guess this is one of the areas where you have a gut feeling that something 
could be done better, but have no implementations to justify that claim :) 
Some of the main drawbacks of ITS at the moment are:
- Having to load the instance document into memory for processing
- Having to traverse the in-memory DOM once per rule, as most XPath processors
take one expression and return a node set.
This is naturally costly compared to a solution that could:
1) Compile a state machine from a set of rules
2) Apply those rules in a pseudo-streaming fashion
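
A minimal sketch of those two steps, in Python, might look like the following. It is an illustration under stated assumptions, not an ITS implementation: the names compile_rules and stream_translate are hypothetical, selectors are restricted to absolute child-axis paths such as "/doc/head/title" (no predicates, no attributes), and only a translate-style yes/no value with inheritance is tracked. The rules are compiled once into a trie that serves as the state machine, and the document is then read in a single pass with the standard library's xml.etree.ElementTree.iterparse.

```python
import io
import xml.etree.ElementTree as ET


def compile_rules(rules):
    """Compile (path, translate) rules into a trie keyed by element name.

    Assumption of this sketch: only absolute child-axis paths are accepted,
    which is far narrower than the selectors ITS actually allows.
    """
    root = {"children": {}, "translate": None}
    for path, translate in rules:
        node = root
        for name in path.strip("/").split("/"):
            node = node["children"].setdefault(
                name, {"children": {}, "translate": None})
        node["translate"] = translate  # a later rule overrides an earlier one
    return root


def stream_translate(source, machine, default="yes"):
    """Yield (path, translate) for every element, in one pass over source."""
    states = [machine]   # trie node reached at each open element, or None
    values = [default]   # translate value in effect at each depth
    names = []           # names of the currently open elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            parent = states[-1]
            state = None
            if parent is not None:
                state = parent["children"].get(elem.tag)
            value = values[-1]  # inherit from the parent by default
            if state is not None and state["translate"] is not None:
                value = state["translate"]
            states.append(state)
            values.append(value)
            names.append(elem.tag)
            yield "/" + "/".join(names), value
        else:
            states.pop()
            values.pop()
            names.pop()
            elem.clear()  # drop the element's content to keep memory small


if __name__ == "__main__":
    doc = b"<doc><head><title>T</title></head><body><p>Hi <code>x</code></p></body></doc>"
    rules = [("/doc/head", "no"), ("/doc/body/p/code", "no")]
    for path, value in stream_translate(io.BytesIO(doc), compile_rules(rules)):
        print(path, value)
```

All rules are evaluated together during that one pass, so the cost no longer grows with one DOM traversal per rule. Predicates that look ahead or at siblings would need buffering, which is where the streamability question above comes in.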

ITS is nevertheless much more powerful than the existing approaches to
identifying i18n aspects of XML documents :)

cheers,
asgeir

Received on Monday, 16 June 2008 07:48:46 UTC