
Re: Streaming ITS processor

From: Jim Hargrave <jhargraveiii@gmail.com>
Date: Tue, 17 Jun 2008 22:48:01 -0600
Message-ID: <48589381.9000302@gmail.com>
To: public-i18n-its-ig@w3.org

I wonder if there could be a compromise. The XML package I use in Python 
(ElementTree) lets you iterate over chunks of XML selected by a list of 
XPath expressions that you supply. You get the benefits of in-memory 
processing while limiting it to these smaller chunks of XML. Most of the 
XML instances we localize can be broken up into smaller, self-contained 
pieces. I could imagine an XML filter having a "chunking" parameter that 
takes a list of XPath expressions, allowing the filter to iterate over 
the matching chunks in the proper sequence. You could then apply the ITS 
rules to these smaller pieces.
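
A minimal sketch of the chunking idea, under stated assumptions: it uses 
the standard library's xml.etree.ElementTree.iterparse, which selects 
nothing by itself, so chunks are picked here by plain tag name rather 
than by full XPath expressions. The names iter_chunks, translatable_text, 
RULES and the sample document are hypothetical, and the (selector, 
translate) pairs only stand in for ITS translate rules; this is not an 
ITS implementation.

```python
import io
import xml.etree.ElementTree as ET


def iter_chunks(source, chunk_tags):
    """Yield each outermost element whose tag is in chunk_tags, fully parsed."""
    depth = 0  # nesting depth inside the current chunk (0 = outside any chunk)
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if depth or elem.tag in chunk_tags:
                depth += 1
        elif depth:
            depth -= 1
            if depth == 0:
                yield elem
                # Drop the chunk's children and text once the caller is done.
                elem.clear()


def translatable_text(chunk, rules):
    """Apply hypothetical (selector, translate) rules; later rules win."""
    decisions = {}
    for selector, translate in rules:
        # findall() supports only ElementTree's limited XPath subset.
        for node in chunk.findall(selector):
            decisions[node] = translate
    return [node.text for node in chunk.iter() if decisions.get(node)]


# Hypothetical sample data and rule set, for illustration only.
DOC = """<doc>
  <head><title>ignored</title></head>
  <section><p>Hello</p><code>x = 1</code></section>
  <section><p>World</p></section>
</doc>"""

RULES = [(".//p", True), (".//code", False)]


def run(xml_text=DOC):
    results = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for chunk in iter_chunks(source, {"section"}):
        # Each chunk must be processed before the loop advances,
        # because it is cleared as soon as the generator resumes.
        results.append(translatable_text(chunk, RULES))
    return results
```

Only one chunk's content is held in memory at a time, while the rules 
still see a small in-memory tree. Rules that need context from outside 
the chunk (ancestors or siblings) would not work with this approach.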

The tricky part, for some XML instances, would be choosing the right 
atomic units, ones that provide complete, self-contained context for the 
ITS rule set. I wonder if this could be determined automatically? 
If not, setting it manually would be no more difficult than writing the 
ITS rules in the first place.


Felix Sasaki wrote:
> Jirka Kosek wrote:
>> Asgeir Frimannsson wrote:
>>> I guess this is one of the areas where you have a gut feeling that 
>>> something could be done better, but have no implementations to 
>>> justify that claim :) Some of the main drawbacks with ITS at the 
>>> moment are:
>>> - Having to load the instance document into memory for processing
>>> - Having to traverse the in-memory DOM for each rule, as most XPath 
>>> processors take one expression and return a node set.
>> Please note that as long as you stick to XPath patterns (not full 
>> expressions) you can use the internal pattern-matching API of an XSLT 
>> processor, which is optimized for this task and gives much better 
>> performance than naively evaluating each XPath against the document tree.
> Asgeir, thanks for pointing to the blog post from Jeni, and Jirka, thanks 
> for pointing out the benefit of using XPath (XSLT) patterns here. I'm 
> wondering if these patterns would do the job for Asgeir, and I'm aware 
> that this is not a perfect solution. If you, Asgeir, still want to have 
> something more streamable, "Compile a state machine based on a set of 
> rules", it would be good to know how you want to construct these 
> rules: based on XPath, a subset of XPath (like the XSLT patterns or 
> the EBNF in the Wiki), or something completely different.
> Felix
Received on Wednesday, 18 June 2008 14:19:34 UTC
