Re: Streaming ITS processor from Felix Sasaki on 2008-06-19 (public-i18n-its-ig@w3.org from June 2008)

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 19 Jun 2008 13:30:44 +0900
To: Asgeir Frimannsson <asgeirf@redhat.com>
CC: public-i18n-its-ig@w3.org
Message-ID: <4859E0F4.9020801@w3.org>
Hi Asgeir,

Asgeir Frimannsson wrote:
> Hi Felix, all,
>
> On Tuesday 17 June 2008 11:04:49 Felix Sasaki wrote:
>   
>> Jirka Kosek wrote:
>>     
>>> Asgeir Frimannsson wrote:
>>>       
>>>> I guess this is one of the areas where you have a gut feeling that
>>>> something could be done better, but have no implementations to
>>>> justify that claim :) Some of the main drawbacks of ITS at the
>>>> moment are:
>>>> - Having to load the instance document into memory for processing
>>>> - Having to traverse the in-memory DOM for each rule, as most XPath
>>>> processors take one expression and return a node set.
>>>>         
>>> Please note that as long as you stick to XPath patterns (not full
>>> expressions) you can use the internal pattern-matching API of an XSLT
>>> processor, which is optimized for this task and gives much better
>>> performance than naively evaluating each XPath expression against the
>>> document tree.
>>>       
>> Asgeir, thanks for pointing to the blog post from Jeni, and Jirka, thanks
>> for pointing out the benefit of using XPath (XSLT) patterns here. I'm
>> wondering whether these patterns would do the job for Asgeir, though I'm
>> aware that this is not a perfect solution. If you, Asgeir, still want
>> something more streamable ("Compile a state machine based on a set of
>> rules"), it would be good to know how you want to construct these rules:
>> based on XPath, a subset of XPath (like the XSLT patterns or the EBNF in
>> the Wiki), or something completely different.
>>     
>
> A bit of background: this topic started with a conversation between
> Yves (Savourel), myself and Jim (Hardgrave), where Yves briefly mentioned his
> work on the ITS API. I, perhaps prematurely, argued that there had to be a
> better solution than using a memory-intensive DOM parser for converting XML
> documents to/from typical localisation formats.
>
> Now, thanks largely to the wisdom of Jirka and Felix, I do see that this
> problem is not as simple as I initially thought :)
>
> The deeper question I'm asking is perhaps whether the full ITS spec is a bit
> overkill for many situations. For most formats (DocBook, DITA, etc.), isn't a
> very limited knowledge of the structure of a document enough to determine
> these i18n attributes? Look, e.g., at the example ITS rules in the 'best
> practices' document, where the majority of rules use a very simple
> "contextual subset" of XPath. In most cases the namespace + element name (or
> attribute + parent element) is enough information to determine the i18n
> attributes. This is closer to a 'schema'-like language than to the ITS
> pattern-based approach, and perhaps a way of annotating the schema/DTD/etc.
> would be a better approach for many formats?
>   

In the "beginning" of ITS 1.0 (the first public draft), we had something
called schemaRule; the example below is taken from
http://www.w3.org/TR/2005/WD-its-20051122/

<xs:element name="p">
 <xs:annotation>
  <xs:appinfo>
   <its:schemaRules>
    <its:schemaRule translate="yes"/>
    <its:schemaRule locInfo="This has to be handled carefully"
     locInfoType="alert"/>
   </its:schemaRules>
  </xs:appinfo>
 </xs:annotation> ...
</xs:element>

We dropped schemaRule for various reasons, but for streaming we might
think of something similar.
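To make the idea concrete, here is a minimal Python sketch of such a streaming processor. It is not part of ITS 1.0 and not an existing implementation. It assumes a hypothetical schemaRule-like table keyed on (namespace URI, local element name), and it handles only the translate data category on elements. It keeps just a stack of inherited values: element translate defaults to "yes" and is inherited by descendants, and a local its:translate attribute overrides the table, as local markup overrides global rules in ITS. The names RULES, TranslateHandler and stream_translate_flags are made up for illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

DB_NS = "http://docbook.org/ns/docbook"
ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical schemaRule-like table: (namespace URI, local name) -> translate.
# Elements not listed simply inherit the value of their parent.
RULES = {
    (DB_NS, "programlisting"): "no",
    (DB_NS, "literal"): "no",
}


class TranslateHandler(ContentHandler):
    """Computes the effective translate value of each element in one pass."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.stack = ["yes"]  # ITS default for elements is translate="yes"
        self.path = []        # local names of the currently open elements
        self.results = []     # (path, value) pairs, collected for the demo

    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, name is a (namespace URI, local name) tuple.
        value = self.rules.get(name, self.stack[-1])
        # Local its:translate markup takes precedence over the rule table.
        value = attrs.get((ITS_NS, "translate"), value)
        self.stack.append(value)
        self.path.append(name[1])
        self.results.append(("/".join(self.path), value))

    def endElementNS(self, name, qname):
        self.stack.pop()
        self.path.pop()


def stream_translate_flags(xml_bytes, rules):
    handler = TranslateHandler(rules)
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.results


DOC = (b'<article xmlns="http://docbook.org/ns/docbook" '
       b'xmlns:its="http://www.w3.org/2005/11/its">'
       b'<para>Run <literal>ls</literal> now.</para>'
       b'<programlisting>ls <emphasis>-l</emphasis></programlisting>'
       b'<para its:translate="no">Internal note</para>'
       b'</article>')

if __name__ == "__main__":
    for path, value in stream_translate_flags(DOC, RULES):
        print(path, value)
```

The handler itself only keeps a stack whose size depends on nesting depth (the results list is there for the demo), so no DOM is built. Anything that needs real XPath, such as predicates or axes beyond the ancestor chain, would not fit this simple model.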

> Now, I'm NOT suggesting a change to ITS itself, as it serves many use cases
> beyond the ones I deal with. And once we go beyond the use case I described
> above, ITS suddenly becomes very powerful and attractive. I do not have an
> immediate need for a streaming ITS processor, and hence no time to
> develop one, although at some point, when we start using ITS more
> heavily, I might have to revisit this. It's nevertheless a very interesting
> problem :)
>   

Of course :)

Felix
Received on Thursday, 19 June 2008 04:31:53 UTC