- From: Asgeir Frimannsson <asgeirf@redhat.com>
- Date: Thu, 19 Jun 2008 10:17:57 +1000
- To: Felix Sasaki <fsasaki@w3.org>
- Cc: public-i18n-its-ig@w3.org
Hi Felix, all,

On Tuesday 17 June 2008 11:04:49 Felix Sasaki wrote:
> Jirka Kosek wrote:
> > Asgeir Frimannsson wrote:
> >> I guess this is one of the areas where you have a gut feeling that
> >> something could be done better, but have no implementations to
> >> justify that claim :) Some of the main drawbacks with ITS at the
> >> moment are:
> >> - Having to load the instance document into memory for processing
> >> - Having to traverse the in-memory DOM for each rule, as most xpath
> >> processors take one expression and returns a node set.
> >
> > Please note that as long as you stick to XPath patterns (not full
> > expressions) you can use internal pattern matching API of XSLT
> > processor which is optimized for this task and gives much better
> > performance then naive evaluating of each XPath against document tree.
>
> Asgeir, thanks for pointing to the Blog from Jeni, and Jirka, thanks for
> pointing out the benefit of using XPath (XSLT) patterns here. I'm
> wondering if these patterns would do the job for Asgeir, and I'm aware
> that this is no perfect solution. If you, Asgeir, still want to have
> something more streamable, "Compile a state machine based on a set of
> rules", it would be good to know how you want to construct these rules:
> based on XPath, a subset of XPath (like the XSLT patterns or the EBNF in
> the Wiki), or something completely different.

A bit of background: This topic initially started over a conversation
between Yves (Savourel), myself and Jim (Hardgrave), where Yves briefly
mentioned his work on the ITS API. I - perhaps prematurely - argued that
there had to be a better solution than using a memory-intensive DOM parser
for converting XML documents to/from typical localisation formats. Now,
much thanks to the wisdom of Jirka and Felix, I do see that this problem
is not as simple as I initially thought :)

The deeper question I'm asking is perhaps whether the full ITS spec is a
bit overkill for many situations.
For most formats (DocBook, DITA, etc.), isn't a very limited knowledge of
the structure of a document enough to determine these i18n attributes?
Look e.g. at the example ITS rules in the 'best practices' document, where
the majority of rules use a very simple "contextual subset" of XPath. In
most cases the namespace + element name (or attribute + parent element) is
enough information to determine the i18n attributes. This looks more like
a 'schema'-like language than the ITS pattern-based approach, and perhaps
a way of annotating the schema/DTD/etc. would be a better approach for
many formats?

Now, I'm NOT suggesting a change to ITS itself, as it serves many other
use-cases than what I deal with. And once we go beyond the use-case I
described above, ITS suddenly becomes very powerful and attractive.

I do not have an immediate need for a streaming ITS processor, and hence
no time to develop one... although at some point, when we do start using
ITS more heavily, I might have to revisit this. It's nevertheless a very
interesting problem :)

cheers,
asgeir
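
To make the "namespace + element name (or attribute + parent element)"
lookup above a bit more concrete, here is a rough Python sketch. It is not
an ITS processor and does not read ITS rule files: the namespace
urn:example:doc, the element names, the two rule tables and the annotate()
function are all made up for illustration. The only things borrowed from
ITS are the defaults (elements translatable, attributes not) and the
inheritance of an element's translate value by its child elements. It uses
the standard library's event-based iterparse, so each decision is a
dictionary lookup made while the document streams past.

```python
import xml.etree.ElementTree as ET

NS = "{urn:example:doc}"  # hypothetical namespace, for illustration only

# Hypothetical rule tables (not ITS syntax).
# Qualified element name -> translate flag.
ELEMENT_RULES = {
    NS + "code": False,
}
# (qualified parent element name, attribute name) -> translate flag.
ATTRIBUTE_RULES = {
    (NS + "img", "alt"): True,
}


def annotate(source, default=True):
    """Yield (kind, name, translate) tuples in document order.

    Elements inherit the translate flag of their parent unless a rule
    names them; attributes are not translatable unless a rule says so.
    """
    inherited = [default]  # stack of translate flags of open elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            translate = ELEMENT_RULES.get(elem.tag, inherited[-1])
            inherited.append(translate)
            yield ("element", elem.tag, translate)
            for attr in elem.attrib:
                flag = ATTRIBUTE_RULES.get((elem.tag, attr), False)
                yield ("attribute", elem.tag + "/@" + attr, flag)
        else:
            inherited.pop()
            elem.clear()  # release the finished element's text and children
```

annotate() accepts a file name or a binary file object and can be consumed
lazily, e.g. `for kind, name, translate in annotate("book.xml"): ...`
(the file name is again just an example). Anything that needs more context
than the element itself or an attribute's parent element is out of reach
for a table like this, which is where the pattern-based ITS rules come in.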
Received on Thursday, 19 June 2008 00:19:01 UTC