- From: <bugzilla@wiggum.w3.org>
- Date: Fri, 10 Mar 2006 22:26:12 +0000
- To: public-i18n-its@w3.org
- CC:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=2878

------- Comment #3 from ysavourel@translate.com 2006-03-10 22:26 -------

This is a note from Andrzej and Yves:

We discussed the "segmentation/inliness" topic today and came up with a proposal for it. Here it is:

===1: The name

Since we wanted to avoid 'inline' because of its other meanings in the domain of representation/rendering, and 'segment' because of its meaning in localization, we came up with "Elements within text" as the name for this data category.

===2: The aim

The aim of this data category is to identify the elements that are within text content and do not contain a text node that belongs to a different text unit. Knowing these elements allows linguistics-related tools to break down the text of the document into meaningful text units. Neither schema information nor programmatic methods can detect all cases of such elements.

For example, in the following code:

<p><b>Palouse</b> horses<fn callout="#">A Palouse horse is the same as an <b>Appaloosa</b>.</fn> have spotted coats.</p>

the element <b> is the only one to be defined as "within text".

In the following OpenDocument code:

<text:p text:style-name="Standard">
 Palouse horses
 <text:note text:id="ftn1" text:note-class="footnote">
  <text:note-citation>1</text:note-citation>
  <text:note-body>
   <text:p text:style-name="Footnote">
    A Palouse horse is the same as an Appaloosa.</text:p>
  </text:note-body>
 </text:note>
 have spotted coats.</text:p>

none of the elements is to be defined as "within text".

The processing expectation for this data category is to break down the text of a document into separate text units where:

a) Any element identified as 'within text' remains with its enclosing text.
b) Any other element is removed or left in the form of a place-holder.

===3: ITS Markup

We came up with two possible solutions to code this information in ITS: one using XPath expressions, the other using a list of element names.
With XPath:

<its:documentRules>
 <its:withinTextRule its:selector="//em" its:withinText="yes" />
 <its:withinTextRule its:selector="//strong" its:withinText="yes" />
 ...
</its:documentRules>

With list:

<its:documentRules>
 <its:withinTextRule its:list="em strong..." its:withinText="yes" />
</its:documentRules>

-- Yves favors the list (but could live with the selector): Using XPath would force us (at least in DOM) to decorate the document in order to know whether an element is "within text" or not when traversing the document tree. There is no easy or inexpensive way to know whether a given element matches an XPath expression when accessing the tree directly. Since we have not so far been able to come up with cases where an element would be "within text" or not depending on its context, using XPath seems less justified here than in other data categories.

-- Andrzej prefers XPath: It provides more control and might well be required in certain conditions. One can imagine situations where an element is 'within text' in one context and not in another, so XPath provides the maximum flexibility.

-Andrzej and Yves
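
As an illustration only, the processing expectation from section 2 and the trade-off discussed in section 3 can be sketched in Python with the standard xml.etree.ElementTree module. Everything named here is an assumption made for the sketch and not part of the proposal: the function text_units, the {n} place-holder syntax, and the use of ".//b" (ElementTree accepts only a limited, relative subset of XPath, so "//b" is written in that form). The list variant decides by element name during the traversal; the selector variant first evaluates each expression and marks the matching nodes, which is the "decoration" step Yves mentions.

```python
import xml.etree.ElementTree as ET


def text_units(root, is_within_text):
    """Split the content of `root` into text units, in document order.

    Elements accepted by `is_within_text` stay with their enclosing text
    (their attributes are dropped for brevity); any other element is
    replaced by a numbered place-holder and its content forms separate units.
    """
    units = []

    def walk(elem):
        queued = []       # children that will form their own units
        has_text = False  # becomes True once real text is seen in this unit

        def keep(s):
            nonlocal has_text
            if s and s.strip():
                has_text = True
            return s or ""

        def content(node):
            pieces = [keep(node.text)]
            for child in node:
                if is_within_text(child):
                    inner = content(child)
                    pieces.append("<%s>%s</%s>" % (child.tag, inner, child.tag))
                else:
                    queued.append(child)
                    pieces.append("{%d}" % len(queued))  # hypothetical syntax
                pieces.append(keep(child.tail))
            return "".join(pieces)

        text = " ".join(content(elem).split())  # normalize whitespace
        if has_text:  # skip pure containers that hold no text of their own
            units.append(text)
        for child in queued:
            walk(child)

    walk(root)
    return units


doc = ET.fromstring(
    '<p><b>Palouse</b> horses<fn callout="#">A Palouse horse is the same '
    'as an <b>Appaloosa</b>.</fn> have spotted coats.</p>'
)

# List approach: a name lookup is enough while walking the tree.
names = {"b"}
by_list = text_units(doc, lambda e: e.tag in names)

# Selector approach: evaluate each expression up front and mark the matches.
selectors = [".//b"]
marked = {id(e) for sel in selectors for e in doc.findall(sel)}
by_selector = text_units(doc, lambda e: id(e) in marked)

for unit in by_list:
    print(unit)
```

For the first example in section 2, both variants yield two units: "<b>Palouse</b> horses{1} have spotted coats." and "A Palouse horse is the same as an <b>Appaloosa</b>.". They would only differ if a selector made the decision depend on context, which is the case Andrzej has in mind.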
Received on Friday, 10 March 2006 22:46:20 UTC