[ESW Wiki] Update of "its0601ReqInlineElements" by YvesSavourel

From: <w3t-archive+esw-wiki@w3.org>
Date: Fri, 03 Feb 2006 22:01:22 -0000
To: w3t-archive+esw-wiki@w3.org
Message-ID: <20060203220122.19091.77937@localhost.localdomain>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "ESW Wiki" for change notification.

The following page has been changed by YvesSavourel:
http://esw.w3.org/topic/its0601ReqInlineElements


------------------------------------------------------------------------------
  
  '''[http://esw.w3.org/topic/its0505WikiProcess  Status: Initial Draft]''' (Req Doc)
  i.e., please focus on technical content rather than wordsmithing at this stage.
+ 
+ 
+ = Segmentation Hints =
+ 
+ [R025] Methods, independent of the semantics of the elements, must exist to provide hints on how to break document content down into meaningful runs of text.
+ 
+ Many applications that process content for language-related tasks need to perform basic segmentation, and they need to do so without knowing the semantics of the elements. The elements marking up the document content should therefore provide generic clues to help such processing.
+ 
+ In this requirement, a 'text run' is defined as the longest sequence of sequentially traversable nodes that, once the tags are removed, has a continuous linguistic meaning. For example, a paragraph with two consecutive sentences is a single text run, but two sentences with one embedded in the other constitute two text runs. '''[YS- Not sure about this definition]'''
+ 
+ From this viewpoint, one can distinguish several types of elements:
+ 
+  * Type 1: Elements that do not contain direct text nodes. For example: <table> in XHTML.
+ 
+  * Type 2: Elements that contain mixed nodes belonging to one or more text runs. For example: <p> in XHTML.
+ 
+  * Type 3: Elements that contain mixed nodes belonging to a single text run. For example: <img> in XHTML 2, or <image> in DITA. (Note: <img> in XHTML 2 is different from <img> in XHTML 1.1.)
+ 
+  * Type 4: Elements that contain mixed nodes belonging to an element of type 2 or 3. For example: <strong> or <span> in XHTML.
+ 
+  * '''[And possibly]''' Type 5: Empty elements belonging to an element of type 2, 3, or 4 and indicating a strong possibility of sub-segmentation. For example: <br/> in XHTML. '''[But I'm not sure this belongs in ITS, because: a) it affects sub-segmentation, not the text runs; b) such elements would probably be considered bad practice.]'''
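To make the list above concrete, here is a minimal Python sketch of how a processor might split a document into text runs once it already knows the type of each element. The `ELEMENT_TYPES` table, the `text_runs` function, and the choice of type 4 as the default for unknown elements are hypothetical illustrations, not part of any ITS mechanism. The sample markup uses an XHTML 2 style `<img>` that has content, and type 5 is treated as not breaking the run, following the note that it only affects sub-segmentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical element-to-type table for a few XHTML-like elements. In a real
# processor this would come from some declaration method or from inference.
ELEMENT_TYPES = {
    "body": 1, "table": 1, "tr": 1,   # Type 1: no direct text nodes
    "p": 2, "td": 2,                  # Type 2: may hold one or more runs
    "img": 3,                         # Type 3: XHTML 2 style, holds one run
    "strong": 4, "span": 4,           # Type 4: inline, part of the parent run
    "br": 5,                          # Type 5: empty sub-segmentation hint
}


def text_runs(root, types=ELEMENT_TYPES, default_type=4):
    """Return the text runs under root, in order of their start, tags removed."""
    runs = []

    def walk(elem, current):
        kind = types.get(elem.tag, default_type)
        if kind == 1:
            # Type 1 has no direct text: just visit the child elements.
            for child in elem:
                walk(child, None)
            return
        if kind in (2, 3) or current is None:
            # Types 2 and 3 (or inline content with no enclosing run) open a run.
            slot = len(runs)
            runs.append(None)  # reserve the slot so outer runs precede inner ones
            parts = []
        else:
            # Types 4 and 5 continue the enclosing run; type 5 only hints at
            # sub-segmentation, so it does not break the run here.
            slot, parts = None, current
        parts.append(elem.text or "")
        for child in elem:
            walk(child, parts)
            parts.append(child.tail or "")  # text after a child stays in this run
        if slot is not None:
            runs[slot] = " ".join("".join(parts).split())  # normalize whitespace

    walk(root, None)
    return [run for run in runs if run]


doc = ET.fromstring(
    "<body><p>See the <img src='a.png'>A <strong>red</strong> door.</img> picture."
    "<br/> Second line.</p><table><tr><td>Cell</td></tr></table></body>"
)
print(text_runs(doc))
# ['See the picture. Second line.', 'A red door.', 'Cell']
```

The content of the embedded `<img>` comes out as its own run while the paragraph's run continues around it, which matches the "one sentence embedded in the other" case in the text-run definition.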
+ 
+ A processor should be able to determine, either from an explicit method or by inference from the content, the category or categories to which each element belongs.
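For the "infer from the content" case, a rough heuristic can guess a type from where non-whitespace text nodes occur. The Python sketch below (the `has_direct_text` and `infer_types` names are hypothetical) records, for each element name, the types suggested by each of its occurrences, so an element can end up with more than one category. It is only an assumption about how inference might work: content alone cannot separate type 3 from type 4, because any element inside running text is guessed as type 4, which suggests why an explicit method may still be needed.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def has_direct_text(elem):
    """True if elem has a non-whitespace text node as a direct child."""
    if elem.text and elem.text.strip():
        return True
    return any(child.tail and child.tail.strip() for child in elem)


def infer_types(root):
    """Map each element name to the sorted list of types guessed for its instances.

    Heuristic only: an element nested inside running text is always guessed as
    type 4, so types 2 and 3 in that position cannot be told apart from type 4.
    """
    guesses = defaultdict(set)

    def walk(elem, in_run):
        own_text = has_direct_text(elem)
        if len(elem) == 0 and not own_text and in_run:
            guess = 5  # empty element sitting inside running text
        elif in_run:
            guess = 4  # inline: its content continues the enclosing run
        elif own_text:
            guess = 2  # holds text but is not itself inside running text
        else:
            guess = 1  # structural: no direct text nodes
        guesses[elem.tag].add(guess)
        # Children are inside a run if this element holds text or is inline.
        for child in elem:
            walk(child, own_text or guess == 4)

    walk(root, False)
    return {tag: sorted(found) for tag, found in guesses.items()}


doc = ET.fromstring(
    "<body><p>See the <img src='a.png'>A <strong>red</strong> door.</img> picture."
    "<br/> Second line.</p><table><tr><td>Cell</td></tr></table></body>"
)
print(infer_types(doc))
```

Here `<img>` is guessed as type 4 rather than type 3, illustrating the limit of pure inference described above.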
+ 
+ 
+ '''[YS- Previous text is below just in case we need it back]'''
+ ----------
  
  = Identifying inline elements =
  
Received on Saturday, 4 February 2006 05:18:45 UTC