
[ESW Wiki] Update of "its0601ReqInlineElements" by YvesSavourel

From: <w3t-archive+esw-wiki@w3.org>
Date: Fri, 27 Jan 2006 22:58:45 -0000
To: w3t-archive+esw-wiki@w3.org
Message-ID: <20060127225845.6797.2284@localhost.localdomain>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "ESW Wiki" for change notification.

The following page has been changed by YvesSavourel:
http://esw.w3.org/topic/its0601ReqInlineElements


------------------------------------------------------------------------------
  '''[http://esw.w3.org/topic/its0505WikiProcess  Status: Initial Draft]''' (Req Doc)
  i.e. please focus on technical content, rather than wordsmithing at this stage.
  
- = Indentifying inline elements =
+ = Identifying inline elements =
  
  Initial input: [http://people.w3.org/rishida/localizable-dtds/#inline-elements]
  
  == Summary ==
  
- [R025] Methods must exist to allow to distinguish in a document the element that are "inline" (i.e. part of a mixed content) and the ones that are not.
+ [R025] Methods must exist to allow the distinction between block and inline elements.
  
  '''[YS]- Andrzej, because we want to move forward quickly on this topic, and we were not sure if you would have the time to work on it, I've taken the action item to get it started. Obviously, feel free to edit it as needed.'''
  
  
  == Challenges ==
  
- Knowing which elements are inline and which ones are not is important for most linguistic-related process:
+ Most applications that prepare data for linguistic processes need to distinguish between elements that associate properties with spans of text content (e.g. formatting properties) and elements that structure the content.
  
-  * The segmentation of the document into sentences has to be in part driven from the structural elements.
+ Some of the reasons why such a distinction is often necessary are the following.
  
-  * Inline elements must remain withing the text so they can be modified if necessary.
+  * Segmentation
  
+  The translatability of a document is greatly enhanced by the ability to segment its content into sentences, and such segmentation has to be driven, in part, by knowledge of whether elements are block or inline. For example, given the following content:
+ 
+  {{{<section><title><kw>select</kw> Element</title>
+ <p><a>The main problems are:</a></p><ul><li>
+ <p>users may not have the fonts needed to display the text and graphics
+ cannot be used</p></li><li><p>it is hard to find a <kw>label</kw> for 
+ the list that is not language-specific</p>
+ </li><li><p>users cannot see or access the links <em>straight away</em></p>
+ </li></ul></section>}}}
+ 
+  A processor without specific semantic knowledge of the tags or the text "sees" the content like this:
+ 
+  {{{<x>
+  <x>
+   <x>
+    zzzzzz
+   </x>
+   Zzzzzzz
+  </x>
+  <x>
+   <x>
+    Zzz zzzz zzzzzzzz zzz:
+   </x>
+  </x>
+  <x> 
+   <x>
+    <x>
+     zzzzz zzz zzz zzzz zzz zzzzz zzzzzz zz zzzzzzz zzz zzzz zzz zzzzzzzz zzzzzz zz zzzz
+    </x>
+   </x> 
+   <x>
+    <x>
+     zz zz zzzz zz zzzz zz 
+     <x>
+      zzzzz
+     </x>
+      zzz zzz zzzz zzzz zz zzz zzzzzzzz-zzzzzzzz
+    </x>
+   </x> 
+   <x>
+    <x>
+     zzzzz zzzzzz zzz zz zzzzzz zzz zzzzz 
+     <x>
+      zzzzzzzz zzzz
+     </x>
+    </x>
+   </x> 
+  </x>
+ </x>}}}
+ 
+  A process with simple knowledge of whether elements are inline or block, by contrast, can "see" the content like this:
+ 
+  {{{...
+  <B><I>zzzzzz</I> Zzzzzzz</B>
+  <B><I>Zzz zzzz zzzzzzzz zzz:</I></B>
+  ...
+  <B>zzzzz zzz zzz zzzz zzz zzzzz zzzzzz zz zzzzzzz zzz zzzz zzz zzzzzzzz zzzzzz zz zzzz</B>
+  ...
+  <B>zz zz zzzz zz zzzz zz <I>zzzzz</I> zzz zzz zzzz zzzz zz zzz zzzzzzzz-zzzzzzzz</B>
+  ...
+  <B>zzzzz zzzzzz zzz zz zzzzzz zzz zzzzz <I>zzzzzzzz zzzz</I></B>
+  ...}}}
+ 
+  This latter view of the content offers a much better chance of successfully performing linguistic tasks such as machine translation, terminology extraction, translation memory matching, spell-checking, or grammar verification.
+ 
+ 
+  * Modification
+ 
+  During linguistic-related processes inline elements are needed along with the text so they can be:
+ 
+   * modified
+   * deleted
+   * moved around
+   * used as anchors for text alignment
+   * used in text comparison
+   * used to help identify parts of speech
+ 
+  while block elements are, most of the time, left alone.
+ 
+ 
-  * Infering which elements are inline only from the document context is always enough. For example in the following code:
+ Inferring whether an element is inline or block using only the document context is not always enough. For example, in the following code:
   
-  {{{<p><em>Special text</em></p>
+ {{{<title><em>Special text</em></title>
- <p>Less special text</p>}}}
+ <li><p>Less special text</p></li>}}}
  
-  there is no programatic way of guessing that <para> is structural, but <em> is an inline element.
+ without semantic knowledge of the tags there is no programmatic way of guessing that <p> is block, and <em> is inline.
  
+ 
+ Provisions may also be needed to address the cases where a block element may have the characteristics of an inline element (or conversely). For example, in the following code:
+ 
+ {{{<para>Palouse horses<footnote>A Palouse horse is the same as 
+ an Appaloosa.</footnote> have spotted coats.</para>}}}
+ 
+ the content of the <footnote> element should be treated as a separate block of text from the content of <para>.
+ 
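The block/inline view described in the change above can be sketched in a few lines. This is only an illustration and not part of the wiki page: it assumes Python with the standard xml.etree.ElementTree module, the function name block_units and the INLINE set of element names are hypothetical, and the sample markup is the one from the page with the <title> element properly closed so that it parses. Any element not listed as inline is treated as a block, which also yields the separate unit for the <footnote> case.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of inline element names; in practice this is the
# information the requirement asks document formats to make available.
INLINE = {"kw", "a", "em"}

SAMPLE = """<section><title><kw>select</kw> Element</title>
<p><a>The main problems are:</a></p><ul><li>
<p>users may not have the fonts needed to display the text and graphics
cannot be used</p></li><li><p>it is hard to find a <kw>label</kw> for
the list that is not language-specific</p>
</li><li><p>users cannot see or access the links <em>straight away</em></p>
</li></ul></section>"""

FOOTNOTE = """<para>Palouse horses<footnote>A Palouse horse is the same as
an Appaloosa.</footnote> have spotted coats.</para>"""


def block_units(root, inline):
    """Return one text unit per block element that directly holds text.

    Inline elements stay inside their unit, marked as <I>...</I>.
    Every other element starts a unit of its own.
    """
    units = []

    def render(el, parts):
        # Collect the text of el; non-inline children become separate units.
        if el.text:
            parts.append(el.text)
        for child in el:
            if child.tag in inline:
                parts.append("<I>")
                render(child, parts)
                parts.append("</I>")
            else:
                visit(child)
            if child.tail:
                parts.append(child.tail)

    def visit(block):
        slot = len(units)
        units.append("")  # reserve the slot so a parent precedes nested blocks
        parts = []
        render(block, parts)
        units[slot] = " ".join("".join(parts).split())  # normalize whitespace

    visit(root)
    return [u for u in units if u]  # drop blocks holding only whitespace


units = block_units(ET.fromstring(SAMPLE), INLINE)
for unit in units:
    print("<B>" + unit + "</B>")
```

Calling block_units(ET.fromstring(FOOTNOTE), INLINE) keeps the sentence of <para> in one piece and returns the footnote text as a second unit. An empty inline set would instead split every element into its own unit, which corresponds to the first, uninformed view.
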
Received on Saturday, 28 January 2006 05:08:18 UTC