
[ESW Wiki] Update of "its0908LinguisticMarkup" by GoutamSaha

From: <w3t-archive+esw-wiki@w3.org>
Date: Sun, 16 Oct 2005 01:26:28 -0000
To: w3t-archive+esw-wiki@w3.org
Message-ID: <20051016012628.29395.40680@localhost.localdomain>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "ESW Wiki" for change notification.

The following page has been changed by GoutamSaha:
http://esw.w3.org/topic/its0908LinguisticMarkup


------------------------------------------------------------------------------
  and would like to assist Goutam in developing this topic when we are ready to deliberate it in detail.
  Well done!.]]'''
  
- [GS- It will be nice if AZ, MJD, FS or anyone share their knowledge on this recently proposed scheme.]]
+ [[GS- Excellent remarks indeed. It will be nice if AZ, MJD, FS, RI, YS or anyone else shares and adds their knowledge on this recently proposed scheme.]]
  
  '''[FS'''- Maybe it was not clear in the minutes of the ITS f2f at ERCIM in September, but that was what we decided to do. Goutam, I hope that I understood you correctly that you would agree with what Martin formulated - that we solve the current simple (they are hard enough) problems of ITS first and come back to linguistic markup later.''']]'''
  
@@ -61, +61 @@

            exceptionally special with respect to his/her source language aspects. 
            For example, the '''Phrases_Idioms "cats and dogs" in English''' can be marked up as
  
+ {{{
+ <!-- Markup for Phrases and Idioms -->
+ 
- {{{ <sentence_cat name="phrases_idioms" meaning="heavily"> cats and dogs
+ <sentence_cat name="phrases_idioms" meaning="heavily"> cats and dogs
+ 
- </sentence_cat> }}}
+ </sentence_cat>
+ }}}
  
            Such metadata will be of immense help to the localization process (in order
            to find appropriate phrases and idioms in a target language) without
@@ -73, +78 @@

            we can mark up as
  
  {{{
- <!-- Markup to handle phrases and idioma -->
+ <!-- Markup for phrases and idioms -->
  
  <sentence_cat name="phrases_idioms" meaning="rarely visible">
  
@@ -88, +93 @@

  
            '''Again, for a link inside an XML PCDATA/ text content, we might differentiate
               links from the text by the following markup to treat them separately, 
-              for an example,        __Click Here for Sign Up__
+              for an example,        __Click Here for Sign Up__ '''
               
  {{{
- <!-- Markup to handle a Link -->
+ <!-- Markup for a Link -->
  
  <sentence_cat name="link"> __Click Here for Sign Up__
  
@@ -112, +117 @@

  For the following Bengali (Bangla) ''dialect'' sentence, "Kaam (Kaaj in Bangla, or Work in English) Saira Falo (Shesh Koro in Bangla, or Complete in English)," we should mark up the text with the ''three-layer metadata information'' in the following way:
  
  {{{
+ <!-- Markup for Dialect -->
+ 
  <text xml:lang="ben">
+ 
  <content_domain name="dialect">
+ 
  <!-- content domain metadata -->
+ 
  .... other sentences
+ 
  ....
+ 
   <sentence_cat name="imperative"> 
+ 
   <!-- sentence level metadata is optional here -->
+ 
       <pos_cat name="noun" meaning="work"> Kaam </pos_cat>
+ 
       <pos_cat name="verb" meaning="complete"> Saira Falo </pos_cat>
+ 
   <!-- word level parts-of-speech -->
+ 
   </sentence_cat> 
+ 
  ......
+ 
  </content_domain>
+ 
  </text>
+ 
  }}}
     
  ''''''Metadata information about the domain, sentence type or specific words
@@ -136, +157 @@

  without such information. 
   
  For example: 
+ 
  {{{
+ <!-- Markup for Word/Phrase Sense Disambiguation in Context Dependent Usage -->
+ 
  <content_domain name="factory">
  
  Paul works in a factory. 
@@ -195, +219 @@

  (ii) '''Anaphoric to''' (or '''to''' without a verb), e.g., "Yes, I would love '''to'''." (The omitted verb after ''to'', say "dance", has to be recovered through '''discourse analysis'''.) '' 
  
  {{{
+ <!-- Markup for Anaphoric -->
  
  <sentence_cat name="assertive"> 
  
@@ -300, +325 @@

   The sentence "Light a light light." can be marked up with word-level parts-of-speech metadata in the following way, without using finer parts-of-speech categories (depending on the requirements of a translation parser for a specific language pair).
  
  {{{
+ <!-- Markup for word-level POS Disambiguation -->
+ 
  <pos_cat name="verb"> Light </pos_cat> a
+ 
  <pos_cat name="adjective"> light </pos_cat> 
+ 
  <pos_cat name="noun"> light </pos_cat> .
  }}} 
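
A minimal sketch of how a consumer (for example, a translation pre-processor) might read the word-level markup above. It is only an illustration: the wrapping `<s>` root element and the function name `tagged_tokens` are assumptions made here so that the fragment parses as well-formed XML; neither is part of the proposed scheme.

```python
import xml.etree.ElementTree as ET

# The fragment from the page, wrapped in a hypothetical <s> root element
# (an assumption: the original fragment has no single root).
MARKED_UP = (
    '<s>'
    '<pos_cat name="verb"> Light </pos_cat> a '
    '<pos_cat name="adjective"> light </pos_cat> '
    '<pos_cat name="noun"> light </pos_cat> .'
    '</s>'
)


def tagged_tokens(xml_text):
    """Return (token, pos) pairs in document order; pos is None when untagged."""
    root = ET.fromstring(xml_text)
    pairs = []
    # Text that appears before the first child element, if any.
    if root.text and root.text.strip():
        pairs.extend((tok, None) for tok in root.text.split())
    for child in root:
        if child.tag == "pos_cat":
            # The element's text is the word; its "name" attribute is the POS.
            pairs.append(((child.text or "").strip(), child.get("name")))
        # Untagged text that follows this element (e.g. "a" or ".").
        if child.tail and child.tail.strip():
            pairs.extend((tok, None) for tok in child.tail.split())
    return pairs


if __name__ == "__main__":
    for token, pos in tagged_tokens(MARKED_UP):
        print(token, pos)
```

With the metadata in place, the three occurrences of "light" are distinguished by their `name` attribute alone, so the consumer does not need a POS tagger for them.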
  
@@ -406, +435 @@

  </content_domain>
  
  }}}
+ 
+ '''Sentences''' often contain '''multilingual words'''. For example, in Hindi, "Kaam joldi start Kijiye" (in English: Start the work immediately. Lexicon: Kaam/ Work, Joldi/ immediately, Kijiye/ Do). Note that the Hindi sentence contains the English word "start". Such multilingual wording is very common in urban areas. Because we add the meaning of the foreign-language word (e.g., start) to a sentence in another source language, say Hindi, a translation parser should have no trouble understanding the multilingual sentence.
+ 
+ {{{
+ <!-- Markup for a Sentence having Multilingual Words -->
+ 
+ <text xml:lang="hin">
+ 
+ <sentence_cat name="imperative">
+ 
+ Kaam Joldi 
+ 
+ <pos_cat name="verb" type="compound" meaning="start"> start kijiye </pos_cat>
+ 
+ </sentence_cat>
+ 
+ </text>
+ 
+ }}}
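
As a rough illustration of how the multilingual markup above could be consumed, the sketch below collects the `pos_cat` annotations together with the enclosing sentence type and the `xml:lang` value. The function name `annotated_words` and the record layout are assumptions made for this example only; the element and attribute names are taken from the markup on this page.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """
<text xml:lang="hin">
<sentence_cat name="imperative">
Kam Joldi
<pos_cat name="verb" type="compound" meaning="start"> start kijiye </pos_cat>
</sentence_cat>
</text>
"""


def annotated_words(xml_text):
    """Return one record per pos_cat element found inside a sentence_cat."""
    root = ET.fromstring(xml_text.strip())
    lang = root.get(XML_LANG)
    records = []
    for sentence in root.iter("sentence_cat"):
        for pos in sentence.iter("pos_cat"):
            records.append({
                "lang": lang,
                "sentence_type": sentence.get("name"),
                # Collapse surrounding and internal whitespace in the word text.
                "surface": " ".join((pos.text or "").split()),
                "pos": pos.get("name"),
                "meaning": pos.get("meaning"),
            })
    return records


if __name__ == "__main__":
    for record in annotated_words(SAMPLE):
        print(record)
```

The `meaning` attribute is what lets a parser treat the embedded English word "start" and the Hindi auxiliary as a single compound verb without a separate bilingual lookup.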
+ 
+  
+ 
+ 
  
  == Challenges ==
  
Received on Sunday, 16 October 2005 11:26:32 UTC
