W3C home > Mailing lists > Public > public-i18n-its@w3.org > October to December 2005

[ESW Wiki] Update of "its0908LinguisticMarkup" by GoutamSaha

From: <w3t-archive+esw-wiki@w3.org>
Date: Sun, 16 Oct 2005 07:21:04 -0000
To: w3t-archive+esw-wiki@w3.org
Message-ID: <20051016072104.10911.8249@localhost.localdomain>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "ESW Wiki" for change notification.

The following page has been changed by GoutamSaha:
http://esw.w3.org/topic/its0908LinguisticMarkup


------------------------------------------------------------------------------
  == Summary ==
  
  {{{
- The proposed scheme is to demonstrate how to embed syntactic, semantic and computational linguistic related metadata information in the structure of an XML document towards faster and better meaningful Translation through useful Markups meant for both the Internationalization (I18N) & 
+ The proposed scheme demonstrates how to embed syntactic, semantic and computational-linguistic metadata in the structure of an XML document, to achieve faster and more meaningful translation through markup intended for both the Internationalization (I18N) and Localization (L10N) processes.
- Localization (L10N) Processes.
  }}}
  
  [R023] To improve the translation process, it should be easy to take advantage of the capability of XML to embed linguistic-related metadata information in the structure of a document.
  
  '''[YS- This is just a first try to get things rolling. Goutam, Andrzej, anyone: please comment, add, etc.]]'''
+ 
+ '''[AZ- This is a tremendous piece of work by Goutam. It is also an area that I am very interested in, and I would like to assist Goutam in developing this topic when we are ready to deliberate on it in detail. Well done!]]'''
  
  '''[MJD- I'm totally amazed by the amount of work that is going into this page. However, this
  shows that the whole field is huge. The properties/elements proposed in this page already
@@ -32, +33 @@

  continue working on linguistic markup, but for the actual work on a Recommendation concentrate
  on the other, simpler problems first, and maybe save linguistic markup for Version 2 of ITS.]]'''
  
- '''[AZ- This is a tremendous piece of work by Goutam. It is also an area that I am very interested in
- and would like to assist Goutam in developing this topic when we are ready to deliberate it in detail.
- Well done!.]]'''
- 
- 
  '''[FS- Maybe it was not clear in the minutes of the ITS f2f at ERCIM in September, but that was what we decided to do. Goutam, I hope I understood you correctly that you agree with what Martin formulated: that we solve the current simple (they are hard enough) problems of ITS first and come back to linguistic markup later.]]'''
  
  '''[[GS- Excellent remarks/comments indeed. It would be nice if AZ, MJD, FS, RI, YS or anyone else would add to and share their knowledge on this recently proposed scheme.]]'''
  
               
+ {{{
+ The 3-Tier XML Schema approach lets an XML content author embed metadata specific to the source human language in an XML document. This is a significant step forward for the internationalization and localization processes. An author does not need to mark up every part of his/her document: such markup is used only in strongly language-specific parts, so the content is not overloaded with extra markup.
-    [[GS- The 3-Tier XML Schema approach is useful for an XML content
-          author to embed a source human language specific metadata
-          information in an XML document. This is a significant step
-          forward toward internationalization and localization processes.
-          An author does not need to markup every parts of his/ her document.
-          Use such markups only at very language specific parts and thus the 
-          content does not get overweighted with extra markups.]]
-   
-    [[GS- This is a write-up on how to embed syntactic, semantic and 
-          computational linguistic related metadata information in the
-          structure of an XML document towards better translation.]] 
  
+ }}}
+   
+ This is a write-up on how to embed syntactic, semantic and computational-linguistic metadata in the structure of an XML document toward better translation.
+ 
-           '''XML Schema authors should prefer to use attributes for metadata information because of
+ '''XML Schema authors should prefer to use attributes for metadata information because of
-           their better flexibility and portability.''' XML content authors having school level
+ their better flexibility and portability.''' XML content authors '''having school-level
-           language grammar knowledge will not find any difficulty in marking up such language
+ language grammar knowledge''' will not find it difficult to mark up such language
-           specific information. '''It is not mendatory for an author to add finer classified 
+ specific information. '''It is not mandatory for an author to add fine-grained
-           metadata''' at all. He/she has to add metadata at some parts of his/her content which are
+ metadata''' at all. He/she needs to add metadata only to those parts of his/her content that are
-           exceptionally special with respect to his/her source language aspects. 
+ exceptional with respect to the source language.
+ 
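To illustrate the point about attributes, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree`. It assumes a hypothetical `phrase_cat` element for the idiom; the page's own idiom markup may use different element names. Because the categories sit in `name` attributes, the character data is still exactly the original sentence, so a tool that ignores the metadata loses nothing, while a tool that understands it can list the annotated spans.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: sentence_cat comes from this page, while the
# phrase_cat element for the idiom is an assumption made for illustration.
SAMPLE = (
    '<sentence_cat name="assertive">'
    'It is raining <phrase_cat name="Phrases_Idioms">cats and dogs</phrase_cat>.'
    '</sentence_cat>'
)


def plain_text(xml_source: str) -> str:
    """Return the character data; markup and attributes are ignored."""
    return "".join(ET.fromstring(xml_source).itertext())


def annotations(xml_source: str) -> list:
    """List (tag, name attribute, covered text) for every annotated element."""
    root = ET.fromstring(xml_source)
    return [
        (el.tag, el.get("name"), "".join(el.itertext()))
        for el in root.iter()  # document order, starting with the root
        if el.get("name") is not None
    ]


print(plain_text(SAMPLE))  # It is raining cats and dogs.
for item in annotations(SAMPLE):
    print(item)
```

Only the idiom carries word-level metadata here, which matches the idea that authors annotate just the language-specific parts.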
-           For example: for the '''Phrases_Idioms "cats and dogs" in english''' we can markup as
+ For example, the '''Phrases_Idioms "cats and dogs" in English''' can be marked up as follows:
  
  {{{
  <!-- Markup for Phrases and Idioms -->
@@ -217, +209 @@

  
  The form of a '''Non-finite Verb''' is invariant because it is not affected by the (subject-verb) concord system: "He likes ''to swim''.", "They like ''to swim''.", "He likes ''eating''.", "''Having worked'' hard, he felt tired."
  '''Non-finite verbs''' are not essential in a sentence. They are needed only to expand a sentence in order to express various kinds of meaning, so we cannot have a sentence with ''subject + non-finite verb'' and no finite verb. For example, we don't say "Children ''to fly'' kites."; instead we say "Children ''like'' to fly kites." Here, ''like'' is a finite verb and ''to fly'' is a non-finite verb. A non-finite verb has the structures: (i) ''to + verb'',
- (ii) ''' Anaphoric to ''' (or ''' to ''' without verb, e.g., "Yes, I would love ''' to''' . " (the omitted verb after ''to '' say, "dance" is to be learnt through '''discourse analysis ''' ). ) '' 
+ (ii) '''Anaphoric to''' (i.e., ''to'' without a verb), e.g., "Yes, I would love '''to'''." The omitted verb after ''to'' (say, "dance" here) has to be recovered through '''discourse analysis'''.
  
  {{{
  <!-- Markup for Anaphoric -->
+ 
+ <!-- Markup for "Yes, I would love to." -->
  
  <sentence_cat name="assertive"> 
  
@@ -341, +335 @@

  
  }}} 
  
-  The sentence "Light a light light." can be marked up with word-level parts-of-speech metadata information in the following way without using finer parts-of-speech categories (depending on the requirements of a translation parser for a specifc language-pair).
+  The sentence '''"Light a light light."''' can be marked up with word-level part-of-speech metadata in the following way, without finer-grained part-of-speech categories (depending on the requirements of a translation parser for a specific language pair).
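As a sketch of why word-level tags matter for this sentence: assuming hypothetical `pos_cat` category values (`verb`, `article`, `adjective`, `noun`) and a hypothetical gloss table, a lookup keyed on (word, part of speech) can tell the three occurrences of "light" apart, which a lookup on the word alone cannot.

```python
import xml.etree.ElementTree as ET

# Hypothetical word-level markup for "Light a light light."; the category
# values (verb, article, adjective, noun) are illustrative assumptions.
SAMPLE = (
    '<sentence_cat name="imperative">'
    '<pos_cat name="verb">Light</pos_cat> '
    '<pos_cat name="article">a</pos_cat> '
    '<pos_cat name="adjective">light</pos_cat> '
    '<pos_cat name="noun">light</pos_cat>.'
    '</sentence_cat>'
)

# Illustrative sense glosses keyed on (lower-cased word, part of speech).
SENSES = {
    ("light", "verb"): "ignite",
    ("light", "adjective"): "not heavy",
    ("light", "noun"): "lamp",
}


def tagged_words(xml_source: str) -> list:
    """Return (word, part of speech) pairs in document order."""
    root = ET.fromstring(xml_source)
    return [(el.text, el.get("name")) for el in root.iter("pos_cat")]


def choose_senses(xml_source: str) -> list:
    """Pick a gloss per word; words without an entry are passed through."""
    return [
        SENSES.get((word.lower(), pos), word)
        for word, pos in tagged_words(xml_source)
    ]


print(choose_senses(SAMPLE))  # ['ignite', 'a', 'not heavy', 'lamp']
```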
  
  {{{
  <!-- Markup for word-level POS Disambiguation -->
@@ -426, +420 @@

  
  <!-- add metadata information so that the content is not translated -->
  
+ Om Ganga.
+ 
  <!-- default is to translate -->
  
  .... sentences ... 
@@ -463, +459 @@

  {{{
  <!-- Markup for a Sentence having Multilingual Words -->
  
+ <!-- Markup for the Hindi sentence "Kaam Joldi Start Kijiye" ("Please start the work quickly") -->
+ 
  <text xml:lang="hi">
  
  <sentence_cat name="imperative">
@@ -477, +475 @@

  
  }}}
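A processor can work out the language of each part of such a sentence from `xml:lang`, which the XML specification defines as applying to an element's contents and descendants unless overridden. A minimal sketch, assuming hypothetical markup in which only the English word "Start" carries its own `xml:lang` (the page's actual child elements are not shown in this diff):

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical markup: only the English word carries its own xml:lang.
SAMPLE = (
    '<text xml:lang="hi">'
    '<sentence_cat name="imperative">'
    'Kaam Joldi <pos_cat name="verb" xml:lang="en">Start</pos_cat> Kijiye'
    '</sentence_cat>'
    '</text>'
)


def words_by_language(xml_source: str) -> list:
    """Return (text, language) pairs, applying xml:lang inheritance."""
    pairs = []

    def walk(el, inherited):
        lang = el.get(XML_LANG, inherited)  # own xml:lang overrides the parent's
        if el.text and el.text.strip():
            pairs.append((el.text.strip(), lang))
        for child in el:
            walk(child, lang)
            # Text after a child element belongs to this (the parent) element.
            if child.tail and child.tail.strip():
                pairs.append((child.tail.strip(), lang))

    walk(ET.fromstring(xml_source), None)
    return pairs


for text, lang in words_by_language(SAMPLE):
    print(lang, text)
# hi Kaam Joldi
# en Start
# hi Kijiye
```

Marking only the embedded English word keeps the markup light, in line with the page's advice to annotate only the exceptional parts.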
  
-  
+ '''In many cases we need to skip slang words or slang sentences''' because we don't want them to be processed or presented. The markup to skip words or sentences is shown below.
  
+ {{{
  
+ <!-- Markup to skip a word -->
  
+ <pos_cat name="skip">
+ 
+ slang_word
+ 
+ </pos_cat>
+ 
+ }}}
+ 
+ {{{
+ 
+ <!-- Markup to skip a slang sentence -->
+ 
+ <sentence_cat name="skip">
+ 
+ slang_sentence
+ 
+ </sentence_cat>
+ 
+ }}}
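The page does not say exactly how a translation tool should treat `name="skip"`. One plausible reading, assumed in this minimal sketch, is that skipped words and sentences are dropped from the text handed to translation, while the surrounding text (including the text that follows a skipped element) is kept. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample combining the two skip forms shown on this page.
SAMPLE = (
    '<text>'
    '<sentence_cat name="assertive">The match was '
    '<pos_cat name="skip">slang_word</pos_cat> good. </sentence_cat>'
    '<sentence_cat name="skip">slang_sentence</sentence_cat>'
    '<sentence_cat name="assertive">We won.</sentence_cat>'
    '</text>'
)

SKIPPABLE = {"pos_cat", "sentence_cat"}


def is_skipped(el) -> bool:
    return el.tag in SKIPPABLE and el.get("name") == "skip"


def translatable_text(el) -> str:
    """Concatenate text, omitting skipped elements but keeping their tails."""
    if is_skipped(el):
        return ""
    parts = [el.text or ""]
    for child in el:
        parts.append(translatable_text(child))
        parts.append(child.tail or "")  # text after the child is not skipped
    return "".join(parts)


def text_for_translation(xml_source: str) -> str:
    raw = translatable_text(ET.fromstring(xml_source))
    return " ".join(raw.split())  # collapse the gaps left by removed words


print(text_for_translation(SAMPLE))  # The match was good. We won.
```

Collapsing all whitespace is a simplification; a real pipeline might need to preserve the original spacing around skipped content.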
+ 
+  
  == Challenges ==
  
     The proposed 3-Tier XML Schema aims to mark up both syntactic and semantic metadata 
Received on Sunday, 16 October 2005 17:21:54 UTC
