[ESW Wiki] Update of "its0908LinguisticMarkup" by GoutamSaha

The following page has been changed by GoutamSaha:
http://esw.w3.org/topic/its0908LinguisticMarkup


------------------------------------------------------------------------------
  
  <!-- "Ash" stands for the Bangla calendar month: Ashwin -->
  
+ <pos_cat name="date" type="dd She MMMM, yyyy" meaning="bangla_date"> 22 She Ashwin, 1412 </pos_cat>
+ 
+ <!-- 22nd Ashwin -->
+ 
+ <pos_cat name="date" type="dd I MMMM, yyyy" meaning="bangla_date"> 12 I Ashwin, 1412 </pos_cat>
+ 
+ <!-- 12th Ashwin -->
+ 
+ 
  }}} 
  
  In various languages, we express '''Time''' in various ways. The markup for '''time''' is shown below. 
@@ -665, +674 @@

  
                                <B>simplifies</B></FONT> 
  
-                              DHTML with his powerful library'">
+                              DHTML with powerful library'">
  
          DHTML Library
  </A>
@@ -679, +688 @@

  
  To find the content domain of a paragraph of text, we can usually take the '''most frequently occurring word''' (e.g. a noun) in that paragraph. For example, if the word "football" has the highest word frequency among all the words in a paragraph, then the content domain is "football".
  Similarly, the word with the maximum '''word density''' may often be the content domain. Word density is the ratio of the number of times a word appears in a document to the size of the document (its total word count). It is a measure of how important a word is to the overall content of the document. A higher word density results in a higher relevance ranking.
+ We should not count prepositions, interjections or forms of address (e.g. "Hallo", "Sir") when computing word density. In many speeches and other communications, "Sir" has the highest word density, which can be misleading when determining the content domain. Instead, we should consider only nouns when looking for the most frequently occurring word.
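
The word-density idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `EXCLUDED`, `word_density` and `content_domain` are hypothetical, and the small exclusion list stands in for a real part-of-speech tagger that would keep only nouns.

```python
import re
from collections import Counter

# Hypothetical exclusion list (an assumption for this sketch): a few
# prepositions, function words, interjections and forms of address.
# A real system would use a part-of-speech tagger to keep only nouns.
EXCLUDED = {
    "in", "on", "at", "of", "to", "for", "with", "by", "from",
    "the", "a", "an", "is", "and", "sir", "hallo", "hello",
}


def word_density(text, excluded=EXCLUDED):
    """Map each non-excluded word to count / total word count of the text."""
    words = re.findall(r"[a-z]+", text.lower())
    total = len(words)  # document size counts every word, excluded or not
    if total == 0:
        return {}
    counts = Counter(w for w in words if w not in excluded)
    return {w: c / total for w, c in counts.items()}


def content_domain(text, excluded=EXCLUDED):
    """Return the non-excluded word with the highest density, or None."""
    density = word_density(text, excluded)
    if not density:
        return None
    return max(density, key=density.get)


sample = "Sir, the football match was great, Sir. Football is fun, Sir."
print(content_domain(sample))
```

In the sample text "Sir" occurs more often than any other word, but because it is on the exclusion list the sketch reports "football" as the candidate content domain.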
  
  == Challenges ==
  

Received on Saturday, 29 October 2005 11:34:27 UTC