
[ESW Wiki] Update of "its0908LinguisticMarkup" by GoutamSaha

From: <w3t-archive+esw-wiki@w3.org>
Date: Mon, 24 Oct 2005 20:09:02 -0000
To: w3t-archive+esw-wiki@w3.org
Message-ID: <20051024200902.25766.26567@localhost.localdomain>
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "ESW Wiki" for change notification.

The following page has been changed by GoutamSaha:
http://esw.w3.org/topic/its0908LinguisticMarkup


------------------------------------------------------------------------------
   
  =='''Understanding Content Domain Level Markups:-'''==
  
- To find the content domain of a paragraph of text, we normally take it to be the most frequently occurring word (e.g. a noun) in that paragraph. For example, if the word "football" has the highest word frequency among all words in a paragraph, then the content domain is "football".
+ To find the content domain of a paragraph of text, we normally take it to be the '''most frequently occurring word''' (e.g. a noun) in that paragraph. For example, if the word "football" has the highest word frequency among all words in a paragraph, then the content domain is "football".
+ Again, the word with the maximum '''word density''' may often be the content domain. Word density is the ratio of the number of times a word appears in a document to the size of the document (its total word count). It is a measure of how important a word is to the overall content of the document. A higher word density results in a higher relevance ranking.
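The word-frequency and word-density heuristics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the wiki page: the function names `word_densities` and `content_domain`, the simple regular-expression tokenizer, and the optional `stopwords` filter (a crude stand-in for restricting candidates to nouns) are all hypothetical.

```python
import re
from collections import Counter


def word_densities(text):
    """Map each word to (occurrences of the word) / (total word count)."""
    # Assumption: a "word" is a run of letters or apostrophes, compared
    # case-insensitively. Real text would need a proper tokenizer.
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {word: count / total for word, count in counts.items()}


def content_domain(text, stopwords=frozenset()):
    """Return the word with the highest density, or None for empty text.

    `stopwords` is a hypothetical filter for function words ("the", "is");
    the densities themselves are still computed over the full word count.
    """
    candidates = {
        word: density
        for word, density in word_densities(text).items()
        if word not in stopwords
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


paragraph = "Football is popular. Football fans watch football every week."
print(content_domain(paragraph))
print(word_densities(paragraph)["football"])
```

Because every density shares the same denominator within one document, the word with the maximum density is also the most frequent word; density only adds information when comparing documents of different sizes.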
  
  == Challenges ==
  
Received on Tuesday, 25 October 2005 06:48:37 UTC
