Re: tag for notion and compound indication

From: Christophe Strobbe <christophe.strobbe@esat.kuleuven.be>
Date: Thu, 04 Aug 2005 20:55:39 +0200
To: www-html@w3.org

Dear Benjamin,

At 20:33 4/08/2005, acc10-2005-67@gmx.de wrote:

>Orion Adrian wrote:
> > The only things that should be marked up
> > are those things that a computer cannot
> > do itself.
>Ok, but that's why the marking of notions and compound break points is an
>issue for the coder.
>How should a machine know on its own the notion structure of a text or the
>compounds? I know that there is a lot of research in artificial intelligence
>but I do not expect my machine to get in touch with it soon ;-). In fact
>only the compound break point analysis could be done automatically, but only
>by checking the text against highly qualified dictionaries and I do not see
>this as an appropriate solution, when you can store this information in the
>document itself.

Wouldn't it be easier for the author to store his "notions" in a database
(or other persistence mechanism) and to use a crawler that is aware of this
database to generate the index? If the website is generated from another
XML format that you define, you could mark up notions in that format and
generate the XHTML 2 pages and the index from the same XML.

You can also use the dfn element type, although that is limited to marking
up the "defining instance" of a word or phrase (i.e. the instance that has
some kind of definition in its immediate neighbourhood). You could then tell
your "indexer" to find the dfn elements and then find all other occurrences
of those words and phrases (i.e. if 'Bundesregierung' is marked up with 'dfn'
somewhere, also find all other occurrences of Bundesregierung).
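
As a rough illustration of the dfn approach, here is a minimal Python sketch
of such an indexer. It is not taken from any existing tool: the function names
(collect_terms, build_index) and the dict mapping page names to page sources
are assumptions made for the example. It also assumes that the pages are
well-formed XML, so that a standard XML parser can read them, and it matches
terms only as whole words with exact spelling (no inflected forms).

```python
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def page_text(root):
    """Return the text content of a parsed page with whitespace normalised."""
    return " ".join("".join(root.itertext()).split())


def collect_terms(pages):
    """Collect the text of every dfn element found in any page.

    `pages` is a hypothetical dict: page name -> XHTML source string.
    """
    terms = set()
    for source in pages.values():
        root = ET.fromstring(source)
        for element in root.iter():
            if local_name(element.tag) == "dfn":
                term = page_text(element)
                if term:
                    terms.add(term)
    return terms


def build_index(pages):
    """Map each dfn term to a list of (page name, occurrence count) pairs."""
    terms = collect_terms(pages)
    index = {term: [] for term in terms}
    for name, source in pages.items():
        text = page_text(ET.fromstring(source))
        for term in terms:
            # Whole-word match: the term must not be glued to other word characters.
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            count = len(re.findall(pattern, text))
            if count:
                index[term].append((name, count))
    return index
```

Calling build_index on a dict of pages returns, for each term marked up with
dfn anywhere, the pages that mention it and how often. The defining instance
itself is counted as one occurrence. A real indexer would probably also need
to handle inflected forms, which this sketch does not attempt.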



Christophe Strobbe
K.U.Leuven - Departement of Electrical Engineering - Research Group on 
Document Architectures
Kasteelpark Arenberg 10 - 3001 Leuven-Heverlee - BELGIUM
tel: +32 16 32 85 51
Received on Thursday, 4 August 2005 18:56:39 UTC
