- From: Christophe Strobbe <christophe.strobbe@esat.kuleuven.be>
- Date: Thu, 04 Aug 2005 20:55:39 +0200
- To: www-html@w3.org
Dear Benjamin,

At 20:33 4/08/2005, acc10-2005-67@gmx.de wrote:
>Orion Adrian wrote:
>
> > The only things that should be marked up
> > are those things that a computer cannot
> > do itself.
>
>Ok, but that's why the marking of notions and compound break points is an
>issue for the coder.
>
>How should a machine know on its own the notion structure of a text or the
>compounds? I know that there is a lot of research in artificial intelligence,
>but I do not expect my machine to get in touch with it soon ;-). In fact,
>only the compound break point analysis could be done automatically, but only
>by checking the text against highly qualified dictionaries, and I do not see
>this as an appropriate solution when you can store this information in the
>document itself.

Wouldn't it be easier for the author to store his "notions" in a database
(or another persistence mechanism) and to use a crawler that is aware of
this database to generate the index?

If the website is generated from another XML format that you define, you
could mark up notions in that format and generate both the XHTML 2 pages
and the index from the same XML.

You can also use the dfn element type, although that is limited to marking
up the "defining instance" of a word or phrase (i.e. the instance that has
some kind of definition in its direct neighbourhood). You could then tell
your "indexer" to find dfn elements and then find all other occurrences of
those words and phrases (i.e. if 'Bundesregierung' is marked up somewhere
with 'dfn', also find all other occurrences of 'Bundesregierung').

Regards,

Christophe

--
Christophe Strobbe
K.U.Leuven - Department of Electrical Engineering
Research Group on Document Architectures
Kasteelpark Arenberg 10 - 3001 Leuven-Heverlee - BELGIUM
tel: +32 16 32 85 51
http://www.docarch.be/
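To make the dfn-based suggestion concrete, a minimal sketch of such an
indexer in Python (my own illustration, not an existing tool: the function
names are invented, and it assumes well-formed XHTML files in the XHTML
namespace, passed on the command line):

import re
import sys
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

def collect_dfn_terms(paths):
    # Gather the text content of every dfn element across the given files.
    terms = set()
    for path in paths:
        root = ET.parse(path).getroot()
        for dfn in root.iter(XHTML_NS + "dfn"):
            term = "".join(dfn.itertext()).strip()
            if term:
                terms.add(term)
    return terms

def build_index(paths, terms):
    # Map each term to the files in which it occurs, whether marked up
    # with dfn or appearing as plain text.
    index = dict((term, []) for term in terms)
    for path in paths:
        text = "".join(ET.parse(path).getroot().itertext())
        for term in terms:
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                index[term].append(path)
    return index

if __name__ == "__main__":
    files = sys.argv[1:]
    for term, pages in sorted(build_index(files, collect_dfn_terms(files)).items()):
        print("%s: %s" % (term, ", ".join(pages)))

Note that the simple word-boundary match will not find a term hidden inside
a longer compound (e.g. 'Bundesregierung' inside a compound built on it),
which is exactly the compound break point problem discussed above; that part
would still need dictionaries or explicit markup.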
Received on Thursday, 4 August 2005 18:56:39 UTC