- From: Matthew Paul Thomas <mpt@myrealbox.com>
- Date: Tue, 26 Dec 2006 01:50:31 +1300
On Dec 22, 2006, at 3:23 AM, Benjamin Hawkes-Lewis wrote:
>
> Henri Sivonen wrote:
> ...
>> Also, it seems to me that the usefulness of non-heuristic machine
>> consumption of semantic roles of things like dialogs, names of
>> vessels, biological taxonomical names, quotations, etc. has been
>> vastly exaggerated.
>
> I'm not entirely sure what "non-heuristic machine consumption" is,

An example of non-heuristic machine consumption is where Google Glossary
thinks: "In an HTML 3.2 or earlier document containing the code
'<dl><dt>foo</dt> <dd>bar</dd></dl>', 'bar' is a definition of 'foo'".
(It probably thinks the same about HTML 4 documents, too, which is
applying a small "ignore that nonsense about dialogues" heuristic.)

An example of heuristic machine consumption is where Google Glossary
thinks: "In an HTML document containing the code
'<p><b>foo:</b> bar</p>', 'bar' is probably a definition of 'foo',
especially if the page has several consecutive paragraphs with that
structure and different bold text."

Non-heuristic machine consumption fails when semantic elements are
abused, and becomes impractical when elements have multiple popular
meanings (examples of the latter include <dl> in HTML 4, and <p> in
HTML 5). Heuristic machine consumption fails occasionally by the very
nature of heuristics (examples currently include
<http://www.google.com/search?q=define:author> and
<http://www.google.com/search?q=define:editor>).

-- 
Matthew Paul Thomas
http://mpt.net.nz/
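
The two kinds of consumption described in the message above can be sketched roughly as follows. This is only an illustration, not how Google Glossary works: the function names, the regular expressions, and the `min_run` threshold are all assumptions made up for the example, and real HTML would need a proper parser rather than regexes.

```python
import re

# Non-heuristic: trust the markup. Every <dt>/<dd> pair inside a <dl>
# is taken to be a term and its definition, with no further checks.
DT_DD = re.compile(r"<dt>(.*?)</dt>\s*<dd>(.*?)</dd>", re.S)

def definitions_from_dl(html):
    """Hypothetical helper: return (term, definition) pairs from <dl> lists."""
    pairs = []
    for dl in re.findall(r"<dl>(.*?)</dl>", html, re.S):
        pairs.extend((t.strip(), d.strip()) for t, d in DT_DD.findall(dl))
    return pairs

# Heuristic: guess from presentation. A "<p><b>term:</b> text</p>"
# paragraph is only *probably* a definition, so demand supporting evidence.
BOLD_PARA = re.compile(r"<p><b>(.*?):</b>\s*(.*?)</p>", re.S)

def probable_definitions(html, min_run=2):
    """Hypothetical helper: accept bold-label paragraphs only if the page
    has at least `min_run` of them with different bold text.
    Simplification: adjacency of the paragraphs is not checked."""
    found = [(t.strip(), d.strip()) for t, d in BOLD_PARA.findall(html)]
    distinct_terms = {t for t, _ in found}
    return found if len(distinct_terms) >= min_run else []
```

The first function is exactly as reliable as the authors' markup: it breaks when <dl> is used for something other than definitions. The second can be wrong even on well-formed pages, because any paragraph that happens to start with a bold label matches the pattern.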
Received on Monday, 25 December 2006 04:50:31 UTC