Re: Structure vs. appearance in HTML from Jon Wallis on 1995-09-22 (www-html@w3.org from September 1995)

From: Jon Wallis <jw@scitsc.wlv.ac.uk>
Date: Fri, 22 Sep 1995 08:35:40 +0100
To: Philippe-Andre Prindeville <philipp@res.enst.fr>
Cc: www-html@w3.org
Message-Id: <m0sw2e5-000o6gC@ccug.wlv.ac.uk>

>> > For that matter, HTML doesn't really _have_ that much usable
>> > structure.
>> True, but it's got enough to do a *few* useful things:
[snip]
>>    * Build a keywords-based search index, giving higher weight
>>       to keywords, emphasized phrases, and stuff in headings
>
>This last one is dubious.  I have no way of saying, find me all
>occurrences of "Sprint" (as a proper noun, i.e. name) in a document
>or set of documents, skipping "sprint" the verb or noun.  Obviously,
>"... winning the men's 100m sprint." does not pertain to
>telecommunications or American corporate culture.

If the author has used a capital S for "Sprint" the company, you'd then have
a chance of parsing it properly with a search engine.
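
As a rough illustration of the capital-S idea (a sketch only, in Python; the
function name and the sample sentence are made up for the example), a
case-sensitive whole-word match is enough to separate the two:

```python
import re

def find_capitalised(text, word="Sprint"):
    """Return character offsets where `word` appears as a whole word
    with exactly this capitalisation (hypothetical helper)."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")  # case-sensitive by default
    return [m.start() for m in pattern.finditer(text)]

# Hypothetical sample text: one company mention, one athletics mention.
doc = "Sprint announced new rates. He won the men's 100m sprint."
hits = find_capitalised(doc)  # only the capitalised occurrence is reported
```

This only helps as far as authors are consistent: a sentence that happens to
begin with the verb "Sprint" would match as well.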

Also, if the search tool does proximity indexing you could index the page
according to what other terms occurred near to "sprint".
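
A minimal sketch of what I mean by proximity, assuming a simple word
tokeniser and a window of five words either side (both are my assumptions,
as is the function name; a real indexing tool will differ):

```python
import re
from collections import Counter

def neighbours(text, target, window=5):
    """Count the words found within `window` tokens of each occurrence
    of `target`, ignoring case (hypothetical helper)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = i + window + 1
            # Record every nearby token except the target itself.
            counts.update(t for t in tokens[lo:hi] if t != target)
    return counts

# Hypothetical sample sentences.
telecom = neighbours("Sprint cut long-distance telephone rates", "sprint")
athletics = neighbours("winning the men's 100m sprint", "sprint")
```

A page where "sprint" keeps company with "telephone" and "rates" can then be
indexed differently from one where it sits next to "100m".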

Better still, if the page were *classified* (e.g., using Dewey), you'd know
whether the page was about athletics (796) or telecomms (384).  Page
classification makes searching much more powerful (especially if combined
with text indexing).  Then if you wanted to find all occurrences of "Sprint"
(as a proper noun, i.e. name) in a document you could choose only to search
documents that were about telecomms or corporate culture or whatever.
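
To make the combination concrete, here is a small sketch that restricts a
text search to pages carrying a given Dewey class. The page records, their
field names and the function are all invented for the example; only the
class numbers 796 and 384 come from the paragraph above.

```python
def search_classified(pages, term, dewey_prefix):
    """Return titles of pages whose Dewey class starts with `dewey_prefix`
    and whose text contains `term`, ignoring case (hypothetical helper)."""
    term = term.lower()
    return [
        p["title"]
        for p in pages
        if p["dewey"].startswith(dewey_prefix) and term in p["text"].lower()
    ]

# Hypothetical page records: classification stored alongside the text.
pages = [
    {"title": "Long-distance carriers", "dewey": "384",
     "text": "Sprint and its rivals cut their rates."},
    {"title": "Track results", "dewey": "796",
     "text": "He won the men's 100m sprint."},
]

telecoms_hits = search_classified(pages, "sprint", "384")
```

Both pages contain the word, but only the one classified under 384 comes
back, which is the whole point of classifying before searching.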

The only alternative I can see would be to tag *every* word with its
grammatical value (noun, verb, whatever), on the basis that it *might* be
useful to someone, somewhere, sometime in the future. Not a pleasant prospect.

>A lot of the things you mention are superficial.  They still don't
>scratch the surface of *semantic* tagging of information.
>
>We are creating and stocking quantities of information that will
>be used well into the next century.  Machines will be used to
>search these enormous quantities of data.  If it isn't tagged
>meaningfully now (at its inception), it never will be.  And that
>will be a real shame.

I entirely agree.  But I suspect it's a battle that's already being lost.

>> And let's not forget:
>> 
>>     * Render it on just about any output device, with reasonably
>>       good results.
>
>Whoopie.

I can't tell if this is sarcastic or not.  Platform-independent display is a
real bonus.

Regards

--
Jon Wallis         Senior Lecturer in Information Systems Engineering
School of Computing & I.T., University of Wolverhampton, UK - WV1 1SB
   Personal WWW Home Page   <URL:http://www.scitwlv.ac.uk/~cm1906>
     University WWW Home Page <URL:http://www.scit.wlv.ac.uk/> 
-----------------"That's some catch, that catch-22"------------------

Received on Friday, 22 September 1995 03:37:37 UTC