Re: What are the problems with IDML? (fwd) from MegaZone on 1996-08-24 (www-html@w3.org from August 1996)

From: MegaZone <megazone@livingston.com>
Date: Fri, 23 Aug 1996 18:19:04 -0700 (PDT)
To: www-html@w3.org
Message-Id: <199608240119.SAA07597@server.livingston.com>
Once upon a time Doug Donohoe shaped the electrons to say...
>What tools?  Are you referring to enhancements to existing
>HTML editors?  I know most support META now, but are they going
>to add support for the various META schemas that people use?

There are many tools that are free/shareware and that evolve very fast.
Most of their authors are skilled and respond fast to new developments.
So yes, I believe that if a real standard came to be, editors would quickly
support it.  And if it was backed by the W3C I expect major players like
SoftQuad would add support just as quickly as they add support for new
tags - which is usually very quickly.

And many of the editors that do support META also support user defined
macros - so a skilled user (and people who care about indexing usually are
the skilled ones) can create macros to do the headers.

Aside from that, it wouldn't be a tough job for Perl to generate the
headers and add them to a file.  If I get some time (hahaha  I've pulled a
couple 50+ hour shifts this week, like I have time...) I might whip a basic
one up.  (Anyone else with a more realistic schedule is welcome to, since
the realist in me knows I have things I *need* to do that I don't have time
for, let alone side projects...  I probably shouldn't be reading this list
for that matter...)
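
For anyone who wants a starting point, here is a minimal sketch of such a
header generator, written in Python rather than Perl.  The function names
and the choice to insert right after the opening <HEAD> tag are my own
assumptions, not part of any standard.

```python
import html
import re


def build_meta_tags(fields):
    """Return one META line per (name, content) pair, with quotes escaped."""
    return "\n".join(
        '<META NAME="{}" CONTENT="{}">'.format(
            html.escape(name, quote=True), html.escape(content, quote=True)
        )
        for name, content in fields.items()
    )


def add_meta_headers(document, fields):
    """Insert the generated META lines right after the opening <HEAD> tag."""
    match = re.search(r"<head(\s[^>]*)?>", document, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("document has no <HEAD> element")
    tags = build_meta_tags(fields)
    # Keep everything before and after the <HEAD> tag untouched.
    return document[:match.end()] + "\n" + tags + "\n" + document[match.end():]
```

Calling add_meta_headers(text, {"Author": "Your name"}) returns the document
with one new META line; writing the result back to the file is left out.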

>True.  Is there a specification or draft which describes what
>these config/rules files look like?  I imagine that would be

It would be up to the author, I don't know of any draft - there really isn't
a standard for tools.  I was thinking that the author who created a tool
for parsing out the data would design the tool with an external rule set
instead of hardcoding it.  Well, I would given the time and inclination to
create such a bot.  That would make it expandable without needing to hack
the source every time a new rev of some standard comes out.
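
Roughly what I have in mind, sketched in Python: the rules live in a
separate text (JSON here, purely as an assumption) and the parser only
looks at the META names listed there.  The rule format, the "required"
key, and the names MetaCollector and extract are all hypothetical.

```python
import json
from html.parser import HTMLParser

# Hypothetical rule set; in practice this would be read from its own file
# so it can change without touching the code.
RULES_TEXT = """
{
  "author": {"required": true},
  "classification": {"required": false}
}
"""


class MetaCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from every META tag in a document."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if name is not None:
            self.found[name.lower()] = attrs.get("content") or ""


def extract(document, rules):
    """Return (tags named in the rules, required names that were missing)."""
    collector = MetaCollector()
    collector.feed(document)
    wanted = {n: v for n, v in collector.found.items() if n in rules}
    missing = [n for n, r in rules.items() if r.get("required") and n not in wanted]
    return wanted, missing
```

Supporting a new rev of a scheme then means editing the rule text and
calling extract(document, json.loads(RULES_TEXT)) as before.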

>complex to specify all the types of data validation one might

Sure, if you want to start making it completely idiot proof a spider could
be required to do a great deal of checking.  But most spiders today don't
bother, the basic assumption is that authors know what they want to add.

If you are talking about the tool for creating the tags, the content should
be up to the author, but you could have it generate prompts, like:
address:
city:
state:
zip:

Ad nauseam, and the tool would take those fields and generate a location
tag string from it.
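
A small Python sketch of that idea follows.  The tag name "location" and
the "field=value; field=value" layout of the CONTENT string are made up
for illustration; a real scheme would dictate its own format.

```python
import html

# The prompts listed above, in order.
FIELDS = ("address", "city", "state", "zip")


def prompt_for_fields(ask=input):
    """Ask for each field in turn; 'ask' can be swapped out for testing."""
    return {field: ask(field + ": ").strip() for field in FIELDS}


def location_tag(answers):
    """Build one META tag from the non-empty answers (hypothetical format)."""
    parts = ["{}={}".format(f, answers[f]) for f in FIELDS if answers.get(f)]
    content = html.escape("; ".join(parts), quote=True)
    return '<META NAME="location" CONTENT="{}">'.format(content)
```

Running print(location_tag(prompt_for_fields())) asks the four questions
and prints the finished tag, skipping any field left blank.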

>I'm sure you know, but I still feel that putting stuff that belongs
>together in separate META tags introduces more problems.  For example,

I think the only problem it introduces is gluing the data back together.

>when writing HTML, I mess up tables far more often than <IMG>
>tags because tables require beginning and ending tags.  In IMG,

</tr>, </td> and </th> are optional.  I'm not sure, off the top of my
head, whether </table> is.

My point is, forgetting a '"' can be just as debilitating to a document as
forgetting </table>.  I have seen pages where hunks of text are missing
because Netscape was looking for the closing quote on a tag.  Or pages where
slews of text were made an anchor all due to one missing quote.

An error is an error, and the smallest error can be just as catastrophic
as a seemingly 'big' error.

>the only thing I usually get wrong is forgetting a closing quote.

That can be a big 'only', it can ruin an entire page if the wrong quote is
missing.

>All I'm saying is that for a while longer, people are going
>to be writing this stuff by hand -- and with copying & pasting and
>moving stuff around, you are more likely to mess something up
>when you have to keep 8 tags in a block versus one.  Perhaps this

I'm probably not a good example, since I've been playing with HTML for 5 or
so years - from the very early days when a friend had a friend at CERN
who told us about this cool new thing some guy named Tim was working on. ;-)
(It's a small net after all, it's a...)  I write almost all of my HTML by
hand in emacs, and 95%+ of the time it validates first pass.  For the 
remainder most of the errors are typos, only a few are mistakes in tag
placement.

But the real point is the word 'validate' - I don't trust myself, I validate
my work.  I run all of my pages through HTML-Check and Weblint.  Validators
are, IMHO, a vital part of the authoring process.  They are like spell 
checkers for word processing, using 'perl -cw' on a new Perl script, or
compiling C with -Wall (or -Wall -pedantic) to check the fine points.

I say that most of the skilled users won't be making too many mistakes,
but they will, of course, make some.  And those mistakes would then be caught by
validation tools.  SGML validators like HTML-Check would ensure the tags
had the correct structure, and some Perl script could check for content
syntax.  Maybe an extension to the already popular Weblint, or a new 
tool similar to Weblint but just for index tags.  Weblint is Perl and
easily extensible.
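
To show the kind of content check I mean, here is a rough sketch in Python
(not a Weblint extension, and not Perl).  The tag names and patterns are
invented examples; a real checker would take them from whatever index
scheme is in use.

```python
import re

# Hypothetical content patterns, for illustration only.
PATTERNS = {
    "zip": r"\d{5}(-\d{4})?",
    "state": r"[A-Z]{2}",
}


def check_content(tags):
    """Return a list of complaints about tag values that break a pattern.

    'tags' maps META names to their CONTENT strings; names without a
    pattern are accepted as they are.
    """
    problems = []
    for name, content in tags.items():
        pattern = PATTERNS.get(name.lower())
        if pattern is not None and re.fullmatch(pattern, content) is None:
            problems.append("{}: unexpected value {!r}".format(name, content))
    return problems
```

This only looks at content; checking that the tags themselves are well
formed is still the job of the SGML validator.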

>will not be a problem when the HTML editors mature.  Still, are
>the editors going to have support for all the various META schemas
>out there?

All the META schemes?  Probably not.  The one or two most popular, the
de facto standards, probably.  And if they have any META support, they
will probably allow user configured tags.  For instance, Netscape Enterprise
Server uses META tags for the built in cataloging agent.  And I believe the
built in Verity Search Engine can utilize them too - since I'll be installing
that server real soon now (just waiting on the new Sparc to arrive) and I'll
need to add meta tags to make full use of it, I'd MUCH prefer a scheme that
used the same kinds of tags the existing server uses.  So I don't end up
with tag soup, one set of tags for each schema.

>both ways).  If someone has experience using META for more than
>just keywords and description, please let me know (and send

Netscape Enterprise Server uses:
<META NAME="Classification" CONTENT="Your classification here">
<META NAME="Author" CONTENT="Author's name here">

It allows the catalog agent to sort based on classification and/or author.
I haven't had hands on with this yet but the documentation hints that other
META tags could be used as it mentions a MetaData directive for the 
filter.conf file used by the agent which can sort/filter based on "any
meta-data listed in META tags in the HTML document."

-MZ
--
Livingston Enterprises - Chair, Department of Interstitial Affairs
Phone: 800-458-9966 510-426-0770 FAX: 510-426-8951 megazone@livingston.com
For support requests: support@livingston.com  <http://www.livingston.com/> 
Snail mail: 6920 Koll Center Parkway  #220, Pleasanton, CA 94566
Received on Friday, 23 August 1996 21:19:30 UTC