Re: MCF's new implementation (via XML)

Benjamin Franz (snowhare@netimages.com)
Mon, 16 Jun 1997 06:47:00 -0700 (PDT)


Date: Mon, 16 Jun 1997 06:47:00 -0700 (PDT)
From: Benjamin Franz <snowhare@netimages.com>
To: www-html@w3.org
Subject: Re: MCF's new implementation (via XML)
In-Reply-To: <33A5238C.E8FF4EDF@calum.csclub.uwaterloo.ca>
Message-ID: <Pine.LNX.3.95.970616061355.28734B-100000@ns.viet.net>

On Mon, 16 Jun 1997, Paul Prescod wrote:

> Benjamin Franz wrote:

> > Does anyone know of a link to a public copy of the actual proposal by
> > Netscape? My reading of the Apple MCF documents did not encourage me to
> > believe this is simple enough to work in the real world (read in a world
> > where metadata schemas are beyond 99% of authors' comprehension levels)
> > and the W3C link goes into 'member only' areas.
> 
> There are two links in the same paragraph. The second goes to
> http://developer.netscape.com/mcf.html . 

Ah. I overlooked that.

> 
> As far as working comprehension levels, the ability of end-users to read
> the meta-data schema definition format is probably not crucial. Millions
> of people who use HTML do not know what a DTD looks like. Someone else
> maintains the DTD and they write code. The same makes sense for
> metadata.

It *does* matter. Even as simple a META data schema as the one developed
by Sandia National Labs has been badly corrupted in practice. The ORIGINAL
specification calls for FOUR META elements: 

<meta name="description" content="">
<meta name="keywords" content="Internet, net">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">

In practice this has been stripped down to:

<meta name="description" content="">
<meta name="keywords" content="Internet, net">

*Because users didn't understand the rest*
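
A rough way to check this against real pages is sketched below in Python.
It is only an illustration, not part of any specification: it assumes the
standard library's html.parser module, and the names EXPECTED,
MetaCollector and missing_meta are made up for the example.

```python
from html.parser import HTMLParser

# The four META names from the specification quoted above.
EXPECTED = ("description", "keywords", "resource-type", "distribution")


class MetaCollector(HTMLParser):
    """Collect name/content pairs from META elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before this call.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name:
            self.found[name.lower()] = attr.get("content") or ""


def missing_meta(html_text):
    """Return the expected META names that the page does not supply."""
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    return [name for name in EXPECTED if name not in collector.found]
```

Fed the two-element form shown above, missing_meta() reports
"resource-type" and "distribution" as absent.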

Having examined Netscape's proposal, I find it MUCH too complex for
successful widespread use unless automation tools completely hide it from
the end user: It's a "programmer's language."

Pointing to HTML as a success story isn't a good counter-example: I would
estimate that fewer than 5% of existing HTML documents can pass an SGML
validator with *any* DTD (assuming they HAD a valid public DOCTYPE)
because they use constructs that break the underlying SGML parse. You can
start with failure to quote values that MUST be quoted, and continue on
through raw 8- and 16-bit text, use of non-standard character sets,
multiple identical attributes for applets, non-compliant comments, failure
to close tags, undefined character callouts, incorrect tag nesting,
overlapping tags, and a thousand OTHER 'interesting' ways to break HTML
that are not compatible with SGML at any level. You can then follow up
with the ways of breaking HTML that *could* be made into SGML - but do not
match any known DTD.

Fortunately for HTML, it is a SIMPLE enough construct that even when you
do it wrong, there is a good chance the browser will be able to figure out
fairly closely what you *MEANT* to do. That task is greatly assisted by
the fact that, rather than generating a final product that must be used by
other programs, it generates something that humans read: Humans are *very*
good at figuring out what was MEANT - even when the information is badly
damaged in transit.

This is not true of MCF - which is targeted at use by other programs. If
you screw it up - it just isn't going to work. 
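
The contrast can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the standard library's html.parser
stands in for a forgiving browser, xml.etree.ElementTree stands in for a
strict program consuming XML-based metadata, and neither is actual MCF
tooling. The names SLOPPY, TagLogger, lenient_tags and strict_parses are
made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Typical hand-written markup: an unquoted attribute value and an
# unclosed <b> element.
SLOPPY = "<p class=intro>Hello <b>world</p>"


class TagLogger(HTMLParser):
    """Record every start tag the lenient parser manages to recognise."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_tags(text):
    """Parse the way a forgiving reader would and list the tags seen."""
    parser = TagLogger()
    parser.feed(text)
    parser.close()
    return parser.tags


def strict_parses(text):
    """Return True only if the text is well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True
```

With SLOPPY as input, lenient_tags() still recovers the p and b elements,
while strict_parses() rejects the whole document outright.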

-- 
Benjamin Franz