Re: meta information

Daniel W. Connolly (connolly@hal.com)
Thu, 02 Jun 1994 15:11:20 -0500


Message-Id: <9406022011.AA17861@ulua.hal.com>
To: cwilson@spry.com
Cc: Multiple recipients of list <www-html@www0.cern.ch>
Subject: Re: meta information 
In-Reply-To: Your message of "Thu, 02 Jun 1994 16:35:48 +0200."
             <9406021409.AA29622@homer.spry.com> 
Date: Thu, 02 Jun 1994 15:11:20 -0500
From: "Daniel W. Connolly" <connolly@hal.com>

Gee... this thing has really blown up. I see three issues:

1. How does the author express stuff that the server should
use and stick in the HTTP headers? My answer:
	<EXPIRES http>...</EXPIRES>
or, until implementations are fixed,
	<EXPIRES http content="...">
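A minimal sketch of how a server might act on the second form, assuming
it scans the document with Python's standard html.parser and treats any
element carrying a bare `http` attribute as a header to emit. The
HttpHeaderCollector class, the headers_for helper, and the sample date
are all hypothetical; nothing here describes an existing server.

```python
from html.parser import HTMLParser


class HttpHeaderCollector(HTMLParser):
    """Collect elements flagged with a bare `http` attribute as header pairs."""

    def __init__(self):
        super().__init__()
        self.headers = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # html.parser lowercases names and reports a bare attribute
        # such as `http` with the value None.
        if "http" in attr_map and attr_map.get("content") is not None:
            # Assumption: the element name doubles as the header name.
            self.headers.append((tag.capitalize(), attr_map["content"]))


def headers_for(document):
    """Return the (header name, value) pairs the author asked for."""
    collector = HttpHeaderCollector()
    collector.feed(document)
    collector.close()
    return collector.headers


# Made-up document and date, for illustration only.
doc = '<H1>Demo</H1>\n<EXPIRES http content="Thu, 09 Jun 1994 15:11:20 GMT">\n<P>Body text.'
print(headers_for(doc))
```

Elements without the `http` marker are left alone, so the same scan
would not pick up markup meant for indexers or for the reader.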

2. How does the author communicate with sophisticated indexing engines?
My answer: I dunno. Build the indexing engine and I'll tell you.

You might say that META is a general hook into a relational indexing
engine. When an author writes <META name="colname" value="val">, he's
saying "when you build the record for this document, stick val in
the colname column." I'll buy that. But what is the namespace of
columns? Where's the schema? What table are we talking about?

As a general way to express
	"For this document, the value of the C column in table T 
		in database D is V"
I suggest:
	<C DT>V</C>
or
	<C DT content="V">
where C and DT are names in the DTD (DT naming the database and
table together). For example:
	<KEYWORDS IAFA content="Modula-3,programming languages">
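To make the relational reading concrete, here is a rough sketch of an
indexer that turns the content="..." form into (table, column, value)
rows. It assumes Python's standard html.parser; the RecordBuilder class,
the rows_for helper, and the idea of passing in a list of known table
names are hypothetical stand-ins for whatever the DTD would declare.

```python
from html.parser import HTMLParser


class RecordBuilder(HTMLParser):
    """Turn elements of the form <C DT content="V"> into (DT, C, V) rows."""

    def __init__(self, known_tables):
        super().__init__()
        # Hypothetical registry of database/table names; in the proposal
        # these names would come from the DTD.
        self.known_tables = {name.lower() for name in known_tables}
        self.rows = []

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("content")
        if value is None:
            return
        for name, attr_value in attrs:
            # html.parser lowercases names and reports bare attributes as None.
            if attr_value is None and name in self.known_tables:
                self.rows.append((name, tag, value))


def rows_for(document, known_tables=("IAFA",)):
    """Return the rows an indexer would store for this document."""
    builder = RecordBuilder(known_tables)
    builder.feed(document)
    builder.close()
    return builder.rows


sample = '<KEYWORDS IAFA content="Modula-3,programming languages">'
print(rows_for(sample))
```

Fed <META name="keywords" value="..."> instead, this sketch produces no
row at all, because nothing in the markup says which table the column
belongs to.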


3. How do we manage the namespace of HTML element names?

Long term: Architectural forms.
Short term: clients can parse any tag. They ignore the ones they
	don't recognize (possibly with a warning). Tag names get
	proposed, and eventually standardized.


In message <9406021409.AA29622@homer.spry.com>, Chris Wilson writes:
>
>>How would they make use of that information? Unless and until there's
>>a public agreement about what such data represents, you're talking
>>about private techniques. When such general consensus is reached,
>>then we add it to the spec. No?
>
>_BUT_, they can be private techniques which won't break current and
>future WWW clients.

I see three choices for extensions that don't break current and
future clients:
	1. special comment syntax, like the server-side includes
	in NCSA's httpd. These are really only a good idea if they
	get eliminated before the stuff goes over the wire, since
	a client doesn't know when it peeks into a comment and sees
	its special syntax whether the author put it there on
	purpose or as a coincidence.

	2. processing instructions. Great for private applications.

	3. New element names. It has worked so far.


>  Otherwise, each new data element will require
>not only the discussion/consensus cycle, but also the implementation
>cycle for each Web browser or server out there.

How many WWW implementations don't include the "skip tags you
don't recognize" convention? I don't believe you have to write
code each time you want to _ignore_ another tag. And I don't believe
you can _act_ on a new tag _without_ writing more code.

>None, given that 1) you don't mind that an indiscriminately large
>number of tags may need to be added, which may take eternity, and
>2) you (as a document producer) don't mind that some browsers and
>servers are going to choke on your data due to unrecognized tags.

re 1) What is the problem with lots of tag names? How many do you really
expect to see? You don't see a special Meta: header in RFC822, do
you?

re 2) What browsers and servers choke on unknown tags? An author
_will_ have to deal with the fact that some of the markup he writes
will not make it to the reader. This is why I like to keep the list
of standard idioms expressible in SGML. An author can validate the
document w.r.t. the DTD that his audience groks, and thus be sure
that s/he has used no funny stuff.

>I, as a client and server developer, would much rather support one
>tag with genericized values for representing arbitrary information
>(most of which I can probably safely ignore) than having to code in
>support for a bunch of new tags, or get a load of errors parsing
>documents because my browser doesn't understand all the new tags
>someone is using in their document.

What's the difference between ignoring
	<expires content="...">
and ignoring 
	<meta name="expires" value="...">
???

And if you're going to flag one as an error, why wouldn't you flag
the other as an error?
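A small sketch of the "skip tags you don't recognize" convention, to
show that both spellings take the same path through a client. It
assumes Python's standard html.parser; the KNOWN_TAGS set, the
TolerantRenderer class, and the render helper are made up for
illustration and do not describe any particular browser.

```python
from html.parser import HTMLParser

# Assumed set of tags this toy client knows how to act on.
KNOWN_TAGS = {"h1", "p", "a"}


class TolerantRenderer(HTMLParser):
    """Keep the text, note unknown tags, and never fail on them."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.ignored = []

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_TAGS:
            # No per-tag code: every unknown tag is skipped the same way.
            self.ignored.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def render(document):
    """Return (visible text, list of tags that were skipped)."""
    renderer = TolerantRenderer()
    renderer.feed(document)
    renderer.close()
    return "".join(renderer.text), renderer.ignored


page = '<H1>Title</H1><EXPIRES content="x"><META name="expires" value="x"><P>Hello'
print(render(page))
```

Neither element costs the client any new code to ignore, and a strict
mode that warned about one would have just as much reason to warn
about the other.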

>  With future
>"strict compliance" modes in browsers, this is a can of worms that
>I really just don't see the need to open, when we could skirt the
>issue so easily.

Exactly what issue is it that we can "skirt" so easily?
* introduction of new idioms into the language? Hardly.
* http headers? There are more direct ways to deal with this.
* indexing? Again, use a syntax that exactly matches the features.

>In a sense, it is a hack... if a hack could be defined as a widely
>(perhaps even "uncomfortably") extensible system for adding
>open-ended information to documents.

Hmmm... isn't this a bit like declaring an element X with
attributes A1, A2, ... up to, oh, let's say A9? Use them
for whatever you like. But beware of certain conventions
about how they're used that aren't expressed in SGML, even
though they could be...

Come on! It's not that tough to maintain the DTD as a community,
is it? Do we _have_ to escape out of SGML all over the place?

Dan