Re: meta information

Roy T. Fielding (fielding@simplon.ICS.UCI.EDU)
Wed, 01 Jun 1994 19:06:40 -0700


To: Multiple recipients of list <www-html@www0.cern.ch>
In-Reply-To: Your message of "Thu, 02 Jun 1994 00:52:04 +0200."
             <9406012246.AA16944@ulua.hal.com> 
Message-Id:  <9406011906.aa13948@paris.ics.uci.edu>

This discussion should really be on www-html, so I'm moving it in a
rather arbitrary fashion...sorry (I'm beginning to dislike the split).

I wrote on www-talk:
----------------------------------------------------------------------
<!--
 The META element can be used to embed document metainformation not
 defined by other HTML+ elements for use by servers/clients capable
 of extracting that information.

 Servers should read the document head to generate HTTP headers
 corresponding to any META elements with the HEADER attribute,
 e.g. if the document contains:

     <meta header name="Expires" value="Tue, 04 Dec 1993 21:29:02 GMT">

 The server should include the header:

     Expires: Tue, 04 Dec 1993 21:29:02 GMT

 as part of the HTTP response to a GET or HEAD request for that document.
 When the HEADER attribute is not present, the server should not generate
 an HTTP header for this metainformation; e.g.

     <meta name="IndexType" value="Service">

 would not generate an HTTP header but would still allow clients or
 other tools to make use of that metainformation.

 Other likely names are "Keywords", "Created", "Owner" (a name)
 and "Reply-To" (an email address).  
-->

<!ELEMENT META - O EMPTY>
<!ATTLIST META
        id      ID      #IMPLIED -- to allow meta info                  --
        header (header) #IMPLIED -- generate HTTP header                --
        name    CDATA   #IMPLIED -- metainformation name e.g. "Expires" --
        value   CDATA   #IMPLIED -- associated value                    -->
----------------------------------------------------------------------
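
As a rough illustration of the server behaviour proposed above (a sketch
only, not code from any existing httpd), the extraction step might look
like the following.  The names MetaCollector, extract_meta and
format_headers are made up for this example, and Python's html.parser
stands in for a real SGML parser.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect META name/value pairs from the head of a document."""

    def __init__(self):
        super().__init__()
        self.in_head = True      # becomes False once the head is over
        self.http_headers = []   # META elements that carry HEADER
        self.other_meta = []     # META elements that do not

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_head = False
        if tag != "meta" or not self.in_head:
            return
        attr_map = dict(attrs)
        name, value = attr_map.get("name"), attr_map.get("value")
        if name is None or value is None:
            return  # nothing usable without both NAME and VALUE
        # A bare attribute such as HEADER is reported with the value None,
        # so test for the key rather than for a truthy value.
        if "header" in attr_map:
            self.http_headers.append((name, value))
        else:
            self.other_meta.append((name, value))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def extract_meta(document):
    """Return (http_headers, other_meta) as lists of (name, value) pairs."""
    collector = MetaCollector()
    collector.feed(document)
    collector.close()
    return collector.http_headers, collector.other_meta


def format_headers(pairs):
    """Render (name, value) pairs as HTTP header lines."""
    return ["%s: %s" % (name, value) for name, value in pairs]
```

Note that the collector never looks at what a name means: "Expires" and
"IndexType" are handled identically, and only the HEADER attribute decides
whether a pair ends up in the response.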

Dan Connolly replied:

> In message <9406011511.aa29004@paris.ics.uci.edu>, "Roy T. Fielding" writes:
>> corresponding to any META elements with the HEADER attribute,
>> e.g. if the document contains:
>>
>>     <meta header name="Expires" value="Tue, 04 Dec 1993 21:29:02 GMT">
>>
>> The server should include the header:
>>
>>     Expires: Tue, 04 Dec 1993 21:29:02 GMT
>
>Good. Examples. I love examples. As a counterexample, consider:
>
>        <EXPIRES DATE="Tue, 04 Dec 1993 21:29:02 GMT">

That's fine, but it assumes that the server/tool knows the syntax of
the EXPIRES element.  The META element allows any name-value pair to
be expressed (and thus parsed) without knowing the purpose of that
name or value.  We can then add new names without changing the spec
and all implementations of the servers.

>> as part of the HTTP response to a GET or HEAD request for that document.
>> When the HEADER attribute is not present, the server should not generate
>> an HTTP header for this metainformation; e.g.
>>
>>     <meta name="IndexType" value="Service">
> 
> Counterexample:
> 	<?indextype service>
> 
>> would not generate an HTTP header but would still allow clients or
>> other tools to make use of that metainformation.
> 
> How would they make use of that information? Unless and until there's
> a public agreement about what such data represents, you're talking
> about private techniques. When such general consensus is reached,
> then we add it to the spec. No?

No.  There is a substantial difference between a public agreement (which
may include only a small subset of webspace) and general consensus.  Even
when they do coincide, there is generally too long a lag between the need
and the spec, and again between the spec and the implementation.
In these situations, it is useful to have a general means for applying
extensions that does not interfere with people who are not aware of
those extensions.

>> Other likely names are "Keywords", "Created", "Owner" (a name)
>> and "Reply-To" (an email address).  
> 
> Yes... all these belong in the <HEAD>...</HEAD> of an HTML document.
> The HEAD is by design isomorphic to the HTTP headers, or the headers
> of a mail message or a news article. You don't need an extra META
> element to say this.

If this were true with the current implementation of HTML, then I would
agree and we could all get on with our work without any need for the META
element.  If the HTML 2.0 spec is written such that <HEAD></HEAD>
and <BODY></BODY> are required and explicit instructions are included
that browsers not render anything within <HEAD>...</HEAD>, and then all
offending browsers are fixed accordingly, then we can talk about using
any element name within the HEAD as a response header. 

If you don't want to require that in HTML 2.0, then we are stuck with using
META since it is the only way to provide such information across different
versions of HTML -- a necessary requirement for my application.

One question I have regarding use of HTML element names as headers: what
are the character limitations on element names?  From the DTD they appear
to be close enough to RFC 822 constraints, but is 34 characters the actual
length restriction or just a uniqueness restriction (or am I misreading it)?
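
To make the comparison concrete, here is a small sketch that checks a
candidate name against both grammars.  It rests on assumptions: that the 34
in the DTD is the SGML NAMELEN quantity and acts as a hard maximum, and
that element names follow the usual SGML rule of a letter followed by
letters, digits, "." or "-".  The RFC 822 side is its field-name rule (any
printable ASCII character except space and ":").  The function names are
invented for the example.

```python
import re

NAMELEN = 34  # assumption: the DTD's limit, treated as a hard maximum

# Assumed SGML name rule: a letter, then letters, digits, "." or "-".
_SGML_NAME = re.compile(r"[A-Za-z][A-Za-z0-9.\-]*")

# RFC 822 field-name: printable ASCII (0x21-0x7E) excluding ":" (0x3A).
_RFC822_FIELD_NAME = re.compile(r"[\x21-\x39\x3B-\x7E]+")


def is_rfc822_field_name(name):
    """True if name is usable as an RFC 822 header field name."""
    return _RFC822_FIELD_NAME.fullmatch(name) is not None


def is_sgml_name(name, namelen=NAMELEN):
    """True if name fits the assumed SGML name rule and length limit."""
    return len(name) <= namelen and _SGML_NAME.fullmatch(name) is not None


def usable_as_both(name):
    """True if name works as a header name and as an element name."""
    return is_rfc822_field_name(name) and is_sgml_name(name)
```

Under these assumptions "Reply-To" and "Last-Modified" pass both checks,
while a header name containing an underscore is legal in RFC 822 but fails
the element-name rule.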

>>> What is the meaning of the META element? I've heard several
>>> things:
>>> 
>>> Proposal: It's for http headers:
>>> 	<META name="Expires" value="Tue Aug 12, 1994 10:33:32 CST">
>>> Answer: Then why not write:
>>> 	<HTTP-HEADER name="Expires" ...>
> 
>>Because metainformation may or may not also be useful as header information,
>>depending on the capabilities of a given server and the existence of
>>future tools which make use of that information.  Nevertheless, it is still
>>metainformation whether or not it is used within response headers.
> 
> I would still like to see a definition of this term "metainformation."
> You might say that the TITLE of a document is metainformation. You
> might say ADDRESS is metainformation. But I don't see how this distinction
> is useful.

I consider TITLE to be metainformation as well -- the only distinction is that
there exists a previously defined syntax for TITLE and none for EXPIRES. 
In fact, a syntax for EXPIRES could be defined as well, without having any
impact on the existence of META elements.

My definition:  Metainformation is information about a collection of
                information (usually in the form of a document) in terms
                of that collection.
(;-)

The ADDRESS element does not represent metainformation, although its contents
may include some metainformation.  This is because it is defined to be a
rendering element (and thus normal information) and may occur any number of
times within a single document.

>>> I can see the need for:
>>> 
>>> 	<EXPIRES DATE="...">
>>> 
>>> but not a general HTTP header escape mechanism.
>>
>>But can you anticipate the needs of everyone?
> 
> No, and I'm not trying to. New elements can and will be added over time.

How??? It is not good enough to say that they will be added -- there needs
to be a specific mechanism defined whereby they can be added without breaking
existing implementations.  We can't define a new content-type every time
we need a new element.

>>  My original proposal called
>>for an EXPIRES element like the above and an OWNER element like
>>
>>        <OWNER name="...">
>>
>>It was shot down because it does not satisfy the general need for document
>>metainformation which can be parsed without pre-knowledge of the purpose of
>>that metainformation.
> 
> I find that original proposal quite on target, and I don't see how
> the counterargument carries much weight. What examples motivate
> this "general need for document metainformation" that are not
> satisfied by new HEAD elements?

Try parsing the OWNER element above without knowing its purpose (i.e.
without knowing that you will find what you are looking for within the
attribute called "name").  Naturally, we could solve this dilemma by requiring
all such elements to have no attributes and just content, e.g.

          <OWNER>...</OWNER>

as is the case for TITLE.  Of course, this is assuming that existing browsers
are fixed. 

Another solution is to simply require all metainfo elements to have a simple,
consistent attribute, e.g.:

          <OWNER value="...">
          <EXPIRES value="...">

but that also raises the question of what to do about existing HEAD elements
that do not follow the same conventions, e.g. LINK, BASE, NEXTID, and (ugh)
ISINDEX.
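
A minimal sketch of what the consistent-attribute convention buys a tool:
the reader below treats any element carrying a VALUE attribute as metainfo
and uses the element name as the metainfo name, so it needs no per-element
knowledge.  The class and function names are hypothetical, and html.parser
is used here only as a convenient stand-in for a real parser.

```python
from html.parser import HTMLParser


class ValueAttributeReader(HTMLParser):
    """Read metainfo from any element that has a VALUE attribute."""

    def __init__(self):
        super().__init__()
        self.metainfo = {}

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("value")
        if value is not None:
            # The element name doubles as the metainfo name.
            self.metainfo[tag] = value


def read_metainfo(head_markup):
    """Return {element name: value} for elements that follow the convention."""
    reader = ValueAttributeReader()
    reader.feed(head_markup)
    reader.close()
    return reader.metainfo
```

Fed a head containing OWNER and EXPIRES elements written this way, the
reader returns both (with names lowercased by the parser).  LINK and BASE
keep their data in HREF, so they are silently skipped, which is exactly the
problem with the existing HEAD elements.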


>>> Proposal II: It's for private indexing techniques. Then why not
>>> 	use comments or processing instructions?
>>> 	<?keywords a,b,c,d>
>>> 	<?description lksjdflkjsdf>
>>> or
>>> 	<!-- @#@# KEYWORDS: a,b,c -->
>>> 	<!-- @#@# DESCRIPTION: ... -->
>>
>>Because it is not for PRIVATE indexing techniques.
> 
> This is news to me. What is this public agreement about how these
> indexing techniques work? I can imagine some sort of relational
> database abstraction behind it all or something... hmmm...
>
>>  There is a multitude
>>of uses for this information, most of which I did not think of when the
>>META element was originally proposed.
> 
> In how many cases is this information exchanged between parties, vs.
> the number of cases when it is only used privately by one party?
> 

For example, all users of the MOMspider tool have the option of providing
additional metainformation within their HTML files such that MOMspider
can use it in building its maintenance index.  Such information currently
includes LAST-MODIFIED, TITLE, OWNER, REPLY-TO, and EXPIRES.  The first
two are already provided via the server and HTML -- the last three can
be obtained from META elements with the appropriate names.

Currently, that group of users is extremely limited (i.e. me) and thus can
be considered private.  However, in a few weeks that will expand to several
dozen sites -- is it still private?  Within six months, I expect it to include
at least half the information providers in webspace (assuming the tool works
as expected).  If that occurs, httpd server authors will see the need to
include metainfo in response headers, thus allowing MOMspider to pick up
that info from any URL tested with a HEAD request instead of just those
files traversed at the local site.
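
For illustration, the client side of that exchange could be sketched as
below.  This is not MOMspider code; the function names are invented, and it
assumes servers would emit response headers named exactly like the META
names ("Owner", "Reply-To", "Expires"), which nothing guarantees today.

```python
from urllib.request import Request, urlopen

# Header names are an assumption: they mirror the META names used above.
WANTED = ("Last-Modified", "Owner", "Reply-To", "Expires")


def pick_metainfo(headers):
    """Select the maintenance fields from a header mapping.

    `headers` only needs a .get(name) method, so a plain dict or the
    headers object of an HTTP response both work.
    """
    found = {}
    for name in WANTED:
        value = headers.get(name)
        if value is not None:
            found[name] = value
    return found


def head_metainfo(url, timeout=10):
    """Issue a HEAD request and return whatever metainfo the server sent."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return pick_metainfo(response.headers)
```

Since only a HEAD request is made, the document body is never transferred;
a server that sends none of these headers simply yields an empty result.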

This information will become available not by public agreement, but simply
because one tool (or possibly many) can make productive use of it.
If the information is readily available, other clients will make use of it
and thus whatever scheme is implemented first will become the de facto
standard.  Sound familiar?  Personally, I would prefer the scheme that
MOMspider starts with to be the most general possible, which is why I
proposed it six months ago (long before I started implementing the tool).


....Roy Fielding   ICS Grad Student, University of California, Irvine  USA
                   (fielding@ics.uci.edu)
    <A HREF="http://www.ics.uci.edu/dir/grad/Software/fielding">About Roy</A>