Re: meta information

Bert Bos (bert@let.rug.nl)
Mon, 6 Jun 1994 15:53:48 +0200 (METDST)


Message-Id: <9406061353.AA04680@freya.let.rug.nl>
From: Bert Bos <bert@let.rug.nl>
Subject: Re: meta information
To: www-html@www0.cern.ch
Date: Mon, 6 Jun 1994 15:53:48 +0200 (METDST)
In-Reply-To: <9406051937.AA19130@www0.cern.ch> from "Tim Berners-Lee" at Jun 5, 94 09:38:20 pm

The discussion about META has turned into much more than a simple
disagreement over the desirability of a META element...

I guess it's a question of what we want to use HTML for. The
constraints are clear: we want *documents* to express arbitrarily many
different things along many different dimensions, but we want *HTML*
to remain simple (and SGML conformant). Assuming the equality

		     MARK-UP = META-INFORMATION,

a few of the roles that mark-up can have are:

1) Lay-out

   HTML provides lightweight, display-oriented markup. When more
   control over visual aspects is needed, style sheets are the way to go.

2) Linking documents in a semantically meaningful way

   The LINK element defines a few types of relations, the PRINT
   attribute of the A tag adds a few more, but arbitrary semantics are
   difficult to express. The semantics are meant for automated
   searches, indexing, and other applications.

   a) Private schemes: using PIs is probably best: <? whatever>

   b) No machine-readable semantics at all: Ari Luotonen's WIT package
      (see <http://info.cern.ch/wit>) shows that semantic links
      (currently only `agree' and `disagree') can be expressed without
      new link types, at least to human readers.

   c) Fixed set of link types.

   d) Extensible, hierarchical classification: new roles can be added
      via IS-A relations. E.g., assuming `maker' is a primitive role,
      we can define new roles `painter', `writer', and `co-writer' as
      (sub-)subclasses of `maker':

	<!element writer - - (#pcdata|img|%emph;)*>
	<!attlist writer
		href %URL; #implied
		is-a cdata #fixed "maker"	-- a kind of `maker' --
		role cdata #fixed "writer">
	<!element painter - - (#pcdata|img|%emph;)*>
	<!attlist painter
		href %URL; #implied
		is-a cdata #fixed "maker"	-- a kind of `maker' --
		role cdata #fixed "painter">
	<!element co-writer - - (#pcdata|img|%emph;)*>
	<!attlist co-writer
		href %URL; #implied
		is-a cdata #fixed "writer"	-- a kind of `writer' --
		role cdata #fixed "co-writer">

      Tags like these can be added to HTML (3.0) via the `cextra'
      entity and the RENDER element; there is no need to change HTML.
      The presence of the IS-A attribute flags to the indexer that
      this is a semantic link.

   e) Extensible, hierarchical classification with multiple
      inheritance: this allows us to express that a `writer' is not
      only a kind of `maker', but also a kind of `human'.

   For (c), (d) and (e) we will need a common set of primitive roles
   and a procedure for registering new primitives.

   The mechanism above is relatively simple and can be used without
   changes to HTML (3.0), but it might be too simple. In HyTime the
   position of the anchor of a link is marked in the text, but the
   link info itself (i.e., the target URL and the type of relation) is
   defined elsewhere, allowing for much more elaborate link
   descriptions.

   f) Don't try to use SGML at all: use

	<link rel="semantics" href="semantics.sem">

      and define a language for expressing semantics (in Prolog? Scheme?)
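
   To make (d) and (e) a bit more concrete, here is a rough sketch (in
   Python, purely for illustration) of how an indexer might follow
   IS-A relations from a role back to its primitives. The table and
   the function names `ancestors' and `is_kind_of' are my own
   assumptions; nothing here is part of any proposal. A list of
   parents per role is used so that the same code also covers the
   multiple inheritance of (e).

```python
# Hypothetical role table, as an indexer might collect it from the
# #fixed ROLE and IS-A attributes of the declarations under (d).
# Each role maps to a list of parents, so (e) is covered as well.
IS_A = {
    "writer": ["maker"],
    "painter": ["maker"],
    "co-writer": ["writer"],
}


def ancestors(role, is_a=IS_A):
    """Return every role that `role` is a kind of, nearest first."""
    found = []
    queue = list(is_a.get(role, []))
    while queue:
        parent = queue.pop(0)
        if parent not in found:  # also guards against cycles
            found.append(parent)
            queue.extend(is_a.get(parent, []))
    return found


def is_kind_of(role, target, is_a=IS_A):
    """True if `role` equals `target` or inherits from it via IS-A."""
    return role == target or target in ancestors(role, is_a)
```

   With this table, a search for links of type `maker' would also
   match `co-writer', since ancestors("co-writer") yields `writer'
   and then `maker'. Roles missing from the table are treated as
   primitives.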

3) Providing parameters for the HTTP protocol

   The proposal by TBL (see message <9406051937.AA19130@www0.cern.ch>)
   and Roy Fielding (see message
   <9406060223.aa24242@paris.ics.uci.edu>) for what I would call an
   `HTTP architectural form' seems a good way to ensure that HTTP and
   HTML can continue to be developed independently.

   The idea is that none of the HTML elements is reserved for HTTP
   headers, not even META. And whether there is ever going to be an
   <EXPIRES> or <REPLY-TO> tag is immaterial. To find the HTTP headers
   that are hidden in the HTML document, the server only looks at
   attributes, never at element names. In this way, the following two
   lines would yield the same HTTP header:

	<meta http="Expires" content="Mon, 6 Jun 1994 11:24:21">
	<expires http="Expires" content="Mon, 6 Jun 1994 11:24:21">

   Alternative: omit the CONTENT attribute, put the value in the
   element's content, and close it with a </META> tag. This is
   possible provided Dan Connolly's rule-of-thumb for the HEAD element
   is adopted: all content in the HEAD is ignored by browsers.
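
   As a rough sketch of the attribute-only lookup (the variant with
   the CONTENT attribute), a server could scan a document as follows.
   This uses Python's standard html.parser module rather than a real
   SGML parser, and the names `HttpFormScanner' and `http_headers' are
   hypothetical; it only illustrates that the element name plays no
   role.

```python
from html.parser import HTMLParser


class HttpFormScanner(HTMLParser):
    """Collect (header, value) pairs from any element carrying both
    an HTTP and a CONTENT attribute."""

    def __init__(self):
        super().__init__()
        self.headers = []

    def handle_starttag(self, tag, attrs):
        # The element name (`tag`) is deliberately ignored; only the
        # attributes decide whether this is an HTTP header.
        found = dict(attrs)
        if found.get("http") and found.get("content") is not None:
            self.headers.append((found["http"], found["content"]))


def http_headers(document):
    """Return the HTTP headers hidden in an HTML document."""
    scanner = HttpFormScanner()
    scanner.feed(document)
    scanner.close()
    return scanner.headers
```

   Fed either of the two example lines above, http_headers() returns
   the same single pair: the header name `Expires' and the date
   string.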


SUMMARY: HTML is for simple, display-oriented markup; it has just
enough extensibility to allow extra info to be embedded invisibly. All
other dimensions of a document are added with mechanisms that can be
easily ignored by browsers:

- extensions in the head --> use META
- extensions in the body --> use `cextra'
- sophisticated layout --> style sheets
- indexing and semantics --> `semantic link' architectural form
- HTTP info --> `HTTP' architectural form

-- 
                     __________________________________
                    / _   Bert Bos <bert@let.rug.nl>   |
           ()       |/ \  Alfa-informatica,            |
            \       |\_/  Rijksuniversiteit Groningen  |
             \_____/|     Postbus 716                  |
                    |     9700 AS GRONINGEN            |
                    |     Nederland                    |
                    |     http://tyr.let.rug.nl/~bert/ |
                    \__________________________________|