meta-draft

Davide Musella (davide@itim.mi.cnr.it)
Tue, 1 Apr 1997 16:02:04 +0200 (MET DST)


To: www-html@w3.org
Subject: meta-draft
Message-ID: <Pine.SUN.3.95.970401155634.17411C-100000@sun6>

This is the third version of the draft about the META tag.

Davide

---------------------------


INTERNET DRAFT       				     Davide Musella
draft-musella-html-metatag-03.txt 		Institute for Multimedia
      						      Technologies
						National Research Council

24 March 1997
Expires in six months

                       The META Tag of HTML
 
 
Status of this Memo 
 
This document is an Internet-Draft.  Internet-Drafts are working documents
of the Internet Engineering Task Force (IETF), its areas, and its working
groups.  Note that other groups may also distribute working documents as
Internet-Drafts.  Internet-Drafts are draft documents valid for a maximum
of six months and may be updated, replaced, or obsoleted by other
documents at any time.  It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as ``work in progress.''
 
To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast) or
ftp.isi.edu (US West Coast).
 
Distribution of this document is unlimited. Please send comments to:

Davide Musella
(e-Mail) davide@itim.mi.cnr.it
(voice) +39.(0)2.70643271
(fax) +39.(0)2.70643292
 
 
Abstract 

This document defines a strict syntax for cataloguing an HTML document
using the META tag of HTML.  The definition given here specifies a base
set of cataloguing keys that provide a preliminary classification method.



1 - Introduction 

At present the syntax of the META HTTP-EQUIV tag is not strict, so
different keywords can be used to define the same thing.  For example,
the tags

<META 	HTTP-EQUIV = "authors"
	CONTENT = "Pennac, Benni">

or

<META 	HTTP-EQUIV = "writers"
	CONTENT = "Pennac, Benni"> 

could represent the same concept under two different names.  The aim of
this draft is to define the keys that describe the content of an HTML
document, without excluding a more specific classification made with
other techniques.  The method used to accomplish this was defined at the
"Distributed Indexing/Searching Workshop"
[http://www.w3.org/pub/WWW/Searching/9605-Indexing-Workshop/index.html]
and provides for a defined prefix that indicates which cataloguing method
is used to describe a classification key.

2 - The META Tag
 
The META element is used within the HEAD element to embed document
meta-information not defined by other HTML elements.  Such information can
be extracted by servers/clients for use in identifying, indexing and
cataloguing specialized document meta-information.  It is generally
preferable to use named elements that have well-defined semantics for each
type of meta-information.  The META element is provided for situations
where strict SGML parsing is necessary and the local DTD is not
extensible.  In addition, HTTP servers can read the content of the
document head to generate response headers corresponding to any element
defining a value for the attribute HTTP-EQUIV.  This provides document
authors with a mechanism (not necessarily the preferred one) for
identifying information that should be included in the response headers
of an HTTP request.
 
The META element has three attributes:  

  NAME  
  HTTP-EQUIV
  CONTENT 

The META tag may appear anywhere within the HEAD element.  Multiple META
tags with the same NAME or HTTP-EQUIV value must be treated as a single
property, with their contents combined into one comma-separated list.
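As a rough illustration of the combining rule, the following Python sketch collects META tags with the standard library's `html.parser` and joins the contents of tags that share a NAME or HTTP-EQUIV value into one comma-separated list. The class name `MetaCollector` is hypothetical; the sketch assumes that names are compared case-insensitively and, for brevity, does not check that the tags actually occur inside HEAD.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect META tags, merging repeated names into one comma-separated value."""

    def __init__(self):
        super().__init__()
        # (attribute, lower-cased name) -> combined CONTENT
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag != "meta":
            return
        a = dict(attrs)
        content = a.get("content") or ""
        for attr in ("http-equiv", "name"):
            if a.get(attr):
                # Assumption: "Keywords" and "keywords" name the same property.
                key = (attr, a[attr].strip().lower())
                if key in self.meta:
                    self.meta[key] += ", " + content
                else:
                    self.meta[key] = content
                break  # keyed by HTTP-EQUIV if present, otherwise by NAME


collector = MetaCollector()
collector.feed('<HEAD><META NAME="keywords" CONTENT="manual">'
               '<META NAME="Keywords" CONTENT="scouting"></HEAD>')
print(collector.meta)  # {('name', 'keywords'): 'manual, scouting'}
```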

3 - NAME

This attribute names a property such as "number of pages", "preferred
browser", or any other information an author wants to include in the
document.  For example:

<META 	NAME = "Maybe Published By"
	CONTENT = "McDraw Bill"> 

or

<META 	NAME = "keywords"
	CONTENT = "manual, scouting">

Do not use the META element to define information that should be
associated with an existing HTML element.


4 - HTTP-EQUIV

This attribute binds the element to an HTTP response header.  If the
semantics of the HTTP response header named by this attribute are known,
then the contents can be processed based on a well-defined syntactic
mapping, whether or not the DTD includes anything about it.  An HTTP
server must process these tags for an HTTP HEAD request.  Do not give an
HTTP-EQUIV attribute the same name as a response header that should
typically only be generated by the HTTP server.  Some inappropriate names
are "Server", "Date", and "Last-Modified".  Whether a name is
inappropriate depends on the particular server implementation.  It is
recommended that servers ignore any META element whose HTTP-EQUIV value
matches (case-insensitively) one of their own reserved response headers.
Apart from its effect on the HTTP response headers, the HTTP-EQUIV
attribute has the same meaning as the NAME attribute.
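A server following these rules might derive its extra response headers roughly as in the Python sketch below. The reserved set (`Server`, `Date`, `Last-Modified`) is only the example list from the text, since the real set depends on the server; the function name `equiv_headers` is hypothetical, and repeated names are merged into one comma-separated value as described in Section 2.

```python
# Example reserved set taken from the text; a real server would use its own list.
RESERVED_HEADERS = {"server", "date", "last-modified"}


def equiv_headers(metas: list[tuple[str, str]]) -> dict[str, str]:
    """Map (HTTP-EQUIV, CONTENT) pairs to extra HTTP response headers."""
    headers: dict[str, str] = {}
    spelling: dict[str, str] = {}  # lower-cased name -> first spelling seen
    for name, content in metas:
        key = name.strip().lower()
        if key in RESERVED_HEADERS:
            continue  # never override headers the server generates itself
        if key in spelling:
            # Repeated names are combined into one comma-separated value.
            headers[spelling[key]] += ", " + content
        else:
            spelling[key] = name.strip()
            headers[spelling[key]] = content
    return headers


# The reserved "Date" tag is dropped; "author" is folded into "Author".
print(equiv_headers([("Author", "Plutarco"), ("Date", "ignored"),
                     ("author", "Plutarch")]))  # {'Author': 'Plutarco, Plutarch'}
```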

5 - CONTENT

This attribute supplies the value of the named property.  It may contain
more than one value.

6 - Cataloguing an HTML document

The META tag can be used to classify an HTML document, giving the author
control over how the document is indexed.  The intention is to define a
base set of "normal-user oriented" meta-information.  Most authors of HTML
documents have no specialist background: they are neither librarians nor
Internet specialists, so their knowledge of cataloguing problems is
limited.  Since users generally avoid features they do not understand, I
have defined the following small set of keys, to encourage the use of
meta-information, for a first rough cataloguing of an HTML document:

Author: to indicate the author(s) of the document.
 	Ex:
	<META 	HTTP-EQUIV = "Author"
		CONTENT = "Plutarco">
	To distinguish the given name(s) from the surname, they must be
	separated by an underscore character "_" (ASCII[95]), with the given
	name(s) first and the surname last.  For example:
	<META 	HTTP-EQUIV = "Author"
		CONTENT = "Milan_Kundera, Georg Wilhelm Friedrich_Hegel,
		Leonardo_Da Vinci">

Description: used to describe the contents of the document.  It must be
	considerably shorter than the whole document.
	Ex:
	<META	HTTP-EQUIV ="Description"
		CONTENT ="This is the xxxxxx's home page. Here you'll find
		a lot of photos of my last holiday and a really big FAQ
		archive">

Expire: to indicate the expiration date of the document (HTTP date
	format), or "none" for a document whose content does not expire.
	Ex:
	<META	HTTP-EQUIV ="Expire"
		CONTENT ="13 Apr 1997 00:00 GMT">

Keywords: to indicate the keywords of the document, as a sequence of
	comma-separated phrases.
	In terms of boolean logic, the AND operator is represented by the
	SPACE (ASCII[32]) and the OR operator by the COMMA (ASCII[44]).  AND
	takes precedence over OR, so a string like "Red ball, White pen"
	means "(Red AND ball) OR (White AND pen)".
	Ex: 
	<META 	HTTP-EQUIV = "Keywords"
		CONTENT = "Italian Products, Italian Tourism, Italy">
	The spaces between a comma and a word or vice versa are ignored.

Language: its content specifies the language in which the document is
	written.  It consists of a two- or three-letter language code, taken
	from ISO 639 or ISO 639-2 respectively, optionally followed by a dash
	(ASCII[45]) and a two-letter ISO 3166 country code that identifies a
	national variant.
	Ex:
	<META 	HTTP-EQUIV = "Language"
		CONTENT  ="it">

Publisher: to indicate the organization responsible for publishing the
	document in its current form.
	Ex:
	<META	HTTP-EQUIV ="Publisher"
		CONTENT ="Mc Draw-Bill">

Timestamp: to indicate when the document was written (HTTP date format).
	Ex:
	<META	HTTP-EQUIV ="Timestamp"
		CONTENT ="25 Mar 1997 08:30 GMT">
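The value syntaxes defined above for Author, Keywords and Language can be checked mechanically. The Python sketch below is one possible reading, not a reference implementation: the function names are hypothetical, an Author entry without an underscore is kept as a single name, keyword phrases are matched case-insensitively against a hypothetical set of words, and Language values are checked for form only, not against the ISO code lists.

```python
import re
from typing import Optional


def parse_authors(content: str) -> list[tuple[str, str]]:
    """Author: comma-separated entries of the form "given names_surname"."""
    authors = []
    for entry in content.split(","):
        # Collapse line breaks and tabs that may occur inside the attribute value.
        entry = " ".join(entry.split())
        if not entry:
            continue
        given, sep, surname = entry.partition("_")
        # Assumption: an entry without "_" (e.g. "Plutarco") is kept whole,
        # with an empty given-name part.
        authors.append((given, surname) if sep else ("", entry))
    return authors


def parse_keywords(content: str) -> list[list[str]]:
    """Keywords: COMMA separates OR-ed phrases, SPACE separates AND-ed terms."""
    groups = []
    for phrase in content.split(","):
        # Empty strings from repeated spaces or spaces around commas are dropped.
        terms = [t for t in phrase.split(" ") if t]
        if terms:
            groups.append(terms)
    return groups


def keywords_match(groups: list[list[str]], words: set[str]) -> bool:
    """True if all terms of at least one phrase occur in `words`.

    Assumption: terms are compared case-insensitively.
    """
    lowered = {w.lower() for w in words}
    return any(all(t.lower() in lowered for t in group) for group in groups)


# Two- or three-letter language code, optional dash plus two-letter country code.
_LANGUAGE = re.compile(r"([A-Za-z]{2,3})(?:-([A-Za-z]{2}))?")


def parse_language(content: str) -> tuple[str, Optional[str]]:
    """Language: returns (language code, country code or None); form only."""
    m = _LANGUAGE.fullmatch(content.strip())
    if m is None:
        raise ValueError(f"malformed Language value: {content!r}")
    lang, country = m.groups()
    return lang.lower(), (country.upper() if country else None)


print(parse_keywords("Red ball, White pen"))  # [['Red', 'ball'], ['White', 'pen']]
```

Representing Keywords as a list of AND-groups mirrors the precedence rule directly: the outer list is the OR, each inner list is an AND.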

The title of the document is taken from the content of the TITLE element,
so no separate key is defined for it, avoiding needless redundancy.  It is
highly recommended to use HTTP-EQUIV rather than NAME for these keys, so
that an agent can obtain this meta-information without retrieving the
full document.  A more detailed description of the content can be added,
without removing these keys, using more specific techniques such as the
Dublin Core or MCF.


Appendix 1

HTTP date format

The HTTP date format is defined as:

HTTP-date    = rfc1123-date | rfc850-date | asctime-date

where
          rfc1123-date = wkday "," SP date1 SP time SP "GMT"
          rfc850-date  = weekday "," SP date2 SP time SP "GMT"
          asctime-date = wkday SP date3 SP time SP 4DIGIT

However, the RFC 850 and asctime formats are obsolete (they are retained
only for backward compatibility), so it is highly recommended to use the
rfc1123 format, in the following simplified form:


rfc1123-date = [wkday "," SP] date1 SP time

date1 =	1*2DIGIT SP month SP 4DIGIT		; day month year
	Ex: 25 Feb 1997

time =	hour SP zone

hour =	2DIGIT ":" 2DIGIT [":" 2DIGIT]		; hours:minutes[:seconds]
	Ex: 22:55:30

wkday =	"Mon" | "Tue" | "Wed"
              | "Thu" | "Fri" | "Sat" | "Sun"

month =	"Jan" | "Feb" | "Mar" | "Apr"
              | "May" | "Jun" | "Jul" | "Aug"
              | "Sep" | "Oct" | "Nov" | "Dec"

zone =	"UT"  | "GMT"                         ; Universal Time
                                              ; North American : UT
              |  "EST" | "EDT"                ;  Eastern:  - 5 | - 4
              |  "CST" | "CDT"                ;  Central:  - 6 | - 5
              |  "MST" | "MDT"                ;  Mountain: - 7 | - 6
              |  "PST" | "PDT"                ;  Pacific:  - 8 | - 7
              |  1ALPHA                       ; Military: Z = UT;
              | ( ("+" | "-") 4DIGIT )        ; Local differential
                                              ;  hours+min. (HHMM)

rfc1123-date examples:

28 Apr 1997 19:30 GMT
Mon, 28 Apr 1997 19:30:00 GMT
28 Apr 1997 20:30 +0100
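A minimal Python sketch of a parser for the simplified rfc1123-date above, which an agent could use for Expire and Timestamp values. It is hypothetical: the function names are illustrative, it accepts only the named zones listed, numeric offsets and the military letter "Z", it does not check that the optional weekday agrees with the date, and it treats "none" (case-insensitively) as a non-expiring Expire value.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hour offsets for the named zones in the zone rule; "Z" is the only
# military letter handled in this sketch.
_NAMED_ZONES = {"UT": 0, "GMT": 0, "Z": 0,
                "EST": -5, "EDT": -4, "CST": -6, "CDT": -5,
                "MST": -7, "MDT": -6, "PST": -8, "PDT": -7}

_RFC1123 = re.compile(
    r"(?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun), )?"  # optional wkday "," SP
    r"(\d{1,2}) ([A-Z][a-z]{2}) (\d{4}) "      # date1: day month year
    r"(\d{2}):(\d{2})(?::(\d{2}))? "           # hour: HH:MM[:SS]
    r"([A-Z]{1,3}|[+-]\d{4})"                  # zone
)


def parse_rfc1123(text: str) -> datetime:
    """Parse the simplified rfc1123-date into a timezone-aware datetime."""
    m = _RFC1123.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not an rfc1123-date: {text!r}")
    day, mon, year, hh, mm, ss, zone = m.groups()
    if mon not in _MONTHS:
        raise ValueError(f"unknown month: {mon!r}")
    if zone[0] in "+-":
        # Local differential, HHMM.
        sign = 1 if zone[0] == "+" else -1
        offset = sign * timedelta(hours=int(zone[1:3]), minutes=int(zone[3:]))
    elif zone in _NAMED_ZONES:
        offset = timedelta(hours=_NAMED_ZONES[zone])
    else:
        raise ValueError(f"unsupported zone: {zone!r}")
    return datetime(int(year), _MONTHS.index(mon) + 1, int(day),
                    int(hh), int(mm), int(ss or 0), tzinfo=timezone(offset))


def parse_expire(content: str) -> Optional[datetime]:
    """Expire: "none" means the content never expires (returns None)."""
    if content.strip().lower() == "none":
        return None
    return parse_rfc1123(content)
```

Because the results are timezone-aware, the three examples above compare equal: they all denote the same instant.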



------------------------------------------------------------------------------
Davide Musella
Institute for Multimedia Technologies, National Research Council, Milan, ITALY
tel. +39.(0)2.70643271                  fax. +39.(0)2.70643292
e-mail: davide@itim.mi.cnr.it           http://jargo.itim.mi.cnr.it/