Re: LANG + Metadata + unknown attributes

Martin J. Duerst
Sat, 8 Mar 1997 11:02:29 +0100 (MET)

Date: Sat, 8 Mar 1997 11:02:29 +0100 (MET)
From: "Martin J. Duerst" <>
To: Misha Wolf <>
cc: www-html <>, www-international <>
Subject: Re: LANG + Metadata + unknown attributes
In-Reply-To: <2146270108031997/A07422/REDMS1/11B3405B2A00*@MHS>
Message-ID: <Pine.SUN.3.95q.970308104818.245T-100000@enoshima>

On Sat, 8 Mar 1997, Misha Wolf wrote:

> I'd like some advice on the use of non-standard HTML attributes, in relation 
> both to the LANG attribute and to Metadata.

> A similar problem arises with the use of Metadata.  At the DC-4 Metadata 
> Workshop in Canberra (March 3-5), we agonised over a difficult choice:
>    1.  Use a clean syntax for "qualified" (explained below) Metadata, even 
>        though it would rely on a use of attributes not defined in HTML 2.0/3.2.
>    2.  Use a dirty (difficult to parse) syntax, conformant with HTML 2.0/3.2.
> Consider:

[Solution A]
>    <META NAME = "DC.DATE" CONTENT = "(scheme=ISO-8601) 1997-03-07">
> or:

[Solution B]
>    <META NAME = "DC.DATE" SCHEME = "ISO-8601" CONTENT = "1997-03-07">

> Furthermore, we want to be able to qualify the language of the value of the 
> CONTENT attribute, eg:
>    <META NAME = "DC.SUBJECT" SCHEME= "XYZ" LANG = "xy" CONTENT = "Something or other">

I heard about these things only yesterday, at a workshop on
cross-lingual information retrieval, from someone who had been in Canberra.

I almost immediately commented that adopting solution A for the
short term, just to be officially compatible with existing DTDs,
would be very clumsy. The problems with escaping the "(" have
been discussed elsewhere; on top of that, we would end up with
two different syntaxes.
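
To make the clumsiness concrete, here is a minimal sketch in Python of what a consumer of both syntaxes would have to do. The prefix grammar "(scheme=NAME) value" is only my assumption based on the example above, and the names PREFIX, MetaCollector and parse_meta are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Assumed grammar for the Solution A prefix: "(scheme=NAME) value".
# The thread does not fix an exact grammar; this pattern is a guess.
PREFIX = re.compile(r"^\(scheme=([^)]+)\)\s*(.*)$", re.DOTALL)


class MetaCollector(HTMLParser):
    """Collect (name, scheme, value) triples from META elements."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)  # HTMLParser lowercases tag and attribute names
        name = a.get("name")
        content = a.get("content") or ""
        scheme = a.get("scheme")  # Solution B: a plain attribute lookup
        if scheme is None:
            # Solution A: the qualifier has to be dug out of CONTENT.
            m = PREFIX.match(content)
            if m:
                scheme, content = m.group(1), m.group(2)
        self.records.append((name, scheme, content))


def parse_meta(html):
    """Return the list of (name, scheme, value) triples found in html."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    return collector.records
```

Both example tags from the quoted message come out as the same triple, but only Solution A needs the second parsing step. A CONTENT value that legitimately begins with "(scheme=" would be misread unless some escaping rule is added, which is the problem mentioned above.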

The introduction of a "SCHEME" attribute to META could be done
rather quickly, as others have pointed out. In particular, it
should be pushed strongly if:

- It's the only additional attribute (LANG comes anyway) needed
  for well-structured metadata.
- It's (potentially) useful to other ways of providing metadata.

> We fed something like the above to Microsoft Word '97.  When we examined the 
> saved file, the unknown attributes (SCHEME and LANG) had vanished, together with 
> their values.  This is, I suppose, one possible interpretation of the phrase 
> "should be ignored", in the earlier quote from RFC 1866.

I think MS Word's HTML<->Word conversion is convenient in many
practical cases, but I guess that to work properly with DC (or any
other) metadata, tools will have to be updated anyway.

> In any event, even if Word were persuaded to ignore more gently, the proverbial 
> HTML validator would complain if offered HTML like the above.
> How do we reconcile (i) being, in some minor sense, on the leading edge and 
> (ii) wanting to encourage our users to generate "legal" HTML and to use validators 
> to make sure it is legal?

It's not such a big problem to construct a DTD, and with it a validator,
to ensure that a document is legal "HTML X.X + LANG/SCHEME".
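
As a rough illustration of the idea, and not a real SGML validator, the following Python sketch checks META attributes against an allowed set and shows how extending that set makes the example legal. The base set lists the META attributes from the HTML 2.0 and 3.2 DTDs; the names MetaAttrChecker and unknown_meta_attrs are hypothetical.

```python
from html.parser import HTMLParser

# META attributes defined by the HTML 2.0 and 3.2 DTDs.
BASE_META_ATTRS = {"http-equiv", "name", "content"}
# The extension discussed in this thread ("HTML X.X + LANG/SCHEME").
EXTENDED_META_ATTRS = BASE_META_ATTRS | {"scheme", "lang"}


class MetaAttrChecker(HTMLParser):
    """Record every META attribute that is not in the allowed set."""

    def __init__(self, allowed):
        super().__init__()
        self.allowed = allowed
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        for attr_name, _value in attrs:  # names arrive lowercased
            if attr_name not in self.allowed:
                self.errors.append(attr_name)


def unknown_meta_attrs(html, allowed):
    """Return the META attribute names in html that allowed does not cover."""
    checker = MetaAttrChecker(allowed)
    checker.feed(html)
    checker.close()
    return checker.errors
```

Run against the DC.SUBJECT example from the quoted message, the base set reports SCHEME and LANG as unknown, while the extended set reports nothing. A proper solution would of course put the two attributes into the DTD itself and use an SGML parser.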

Regards,	Martin.