
Re: LANG + Metadata + unknown attributes

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Sat, 8 Mar 1997 11:02:29 +0100 (MET)
To: Misha Wolf <misha.wolf@reuters.com>
cc: www-html <www-html@w3.org>, www-international <www-international@w3.org>
Message-ID: <Pine.SUN.3.95q.970308104818.245T-100000@enoshima>

On Sat, 8 Mar 1997, Misha Wolf wrote:

> I'd like some advice on the use of non-standard HTML attributes, in relation 
> both to the LANG attribute and to Metadata.

> A similar problem arises with the use of Metadata.  At the DC-4 Metadata 
> Workshop in Canberra (March 3-5), we agonised over a difficult choice:
> 
>    1.  Use a clean syntax for "qualified" (explained below) Metadata, even 
>        though it would rely on a use of attributes not defined in HTML 2.0/3.2.
> 
>    2.  Use a dirty (difficult to parse) syntax, conformant with HTML 2.0/3.2.
> 
> Consider:


[Solution A]
>    <META NAME = "DC.DATE" CONTENT = "(scheme=ISO-8601) 1997-03-07">
> or:

[Solution B]
>    <META NAME = "DC.DATE" SCHEME = "ISO-8601" CONTENT = "1997-03-07">

> Furthermore, we want to be able to qualify the language of the value of the 
> CONTENT attribute, eg:
> 
>    <META NAME = "DC.SUBJECT" SCHEME= "XYZ" LANG = "xy" CONTENT = "Something or other">

I heard about these things only yesterday, at a workshop on
cross-lingual information retrieval, from someone who took part in Canberra.

I almost immediately commented that adopting solution A for the
short term, to be officially compatible with existing DTDs,
would be very clumsy. The problems with escaping the "(" have
been discussed elsewhere; on top of that, we would end up
with two different syntaxes.
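
To make the difference concrete, here is a rough sketch in Python of
what a consumer has to do for each solution. It assumes the META
attributes have already been parsed into a dictionary with lower-case
keys, and that solution A uses an optional "(scheme=NAME)" prefix as in
the example above; the function names are made up for illustration.

```python
import re

# Assumed convention for solution A: an optional "(scheme=NAME)" prefix,
# followed by whitespace and then the actual value.
_QUALIFIER = re.compile(r"^\(scheme=([^)]*)\)\s*(.*)$", re.DOTALL)


def read_solution_a(attrs):
    """Return (scheme, value) when the scheme is embedded in CONTENT."""
    content = attrs.get("content", "")
    match = _QUALIFIER.match(content)
    if match is None:
        # No qualifier prefix found: treat the whole string as the value.
        return None, content
    return match.group(1), match.group(2)


def read_solution_b(attrs):
    """Return (scheme, value) when SCHEME is an attribute of its own."""
    return attrs.get("scheme"), attrs.get("content", "")


a = {"name": "DC.DATE", "content": "(scheme=ISO-8601) 1997-03-07"}
b = {"name": "DC.DATE", "scheme": "ISO-8601", "content": "1997-03-07"}
print(read_solution_a(a))
print(read_solution_b(b))
```

Solution B needs no parsing of the value at all. With solution A, a
value that really does start with "(scheme=" cannot be told apart from
a qualifier unless some escaping rule is added, which is the problem
mentioned above.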

The introduction of a "SCHEME" attribute to META could be done
rather quickly, as others have pointed out. In particular, it
should be pushed strongly if:

- It's the only additional attribute (LANG comes anyway) needed
  for well-structured metadata.
- It's (potentially) useful to other ways of providing metadata.


> We fed something like the above to Microsoft Word '97.  When we examined the 
> saved file, the unknown attributes (SCHEME and LANG) had vanished, together with 
> their values.  This is, I suppose, one possible interpretation of the phrase 
> "should be ignored", in the earlier quote from RFC 1866.

I think MS Word provides some convenient features for HTML<->Word conversion
in many practical cases, but I guess that to work properly with
DC (or any other) metadata, tools would have to be updated anyway.


> In any event, even if Word were persuaded to ignore more gently, the proverbial 
> HTML validator would complain if offered HTML like the above.
> 
> How do we reconcile (i) being, in some minor sense, on the leading edge and 
> (ii) wanting to encourage our users to generate "legal" HTML and to use validators 
> to make sure it is legal?

It's not such a big problem to construct a DTD and a validator to ensure
that a document is legal "HTML X.X + LANG/SCHEME".
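
As a very rough illustration of the idea (a plain attribute check in
Python, not an SGML DTD or a real validator), one could compare the
attributes found on META against the set allowed by the base DTD plus
the two additions. The base set below is the META attribute list of
HTML 2.0/3.2; the function and constant names are hypothetical.

```python
# META attributes defined by HTML 2.0 and 3.2.
BASE_META_ATTRS = {"http-equiv", "name", "content"}

# Hypothetical "HTML X.X + LANG/SCHEME" variant: same set plus two names.
EXTENDED_META_ATTRS = BASE_META_ATTRS | {"scheme", "lang"}


def unknown_meta_attrs(attrs, allowed):
    """Return the sorted, lower-cased attribute names not in `allowed`."""
    return sorted(name.lower() for name in attrs if name.lower() not in allowed)


meta = {
    "NAME": "DC.SUBJECT",
    "SCHEME": "XYZ",
    "LANG": "xy",
    "CONTENT": "Something or other",
}
print(unknown_meta_attrs(meta, BASE_META_ATTRS))      # flagged by a plain validator
print(unknown_meta_attrs(meta, EXTENDED_META_ATTRS))  # accepted by the extended one
```

A real solution would of course declare the two attributes in the
ATTLIST for META in a modified DTD and run an SGML parser against it;
the sketch only shows that the change is small.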

Regards,	Martin.
Received on Saturday, 8 March 1997 05:01:51 GMT
