Date: Sat, 8 Mar 1997 11:02:29 +0100 (MET)
From: "Martin J. Duerst" <email@example.com>
To: Misha Wolf <firstname.lastname@example.org>
cc: www-html <email@example.com>, www-international <firstname.lastname@example.org>
Subject: Re: LANG + Metadata + unknown attributes
In-Reply-To: <2146270108031997/A07422/REDMS1/11B3405B2A00*@MHS>
Message-ID: <Pine.SUN.3.95q.970308104818.245T-100000@enoshima>

On Sat, 8 Mar 1997, Misha Wolf wrote:

> I'd like some advice on the use of non-standard HTML attributes, in relation
> both to the LANG attribute and to Metadata.

> A similar problem arises with the use of Metadata. At the DC-4 Metadata
> Workshop in Canberra (March 3-5), we agonised over a difficult choice:
>
> 1. Use a clean syntax for "qualified" (explained below) Metadata, even
>    though it would rely on a use of attributes not defined in HTML 2.0/3.2.
>
> 2. Use a dirty (difficult to parse) syntax, conformant with HTML 2.0/3.2.
>
> Consider: [Solution A]
>    <META NAME = "DC.DATE" CONTENT = "(scheme=ISO-8601) 1997-03-07">
> or: [Solution B]
>    <META NAME = "DC.DATE" SCHEME = "ISO-8601" CONTENT = "1997-03-07">

> Furthermore, we want to be able to qualify the language of the value of the
> CONTENT attribute, eg:
>
>    <META NAME = "DC.SUBJECT" SCHEME= "XYZ" LANG = "xy" CONTENT = "Something or other">

I heard about these things only yesterday, at a workshop on cross-lingual
information retrieval, from someone who had taken part in the Canberra
workshop. I almost immediately commented that adopting solution A for the
short term, just to be officially compatible with existing DTDs, would be
very clumsy. The problems with escaping the "(" have been discussed
elsewhere; added to this is the problem that, in the end, there will be
two different syntaxes.

The introduction of a "SCHEME" attribute to META could be done rather
quickly, as others have pointed out. In particular, it should be pushed
strongly if:

- It's the only additional attribute (LANG is coming anyway) needed for
  well-structured META-Data.
- It's (potentially) useful to other ways of providing META-data.

> We fed something like the above to Microsoft Word '97. When we examined the
> saved file, the unknown attributes (SCHEME and LANG) had vanished, together with
> their values. This is, I suppose, one possible interpretation of the phrase
> "should be ignored", in the earlier quote from RFC 1866.

I think MS Word provides some convenient features for HTML<->Word
conversion in many practical cases, but I guess that to work reasonably
well with DC (or any other) metadata, tools will have to be updated anyway.

> In any event, even if Word were persuaded to ignore more gently, the proverbial
> HTML validator would complain if offered HTML like the above.
>
> How do we reconcile (i) being, in some minor sense, on the leading edge and
> (ii) wanting to encourage our users to generate "legal" HTML and to use validators
> to make sure it is legal?

It's not such a big problem to construct a DTD and a validator to ensure
that a document is legal "HTML X.X + LANG/SCHEME".

Regards,    Martin.
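
To illustrate the parsing difference between Solution A and Solution B discussed above, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not something proposed in the thread: the names `MetaCollector`, `parse_meta` and `_INLINE_SCHEME` are hypothetical, and the handling of the "(scheme=...)" prefix assumes the simplest possible form, with no escaping rule for a value that itself begins with "(".

```python
import re
from html.parser import HTMLParser

# Assumed shape of Solution A: "(scheme=NAME) value" packed into CONTENT.
# No escaping convention is defined here, so a real value that starts
# with "(scheme=" would be misread; that is the ambiguity of Solution A.
_INLINE_SCHEME = re.compile(r"^\(scheme=([^)]*)\)\s*(.*)$", re.DOTALL)


class MetaCollector(HTMLParser):
    """Collects META elements as dicts with name, scheme, lang, content."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        a = dict(attrs)
        content = a.get("content") or ""
        scheme = a.get("scheme")
        if scheme is None:
            # Solution A: the scheme has to be dug out of the value string.
            m = _INLINE_SCHEME.match(content)
            if m:
                scheme, content = m.group(1), m.group(2)
        # Solution B needs no extra work: SCHEME and LANG are plain attributes.
        self.records.append({
            "name": a.get("name"),
            "scheme": scheme,
            "lang": a.get("lang"),
            "content": content,
        })


def parse_meta(html):
    """Return the list of META records found in an HTML fragment."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.records


if __name__ == "__main__":
    print(parse_meta('<META NAME = "DC.DATE" CONTENT = "(scheme=ISO-8601) 1997-03-07">'))
    print(parse_meta('<META NAME = "DC.DATE" SCHEME = "ISO-8601" CONTENT = "1997-03-07">'))
```

Both calls are meant to yield the same record. The point of the sketch is that Solution B gets there with an ordinary attribute lookup, while Solution A needs a second, ad hoc mini-syntax inside the attribute value.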