- From: Misha Wolf <misha.wolf@reuters.com>
- Date: Sat, 08 Mar 1997 01:27:46 +0000 (GMT)
- To: www-html <www-html@w3.org>, www-international <www-international@w3.org>
I'd like some advice on the use of non-standard HTML attributes, in relation both to the LANG attribute and to Metadata.

RFC 1866 (Hypertext Markup Language - 2.0) states, in section 4.2.1:

   To facilitate experimentation and interoperability between implementations of various versions of HTML, the installed base of HTML user agents supports a superset of the HTML 2.0 language by reducing it to HTML 2.0: markup in the form of a start-tag or end-tag, whose generic identifier is not declared is mapped to nothing during tokenization. Undeclared attributes are treated similarly. The entire attribute specification of an unknown attribute (i.e., the unknown attribute and its value, if any) should be ignored.

I haven't found a similar statement in the HTML 3.2 spec, but assume the above is inherited from the HTML 2.0 spec.

Now, in common with many others, I am keen on the implementation of the LANG attribute, specified in RFC 2070 (Internationalization of the Hypertext Markup Language). [This attribute is not part of HTML 3.2, but is to be included in the next version of HTML.]

I am also keen on the use of HTML validators. There is a tension between these two desires. A validator which checks for HTML 2.0/3.2 conformance will flag as erroneous the use of the LANG attribute.

A similar problem arises with the use of Metadata. At the DC-4 Metadata Workshop in Canberra (March 3-5), we agonised over a difficult choice:

   1. Use a clean syntax for "qualified" (explained below) Metadata, even though it would rely on a use of attributes not defined in HTML 2.0/3.2.

   2. Use a dirty (difficult to parse) syntax, conformant with HTML 2.0/3.2.

Consider:

   <META NAME = "DC.DATE" CONTENT = "1997-03-07">

The above syntax is unproblematic [Note: DC stands for Dublin Core]. In some cases, though, it is useful to qualify the Metadata, by naming the particular "scheme" used to encode the value of the CONTENT attribute. For dates, one such scheme is ISO 8601.
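To make the RFC 1866 rule quoted above concrete, here is a minimal Python sketch of how a tolerant reader and a strict validator might treat the same META element. It is an illustration only, not how any particular user agent or validator works. The names `DECLARED_META_ATTRS`, `MetaReader` and `read_metas` are hypothetical, and the sample input (the DC.DATE element with a LANG attribute added) is made up for the example. The declared set assumes the three attributes HTML 2.0/3.2 define for META.

```python
from html.parser import HTMLParser

# Assumption: the attributes HTML 2.0/3.2 declare for META.
DECLARED_META_ATTRS = {"name", "http-equiv", "content"}


class MetaReader(HTMLParser):
    """Collects META elements, splitting declared from undeclared attributes."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        known = {k: v for k, v in attrs if k in DECLARED_META_ATTRS}
        unknown = {k: v for k, v in attrs if k not in DECLARED_META_ATTRS}
        self.metas.append((known, unknown))


def read_metas(html):
    """Return a list of (declared, undeclared) attribute dicts, one per META."""
    reader = MetaReader()
    reader.feed(html)
    reader.close()
    return reader.metas


# Hypothetical input: the DC.DATE element with an undeclared LANG attribute.
sample = '<META NAME = "DC.DATE" CONTENT = "1997-03-07" LANG = "en">'
for declared, undeclared in read_metas(sample):
    print("a tolerant reader would use:", declared)
    print("a strict validator would flag:", sorted(undeclared))
```

A reader following section 4.2.1 would act on the first dictionary and silently skip the second, whereas a validator checking HTML 2.0/3.2 conformance would report every key in the second as an error.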
The two syntaxes we discussed in Canberra are:

   <META NAME = "DC.DATE" CONTENT = "(scheme=ISO-8601) 1997-03-07">

or:

   <META NAME = "DC.DATE" SCHEME = "ISO-8601" CONTENT = "1997-03-07">

An even stronger need for qualification applies to subject classification schemes (of which there are many), as in:

   <META NAME = "DC.SUBJECT" CONTENT = "(scheme=XYZ) Something or other">

or:

   <META NAME = "DC.SUBJECT" SCHEME = "XYZ" CONTENT = "Something or other">

Furthermore, we want to be able to qualify the language of the value of the CONTENT attribute, eg:

   <META NAME = "DC.SUBJECT" SCHEME = "XYZ" LANG = "xy" CONTENT = "Something or other">

We fed something like the above to Microsoft Word '97. When we examined the saved file, the unknown attributes (SCHEME and LANG) had vanished, together with their values. This is, I suppose, one possible interpretation of the phrase "should be ignored", in the earlier quote from RFC 1866.

In any event, even if Word were persuaded to ignore more gently, the proverbial HTML validator would complain if offered HTML like the above.

How do we reconcile (i) being, in some minor sense, on the leading edge and (ii) wanting to encourage our users to generate "legal" HTML and to use validators to make sure it is legal?

Misha
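For readers comparing the two Canberra syntaxes, the following Python sketch shows what a consumer of the metadata might have to do to treat them as equivalent. It is only an illustration under stated assumptions: the regular expression `EMBEDDED` is a guess at how the "(scheme=...)" prefix could be recognised, no grammar for that prefix was given above, and `QualifiedMetaReader` and `read_qualified` are hypothetical names.

```python
import re
from html.parser import HTMLParser

# Assumption: the embedded form is "(scheme=NAME)" followed by the value.
EMBEDDED = re.compile(r"^\(scheme=([^)]+)\)\s*(.*)$", re.DOTALL)


class QualifiedMetaReader(HTMLParser):
    """Normalises both META syntaxes into one record per element."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)  # attribute names arrive lowercased
        content = a.get("content") or ""
        scheme = a.get("scheme")
        if scheme is None:
            # No SCHEME attribute: look for a scheme embedded in CONTENT.
            m = EMBEDDED.match(content)
            if m:
                scheme, content = m.group(1), m.group(2)
        self.records.append({
            "name": a.get("name"),
            "scheme": scheme,
            "lang": a.get("lang"),
            "content": content,
        })


def read_qualified(html):
    """Return one normalised record (dict) per META element in html."""
    reader = QualifiedMetaReader()
    reader.feed(html)
    reader.close()
    return reader.records


embedded = '<META NAME = "DC.DATE" CONTENT = "(scheme=ISO-8601) 1997-03-07">'
attribute = '<META NAME = "DC.DATE" SCHEME = "ISO-8601" CONTENT = "1997-03-07">'
print(read_qualified(embedded) == read_qualified(attribute))
```

The attribute form needs no second parsing step, while the embedded form depends on a convention inside CONTENT that a reader has to know about, and a value that happens to begin with "(scheme=" would be misread. That is one way to see why the second option was described as difficult to parse.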
Received on Friday, 7 March 1997 20:26:36 UTC