- From: Maciej Stachowiak <mjs@apple.com>
- Date: Sun, 03 Jan 2010 18:53:28 -0800
- To: Larry Masinter <masinter@adobe.com>
- Cc: "public-html@w3.org" <public-html@w3.org>
- Message-id: <19C2EBFA-34B8-4C18-B71A-8A699AC19399@apple.com>
On Jan 2, 2010, at 4:21 PM, Larry Masinter wrote:

> The proposal was updated significantly, based on comments. I’ve
> tried to address the “compound” issue as well.
>
> Here’s a version with all the parts of a change proposal in a
> single document. Since the discussion has been long, the rationale
> is long.

Thanks for the revisions. I've updated the issue status list to point to the new version.

 - Maciej

>
> Summary:
> Describe the DOCTYPE element and provide for allowing DOCTYPE
> definitions.
>
> Rationale:
>
> The DOCTYPE has been part of HTML since its earliest versions, and
> is still required. This change proposal makes its history and use
> clearer, without introducing any HTML interpreter changes.
>
> The HTML5 specification is intended to replace previous versions of
> the HTML specification and the definition of the text/html MIME type.
> Redefining a MIME type should not make previously conforming
> documents non-conforming; even if features are
> “deprecated” (conforming but not recommended), the conforming but
> not-recommended constructs should still be described completely.
>
> This feature is “in scope”. There was an argument that features only
> intended for use in “controlled environments” were not in scope for
> the HTML working group. (This is discussed in
> http://lists.w3.org/Archives/Public/public-html/2010Jan/0013.html )
>
> In particular, the working group intends to support “polyglot”
> documents which are both valid XML and XHTML and also valid as HTML
> text/html; since XML workflows often require a !DOCTYPE with a
> PublicIdentifier and a SystemIdentifier, this increases the
> footprint of “polyglot” documents.
>
> Other ideas for including a new versioning mechanism have been
> floated, e.g., an attribute on the <html> element. However, those
> alternatives have disadvantages – they would introduce the
> possibility of inconsistencies, where the DOCTYPE contains one
> version string and the version attribute contains another, and have
> little or no benefit.
> In particular, there were claimed advantages
> of a version attribute on the html element rather than using DOCTYPE:
>
> It was claimed that such a version indicator would be “easier to type
> correctly from memory”:
>
> If an HTML author is relying on memory, the author should leave out
> the HTML version string and use the <!DOCTYPE html> form, since it
> is clearly not a “controlled environment”.
>
> In any case, a simpler version indicator is not useful because in
> fact HTML evolves more continuously and a version indicator that was
> easy to remember would not actually address the use cases where a
> version indicator is actually useful.
>
> It was claimed that such a version indicator would be “easier to
> read”:
>
> Even if it were true, “ease of reading HTML markup directly” is not
> a strong design goal for HTML, compared to other uses.
>
> The proposal below recommends omitting a version indicator except in
> limited situations, and recommends readers ignore the version
> indicator except for specific purposes, so that “ease of reading”
> only matters in limited situations anyway.
>
> Whether something is “easy to read” is not an independent factor,
> but dependent on context and familiarity. Since the DOCTYPE element
> is there anyway, and web authors are familiar with it, and it is
> documented in every book, online tutorial and other HTML reference,
> using “DOCTYPE” for a version indicator will result in documents
> that are “easier to read” because of familiarity.
>
> There was an argument that the change proposal was somehow related
> to “vastly increased reverse-engineering costs”. This argument does
> not apply to this change proposal, see
> http://lists.w3.org/Archives/Public/public-html/2010Jan/0011.html .
>
> The current HTML5 spec says the DOCTYPE is “mostly useless”. This
> wording should change:
>
> It was claimed that this means the same thing as “of limited
> utility”.
> In fact, an informal survey showed that “mostly useless”
> and “of limited utility” meant different things to a number of people:
>
> “mostly useless” was much “stronger”
>
> “mostly useless” meant that in almost all situations, the utility
> was zero, while “of limited utility” meant that the utility was less
> than expected but not uniformly different.
>
> Even if “mostly useless” and “of limited utility” could mean the
> same thing in some contexts, “mostly useless” was called “childish”
> or “petulant” and “inappropriate in a formal standards document”.
>
> Many of the arguments made in previous discussions about versions
> and doctypes were not careful to distinguish between “version of
> specification” and “version of implementation”. It should be noted
> that many *want* a version indicator to note “version of
> implementation”, i.e., as an indicator of “best viewed by Firefox
> 4.0 or later” or some such. However, this change proposal is very
> clearly providing for a version of a “specification”, and, in
> particular, of the HTML specification, with the possibility of “mix”
> specifications added.
>
> Many of the arguments in previous discussions were arguing against
> version-specific browser behavior. But this change proposal
> specifically does NOT allow for (any additional) version-specific
> behavior, and in fact explicitly disallows it.
>
> There was one suggestion that, instead of PublicIdentifier and
> SystemIdentifier, ONLY the SystemIdentifier be allowed, but
> that the RFC 3151 URN version of the PublicIdentifier might be
> supplied, e.g.,
> <!DOCTYPE SYSTEM “urn:publicid:-:W3C+HTMLWG+hixie:nonsgml+html+20100401:en”>
> rather than
> <!DOCTYPE PUBLIC “-//W3C HTMLWG hixie//NONSGML HTML 20100401//EN” about:legacy-compat >
>
> This suggestion is interesting but doesn’t seem to improve anything
> (since the URN isn’t easily resolvable) when considering
> compatibility with existing deployed XML editing workflows.
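
For readers who have not seen the RFC 3151 mapping mentioned above, here is a minimal Python sketch of it. The function name is made up for illustration, and the substitution table is the RFC's transcription table as I recall it, so check it against the RFC before relying on it. Case is preserved here, whereas the example URN in the proposal lowercases some segments.

```python
import re

# Transcription table as recalled from RFC 3151 (an assumption to verify).
# Two-character sequences are listed first so they win over "/" and ":".
_RULES = [
    ("//", ":"), ("::", ";"),
    ("+", "%2B"), (":", "%3A"), ("/", "%2F"), (";", "%3B"),
    ("'", "%27"), ("?", "%3F"), ("#", "%23"), ("%", "%25"),
    (" ", "+"),
]
_TABLE = dict(_RULES)
_PATTERN = re.compile("|".join(re.escape(src) for src, _ in _RULES))


def public_id_to_urn(public_id):
    """Hypothetical helper: map a public identifier to a publicid URN."""
    # Collapse runs of whitespace and trim the ends before transcribing.
    normalized = " ".join(public_id.split())
    # A single pass avoids re-escaping characters produced by earlier rules.
    return "urn:publicid:" + _PATTERN.sub(lambda m: _TABLE[m.group(0)], normalized)


print(public_id_to_urn("-//W3C HTMLWG hixie//NONSGML HTML 20100401//EN"))
# -> urn:publicid:-:W3C+HTMLWG+hixie:NONSGML+HTML+20100401:EN
```

The result is a name, not a locator, which is the point made above: an XML tool cannot fetch anything from it.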
>
> While everyone *hopes* there are never going to be any further
> incompatible changes to HTML in the future, there *is* a possibility
> that in some unfortunate situation, it will be necessary to
> introduce incompatible changes. In that case, it will be necessary
> to introduce a new version indicator, to allow (alas) processors to
> determine which of the incompatible interpretations was meant. While
> this will be unfortunate, it would be doubly unfortunate to have to
> introduce a new “place” for a version indicator that was previously
> non-conforming, which would cause even worse uproar, because
> documents that *didn’t* want the new incompatible behavior would
> have no place to say explicitly which version of the
> incompatible behavior they wanted. By *allowing* a version indicator
> in conforming content today, we can avert more serious damage.
> Having a location for a version indicator, even if it isn’t
> explicitly used, allows it to be used at some point in the future.
>
> In the history of computer languages, there are no languages that
> have not evolved, been extended, or otherwise "versioned" as long as
> the language has been in use. This applies to network protocols,
> character encoding standards, programming languages, and certainly
> to every known technology found on the web. There are no known cases
> where a language hasn't gone through some at least minor
> incompatible change. The standards process is established as a way
> of evolving specifications and implementations in a way to reduce
> the likelihood of complete failure to interoperate, but certainly
> not to guarantee that no incompatible changes will be needed in the
> future.
>
> There was a suggestion that the final “EN” in the PublicIdentifier
> might be omitted, but that didn’t seem to be allowed in the FPI
> syntax after all, and if we’re going to be FPI compatible, might as
> well pick up the whole thing. That’s why “NONSGML” was added too.
>
> See also background document
> http://www.w3.org/2001/tag/doc/versioning-html/versioning-html-20090611.html
> “Architectural Considerations for Language Versioning on the Web”.
>
> For additional rationale and discussion, see the HTML WG tracker
> ISSUE-4: http://www.w3.org/html/wg/tracker/issues/4
>
> Impact:
>
> This proposal does not add any new headers or elements to HTML. It
> more clearly shows the evolution and reasons for no longer relying
> on DOCTYPE to affect browser behavior.
>
> This proposal does not require any changes to any browser or HTML
> interpreter; existing behavior is maintained.
>
> It allows but does not require some validators to perform additional
> validation, in that there may be additional validation based on the
> PublicIdentifier or SystemIdentifier. As behavior does not depend
> on the DOCTYPE, validating the DOCTYPE is not required.
>
> This proposal allows some HTML documents that were previously
> conforming to remain conforming. It also allows
> PublicIdentifier and/or SystemIdentifier DOCTYPEs to remain valid in
> new documents.
>
> Specific proposal:
>
> Replace section 9.1.1 of the HTML5 specification with:
>
> 9.1.1 The DOCTYPE
>
> The DOCTYPE header element is a required element. Originally, when
> HTML was defined as an application of SGML (see [ISO8879]), a valid
> HTML document declared what version of HTML was used in the
> document, with a document type declaration which named the document
> type definition (DTD) in use for the document. In practice, web
> authors have not been careful to consistently label versions, and
> many, if not most, HTML documents on the web do not conform to the
> DTD that they specify.
>
> It is common for implementations to trigger wildly different
> behavior (“quirks” modes) due to the presence of specific DOCTYPE
> declarations, or the absence of a declaration altogether; see
> section 9.2.5.4 for details of this behavior.
>
> For these reasons, the DOCTYPE header is REQUIRED for HTML content
> served as text/html (and optional for content served as an XML media
> type), but supplying an explicit version indicator is NOT
> RECOMMENDED except in limited circumstances.
>
> The syntax of the DOCTYPE element is:
>
> <!DOCTYPE html>
> <!DOCTYPE html PUBLIC “PublicIdentifier” “SystemIdentifier”>
> <!DOCTYPE html SYSTEM “about:legacy-compat”>
>
> <!DOCTYPE html> is the simplest, recommended form of the DOCTYPE
> declaration.
>
> The use of public identifiers (required in HTML 4.01) is discouraged
> in this specification; some public identifiers may trigger different
> behavior in deployed browsers (Section [#quirks-mode] in this
> document and [hsvonin]).
>
> The SystemIdentifier is syntactically a URI (not a “URL” or “IRI”).
> The SystemIdentifier was intended to be a locator for downloading a
> DTD and entity sets in generic SGML and XML processors, and some XML
> workflows designed to produce HTML require either a well-known
> PublicIdentifier, or else a SystemIdentifier that can actually
> be fetched.
>
> The special URI “about:legacy-compat” is reserved for use as a
> SystemIdentifier in a declaration of the form:
> <!DOCTYPE html SYSTEM “about:legacy-compat”>.
>
> Except for explicitly defined behavior (used to trigger “quirks
> mode”, see section [#parse-behavior], [#quirks-mode] and [hsvonin]),
> implementations which consume HTML MUST NOT use the DOCTYPE element
> to trigger different processing behavior.
>
> Implementations which validate HTML content SHOULD use the latest
> version of this specification to validate against; validating only
> against older specifications, or only against the indicated version,
> is likely to be much less useful. See Section [#validation].
>
> HTML documents not served as an XML media type MUST include a
> DOCTYPE header, since many browsers, in the absence of a DOCTYPE
> header, will trigger a “quirks” mode of rendering.
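
The three syntactic forms listed above can be checked mechanically. What follows is only a rough Python sketch, not a conformance checker: the function name and the returned tuple shape are invented for illustration, the keywords “html”, “PUBLIC” and “SYSTEM” are matched case-insensitively as the proposal goes on to specify, and treating the literal "DOCTYPE" as case-sensitive is my own assumption. Straight quotes are used, as real markup would.

```python
import re

# One quoted identifier, in double or single (apostrophe) quotes.
_QUOTED = r"""(?:"([^"]*)"|'([^']*)')"""

_DOCTYPE = re.compile(
    rf"<!DOCTYPE\s+(?i:html)"
    rf"(?:\s+(?i:PUBLIC)\s*{_QUOTED}\s*{_QUOTED}"   # groups 1-2 public, 3-4 system
    rf"|\s+(?i:SYSTEM)\s*{_QUOTED})?"               # groups 5-6 system
    rf"\s*>"
)


def _first(a, b):
    """Return whichever of two alternative capture groups matched."""
    return a if a is not None else b


def parse_doctype(text):
    """Hypothetical helper: return (form, public_id, system_id) or None."""
    m = _DOCTYPE.fullmatch(text.strip())
    if m is None:
        return None
    public_id = _first(m.group(1), m.group(2))
    if public_id is not None:
        return ("public", public_id, _first(m.group(3), m.group(4)))
    system_id = _first(m.group(5), m.group(6))
    if system_id is None:
        return ("simple", None, None)
    # The SYSTEM-only form is reserved for the special legacy-compat URI.
    if system_id != "about:legacy-compat":
        return None
    return ("system", None, system_id)
```

A consumer that follows the proposal would use the result only for validation or tooling, never to switch processing behavior.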
>
> Documents served as an XML media type MAY include a DOCTYPE header,
> either to allow compatible content (so-called “polyglot” documents
> which are both valid HTML and also valid XHTML) or to support
> version-specific XML processing. While the DOCTYPE header is not
> required, including it may help in XHTML/HTML crossover.
>
> “html”, “PUBLIC” and “SYSTEM” are case insensitive and may have
> additional spaces around them. The “PublicIdentifier” and
> “SystemIdentifier” may use either double or single (apostrophe)
> quote marks.
>
> Note that XML allows additional forms of DOCTYPE declarations which
> are not permitted here; however, this proposal is compatible with
> most widely deployed XML software.
>
> In most instances, the simple <!DOCTYPE html> form is all that is
> required or recommended. The form with the “SYSTEM
> about:legacy-compat” is provided to allow for XSLT processors.
>
> 9.1.1.1 Public Identifier
>
> A PublicIdentifier SHOULD NOT be used unless the content is being
> managed in a controlled environment where the intended version is
> known, and the document is well-formed; this might be the case in
> some XML-based workflows and editing environments, or content
> management systems and other production workflows.
>
> Even though HTML is no longer being defined as an SGML application,
> previous versions of HTML were, and so the format of
> PublicIdentifier was defined to be consistent with Formal Public
> Identifiers of SGML (http://xml.coverpages.org/tauber-fpi.html).
>
> Until this specification is approved as a W3C recommendation, the
> PublicIdentifier MAY identify the specification referenced and
> its date. The pattern for the PublicIdentifier is simple. The
> primary template is only the date in yyyymmdd terms:
>
> “-//WHATWG//NONSGML HTML 20100401//EN”
> for the 2010 April 1 version of the WhatWG edition of the
> specification.
> “-//W3C HTMLWG//NONSGML HTML 20100401//EN” for
> the HTML working group editor’s draft of the same date.
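
As a small illustration of the dated template above, a Python sketch that fills in the owner and the yyyymmdd date. The function name and its parameters are hypothetical; only the two example strings come from the proposal.

```python
from datetime import date


def draft_public_identifier(owner, draft_date):
    """Hypothetical helper: build the dated PublicIdentifier for a draft.

    `owner` is the owner field of the identifier, e.g. "WHATWG" or
    "W3C HTMLWG"; `draft_date` is the date of the referenced draft.
    """
    # The date is rendered as yyyymmdd, per the template in the proposal.
    return f"-//{owner}//NONSGML HTML {draft_date:%Y%m%d}//EN"


print(draft_public_identifier("WHATWG", date(2010, 4, 1)))
# -> -//WHATWG//NONSGML HTML 20100401//EN
```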
>
> If multiple alternative specifications are available in a
> committee, the draft’s or author’s nickname or handle may be used to
> distinguish which specification is being referenced, e.g.,
>
> “-//W3C HTMLWG hixie//NONSGML HTML 20100401//EN”
> “-//W3C HTMLWG manu//NONSGML HTML 20100401//EN”
>
> When this specification becomes a W3C Recommendation, and only then,
> the PublicIdentifier:
> “-//W3C//NONSGML HTML 5.0//EN”
> may be used.
>
> However, HTML documents MUST NOT use “-//W3C//NONSGML HTML 5.0//EN”
> until the edition of this specification referenced is actually
> approved and published as a W3C Recommendation.
>
> Note that non-standard behavior may ensue from using any of many
> well-known Public Identifiers; these were chosen not to trigger any
> such behavior.
>
> 9.1.1.2 PublicIdentifier for compound specifications
>
> Note that a PublicIdentifier only identifies a single specification,
> not a complete implementation, a suite of specifications, or a
> combination of vocabularies from multiple specifications.
> Constructing a PublicIdentifier for such a combination requires
> publication of an actual specification which describes that
> combination.
>
> Groups wishing to support the combination of HTML and other
> specifications may supply short specifications showing how
> additional vocabularies may be used with HTML; for example, a short
> document “how to use RDFa with HTML” might be published. (This
> document would reference RDFa and HTML but not include either
> specification). In such case, the “+” format might be used:
>
> “-//W3C RDFAWG//NONSGML HTML+RDFa 20100401//EN” might reference the
> HTML+RDFA document published by the RDFA working group.
>
> The W3C Hypertext coordination group is encouraged to coordinate
> assignment of public identifiers.
>
> 9.1.1.3 SystemIdentifier
>
> The SystemIdentifier is a URL, either relative or absolute.
> If no PublicIdentifier is supplied, the effect is to not have a
> version at all.
> In this case, the SystemIdentifier “about:legacy-compat”
> should be used: <!DOCTYPE html SYSTEM “about:legacy-compat”>
>
> If a PublicIdentifier is supplied, the SystemIdentifier may be:
>
> An actual address (URL) of a DTD and other XML material, as per the
> XML specification, which can be fetched and used by an XML
> processor. Note that W3C does not intend to supply or publish any
> such URLs or DTDs. Note that no current URL used in HTML would
> occur. This usage should only be used if the URL is actually
> resolvable.
>
> The empty string, “” . This system identifier can be used in
> situations where there is no fetchable material related to the XML
> forms, but a specific version indicator is wanted and supplied
> by the PublicIdentifier.
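
The SystemIdentifier choices in 9.1.1.3 can be summarised in a few lines of Python. This is only a sketch of my reading of the rules, for a workflow that needs the longer DOCTYPE forms at all; the function name and parameters are hypothetical, and the code does not check that a supplied DTD URL is actually resolvable.

```python
def build_doctype(public_id=None, dtd_url=None):
    """Hypothetical helper: pick the SystemIdentifier per section 9.1.1.3."""
    if public_id is None:
        if dtd_url is not None:
            # The SYSTEM-only form is reserved for about:legacy-compat.
            raise ValueError("a DTD URL needs an accompanying PublicIdentifier")
        # No PublicIdentifier: no version at all, so use the special URI.
        return '<!DOCTYPE html SYSTEM "about:legacy-compat">'
    # With a PublicIdentifier: a fetchable DTD URL if one exists,
    # otherwise the empty string.
    system_id = dtd_url if dtd_url is not None else ""
    return f'<!DOCTYPE html PUBLIC "{public_id}" "{system_id}">'


print(build_doctype("-//W3C HTMLWG//NONSGML HTML 20100401//EN"))
```

Authors who need neither identifier would skip all of this and write the plain <!DOCTYPE html> form, which the proposal recommends in most instances.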
Received on Monday, 4 January 2010 02:54:04 UTC