RE: Suggested revised text for HTML/XML report intro from Larry Masinter on 2011-08-18 (public-html-xml@w3.org from August 2011)

From: Larry Masinter <masinter@adobe.com>
Date: Wed, 17 Aug 2011 18:36:00 -0700
To: Noah Mendelsohn <nrm@arcanedomain.com>, John Cowan <cowan@mercury.ccil.org>
CC: Anne van Kesteren <annevk@opera.com>, "public-html-xml@w3.org" <public-html-xml@w3.org>
Message-ID: <C68CB012D9182D408CED7B884F441D4D05D41A80D9@nambxv01a.corp.adobe.com>

Going back to:

"Where HTML goes to great lengths to defined how an agent must recover   from markup errors, XML is unforgiving in the face of markup errors."

and its possible replacement:

"Where HTML defines how an agent must process a document irrespective of  markup errors, XML requires an agent to halt processing in the face of  markup errors."

(and various follow-ons)

I want to suggest a different perspective: this difference is not about the languages themselves but about the specification styles of their current definitions.

In general, a communication protocol is a set of conventions for exchanging messages in or between computing systems, together with the formats of those messages and of their components. Simple formats and their components are protocol elements, while a language is a complex message format.

In general, for robust communication, senders of messages (and thus senders of documents in a language used in messages) should be conservative in what they send, while receivers of messages (parsers, interpreters) should be liberal in what they accept. A language definition might therefore include both the rules for conservative senders (how to construct 'correct', that is, well-formed or valid, documents) and the rules for liberal receivers (a liberal parsing algorithm).

In the development of HTML (at least in some parts of the community), the observation that many instances of HTML were generated by hand or by string manipulation led to an emphasis on specifying normative behavior for liberal receivers: going to great lengths to define how an agent must process a document irrespective of markup errors.

(( The TAG insisted that there also be a normative language definition (an 'authoring' specification) that could be reviewed independently of the conformance rules given for parsers; my hope was for a specification useful to conservative generators of HTML documents. ))

In the development of XML and XHTML, workflows that create XML-based documents (and thus XHTML documents) with structure-based software systems were more at the forefront of consideration, and liberal handling of malformed documents was left unspecified or was even disallowed.

I don't think this difference is intrinsic to the HTML and XHTML languages so much as to the specification style and the priority given to particular workflows.

I encourage the task force to review the report and distinguish more carefully between differences that are intrinsic to the languages and differences that are attributable to the specification styles and the workflows emphasized. I think doing so might help make progress in reconciling some of those differences.

For example, consider the sentence: "Where HTML goes to great lengths to defined how an agent must recover   from markup errors, XML is unforgiving in the face of markup errors."

But what "goes to great lengths" is not "HTML" but the current main W3C HTML specification (and not, for example, the normative language reference). What is "unforgiving" is not "XML" but rather an XML parser conforming to the current XML specification.

Larry

--

http://larry.masinter.net

Received on Thursday, 18 August 2011 01:37:02 UTC