RE: Suggested revised text for HTML/XML report intro from Robert Leif on 2011-08-19 (public-html-xml@w3.org from August 2011)

From: Robert Leif <rleif@rleif.com>
Date: Fri, 19 Aug 2011 14:26:09 -0700
To: "'Larry Masinter'" <masinter@adobe.com>, "'Noah Mendelsohn'" <nrm@arcanedomain.com>, "'John Cowan'" <cowan@mercury.ccil.org>
Cc: "'Anne van Kesteren'" <annevk@opera.com>, <public-html-xml@w3.org>
Message-ID: <0a7301cc5eb6$9d3fba80$d7bf2f80$@rleif.com>

Larry Masinter et al.,

The last version of XHTML that I used, XHTML 1.0, had 3 levels of
strictness: strict, transitional, and frameset.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Frameset did not seem to be much used and I have no experience with it.

The important point is that future XHTML5 pages do not all need to be
validated to the same level. I learned by using strict validation that there
were some cases where it did not work, and because the level could be set
only once, at the beginning of the page, I had to use transitional. Only a
default should be set at the beginning of the page. The parsing level needs
to be overridable at a finer level of granularity, such as a div element or
one of the new elements used to divide up a screen or a page. As for DOCTYPE
declarations, they should be replaced by an XML element and considered to be
legacy code. At worst, we should be stuck with the present HTML5
<!DOCTYPE html> and nothing more.

Microsoft Expression Web stated that the following was compatible with
HTML5. Can one do this with any other tool?

<!DOCTYPE html>
<html>
  <head>
    <meta content="text/xhtml+xml" charset="utf-8" http-equiv="Content-Type" />
    <meta content="en-us" http-equiv="Content-Language" />
    <title>Untitled 1</title>
  </head>
  <body> </body>
</html>

 

If XHTML5 can import and/or include other schemas, then it would be
extensible. Extensibility will provide flexibility by eliminating many of
the cases that would otherwise require upgrading the standard. Eventually,
an exception handler has to be included. There are many cases where it is
hazardous for a syntax error to abort the operation of an XHTML or XML page.
Aborting a control screen while an airplane is in flight is one such example.
The exception handling needs to include a recovery process. It often helps
to log exceptions together with the values of the variables at the time they
occurred. The use of assertions as described in XSD 1.1 should help; XSD 1.1
assertions cover a good part of what Schematron provides.
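
To make the recovery idea concrete, here is a minimal sketch in Python of a loader that logs a syntax error together with some context values and then falls back to a safe page instead of aborting. The names load_page, FALLBACK_PAGE, and the altitude_ft context value are hypothetical and chosen only for illustration; nothing like this is defined by XHTML or XSD 1.1.

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger("page_loader")

# Hypothetical safe page shown when the real page cannot be parsed.
FALLBACK_PAGE = "<page><status>display degraded, last good values retained</status></page>"


def load_page(markup: str, context: dict) -> ET.Element:
    """Parse markup; on a syntax error, log it with context and recover."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        logger.error(
            "syntax error at line %d, column %d: %s; context=%r",
            line, column, err, context,
        )
        # Recovery step: return a known-good tree instead of aborting.
        return ET.fromstring(FALLBACK_PAGE)


good = load_page("<page><status>ok</status></page>", {"altitude_ft": 31000})
bad = load_page("<page><status>ok</page>", {"altitude_ft": 31000})
```

The second call hits a mismatched tag, so the error is logged and the fallback tree is returned; the caller keeps running either way.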

 

Somewhere in the document, we need to say that the impediments to combining
HTML5 with XML are sufficient to indicate that the probability of success is
low; however, this does not appear to be the case with XHTML5. A new group,
which could consist at least in part of members of the old group, could then
work on XHTML5. If that occurs, I would like to make a slightly heretical
suggestion: the scope of the deliverables for the initial version of XHTML5
should be minimal, but should include one or more means of extensibility and
the capacity to include elements based on XML schemas that are essentially
invisible to the XHTML parser. The resulting XHTML5 pages would be a mosaic
of XHTML5 elements and XML elements derived from XML schemas. I would like
to follow the practice of the Digital Imaging and Communications in Medicine
(DICOM) (http://medical.nema.org/) standard and issue supplements to the
standard between versions. The use of supplements is consistent with the
spiral method of development. A hiatus of 5 to 10 years between versions of
a standard can greatly interfere with the adoption of a standard. For
instance, this lack of flexibility was a major cause of the lack of general
acceptance of the Ada programming language.
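
As a rough illustration of the mosaic idea above, the following Python sketch parses a page that mixes XHTML elements with one element from another vocabulary and separates the two by namespace. The namespace http://example.org/ns/instrument, the inst:reading element, and the function split_by_namespace are all invented for this example; only the XHTML namespace URI is real.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
INSTRUMENT_NS = "http://example.org/ns/instrument"  # hypothetical vocabulary

PAGE = f"""<html xmlns="{XHTML_NS}" xmlns:inst="{INSTRUMENT_NS}">
  <head><title>Run report</title></head>
  <body>
    <p>Acquisition summary</p>
    <inst:reading channel="FL1">1024</inst:reading>
  </body>
</html>"""


def split_by_namespace(markup: str):
    """Return (xhtml_names, foreign_names) as lists of local element names."""
    root = ET.fromstring(markup)
    xhtml, foreign = [], []
    for element in root.iter():
        tag = element.tag
        # ElementTree writes namespaced tags as "{namespace}local".
        if tag.startswith("{"):
            namespace, _, local = tag[1:].partition("}")
        else:
            namespace, local = "", tag
        (xhtml if namespace == XHTML_NS else foreign).append(local)
    return xhtml, foreign


xhtml_names, foreign_names = split_by_namespace(PAGE)
```

An XHTML-only consumer could skip everything in foreign_names, while a schema-aware tool could validate just those elements against their own schema.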

From: public-html-xml-request@w3.org [mailto:public-html-xml-request@w3.org]
On Behalf Of Larry Masinter
Sent: Wednesday, August 17, 2011 6:36 PM
To: Noah Mendelsohn; John Cowan
Cc: Anne van Kesteren; public-html-xml@w3.org
Subject: RE: Suggested revised text for HTML/XML report intro

 

Going back to: 

"Where HTML goes to great lengths to define how an agent must recover
from markup errors, XML is unforgiving in the face of markup errors."

 

and its possible replacement:

"Where HTML defines how an agent must process a document irrespective of
markup errors, XML requires an agent to halt processing in the face of
markup errors."

(and various follow-ons)
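
The contrast in the two sentences above can be seen with Python's standard library, as a small sketch. Note the assumptions: html.parser is a lenient tokenizer, not an implementation of the HTML5 parsing algorithm, and the names TextCollector, parse_leniently, and parse_strictly are made up for this illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Markup with an unclosed <b> and an unclosed final <p>.
BROKEN = "<p>first<b>bold</p><p>second"


class TextCollector(HTMLParser):
    """Collects text content, ignoring how tags are nested."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_leniently(markup: str) -> str:
    collector = TextCollector()
    collector.feed(markup)
    collector.close()  # flush any text still buffered
    return "".join(collector.chunks)


def parse_strictly(markup: str):
    """Return None if the markup is well-formed XML, else the error text."""
    try:
        ET.fromstring(markup)
        return None
    except ET.ParseError as err:
        return str(err)


lenient_text = parse_leniently(BROKEN)
strict_error = parse_strictly(BROKEN)
```

The lenient parser returns all of the text despite the errors, while the XML parser stops at the first well-formedness violation and reports it.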

 

I want to suggest a different perspective, that this difference is not about
the languages but about the specification styles of the current definitions
of those languages.

 

In general, a communication protocol is a set of conventions for exchanging
messages in or between computing systems, together with the formats of those
messages and of their components. Simple formats and their components are
protocol elements, while a language is a complex message format.

 

In general, for robust communication, senders of messages (and thus senders
of documents in a language used in messages) should be conservative in what
they send, while receivers of messages (parsers, interpreters) should be
liberal in what they accept. A language definition might include both the
rules for conservative senders (how to construct 'correct', well-formed, or
valid documents) and the rules for liberal receivers (a liberal parsing
algorithm).

 

In the development of HTML (at least in some parts of the community) the
observation that many instances of HTML were generated by hand or by string
manipulation led to an emphasis on specifying a normative behavior for
liberal receivers - going to great lengths to define how an agent must
process a document irrespective of markup errors.

 

((  The TAG insisted on there also being a normative language definition (an
'authoring' specification) that could be reviewed independent of the
conformance rules given for parsers; my hope was for a specification useful
for conservative generators of HTML documents.))

 

In the development of XML and XHTML, the workflows for creating XML-based
documents (and thus XHTML documents) using structure-based software systems
were more in the forefront of consideration, and the liberal handling of
malformed documents was left unspecified or even disallowed.

 

I don't think this difference is intrinsic to the HTML / XHTML languages as
much as it is to the specification style and priority given to workflows.

 

I encourage the task force to review the report and more carefully
distinguish those differences that are intrinsic to the languages vs. those
differences that are attributable to the specification styles and the
workflows emphasized. I think doing so might help make progress in
reconciling some of the differences.

 

For example, take the sentence: "Where HTML goes to great lengths to define
how an agent must recover from markup errors, XML is unforgiving in the face
of markup errors."

 

But what "goes to great lengths" is not "HTML" but the current main W3C HTML
specification (and not, for example, the normative language reference). What
is "unforgiving" is not "XML" but rather an XML parser conforming to the
current XML specification.

 

Larry

--

http://larry.masinter.net
Received on Friday, 19 August 2011 21:26:42 UTC