Re: The failure of Appendix C as a transition technique (was: Re: Draft Minutes of 2013-02-14 TAG telcon)

On Fri, Feb 22, 2013 at 9:22 AM, Larry Masinter <masinter@adobe.com> wrote:
> The implementors of Appendix C failed to implement it correctly.

Who are you referring to?

> Documents delivered as text/html should be parsed as HTML.

This is what browsers do, and have done ever since the episode from
2000 that I recounted.

> Documents delivered as application/xhtml+xml should be parsed as XHTML/XML.

This is what browsers do.
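For concreteness, the dispatch we both agree on in these two cases can
be sketched as a choice of parser made purely from the MIME type the
publisher labeled the content with (a hypothetical illustration, not
any browser's actual code):

```python
# Minimal sketch (hypothetical, not real browser code) of parser
# dispatch driven solely by the Content-Type header, never by sniffing
# the document body.

def choose_parser(content_type: str) -> str:
    """Return which parser a UA selects for a response."""
    # Parameters such as ';charset=utf-8' don't affect the dispatch.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "html"  # error-recovering HTML parsing
    if mime in ("application/xhtml+xml", "application/xml", "text/xml"):
        return "xml"   # draconian XML parsing
    return "unknown"

print(choose_parser("text/html; charset=utf-8"))  # html
print(choose_parser("application/xhtml+xml"))     # xml
```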

>> In December of 2000,

s/December/summer/ as noted earlier.

>> before the release of Netscape 6, Gecko had an
>> HTML parser mode called the Strict DTD. The "DTD" wasn't an SGML DTD.
>> Instead, it was a C++ class that implemented the containment rules
>> declared in the SGML DTD. The Strict DTD threw away markup that
>> violated the HTML 4 Strict containment rules but didn't stop parsing
>> upon error.
>
> This doesn't make sense. Why would they do such a thing?

Supporting standards was in vogue at Netscape at that time and seen as
the big differentiator relative to Microsoft. I guess the "Strict DTD"
was Rick Gessner's interpretation of supporting HTML 4 and SGML. After
all, HTML 4 isn't clear on what should happen.

> And what does
> it have to do with XML anyway?

The "Strict DTD" was used for XHTML-as-text/html for a short period of
time in 2000. (It didn't make it to the Netscape 6 release, though.)

>> See:
>> https://groups.google.com/d/topic/netscape.public.mozilla.layout/7sdgGdjjZfU/discussion
>> (The entire thread is an interesting read with the benefit of
>> hindsight. You can see I was still an XHTML believer at that time.)
>>
>> The thread resulted in a telecon, where, among other things, it was decided:
>> "- Parse XHTML delivered as text/html using the XML content sink with
>> an HTML document. (Instead of using the Strict DTD, which we do
>> today.)"
>
> This was a serious mistake. Text delivered as text/html should be
> parsed as HTML.

Right. The decision to parse XHTML-as-text/html as XML didn't last for long.

>> That decision lasted for less than a month. IIRC, it was already too
>> late to parse even the front page of O'Reilly's xml.com as XML.
>
> A publisher reacting to a widely distributed but mistaken browser
> implementation isn't evidence of anything.

What O'Reilly did arose as the natural consequence of the behavior you
advocate: parsing text/html as HTML. It was evidence that it's not
practical to parse text/html as XML.

>> And so it has been ever since. Appendix C content wasn't transitioning
>> anywhere.
>
> This wasn't the fault of Appendix C but of confusion about how to apply it.

So if your position is that text/html must be parsed as HTML, how
could Appendix C have transitioned to XML parsing if confusion had
been absent?

> I think polyglot is useful, but only if people don't try to second-guess what
> is the publisher's responsibility to label content with a content-type that
> is appropriate for parsing the content.

I agree with you that text/html should be parsed as HTML, but I don't
see how polyglot is useful if one parses text/html as HTML with a
conforming HTML parser.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Friday, 22 February 2013 10:10:15 UTC