Re: Web site accessibility-layers

On Thu, 17 Apr 2003, David Woolley wrote:

> 
> > attributes. It becomes readily apparent when you try to validate an 
> > XHTML Transitional file with an XHTML Strict DTD.
> 
> You have been able to do the same with HTML 4.01 Strict for the last 5
> years.  XHTML 1.0 is essentially a rewrite of the grammars of HTML
> 4.01, both Transitional, and Strict, within the constraints of XML.
> That means that various content model rules can no longer be enforced,
> and means that the browser cannot infer tags, both opening and closing
> (note that HTML 4.01 with omitted tags is just as structured as XHTML
> as the location of the missing tags is precisely defined).  At best it
> makes one more aware that elements are strictly nested and where they end.

A good summary.  But it's worth adding that OpenSP offers options to
impose stricter parse rules.  Page Valet now offers three levels
as options to the user; the strictest of the three parse modes
insists on all implied tags being made explicit.

> It shouldn't take very long to create a variant HTML 4.01 DTD with
> all the "O O" and "- O"s replaced by "- -"s and therefore have HTML
> with no optional tags, if you want to validate for that case.  On
> the other hand, you can just run the document through sgmlnorm, to
> put in all the implied tags.

Indeed, there are various such DTDs publicly available, but they're
probably of little value unless you plan to use a lesser SGML engine.
I'm not aware of anyone parsing HTML with an SGML parser that isn't
based on SP or OpenSP.
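For concreteness, the change being described looks like this in DTD
terms (a fragment only, using the P element as an example; the second
declaration is the hypothetical variant, not anything W3C publishes):

```dtd
<!-- HTML 4.01 as published: start tag required ("-"),
     end tag omissible ("O") -->
<!ELEMENT P - O (%inline;)*  -- paragraph -->

<!-- Variant with no optional tags: both tags required ("- -"),
     so </p> must always be written out -->
<!ELEMENT P - - (%inline;)*  -- paragraph -->
```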

> > Do you have examples of this? One thing I've noticed is that older 
> > browsers will break if you don't include a space before the closing 
> > slash, but given the space (and the text/html content type), I've never 
> > had a problem.
> 
> Newer browsers always break if you include the closing /, as the SGML
> definition of HTML invokes a behaviour in which <em/Emphasized Text/
> is valid markup.

Indeed, and it's worse than just that.

<blockquote cite="http://valet.webthing.com/page/parsemode.html">

  Strict SGML

    Strictly speaking, this is the only mode that offers true HTML
    validation. However, this allows various constructs that will break
    in most mainstream browsers, and should be used with caution.

    For example, the following is a valid HTML document:

        <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
        <title/HTML example/
        <p<p/This is the second paragraph in this document.
        The first was empty.

</blockquote>

Both Page Valet and the WDG validator have options to disallow this
kind of thing.

>	  Whilst I may have been overstating things slightly,
> any browser based on an SGML engine and using the published HTML 4.01
> DTDs will misparse any non-trivial XHTML 1.0 document.

That's not necessarily true.

I've been battling with the complexities of supporting Appendix C,
and I believe I've solved it in Page Valet.  The basic premise is
to sniff the document, and if an XHTML 1.0 FPI is encountered,
parse it using XML rules.  There are still ambiguous choices to
be made: if a document served as text/html has a BOM or an XML decl
but is not XHTML 1.0, how do we parse it?  Valet solves that by
parsing as XML but also complaining that the document can't legally
be served as text/html.
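The sniffing described above can be sketched roughly as follows.  This
is an illustration only, not Page Valet's actual code: the function
name, the mode labels, and the FPI substring test are all invented for
the example.

```python
def choose_parse_mode(document: str) -> tuple[str, list[str]]:
    """Pick a parse mode for a document served as text/html.

    Returns (mode, warnings), where mode is "xml" or "sgml".
    A crude sketch of the sniffing heuristic, not real validator code.
    """
    warnings: list[str] = []

    # An XHTML 1.0 FPI in the doctype means: parse under XML rules.
    if "-//W3C//DTD XHTML 1.0" in document:
        return ("xml", warnings)

    # The ambiguous case: a BOM or an XML declaration, but no XHTML 1.0
    # FPI.  Parse as XML anyway, but complain that the document can't
    # legally be served as text/html.
    stripped = document.lstrip("\ufeff")
    if document.startswith("\ufeff") or stripped.startswith("<?xml"):
        warnings.append("document may not legally be served as text/html")
        return ("xml", warnings)

    # Otherwise fall back to ordinary SGML-based HTML parsing.
    return ("sgml", warnings)
```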


> > An mime type example for Patrick: I currently serve my site as valid 
> > XHTML 1.1 but with the incorrect mime type of "text/html" for 
> 
> This is a SHOULD NOT in the rules for using Content Types
> for XHTML.  Using text/html for XHTML 1.0 is a MAY.  See
> <http://www.w3.org/TR/xhtml-media-types/>.

Indeed.  The pertinent question here is *why*?  What do you gain
by calling a document XHTML 1.1 without explicitly using XML?

-- 
Nick Kew

Received on Friday, 18 April 2003 14:37:12 UTC