Re: v.Nu: is a conformant XHTML document also, by definition, conformant HTML?

From: Michael[tm] Smith <mike@w3.org>
Date: Tue, 27 Dec 2016 14:04:29 +0900
To: Graham Hannington <graham_hannington@fundi.com.au>
Cc: "www-validator@w3.org" <www-validator@w3.org>
Message-ID: <20161227050429.d4gxpnd2t4ijgdpi@sideshowbarker.net>
Graham Hannington <graham_hannington@fundi.com.au>, 2016-12-22 13:01 +0800:
> Archived-At: <http://www.w3.org/mid/OFDAE003D8.491D84FB-ON48258091.0014B93A-48258091.001BA5F7@LocalDomain>
> 
> I think the answer is yes.

The answer is not yes. There are some documents that are conforming when
parsed as XML but not when parsed as text/html.

A common example is a document that uses self-closing tags for elements
that require an end tag when parsed as text/html; for example:

<script src="foo.js"/> <- valid when parsed as XML, but not as text/html

A similar case: in XML, void elements can have end tags; e.g.,
<br></br> or even <link rel="stylesheet" href="foo.css"></link>

But in text/html those are errors.
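
As a rough illustration of those two cases, here is a minimal Python sketch
that scans XML-syntax source for them. It is a regex heuristic, not a parser
and not how the checker works; the function name and the void-element list
are my assumptions (the list should be verified against the spec), and it
will wrongly flag unprefixed SVG or MathML elements, which may be
self-closed in text/html.

```python
import re

# Assumption: void element names copied by hand from the HTML standard;
# check the current spec before relying on this list.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})

# Heuristic patterns: they do not handle ">" inside attribute values,
# comments, CDATA sections, or namespace prefixes.
SELF_CLOSING = re.compile(r"<([A-Za-z][A-Za-z0-9-]*)(?:\s[^<>]*)?/>")
END_TAG = re.compile(r"</([A-Za-z][A-Za-z0-9-]*)\s*>")


def find_syntax_differences(source):
    """Return notes on constructs that are fine in XML but errors in text/html."""
    problems = []
    for match in SELF_CLOSING.finditer(source):
        name = match.group(1).lower()
        if name not in VOID_ELEMENTS:
            # e.g. <script src="foo.js"/>: text/html still requires </script>
            problems.append(f"self-closing non-void element: <{name}/>")
    for match in END_TAG.finditer(source):
        name = match.group(1).lower()
        if name in VOID_ELEMENTS:
            # e.g. <br></br>: void elements take no end tag in text/html
            problems.append(f"end tag on void element: </{name}>")
    return problems
```

Calling find_syntax_differences() on the script example above returns one
note, and on the br and link examples it returns one note each. An empty
result does not prove the document is conforming as text/html; it only means
these two patterns were not found.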

Another difference is that an XML document without a doctype is conformant
but a text/html document without a doctype is not.

Those are just a few of the differences. There are more.

> And I'm anticipating someone pointing out how one implicitly, by
> definition, follows from the other. Still, I'd be happier to see a
> concise, explicit statement to this effect in the HTML Living Standard.

The relevant part of the spec, which explains that there are important
differences between text/html documents and XML documents, is here:

  https://html.spec.whatwg.org/multipage/introduction.html#html-vs-xhtml

If you believe it would help to have the spec say more about it than
that, it would probably be good to raise a PR with some proposed language.

...
> I've just seen the following note in section 13.1 of the standard:
> 
> > The XML syntax for HTML was formerly referred to as "XHTML", but this
> > specification does not use that term (among other reasons, because no such
> > term is used for the HTML syntaxes of MathML and SVG).
> 
> "XHTML" is oldspeak, huh? ;-)

As mentioned in https://github.com/whatwg/html/commit/643d1bce and in that
paragraph cited above, we have no corresponding terms like “HSVG” (or
something) and “HMathML” to refer to SVG and MathML served in text/html.

Another reason is that when we refer to “XHTML”, many (maybe most) people
still seem to assume we mean XHTML1, not HTML5.

Also, some people think “XHTML” can just mean serving a document with
quoted attribute values and no omitted end tags, etc., as text/html.

So it makes sense to avoid that ambiguity and to instead be very clear that
the difference is how the document gets parsed: whether it’s parsed with an
HTML parser or with an XML parser.

...
> Suppose I have an XHTML document (er, "an HTML document written in the XML
> syntax"?) for which v.Nu reports:
> 
> > Using the preset for XHTML...
> > The document validates according to the specified schema(s)
> 
> I want to know - without actually checking - that, if I were to use v.Nu to
> check the same document as HTML, v.Nu would still report "The document
> validates...".

You can’t know for certain. The XML instance may contain nothing that is
non-conforming in text/html, or it may contain something that is.

As I mentioned above, probably the simplest case is just a document lacking
a doctype. The checker will report no error for that document if it is served
as XML, but the checker will report an error if it is served as text/html.
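
If you want to actually run that comparison, a minimal Python sketch could
look like the following. It assumes the checker's web-service interface as I
understand it: POST the document as the request body to the /nu/ endpoint
with out=json, and let the Content-Type header select the parser. The
function names and the User-Agent string are made up for illustration, and
the "messages"/"type" keys should be checked against the checker's own
documentation.

```python
import json
import urllib.request

# Assumption: public checker instance with JSON output enabled.
CHECKER_URL = "https://validator.w3.org/nu/?out=json"


def build_check_request(document, media_type):
    """Build (but do not send) a POST request carrying the document bytes."""
    return urllib.request.Request(
        CHECKER_URL,
        data=document,  # supplying data makes urllib issue a POST
        headers={
            "Content-Type": f"{media_type}; charset=utf-8",
            "User-Agent": "syntax-compare-sketch",  # hypothetical name
        },
    )


def count_errors(document, media_type):
    """Send the document and count messages whose type is "error"."""
    request = build_check_request(document, media_type)
    with urllib.request.urlopen(request) as response:
        messages = json.load(response)["messages"]
    return sum(1 for message in messages if message.get("type") == "error")


def compare_syntaxes(document):
    """Check the same bytes once as XML and once as text/html."""
    return {
        media_type: count_errors(document, media_type)
        for media_type in ("application/xhtml+xml", "text/html")
    }
```

Passing compare_syntaxes() the bytes of a doctype-less XML-syntax document
would let you see both results side by side. Note that this sends your
document to a remote service.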

...
> Should I be satisfied that the term "XML syntax for HTML" means,
> implicitly, that there is a way to express any HTML using XML syntax, and
> that the XML syntax will always be conformant HTML?
...

No, because as explained above, that’s not true.

  —Mike

-- 
Michael[tm] Smith https://people.w3.org/mike

Received on Tuesday, 27 December 2016 05:05:02 UTC