Re: a more elegant fail on a kindlegen mobipocket pagebreak command from Michael[tm] Smith on 2012-02-10 (www-validator@w3.org from February 2012)

From: Michael[tm] Smith <mike@w3.org>
Date: Fri, 10 Feb 2012 17:34:23 +0900
To: Bowerbird@aol.com
Cc: www-validator@w3.org
Message-ID: <20120210083420.GV92826@sideshowbarker>

Bowerbird@aol.com, 2012-02-09 00:28 -0500:

> >   <mbp:pagebreak />
> it's the kindlegen command to throw a pagebreak
> when creating a mobipocket file (for the kindle)...

Wonderful.

If you're able to, try <mbp:pagebreak></mbp:pagebreak>

Browsers don't pay any attention to "/>" in HTML pages. If you want to use
that syntax you need to use XHTML and namespaces.

> here's an example file with the offending outrage:
> >    http://zenmagiclove.com/prapr-reg.html
> and the same file, without the devil pagebreaks:
> >    http://zenmagiclove.com/prapr-vme.html
> 
> now of course your validator should _raise_a_flag_
> when it encounters such a monstrosity, but lately,
> it has been choking completely, which is not good.

Choking how? You mean reporting "Element mbp:pagebreak not allowed as child
of element div in this context. (Suppressing further errors from this
subtree.)"?

> perhaps you could cause it to fail more elegantly?

How?

The validator has a conforming HTML5 parser that parses markup in the same
way browser parsers do (in fact it's the same parser used in Firefox).

When you feed http://zenmagiclove.com/prapr-reg.html to a browser, this is
what you end up with in the DOM:

  http://software.hixie.ch/utilities/js/live-dom-viewer/?%3Cdiv%20id%3Dchunk8%3E%E2%86%A9%0A%3Chr%3E%3Cmbp%3Apagebreak%20%2F%3E%E2%86%A9%0A%E2%86%A9%0A%3Ca%20id%3D%22table_of_contents%22%3E%3C%2Fa%3E%E2%86%A9%0A%3Cp%20style%3D%22text-align%3Aright%3B%22%3E%E2%86%A9%0A%3Ca%20href%3D%23pride_and_prejudice%3E%26lt%3B-%3C%2Fa%3E%20%26nbsp%3B%20%26nbsp%3B%20%E2%86%A9%0A%3Ca%20href%3D%23pride_and_prejudice%3E-c-%3C%2Fa%3E%20%26nbsp%3B%20%26nbsp%3B%20%E2%86%A9%0A%3Ca%20href%3D%23chapter_1%3E-%26gt%3B%3C%2Fa%3E%26nbsp%3B%26nbsp%3B%26nbsp%3B%26nbsp%3B%3C%2Fp%3E%E2%86%A9%0A%3C%2Fdiv%3E%E2%86%A9%0A

  http://goo.gl/4KdTV

DIV id="chunk8"
| #text: ↩
| HR
| MBP:PAGEBREAK
| | #text: ↩ ↩
| | A id="table_of_contents"
| | #text: ↩
| | P style="text-align:right;"
| | #text: ↩
| | A href="#pride_and_prejudice"
| | #text: <-
| | #text:     ↩
| | A href="#pride_and_prejudice"
| | #text: -c-
| | #text:     ↩
| | A href="#chapter_1"
| | #text: ->
| | #text:     
| | #text: ↩

That is, everything after the <mbp:pagebreak /> is made a child node of
that mbp:pagebreak element. That's because HTML parsers don't pay any
attention to XML-isms like "/>" (self-closing tag syntax). So browsers
don't see that as an empty element -- they see it as an element with a
start tag but no end tag.
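
To make that concrete, here is a rough Python sketch of the rule. It is not
the validator's parser and not a full HTML5 tree builder; it only leans on
the standard library's html.parser to model "ignore the slash unless the
element is a void element". The names Node, TreeSketch, parse, find and
child_names are made up for this illustration.

```python
from html.parser import HTMLParser

# Void elements in HTML: the only elements that never take an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []


class TreeSketch(HTMLParser):
    """Simplified model: builds a tree of element names only."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            # Non-void elements stay open until a matching end tag arrives.
            self.current = node

    def handle_startendtag(self, tag, attrs):
        # Model the HTML rule: "/>" is ignored, so this is a plain start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; skip stray end tags.
        node = self.current
        while node is not self.root and node.name != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def parse(markup):
    parser = TreeSketch()
    parser.feed(markup)
    parser.close()
    return parser.root


def find(node, name):
    """Depth-first search for the first element called `name`."""
    if node.name == name:
        return node
    for child in node.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def child_names(markup, name):
    return [c.name for c in find(parse(markup), name).children]


SLASH = ('<div id="chunk8"><hr><mbp:pagebreak />'
         '<a id="table_of_contents"></a><p>x</p></div>')
EXPLICIT = ('<div id="chunk8"><hr><mbp:pagebreak></mbp:pagebreak>'
            '<a id="table_of_contents"></a><p>x</p></div>')

print(child_names(SLASH, "mbp:pagebreak"))     # the A and P end up inside it
print(child_names(EXPLICIT, "mbp:pagebreak"))  # empty, as intended
```

With the "/>" form the following siblings are swallowed by mbp:pagebreak;
with an explicit end tag they stay children of the div.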

The validator is doing the right thing here by trying to alert you to
something that is seriously broken in your source. If you're serving your
content as HTML and want to make sure it's going to work the way you
intend, don't use <mbp:pagebreak />. Otherwise use XHTML5 and namespaces.
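
For the XHTML route, a small Python sketch with the standard library's
xml.etree.ElementTree shows how an XML parser treats the same markup once
the prefix is bound to a namespace. The namespace URI below is a
placeholder I made up for the example (I'm not asserting what URI kindlegen
expects), and the document is a cut-down stand-in for the real file.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MBP_NS = "urn:example:mbp"  # hypothetical placeholder, not an official URI

DOC = f"""<html xmlns="{XHTML_NS}" xmlns:mbp="{MBP_NS}">
<head><title>example</title></head>
<body>
<div id="chunk8"><hr/><mbp:pagebreak /><a id="table_of_contents"></a></div>
</body>
</html>"""

root = ET.fromstring(DOC)
div = root.find(f".//{{{XHTML_NS}}}div")

# ElementTree reports tags as "{namespace}localname"; keep the local part.
local_names = [child.tag.split("}")[1] for child in div]
pagebreak = div.find(f"{{{MBP_NS}}}pagebreak")

print(local_names)     # hr, pagebreak and a are siblings under the div
print(len(pagebreak))  # number of children of the pagebreak element
```

Here the XML parser honours "/>", so the pagebreak element is empty and the
anchor after it remains a sibling rather than a child. Drop the xmlns:mbp
declaration and the same parser rejects the document outright, which is why
the namespace binding is part of the deal.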

  --Mike

-- 
Michael[tm] Smith
http://people.w3.org/mike/+

Received on Friday, 10 February 2012 08:34:29 UTC