
Re: ISSUE-54 (html5-doctype-vs-xslt): XSLT 1.0 can not generate HTML5 documents [HTML 5 spec]

From: Julian Reschke <julian.reschke@gmx.de>
Date: Thu, 04 Sep 2008 16:18:15 +0200
Message-ID: <48BFEE27.2040609@gmx.de>
To: "Michael(tm) Smith" <mike@w3.org>
CC: Jirka Kosek <jirka@kosek.cz>, public-html@w3.org, Henri Sivonen <hsivonen@iki.fi>

Michael(tm) Smith wrote:
> I noticed that so far there's not been any specific response to
> the following part of one of Henri's messages in this thread -
> 
> Henri Sivonen <hsivonen@iki.fi>, 2008-08-28 13:08 +0300:
> 
>>  On Jul 5, 2008, at 00:44, Jirka Kosek wrote:
>>> Of course there is second issue on which you really elaborate in your
>>> email and this is how to extend some *future version* of XSLT language
>>> and its implementation to support all bits of HTML5. I almost agree with
>>> your analysis on this issue.
>>  The issues can be fixed without changing the XSLT language. I released 
>>  version 1.1.0 of the Validator.nu HTML Parser the other day. The package 
>>  comes with a sample program that uses an unmodified XSLT engine (whatever 
>>  you have set as the TrAX default) with an HTML5 parser and an HTML5 
>>  serializer. There's running code for addressing the issues *today*.
>>  http://about.validator.nu/htmlparser/
> 
> I suspect that if you object to it, the response is likely going
> to be that, as great as having something like that is, using that
> or something similar instead of just using a stock/off-the-shelf
> XSLT engine is something that creates an additional burden or
> hurdle for developers.

Indeed.

Also, it doesn't help with XSLT engines on other platforms, nor with 
XSLT engines inside UAs.

> ...
> So now all those new authors would have to learn -- and their
> teachers would have to teach -- that, well, things are a bit more
> complicated than just <!DOCTYPE html> because, for certain cases
> that they really are not likely to have any good understanding of
> at the time they first learn it, they need to know that the
> doctype can optionally also be in the form
> <!DOCTYPE HTML PUBLIC "FUBAR"> (or whatever). And they also need
> to know that they should never actually use a doctype in that form
> if they are using the normal kinds of authoring tools that they're
> likely to be learning with...
> ...

Understood, and agreed that this is a downside.

On the other hand, a *good* teacher would teach not to edit directly, 
but to use a proper set of tools. For instance, editing in XHTML, and 
then serializing to text/html.

We *know* that people get things wrong when they aren't using the proper 
tools for serializing the document, in particular as HTML5 virtually 
guarantees they'll never notice a mistake unless they use a validator.

So, trying to finally make my point... :-)

Making it easier to produce HTML5 "by hand" while making it harder to do 
it with existing libraries will make it more likely that broken content 
gets produced.

> This seems like a case where we really should be carefully
> considering our "Priority of Constituencies" design principle
> ("costs or difficulties to the user should be given more weight
> than costs to authors; which in turn should be given more weight
> than costs to implementors..."), and really looking carefully at
> who we want to put the costs on in this case.

Users are not involved here, as far as I can tell.

Authors are. Our goal should be that authors use the proper tools to 
generate HTML, instead of directly editing it, or relying on simple 
string concatenation. We know where that leads (parse errors, script 
injections...)
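
To illustrate the difference, here is a minimal Python sketch. It is not from the original discussion: the function names and the sample input are made up for the example. It builds the same paragraph once by string concatenation and once through the standard library's ElementTree serializer using its HTML output method.

```python
import xml.etree.ElementTree as ET


def paragraph_by_concatenation(text):
    # Naive approach: the text is pasted into the markup unescaped.
    return "<p>" + text + "</p>"


def paragraph_by_serializer(text):
    # Build a tree and let the serializer take care of escaping.
    p = ET.Element("p")
    p.text = text
    return ET.tostring(p, encoding="unicode", method="html")


# Hypothetical untrusted input, chosen only for illustration.
untrusted = "<script>alert(1)</script>"
print(paragraph_by_concatenation(untrusted))
print(paragraph_by_serializer(untrusted))
```

The concatenated version passes the script element through as live markup, while the serialized version emits it as escaped text.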

> [What we have, should we make <!doctype html> the only conformant
> doctype, is]: If developers use stock XSLT engines to generate their
> output and if they try to validate that output using an HTML5
> conformance checker, they are going to get one message, one time,
> telling them that the doctype is not conformant -- or to put it
> into language that might more clearly mean something to them --
> that the document is using something that's no longer conformant
> because it's been "deprecated".
> 
> Do we really want to build a special exception into the spec just
> to prevent that special set of developers from seeing that
> message? (which is effectively just a warning message)

As far as I can tell, the author would need to know that they have to 
override the validator's default setting, otherwise the document would 
not be validated as HTML5, right?

So, from that point of view, a "processing instruction" for the 
validator (for instance in the form of an HTML comment) could work as well.
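
As a rough illustration of the doctype check under discussion, the following Python sketch tells the short doctype apart from the PUBLIC/SYSTEM form. An XSLT 1.0 html output method writes a doctype only when doctype-public or doctype-system is set, so that is the only form it can produce. The function name, the category labels, and the regular expressions are assumptions made for this example; they are not how any actual conformance checker is implemented.

```python
import re

# The short form: <!DOCTYPE html>, matched case-insensitively.
SHORT_DOCTYPE = re.compile(r"\s*<!DOCTYPE\s+html\s*>", re.IGNORECASE)
# Any html doctype that carries a PUBLIC or SYSTEM identifier.
LEGACY_DOCTYPE = re.compile(r"\s*<!DOCTYPE\s+html\s+(PUBLIC|SYSTEM)\b", re.IGNORECASE)


def classify_doctype(document):
    """Return 'short', 'legacy', or 'other' for the start of a document."""
    if SHORT_DOCTYPE.match(document):
        return "short"
    if LEGACY_DOCTYPE.match(document):
        return "legacy"
    return "other"


print(classify_doctype("<!DOCTYPE html><title>x</title>"))
print(classify_doctype('<!DOCTYPE HTML PUBLIC "FUBAR"><title>x</title>'))
```

A checker that accepted only the first category would report every document generated by a stock XSLT 1.0 engine with a doctype, which is the single message being debated here.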

> Thinking in particular about the case of Java developers and
> speaking anecdotally from my own experience: Every time I upgrade
> my JDK and try to compile some existing Java code I have, I seem
> to get gobs of new messages from the compiler that I'd never seen
> before, warning me about deprecated stuff. I've learned to just
> ignore those and to wait to deal with them if/when in some future
> JDK upgrade they actually cause compile errors instead of just
> warnings. I would suspect that most real Java developers are a lot
> more accustomed to seeing those than I am, and would think that
> unless they like have lots of extra time to spend, they're not
> actually going back to rewrite all their works-just-fine-as-is-
> despite-all-the-warnings code just to cause those warning messages
> to be suppressed.

And that's a very bad idea. Believe me. I'm currently working on a 
project that can't upgrade to JDK 5 because the developers chose to 
ignore the warnings about using "enum" in a package name.

> To get back to HTML5 and conformance checking, I would hope that
> we are not aiming to make it a goal of an HTML conformance
> check that the author gets a pat-on-the-back sense that their
> HTML documents are perfect. What I mean is, if authors/developers
> who are generating output from stock XSLT engines -- who would
> seem to me to be fairly savvy about knowing what kinds of
> error/warning messages they can safely ignore -- run a conformance
> check on their documents and see a message saying that the doctype
> is not conformant, I would really wonder what degree of
> frustration or trouble that's really going to cause them and what
> if anything we should do to try to avoid it.

The issue with validation warnings is that some people are more serious 
about them than others. For instance, what do you do when your customer 
(say, the government) has a rule that every page must validate with zero 
messages? This is an uphill battle you're not going to win, so there 
will be cases where the developer then will have to switch back to HTML 
4.01.

(related to that: I'm very unhappy to get a warning when I have an 
encoding of ISO-8859-1. I *do* know the difference from win-whatever, and 
when I say "ISO-8859-1" that's what I mean)
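
For readers less familiar with the distinction, ISO-8859-1 and windows-1252 agree everywhere except the byte range 0x80 to 0x9F. The Python sketch below (the function name is made up for the example) lists the byte values that decode differently, using only the standard codecs.

```python
def differing_bytes():
    # Collect every byte value that decodes differently in the two encodings.
    differing = []
    for value in range(256):
        raw = bytes([value])
        as_latin1 = raw.decode("iso-8859-1")
        # Python's cp1252 codec leaves a few bytes undefined;
        # errors="replace" maps those to U+FFFD instead of raising.
        as_cp1252 = raw.decode("cp1252", errors="replace")
        if as_latin1 != as_cp1252:
            differing.append(value)
    return differing


print(["0x%02X" % value for value in differing_bytes()])
```

For example, byte 0x80 is a C1 control character in ISO-8859-1 but the euro sign in windows-1252.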

> An HTML5 conformance checker will not be handing out gold stars to
> anybody anyway -- no "Valid HTML5" badges for anybody to beat
> others over the head with. So warning those authors/developers
> about the case of a non-conforming doctype that they are probably
> already aware is not conforming, I just really wonder how far we
> should go in adjusting the spec to prevent those developers from
> seeing that warning -- given that if they are doing XSLT
> development, many or most of them would know enough to realize
> they can just ignore it, and maybe what we'd really be left with
> is trying to please the obsessive/compulsive/perfectionist ones --
> the ones that just can't tolerate running a compile/lint/
> conformance check and seeing any warnings at all. God help us
> if we make it a goal to try to please that class of developers.

Well, I sort-of disagree. Once people start ignoring warnings, they'll 
ignore other warnings as well. And I think we should try to avoid that 
effect.

BR, Julian
Received on Thursday, 4 September 2008 14:19:02 GMT
