Re: ISSUE-54 (html5-doctype-vs-xslt): XSLT 1.0 can not generate HTML5 documents [HTML 5 spec] from Michael(tm) Smith on 2008-09-04 (public-html@w3.org from September 2008)

From: Michael(tm) Smith <mike@w3.org>
Date: Thu, 4 Sep 2008 22:40:37 +0900
To: Jirka Kosek <jirka@kosek.cz>, Julian Reschke <julian.reschke@gmx.de>
Cc: public-html@w3.org, Henri Sivonen <hsivonen@iki.fi>
Message-ID: <20080904134037.GD15529@toro.w3.mag.keio.ac.jp>

I noticed that so far there's not been any specific response to
the following part of one of Henri's messages in this thread -

Henri Sivonen <hsivonen@iki.fi>, 2008-08-28 13:08 +0300:

>  On Jul 5, 2008, at 00:44, Jirka Kosek wrote:
> > Of course there is second issue on which you really elaborate in your
> > email and this is how to extend some *future version* of XSLT language
> > and its implementation to support all bits of HTML5. I almost agree with
> > your analysis on this issue.
> 
>  The issues can be fixed without changing the XSLT language. I released 
>  version 1.1.0 of the Validator.nu HTML Parser the other day. The package 
>  comes with a sample program that uses an unmodified XSLT engine (whatever 
>  you have set as the TrAX default) with an HTML5 parser and an HTML5 
>  serializer. There's running code for addressing the issues *today*.
>  http://about.validator.nu/htmlparser/

I suspect that if you object to it, the response is likely going
to be that, as great as having something like that is, using that
or something similar instead of just using a stock/off-the-shelf
XSLT engine is something that creates an additional burden or
hurdle for developers.

If so, I guess what I would wonder is how you would weigh those
developer concerns/costs against those of casual authors. What I
mean is, if we restrict the spec to only allowing <!doctype html>
as a conformant HTML5 doctype, then we have something very simple
for new authors to learn and for HTML/Web-authoring teachers to
teach: You must include a <!DOCTYPE HTML> string at the beginning
of your HTML documents, and it must look just like that -- with just
the word "doctype" followed by the word "html".

And the teacher -- if he or she wants to try to rationalize it for
the students without needing to go on at all about the whole
quirks-mode FUBAR mess that forces us to require the doctype at
all -- might make a reasonable case that the doctype actually has
some small amount of meaning ("it's just a way of asserting that
the document is meant to be conformant HTML", or whatever).

On the other hand, if we want to make things easier for those
developers who are using XSLT (or some XSLT-related thing like
what Julian has described) as part of their document-generating
toolchains, BUT -- because of limitations in their development
environments or just because of their own choice -- are limited to
only using output from stock XSLT engines without any
post-processing...

...then to make things easier for them, we allow a doctype in some
form that includes the meaningless-in-this-context-but-required
word "PUBLIC" after the word "HTML", followed by some other
meaningless-but-required quotation marks, or with those quotation
marks and some other string inside them that should in this
context be even more completely meaningless, by design.

So now all those new authors would have to learn -- and their
teachers would have to teach -- that, well, things are a bit more
complicated than just <!DOCTYPE html> because, for certain cases
that they really are not likely to have any good understanding of
at the time they first learn it, they need to know that the
doctype can optionally also be in the form
<!DOCTYPE HTML PUBLIC "FUBAR"> (or whatever). And they also need
to know that they should never actually use a doctype in that form
if they are using the normal kinds of authoring tools that they're
likely to be learning with...

This seems like a case where we really should be carefully
considering our "Priority of Constituencies" design principle
("costs or difficulties to the user should be given more weight
than costs to authors; which in turn should be given more weight
than costs to implementors..."), and really looking carefully at
who we want to put the costs on in this case.

What we have, should we make <!doctype html> the only conformant
doctype, is this: if developers use stock XSLT engines to generate
their output and if they try to validate that output using an HTML5
conformance checker, they are going to get one message, one time,
telling them that the doctype is not conformant -- or to put it
into language that might more clearly mean something to them --
that the document is using something that's no longer conformant
because it's been "deprecated". 
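
To make the "one message, one time" point concrete, here is a rough
Python sketch of that kind of check. It is only an illustration under
my own assumptions, not how any actual HTML5 conformance checker is
implemented: the function name doctype_messages and the message
wording are made up for the example, and it ignores details such as
comments or a byte-order mark before the doctype.

```python
import re

# Matches only the short doctype, case-insensitively: <!DOCTYPE html>
SHORT_DOCTYPE = re.compile(r"<!doctype\s+html\s*>", re.IGNORECASE)

# Matches any doctype declaration, e.g. <!DOCTYPE HTML PUBLIC "FUBAR">
ANY_DOCTYPE = re.compile(r"<!doctype\b[^>]*>", re.IGNORECASE)


def doctype_messages(document):
    """Return a list holding at most one message about the doctype.

    Hypothetical helper: it looks only at the very start of the
    document (after leading whitespace), which is a simplification.
    """
    head = document.lstrip()
    if SHORT_DOCTYPE.match(head):
        return []  # short form: nothing to report
    found = ANY_DOCTYPE.match(head)
    if found:
        # Some other doctype form, such as one emitted by a stock XSLT engine
        return ["Non-conformant doctype: " + found.group(0)]
    return ["Missing doctype"]
```

With this sketch, a document starting with <!DOCTYPE html> produces no
message, while one starting with <!DOCTYPE HTML PUBLIC "FUBAR">
produces exactly one, however long the rest of the document is.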

Do we really want to build a special exception into the spec just
to prevent that special set of developers from seeing that
message? (which is effectively just a warning message)

Thinking in particular about the case of Java developers and
speaking anecdotally from my own experience: Every time I upgrade
my JDK and try to compile some existing Java code I have, I seem
to get gobs of new messages from the compiler that I'd never seen
before, warning me about deprecated stuff. I've learned to just
ignore those and to wait to deal with them if/when in some future
JDK upgrade they actually cause compile errors instead of just
warnings. I would suspect that most real Java developers are a lot
more accustomed to seeing those than I am, and would think that
unless they like have lots of extra time to spend, they're not
actually going back to rewrite all their works-just-fine-as-is-
despite-all-the-warnings code just to cause those warning messages
to be suppressed.

To get back to HTML5 and conformance checking, I would hope that
we are not aiming to work toward a goal of an HTML conformance
check being that the author gets a pat-on-the-back sense that their
HTML documents are perfect. What I mean is, if authors/developers
who are generating output from stock XSLT engines -- who would
seem to me to be fairly savvy about knowing what kinds of
error/warning messages they can safely ignore -- run a conformance
check on their documents and see a message saying that the doctype
is not conformant, I would really wonder what degree of
frustration or trouble that's really going to cause them and what
if anything we should do to try to avoid it.

An HTML5 conformance checker will not be handing out gold stars to
anybody anyway -- no "Valid HTML5" badges for anybody to beat
others over the head with. So warning those authors/developers
about the case of a non-conforming doctype that they are probably
already aware is not conforming, I just really wonder how far we
should go in adjusting the spec to prevent those developers from
seeing that warning -- given that if they are doing XSLT
development, many or most of them would know enough to realize
they can just ignore it, and maybe what we'd really be left with
is trying to please the obsessive/compulsive/perfectionist ones --
the ones that just can't tolerate running a compile/lint/
conformance check and seeing any warnings at all. God help us
if we make it a goal to try to please that class of developers.

   --Mike

-- 
Michael(tm) Smith
http://people.w3.org/mike/
Received on Thursday, 4 September 2008 13:41:16 UTC