Re: Precision and error handling (was URL work in HTML 5) from Eric J. Bowman on 2012-10-09 (www-tag@w3.org from October 2012)

From: Eric J. Bowman <eric@bisonsystems.net>
Date: Tue, 9 Oct 2012 13:06:37 -0600
To: Karl Dubost <karld@opera.com>
Cc: W3C TAG <www-tag@w3.org>
Message-Id: <20121009130637.cc0101c5.eric@bisonsystems.net>

Karl Dubost wrote:
>
> The fact that we can do with one tool doesn't mean it is the right
> tool for doing it.
> 

Or, maybe my working code example shows XSLT is an *ideal* tool for
processing XHTML.  Surely better than JavaScript, in my experience.
I'm hardly the only developer using these methods, so trying to
handwave them away as some sort of aberration or mistake is, at best,
avoiding the issue -- that many of us see no place for the HTML parser
in our toolchains, even if our system output is meant for browser
consumption.

> 
> Somehow using XSLT (processing XML) for organizing HTML can do the
> job but has never been the right tool for doing the job.
>

Says who?  I've been using XSLT to process XHTML for 12 years now.  If
it were the wrong tool as you claim, that would make me a masochist,
when the truth is I'm so lazy I just use the simplest thing that makes
sense and *works*, all "you're doing it wrong" trolling aside.

>
> It is exactly like using regex for parsing HTML. It might work in
> some circumstances but it will fail often.
>

Have you ever even *used* XSLT to process XHTML?  Because I'm having a
difficult time believing that statement came from an expert.  I've
certainly found it works better to use a tool that groks elements and
attributes than to code spaghetti regex to identify them before
processing.

If I'm only allowing <i>, <b>, <em>, and <strong> as input, with no
attributes or other elements allowed, and want <i> transformed to <em>
and <b> to <strong>, the amount of XSLT code required works out to
fewer characters than this sentence took to type, and works as expected
100% of the time, with the same code working on the client and/or the
server (less maintenance that way).
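
For illustration only, a transform along those lines might look like the
sketch below.  It is not the code referred to above.  It assumes XSLT 1.0,
input in the XHTML namespace, and input already restricted to the four
allowed elements with no attributes; the "h" prefix and the identity
template are choices made for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="h">

  <!-- Identity template: copy anything not matched below unchanged,
       so <em>, <strong>, and text pass straight through.
       Assumption: the input was validated beforehand. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- <i> becomes <em>; children are processed recursively. -->
  <xsl:template match="h:i">
    <em><xsl:apply-templates/></em>
  </xsl:template>

  <!-- <b> becomes <strong>; children are processed recursively. -->
  <xsl:template match="h:b">
    <strong><xsl:apply-templates/></strong>
  </xsl:template>

</xsl:stylesheet>
```

Because the templates match on element names rather than on text
patterns, nested markup such as <b><i>x</i></b> needs no special casing,
which is the contrast with the regex approach.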

To say otherwise is to propagate FUD, IMO, but it's what I've come to
expect from WHATWG folks.  The dismissive attitude, that any approach
which deviates from what the browser vendors want is wrong and therefore
irrelevant, may be what's best for your companies, but it is a
DISSERVICE to the larger community.  HTML/HTTP?  You didn't build that.
XML?  Gets real work done for many developers regardless of how despised
it is in some quarters.  XSLT included.

How I would accomplish this same goal with an HTML parser eludes me.
The only rules in play for my purpose are the syntax and semantics of
the language -- not any processing rules.  Thus, my point remains:
there are uses of HTML which have nothing to do with its parsing rules,
therefore it's a mistake to tie text/html to those parsing rules, rather
than the more-generally-reusable syntax and semantics of the language.

Is HTML really only meant for browsers to use?  Has the advent of the
HTML 5 parser really obsoleted all other methods of processing HTML?
If the answers to those two questions are "no", then the media type
should tie to the author document, not to the parser's processing rules.

-Eric

Received on Tuesday, 9 October 2012 19:07:03 UTC