
Re: New TAG issue: TagSoupIntegration-54

From: Elliotte Harold <elharo@metalab.unc.edu>
Date: Wed, 01 Nov 2006 07:25:35 -0500
Message-ID: <4548923F.8070708@metalab.unc.edu>
To: www-tag@w3.org

Vincent.Quint@inrialpes.fr wrote:
> All,
> 
> On 24 October 2006, the TAG has accepted a new issue
> TagSoupIntegration-54:
> 
>   Is the indefinite persistence of 'tag soup' HTML consistent with a
>   sound architecture for the Web? If so, what changes, if any, to
>   fundamental Web technologies are necessary to integrate 'tag soup'
>   with SGML-valid HTML and well-formed XML?
> 
> It is now part of the TAG issues list. Refer to the list for more
> details and to track future progress:
> 

A straw man:

1. Tag soup isn't going away, no matter what we say.

2. Tag soup is a good idea in itself. It makes authoring easier, and 
means readers do not encounter errors they are not responsible for and 
cannot fix.

3. However, error recovery causes browser interoperability problems. 
This is partly the problem XML's draconian error handling was designed to solve.

4. Well-formed, valid XHTML is very useful for machine processing, 
including JavaScript.

5. To resolve this conflict, we need a means of converting tag soup into 
valid XHTML that is invisible to a typical end user.


Here is what I propose:

The W3C should define a process by which *any* arbitrary byte stream can be 
converted into valid XHTML, no matter what. This would act as a filter 
on incoming data. Browsers in normal operation would be expected to 
apply this filter before rendering a document, constructing a DOM, or 
doing pretty much anything else with a page that purports to be HTML. 
Other tools could use this as well.
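
As a rough illustration of the filter idea, and not a proposal for the 
actual algorithm, here is a minimal Python sketch. Everything in it is an 
assumption made for illustration: the names soup_to_xml, SoupFilter, VOID, 
NAME and BAD_CHARS are hypothetical, the input is assumed to be UTF-8, and 
the recovery rules (close whatever is still open, ignore stray end tags, 
first duplicate attribute wins) are arbitrary choices. It produces 
well-formed XML with a single html root, not valid XHTML.

```python
import re
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

VOID = {"br", "hr", "img", "input", "link", "meta"}  # elements with no content
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")       # conservative XML name
# Characters that XML 1.0 does not allow at all.
BAD_CHARS = re.compile("[^\t\n\r\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]")


def clean(text):
    """Drop characters that may not appear in an XML 1.0 document."""
    return BAD_CHARS.sub("", text)


class SoupFilter(HTMLParser):
    """Collects well-formed markup while tracking the open elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out, self.stack = [], []

    def handle_starttag(self, tag, attrs):
        # The root is supplied by soup_to_xml; unusable names are dropped.
        if tag == "html" or not NAME.fullmatch(tag):
            return
        kept = {}
        for name, value in attrs:
            if NAME.fullmatch(name):
                # First duplicate wins; a bare attribute gets its own name.
                kept.setdefault(name, name if value is None else value)
        text = "".join(f" {n}={quoteattr(clean(v))}"
                       for n, v in sorted(kept.items()))
        if tag in VOID:
            self.out.append(f"<{tag}{text}/>")
        else:
            self.out.append(f"<{tag}{text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Stray end tags are ignored; otherwise close down to the match.
        if tag in self.stack:
            while True:
                open_tag = self.stack.pop()
                self.out.append(f"</{open_tag}>")
                if open_tag == tag:
                    break

    def handle_data(self, data):
        self.out.append(escape(clean(data)))


def soup_to_xml(raw: bytes) -> str:
    """Turn any byte string into a well-formed XML document."""
    text = raw.decode("utf-8", errors="replace")
    parser = SoupFilter()
    try:
        parser.feed(text)
        parser.close()
        closers = "".join(f"</{t}>" for t in reversed(parser.stack))
        body = "".join(parser.out) + closers
    except Exception:
        # Last resort: treat the whole input as character data.
        body = escape(clean(text))
    return f"<html>{body}</html>"
```

For example, soup_to_xml(b"<p>Hello <b>world<br><p class=x>A & B") returns 
a string that an XML parser accepts, with the unclosed p and b elements 
closed at the end. A real specification would have to pin down every one 
of these recovery choices, which is exactly the hard part.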

This process must be fully determinate. That is, two independent 
implementations must always produce the same XHTML version modulo 
insignificant details like white space inside tags or quotes around 
attribute values.
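
One way to make "the same modulo insignificant details" testable is to 
compare canonical forms rather than raw bytes. The sketch below assumes 
Python 3.8 or later and uses the standard library's C14N 2.0 support as a 
stand-in for whatever equivalence the specification would define; the 
function name same_modulo_syntax is hypothetical. Note that 
canonicalization also ignores attribute order and the difference between 
<br/> and <br></br>, which is my assumption about what should count as 
insignificant.

```python
from xml.etree.ElementTree import canonicalize


def same_modulo_syntax(xhtml_a: str, xhtml_b: str) -> bool:
    """True if two well-formed documents differ only in lexical details
    such as quote style, white space inside tags, or attribute order."""
    return canonicalize(xhtml_a) == canonicalize(xhtml_b)


# Output of two hypothetical implementations for the same input.
impl_a = "<p class='x'   id=\"y\" >hi<br /></p>"
impl_b = '<p id="y" class="x">hi<br></br></p>'
print(same_modulo_syntax(impl_a, impl_b))
```

A conformance suite could run each implementation over a shared corpus of 
broken pages and apply a check like this to every pair of outputs.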

This must be both a specification *and* a normative reference 
implementation.

Fortunately we have at least one existence proof of such a product and 
it is called, obviously enough, TagSoup: 
http://home.ccil.org/~cowan/XML/tagsoup/

I understand there is also work along these lines going on in the HTML 5 
community, though I am not intimately familiar with it.


-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/
Received on Wednesday, 1 November 2006 12:26:02 GMT
