[whatwg] Drop-in parsers (was: Re: Provding Better Tools)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Wed, 6 Dec 2006 02:25:22 +0200
Message-ID: <DD8B4AE5-7249-45AF-BD92-B62F3ECBCE85@iki.fi>
On Dec 5, 2006, at 16:07, Thomas Broyer wrote:

> 2006/12/5, Mike Schinkel:
>> >> I've just started (today) a .NET implementation (in C#):
>> >> a parser as an XmlReader subclass and writers as XmlWriter
>> >> and HtmlTextWriter subclasses.

Cool! I think making the HTML5 implementations drop-in replacements
for the normal XmlReader and XmlWriter implementations is an
excellent approach. However, some parts of the prescribed error
correction may not be possible with a truly streaming XmlReader. For
full flexibility and correctness, it would therefore be necessary to
provide two modes: a true streaming mode that treats
streaming-incompatible errors as Draconian fatal errors, and a
tree-buffering fake streaming mode that handles those errors in the
buffered tree.
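To make the two modes concrete, here is a minimal Python sketch (not
taken from any existing parser) built around one streaming-incompatible
fix-up, foster parenting: non-whitespace text that appears directly
inside a table is moved in front of the table. The token format and the
names `stream`, `buffered`, `Node` and `StreamingIncompatibleError` are
hypothetical, and the rule is simplified (real HTML5 also foster-parents
in tbody/thead/tfoot/tr contexts and moves misplaced elements, not just
text).

```python
class StreamingIncompatibleError(Exception):
    """Fatal error: a fix-up would need to move content already emitted."""


def stream(tokens, emit):
    """True streaming mode: forward each (kind, value) token immediately."""
    open_elements = []
    for kind, value in tokens:
        if kind == "start":
            open_elements.append(value)
        elif kind == "end":
            open_elements.pop()
        elif open_elements and open_elements[-1] == "table" and value.strip():
            # The <table> start tag has already been emitted downstream, so the
            # text can no longer be placed before it: fail in a Draconian way.
            raise StreamingIncompatibleError(f"text {value!r} directly in <table>")
        emit(kind, value)


class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []  # child Nodes and text strings, in document order


def buffered(tokens, emit):
    """Fake streaming mode: build a tree, repair it, then replay it as events."""
    root = Node("#document")
    current = root
    for kind, value in tokens:
        if kind == "start":
            node = Node(value, current)
            current.children.append(node)
            current = node
        elif kind == "end":
            current = current.parent
        elif current.name == "table" and value.strip():
            # Simplified foster parenting: insert the text just before the table.
            siblings = current.parent.children
            siblings.insert(siblings.index(current), value)
        else:
            current.children.append(value)
    _replay(root, emit)


def _replay(node, emit):
    """Walk the repaired tree and emit start/text/end events in order."""
    for child in node.children:
        if isinstance(child, str):
            emit("text", child)
        else:
            emit("start", child.name)
            _replay(child, emit)
            emit("end", child.name)


tokens = [("start", "body"), ("start", "table"), ("text", "oops"),
          ("end", "table"), ("end", "body")]
events = []
buffered(tokens, lambda kind, value: events.append((kind, value)))
print(events)
try:
    stream(tokens, lambda kind, value: None)
except StreamingIncompatibleError as err:
    print("streaming mode gave up:", err)
```

In buffered mode the text "oops" is reported before the table's start
tag; in streaming mode the same input is a fatal error, because the
table's start tag has already gone out by the time the text arrives.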

Hopefully, my conformance checker efforts will, as a side effect,
produce a parser written in Java that can be extended to cover
general Java needs as a drop-in SAX/DOM/XOM-compatible parser. (The
conformance checker only needs a true streaming SAX parser that
treats streaming-incompatible errors as fatal. I have a design in my
head that goes beyond the conformance checker's needs, but I have
many other competing action items to attend to, so please consider
this vaporware. I can't promise anything.)

In general, I think HTML5 parser implementations should target the  
most important XML APIs for a given language. For Python, this would  
likely mean the Python flavor of SAX (again with partly Draconian
true streaming or buffering fake streaming), DOM and ElementTree. For  
Ruby, this would mean a REXML-compatible implementation. For C, it  
would make sense for an HTML5 parser to integrate into libxml2. I  
believe such a C implementation would eventually benefit PHP, too.
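For the Python SAX case, "drop-in" would mean that an unmodified
`xml.sax.handler.ContentHandler` can consume HTML. The sketch below
assumes nothing about any existing HTML5 parser: it uses the standard
library's `html.parser.HTMLParser` purely as a stand-in tokenizer. That
module is not an HTML5 parser and does no tree construction, so
unbalanced markup such as `<br>` yields unbalanced events. `SaxAdapter`
and `Collector` are hypothetical names.

```python
from html.parser import HTMLParser
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class SaxAdapter(HTMLParser):
    """Hypothetical adapter: reports html.parser events to a SAX ContentHandler."""

    def __init__(self, handler):
        super().__init__(convert_charrefs=True)
        self.handler = handler

    def parse(self, markup):
        self.handler.startDocument()
        self.feed(markup)
        self.close()
        self.handler.endDocument()

    def handle_starttag(self, tag, attrs):
        # html.parser has already lower-cased tag and attribute names;
        # valueless attributes (e.g. <input disabled>) arrive as None.
        values = {name: value if value is not None else "" for name, value in attrs}
        self.handler.startElement(tag, AttributesImpl(values))

    def handle_endtag(self, tag):
        self.handler.endElement(tag)

    def handle_data(self, data):
        self.handler.characters(data)


class Collector(ContentHandler):
    """An ordinary SAX handler that needs no HTML-specific changes."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


collector = Collector()
SaxAdapter(collector).parse('<P CLASS="note">Hi</P>')
print(collector.events)
```

The design point is that `Collector` is plain SAX code; only the object
producing the events changes.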

Of course, in all these cases, the element names should be reported
in lower case, unlike in browsers (where the DOM tagName of an HTML
element is upper case).
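A minimal sketch of the lower-case convention with the ElementTree
target mentioned above: it feeds `xml.etree.ElementTree.TreeBuilder`
from the standard library's `html.parser.HTMLParser`, used only as a
stand-in tokenizer (it is not an HTML5 parser). `EtreeAdapter` and
`parse_to_etree` are hypothetical names. Because `HTMLParser` already
lower-cases tag and attribute names, upper-case source markup ends up
as lower-case names in the tree.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class EtreeAdapter(HTMLParser):
    """Hypothetical adapter: builds an ElementTree from html.parser events."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive already lower-cased.
        self.builder.start(tag, {name: value or "" for name, value in attrs})

    def handle_endtag(self, tag):
        self.builder.end(tag)

    def handle_data(self, data):
        self.builder.data(data)


def parse_to_etree(markup):
    adapter = EtreeAdapter()
    adapter.feed(markup)
    adapter.close()
    # TreeBuilder.close() returns the root Element; this sketch does no
    # error correction, so the markup must already be balanced.
    return adapter.builder.close()


root = parse_to_etree('<DIV ID="main"><P>Hello</P></DIV>')
print(root.tag, root.get("id"), root[0].tag, root[0].text)
```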

>> What license will you release under?
>
> Probably the MIT licence, I'm not sure yet...

+1

The Python and Java projects I know of also use the MIT license*.

If the goal is to drive adoption, the MIT license is great, because  
it is a Free Software license according to the FSF, an Open Source  
license according to OSI, Debian-approved (relevant even to C#  
because of Mono), GPL-compatible and suitable for embedding in  
proprietary products as well.

* http://www.opensource.org/licenses/mit-license.php

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
Received on Tuesday, 5 December 2006 16:25:22 UTC