Re: Comments on HTML WG face to face meetings in France Oct 08 from Anne van Kesteren on 2009-01-21 (www-tag@w3.org from January 2009)

From: Anne van Kesteren <annevk@opera.com>
Date: Wed, 21 Jan 2009 10:40:49 +0100
To: noah_mendelsohn@us.ibm.com, elharo@metalab.unc.edu
Cc: www-tag <www-tag@w3.org>
Message-ID: <op.un3o6bh664w2qv@annevk-t60.oslo.opera.com>

On Tue, 20 Jan 2009 21:17:12 +0100, <noah_mendelsohn@us.ibm.com> wrote:
> Consider, though, a different use case, in which some of the same XHTML
> documents are to be stored in an XML database and their attributes and
> other data used as the subjects of queries.  Now you have an interesting
> tension.  The database will presumably deal only with well formed XML
> documents, which means that the messier content that browsers deal with
> won't work in the database, at least not in the obvious way.  On the
> other hand, the positive value of the layering becomes a bit clearer.
> The XML specification describes the subset of the documents that will
> work in tools like the XML database.  Conforming XML parsers will accept
> those documents and reject others (though, as Elliotte points out,
> nothing prevents those parsers from handing the input text up to a
> browser, that may still decide to render it.)

Why can the author not use an HTML parser for the database? Henri Sivonen,
for example, has written tools for parsing HTML in Java that conform to the
HTML5 specification (meaning that you get the same tree as browsers get) and
that plug directly into an XML toolchain if desired, so you can use XSLT etc.
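
To illustrate the general idea of putting a lenient HTML parser in front of
an XML toolchain, here is a minimal sketch in Python using only the standard
library. It is not Henri Sivonen's Java parser and it does not implement the
HTML5 parsing algorithm, so the tree can differ from what browsers build.
The names LenientTreeBuilder and html_to_tree are hypothetical, as is the
synthetic "root" wrapper element.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}


class LenientTreeBuilder(HTMLParser):
    """Hypothetical helper: builds an ElementTree from messy HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("root")  # synthetic wrapper (assumption)
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as empty strings.
        clean = {k: (v if v is not None else "") for k, v in attrs}
        el = ET.SubElement(self.stack[-1], tag, clean)
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, implicitly
        # closing anything left open inside it. Stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_tree(text):
    """Parse non-well-formed HTML into an ElementTree element."""
    parser = LenientTreeBuilder()
    parser.feed(text)
    parser.close()
    return parser.root


# Unquoted attributes, an unclosed <img>, and a missing </p>.
messy = '<div id=main><img src=a.png alt="A logo"><p class=note>Unclosed</div>'
tree = html_to_tree(messy)

# The result serializes as well-formed XML that XML tools can consume.
xml_bytes = ET.tostring(tree)
reparsed = ET.fromstring(xml_bytes)

# Query attributes, as one might in an XML database.
notes = reparsed.findall(".//*[@class='note']")
```

The serialized bytes could then be handed to an XML database or an XSLT
processor, while the published document stays as it was.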

It seems to me that a toolchain problem is much better solved at the
toolchain level than in the format which is used in the database and
published on the Web.

-- 
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>

Received on Wednesday, 21 January 2009 09:41:44 UTC