[whatwg] XML databases, XML syntax and HTML5

From: Elliotte Harold <elharo@metalab.unc.edu>
Date: Fri, 08 Dec 2006 19:54:37 -0500
Message-ID: <457A094D.4040702@metalab.unc.edu>

Alexey Feldgendler wrote:

> LiveJournal, a popular blogging service, inserts hand-authored content into hand-authored templates. While the templates are written by competent authors who (mostly) know how to write proper HTML, blog posts are most often written by people who have barely learnt how to use a handful of tags. LiveJournal does some simple preprocessing (breaks paragraphs on newlines and strips dangerous markup like <script>) but otherwise leaves the content as is. That's why most blog pages on LiveJournal aren't even close to being valid HTML.

A week ago, I would have responded that LiveJournal should use TagSoup 
or equivalent to clean up the markup before serving it.

That's still true. However, after spending the last few days at XML 
2006, I have a new perspective on such systems. 
In particular I now believe that the relational databases that back 
these sites are fundamentally the wrong technology. As Mark Logic's 
Jason Hunter put it, they're trying to force triangles into 
rectangle-shaped holes.

I understand why relational databases were used to build blog engines 
and content management systems. For a long time that was all we had. 
However, that's going to change fast. I expect that new systems are 
going to be developed using pure and hybrid XML databases like eXist and 
DB2 9. The advantages to a programmer working on such systems are just 
too compelling to ignore.

One consequence of building on top of a native XML database rather than a 
relational database is that well-formedness is going to become more 
important, not less. In fact, well-formedness is going to become 
essential because these systems cannot store anything less than a fully 
well-formed XML document. I predict that this, if nothing else, is going 
to convince blog engines and content management systems to start fixing 
up malformed content before storing it. Maybe all the legacy systems 
won't convert, but the new ones most certainly will.
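
To make "fixing up malformed content before storing it" concrete, here 
is a minimal sketch in Python. It is not TagSoup and not the API of any 
XML database. The names TagBalancer, is_well_formed, 
prepare_for_storage, and the <post> wrapper element are all 
hypothetical, chosen only for illustration. The idea is to rebalance 
the tags with the standard library's HTML parser, then refuse to store 
anything an XML parser still rejects.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# HTML elements that never have content; emitted as self-closing tags.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TagBalancer(HTMLParser):
    """Hypothetical helper: re-serializes tag soup with balanced tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # dict() drops duplicate attribute names, which XML forbids.
        # A valueless attribute such as "checked" gets its name as value.
        attr_text = "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in dict(attrs).items()
        )
        if tag in VOID_ELEMENTS:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray close tag with no matching open tag: drop it
        # Close everything opened since the matching start tag.
        while self.open_tags:
            current = self.open_tags.pop()
            self.out.append(f"</{current}>")
            if current == tag:
                break

    def handle_data(self, data):
        # Character references were already decoded; re-escape for XML.
        self.out.append(escape(data))

    def result(self):
        self.close()  # flush any buffered text
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def is_well_formed(document: str) -> bool:
    """Return True if an XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


def prepare_for_storage(fragment: str) -> str:
    """Repair a user-supplied fragment and wrap it in a <post> root.

    Raises ValueError if the repaired text is still not well-formed,
    for example because of an unbound namespace prefix.
    """
    balancer = TagBalancer()
    balancer.feed(fragment)
    document = f"<post>{balancer.result()}</post>"
    if not is_well_formed(document):
        raise ValueError("content is still not well-formed; not storing")
    return document
```

With this sketch, the misnested input <b><i>x</b></i> comes back as 
<post><b><i>x</i></b></post>, and an unclosed <p>Hello <b>world<br> 
comes back with the <br/> self-closed and the <b> and <p> closed in 
order. Comments are dropped and invalid element names are not 
repaired, which is why the final well-formedness check stays in place 
as a gate rather than being assumed to pass.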

-- 
Elliotte Rusty Harold  elharo at metalab.unc.edu
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/
Received on Friday, 8 December 2006 16:54:37 UTC