Re: XMLLiteral handling in RDFa in HTML

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 27 May 2009 18:12:47 +0000 (UTC)
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: Toby Inkster <tai@g5n.co.uk>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, HTMLWG WG <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0905271807470.10857@hixie.dreamhostps.com>
On Tue, 26 May 2009, Manu Sporny wrote:
> 
> I don't believe that there is any such thing as a malformed XMLLiteral 
> in HTML5... is there? Can anybody think of an example of an invalid 
> XMLLiteral in an html5 parser?

If you're asking if an HTML5 parser can generate a DOM that cannot be 
serialised as XML, the answer is yes, there are a number of ways to do 
this. The easiest way is for the text/html source to have an element or 
attribute with a colon in it, as in:

   <html foo:bar>

Another possibility is a comment with two consecutive dashes:

   <!-- -- -->

Another example would be a form feed character (U+000C). For example, if 
a plain text RFC is parsed as text/html, the resulting DOM would contain 
U+000C characters that cannot be converted to XML.

If scripts have been able to mutate the DOM, there are even more ways for 
the DOM to not be serialisable.
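
To make the three cases above concrete, here is a minimal sketch in Python 
of the kind of checks a consumer could run before trying to emit XML. The 
function names are hypothetical, and the name check is a deliberately 
simplified ASCII-only approximation of an XML NCName, not the full 
production from the XML specifications.

```python
import re

# Simplified, ASCII-only approximation of an XML NCName (assumption: the
# real production also allows many non-ASCII characters). No colon allowed.
_NCNAME = re.compile(r"^[A-Za-z_][A-Za-z0-9._-]*$")


def _is_xml_char(ch):
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_serialisable_name(local_name):
    """Element or attribute local names must not contain a colon."""
    return _NCNAME.match(local_name) is not None


def is_serialisable_comment(data):
    """XML comments may not contain '--' and may not end with '-'."""
    return "--" not in data and not data.endswith("-")


def is_serialisable_text(data):
    """Text may only contain XML Char code points (U+000C is not one)."""
    return all(_is_xml_char(ch) for ch in data)


if __name__ == "__main__":
    print(is_serialisable_name("foo:bar"))      # attribute from <html foo:bar>
    print(is_serialisable_comment(" -- "))      # data of <!-- -- -->
    print(is_serialisable_text("page\x0cnext")) # form feed from a plain text RFC
```

All three calls in the usage section report that the input cannot be 
written out as XML without some change.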

These issues are discussed in two places in HTML5. One is the rules for 
coercing an HTML DOM to an Infoset:

   http://www.whatwg.org/specs/web-apps/current-work/#coercing-an-html-dom-into-an-infoset

The other is the rules for serialising a DOM to XML:

   http://www.whatwg.org/specs/web-apps/current-work/#serializing-xhtml-fragments
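
As a rough illustration of what coercion in the spirit of the first set of 
rules might look like, the Python sketch below rewrites the offending data 
rather than rejecting it. The helper names are hypothetical and the exact 
mappings (a "U" plus six hex digits for bad name characters, a space 
between consecutive dashes, a space for form feed, U+FFFD for other 
non-XML characters) are my assumptions about those rules; check them 
against the spec text before relying on them.

```python
def _is_xml_char(ch):
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def coerce_name(name):
    """Replace characters not valid in a (simplified, ASCII-only) XML
    local name with 'U' followed by six uppercase hex digits."""
    out = []
    for i, ch in enumerate(name):
        is_start = ch.isascii() and (ch.isalpha() or ch == "_")
        is_rest = ch.isascii() and (ch.isdigit() or ch in ".-")
        ok = is_start or (i > 0 and is_rest)
        out.append(ch if ok else "U%06X" % ord(ch))
    return "".join(out)


def coerce_comment(data):
    """Break up runs of dashes and avoid a trailing dash."""
    while "--" in data:
        data = data.replace("--", "- -")
    if data.endswith("-"):
        data += " "
    return data


def coerce_text(data):
    """Map form feed to a space and other non-XML characters to U+FFFD."""
    return "".join(
        " " if ch == "\x0c" else (ch if _is_xml_char(ch) else "\ufffd")
        for ch in data
    )


if __name__ == "__main__":
    print(coerce_name("foo:bar"))
    print(repr(coerce_comment(" -- ")))
    print(repr(coerce_text("page one\x0cpage two")))
```

Note that this kind of rewriting is lossy: once coerced, the output no 
longer distinguishes a name that was changed from one that was written 
that way in the source.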

HTH,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 27 May 2009 18:13:44 GMT
