- From: Geoffrey Sneddon <foolistbar@googlemail.com>
- Date: Sat, 23 May 2009 13:57:32 +0100
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: Philip Taylor <pjt47@cam.ac.uk>, Sam Ruby <rubys@intertwingly.net>, Shane McCarron <shane@aptest.com>, RDFa Community <public-rdfa@w3.org>, "public-rdf-in-xhtml-tf.w3.org" <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
On 23 May 2009, at 13:34, Julian Reschke wrote:

>>>> For this to make sense in real HTML implementations, the
>>>> definition should be in terms of the document layer rather than
>>>> the byte layer.
>>>
>>> Disagreed. Many implementations never build a DOM. We're not only
>>> talking about browsers here.
>>
>> By "DOM" I generally mean any kind of tree structure of elements
>> and attributes, either as an explicit data structure (DOM, XOM,
>> ElementTree) or implicit (SAX). Would any RDFa implementation *not*
>> parse the input HTML into that kind of structure and operate over
>> the elements and attributes as distinct objects? (e.g. would they
>> just use regular expressions over the input byte stream? That seems
>> quite infeasible to me...)
>
> Depends on the definition of "tree structure". I've been involved in
> code that just uses a tokenizer and specialized stack, and
> implementations like these will not do the re-arranging of elements
> the HTML5 spec specifies for some kinds of broken input.

Still, specifying it relative to a DOM is not a problem, as you can infer the elements and text nodes from the token stream until you reach the point where HTML 5 requires you to throw a fatal error (i.e., when you can no longer parse per spec from the stream, because you can't reorder the elements).

-- 
Geoffrey Sneddon <http://gsnedders.com/>
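A rough sketch of the approach described in the reply (building elements and text nodes from a token stream with a stack of open elements, and stopping where HTML 5 would require rearranging nodes already built) might look like this. It is only an illustration under stated assumptions: the token format and the names `build_tree`, `Element` and `FatalParseError` are hypothetical, attributes, void elements and implied end tags are ignored, and every mismatched end tag is treated as needing rearrangement, which is much stricter than the real HTML5 tree-construction rules.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple, Union

# Hypothetical token format: ("start", tag), ("end", tag) or ("text", data).
Token = Tuple[str, str]


class FatalParseError(Exception):
    """Raised when continuing would require rearranging nodes already built."""


@dataclass
class Element:
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


def build_tree(tokens: Iterable[Token]) -> Element:
    """Build a tree incrementally from a token stream using a stack.

    Simplification (assumption, not HTML5 behaviour): any end tag that does
    not close the current node is treated as input that would need
    re-arranging, so we stop with a fatal error instead of reordering.
    """
    root = Element("#document")
    stack = [root]  # stack of open elements; stack[-1] is the current node
    for kind, value in tokens:
        if kind == "start":
            el = Element(value)
            stack[-1].children.append(el)
            stack.append(el)
        elif kind == "end":
            if len(stack) == 1 or stack[-1].name != value:
                raise FatalParseError(
                    f"mis-nested </{value}>; cannot continue without reordering"
                )
            stack.pop()
        elif kind == "text":
            stack[-1].children.append(value)
        else:
            raise ValueError(f"unknown token kind: {kind!r}")
    # Elements still open at end of input are simply left as they are.
    return root
```

For example, the classic mis-nested input `<b>1<p>2</b>3</p>` stops at `</b>` here, because an HTML5 parser would move the already-built `p` element out of `b` (the adoption agency algorithm), which a pure stream-and-stack implementation cannot do.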
Received on Saturday, 23 May 2009 12:58:20 UTC