Re: HTML 4 Profile for RDFa from Geoffrey Sneddon on 2009-05-23 (public-rdfa@w3.org from May 2009)

From: Geoffrey Sneddon <foolistbar@googlemail.com>
Date: Sat, 23 May 2009 13:57:32 +0100
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Philip Taylor <pjt47@cam.ac.uk>, Sam Ruby <rubys@intertwingly.net>, Shane McCarron <shane@aptest.com>, RDFa Community <public-rdfa@w3.org>, "public-rdf-in-xhtml-tf.w3.org" <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Message-Id: <4ECC998B-CC9D-40D1-B465-BB56BBFB6191@googlemail.com>

On 23 May 2009, at 13:34, Julian Reschke wrote:

>>>> For this to make sense in real HTML implementations, the  
>>>> definition should be in terms of the document layer rather than  
>>>> the byte layer.
>>>
>>> Disagreed. Many implementations never build a DOM. We're not only  
>>> talking about browsers here.
>> By "DOM" I generally mean any kind of tree structure of elements  
>> and attributes, either as an explicit data structure (DOM, XOM,  
>> ElementTree) or implicit (SAX). Would any RDFa implementation *not*  
>> parse the input HTML into that kind of structure and operate over  
>> the elements and attributes as distinct objects? (e.g. would they  
>> just use regular expressions over the input byte stream? That seems  
>> quite infeasible to me...)
>
> Depends on the definition of "tree structure". I've been involved in  
> code that just uses a tokenizer and specialized stack, and  
> implementations like these will not do the re-arranging of elements  
> the HTML5 spec specifies for some kinds of broken input.
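A tokenizer-plus-stack implementation of the kind described above can  
be sketched with Python's standard html.parser module. This is a  
minimal illustration, not anyone's actual RDFa processor: the  
StackScanner class and the attribute subset in RDFA_ATTRS are  
hypothetical, and html.parser is not an HTML5-conformant tokenizer.  
The point is that a plain stack of open element names is enough to  
associate attributes with elements, but the scanner never performs  
the element rearrangement that HTML5 tree construction does.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not stay on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}

# Illustrative subset of RDFa attributes (assumption, not exhaustive).
RDFA_ATTRS = {"about", "property", "rel", "rev", "resource",
              "typeof", "content", "datatype"}


class StackScanner(HTMLParser):
    """Tracks open elements on a plain stack; no tree is ever built."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []   # names of currently open elements
        self.found = []   # (element path, attribute name, value)

    def handle_starttag(self, tag, attrs):
        path = "/".join(self.stack + [tag])
        for name, value in attrs:
            if name in RDFA_ATTRS:
                self.found.append((path, name, value))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Naive recovery: pop back to the nearest matching open element.
        # HTML5 tree construction would sometimes move elements around
        # here instead; this scanner does not.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


scanner = StackScanner()
scanner.feed('<div about="#me"><span property="foaf:name">Geoffrey</span></div>')
scanner.close()
print(scanner.found)
# [('div', 'about', '#me'), ('div/span', 'property', 'foaf:name')]
```

For misnested input such as <b><i></b></i> this scanner just pops  
both elements at </b>, which is exactly the kind of divergence from  
HTML5's tree construction that Julian's point is about.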

Specifying it relative to a DOM is still not a problem: you can infer  
the elements and text nodes from the token stream until you reach the  
point where HTML 5 requires you to throw a fatal error (i.e., when you  
can no longer parse per spec from the stream, because you can't  
reorder the elements).
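A rough sketch of that approach, under stated assumptions: tokens  
arrive as hypothetical tuples ("start", name, attrs), ("end", name)  
and ("text", data); nodes are appended to the tree as soon as each  
token arrives; and any misnested end tag is treated as the fatal  
point. The names build, Node and NeedsReordering are illustrative  
only. This is deliberately more conservative than HTML5 itself, which  
only needs to move nodes in some misnesting cases (e.g. its adoption  
agency algorithm for formatting elements like <b> and <i>), and it  
does not detect other reordering cases such as foster parenting in  
tables.

```python
from dataclasses import dataclass, field

# Partial list of void elements (assumption: enough for the sketch).
VOID = {"br", "hr", "img", "input", "link", "meta"}


@dataclass
class Node:
    name: str                       # element name, or "#text"
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)


class NeedsReordering(Exception):
    """Raised where a streaming builder can no longer follow the spec."""


def build(tokens):
    """Build a tree incrementally from a stream of token tuples."""
    root = Node("#document")
    stack = [root]
    for tok in tokens:
        kind = tok[0]
        if kind == "start":
            node = Node(tok[1], dict(tok[2]))
            stack[-1].children.append(node)   # node exists immediately
            if tok[1] not in VOID:
                stack.append(node)
        elif kind == "text":
            stack[-1].children.append(Node("#text", text=tok[1]))
        elif kind == "end":
            name = tok[1]
            if stack[-1].name == name:
                stack.pop()
            elif any(n.name == name for n in stack[1:]):
                # Misnested end tag: a conforming parser may need to
                # reorder already-emitted nodes, so stop here.
                raise NeedsReordering(f"misnested </{name}>")
            # Otherwise: stray end tag, ignored in this sketch.
    return root


doc = build([("start", "p", {}), ("text", "hi"), ("end", "p")])
print(doc.children[0].name, doc.children[0].children[0].text)  # p hi
try:
    build([("start", "b", {}), ("start", "i", {}), ("end", "b")])
except NeedsReordering as e:
    print(e)  # misnested </b>
```

Everything emitted before the exception is a valid prefix of the  
tree, which is why defining RDFa processing over a DOM does not rule  
out this kind of streaming implementation.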


--
Geoffrey Sneddon
<http://gsnedders.com/>

Received on Saturday, 23 May 2009 12:58:21 UTC