Re: XML namespaces on the Web from Simon Pieters on 2009-11-19 (public-xml-core-wg@w3.org from November 2009)

From: Simon Pieters <simonp@opera.com>
Date: Thu, 19 Nov 2009 09:50:47 +0100
To: "John Cowan" <cowan@ccil.org>
Cc: "Lachlan Hunt" <lachlan.hunt@lachy.id.au>, "Liam Quin" <liam@w3.org>, public-html@w3.org, public-xml-core-wg@w3.org
Message-ID: <op.u3mv6xppidj3kv@simon-pieterss-macbook.local>

On Thu, 19 Nov 2009 09:21:14 +0100, John Cowan <cowan@ccil.org> wrote:

> Simon Pieters scripsit:
>
>> Why would one need to reverse engineer an XML parser? It is defined in
>> XML 1.0 what is an error, so one can just read the XML 1.0 spec and
>> modify the XML5 algorithm accordingly.
>
> Sure, it's possible, but it's about equivalent in complexity to writing
> a parser, which has already been done repeatedly.

Yes.


> Wake me up when
> it's finished.

Ok.


>> It's not clear to me that that is a goal. It would be possible by making
>> up a bogus root element, but that seems just bogus. :-)
>
> Fair enough, but then there needs to be some kind of restriction on what
> documents can and cannot be repaired.
>
>> I see "DOCTYPE internal subset state" and in total 38 tokenizer states
>> dedicated to handling the internal subset in
>> http://xml5.googlecode.com/svn/trunk/specification/Overview.html
>
> Yes, it skips the internal subset all right, but there's no indication
> that it uses the information to, for example, correctly implement
> attribute value normalization.  Whitespace characters are added to
> attribute values just like any other characters.

It seems that entity handling is covered. Handling of attribute
declarations is not done yet, but it appears to be intended, since the
draft defines a "list of attribute declarations" and the relevant
tokenizer states have issue markers.
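
For illustration, here is a minimal Python sketch of the declaration-dependent
part of attribute-value normalization as I read XML 1.0 section 3.3.3. It is
not taken from the XML5 draft. The function name normalize_attribute_value and
the attr_type parameter are hypothetical, and the sketch assumes that
line-ending normalization and expansion of character and entity references
have been handled elsewhere.

```python
def normalize_attribute_value(raw, attr_type="CDATA"):
    """Sketch of XML 1.0 attribute-value normalization (section 3.3.3).

    Hypothetical helper: ignores character/entity references and assumes
    line endings were already normalized by the parser.
    """
    # Step 1: every literal tab, line feed, or carriage return becomes
    # a single space (#x20), regardless of the declared type.
    out = "".join(" " if ch in "\t\n\r" else ch for ch in raw)

    # Step 2: only for attributes declared with a non-CDATA type, strip
    # leading/trailing spaces and collapse runs of #x20 into one.
    # split(" ") is used on purpose so that only #x20 is collapsed.
    if attr_type != "CDATA":
        out = " ".join(part for part in out.split(" ") if part)
    return out


# An undeclared attribute is treated as CDATA, so inner runs survive.
cdata_value = normalize_attribute_value(" a\tb  c ")
# With a declaration such as NMTOKENS, the value is also collapsed.
tokens_value = normalize_attribute_value(" a\tb  c ", "NMTOKENS")
```

The second step is the part that depends on the "list of attribute
declarations": without the declared type, a parser can only apply the CDATA
behaviour.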

-- 
Simon Pieters
Opera Software

Received on Thursday, 19 November 2009 08:51:52 UTC