DOCTYPE processing (and more)

On Mon, 20 Feb 2012 23:30:14 +0100, David Carlisle <davidc@nag.co.uk>  
wrote:
> On 20/02/2012 15:01, Anne van Kesteren wrote:
>> http://dvcs.w3.org/hg/xml-er/raw-file/tip/Overview.html
>
> * Every use of <!DOCTYPE is a parse error
> (No comment on whether this is good or bad, just checking if I'm reading
> your grammar correctly)

I suggest not reading too much into the various things called out as
"parse error" for now. Flagging them was just an idea to steer developers
away from DOCTYPEs, which are somewhat nasty and do not really provide
much benefit. I can remove the "parse error" indications for the time
being so we can introduce them as a group, if people think that is better.


> * There is no checking of element and attribute names: any Unicode
> characters other than the minimum syntax characters (white space,
> <=/ etc.) are allowed.

Yeah, that and possibly coercion are still to be defined. However, I think
the appropriate place for that is not tokenization but the point where
you actually start creating elements:

http://dvcs.w3.org/hg/xml-er/raw-file/tip/Overview.html#create-an-element-for-the-token
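
To illustrate what a name check at element creation (rather than in the
tokenizer) could look like, here is a minimal sketch in Python. It is an
assumption for discussion only: the draft does not define any such check
yet, and the helper name is_xml_name is hypothetical. The character ranges
are those of the Name production in XML 1.0 (Fifth Edition).

```python
import re

# NameStartChar ranges from the XML 1.0 (Fifth Edition) Name production.
_NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF"
    "\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)

# NameChar adds "-", ".", digits, U+00B7 and two combining/connector ranges.
_NAME_CHAR = _NAME_START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

_NAME_RE = re.compile("[" + _NAME_START + "][" + _NAME_CHAR + "]*")


def is_xml_name(name: str) -> bool:
    """Return True if `name` matches the XML 1.0 Name production.

    Hypothetical helper: it only reports whether a name is legal. What to
    do with an illegal name (coerce, drop, flag) is left open here.
    """
    return _NAME_RE.fullmatch(name) is not None
```

A tree builder could call something like this when it creates an element
for a token and then decide whether to coerce or merely flag the name,
which keeps the tokenizer itself permissive.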


> Just an observation that this means that the result of the parse doesn't
> map well to the XDM or infoset models and so wouldn't work
> with XPath or other XML technologies. (That doesn't necessarily mean
> that this choice is wrong, as there could be a later mapping to XDM
> fixing up illegal names, just flagging this.)

Since name handling is not actually defined yet, I think that conclusion
is premature.


> * DOCTYPE internal subset parsing still takes up a depressingly large
> part of the grammar. It would be nice if some of that could be lost
> (which would have some cost in compatibility with XML), especially in
> light of the first item above :-)

It currently is as simple as it can be (as far as I can tell) while still  
retaining compatibility with XML. If you can think of ways of simplifying  
it, please tell!


-- 
Anne van Kesteren
http://annevankesteren.nl/

Received on Wednesday, 22 February 2012 14:19:37 UTC