- From: David Carlisle <davidc@nag.co.uk>
- Date: Wed, 29 Feb 2012 13:34:35 +0000
- To: "public-xml-er@w3.org Community Group" <public-xml-er@w3.org>
Jeni said

> And to be specific, my suggestion is that when in the Tag name state
> [2], if the next character is < then this is a Parse Error, and the
> parser emits the current token and reprocesses the current input
> character (<) in the data state.

_If_ we are going to differ from HTML5 at this point I think I would go further.

We have a hard requirement, I think, that any tree have a serialisation as namespace well-formed XML. If we tokenise a start tag at this point that isn't a legal XML name, then inevitably there will have to be some arbitrary character mangling, leading to names such as

oneU00003CtwoU00003CthreeU00003C

How would it work if we split up the tag name state into a series of states so the only characters accepted are

  name start
  optional name - :
  optional :
  name start
  optional name - :

i.e. only namespace well-formed names are accepted, using the XML 1.1/XML 1.0 5th edition definitions of Name Start and Name characters.

In each of these states, if a non-name character is seen it is put back and reprocessed in the data state. If that happens on the first character, the < is put back as data and no tag is tokenised at all.

And the same for attribute names of course.

David
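The series of states proposed above can be sketched roughly as follows. This is only an illustration, not spec text: the function name scan_qname and its helpers are hypothetical, the character ranges are taken from the XML 1.0 5th edition NameStartChar and NameChar productions, and the sketch assumes that ':' is excluded from both character classes and that a ':' not followed by a name start character is put back rather than consumed.

```python
# Ranges from the XML 1.0 (5th edition) NameStartChar production,
# with ':' left out (assumption: names must be namespace well-formed).
NAME_START_RANGES = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

# Extra characters allowed by NameChar after the first position.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start(ch):
    """'name start' in the list above (colon excluded)."""
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch):
    """'name - :' in the list above (any name character except colon)."""
    return is_name_start(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def scan_qname(text, pos):
    """Scan a namespace well-formed name starting at text[pos].

    Returns (name, next_pos), where next_pos is the index of the first
    character that was put back, or None if text[pos] cannot start a
    name (in which case the caller treats the preceding '<' as data).
    """
    n = len(text)
    # State 1: name start.
    if pos >= n or not is_name_start(text[pos]):
        return None
    end = pos + 1
    # State 2: optional name characters, excluding ':'.
    while end < n and is_name_char(text[end]):
        end += 1
    # States 3 and 4: optional ':' that must be followed by a name start.
    if end + 1 < n and text[end] == ":" and is_name_start(text[end + 1]):
        end += 2
        # State 5: optional name characters, excluding ':'.
        while end < n and is_name_char(text[end]):
            end += 1
    return text[pos:end], end
```

Under these assumptions, scanning one<two<three< from just after the initial < yields the name one and leaves the second < to be reprocessed in the data state, so no mangled name is ever produced.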
Received on Wednesday, 29 February 2012 13:35:07 UTC