- From: David Carlisle <davidc@nag.co.uk>
- Date: Mon, 05 Mar 2012 16:42:56 +0000
- To: public-xml-er@w3.org
- Message-ID: <4F54ED10.90801@nag.co.uk>
On 05/03/2012 09:37, George Cristian Bina wrote:
> Hi,
>
> I think that it will be easier to get a first form finalized if we
> focus on the browsers usecase, and that means mainly getting from
> not well-formed XML to a DOM.
>

Hmm, over the weekend I'd experimented with what you'd need to do to the current draft to change the tag name state to check XML names, as suggested at the start of this thread.

The result is attached. It proved a useful exercise (to me at least :-) whether or not the group decides to go this way, as it forced me to review the current draft states rather more carefully. As the attached isn't tested by code it's almost certainly wrong in parts, but I attach it as some may find the comparison useful. (If anyone does think it a route worth exploring we should probably check it in to source control, but that's probably premature at this stage.)

Comparing the main approaches, there are some differences in how to tokenise <foo<bar (as two tokens, or as one token with a weird name). That choice could be made under either approach and relates mainly to how closely we want to stick to html5.

The main difference is that this version stops scanning for an element name when it gets to a non-Name character. This implies a cost during tokenisation, as (a) the xml-er system has to have a list (or a specification of the code ranges) of the Name characters and (b) it has to check the input stream against them. I still think that XML parsing shows that neither of these costs is prohibitive, and if we were to insist that an xml-er system had a way to serialise its tree to well-formed XML, the same costs would reappear, just in a different (admittedly less used) place.

This version only checks for Name rather than NCName ":" NCName, i.e. XML rules rather than XML Namespaces rules. So, as Mohamed just pointed out, George's example would be OK with this; presumably, however, a similar example could be constructed in which the name was not well-formed XML at all.
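To make the cost in (a) and (b) concrete, here is a minimal Python sketch of scanning a name up to the first non-Name character. It is an illustration only, not the attached draft: the code ranges are the NameStartChar and NameChar productions from XML 1.0 (Fifth Edition), and the function names (scan_name, is_qname) and the sample inputs are hypothetical.

```python
# Code ranges from the XML 1.0 (Fifth Edition) NameStartChar production.
NAME_START_RANGES = [
    (0x3A, 0x3A), (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A),
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Extra ranges allowed by NameChar: "-", ".", digits, #xB7, and two blocks.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7),
    (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch):
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch):
    return is_name_start_char(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def scan_name(text, pos):
    """Return (name, new_pos); name is '' if no XML Name starts at pos."""
    if pos >= len(text) or not is_name_start_char(text[pos]):
        return "", pos
    end = pos + 1
    # Stop at the first character that is not a Name character.
    while end < len(text) and is_name_char(text[end]):
        end += 1
    return text[pos:end], end


def is_qname(name):
    """Stricter namespace-style check: NCName, or NCName ":" NCName."""
    parts = name.split(":")
    return len(parts) <= 2 and all(
        p and is_name_start_char(p[0]) and all(is_name_char(c) for c in p)
        for p in parts
    )
```

With this sketch, scanning "<foo<bar" from position 1 stops at the second "<", giving the name "foo". A made-up name such as "a:b:c" is accepted by the Name check but rejected by is_qname, which is the Name versus NCName ":" NCName distinction described above.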
_If_ the consensus is that we should just target DOM, I think that's a shame, as it severely restricts the ability to position xml-er as an "error recovery xml parser": it wouldn't be usable in most places xml parsers are used. It would, however, be usable on the web, and in that setting I would think that just doing what html5 does for tokenisation would gain a lot more relevance. So in that case I'd argue that we stick very closely to Anne's current draft.

David
Attachments
- text/html attachment: Overview.dpc.html
Received on Monday, 5 March 2012 16:43:28 UTC