- From: David Carlisle <davidc@nag.co.uk>
- Date: Mon, 30 Apr 2012 09:43:08 +0100
- To: Larry Masinter <masinter@adobe.com>
- Cc: Robin Berjon <robin@berjon.com>, "Bjoern Hoehrmann (derhoermi@gmx.net)" <derhoermi@gmx.net>, "www-tag@w3.org" <www-tag@w3.org>
On 30/04/2012 03:54, Larry Masinter wrote:

[not sure why this was cc'ed to me rather than to the xml-er list, but anyway...]

> Since we're talking about XML-ER. I can't tell from looking at the doc
> at all how XML-ER deals with unclosed tags.

The only tricky/contentious part of the current xml-er draft is deciding what a tag is. Once you have that, the handling of unclosed tags is fairly trivial (and the same as HTML, apart from the HTML parser's built-in special handling of certain element names). When you reach a close tag you just close all elements on the stack until you reach an element of the right name (or you ignore the close tag if there is no such element), more or less: the devil is in the details, which are in the draft spec.

> So I'll call what is desirable about XML is "self-delimiting" rather than
> "framing", but it's the same idea: if you're looking for <x> elements,
> can you just do a simple string scan for <x> before kicking in a more
> complicated parser. (OK, maybe also you have to scan for <x> OR
> entity declarations.)

There are so many caveats that that is at best only just true. You also have to look for <x > and you have to skip over CDATA sections, comments and processing instructions. Not to mention the black hole of needing to know what character encoding the document is using.

> Self-delimiting is clearly something HTML **doesn't have**, since
> you can't tell whether in <x><y> whether <y> is a sibling or
> child of <x> without knowing something about <x> and <y> and
> their relationship.

xml-er parsing in the current draft has no knowledge of any particular schema, so there is no predefined list of empty/void elements. So <x><y> (if that is the complete document) parses as <x><y/></x>: both are parsed as open tags, and the stack of open elements is closed off at EOF.
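For illustration only, here is a rough Python sketch of the stack handling described above. It assumes the hard part (deciding what a tag is) has already been done, so the input is a list of hypothetical (kind, value) tokens; the function name `recover` and the token shapes are made up for this example and are not taken from the draft spec, which covers many details this ignores.

```python
def recover(tokens):
    """Balance a flat token stream using a stack of open elements.

    tokens: iterable of (kind, value) pairs, where kind is "open",
    "close" or "text" (a hypothetical shape, not from the draft).
    """
    out = []    # serialized output pieces
    stack = []  # names of the elements that are currently open
    for kind, value in tokens:
        if kind == "open":
            stack.append(value)
            out.append(f"<{value}>")
        elif kind == "close":
            if value in stack:
                # Close everything above the innermost match, then the match.
                while True:
                    name = stack.pop()
                    out.append(f"</{name}>")
                    if name == value:
                        break
            # No open element of that name: ignore the stray close tag.
        else:
            out.append(value)
    # At end of input, close off whatever is still open.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)


print(recover([("open", "x"), ("open", "y")]))
```

With no list of void elements, the two open tags simply nest, and both are closed at the end of input, so the printed result is the expanded form of <x><y/></x>.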
David
Received on Monday, 30 April 2012 08:43:34 UTC