- From: <lee@sq.com>
- Date: Wed, 30 Oct 96 23:14:13 EST
- To: jenglish@crl.com, w3c-sgml-wg@w3.org
Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU> wrote: > Agreed unanimously: > - to use the string '/>' as the NET delimiter in the SGML > declaration of XML documents Joe English wrote: > I strongly urge the ERB to reconsider this. > > The /> NET trick is extremely fragile. Most current > SGML parsers cannot even implement it, and those that do > cannot enforce its proper usage. Agreed. Note also that allowing <e> means that either (1) every XML parser has to be able to cope with parsing a DTD, merely to deal with <e>, or (2) sending an XML file without a DTD is an error if it contains an element that was declared with content model EMPTY. However, an XML parser that doesn't handle a DTD won't be able to cope with this error easily -- it will simply assume containment, and get to the end of the document. Probably it will then close all open elements, maybe issue a "document may have been truncated' error, and cheerfully continue, with a *completely wrong* interpretation of the document. Or, if you prefer, with a dramatically different ESIS. This is unacceptable. I realise <@e> wasn't very popular. But it has neither of these failure cases, and, unlike the NET proposal, is robust against error -- it would be an error to put an @ in front of an element not declared EMPTY, because until SGML's syntax can be improved, the @ would be in the element declaration. Afterwards, it could be implied -- you'd set the EMPTY ELEMENT START TAG OPEN delimter to "<@". Is error checking really so hard to figure out? Ask yourself (a) what happens if I forget to use the special marker? (1) with <e/ or <e/>: If there's a DTD, and we're in an SGML application nothing. NET is always optional. If we're an XML application using NSGMLs or some other ESIS-providing system (but can't be SGMLS, as it won't let you do the NET trick I think), it cannot be detected, since ESIS doesn't include minimisation information. An XML-specific application convention might say that an element using the NET feature must declared EMPTY. In that case, when we reach the end of the document, we declare a parse error and exit. (see (c) for the no DTD case) (2) with @ If there's a DTD, we detect the error immmediately, with today's unmodified SGML. Note: although the @ is part of the element name (a kludge, I admit), it is not polluting the reference concrete namespace -- there cannot be any existing document types using the RFC in which GIs began with "@".] (3) with <e> If you put a </e> in the document, and there's a DTD, it's an error and is detected. (b) what happens if I use the special marker where I shouldn't? (1) with <e/ or <e/>: It's legal in SGML. People using SGML tools will get very confused. But maybe this will help discourage that practice, when XML tools become available. In XML with a DTD, you'll get an error. If the XML parser is just an SGML parser and uses ESIS, NET information is not available to it, and it is no longer an error (since it's legal in SGML). (2) with @ If there is a DTD (whether ESIS or no, whether XML or SGML) an error is immediately given at the correct point of the error. (3) with <e> Meaningless in this case. No error. (c) what happens without a DTD, when I don't make an error? (1) with <e/ or <e/>: With XML and the application convention, everything is OK. You can't use NSGMLS and ESIS to implement this, though, since the fact tht NET was used is not included in ESIS. (Using the grove mechanism is probably harder than writing a parser in C from scratch for most non-SGML programmers, I would guess) (2) with @ The @ is part of the element name (with unmodified SGML) and gets passed through to the ESIS and the application. Hence, this approach works in all cases. It might not work with SGMLS, as it's not pure reference concrete syntax any more. But it should work with NSGMLS out-of-the-box (yes? James?) (3) with <e> This doesn't work without a DTD> You're hosed. Broken. Back to the drawing board. We have two languages if this is allowed. Language (1) is only for documents with a DTD. Language (2), XML, does not have empty elements. Note that allowing both <e> and <e/ is harder to implement than only allowing one of them -- it's harder than either of them alone. And allowing <e> at all means that every XML parser has just got a lot harder to write. If you allow <e>, you might as well not bother with <e/>, as it doesn't save the parser writer anything (still has to deal with old-fashioned EMPTY elements). Worse, neither of the two mechanisms accepted by the ERB has good error handling characteristics (as Joe English pointed out and I have attempted to enumerate). Oh well. If we stick with this, I propose to implement _only_ the DTDless XML, with no support for EMPTY at all, for my own use. It'll be _much_ quicker to write the code. If my implementation gets spread around quickly enough, it will become the defacto standard. Later, maybe I'll add #include nd other neat features I like. Marc. :-) Lee
Received on Wednesday, 30 October 1996 23:15:06 UTC