- From: Kent M Pitman <kmp@harlequin.com>
- Date: Sat, 25 Apr 98 01:44:14 EDT
- To: xml-editor@w3.org
- Cc: kmp@harlequin.com
In some places, for example the part between the brackets of a doctypedecl -- and I can't tell you how annoyed I get that these things don't have names, but I'll come back to that at the end -- you permit PEReference as if this were enough to say. To me, it is not. Can a PEReference expand into more than one markupdecl in that context? In general, how much can a PEReference expand into?

If you're writing a parser, and you're at the '[' part, the next thing you do is to enter a loop parsing things. Well, hmm. The loop looks like on each iteration it might find "nothing" (S), a PEReference, or a markup declaration. Now, could a PEReference become more than one markup declaration? To me, at this point in the code, I think the answer is NO. If it could, then why isn't the BNF:

  [a] ( ( markupdecl ( S markupdecl )* ) | S | PEReference )*

to alert me that at any iteration of the outer loop I might end up with more than one markup declaration (the result of the PEReference)?

Or is it that the PEReference is not really part of the "syntax" but is simply enabled in this context as part of a low-level stream expansion? If so, then the right thing to say is:

  [b] ( markupdecl | PS )*

where PS is a space that might contain a parameter entity that needs to be expanded.

Surely if I were to parse this thing as an SGML editor would (not expanding the %foo;) it would be odd, because the %foo; would occupy a place in the parse that was not syntactically appropriate. I'd get back a list of

  {{markupdecl} {S} {%foo;} {S}}

and that would look ok, but if I did a substitution of its value

  {{markupdecl} {S} {{markupdecl} {markupdecl}} {S}}

that would not be appropriate to the BNF. And yet, the only constraint offered on PEReferences where they're defined says they have to start and stop in the same markup declaration, as if they're perfectly well allowed to span multiple tokens.
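The reshaping problem can be made concrete with a small sketch. This is not a conforming XML parser: the token pattern, the "expansion" node kind, and the names parse_subset and fits_one_slot are all assumptions made up for illustration, and there is no guard against recursive entities. It only shows how substituting the value of %foo; can leave a node that no longer fills a single slot of ( markupdecl | S | PEReference )*.

```python
import re

# Hypothetical token shapes, far simpler than the real XML productions:
# a markup declaration is "<!" ... ">", S is a run of whitespace,
# and a PEReference is "%name;".
TOKEN = re.compile(
    r"(?P<markupdecl><![^>]*>)|(?P<S>\s+)|(?P<PEReference>%\w+;)"
)


def parse_subset(text, entities=None):
    """Parse the text between '[' and ']' into (kind, value) nodes.

    If `entities` is given, each PEReference is replaced by a nested
    ("expansion", [...]) node holding the parse of its replacement text.
    If it is None, references are kept unexpanded, as an editor might do.
    """
    nodes = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        kind, value = m.lastgroup, m.group()
        if kind == "PEReference" and entities is not None:
            name = value[1:-1]  # strip the leading '%' and trailing ';'
            nodes.append(("expansion", parse_subset(entities[name], entities)))
        else:
            nodes.append((kind, value))
        pos = m.end()
    return nodes


def fits_one_slot(node):
    """True if the node fills exactly one slot of the loop.

    This encodes the conservative reading: an expansion is acceptable
    only if it contains exactly one item that itself fits one slot.
    """
    kind, value = node
    if kind != "expansion":
        return True
    return len(value) == 1 and fits_one_slot(value[0])


# Assumed example data: %foo; stands for two markup declarations.
entities = {"foo": "<!ELEMENT b ANY><!ELEMENT c ANY>"}
subset = "<!ELEMENT a ANY> %foo; "

unexpanded = parse_subset(subset)
expanded = parse_subset(subset, entities)

print(all(fits_one_slot(n) for n in unexpanded))
print(all(fits_one_slot(n) for n in expanded))
```

With this example data the unexpanded parse passes the one-slot check and the expanded parse fails it, because the third node now holds two markup declarations where the BNF allows only one item.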
I definitely think that the conservative thing is that when there's a list like (foo | bar | PEReference), the ONLY possible expansions of the PEReference should be "foo" and "bar". Otherwise, reshaping of the parse tree later is a problem. Nowhere that I've found is this constraint specified. Perhaps I'm just overlooking it? If not, perhaps it could be added. And if it's supposed to be the case that more than one markup declaration can appear, then I strongly encourage you to modify the BNF to accommodate the truth of the hair that a parser must really endure.

In a sense, this appears to be a casualty of some last minute transition from the old %xxx notation in the XML drafts to something more like SGML. But SGML uses two different kinds of "S" (S and PS). It also uses the Ee (End of Entity) notation, which seems to be missing here. I can't help but think that this omission will come back to haunt you, since without marking where Ee's can occur, there are also questions about where an entity can end -- e.g., can it end mid-token? For example, I understand the reason %foo;%bar; cannot merge two tokens into a single token (e.g., if %foo; turns to "foo" and %bar; to "bar", forming "foobar") is that the Ee is a PS and so is a token separator. Without discussing this issue, and without including the SGML spec by reference (something I hope you'll try steadfastly to avoid, since requiring people to read the SGML spec to handle XML will put XML *way* out of reach of most people), the whole matter of PEReferences looks to be radically underconstrained.

- - - -

[Returning to an issue I alluded to up top:]

Oh, and about that syntax for [28] doctypedecl. I really do hate things that are this complex without introducing additional names. It forces me to make up names in my hand-written parser, and it virtually assures my made-up names won't match anyone else's. And it makes it just plain hard to talk about the syntax.
I think ALL languages, markup and programming, should be defined in such a way that conversation about them is made simple and practical. I feel as if this language goes to very little trouble to help in that regard. In particular, there is a LOT of talk all through the document about the internal and external DTD subset, and yet when it comes to saying where those things are, they are VERY hard to find for the uninitiated. You look for them in the syntax rules, and they are nowhere manifest. I *assume* the external DTD subset is what is named by the optional ExternalID in a doctypedecl. Is it? Can you find the words "internal DTD subset" in bold somewhere in the spec where it is easy to see it is a defining reference? I can't. How about "external DTD subset"? Ditto.

I'd have written:

  [28]   doctypedecl  ::= '<!DOCTYPE' S Name ExtDTDref? IntDTDsubset? S? '>'
  [28.1] ExtDTDref    ::= S ExternalID
  [28.2] IntDTDsubset ::= S? '[' markupdecls* ']'
  [28.3] markupdecls  ::= ( PS | markupdecl )*

- - - - -

DISCLAIMER: These are my personal feelings and not necessarily the official position of any company or organization that I may be affiliated with.
Received on Saturday, 25 April 1998 01:40:55 UTC