- From: <Michael.Goulish@SoftwareAG-USA.com>
- Date: Tue, 23 May 2000 11:57:11 -0400
- To: xml-editor@w3.org
- Cc: Mike.Champion@SoftwareAG-USA.com
- Message-ID: <B48FCF558294D311ADD90080C8FAF3F85064AE@sunshine.ptg.sagus.com>
Greetings to the XML-Editor!
I recently implemented a parser for the full
XML grammar in C. I may be unusual in that I
had no experience with XML when I started this
project, but I did have over 15 years' experience
as a full-time programmer and, before that, an MS
in computer science.
I thought you might be interested to hear about
which parts of the XML 1.0 spec confused me the
most. (I reserve the right to find other parts
confusing in the future.)
1. Not all the productions belong to the grammar.
-------------------------------------------------
In my world, grammars have a single start symbol.
If you represent a grammar as a graph, with an edge
from each symbol to the symbols it is defined in
terms of, you *always* see a connected graph. That
means you can start with the start symbol and,
through some series of steps, reach any other symbol
in the grammar. Any symbol that's not reachable in
this way can be (and should be) discarded.
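
The reachability check described above can be sketched in a few
lines. This is only an illustration under stated assumptions: the
grammar is a toy with a handful of invented productions (it is NOT
the XML 1.0 grammar), and the names reachable_symbols,
unreachable_symbols and TOY_GRAMMAR are hypothetical. Each symbol
maps to a list of alternatives, and anything that is not a key in
the mapping is treated as a terminal.

```python
from collections import deque


def reachable_symbols(grammar, start):
    """Return the set of nonterminals reachable from `start`.

    `grammar` maps each nonterminal to a list of alternatives; each
    alternative is a list of symbols. Symbols that are not keys of
    `grammar` are treated as terminals and ignored.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        symbol = queue.popleft()
        for alternative in grammar.get(symbol, []):
            for item in alternative:
                if item in grammar and item not in seen:
                    seen.add(item)
                    queue.append(item)
    return seen


def unreachable_symbols(grammar, start):
    """Return the defined nonterminals that `start` can never reach."""
    return sorted(set(grammar) - reachable_symbols(grammar, start))


# Toy grammar for illustration only; the productions are invented.
TOY_GRAMMAR = {
    "document": [["prolog", "element"]],
    "prolog": [["'<?xml'", "S"], []],
    "element": [
        ["'<'", "Name", "'>'", "element", "'</'", "Name", "'>'"],
        ["'<'", "Name", "'/>'"],
    ],
    "Name": [["NameChar"], ["NameChar", "Name"]],
    "NameChar": [["'a'"]],
    "S": [["' '"]],
    # Defined, but never used by anything reachable from "document".
    "Names": [["Name"], ["Name", "S", "Names"]],
}

if __name__ == "__main__":
    print(unreachable_symbols(TOY_GRAMMAR, "document"))  # prints ['Names']
```

Note that "Names" refers to reachable symbols itself, yet nothing
reachable from the start symbol refers to it, so it is reported as
unreachable. That is the situation described for the productions
listed below.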
Starting from production "[1] document" I believe
that the following symbols are unreachable in the
XML 1.0 grammar:
[6] Names
[8] Nmtokens
[30] extSubset
[33] LanguageID
[78] extParsedEnt
[79] extPE
I believe that, if the errata are taken into account
(and they should be rolled into the main document
immediately), then all of these productions are
used at least in Validity Constraints. But then
they're not part of the grammar in the same sense
that the other productions are, which is what their
membership in the numbering scheme would seem to imply.
It's odd and confusing to not be able to understand
the grammar on at least a purely syntactic level without
reading the accompanying prose.
I would like to see unreachable symbols clearly marked
in some way -- perhaps given a different numbering scheme
to show that they are not part of the "main" grammar in
the same way as other productions are. Maybe like
VC-1, VC-2, etc.
2. There is no number 2.
------------------------
(I guess I'll limit this to my main point for now.
Maybe more later.)
Thanks very much for your attention, and I'd
be very interested to hear your thoughts --
-------------------------------- Mick .
Received on Tuesday, 23 May 2000 11:56:59 UTC