Uncertainties in implementing WD-xml.html

In message <3332D971.68F5@utila.ifi.uni-klu.ac.at> "Norbert H. MIKULA" writes:
> As far as I can remember the ERB has initially decided
> to change the syntax for comments to 
> <--* ..... *-->
> (posted to the XML-WG : Wed, 15 Jan)
> But the torture.xml file of cmsmcq uses <!--* .. *-->
> also during the discussion about the appropriate
> regexp people used both alternatives.
> What is the current state of things ?
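
To make the two readings concrete, here is a minimal sketch in Python. The
patterns are illustrative guesses at the two delimiter forms quoted above, not
productions taken from the draft, and the names SHORT_FORM, BANG_FORM and
find_comments are hypothetical.

```python
import re

# Two candidate comment syntaxes mentioned in the quoted message.
# Neither pattern is copied from the draft; both are assumptions.
SHORT_FORM = re.compile(r"<--\*(.*?)\*-->", re.DOTALL)   # <--* ... *-->
BANG_FORM = re.compile(r"<!--\*(.*?)\*-->", re.DOTALL)   # <!--* ... *-->


def find_comments(text, pattern):
    """Return the bodies of all comments matched by the given pattern."""
    return [m.group(1) for m in pattern.finditer(text)]


# A sample containing one comment in each form.
sample = "<doc><!--* one *--><--* two *--></doc>"

bang_bodies = find_comments(sample, BANG_FORM)    # sees only the <!--* form
short_bodies = find_comments(sample, SHORT_FORM)  # sees only the <--* form
```

A parser written to one reading silently passes over comments written to the
other, which is why the choice needs to be settled rather than left to each
implementer.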

I share Norbert's concern about uncertainties in the XML draft and feel that
a number of us are 'stalled' at present because of one or more of them.
(These may turn out to be simple misconceptions, but they still need
tidying up.)  We agree that the mythical grad student can hack a parser in the
mythical two weeks, but only given a clear spec to write to.  [My own
position is that I want to extend JUMBO to read any WF XML file, and I intend
to do this on top of another parser.  I'd like to do this before WWW6;
otherwise JUMBO can't be said to be an 'XML browser/editor'.]

My understanding is that the productions (1-77) are consistent and can be
used as the basis of a yacc-like approach (as NXP does, using JACC).  So the
first question is [see Norbert's query]:

(a) are we agreed that (1-77) in WD-xml-961114 are the current version and
that none are under revision at present?  (Until Norbert's question I had
assumed that [21] (Comments) was correct).

(b) some parsing operations (e.g. entity replacement) are not described in
the BNF and are sufficiently complex or insufficiently documented to give
serious problems in implementation.  It would be valuable for these to be
listed and the operations clearly defined (e.g. are comments processed before
entity replacement? are nested entities allowed? etc.)

(c) some ancillary constructs (e.g. CATALOG) are widely held to be part of
XML (or likely to be part of XML).  They are probably not too difficult to
implement if certain processes (e.g. resolution of FPIs) are not exhaustively
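
As an illustration of why the ordering questions in (b) matter, the following
Python sketch runs a toy comment stripper and a toy entity expander in both
orders. Everything here is an assumption made for the example: the comment
delimiters, the entity table, and the helper names strip_comments and
expand_entities are hypothetical and are not taken from the draft.

```python
import re

# Assumed comment form and a simplified entity-reference pattern.
COMMENT = re.compile(r"<!--\*.*?\*-->", re.DOTALL)
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def strip_comments(text):
    """Remove every comment from the text."""
    return COMMENT.sub("", text)


def expand_entities(text, entities):
    """Replace each known reference once; the replacement is not rescanned."""
    return ENTITY_REF.sub(lambda m: entities.get(m.group(1), m.group(0)), text)


# Hypothetical entity whose replacement text itself contains a comment.
entities = {"note": "see <!--* draft *--> text"}
doc = "a &note; b"

comments_first = expand_entities(strip_comments(doc), entities)
entities_first = strip_comments(expand_entities(doc, entities))

# Nested references: a single pass leaves the inner reference unexpanded.
nested = {"outer": "x &inner; y", "inner": "z"}
single_pass = expand_entities("&outer;", nested)
```

With comments processed first, the comment carried in by the entity survives
in the output; with entities processed first, it is removed. Likewise a
single-pass expander and a recursive one disagree on nested references. Two
implementers could each defend either choice unless the spec says which is
intended.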

IMO resolving this asap is more important than any other aspect of developing
a parser.  The worst possible thing to happen at this stage is that developers
have sufficient uncertainty in the spec that there are different interpretations

Against normal practice I have crossposted this to xml-dev.  If the ERB feel 
this is mainly a matter of clarification, then a reply to xml-dev would be 
adequate, but if (as I fear) some aspects of entity replacement are not
universally agreed, then I think they need to be resolved here.


Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences