Re: Uncertainties in implementing WD-xml.html from Michael Sperberg-McQueen on 1997-03-22 (w3c-sgml-wg@w3.org from March 1997)

From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
Date: Sat, 22 Mar 97 17:01:04 CST
To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
Message-Id: <199703222322.SAA24430@www10.w3.org>
On Fri, 21 Mar 1997 18:06:23 -0500 Peter Murray-Rust said:
>In message <3332D971.68F5@utila.ifi.uni-klu.ac.at> "Norbert H. MIKULA" writes:
>> As far as I can remember the ERB has initially decided
>> to change the syntax for comments to
>>
>> <--* ..... *-->
>>
>> (posted to the XML-WG : Wed, 15 Jan)
>>
>> But the torture.xml file of cmsmcq uses <!--* .. *-->

The original posting of the decision did have the form <--* ... *-->
but it was a typo, corrected shortly thereafter.

> ...
>> What is the current state of things ?
>
>I share Norbert's concern about uncertainties in the XML draft and feel
>that a number of us are 'stalled' at present due to one or more
>uncertainties in the spec.  (It may be that these are simple
>misconceptions, but they need tidying up.).  We agree that the mythical

Tim Bray and I are at work on a revised version of the draft spec
which will include fixes for the reported errors and the changes so
far agreed to, for the comment syntax (adding the asterisks) and
white-space handling.

>grad student can hack a parser in the mythical two weeks, but only if
>they have a clear spec to write to.  [My own position is that I want to

Right now, the standard disclaimers pasted on all W3C working drafts
remain appropriate:  the draft is subject to change.  If a given
version of the draft is clear and consistent with itself, I think
we're doing OK.  If implementors want currency more than consistency
and clarity, the decisions we've made have all been announced, and
the logs of the WG are the place to find the decisions.

>My understanding is that the productions (1-77) are consistent and can
>be used as the basis of a yacc-like approach (as NXP does, using JACC).
>So the first question is [see Norbert's query]:
>
>(a) are we agreed that (1-77) in WD-xml-961114 are the current version
>and that none are under revision at present?  (Until Norbert's question
>I had assumed that [21] (Comments) was correct).

They are the current version; the current version is always subject
to revision.  In fact, the only changes being made to the productions
are:

  - correction of reported errors (missing *, missing S, etc.)
  - implementation of the comment-syntax decision, to prepare XML
    for use of comment open- and close-delimiters when SGML gets
    them
  - incorporation of more of the constraints on entity replacement
    into the grammar (in particular parameter entities)
  - simplification of some productions using regular-expression
    subtraction.  (This is a notation Tim found in some orthodox
    CS treatment of regular expressions, so it's not just ad
    hockery:  (A - B) defines the language which contains all the
    members of A except those which are also members of B.  Comments
    can be defined as

    '<!--*' (Char* - (Char* '*-->' Char*)) '*-->

    which is, in a slightly more formal way, almost exactly what
    Lee suggested long ago:  anything can appear in a comment,
    but not the string '*-->'.  Translation from this notation into
    that of a specific regular expression engine may require close
    attention to detail ...

>(b) some parsing operations (e.g. entity replacement) are not described
>in the BNF and are sufficiently complex or insufficiently documented to
>give serious problems in implementation.  It would be valuable for these
>to be listed and the operations clearly defined (e.g. are comments
>processed before entity replacement? are nested entities allowed? etc.)

Since we'll add some things to the grammar to allow us to talk more
directly about parameter entities and how they are expanded and where
they are going to occur, I think we'll be able to say very explicitly
when general entities are / may be / must be expanded.  I don't think we
should constrain the order of comment handling beyond what is logically
implied by the constraints given in the section on logical and physical
structure.

>(c) some ancillary constructs (e.g. CATALOG) are widely held to be part
>of XML (or likely to be part of XML).  They are probably not too
>difficult to implement if certain processes (e.g. resolution of FPIs)
>are not exhaustively defined.

Catalogs are not now part of XML.  Whether they are likely to become
part of XML is something every implementor must judge individually.

I hope this helps.  I see Tim has already replied, but I will send
this anyway in case having the same message in a second formulation
proves helpful.

-C. M. Sperberg-McQueen
Received on Saturday, 22 March 1997 18:22:51 UTC