Re: A few comments on the draft

Some comments on Gavin's comments (sorry) and some new comments.

> Production 1. 
>    This group does not appear complete.
> Character classes are best defined elsewhere anyway.   

Perhaps the best way to do this would be to use the
POSIX regular-expression notation and say:
    S ::= [[:space:]]
and then in the character class section define exactly
which code points map to space.
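Here is a minimal Python sketch of an explicit S class. Purely for illustration, it assumes the four code points that the final XML 1.0 Recommendation eventually chose (space, tab, CR, LF). The names XML_S, S_RE, GENERIC_SPACE_RE and is_xml_space are hypothetical. The sketch also shows why "space" needs pinning down. A generic whitespace class such as Python's \s, like POSIX [[:space:]], also accepts form feed, and \s in str patterns also accepts no-break space.

```python
import re

# Assumption: S is exactly these four code points (the set XML 1.0 later
# adopted); the draft under discussion may list a different set.
XML_S = "\u0020\u0009\u000D\u000A"

# One or more S characters, built from the explicit list above.
S_RE = re.compile(r"[\x20\t\r\n]+")

# A generic whitespace class, for comparison: in str patterns Python's \s
# matches Unicode whitespace, including U+000C and U+00A0.
GENERIC_SPACE_RE = re.compile(r"\s")


def is_xml_space(ch: str) -> bool:
    """True if ch is a single code point belonging to S."""
    return len(ch) == 1 and ch in XML_S


if __name__ == "__main__":
    for ch in ["\u0020", "\u000C", "\u00A0"]:
        # Show whether the explicit class and the generic class agree.
        print(f"U+{ord(ch):04X}", is_xml_space(ch), bool(GENERIC_SPACE_RE.match(ch)))
```

Both classes accept U+0020. The generic class alone also accepts form feed and no-break space.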

An even better way would be to remove S entirely, and to
explain that the following sequences are self-delimiting and
are thus recognised whether or not surrounded by spaces:
    <!--  (in XML this starts a comment)
the remaining tokens being determined by their respective modes --
for example, a string starts with " and continues to a matching "
irrespective of whitespace.

The productions would then all become much simpler, as they would be
in terms of a sequence of tokens.
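Here is a minimal Python sketch of that approach. Whitespace is treated as one more token kind and then discarded, so <!-- and quoted strings are recognised whether or not spaces surround them. The token set (comment opener, double-quoted string, name) is a hypothetical subset chosen for illustration, not the draft's full list. The tokenize function is likewise hypothetical.

```python
import re
from typing import List

# Hypothetical token set, for illustration only.
TOKEN_RE = re.compile(r'''
    (?P<comment_open><!--)        # self-delimiting: no spaces needed around it
  | (?P<string>"[^"]*")           # a string runs to the matching quote whatever it contains
  | (?P<name>[A-Za-z_][\w.-]*)    # a name ends at the first non-name character
  | (?P<space>[\x20\t\r\n]+)      # whitespace only separates adjacent names
''', re.VERBOSE)


def tokenize(text: str) -> List[str]:
    """Split text into tokens, dropping whitespace tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        if m.lastgroup != "space":
            tokens.append(m.group())
        pos = m.end()
    return tokens


if __name__ == "__main__":
    # Both inputs yield the same token sequence.
    print(tokenize('<!--"a b"name'))
    print(tokenize('<!-- "a b" name'))
```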

I can't check the ISO 10646 code points here, but I assume several
people have done so.

I am not sure I follow
    Literal data is any quoted string containing neither a left
    angle bracket nor the quotation mark used as a delimiter
Are you forbidding an unquoted < within an attribute value?

The requirement for a root seems to preclude forests.
We've found forests to be very useful, especially in the Canadian Winter :-)

In 2.4,
    Most processors will require the more complex grammar [...]
I think it would be more helpful to say why, e.g.:
Any non-trivial application is likely to require...

In 2.5
    For compatibility
is meaningless unless you say with _what_...  I realise the SGML world is
in fact quite ashamed of the SGML syntax (even though the ideas are good),
but I think you should say
    For compatibility with SGML
if that's what you mean.

If you want compatibility with the majority of HTML browsers in use today
(and probably for the next year or two), you would also need to
forbid > within a comment.
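Here is a minimal Python sketch of the problem. It assumes, as the paragraph above implies, that those browsers end a comment at the first > rather than at -->. The function names are hypothetical. The check simply compares where each rule says the comment ends.

```python
def comment_end_xml(doc: str, start: int = 0) -> int:
    """Index just past a comment that opens at doc[start] and ends at '-->'."""
    return doc.index("-->", start + 4) + 3


def comment_end_naive(doc: str, start: int = 0) -> int:
    """Index just past the comment if a browser stops at the first '>'."""
    return doc.index(">", start + 4) + 1


def safe_for_naive_browsers(body: str) -> bool:
    """True if <!--body--> ends at the same place under both rules."""
    doc = "<!--" + body + "-->"
    return comment_end_xml(doc) == comment_end_naive(doc)


if __name__ == "__main__":
    print(safe_for_naive_browsers(" an ordinary comment "))  # True
    print(safe_for_naive_browsers(" if a > b then "))        # False
```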

    PI target... notation
You need a cross reference to the definition of "notation".
I think someone else already asked what "normally" means.
I take it to mean "in all cases except for the string 'XML'".

The application to which it belongs
    to which what belongs?  The PI or the target or the notation?

2.7 CDATA sections
should probably be called CDATA Marked Sections, so that other kinds of
marked section can be introduced in future versions of XML, should it
be so desired.

I particularly want ignored TMP NDATA marked sections :-)

> Section 2.8: 
>    This section is really quite distasteful.
I agree.

The statement
    In element content, all white space (S) is ignored
seems a little odd to me!
    <P>This is odd</P>
are the same?

Or is "element content" being used in the SGML sense?
If so, it must be defined before being used, and I would
strongly urge the use of italics or some other indication
that "element content" does not mean "element content" but
means "element context" (so to speak).

<P><!--* this is a comment *-->
is the same as
but if there is an SGML parser, we get

I don't think this will help interoperability.
All white space should be retained at the parser level in XML,
at least outside of a DTD.  Inside a DTD I'd really hate it if a
parser included the S nonterminal in parse trees!

    The XML _document type declaration_ may include a pointer...
I think "pointer" is misleading here.  You don't mean a machine address,
for example, but rather some kind of logical pointer, and should say so.

I agree with Gavin that the PI hack sucks.

I can't accept that it is better than a fixed outermost element of XML
with attributes, and I don't accept that it is better than MIME headers.

It is not compatible with SGML or SGML tools, even though it is in some
sense legal SGML: it will not in itself allow an XML file to be read by
an SGML application, as the SGML application won't know how to switch
character sets based on the PI.

[34] RMDecl

The default value is ALL, yet if there is neither an internal nor an
external subset it has no "visible" effect: what invisible effect is
occurring?  It would be far simpler to say that if there is no subset
and no RMDecl, it defaults to NONE; if only an internal subset is given,
it defaults to INTERNAL; and otherwise it defaults to ALL.
It would be a good idea to have a value that meant
    parse the internal subset if there is one
    parse the external subset if there is one
    if not, default to NONE
as this would significantly ease document maintenance, I think.
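The defaulting rule proposed above and the suggested extra value could be sketched in Python as follows. RMD, default_rmd and subsets_to_parse are hypothetical names. The sketch encodes only the rules stated in this message, not anything in the draft.

```python
from enum import Enum
from typing import List


class RMD(Enum):
    NONE = "NONE"          # read no markup declarations
    INTERNAL = "INTERNAL"  # read the internal subset only
    ALL = "ALL"            # read the internal and external subsets


def default_rmd(has_internal: bool, has_external: bool) -> RMD:
    """Default used when the document has no RMDecl, per the proposal above."""
    if not has_internal and not has_external:
        return RMD.NONE
    if has_internal and not has_external:
        return RMD.INTERNAL
    return RMD.ALL


def subsets_to_parse(has_internal: bool, has_external: bool) -> List[str]:
    """The suggested extra value: parse each subset that exists, else nothing."""
    parts = []
    if has_internal:
        parts.append("internal")
    if has_external:
        parts.append("external")
    return parts  # an empty list behaves like NONE


if __name__ == "__main__":
    for internal in (False, True):
        for external in (False, True):
            print(internal, external, default_rmd(internal, external).value,
                  subsets_to_parse(internal, external))
```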

The spec is nearly 30 pages, by the way -- time to simplify it!

> Production 38. The stuff about HTML really
> belongs in an appendix "Interoperability with
> HTML", possibly containing the variant HTML DTD's.

Agreed.  This section is greatly improved, by the way... :-)


Well, it's late & I'm hungry, so I have to go and forage for restaurants :-)

More later.  Hang on --

> Section 4.2.2 Seems a shame to limit SYSTEM ID's
> to URL's. The FSI backward compatibility note
> seemed enough to allow them...

I don't understand this comment.  Seemed enough to allow what?
URLs _are_ allowed.  It's a really bad idea to prefix them with <URL>,
as that way you can't treat the same file as containing filenames and
as containing URLs.  If SGML used the same syntax everywhere, so
that FSIs were attributes on elements, we could use arch forms!
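Here is a small Python sketch of why the bare form is more flexible. The same relative system identifier resolves sensibly both as a relative URL and as a relative filename. A <URL> prefix means something only to a processor that understands FSIs. The base locations and function names below are made-up examples.

```python
import posixpath
from urllib.parse import urljoin

BASE_URL = "http://example.com/book/main.xml"  # hypothetical document URL
BASE_FILE = "/home/me/book/main.xml"           # hypothetical local path


def resolve_as_url(sysid: str) -> str:
    """Resolve a system identifier relative to the document's URL."""
    return urljoin(BASE_URL, sysid)


def resolve_as_file(sysid: str) -> str:
    """Resolve a system identifier relative to the document's directory."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(BASE_FILE), sysid))


if __name__ == "__main__":
    for sysid in ["chapters/one.xml", "<URL>chapters/one.xml"]:
        print(sysid)
        print("  as URL: ", resolve_as_url(sysid))
        print("  as file:", resolve_as_file(sysid))
```

With the prefix, both resolutions carry the literal text <URL> into the result. Neither is usable without FSI-aware processing.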