[Bug 5321] REs are not production nonterminals

http://www.w3.org/Bugs/Public/show_bug.cgi?id=5321

           Summary: REs are not production nonterminals
           Product: XML Schema
           Version: 1.0/1.1 both
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Datatypes: XSD Part 2
        AssignedTo: cmsmcq@w3.org
        ReportedBy: davep@iit.edu
         QAContact: www-xml-schema-comments@w3.org


1.  It appears that in the productions defining REs in Appendix G, we use REs
(usually character classes) as though they were nonterminals.  An example is
the production for normal characters:

  Char ::= [^.\?*+{}()|#x5B#x5D]

In productions, normally each nonterminal is the LHS of a production, and each
terminal is a character string denoting itself.  An RE other than a single
character string denoting itself is neither.

In the appendix, terminals are quoted strings and nonterminals are names linked
to their defining production.  These neither-fish-nor-fowl REs are displayed as
unquoted strings.  Perhaps they could be hyperlinked to a paragraph describing
this modification to the standard production system.  (But can the necessary
productions for character classes be made without circularity?  That may need
some thought.)

2.  Similarly, "#-escapes", which represent characters by their Unicode code
points, are not normally allowed in our REs; at least, I can't find anything
that allows them.  Nor can I find anything that makes an exception for
REs-that-are-nonterminals-in-productions.  At least a note, and some kind of
special treatment within the RE, seem appropriate.  (Actually, I wish that the
codes were explained in a text note near each use; I suspect that
I'm not the only reader who doesn't have the codes memorized.  Perhaps the
special treatment could be a hyperlink to such an explanation.)
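For readers who, like me, don't have the codes memorized: in the XML 1.0 EBNF notation, #xN stands for the character whose code point is the hexadecimal number N.  Here is a minimal Python sketch that expands the escapes in the Char production quoted above.  The helper names are hypothetical; nothing like them appears in the spec.

```python
import re

# The EBNF hex-escape form used in the productions, e.g. "#x5B".
HEX_ESCAPE = re.compile(r"#x([0-9A-Fa-f]+)")


def decode_hex_escapes(s: str) -> str:
    """Replace each #xN with the character whose code point is N (hex)."""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


def list_hex_escapes(s: str) -> list:
    """Return (escape, character) pairs, e.g. a gloss for a text note."""
    return [(m.group(0), chr(int(m.group(1), 16))) for m in HEX_ESCAPE.finditer(s)]


char_production_rhs = r"[^.\?*+{}()|#x5B#x5D]"
print(list_hex_escapes(char_production_rhs))   # [('#x5B', '['), ('#x5D', ']')]
print(decode_hex_escapes(char_production_rhs))  # [^.\?*+{}()|[]]
```

Decoded, the class contains bare "[" and "]" characters, which is presumably why the production spells them as #x5B and #x5D.  That is exactly the kind of special treatment a note could explain.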

We do not currently define the production system we use.  If we
really want a non-standard production system that allows REs as
additional RHS components, we need to define it.
However, since we use the production system to define the REs, I think this
could get very circular unless we are both careful and lucky enough that the
circularity can be avoided.
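One way the circularity might be avoided (my assumption, not something the spec currently does) is to make character classes a third kind of RHS item that is defined as data, namely a set of code-point ranges, rather than as an RE.  The following is a rough Python sketch; every name in it is hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Terminal:
    """A quoted string that denotes itself."""
    text: str


@dataclass(frozen=True)
class Nonterminal:
    """A name that must appear as the LHS of some production."""
    name: str


@dataclass(frozen=True)
class CharClass:
    """Hypothetical third kind of RHS item: inclusive (low, high) code-point
    ranges, optionally negated.  It is plain data, not an RE, so the RE
    grammar can use it without being defined in terms of itself."""
    ranges: Tuple[Tuple[int, int], ...]
    negated: bool = False

    def matches(self, ch: str) -> bool:
        cp = ord(ch)
        inside = any(lo <= cp <= hi for lo, hi in self.ranges)
        # Negation here is over all code points; a real definition would
        # restrict it to the XML Char set.
        return inside != self.negated


# RHS items in the extended system may be any of the three kinds.
Symbol = Union[Terminal, Nonterminal, CharClass]


def single(*chars: str) -> Tuple[Tuple[int, int], ...]:
    """Helper: a one-code-point range for each listed character."""
    return tuple((ord(c), ord(c)) for c in chars)


# Char ::= [^.\?*+{}()|#x5B#x5D], written as a negated CharClass.
CHAR = CharClass(ranges=single(*".\\?*+{}()|[]"), negated=True)
GRAMMAR = {"Char": [[CHAR]]}  # production name -> list of alternatives

print(CHAR.matches("a"), CHAR.matches("["))  # True False
```

Because membership in a CharClass is a plain range check, defining it needs nothing from Appendix G.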

Expressing a small positive character class as an "or" of single characters is
easy enough.  But I'm not sure how to deal with a large character class, such
as the negative character class of the production quoted above.
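To put a number on "large", here is a rough Python sketch.  It assumes that the universe for a negated class is the XML 1.0 Char set, #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].  The function names are hypothetical.

```python
# Assumed universe for negated classes: the XML 1.0 Char production.
XML_CHAR_RANGES = [(0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
                   (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]


def positive_class_as_alternation(chars: str) -> str:
    """Rewrite a small positive class such as [abc] as 'a' | 'b' | 'c'."""
    return " | ".join(f"'{c}'" for c in chars)


def negated_class_size(excluded: str) -> int:
    """Count the XML characters that a negated class [^...] admits,
    i.e. how many single-character alternatives an "or" would need."""
    total = sum(hi - lo + 1 for lo, hi in XML_CHAR_RANGES)
    excluded_in_universe = {c for c in excluded
                            if any(lo <= ord(c) <= hi for lo, hi in XML_CHAR_RANGES)}
    return total - len(excluded_in_universe)


print(positive_class_as_alternation("abc"))  # 'a' | 'b' | 'c'
# The characters excluded by Char ::= [^.\?*+{}()|#x5B#x5D]:
print(negated_class_size(".\\?*+{}()|[]"))   # 1112021
```

Under that assumption, the negated class in Char would need over a million single-character alternatives, so expanding it into an "or" does not look like a workable route.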

Received on Saturday, 15 December 2007 20:29:46 UTC