
SVGMobile12/types.html#DataTypeListOfString issue

From: Peter Moulder <Peter.Moulder@infotech.monash.edu.au>
Date: Wed, 08 Apr 2009 15:42:39 +1000
To: www-svg@w3.org
Message-id: <20090408054239.GA31922@bowman.infotech.monash.edu.au>
My reading of the textual description is that strings must be separated by one
or more whitespace characters, whereas the EBNF appears to allow zero
whitespace characters between strings.


I must admit that technically, the EBNF is already correct, insofar as the
following possible definitions of list-of-string all match the same inputs:

   string | string wspchar* list-of-string
   string | string wspchar+ list-of-string
   string

This is because string is defined simply as a sequence of zero or more char (without
quotation), where char can be almost any character including all wspchar characters.
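
As a rough illustration of that equivalence (not part of the spec), the three
definitions can be rewritten as regular expressions and compared.  The sketch
assumes that char is any character at all and that wspchar is one of space,
tab, CR or LF; the names CHAR, WSPCHAR, GRAMMARS and accepted_by are made up
for this example.

```python
import re

# Assumptions for illustration only: char is any character, and wspchar is
# one of space, tab, carriage return, line feed.
CHAR = r"[\s\S]"
WSPCHAR = r"[ \t\r\n]"
STRING = CHAR + "*"  # string ::= char*

# The right-recursive rules rewritten as equivalent regular expressions:
#   string | string SEP list-of-string  ==  string (SEP string)*
GRAMMARS = {
    "wspchar*": re.compile(f"{STRING}(?:{WSPCHAR}*{STRING})*"),
    "wspchar+": re.compile(f"{STRING}(?:{WSPCHAR}+{STRING})*"),
    "string only": re.compile(STRING),
}

def accepted_by(text):
    """Return the names of the grammars that accept the whole of text."""
    return [name for name, pattern in GRAMMARS.items() if pattern.fullmatch(text)]
```

Under these assumptions every input, including the empty one and inputs made
only of whitespace, is accepted by all three, because a single string can
already absorb the whole input.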


However, if the EBNF is to be used not just for validation (specifying the set
of legal inputs) but for separating into component strings, then it is necessary
to change the definition to one of the following (according to whether empty strings
are allowed):

  spaceless-string | spaceless-string wspchar list-of-string
  nonempty-spaceless-string | nonempty-spaceless-string wspchar+ list-of-string

(The first allows empty strings but requires exactly one separating whitespace
character, while the second allows multiple whitespace characters between
strings but requires non-empty strings.  Neither allows any string that
contains wspchar.)
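
A minimal sketch of how the two alternatives would split an input into
component strings, again assuming that wspchar is one of space, tab, CR or LF.
The function names are hypothetical and chosen only for this example.

```python
import re

# Assumption: wspchar is one of space, tab, carriage return, line feed.
WSPCHAR = r"[ \t\r\n]"

def split_allowing_empty(text):
    """spaceless-string | spaceless-string wspchar list-of-string

    Each single whitespace character is a separator, so two adjacent
    separators enclose an empty string.
    """
    return re.split(WSPCHAR, text)

def split_nonempty(text):
    """nonempty-spaceless-string | nonempty-spaceless-string wspchar+ list-of-string

    A run of whitespace is one separator. Empty input and leading or
    trailing whitespace are rejected, since they would imply an empty string.
    """
    parts = re.split(WSPCHAR + "+", text)
    if "" in parts:
        raise ValueError("empty string in list: %r" % text)
    return parts
```

For the input "a  b" (two spaces), the first function yields three strings,
the middle one empty, while the second yields just "a" and "b".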


I've used the name ‘wspchar’ in this message for clarity because the existing
definition of wsp in the definition of <list-of-strings> refers to the sequence
of whitespace characters rather than to a single whitespace character.
It would be nice if wsp were defined the same throughout the SVGMobile12 spec.
In the definition of <list-of-Ts> and in paths.html, wsp is defined as a single
whitespace character; in shapes.html it is defined as wspchar+, though the rest
of that page uses it as if it were a single character, e.g. writing ‘wsp+’.
I suggest that wsp be defined as a single character in each case.


The definition of <list-of-content-types> has much the same issues as
<list-of-strings>: the existing definition allows zero whitespace characters
between list elements, whereas in general at least one is necessary to know
where one element ends and the next begins.  (Content-type is already
required to be non-empty (it must contain "/"), so it's fine to allow more than
one separator character between list elements.)  <content-type> parameter values
can be quoted strings that contain spaces.  The fact that they're quoted means
that the parsing is unambiguous, though it's worthwhile pointing out this
possibility so that programmers know that it's not in general safe to blindly
skip to the next wsp character.  The spec is ambiguous as to whether or not
the tokens making up a <content-type> can themselves be separated by
whitespace (or even RFC822-style comments); I believe parsing would still
be unambiguous/deterministic even if whitespace is used between tokens,
but it may be valuable to say something about the issue in the description of
<list-of-content-types>.
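
To illustrate why blindly skipping to the next wsp character is unsafe, here
is a small sketch of a splitter that treats whitespace inside a quoted
parameter value as part of the element.  It assumes that wsp is one of space,
tab, CR or LF, that a backslash inside quotes escapes the next character, and
that the tokens within one <content-type> are not themselves separated by
whitespace (the very point the spec leaves open).  The function name is
hypothetical.

```python
# Assumption: whitespace separators are space, tab, CR and LF.
WSP = " \t\r\n"

def split_content_types(text):
    """Split on runs of whitespace that lie outside double-quoted strings."""
    items, current = [], []
    in_quotes = False
    escaped = False
    for ch in text:
        if in_quotes:
            current.append(ch)
            if escaped:
                escaped = False      # this character was escaped; take it literally
            elif ch == "\\":
                escaped = True       # the next character is escaped
            elif ch == '"':
                in_quotes = False    # closing quote
        elif ch in WSP:
            if current:              # end of an element; extra separators are skipped
                items.append("".join(current))
                current = []
        else:
            if ch == '"':
                in_quotes = True     # opening quote
            current.append(ch)
    if in_quotes:
        raise ValueError("unterminated quoted string")
    if current:
        items.append("".join(current))
    return items
```

With this sketch, an input such as
image/png text/plain;title="a b"  image/svg+xml
yields three elements, whereas a naive split on whitespace would yield four.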


pjrm.
Received on Wednesday, 8 April 2009 05:43:19 GMT
