- From: <noah_mendelsohn@us.ibm.com>
- Date: Mon, 20 Oct 2008 12:34:21 -0400
- To: Ian Hickson <ian@hixie.ch>
- Cc: public-html-comments@w3.org
Ian: when we chatted at breakfast this morning at TPAC you asked that I
remind you of a couple of issues relating to the HTML 5 draft. I assume
it eases bug tracking if I send them in separate notes, so this is the
first of two.
The first concern we discussed is that the semantics of microsyntaxes like
signed integer [1] are a) unduly buried in the imperative parsing rules
and b) thus at some risk of not making it into any authoring
specification. I suspect that's enough to remind you of the concern, but
for the benefit of readers who weren't with us at breakfast, here's the
same thing in more detail:
The declarative part of the explanation of signed integer says:
"A string is a valid integer if it consists of one or more characters in
the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), optionally
prefixed with a U+002D HYPHEN-MINUS ("-") character."
Sympathetic readers will obviously infer that the characters "123" in fact
refer to the number one hundred twenty-three, but nothing in the above
says that. If I were to claim that these characters represented the
number "three hundred twenty-one" you couldn't prove me wrong from the
above.
Now, immediately following the above is a set of step-by-step parsing
rules, which implement what appears to be the logic of a function. The
last step says "If sign is "positive", return value, otherwise return
0-value", and a sympathetic reader will understand that these rules
have computed a result that defines the intended semantic to be
"one hundred twenty-three". So, the semantic is there in the parsing
rules, at least if you're willing to make the assumption that what's
referred to as the return value is in fact the intended semantic of the
string being parsed.
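For readers who have not looked at the draft, here is a rough Python sketch of the kind of step-by-step rules I mean. Only the final step is quoted from the draft; the function name, the intermediate steps, and the use of None to signal an error are my own assumptions for illustration, not the draft's wording.

```python
def parse_signed_integer(text):
    """Hypothetical sketch of imperative integer-parsing rules.

    Not the draft's actual algorithm; only the last step mirrors the
    sentence quoted from it.
    """
    sign = "positive"
    position = 0

    # An optional leading U+002D HYPHEN-MINUS marks a negative number.
    if text.startswith("-"):
        sign = "negative"
        position = 1

    digits = text[position:]

    # Assumption: anything other than one or more ASCII digits is an error.
    if not digits or any(ch not in "0123456789" for ch in digits):
        return None

    # Accumulate the value with the high-order digit first.
    value = 0
    for ch in digits:
        value = value * 10 + (ord(ch) - ord("0"))

    # Final step, as quoted from the draft.
    return value if sign == "positive" else 0 - value
```

Note that the meaning of "123" appears nowhere except as the value this function happens to return, which is exactly the concern.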
So, to reiterate the concern, now that the details have been set out:
a) There are probably clearer and simpler ways of conveying the intended
semantic than burying it in the parsing rules. Alternatives range from
informal "these strings have the obvious interpretation as integers,
high-order digits on the left, etc., with '-' indicating negative numbers" to
more rigorous or even formal mappings using the appropriate polynomial.
I'm not here recommending which of the many options should be chosen, just
suggesting that burying the semantics in the parsing rules is suboptimal.
b) I believe the intention is to produce a specification for HTML 5
authors by, among other things, stripping out the parsing rules. There is
a risk that the resulting specification would lack any indication at all
of the intended interpretation of the strings.
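To illustrate the more formal end of the range mentioned in (a), one possible polynomial mapping is sketched below. The notation is my own assumption, not anything taken from the draft: the digit characters, read left to right, denote values d_{n-1} ... d_0 (each between 0 and 9), and s is -1 when the string carries the "-" prefix and +1 otherwise.

```latex
\[
  \mathrm{value}
  \;=\; s \cdot \sum_{i=0}^{n-1} d_i \, 10^{i},
  \qquad
  s =
  \begin{cases}
    -1 & \text{if the string is prefixed with ``-''} \\
    +1 & \text{otherwise}
  \end{cases}
\]
```

For "123" this gives n = 3 with d_2 = 1, d_1 = 2, d_0 = 3, so the value is 1*100 + 2*10 + 3*1 = 123, and "three hundred twenty-one" is ruled out.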
I believe that similar comments would apply to many of the other
microsyntaxes, and perhaps in other parts of the specification as well.
Thank you.
Noah
[1] http://www.w3.org/html/wg/html5/#signed-integers
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Monday, 20 October 2008 16:35:26 UTC