- From: <noah_mendelsohn@us.ibm.com>
- Date: Mon, 20 Oct 2008 12:34:21 -0400
- To: Ian Hickson <ian@hixie.ch>
- Cc: public-html-comments@w3.org
Ian: when we chatted at breakfast this morning at TPAC, you asked that I remind you of a couple of issues relating to the HTML 5 draft. I assume it eases bug tracking if I send them in separate notes, so this is the first of two.

The first concern we discussed is that the semantics of microsyntaxes like signed integer [1] are a) unduly buried in the imperative parsing rules and b) thus at some risk of not making it into any authoring specification. I suspect that's enough to remind you of the concern, but for the benefit of readers who weren't with us at breakfast, here's the same thing in more detail:

The declarative part of the explanation of signed integer says: "A string is a valid integer if it consists of one of more characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), optionally prefixed with a U+002D HYPHEN-MINUS ("-") character."

Sympathetic readers will obviously infer that the characters "123" in fact refer to the number one hundred twenty-three, but nothing in the above says that. If I were to claim that these characters represented the number three hundred twenty-one, you couldn't prove me wrong from the above.

Now, immediately following the above is a set of step-by-step parsing rules, which implement what appears to be the logic of a function. The last step says "If sign is "positive", return value, otherwise return 0-value", and a sympathetic reader will understand that these rules have computed a result that defines the intended semantic to be "one hundred twenty-three". So, the semantic is there in the parsing rules, at least if you're willing to make the assumption that what's referred to as the return value is in fact the intended semantic of the string being parsed.

So, to reiterate the concern, now that the details have been set out:

a) There are probably clearer and simpler ways of conveying the intended semantic than burying it in the parsing rules.
Alternatives range from informal "these strings have the obvious interpretation as integers, high order digits on the left, etc., with '-' indicating negative numbers" to more rigorous or even formal mappings using the appropriate polynomial. I'm not here recommending which of the many options should be chosen, just suggesting that burying the semantics in the parsing rules is suboptimal.

b) I believe the intention is to produce a specification for HTML 5 authors by, among other things, stripping out the parsing rules. There is a risk that the resulting specification would lack any indication at all of the intended interpretation of the strings.

I believe that similar comments would apply to many of the other microsyntaxes, and perhaps in other parts of the specification as well.

Thank you.

Noah

[1] http://www.w3.org/html/wg/html5/#signed-integers

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Monday, 20 October 2008 16:35:26 UTC