- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Fri, 09 Jan 2009 17:20:36 -0500
- To: Arthur Barstow <art.barstow@nokia.com>
- CC: public-webapps <public-webapps@w3.org>
A few comments:

1) In Section 7.3, boolean attributes are defined to use case-insensitive matching. Why is that? There doesn't seem to be a definition of case-insensitive here, which worries me, since case-folding is always tricky business (see below). I would suggest requiring a case-sensitive match to "true" or "false" here.

2) Section 8.2, step 2, second list, item 8 has a similar issue for filenames. For example, consider the following pairs of filenames:

   a) "i" and "I"
   b) "i" and "ı"
   c) "i" and "İ"
   d) "ı" and "I"
   e) "ı" and "İ"
   f) "I" and "İ"

Here 'i' is U+0069, 'I' is U+0049, 'ı' is U+0131 and 'İ' is U+0130. Which of these pairs should be considered to "upon normalization, case insensitively match"? Seems like (a) should, (c) should, (d) should, right? But (b) and (e), and maybe (f), should not? That means the matching relation is non-transitive, of course. Or should these all match? Or something else?

I'm not sure what the reason for this case-insensitive check is exactly; if there's a strong reason for it, the matching needs to be defined. Otherwise the check needs to be removed.

3) When parsing a non-negative integer (Section 8.2, step 8), what's the expected behavior for integers larger than 2^32? 2^64? Are implementations of this specification required to do integer arithmetic on arbitrarily large integers? If not, is the behavior just implementation-dependent?

4) Section 8.2, step 8: it would be good to make sure that the image identification table matches the one in HTML5 (possibly by having both specifications refer to a single table, if that's workable).

5) Section 8.2, step 8: I'm not sure why image/svg+xml is required to be processed according to SVGTiny. This means that an SVG 1.1 or SVG 1.2 Full (whenever that happens) user-agent cannot implement this specification, as far as I can see.

6) Section 6.2 talks about using file extensions followed by content-type sniffing to determine MIME types. This sounds to me like the exact process is up to the UA. Then Section 8.2, step 8, has specific lists of extensions and magic numbers that UAs need to recognize. Is the sniffing allowed in Section 6.2 required to be a superset of what Section 8.2 allows? If so, this should be made clearer.

If more sniffing is allowed than what's listed in 8.2, this can lead to security problems where two UAs (say a security checker and a web browser) treat the same file in a widget as having different types. This is the sort of situation that HTML5 is trying very hard to avoid with its sniffing algorithm. I feel that all sniffing that UAs are allowed to perform must be explicitly listed in the specification. If that means that not all files can have MIME types deduced, then an alternate mechanism needs to be provided to indicate MIME types for files.

7) It's not clear to me why Section 5.3 allows encoding of filenames using [CP437]. Why not just require UTF-8?

8) The algorithm for getting text content in Section 8.2, step 2 doesn't look correct to me. For example, consider an input element whose XML serialization looks like this:

    <outer><inner1>First</inner1> <inner2>Second</inner2></outer>

The text content of this input, according to the spec's algorithm, is the string "FirstSecond". I would expect to get "First Second" as the text content in this case. Is there a reason to not just use textContent here?

Note that even the example in the specification gets this wrong. There the markup is:

    <name> The <blink>Awesome</blink> <author email="dude@example.com">Super <blink>Dude</blink></author> Widget</name>

for which this algorithm gives "The AwesomeSuper Dude Widget" and not what the spec claims (I have also removed the carriage returns for legibility).

In the same algorithm, there's mention of "the input's text nodes". This relationship is not defined in this specification or elsewhere. I assume you mean the text nodes which have input as their ancestor, right?

In the same algorithm, rule 4 doesn't make sense to me. What's "position"? Is it a character, or an index? Or something else? If you mean to say that input's nodeValue is to be appended to result, just say that.

In the informative section following this algorithm, there is mention of "getTextContent() DOM3 Java interface", whatever that is. I'm not sure why we need to drag Java into this. If we want to say something about the node's DOM3 textContent property, we should just say that, in my opinion. There's no language binding involved here; the property is defined in the relevant IDL and its definition is language-agnostic.

9) The "Rules for Removing Whitespace" section in Section 8.2, Step 2 has the following language:

    While position doesn't point past the end of input and the character at position is not one of the space characters, append character to the end of result and let position become the character in input.

Here "character" is a Unicode character the first and second time it's mentioned, and seems to be an integer the third time? Or something? If you're trying to say that the position should move to the next character in input, say that, please.

10) Is there a reason to not have any JPEG images in the Image Identification Table in Section 8.2, Step 2? I would have thought widgets might wish to include such images.

-Boris
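As an illustration of point 2, here is a minimal sketch of how the six filename pairs come out under three candidate definitions of "case-insensitive". The use of Python's str.lower, str.upper and str.casefold as stand-ins is an assumption made for this example; the specification names none of them. The helper names (matching_pairs, is_transitive, EMAIL_RELATION) are hypothetical. The last check encodes the relation suggested in point 2, where (a), (c) and (d) match and the rest do not.

```python
from itertools import combinations

# The four characters from point 2: U+0069, U+0049, U+0131, U+0130.
I_SMALL, I_CAP, DOTLESS, DOTTED = "i", "I", "\u0131", "\u0130"
CHARS = [I_SMALL, I_CAP, DOTLESS, DOTTED]

# Candidate definitions of "case-insensitive" (assumptions, not from the spec).
FOLDS = {"lower": str.lower, "upper": str.upper, "casefold": str.casefold}


def matching_pairs(fold):
    """Return the unordered pairs of CHARS that compare equal after fold()."""
    return {
        frozenset((a, b))
        for a, b in combinations(CHARS, 2)
        if fold(a) == fold(b)
    }


def is_transitive(pairs):
    """Check that x~y and y~z imply x~z over CHARS for the given pairs."""
    def related(x, y):
        return x == y or frozenset((x, y)) in pairs

    return all(
        related(x, z)
        for x in CHARS for y in CHARS for z in CHARS
        if related(x, y) and related(y, z)
    )


results = {name: matching_pairs(fold) for name, fold in FOLDS.items()}

# The relation floated in point 2: (a), (c) and (d) match, nothing else does.
EMAIL_RELATION = {
    frozenset((I_SMALL, I_CAP)),   # (a)
    frozenset((I_SMALL, DOTTED)),  # (c)
    frozenset((DOTLESS, I_CAP)),   # (d)
}

for name, pairs in results.items():
    print(name, sorted("".join(sorted(p)) for p in pairs), is_transitive(pairs))
print("relation from point 2 transitive?", is_transitive(EMAIL_RELATION))
```

Any relation defined as "equal after applying one fixed folding function" is transitive by construction, but the three functions do not agree on which pairs match: uppercasing treats "ı" and "I" as equal, while casefold does not. The hand-picked relation from point 2 fails the transitivity check, since "ı" matches "I" and "I" matches "i", yet "ı" does not match "i".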
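As an illustration of point 8, the following sketch computes the result that plain textContent semantics would give for the two snippets quoted there. It uses Python's xml.etree.ElementTree and its itertext() method as a stand-in for the DOM3 textContent property, which is an assumption of this example; the function name text_content is hypothetical. It does not attempt to reproduce the specification's own algorithm.

```python
import xml.etree.ElementTree as ET

SIMPLE = "<outer><inner1>First</inner1> <inner2>Second</inner2></outer>"
NAME = (
    '<name> The <blink>Awesome</blink> '
    '<author email="dude@example.com">Super <blink>Dude</blink></author>'
    ' Widget</name>'
)


def text_content(xml_source):
    """Concatenate every descendant text node in document order."""
    root = ET.fromstring(xml_source)
    # itertext() yields all text and tail strings, including
    # whitespace-only ones that sit between sibling elements.
    return "".join(root.itertext())


print(repr(text_content(SIMPLE)))
print(repr(text_content(NAME)))
```

Because whitespace-only text between sibling elements is kept, the first snippet gives "First Second" rather than "FirstSecond", and the second keeps the space between "Awesome" and "Super".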
Received on Friday, 9 January 2009 22:21:23 UTC