- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Fri, 09 Jan 2009 17:20:36 -0500
- To: Arthur Barstow <art.barstow@nokia.com>
- CC: public-webapps <public-webapps@w3.org>
A few comments:
1) In section 7.3, boolean attributes are defined to use
case-insensitive matching. Why is that? There doesn't seem to be a
definition of case-insensitive here, which worries me, since
case-folding is always tricky business (see below). I would suggest
requiring a case-sensitive match to "true" or "false" here.
2) Section 8.2, step 2, second list, item 8 has a similar issue for
filenames. For example, consider the following pairs of filenames:
a) "i" and "I"
b) "i" and "ı"
c) "i" and "İ"
d) "ı" and "I"
e) "ı" and "İ"
f) "I" and "İ"
Here 'i' is U+0069, 'I' is U+0049, 'ı' is U+0131 and 'İ' is U+0130.
Which of these pairs should be considered to "upon normalization, case
insensitively match"? It seems like (a), (c), and (d) should,
right? But (b) and (e), and maybe (f), should not? That means the
matching relation is non-transitive, of course. Or should these all
match? Or something else?
I'm not sure what the reason for this case-insensitive check is exactly;
if there's a strong reason for it, it needs to be defined. Otherwise it
needs to be removed.
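To make the ambiguity concrete, here is a rough sketch that compares the six pairs under three candidate definitions of "case-insensitive": lowercasing, uppercasing, and Unicode case folding. It uses Python's built-in, locale-independent string methods; the names PAIRS and matches are just for illustration, and none of these definitions is claimed to be what the specification intends.

```python
# Pairs (a)-(f) from the list above, written with escapes so the
# code points are unambiguous.
I_SMALL, I_CAPITAL = "\u0069", "\u0049"              # i, I
DOTLESS_SMALL, DOTTED_CAPITAL = "\u0131", "\u0130"   # dotless i, dotted I

PAIRS = {
    "a": (I_SMALL, I_CAPITAL),
    "b": (I_SMALL, DOTLESS_SMALL),
    "c": (I_SMALL, DOTTED_CAPITAL),
    "d": (DOTLESS_SMALL, I_CAPITAL),
    "e": (DOTLESS_SMALL, DOTTED_CAPITAL),
    "f": (I_CAPITAL, DOTTED_CAPITAL),
}


def matches(x, y):
    """Compare two strings under three candidate definitions."""
    return {
        "lower": x.lower() == y.lower(),
        "upper": x.upper() == y.upper(),
        "casefold": x.casefold() == y.casefold(),
    }


for label, (x, y) in PAIRS.items():
    print(label, matches(x, y))
```

With these particular mappings, pair (a) matches under all three definitions, pairs (b) and (d) match only under uppercasing, and pair (c) matches under none of them, because U+0130 lowercases to "i" followed by U+0307. A Turkish-locale mapping would give yet another set of answers, which is why the specification needs to name one definition.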
3) When parsing a non-negative integer (Section 8.2, step 8), what's
the expected behavior for integers larger than 2^32? 2^64? Are
implementations of this specification required to do integer arithmetic
on arbitrarily large integers? If not, is the behavior just
implementation-dependent?
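As an illustration of why this matters, the sketch below parses a digit string exactly and then shows two things a fixed-width implementation might do with the result. The function names and the choice of 32 bits are assumptions made for the example; the specification does not currently say which behavior, if any, is expected.

```python
def parse_non_negative_integer(text):
    """Parse ASCII digits into an int of unbounded size, or None."""
    if not text or any(ch not in "0123456789" for ch in text):
        return None
    value = 0
    for ch in text:
        value = value * 10 + (ord(ch) - ord("0"))
    return value


def wrap_to_uint32(value):
    """What 32-bit unsigned arithmetic would silently keep."""
    return value & 0xFFFFFFFF


def clamp_to_uint32(value):
    """What a saturating implementation would keep."""
    return min(value, 0xFFFFFFFF)


exact = parse_non_negative_integer("4294967296")  # 2^32
print(exact, wrap_to_uint32(exact), clamp_to_uint32(exact))
```

For the input "4294967296" the three results are 4294967296, 0 and 4294967295, so three conforming-looking implementations could disagree on the same document.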
4) Section 8.2, step 8, it would be good to make sure that the image
identification table matches the one in HTML5 (possibly by having both
specifications refer to a single table, if that's workable).
5) Section 8.2, step 8, I'm not sure why image/svg+xml is required to
be processed according to SVGTiny. This means that an SVG 1.1 or SVG
1.2 Full (whenever that happens) user-agent cannot implement this
specification, as far as I can see.
6) Section 6.2 talks about using file extensions followed by
content-type sniffing to determine MIME types. This sounds to me like
the exact process is up to the UA. Then Section 8.2, step 8, has
specific lists of extensions and magic numbers that UAs need to
recognize. Is the sniffing allowed in Section 6.2 required to be a
superset of what Section 8.2 allows? If so, this should be made
clearer. If more sniffing is allowed than what's listed in 8.2, this
can lead to security problems where two UAs (say a security checker and
a web browser) treat the same file in a widget as having different
types. This is the sort of situation that HTML5 is trying very hard to
avoid with its sniffing algorithm. I feel that all sniffing that UAs
are allowed to perform must be explicitly listed in the specification.
If that means that not all files can have MIME types deduced, then an
alternate mechanism needs to be provided to indicate MIME types for files.
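The kind of disagreement I'm worried about can be sketched as follows. One function sniffs strictly from a closed table; the other adds a heuristic of its own on top. The table entries are well-known file signatures picked for illustration, not the specification's table, and both function names and the HTML heuristic are hypothetical.

```python
# Hypothetical closed table: (magic-number prefix, MIME type).
MAGIC_TABLE = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def sniff_strict(data):
    """Return a type only if the table lists a matching signature."""
    for magic, mime in MAGIC_TABLE:
        if data.startswith(magic):
            return mime
    return None


def sniff_lenient(data):
    """A UA that layers its own guess on top of the table (assumed)."""
    mime = sniff_strict(data)
    if mime is None and data.lstrip().lower().startswith(b"<html"):
        return "text/html"
    return mime


payload = b"  <HTML><script>...</script>"
print(sniff_strict(payload))   # the security checker sees no known type
print(sniff_lenient(payload))  # the browser decides it is HTML
```

If only the strict variant is permitted, both UAs agree by construction, and files the table cannot identify need some other way to declare their type.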
7) It's not clear to me why Section 5.3 allows encoding of filenames
using [CP437]. Why not just require UTF-8?
8) The algorithm for getting text content in Section 8.2, step 2
doesn't look correct to me. For example, consider an input element
whose XML serialization looks like this:
<outer><inner1>First</inner1> <inner2>Second</inner2></outer>
The text content of this input, according to the spec's algorithm, is
the string "FirstSecond". I would expect to get "First Second" as
the text content in this case. Is there a reason to not just use
textContent here? Note that even the example in the specification gets
this wrong. There the markup is:
<name>
The <blink>Awesome</blink>
<author email="dude@example.com">Super <blink>Dude</blink></author>
Widget</name>
for which this algorithm gives "The AwesomeSuper Dude Widget" and
not what the spec claims (I have also removed the carriage returns for
legibility).
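For comparison, here is a small sketch of what textContent-style behavior gives for the <outer> example. It uses Python's xml.etree.ElementTree, where joining itertext() concatenates every descendant text node in document order; the helper name text_content is mine, and this stands in for the DOM property rather than reproducing the spec's algorithm.

```python
import xml.etree.ElementTree as ET


def text_content(element):
    """Concatenate all descendant text in document order."""
    return "".join(element.itertext())


outer = ET.fromstring(
    "<outer><inner1>First</inner1> <inner2>Second</inner2></outer>"
)
print(repr(text_content(outer)))  # 'First Second'
```

The space between the two inner elements is a text node like any other, so it survives the concatenation.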
In the same algorithm, there's mention of "the input's text nodes".
This relationship is not defined in this specification or elsewhere. I
assume you mean the text nodes which have input as their ancestor, right?
In the same algorithm, rule 4 doesn't make sense to me. What's
"position"? Is it a character, or an index? Or something else? If you
mean to say that input's nodeValue is to be appended to result, just say
that.
In the informative section following this algorithm, there is mention of
"getTextContent() DOM3 Java interface", whatever that is. I'm not
sure why we need to drag Java into this. If we want to say something
about the node's DOM3 textContent property, we should just say that, in
my opinion. There's no language binding involved here; the property is
defined in the relevant IDL and its definition is language-agnostic.
9) The "Rules for Removing Whitespace" section in Section 8.2, Step 2
has the following language:
While position doesn't point past the end of input and the
character at position is not one of the space characters,
append character to the end of result and let position become
the character in input.
Here "character" is a Unicode character the first and second time it's
mentioned, and seems to be an integer the third time? Or something? If
you're trying to say that the position should move to the next character
in input, say that, please.
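What I think the step is trying to say can be sketched like this, with position as an integer index into input. The set of space characters is assumed to be the HTML5 one (space, tab, LF, FF, CR), and the function name is hypothetical; the specification would need to confirm both.

```python
# Assumed set of space characters: U+0020, U+0009, U+000A, U+000C, U+000D.
SPACE_CHARACTERS = " \t\n\x0c\r"


def collect_non_space(input_str, position=0):
    """Collect characters from position up to the next space character.

    Returns the collected string and the updated position (an index).
    """
    result = []
    while (position < len(input_str)
           and input_str[position] not in SPACE_CHARACTERS):
        result.append(input_str[position])
        position += 1  # move to the next character in input
    return "".join(result), position


print(collect_non_space("abc def"))  # ('abc', 3)
```

Here the character appended is always the one at position, and position only ever advances by one.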
10) Is there a reason to not have any JPEG images in the Image
Identification Table in Section 8.2, Step 2? I would have thought
widgets might wish to include such images.
-Boris
Received on Friday, 9 January 2009 22:21:23 UTC