[Bug 14548] Grouping Content: algorithm for incrementing value (OL->LI @value) does not match any current user agent

http://www.w3.org/Bugs/Public/show_bug.cgi?id=14548

theimp@iinet.net.au changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|minor                       |trivial

--- Comment #7 from theimp@iinet.net.au 2011-11-02 09:52:37 UTC ---
> In general I've tried to avoid specifying such limits because over time, different limits become possible.

I was thinking of limits like:
"Vendors should use at least a 16-bit signed integer to store a list counter;
this is an implementation requirement. Authors should avoid using numbers
beyond the range -32768 to 32767, because they may not be counted predictably;
if larger number ranges are needed, authors must test the values with all
software that they require compatibility with."
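
For reference, the range quoted above is simply the range of a 16-bit
two's-complement integer. A minimal sketch in Python, assuming a
two's-complement counter; the function names are hypothetical and only
illustrate the check an author or validator might make:

```python
def signed_range(bits):
    """Return (lowest, highest) for a two's-complement integer of `bits` bits."""
    half = 1 << (bits - 1)
    return -half, half - 1


def is_portable_counter(value, bits=16):
    """True if `value` fits in the assumed minimum counter width."""
    low, high = signed_range(bits)
    return low <= value <= high
```

With the default of 16 bits, signed_range gives (-32768, 32767), so a
list starting at 40000 would fall outside the range that the proposed
wording asks vendors to guarantee.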

The practical limit on the size of lists is unlikely to change much. It's just
not likely ever to be realistic for humans to read lists with counts of two
billion or so.

If software needs to manipulate such lists, then that's fine: that software can
just make sure it uses as many bits as it needs (if authors are coding with
specific software in mind, we're really not talking about general-purpose HTML
anymore).

More significantly, though, how long will it be before HTML6? (I understand
about HTML version numbers; I just mean, how long before the limit can be
realistically revised?) 15 years? A given limit does not need to extend unto
eternity. I believe that the current position is more-or-less that the spec. de
jure is not even meaningful if all user agents decide to do something
different; so it wouldn't even take a spec. update for authors to get the
benefit of user agents reliably supporting other values. The spec. is just a
recommended starting point to try to avoid the balkanization that was common
when vendors decided to define their implementations unilaterally.

> e.g. what is a reasonable limit on a 32bit system is not on a 64bit system. Some UAs may get some benefit from using one or two of the higher-order bits for some internal state, making the ideal number for some browsers different than others.

Vendors are likely to use the same limit for the entire user agent family
(rendering core), irrespective of the target hardware. Any user agent that can
be compiled for both a 32-bit and 64-bit environment will likely use the exact
same data structure. Opera's desktop limit, for example, might plausibly arise
from limits due to the unified codebase for Opera Mobile/Opera Mini, etc.

Lynx has an arbitrary limit for negative numbers, probably related to Roman
Numeral rendering concerns. This is a different kind of limit, but one that
authors still need to know about (at least it's documented).

JavaScript (currently) has an arbitrary range. CSS does not, but for practical
purposes an author would never notice, because the only values that can be so
large will also be realistically unusable (currently). Where this is not true
(for example, the CSS Lists Module), there is consideration for including a
realistic minimum limit.

> I think authors understand that when they push the limits, the results won't be the same everywhere;

Does an author understand what's so special about the number 2147483647 that
they get bizarre results when it's exceeded? Do they understand that not all
browsers will behave the same at some limit or another (but they usually won't
document where and how, so you'll have to test them all yourself, including
past and future versions), and that they should consider another structure if
it is critical that it renders identically?

If a major vendor used a four-bit counter, and claimed that this was okay
because 99.999% of lists use only positive numbers and stop before 16, would
this be an appropriate limit?

How does one assess conformance with an impossible limit? If it's okay to say
"we use the HTML5 algorithm, with implementation-specific constraints", then
why can't another say the exact same thing but explain: "But we use an unsigned
integer, so while we follow the algorithm, logical constraints prevent negative
numbers"? It's not that they're not following the spec.; it's that the limits
of binary arithmetic don't allow negative numbers within the constraints that
they have imposed. We've seen stranger claims.

In fact, you could also define an integer to allow only negative numbers. Or
only even numbers. Or any arbitrary set of numbers. "Data is what you define it
to be". If the spec. doesn't define it appropriately, that falls to the user
agent developers, and they could define it as practically anything. Of course,
they won't do anything silly; but the point is that it really shouldn't be up
to anyone else, if authors are to get predictable results.

> furthermore, as the "correct" behaviour at any particular number is clear, the risk of us eventually relying on a particular vendor's error handling for these cases is limited compared to other situations on the platform.

The CSS Box Model was as clear as can be, but probably half of the DIV elements
and lines of CSS ever written were to force a certain user agent to behave in a
certain way.

> What about a list that talks about who owns what dollars of the US debt? One could easily imagine a list with a few list items with values in the trillions. Or a list where the values are distances from earth; one could then imagine a list with truly astronomical numbers if the units used are meters.

Have you ever seen such a list (formatted as a list)? I haven't.

I'm not exactly a semantic purist, but if I understand your examples, then
they're an extremely bad way to use the OL structure. The list counters become
content; the examples are data that belongs in a two-dimensional list (DL) or a
table.

(Also ignoring that most such examples will require symbols such as $ or %, or
will require fractions, or need markup such as ABBR, or otherwise be
unrealistic with the spartan list structure in HTML.)

I will rephrase: "I can't think of a scenario where a list is practical beyond
a few thousand entries, or is useful with starting numbers beyond a few
million, when the counters do nothing other than count the entries (i.e. are
not content)."

The number of realistic use cases is so small that I can't think of one (and
that's not because I haven't tried).

If the counters are content, the "list" almost certainly belongs in a
definition list or table. If the "counters" are presentational markers, then
that's a job for CSS. If they're structure, the practical limits are what
humans can use and not how many bits a developer can allocate.

This is why negative values, or multi-billion numeric counts, though perhaps
useful in some extreme scenarios, are not really needed: counters should count,
nothing more. OL allocates counters because it expresses the semantic
difference from UL that the order of the list matters. It is not really for
numbering *per se*, in a word-processing sense; that's mostly presentation,
handled by CSS. It's not for organizing two-dimensional content; other
structures, such as definition lists and tables, do that.

All that you truly need, structurally, is that entries that come before a given
element are numbered smaller, and entries that come after a given element are
numbered larger. Still, @value has much usefulness even if only from the point
of view that it allows you to use CSS such as: 

li[value="1"] { list-style-image: url("http://www.example.com/number1.png") }

and this makes it reasonable to use the actual numeric values that are
equivalent to your presentation, because this keeps the list consistent for
accessibility purposes, for example. So arbitrary numbering is not
automatically bad, even when using CSS.

But even if you disagree with the content/structure/presentation argument,
you'd have to agree that HTML, without CSS, cannot come close to what authors
would like from a presentational point of view. You can't enter unnumbered
entries, for example. Or compound subvalues (such as "2b" for the second child
of the second child). Such use cases (while impossible due to IDL compatibility
constraints) are plentiful and significant, while negative numbers and
multi-billion numeric counts are features that almost no-one has ever asked
for. Almost any use case is possible with CSS, but if you're relying on CSS
then why even change the behavior from HTML4? Or even what most browsers
implemented?

I can't see the point, but please understand that this is no reason to not
allow something. I am not challenging the spec., just seeking to understand it
so that possible issues are dealt with. Since you have answers that seem to
satisfy you, to leave it unchanged, that's fine with me.

> The behaviour is described. It just keeps going. :-)

Behavior that is not currently, has not ever been, will not ever be, and cannot
ever be, implemented by anyone. And with substitute behavior which varies (some
cap, some overflow) at more than one different limit, and also differently
depending upon how that limit is reached.
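
To make "some cap, some overflow" concrete, here is a minimal sketch in
Python of the two substitute behaviors, assuming a 32-bit signed counter. The
names (clamp, wrap, number_items) are hypothetical and do not describe any
particular user agent's code:

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def clamp(n):
    """Capping policy: pin out-of-range values to the nearest limit."""
    return max(INT32_MIN, min(INT32_MAX, n))


def wrap(n):
    """Overflow policy: reduce modulo 2**32 into the signed 32-bit range."""
    return (n - INT32_MIN) % 2**32 + INT32_MIN


def number_items(start, count, policy):
    """Ordinals for `count` list items beginning at `start` under `policy`."""
    ordinals = []
    current = policy(start)  # the limit can be hit when the value is parsed...
    for _ in range(count):
        ordinals.append(current)
        current = policy(current + 1)  # ...or when the counter is incremented
    return ordinals
```

Starting at 2147483646 with three items, the capping policy repeats
2147483647 for the last two items, while the overflow policy jumps to
-2147483648. A start value that is already out of range (say 2147483650)
diverges on the very first item, which is the "how that limit is reached"
difference.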

Even if developers attempt to have no "hard" limits, and have the limits
imposed at compile-time by the configuration, or at run-time by the amount of
memory, these kinds of requirements make testing extremely difficult, and bugs
easy to miss until they cause spectacular havoc. This could also affect
authoring software and even automatic data processing software. Okay, so it's
not the spec. author's duty to ensure that software does not contain bugs, but
I wonder if pragmatism might be beneficial. Software bugs don't just affect
software developers, after all.

I'll admit that I agree with the philosophy in principle, and that the
empirical limits are already unlikely to actually be encountered in any useful
application. So we can scratch this idea if you think it's better.

> I'm pretty sure we can't change this, for legacy compat reasons.

Well, I guess that if authors behave themselves, it's not a problem (big
"if"!).

I admit, I looked hard and couldn't find an example or a likely reason, but I'm
not going to challenge you on that.

***

Summary:

The initial positive sign symbol (U+002B PLUS SIGN):
Pros: Compatible with CSS numbering, allowing trivial machine transformations
for cases where the @value attribute is presentational or where documents are
converted.
Cons: Causes problems for a major, widely-deployed client that is very
infrequently upgraded and which is typically used in environments (such as
remote shell sessions, or in conjunction with accessible input/output hardware)
where it cannot be upgraded directly by the user and where another browser
cannot be used. Forbidden by previous HTML specs.

Negative numbers:
Pros: Occasionally useful. Compatible with CSS numbering.
Cons: Widely unsupported currently. Widely unsupported historically. Forbidden
by previous HTML specs.

The value zero:
Pros: Occasionally useful. Compatible with CSS numbering. Permitted by previous
HTML specs.
Cons: Only somewhat supported currently. Only somewhat supported historically.

Implementation limits:
Pros: Unavoidable in practice. Makes rendering more predictable for authors.
Makes testing more practical for developers. Makes conformance more conclusive.
Cons: Previously not specified. Constrains usage artificially.


Basically every vendor has to fix at least one problem with their list
implementation under the current spec, and no behavior can be defined that is
compatible with everything unless it matches the one vendor that has not
committed to changing anything (naming no names). That's not a great idea,
because the behavior is both quirky and doesn't match the previous specs., so,
on the balance of everything, it's probably best to emphasize future
usefulness.

I'd make the initial positive sign symbol (U+002B PLUS SIGN) valid for
authors to use. If it has to be supported anyhow, you lose nothing by allowing
it, but gain complete CSS compatibility for all legal values. I see no reason
to forbid it, and there's less fuss that way. I understand that such values
are unlikely to be encountered in CSS, either. Recommend that it not be used,
if that is a major concern.
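
A minimal sketch in Python of a lenient parser that accepts the leading plus
sign, to show how little is involved. It is loosely modelled on the general
shape of an integer-parsing step (skip leading whitespace, optional sign, then
digits) and is not a transcription of the spec text; parse_list_value is a
hypothetical name:

```python
import re

# Leading whitespace, an optional "+" or "-", then one or more ASCII digits.
# Anything after the digits is ignored.
_INT_RE = re.compile(r"[ \t\n\f\r]*([+-]?)([0-9]+)")


def parse_list_value(text):
    """Return the integer at the start of `text`, or None if there is none."""
    match = _INT_RE.match(text)
    if match is None:
        return None
    sign, digits = match.groups()
    number = int(digits)
    return -number if sign == "-" else number
```

Under this sketch "+5" and "5" both give 5, "-3" gives -3, and a bare "+" or
"abc" gives None, so accepting the sign costs one optional character in the
pattern.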

If you think this is not a good idea, I presume that's because a major user
agent is currently incompatible with it. But then, negative numbers cause much
bigger problems, and they're in the new spec.


Received on Wednesday, 2 November 2011 09:54:52 UTC