[whatwg] Citing multiple <blockquote> elements in HTML5 from Calogero Alex Baldacchino on 2008-12-03 (public-whatwg-archive@w3.org from December 2008)

From: Calogero Alex Baldacchino <alex.baldacchino@email.it>
Date: Wed, 03 Dec 2008 06:48:34 +0100
Message-ID: <49361DB2.1010103@email.it>

Benjamin Hawkes-Lewis wrote:
> Calogero Alex Baldacchino wrote:
>> [...]
>>
>
> I think you're confusing parsing rules that conforming user agents
> must follow to associate identifiers with elements (even when ids are
> duplicated) with the authoring rules that conforming documents must
> follow (ids must be unique).

Ok, so what's what?

When you read "The value must not contain any space characters.", do you
take it as an authoring rule for conforming documents? Ok.

When you read "*If the value is not the empty string, user agents must
associate the element with the given value (exactly, including any space
characters)* for the purposes of ID matching within the subtree the
element finds itself (e.g. for selectors in CSS or for the
|getElementById()| method in the DOM).", do you take it as a parsing rule
for conforming user agents? Ok. But isn't it worth spending a word
everywhere in the spec to say when something is a quirk for backward
compatibility, which might go away in the future, and when it is not,
because it isn't needed? And when it is a legacy from the past, shouldn't
it be considered in every respect? After all, wasn't one of the main
goals of HTML 5 to turn unwritten, browser-specific rules into written,
standard behaviours?

I mean, if you allow space characters inside an id value as a parsing
rule, you can face something like '<div id="foo bar" >', that is, an id
consisting of more than one token. Is it good to leave it untouched?
Yes? Ok, but what does that mean for CSS, since CSS is cited as one
reason to allow space characters? That is, can a browser handle an id
selector that starts with the '#' character and is broken by a blank
space? Or rather, is that legal in CSS at all? Honestly, again, I don't
remember well; I've never tried anything like that (since it makes no
sense to me), and I think it's illegal. But let's say it's illegal for
conforming style sheets, while existing user agents may or may not
accept it, each with its own behaviour. If we turn a blind eye to
'<div id="foo bar" >' in HTML 5 code but leave its CSS counterpart to
each implementation, we'll solve only half of the problem (where the
problem is turning unwritten rules into written, and possibly improved,
standards), won't we? Yet any kind of "CSS quirks" would be out of scope
for an HTML specification, so I believe '<div id="foo bar" >' is
troublesome. (If, instead, "foo bar" is not a valid id selector in any
browser's CSS, then we're letting user agents accept as valid an id that
is inconsistent with CSS, and CSS selectors cannot be a reason to allow
space characters inside an id string, at least as far as direct
references to the identifier value are concerned.)

It might also be troublesome per se, even just for HTML conformance by
user agents: a URL fragment might contain escaped space characters, and
an escaped space isn't the same thing as the space character itself, so
the exact-matching rule, applied to space characters inside an id, may
cause trouble unless the '<div id="foo bar" >' case is considered
thoroughly.
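A minimal Python sketch of the fragment concern, assuming the exact-matching
rule is applied to the fragment as it appears in the URL, i.e. still
percent-encoded. The helper `matches_exactly` is hypothetical; only
`urllib.parse.quote` and `unquote` come from the standard library.

```python
from urllib.parse import quote, unquote

def matches_exactly(element_id: str, fragment: str) -> bool:
    """Hypothetical helper: the 'exact match' rule, with no decoding step."""
    return element_id == fragment

element_id = "foo bar"          # as in <div id="foo bar">
fragment = quote(element_id)    # how a link would carry it in a URL

print(fragment)                                         # foo%20bar
print(matches_exactly(element_id, fragment))            # False: "%20" is not " "
print(matches_exactly(element_id, unquote(fragment)))   # True, but only after decoding
```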

Now let's say, instead, that a user agent conforming to the HTML 5
specification must cut off any token after the first one (I know that
"foo bar" is actually taken as is), so that <div id="foo bar"> becomes
<div id="foo "> and <div id=" foo "> is valid too. In that case, also
skipping any spaces, and specifying the same behaviour for strings
passed to .getElementById(), could be a nice graceful degradation for
documents that don't conform to the rule "the value [of an id attribute]
must not contain any space characters", but it might fail with CSS
selectors such as 'div[id="foo bar"]'.
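The "first token only" idea above can be modelled in a few lines of Python.
Both functions are hypothetical illustrations, not spec text; `str.split()`
splits on any Unicode whitespace, which is only an approximation of HTML's
"space characters".

```python
def first_token(value: str) -> str:
    """Hypothetical normalization: keep only the first whitespace-separated token.

    str.split() with no argument drops leading/trailing whitespace and splits
    on runs of whitespace, so " foo " and "foo bar" both become "foo".
    """
    tokens = value.split()
    return tokens[0] if tokens else ""

def attr_equals(raw_value: str, wanted: str) -> bool:
    """Rough model of an attribute selector such as div[id="..."]: exact string match."""
    return raw_value == wanted

raw = "foo bar"                  # <div id="foo bar">
normalized = first_token(raw)    # what getElementById()/#foo would see under this rule

print(normalized)                          # foo
print(first_token(" foo "))                # foo
print(attr_equals(raw, "foo bar"))         # True while the raw value is kept
print(attr_equals(normalized, "foo bar"))  # False once the id is rewritten
```

The last line is the failure mentioned above: once the value is normalized,
a selector written against the original string no longer matches.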

Perhaps a compromise, if acceptable for backward compatibility, might be:
- when the id value must be compared to a fragment identifier, strip any
trailing space characters; if the match fails, escape any other space
characters both in the id value and in the fragid and try again;
- when an attribute is defined to hold a URL and its value has spaces
in its path/query/fragment, escape them before resolving the URL (not
sure if needed);
- for the purpose of ID matching through the DOM 'getElementById'
method, leave the id value untouched;
- for the purpose of ID matching through CSS selectors accessing it as
an attribute, leave the id value untouched;
- for the purpose of ID matching through CSS selectors that reference it
directly (e.g. '#foo'), either take the first sequence of non-space
characters or let the match fail (I can't decide which is better, but
perhaps the former would fail as well, since I guess anyone coding
<div id="foo bar"> not only as a fragment identifier but also for
styling might have the bright idea of writing "#foo bar { font-weight :
bold; }" as well).
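A rough Python sketch of the compromise listed above, assuming HTML's
"space characters" (space, tab, LF, FF, CR) and percent-encoding as the
"escape". All function names are hypothetical, and the second bullet (URL
attributes) is not modelled.

```python
import re

# The HTML "space characters": space, tab, LF, FF, CR.
SPACE_CHARS = " \t\n\f\r"

def escape_spaces(s: str) -> str:
    """Percent-encode each space character, e.g. ' ' -> '%20'."""
    return "".join(f"%{ord(c):02X}" if c in SPACE_CHARS else c for c in s)

def match_fragment(element_id: str, fragid: str) -> bool:
    """First bullet: compare ignoring trailing spaces, then retry with spaces escaped."""
    a = element_id.rstrip(SPACE_CHARS)
    b = fragid.rstrip(SPACE_CHARS)
    if a == b:
        return True
    return escape_spaces(a) == escape_spaces(b)

def match_get_element_by_id(element_id: str, wanted: str) -> bool:
    """Third and fourth bullets: getElementById() and [id=...] use the raw value."""
    return element_id == wanted

def match_id_selector(element_id: str, selector_id: str, first_token: bool = True) -> bool:
    """Last bullet: '#foo' sees the first run of non-space characters, or fails."""
    if first_token:
        head = re.split(r"[ \t\n\f\r]+", element_id.strip(SPACE_CHARS))[0]
        return head != "" and head == selector_id
    # Alternative: an id containing any space character never matches '#...'.
    return not any(c in SPACE_CHARS for c in element_id) and element_id == selector_id

print(match_fragment("foo ", "foo"))                           # True
print(match_fragment("foo bar", "foo%20bar"))                  # True
print(match_get_element_by_id("foo bar", "foo"))               # False
print(match_id_selector("foo bar", "foo"))                     # True
print(match_id_selector("foo bar", "foo", first_token=False))  # False
```

For comparison, in CSS the selector "#foo bar" is not an id containing a
space: it parses as "#foo" followed by a descendant type selector "bar". An
id with a literal space can only be addressed with an escape, as in
"#foo\ bar".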

Anyway, if the id value is also a fragment identifier, which might
contain space characters (since the parsing rules prescribe adding such
characters to the unreserved production), does the (authoring) rule "the
value must not contain any space characters" make sense?

Now let's come to the duplicated ids issue. Again, what's what? When the
spec says, "The id attribute represents its element's unique identifier.
*The value must be unique in the subtree within which the element finds
itself and must contain at least one character.*", I think that's what
you call an authoring rule. So I don't think it was so bad to ask for a
clarification of what "subtree" means. And if such a subtree could also
be an element's subtree inside a document, was the suggestion of a
getElementById method on the HTMLElement interface so awful?
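The Element-level lookup suggested above can be sketched in Python. This is
a model, not DOM code: `Element` and its `get_element_by_id` method are
hypothetical (in DOM Core, getElementById() is defined on Document), and
"subtree" is taken to mean an element's descendants.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)  # compare nodes by identity, like DOM nodes
class Element:
    tag: str
    id: str = ""
    children: list[Element] = field(default_factory=list)

    def get_element_by_id(self, wanted: str) -> Optional[Element]:
        """Hypothetical Element-level lookup: search only this element's
        descendants, in document (pre-order) order, and return the first match."""
        for child in self.children:
            if child.id == wanted:
                return child
            found = child.get_element_by_id(wanted)
            if found is not None:
                return found
        return None

# Two sections, each with its own <blockquote id="q1">: duplicated in the
# document, but unique within each section's subtree.
first = Element("section", children=[Element("blockquote", id="q1")])
second = Element("section", children=[Element("blockquote", id="q1")])
body = Element("body", children=[first, second])

print(body.get_element_by_id("q1") is first.children[0])     # True
print(second.get_element_by_id("q1") is second.children[0])  # True
```

Scoping the search to "second" is what makes its blockquote reachable at
all; a document-wide lookup would always stop at the first one.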
Otherwise, let's consider (again) the second paragraph:

"If the value is not the empty string, user agents must associate the
element with the given value (exactly, including any space characters)
*for the purposes of ID matching within the subtree the element finds
itself (e.g. for selectors in CSS or for the |getElementById()| method
in the DOM).*"

It's a parsing rule, isn't it? But it also implies that the id must be
unique in the whole document for the purpose of ID matching through the
getElementById() method in the DOM, because the only object capable of
getting an element by its id is an instance of the Document interface.
So, some choice must be made about what to do with duplicated ids.
Solving the question at the parser level (i.e. defaulting any duplicated
id to the empty string) would be consistent with both the fragment
identifier behaviour (only the first occurrence is valid) and the
uniqueness rule, but might break some semantics (e.g. a hyperlink used
to create an instance of a <dfn>, or a <blockquote> with a cite
attribute referencing a <cite> element, both with a duplicated id that
is not the first occurrence). On the other hand, leaving the duplicated
id in the document requires some changes to the Document's
getElementById() method, since W3C DOM Core does not define a unique
behaviour in such a case, and I've expressed a few doubts about solving
this by adding an equivalent method to the HTMLDocument interface.
Anyway, the getElementById() behaviour must be defined for such
situations, and having it pick the first match may be a solution (but it
might cause side effects or unwanted effects if misused in actual
documents, and it leaves no way to access directly any element with a
duplicated id; then again, if I'm not careful when choosing an ID, I have
only myself to blame... Still, fulfilling uniqueness might become
problematic when dynamically putting together pieces of code, perhaps
from different sources, e.g. using XMLHttpRequest, or because of
externally syndicated content, but that falls within the scope of
careful programming).
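The two options above can be compared with a small Python sketch. It
assumes a simplified model in which a document is just the list of its
elements' id values in document order; both function names are
hypothetical.

```python
from __future__ import annotations

from typing import Optional

def dedupe_at_parse(ids: list[str]) -> list[str]:
    """Option 1 (hypothetical): the parser resets every repeated id to ''."""
    seen: set[str] = set()
    result = []
    for value in ids:
        if value and value in seen:
            result.append("")  # a later duplicate loses its id entirely
        else:
            seen.add(value)
            result.append(value)
    return result

def first_match(ids: list[str], wanted: str) -> Optional[int]:
    """Option 2 (hypothetical): keep duplicates; the lookup returns the
    first element in document order."""
    for index, value in enumerate(ids):
        if value == wanted:
            return index
    return None

# Element ids in document order: two <dfn id="term"> and one <cite id="src">.
ids = ["term", "src", "term"]

print(dedupe_at_parse(ids))      # ['term', 'src', '']
print(first_match(ids, "term"))  # 0
```

In both cases the third element cannot be reached by id. Option 1 removes
its id, so a fragment or cite reference to it is lost too. Option 2 keeps
the attribute (e.g. for CSS), but the lookup never returns that element.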

From the point of view of CSS, both choices may be consistent with
paired rules such as "#foo { font-size : 13px; }" and "#foo { font-size :
14px; }", since both would refer to the same element because of the
cascading rules; on the other hand, something like 'div[id="foo"]
{/*something here*/}', or an ID selector used as a descendant of
different elements, might isolate different elements in the document
(whether to allow that or not is outside HTML's scope, but are such
cases found in the wild?), and for compatibility with documents styled
that way, leaving duplicated ids in the document would be the better
choice. But in such cases, shouldn't DOM element selection be consistent
with CSS element selection (i.e. to avoid side effects when CSS rules
manipulate the DOM itself)? That is, if through CSS it were possible to
reach elements with duplicated ids in different subtrees of a document
tree (defining the subtree of a non-leaf node as all of its descendant
nodes) and to manipulate their content, shouldn't it be possible through
the DOM too?
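A small Python model of the inconsistency described above, under
simplifying assumptions: each element carries at most one class name,
selectors are evaluated by hand rather than by a real CSS engine, and all
names (`El`, `walk`, `select_by_id_attr`, `select_descendant`,
`get_element_by_id`) are hypothetical. CSS-style selection reaches both
paragraphs with id "foo", while a first-match document lookup only ever
returns the first one.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional

@dataclass(eq=False)  # identity comparison, like DOM nodes
class El:
    tag: str
    id: str = ""
    cls: str = ""  # at most one class name in this model
    children: list[El] = field(default_factory=list)

def walk(node: El) -> Iterator[El]:
    """Yield node and all its descendants in document (pre-order) order."""
    yield node
    for child in node.children:
        yield from walk(child)

def select_by_id_attr(root: El, wanted: str) -> list[El]:
    """Rough model of the CSS selector [id="wanted"]: every match."""
    return [n for n in walk(root) if n.id == wanted]

def select_descendant(root: El, cls: str, wanted: str) -> list[El]:
    """Rough model of '.cls #wanted': matches below an element with class cls."""
    hits: list[El] = []
    for ancestor in walk(root):
        if ancestor.cls == cls:
            for n in walk(ancestor):
                if n is not ancestor and n.id == wanted and n not in hits:
                    hits.append(n)
    return hits

def get_element_by_id(root: El, wanted: str) -> Optional[El]:
    """Document-level lookup returning only the first match."""
    return next((n for n in walk(root) if n.id == wanted), None)

p1, p2 = El("p", id="foo"), El("p", id="foo")
body = El("body", children=[El("div", cls="a", children=[p1]),
                            El("div", cls="b", children=[p2])])

print(len(select_by_id_attr(body, "foo")))          # 2
print(select_descendant(body, "b", "foo") == [p2])  # True
print(get_element_by_id(body, "foo") is p1)         # True: p2 is never returned
```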

Anyway, I'm not that confused, no more than usual :-P

BR, Alex.

Received on Tuesday, 2 December 2008 21:48:34 UTC