Re: Comments on element traversal specification (multiple responses) from liorean on 2007-05-28 (public-webapi@w3.org from May 2007)

From: liorean <liorean@gmail.com>
Date: Mon, 28 May 2007 23:10:59 +0200
To: "Web APIs WG (public)" <public-webapi@w3.org>
Message-ID: <cee13aa30705281410p708879ddg76d3f5c82f22df16@mail.gmail.com>
On 27/05/07, Ray Whitmer <ray@jhax.net> wrote:
> There were several responses to my comments. None solved any use case
> I raised or made a case for an API I could use instead of DOM for
> either simple element or mixed processing. Since the responses
> referred to each other and I did not receive them in my in box, I will
> attempt to make responses in a single response.
>
> I realize that this is a proposal for an element processing model,
> which would be expected to map to requirements for processing element
> content, not mixed content.

In many cases, you're only interested in the element tree even in
mixed content. This goes especially for scripting HTML documents in
the browser, where most elements have mixed content models. You are
pretty much always only interested in either element-element relations
or node-node relations, not element-text relations. For example, you
add event handlers on element nodes only, so when traversing the
document for elements to add event handlers to, you have no interest
in non-element nodes at all. In the browser, text is usually just for
presenting to users (something the script wants nothing to do with),
but elements are for structure, which scripts can be interested in
handling in various ways.
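
As a rough illustration of the point above, here is a minimal sketch of an element-only walk over mixed content. The node model (`SketchNode`, `SketchElement`) and the helpers `el`, `text` and `forEachElement` are hypothetical stand-ins for the browser DOM, invented here so the example runs on its own; in a real page the visitor would be where an event handler gets attached.

```typescript
// Hypothetical minimal node model; it stands in for the browser DOM so the sketch runs anywhere.
type SketchNode =
  | { kind: "text"; data: string }
  | { kind: "element"; tag: string; children: SketchNode[] };

type SketchElement = Extract<SketchNode, { kind: "element" }>;

// Visit every element under `root` in document order, skipping text nodes entirely.
function forEachElement(root: SketchElement, visit: (el: SketchElement) => void): void {
  visit(root);
  for (const child of root.children) {
    if (child.kind === "element") {
      forEachElement(child, visit);
    }
  }
}

// Small constructors for building a test tree.
const text = (data: string): SketchNode => ({ kind: "text", data });
const el = (tag: string, ...children: SketchNode[]): SketchElement => ({
  kind: "element",
  tag,
  children,
});

// Mixed content: <p>Click <a>here</a> or <button>there</button>.</p>
const paragraph = el(
  "p",
  text("Click "),
  el("a", text("here")),
  text(" or "),
  el("button", text("there")),
  text("."),
);

// The visitor only ever sees elements; the text chunks never reach it.
const visited: string[] = [];
forEachElement(paragraph, (e) => visited.push(e.tag));
console.log(visited.join(",")); // p,a,button
```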

> The suggestion of an added argument was
> only to solve infrequent mixed content processing.

Well, the element traversal is just a set of pointers on element
objects, not methods with arguments, so arguments aren't a
possibility.
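
To make the "pointers, not methods" distinction concrete, here is a small sketch. The classes `MiniNode`, `MiniText` and `MiniElement` are hypothetical and exist only for this example; the getter names `firstElementChild` and `nextElementSibling` follow the attribute names used by the Element Traversal specification. Because they are read as properties, there is no call site at which a filter argument could be passed.

```typescript
// Hypothetical stand-in for DOM nodes, just enough to show attribute-style traversal.
class MiniNode {
  parent: MiniElement | null = null;
}

class MiniText extends MiniNode {
  constructor(public data: string) {
    super();
  }
}

class MiniElement extends MiniNode {
  children: MiniNode[] = [];

  constructor(public tag: string) {
    super();
  }

  append(...nodes: MiniNode[]): this {
    for (const n of nodes) {
      n.parent = this;
      this.children.push(n);
    }
    return this;
  }

  // Read-only "pointers": no parameters, so nowhere to pass a filter argument.
  get firstElementChild(): MiniElement | null {
    return firstElementFrom(this.children, 0);
  }

  get nextElementSibling(): MiniElement | null {
    if (this.parent === null) return null;
    const siblings = this.parent.children;
    return firstElementFrom(siblings, siblings.indexOf(this) + 1);
  }
}

// Return the first element at or after index `start`, or null if there is none.
function firstElementFrom(nodes: MiniNode[], start: number): MiniElement | null {
  for (let i = start; i < nodes.length; i++) {
    const n = nodes[i];
    if (n instanceof MiniElement) return n;
  }
  return null;
}

// A <ul> with two <li> children separated by whitespace and stray text.
const list = new MiniElement("ul").append(
  new MiniText("\n  "),
  new MiniElement("li"),
  new MiniText(" stray text "),
  new MiniElement("li"),
  new MiniText("\n"),
);

let count = 0;
for (let c = list.firstElementChild; c !== null; c = c.nextElementSibling) {
  count++;
}
console.log(count); // 2
```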

> If I have to write
> my own procedures for that, it is expected. But the specification is
> still useless to me reliably processing element content because an
> essential part of that, in my experience, is verifying the assumption
> as I go that it is, indeed element content.

Coming from the browser scripting side, there are generally two cases:
- You're scripting a specific site. You have control over the
structure, or can exert some influence on the developers who have. You
are scripting something for use with the particular structure your
site uses, and can entirely ignore things you know will not be any
issue.
- You're writing a generic library, and have to make your code safe to
use on any type of structure. Ignoring text content helps make the
library robust when you are only interested in the element tree and
particularly an element's location in the hierarchy. I find it more
common to be interested in the parent-child relations than in
content-carrying text nodes.


> David writes:
> "Why not just ignore it? It's very useful to be able to traverse the
> element tree without taking text nodes into account at all. Just look
> at CSS, it is purely concerned with element nodes (and
> pseudo-elements). We already have
> nextSibling/previousSibling/firstChild/lastChild/parentNode for the
> case of being interested in other types of nodes."
>
> Response:
> If it is comments or white space, I agree. If not, then it was
> generally not placed there to be ignored. If we expected to receive
> element content and it turned out to be mixed content it is never in
> any application I have written satisfactory to ignore that the mixed
> content exists.

Most scripts on the web deal with elements as black boxes and aren't
particularly interested in textual content. The text is there for the
user, the script is there for behaviour. If there is text content that
comes without an enclosing element, it's generally something we don't
have any interest in. This includes, in fact, the way most
content-related scripting is done. We are only interested in elements
up until the point where we want to modify the content. At that point,
on the other hand, we're not particularly interested in anything other
than the content.

On 28/05/07, Ray David Whitmer <ray@jhax.net> wrote:
> The API approaches the problem of element processing in an attitude that
> doesn't care if there happens to be text chunks floating around in your
> element content. I find this slip-shod, and part of a wider set of
> attitudes responsible for the state of markup content on the internet
> not being interoperable or accomplishing what the authors set out to do.
> But enough people program that way that you will probably find a willing
> audience who says, yes, that is exactly what I want, because content is
> slip-shod on the internet due to browsers that allow you to get away
> with anything and interpret it one way or the other.

For an API where there is only one purpose to the handling of a single
document, I'd say you're right. However, documents are targets for
scripting the DOM for behaviour, but also targets for rendering for
presentation. In most cases the set of nodes that are interesting for
behaviour is not the same as the set of nodes interesting for the user
in presentation.

> There were exactly two cases I mentioned:
>
> 1. processing element content (with error reporting when it turns out to
> not be mixed content, which should never be out of scope, unless content
> is prevalidated)

I assume you actually meant to say "when it turns out to be mixed
content" there?

In the muddle-on-through environments of HTML, ECMAScript and CSS,
especially considering we have a large set of mixed content elements
in HTML, I'd say error reporting for such a common case would become a
developer nuisance.

> What the API describes allows accidental traversal of mixed
> content when you intended to process element content, which is seldom,
> in my experience, what anyone actually wants because non-whitespace
> content is always relevant, even if you thought you were processing
> element content -- enough that I would recommend that no one attempting
> traversal of element content use the API.

In my experience text content is seldom relevant for the actual
traversal part of the code, independent of whether it's whitespace or
not. The text that is interesting for script is most often contained
in an element of its own that does not have mixed content.

> Yes, I can perform element processing today by making my own utility
> functions. It is not because I am solving different problems, I believe,
> but because I insist on tighter processing and not silently ignoring
> chunks of text that happen to be floating around in the content I am
> processing.
>
> When I mentor people on processing XML content using APIs, I will
> continue to characterize an API such as yours as slipshod because it
> ignores significant chunks of stuff that should seldom be ignored
> floating around in the input stream, but whatever you like, you will
> ultimately get. A responsible specification of an API of this sort would
> at least warn people of this problem.

If the API were purely for XML processing, generation or editing, I'd
agree. But for browser scripting this ability to entirely ignore
non-element content is very useful.
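
The two positions in this thread can be put side by side in a short sketch. Everything here is hypothetical and for illustration only: the `Child`/`Elem` node shape and the helpers `childElements` (lenient, ignores all text) and `childElementsStrict` (a guess at the kind of utility function Ray describes, which refuses non-whitespace text) are not part of any specification.

```typescript
// Hypothetical node shape, used only for this sketch: a child is either text or an element.
type Child = string | { tag: string; children: Child[] };
type Elem = Exclude<Child, string>;

// Lenient: return the child elements and silently skip all text.
function childElements(parent: Elem): Elem[] {
  return parent.children.filter((c): c is Elem => typeof c !== "string");
}

// Strict: same result, but throw if any non-whitespace text sits between the elements.
function childElementsStrict(parent: Elem): Elem[] {
  const result: Elem[] = [];
  for (const c of parent.children) {
    if (typeof c !== "string") {
      result.push(c);
    } else if (c.trim() !== "") {
      throw new Error(`unexpected text in <${parent.tag}>: "${c.trim()}"`);
    }
  }
  return result;
}

// Element content with whitespace only, and genuinely mixed content.
const clean: Elem = { tag: "ul", children: ["\n  ", { tag: "li", children: ["one"] }, "\n"] };
const mixed: Elem = { tag: "p", children: ["Click ", { tag: "a", children: ["here"] }, "."] };

console.log(childElements(mixed).length); // 1
console.log(childElementsStrict(clean).length); // 1

let rejected = false;
try {
  childElementsStrict(mixed);
} catch {
  rejected = true;
}
console.log(rejected); // true
```

The strict variant can be layered on top of ordinary node traversal when validation matters, while the lenient one matches the browser-scripting case where text between elements is simply not of interest.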
 --
David "liorean" Andersson
Received on Monday, 28 May 2007 21:11:06 UTC