Re: E[foo*="bar"] selector from Bert Bos on 2003-09-02 (www-style@w3.org from September 2003)

From: Bert Bos <bert@w3.org>
Date: Tue, 2 Sep 2003 15:58:54 +0200
To: www-style@w3.org
Message-ID: <16212.41502.183261.356347@lanalana.inria.fr>
Bjoern Hoehrmann writes:
> 
> * fantasai wrote:
> >Matt wrote:
> >> What if I have an element like this:
> >> <a href="?alpha=200&amp;beta=300">blah</a>
> >> 
> >> Which of these selectors should match it?
> >> 
> >> a[href*="alpha=200&amp;beta=300"]
> >> a[href*="alpha=200&beta=300"]
> >> 
> >> i.e. Does the CSS selector match the HTML entity, or its replaced character?
> >
> >The replaced character. Entity replacement is done during
> >parsing, and CSS doesn't care what happens at that stage
> >of document processing.

Correct. I'm not sure how explicit HTML is about entities that aren't
defined by the spec, but predefined entities like &amp; and &eacute;
are indeed meant to be replaced by the corresponding character. The
entity is offered to get around limitations in the HTML syntax, in the
character encoding, in the software the author uses to type the text,
or simply for the convenience of the author. Apart from that, "&amp;"
and "&" are the same.[1]
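
A minimal Python sketch of this point, assuming the standard library's
html.parser as a stand-in for a browser's HTML parser. The parser hands over
the attribute value with the entity already replaced, so only the selector
written with the bare "&" can match. The helper attr_contains is a
hypothetical approximation of the [att*=val] substring test, not a real
selector engine.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values of <a> elements as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


def attr_contains(value, needle):
    # Hypothetical stand-in for [att*=val]: a plain substring test.
    # An empty needle is treated as matching nothing.
    return bool(needle) and needle in value


collector = HrefCollector()
collector.feed('<a href="?alpha=200&amp;beta=300">blah</a>')
href = collector.hrefs[0]  # entity already replaced by the parser

matches_bare = attr_contains(href, "alpha=200&beta=300")
matches_entity = attr_contains(href, "alpha=200&amp;beta=300")
```

Here matches_bare is true and matches_entity is false: the string "&amp;"
never reaches the document tree, so a selector value containing those five
characters has nothing to match against.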

But since Björn wants to generalize the question to non-standard
XML-based formats and entities that aren't predefined...

> 
> Not quite. CSS does not specify how user agents must process documents.

Indeed, in the case of an XML document with an externally defined
entity that is parsed with a "non-validating" parser, the XML 1.0 spec
says that the parser may either replace the entity or "inform the
application that it recognized, but did not read, the entity."[2] And
neither XML nor CSS explains what that means.

And, as Björn said, that is only one aspect of a much larger fact:
rendering XML documents is simply not defined at all. That is because
between the XML syntax and the rendering (with CSS or otherwise),
there is supposed to be another step: interpreting the syntax
according to the rules of a specific language. E.g., XHTML and SVG
both use XML, but the structure of an XHTML document is derived from
the XML syntax in a different way than the structure of an SVG
document.

In HTML (SGML-based), the process is fairly straightforward: add the
implied elements (but not the implied attributes) and expand all
entities that HTML defines. In XHTML, it is even simpler, since there
are no omitted tags. In SVG, it is a bit more complex, since SVG
defines that USE elements cause a part of the tree to be duplicated in
other places. Other formats may interpret the XML file in yet other
ways.

Another example: CSS requires that the document tree somehow contains
information about which elements are links and which of them have been
visited, but that information comes partly from outside the document
and partly from the syntax; and the syntax is different in HTML and
SVG.
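
A rough Python sketch of that split, under simplifying assumptions: only the
"a" element is treated as a link in either language, SVG links are taken to
use xlink:href, and the browsing history is a hypothetical set of visited
URLs supplied from outside the document. The function names are illustrative
only.

```python
XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def link_target(namespace, tag, attrs):
    """Return the link target if the element is a link in its language."""
    # The syntax part: each language marks links in its own way.
    if namespace == XHTML_NS and tag == "a":
        return attrs.get("href")
    if namespace == SVG_NS and tag == "a":
        return attrs.get(XLINK_HREF)
    return None  # unknown language: the syntax alone tells us nothing


def link_state(namespace, tag, attrs, history):
    """Return "link", "visited", or None for a non-link element."""
    target = link_target(namespace, tag, attrs)
    if target is None:
        return None
    # The outside part: the history is not in the document at all.
    return "visited" if target in history else "link"


history = {"http://www.w3.org/"}  # hypothetical browsing history
html_state = link_state(XHTML_NS, "a", {"href": "http://www.w3.org/"}, history)
svg_state = link_state(SVG_NS, "a", {XLINK_HREF: "http://example.org/"}, history)
```

The same element name and the same history give different answers depending
on which language's rules are applied, which is why CSS cannot derive this
information from the XML syntax alone.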

One may assume that a "generic XML" document (i.e., an XML-based
format that the UA knows nothing about, other than that it is XML) is
turned into the document tree that CSS expects in the "obvious" way:
the document tree is the same as the XML tree, but all attributes are
considered to be string-valued, all entities and CDATA sections are
resolved and all PIs, comments, DOCTYPEs and XML declarations are
removed. Language info (from xml:lang, HTTP or elsewhere) is inherited
through the XML tree, as are namespaces (at least for CSS3). If the XML
Core WG defines an "xmlid" attribute, one may assume that it will
match #id selectors. But, in reality, this "obvious" way depends on
the browser and there is no wrong or right way.
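
One possible reading of the "obvious" way, sketched in Python with the
standard library's xml.etree.ElementTree. This is an assumption about what a
UA might do, not a defined behaviour. The sample document, the entity name
"who" and the helper inherited_languages are made up for illustration.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SOURCE = """<!DOCTYPE note [<!ENTITY who "Bert">]>
<note xml:lang="en" ref="a&amp;b">
  <?render hint?>
  <!-- a comment -->
  <from>&who;</from>
  <body xml:lang="fr"><![CDATA[1 < 2]]><p>x</p></body>
</note>"""


def inherited_languages(element, inherited=None):
    """List (tag, language) pairs, inheriting xml:lang down the tree."""
    lang = element.get(XML_LANG, inherited)
    result = [(element.tag, lang)]
    for child in element:
        result.extend(inherited_languages(child, lang))
    return result


# By default the tree builder keeps only elements, attributes and text:
# comments and PIs are dropped, entities and CDATA sections are resolved.
root = ET.fromstring(SOURCE)

child_tags = [child.tag for child in root]
sender = root.find("from").text
body_text = root.find("body").text
ref_value = root.get("ref")  # attribute values are plain strings
languages = inherited_languages(root)
```

With this input, child_tags holds only "from" and "body", sender is "Bert",
body_text is "1 < 2", ref_value is "a&b", and the "p" element inherits "fr"
from "body". Another UA could make different choices at each of these steps.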

In fact, rendering a "generic XML" document is a fallback mechanism:
if the browser doesn't know the format, it can offer some
alternatives, the usual ones being: try to render as "generic XML,"
show the source, or offer to save to disk. Each of these works better
in some cases than in others: rendering a SMIL document as "generic
XML" isn't very useful...

I don't think we have to, or even should, define how these fallbacks
work. After all, we want people to use well-known, standard formats on
the Web, not proprietary ones. A generic XML document may render fine
on screen, but it has no semantics, apart from that visual style.

[1] http://www.w3.org/TR/html401/charset.html#entities
[2] http://www.w3.org/TR/REC-xml#include-if-valid



Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos/                              W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France
Received on Tuesday, 2 September 2003 09:59:14 UTC