[css3-gcpm] value of content() from Peter Moulder on 2011-10-07 (www-style@w3.org from October 2011)

From: Peter Moulder <peter.moulder@monash.edu>
Date: Fri, 07 Oct 2011 15:58:29 +1100
To: www-style@w3.org
Message-id: <20111007045829.GA29248@bowman.infotech.monash.edu.au>

The content() value for 'string-set':

  # ... returns the textual content of the element, not including the
  # content of its ::before and ::after pseudo-element.  The content of
  # the element's descendants, including their respective ::before and
  # ::after pseudo-elements, are included in the returned content.

The term "element" originally had a very clear meaning, as described in
CSS21/conform.html#element:

  # (An SGML term, see [ISO8879].)  The primary syntactic constructs of
  # the document language [for example P, TABLE and OL in HTML].

Since that definition was written, however, CSS2.1 has come to use
"element" in different ways, for example sometimes excluding display:none
content, and sometimes including text node children of block container
elements but possibly excluding text nodes that would be collapsed away
according to the 'white-space' property.  In many contexts it includes
pseudo-elements, though some places seem to require it to exclude
:first-letter and :first-line pseudo-elements.  The definition of
content() isn't affected by whether :first-line counts as an element or
not, but the ambiguity does raise doubt about what other pseudo-elements
should be included as part of content(): for example the pseudo-elements
that CSS3 modules introduce for list-item markers or footnote calls.

For interoperability and technical soundness of specification, we need
a clearer definition of textual content of an element.

In contexts where the term "element" includes ::before and ::after
pseudo-elements, I would usually expect it to exclude display:none
elements, so I was surprised to see the example text [not normative text,
by the way] not just saying that 'string-set' still applies within
display:none elements, but implying that the string gets set to a
non-empty value in the example and usual case.  This suggests to me that
display:none descendants would also be included in the value of
content(), which is a little unexpected if :before/:after descendants
(other than children) are to be included in content().


The rest of this message consists of thoughts that may help decide how
content() should behave.

Considering the usual case of named strings being used to hold the
textual content of headings for use in page headers, I suppose we'd
prefer that the value of content() exclude footnote calls -- especially
if footnote numbering resets each page or if the footnote call involves
styling.

(Footnotes do occur in headings, for example to say "An earlier version
of this article/chapter appeared in..." or "The full text of this ... can
be found in/at...".)

On the other hand, we'd usually want page headings to include the :before
text of a section heading when, as is usual, that text is the section
number.  [I.e. the usual case if there is a :before pseudo-element; though
it's also common (and arguably better practice) for the section numbers
to be part of the source document.]
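
A minimal sketch of that :before numbering case, in CSS 2.1 terms (the
counter name "section" is an arbitrary choice for illustration):

  body { counter-reset: section }
  h1 { counter-increment: section }
  h1:before { content: counter(section) ". " }

With a stylesheet like this, the number exists only in the generated
content, not in the source document.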

For list-item markers and display:none descendants of a
string-set:content() element, I'm not aware of any particularly pressing
arguments for whether to include or exclude their text, so I would go by
consistency with whatever other behaviour we decide on.

It's unfortunate that I get different answers as to the desirability of
the page heading including the text of these different types of
pseudo-elements.  However, note that the :before section number in the
heading example isn't actually included in content() as currently
specified anyway, because content() explicitly excludes the element's
own :before and :after pseudo-elements.  This means that we could still
have section numbers included in page headings even if we were to specify
a simple "look at the raw source document (subject only to what the
document language considers replaced elements)" rule for what content()
means, by writing the common form:

  h1 { string-set: page-heading content(before) content() }

The motivation for mentioning the possibility of a "raw source document"
approach was just that it's the simplest possible way of having footnote
calls be excluded.
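
For context, a sketch of how the named string would then reach the page
header, assuming the margin-box syntax of css3-page and the string()
function of css3-gcpm:

  @page { @top-center { content: string(page-heading) } }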

Whether or not such a "raw source document" approach is appropriate would
depend on the importance of including descendant (other than child)
:before and :after pseudo-element textual content in the value of
content().  What was the reasoning for having content() exclude child
:before & :after pseudo-element content but include other descendant
:before and :after content?

I have a feeling that it's actually a good thing to exclude :before and
:after descendants: CSS, and hence :before and :after pseudo-elements
in particular, are to be used for stylistic effects, and so shouldn't
really be considered as part of the textual content of the heading.[1]
The content of :before or :after is often styled in a particular way
to set it off from the main content to which it's attached (smaller font
for example), and of course this extra styling would not be applied when
the named string is reproduced, so it may well be better not to include
this presumed-ancillary text at all than to include it without its
styling (especially in the case of a dingbats font-family).

I only introduced the "raw source document" approach as a simple starting
point, but following these principles of "string-set copies without
style" and "CSS only applies stylistic effect rather than providing main
content", I'm starting to think that this might actually be the right
approach.

There is one remaining issue to consider, and that's whitespace handling.
I'm not very confident about the right approach here, but following the
principle of unstyled content, I suppose we'd leave it for the document
language to define the rules for whitespace.  The HTML4 spec gave rules
for this in section 9.1; what's the HTML5 equivalent?  The XML spec
doesn't give any whitespace rules for xml:space="default", leaving it up
to the application language to decide.

What are the most important use cases for non-child descendant :before or
:after pseudo-elements within a string-set element, or display:none
descendants of a string-set element?

The most common case I can think of where a :before or :after containing
text would occur as a non-child descendant of a string-set element would
be where hyperlinks are annotated, say with a dingbat character or the
text ‘[link]’ in smaller type.  In this case, I'd prefer to exclude this
annotation, though I could live with the annotation being included if
there were some other reason to prefer including :before & :after text.
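
To make that case concrete, a minimal sketch (the selector and the
" [link]" text are illustrative assumptions, not taken from any
particular stylesheet):

  h1 a:after { content: " [link]"; font-size: smaller }
  h1 { string-set: page-heading content() }

As content() is currently specified, the a:after text belongs to a
descendant of the h1 and so would be copied into page-heading, without
its smaller font size.  Under a "raw source document" rule it would be
left out.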

I can't think of common cases where a string-set element would have a
display:none descendant.  Wikipedia has some conditionally-display:none
content as descendants of headings (the Edit hyperlink), though in at
least Wikipedia's case, the actual heading text is in a separately
selectable span element.
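
A sketch of that arrangement, with hypothetical class names (these are
illustrative, not Wikipedia's actual markup), assuming a heading like
<h2><span class="heading-text">History</span> <span
class="edit-link">[edit]</span></h2>:

  h2 span.edit-link { display: none }
  h2 span.heading-text { string-set: page-heading content() }

Setting the string on the inner span sidesteps the question of whether
content() on the h2 would include the display:none text.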

pjrm.


[1]: Hence the comment about "arguably better practice" to have section
 numbers be part of the source text rather than using :before: if the
 section numbers are significant, then you wouldn't want them to be absent
 in a user agent that doesn't apply styling, or doesn't implement
 counters() (as was true of at least one common user agent until fairly
 recently), or when the stylesheet isn't available due to a temporary
 server error.  A common approach is to use authoring tools instead of
 CSS counters to manage section numbering.

Received on Friday, 7 October 2011 04:58:59 UTC