- From: Laurens Holst <lholst@students.cs.uu.nl>
- Date: Thu, 14 Jul 2005 22:02:31 +0200
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: www-html@w3.org
- Message-ID: <42D6C4D7.6020303@students.cs.uu.nl>
Bjoern Hoehrmann wrote:
>* Laurens Holst wrote:
>
>>xml:space="preserve" by itself will not achieve the desired effect, it
>>only controls how whitespace ends up in the DOM. It is additionally
>>already set to that value on all elements in XHTML 2.0. See also:
>>http://www.w3.org/MarkUp/2004/xhtml-faq#xmlspace
>>
>Well, at least http://www.imc.org/atom-syntax/mail-archive/msg12799.html
>Tim Bray and I disagree with this, and the latest XHTML 2.0 draft does
>not seem to define this either. Of course, the HTML WG is not easily
>persuaded http://lists.w3.org/Archives/Public/www-validator/2004Jul/0236
>by technical argument about simple and obvious facts.
>
Your second link doesn’t really seem related.
Anyway, I did some research, and these are the conclusions I reached.
What |xml:space| is used for
The spec says "to signal an intention that in that element, white space
should be preserved by applications". First of all, note that this is
about what the XML parser communicates to the application on top of it,
which in a browser is the DOM. The confusing part is that two separate
mechanisms deal with whitespace: whitespace preservation by the
application on top of the XML processor (the DOM), and whitespace
collapsing by CSS. My reading is that |xml:space| hints at how the
application should process the whitespace, not at how the styling
language presents it.
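To keep the two layers apart, here is a minimal Python sketch. It models only the styling side; `css_render` is a hypothetical helper and a deliberately rough approximation of CSS `white-space` (real CSS has further rules for line breaks and trimming). It also shows the consequence discussed below: if the application already collapsed whitespace before styling, `pre` has nothing left to restore.

```python
import re


def css_render(text: str, white_space: str) -> str:
    """Rough approximation of the CSS 'white-space' property (hypothetical).

    'normal' collapses each run of spaces, tabs and line breaks into one
    space; 'pre' shows the text exactly as the application stored it.
    """
    if white_space == "pre":
        return text
    if white_space == "normal":
        return re.sub(r"[ \t\r\n]+", " ", text)
    raise ValueError(f"unsupported white-space value: {white_space!r}")


# Layer 1: what the application kept from the parser.
preserved = "a  b\n  c"    # whitespace kept in the DOM
collapsed_early = "a b c"  # an application that collapsed it while parsing

# Layer 2: how the styling layer presents it.
print(repr(css_render(preserved, "normal")))     # 'a b c'
print(repr(css_render(preserved, "pre")))        # 'a  b\n  c'
print(repr(css_render(collapsed_early, "pre")))  # 'a b c'
```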
To start with an example: IE’s XML parser does not insert text nodes
that contain only whitespace into the DOM. Whitespace of this kind, when
it occurs in elements declared to contain only child elements, is called
‘element content whitespace’ <http://www.w3.org/TR/REC-xml/#sec-white-space>
(see also the Infoset spec
<http://www.w3.org/TR/2004/REC-xml-infoset-20040204/#infoitem.character>),
and DOM Levels 1 and 2 don’t specify what has to be done with it. Both
preserving and ignoring it make sense. Rich text formats such as HTML
need it to be preserved, because otherwise the spaces between adjacent
inline elements would disappear as well. On the other hand, ignoring it
makes processing XML data easier and the result more compact.
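The inline-element problem is easy to reproduce with Python's `xml.dom.minidom`, which keeps whitespace-only text nodes. In this sketch, `strip_whitespace_only_text` is a hypothetical helper that imitates a parser dropping such nodes, the IE behaviour described above. It is not IE's actual algorithm, only an illustration of its effect on the visible text.

```python
from xml.dom import minidom


def strip_whitespace_only_text(node):
    """Remove text nodes consisting only of XML whitespace (hypothetical helper)."""
    for child in list(node.childNodes):  # copy: we mutate while iterating
        if child.nodeType == child.TEXT_NODE and not child.data.strip(" \t\r\n"):
            node.removeChild(child)
        else:
            strip_whitespace_only_text(child)


def text_of(node):
    """Concatenate all descendant text, roughly what a reader would see."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_of(c) for c in node.childNodes)


source = '<p>It is <em>really</em> <a href="">a good example.</a></p>'

kept = minidom.parseString(source)
print(text_of(kept.documentElement))      # It is really a good example.

stripped = minidom.parseString(source)
strip_whitespace_only_text(stripped.documentElement)
print(text_of(stripped.documentElement))  # It is reallya good example.
```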
Thus, if you just look at parsing and displaying documents with a
generic XML processor and CSS, you can distinguish the following cases.
Example 1, XML as a database, where element content whitespace and other
adjacent whitespace are not important:
<?xml version="1.0" encoding="UTF-8"?>
<movies xml:space="default">
  <movie>
    <title>The Fifth Element</title>
    <director>Luc Besson</director>
  </movie>
</movies>
Example 2, a text document, where whitespace between elements is
important: without the space between |</em>| and |<a>|, the sentence
would read ‘It is reallya good example’:
<?xml version="1.0" encoding="UTF-8"?>
<doc xml:space="preserve">
  <p>It is <em>really</em> <a href="">a good example.</a></p>
</doc>
Note, by the way, that nowhere does the DOM specification say that this
is how |xml:space| should be processed. Also note that setting
|xml:space| has no effect on how IE’s DOM treats whitespace-only text.
But then, nowhere does the XML spec say that the application on top of
the XML parser has to be a DOM either.
Anyway, if you look at these examples, you will notice that the use of
|xml:space| here has *nothing* to do with whether the content should be
displayed preformatted.
There are more arguments for why preserving whitespace is desirable in a
rich text format such as XHTML. If collapsible whitespace never ended up
in the DOM, that would also affect, for example, how well a ‘view
source’ application works, or whether CSS applying |white-space: pre;|
to an element has any effect at all.
So these are cases that |xml:space| would resolve, because the DOM spec
itself left the issue ambiguous for a long time. See also the DOM FAQ
<http://www.w3.org/DOM/faq.html#emptytext>. The ambiguity was only
resolved in DOM Level 3, by means of the |element-content-whitespace|
parameter
<http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-element-content-whitespace>
(which I suppose sets the default, and could be overridden by
|xml:space| in a DOM implementation that supports it) and the
|isElementContentWhitespace| property
<http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Text3-isElementContentWhitespace>.
A working draft of DOM Level 1 had already addressed this partially with
|isIgnorableWhitespace| <http://www.w3.org/TR/WD-DOM/level-one-core-971009#Text>,
but it did not end up in the final specification.
So, to ensure that the document is preserved intact in the DOM (or
whatever other backend the XHTML UA uses), XHTML 2.0 sets
|xml:space="preserve"| as the default value for the entire document. As
shown above, that value is appropriate for rich text documents.
Automatically placing |xml:space="preserve"| on elements is not at all
an unforeseen use, by the way: the last paragraph of section 2.10 of the
XML spec <http://www.w3.org/TR/REC-xml/#sec-white-space> explicitly says
that the attribute can be declared with a default value on the root
element. I cannot think of many other use cases for that besides the one
XHTML 2.0 uses it for.
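To make the application-level reading concrete, here is a minimal Python sketch, using `xml.dom.minidom`, of an application that honours |xml:space| the way section 2.10 describes: a value applies to the element's content until overridden by a descendant. Several things are assumptions for illustration only. The function name `apply_space_policy` is hypothetical. So is the policy of dropping whitespace-only text when the effective value is "default", since the spec leaves the default mode up to the application. The `inherited` parameter stands in for a DTD default on the root element, such as the "preserve" XHTML 2.0 declares; DTD processing itself is not modelled.

```python
from xml.dom import minidom, XML_NAMESPACE


def apply_space_policy(element, inherited="default"):
    """Drop whitespace-only text nodes wherever the effective xml:space is "default".

    Hypothetical application behaviour. The effective value is the element's
    own xml:space attribute if present, otherwise the inherited one.
    """
    value = element.getAttributeNS(XML_NAMESPACE, "space") or inherited
    for child in list(element.childNodes):  # copy: we mutate while iterating
        if child.nodeType == child.TEXT_NODE:
            if value == "default" and not child.data.strip(" \t\r\n"):
                element.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            apply_space_policy(child, value)


# Example 1 style: indentation is dropped, leaving only the <movie> element.
movies = minidom.parseString(
    '<movies xml:space="default">\n  <movie>\n'
    '    <title>The Fifth Element</title>\n  </movie>\n</movies>')
apply_space_policy(movies.documentElement)
print(len(movies.documentElement.childNodes))  # 1

# Example 2 style, with an XHTML 2.0-like "preserve" default: the space survives.
p = minidom.parseString('<p>It is <em>really</em> <a href="">a good example.</a></p>')
apply_space_policy(p.documentElement, inherited="preserve")
print(len(p.documentElement.childNodes))  # 4
```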
Using xml:space to express preformatted content
Given that default setting of preserve, there is no point in
/additionally/ specifying |xml:space="preserve"| on elements in the
document: it would change nothing. And if you used the following CSS to
get the preformatted styling:
@namespace xml url(http://www.w3.org/XML/1998/namespace);
*[xml|space="preserve"] { white-space: pre }
it would stop collapsing whitespace and line breaks throughout the
entire document.
But even supposing that we could use |xml:space| for this purpose, I am
still left with the question: which element would I put ASCII art in? It
certainly doesn’t qualify as a ‘paragraph’, nor do I see any other
element I could properly use for it... To give another example, say I
have a graph expressed as preformatted text (which I do quite
frequently, especially in email), e.g.:
+-+-+-----+--+
|5|5|     |5%|
+-+-+-----+--+
How else would I represent that, other than with the |<pre>| element?
|<img xml:space="preserve">|? Hmm. So even assuming that |xml:space| is
also meant to signal how whitespace should be /displayed/, I don’t think
that warrants removing |<pre>|.
~Grauw
p.s. With all the effort I put into marking up this document nicely (I
guess I need to write a Thunderbird extension to make the semantic
elements more accessible), it bothers me a bit that I can mark up inline
code as ‘code’, but have to mark up blocks of code as ‘preformatted’,
which gives no hint of it being code at all.
--
Ushiko-san! Kimi wa doushite, Ushiko-san!!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Laurens Holst, student, university of Utrecht, the Netherlands.
Website: www.grauw.nl. Backbase employee; www.backbase.com.
Received on Thursday, 14 July 2005 20:02:43 UTC