Re: code and blockcode

From: Laurens Holst <lholst@students.cs.uu.nl>
Date: Thu, 14 Jul 2005 22:02:31 +0200
Message-ID: <42D6C4D7.6020303@students.cs.uu.nl>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
Cc: www-html@w3.org
Bjoern Hoehrmann wrote:

>* Laurens Holst wrote:
>
>>xml:space="preserve" by itself will not achieve the desired effect, it 
>>only controls how whitespace ends up in the DOM. It is additionally 
>>already set to that value on all elements in XHTML 2.0. See also: 
>>http://www.w3.org/MarkUp/2004/xhtml-faq#xmlspace
>
>Well, at least http://www.imc.org/atom-syntax/mail-archive/msg12799.html
>Tim Bray and I disagree with this, and the latest XHTML 2.0 draft does
>not seem to define this either. Of course, the HTML WG is not easily
>persuaded http://lists.w3.org/Archives/Public/www-validator/2004Jul/0236
>by technical argument about simple and obvious facts.

Your second link doesn’t really seem related.

Anyway, I did some research, and these are the conclusions I reached...


      What |xml:space| is used for

The spec says the attribute exists "to signal an intention that in that 
element, white space should be preserved by applications". First of all, 
note that this is about what the XML parser communicates to the 
application on top of it, which here is the DOM. The confusing part is 
that two separate mechanisms deal with whitespace: white space 
preservation by the application on top of the XML processor (the DOM), 
and white space collapsing by CSS. My reading is that |xml:space| hints 
at how the application should process the whitespace, not at how the 
styling language processes it.

To start with an example: IE’s XML parser does not insert text nodes 
that contain only whitespace into the DOM. Such whitespace is referred 
to as ‘element content whitespace’ 
<http://www.w3.org/TR/REC-xml/#sec-white-space> (see also here 
<http://www.w3.org/TR/2004/REC-xml-infoset-20040204/#infoitem.character>), 
and DOM levels 1 and 2 don’t specify what has to be done with it. Both 
preserving and ignoring it make sense. Rich text formats such as HTML 
need it to be preserved, because otherwise the spaces between adjacent 
inline elements would disappear as well. On the other hand, ignoring it 
makes processing of XML data easier and more compact.
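
To make the two behaviours concrete, here is a minimal sketch. It 
assumes Python's xml.dom.minidom as a stand-in for a DOM that keeps 
whitespace-only text nodes, and a hypothetical helper 
strip_whitespace_only_text() that imitates what a parser like IE's 
leaves out. Note that "contains only whitespace" is a heuristic here; 
true element content whitespace can only be identified with a DTD.

```python
from xml.dom import minidom

SOURCE = """<movies>
   <movie>
      <title>The Fifth  Element</title>
      <director>Luc Besson</director>
   </movie>
</movies>"""

XML_WHITESPACE = " \t\r\n"  # the four whitespace characters XML defines


def strip_whitespace_only_text(node):
    """Remove, in place, every text node made up solely of XML whitespace.

    Hypothetical helper: it approximates a parser that drops such nodes.
    """
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            if child.data.strip(XML_WHITESPACE) == "":
                node.removeChild(child)
        else:
            strip_whitespace_only_text(child)


def child_names(element):
    """List the node names of an element's direct children."""
    return [child.nodeName for child in element.childNodes]


doc = minidom.parseString(SOURCE)
movie = doc.getElementsByTagName("movie")[0]

before = child_names(movie)   # whitespace-only text nodes are present
strip_whitespace_only_text(doc.documentElement)
after = child_names(movie)    # only the element children remain

print(before)  # ['#text', 'title', '#text', 'director', '#text']
print(after)   # ['title', 'director']
```

The double space inside ‘The Fifth  Element’ survives in both cases, 
because that text node is not whitespace-only.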

Thus, if you just look at parsing and displaying documents with a 
generic XML processor and CSS, you can distinguish the following cases. 
Example 1, XML as a database, where element content whitespace and other 
adjacent whitespace is not important:

<?xml version="1.0" encoding="UTF-8"?>
<movies xml:space="default">
   <movie>
      <title>The Fifth  Element</title>
      <director>Luc Besson</director>
   </movie>
</movies>

Example 2, a text document, where the whitespace between ‘really’ and 
‘a good example’ is important (without it the text would run together 
as ‘reallya good example’):

<?xml version="1.0" encoding="UTF-8"?>
<doc xml:space="preserve">
    <p>It is <em>really</em> <a href="">a good example</a>.</p>
</doc>
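
The effect on a text document can be sketched as follows, again 
assuming xml.dom.minidom as the DOM and a hypothetical text_content() 
helper that can optionally skip whitespace-only text nodes, the way a 
parser that discards them would.

```python
from xml.dom import minidom

SOURCE = '<doc><p>It is <em>really</em> <a href="">a good example</a>.</p></doc>'

XML_WHITESPACE = " \t\r\n"


def text_content(node, drop_whitespace_only):
    """Concatenate the text below node.

    When drop_whitespace_only is true, text nodes consisting solely of
    XML whitespace are skipped (hypothetical model of a DOM that never
    received them).
    """
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            if drop_whitespace_only and child.data.strip(XML_WHITESPACE) == "":
                continue
            parts.append(child.data)
        else:
            parts.append(text_content(child, drop_whitespace_only))
    return "".join(parts)


p = minidom.parseString(SOURCE).getElementsByTagName("p")[0]

preserved = text_content(p, drop_whitespace_only=False)
dropped = text_content(p, drop_whitespace_only=True)

print(preserved)  # It is really a good example.
print(dropped)    # It is reallya good example.
```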

Note by the way that nowhere in the DOM specification does it say that 
this is how |xml:space| should be processed. Also note that setting 
|xml:space| has no effect on the way IE’s DOM treats whitespace-only 
text nodes. But nowhere in XML does it say that the application on top 
of the XML parser should be a DOM either.

Anyways, if you look at these examples, you will notice that the usage 
of |xml:space| here has *nothing* to do with whether the content should 
be displayed preformatted.

There are more arguments for why preserving whitespace is desirable in 
a rich text format such as XHTML. If runs of collapsible whitespace did 
not end up in the DOM, that would e.g. also affect how well a ‘view 
source’ application works, or whether CSS applying |white-space: pre;| 
to an element has any effect at all.

So these are cases that |xml:space| would resolve, because the DOM spec 
itself left the issue ambiguous for a long time. See also the DOM FAQ 
<http://www.w3.org/DOM/faq.html#emptytext>. This ambiguity is only 
resolved in DOM level 3 by means of a parameter 
|element-content-whitespace| 
<http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#parameter-element-content-whitespace> 
(which I suppose sets the default, and can be overridden with 
|xml:space| by a DOM implementation that supports it) and an 
|isElementContentWhitespace| property 
<http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Text3-isElementContentWhitespace>. 
A working draft of DOM level 1 already addressed this partially by 
means of |isIgnorableWhitespace| 
<http://www.w3.org/TR/WD-DOM/level-one-core-971009#Text>, but that did 
not end up in the final specification.

So, for the sake of ensuring the preservation of the document in the DOM 
or whatever other backend the XHTML UA has, |xml:space="preserve"| is 
automatically set as the default value for the entire XHTML document. As 
shown above, it is appropriate for rich text documents.

Automatically placing |xml:space="preserve"| on elements is not at all 
an unforeseen use by the way, because the last paragraph of section 2.10 
of the XML spec <http://www.w3.org/TR/REC-xml/#sec-white-space> 
explicitly says that the attribute can be declared with a default value 
on the root element. I cannot think of many use cases for that other 
than the reason XHTML 2.0 does it.
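
How a value on the root carries down to descendants can be sketched 
like this. It assumes Python's xml.etree.ElementTree; the helper name 
effective_xml_space() is hypothetical, and treating an element with no 
declared intention as "default" is an assumption of this sketch. The 
attribute is written out in the source text because ElementTree is not 
assumed to apply DTD attribute defaults.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

SOURCE = """<doc xml:space="preserve">
  <p>kept</p>
  <table xml:space="default"><tr><td>cell</td></tr></table>
</doc>"""


def effective_xml_space(root, inherited="default"):
    """Return (tag, effective xml:space value) pairs in document order.

    A value applies to an element and everything inside it until a
    descendant overrides it. Starting from "default" when nothing is
    declared is an assumption of this sketch.
    """
    result = []

    def walk(element, current):
        value = element.get(XML_SPACE, current)
        result.append((element.tag, value))
        for child in element:
            walk(child, value)

    walk(root, inherited)
    return result


values = effective_xml_space(ET.fromstring(SOURCE))
for tag, value in values:
    print(tag, value)
```

Here |<p>| picks up "preserve" from the root, while |<table>| and 
everything inside it is switched back to "default".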


      Using xml:space to express preformatted content

Given that the default is already preserve, it’s not possible to 
/additionally/ specify |xml:space="preserve"| for elements in the 
document. After all, that would change nothing. Using the following CSS 
to achieve the preformatted styling:

@namespace xml "http://www.w3.org/XML/1998/namespace";
*[xml|space="preserve"] { white-space: pre }

would stop collapsing the whitespace and linebreaks for the entire 
document, because every element matches the selector.
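
A small sketch of why such a selector cannot single out preformatted 
content: it assumes xml.etree.ElementTree, a hypothetical 
selector_matches() that mimics the attribute test, and a document in 
which the attribute is spelled out on every element as a stand-in for 
the value a DTD default would supply.

```python
import xml.etree.ElementTree as ET

XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

# Stand-in for a document whose DTD defaults xml:space to "preserve" on
# every element; the attribute is written explicitly for illustration.
SOURCE = """<html xml:space="preserve">
  <p xml:space="preserve">Ordinary <em xml:space="preserve">prose</em>.</p>
  <pre xml:space="preserve">+-+
|5|
+-+</pre>
</html>"""


def selector_matches(element):
    """Mimic an attribute selector testing xml:space for "preserve"."""
    return element.get(XML_SPACE) == "preserve"


root = ET.fromstring(SOURCE)
all_tags = [element.tag for element in root.iter()]
matched = [element.tag for element in root.iter() if selector_matches(element)]

print(matched)  # ['html', 'p', 'em', 'pre']
```

Every element matches, so the rule cannot tell the ASCII art apart from 
the ordinary paragraph.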

But even supposing that we could use |xml:space| for this purpose, I am 
still left with the question: what would I present ASCII art in? It 
certainly doesn’t qualify as a ‘paragraph’, nor do I see anything else I 
could properly use for that... To give another example, if I have some 
graph expressed by means of preformatted text (which I do quite 
frequently, especially in email), e.g.:

+-+-+-----+--+
|5|5|     |5%|
+-+-+-----+--+

How else would I represent that than by means of the |<pre>| element? 
|<img xml:space="preserve">|? Hmz. So I don’t think, even assuming that 
|xml:space| is also to give a signal with regard to the /display/ of the 
whitespace, that this warrants the removal of |<pre>|.


~Grauw

p.s. with all the effort I made to nicely mark up this document (I guess 
I need to make an extension to Thunderbird to make the semantic elements 
more accessible), it kind of itches that I can mark up inline code as 
‘code’, but have to mark up blocks of code as ‘preformatted’, giving no 
hint of it being code at all.

-- 
Ushiko-san! Kimi wa doushite, Ushiko-san!!
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Laurens Holst, student, university of Utrecht, the Netherlands.
Website: www.grauw.nl. Backbase employee; www.backbase.com.
Received on Thursday, 14 July 2005 20:02:43 GMT
