Re: [CSS21] Comments on the 2003-09-15 CSS 2.1 Draft from Henri Sivonen on 2003-10-21 (www-style@w3.org from October 2003)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Tue, 21 Oct 2003 18:10:26 +0300
To: David Woolley <david@djwhome.demon.co.uk>
Cc: www-style@w3.org
Message-Id: <B39101D6-03D8-11D8-ABD7-003065B8CF0E@iki.fi>

On Tuesday, Oct 21, 2003, at 00:32 Europe/Helsinki, David Woolley wrote:

>> I was referring to XML DTDs. Not reading them does not affect the
>> majority of the current Web pages, because the majority is using
>> text/html.
>
> XHTML is XML

For practical parsing purposes, when delivered as text/html, it isn't.

The majority of current Web sites are not delivering what is purported 
to be XHTML in a way that would make it XML for processing purposes and 
for the purposes of the CSS specs.

"CSS defines different conformance rules for HTML and XML documents; be 
aware that the HTML rules apply to XHTML documents delivered as HTML 
and the XML rules apply to XHTML documents delivered as XML." -- 
http://www.w3.org/TR/xhtml1/#C_13
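
One concrete difference between the two rule sets is that CSS treats element names in selectors as case-insensitive for HTML but case-sensitive for XML. A minimal sketch of how the delivery media type, rather than the markup itself, would pick the rule; the function name and the reduction to a single type selector are illustrative assumptions, not anything a real browser exposes:

```python
def type_selector_matches(selector, element_name, content_type):
    """Hypothetical helper: match a CSS type selector against an element name.

    Only the case-sensitivity rule is modelled here; everything else
    about selector matching is left out.
    """
    # Drop parameters such as "; charset=utf-8" from the Content-Type value.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        # HTML rules: element names are case-insensitive.
        return selector.lower() == element_name.lower()
    # XML rules (e.g. application/xhtml+xml): names are case-sensitive.
    return selector == element_name


print(type_selector_matches("P", "p", "text/html; charset=utf-8"))
print(type_selector_matches("P", "p", "application/xhtml+xml"))
```

The same bytes labelled text/html and application/xhtml+xml end up under different rules in this sketch, which is the point of the quoted note.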

> and that does define names for many characters.

The DTDs do define named entities for characters. That doesn't make
referencing entities in XML on the Web a good idea or a reliable
practice. I think defining named entities for characters in the various 
XHTML DTDs is harmful, because it confuses people who don't realize the 
XML spec allows non-validating XML processors to leave the definitions 
unprocessed.
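
The effect can be sketched with Python's xml.parsers.expat, which by default does not fetch the external DTD subset. The sample document and the parse() helper are illustrative assumptions; the document is only meant to be well-formed, not valid XHTML:

```python
import xml.parsers.expat

# The doctype points at the XHTML 1.0 Strict DTD, which declares &copy;,
# but the parser below never retrieves that DTD.
DOC = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><p>&copy; 2003</p></body></html>'
)


def parse(doc):
    """Return (character data seen, names of entities that were skipped)."""
    text, skipped = [], []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = text.append
    # Called for references whose declaration the parser has not read.
    parser.SkippedEntityHandler = lambda name, is_param: skipped.append(name)
    parser.Parse(doc, True)
    return "".join(text), skipped


text, skipped = parse(DOC)
print(repr(text), skipped)
```

Under these assumptions the reference is reported as skipped and no copyright sign reaches the application, even though the document is well-formed.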

> Unless the world goes to semantics free, invent your own elements,
> XML, HTML and maybe XHTML, will be the main users of CSS for a long
> time to come.

Processing the DTD and an XML vocabulary having defined semantics are 
not coupled.

Anything that can be expressed using XHTML delivered as 
application/xhtml+xml with a doctype declaration can also be expressed 
without the doctype declaration with no loss of semantics.
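
As a rough illustration, a parser that does not read the external DTD builds the same tree whether or not the doctype declaration is present. This sketch assumes Python's xml.etree.ElementTree as the non-validating parser; the sample document and the serialized_tree() helper are made up for the example:

```python
import xml.etree.ElementTree as ET

DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)
BODY = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>Example</title></head>'
    '<body><p>Copyright \u00a9 2003</p></body></html>'  # \u00a9 is the copyright sign
)


def serialized_tree(doc):
    """Parse doc and serialize the resulting tree for comparison."""
    return ET.tostring(ET.fromstring(doc), encoding="unicode")


with_doctype = serialized_tree(DOCTYPE + BODY)
without_doctype = serialized_tree(BODY)
print(with_doctype == without_doctype)
```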

>> The copyright symbol (or any Unicode character for that matter) can be
>> represented without entities.
>
> There seems strong reluctance, amongst authors, to use the numeric
> values, and the name is defined for XHTML.

Using UTF-8 (or UTF-16) is a nicer approach than typing NCRs.
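
A small sketch of the three ways of writing the copyright sign (U+00A9, decimal 169), again assuming Python's xml.etree.ElementTree as a stand-in for a parser that has no DTD to consult; paragraph_text() is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def paragraph_text(doc):
    """Return the text content of the root element of doc."""
    return ET.fromstring(doc).text


# Literal character, encoded as UTF-8 bytes (\u00a9 is the copyright sign).
literal = paragraph_text("<p>\u00a9 2003</p>".encode("utf-8"))

# Numeric character reference: needs no declaration.
ncr = paragraph_text("<p>&#169; 2003</p>")

# Named entity with no declaration in sight: a well-formedness error.
try:
    paragraph_text("<p>&copy; 2003</p>")
    named_entity_ok = True
except ET.ParseError:
    named_entity_ok = False

print(literal == ncr, named_entity_ok)
```

The literal character and the NCR give the same text; only the named entity depends on something outside the document.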

> HTML owes its success to hand
> codability[1], and whilst XHTML 2.0 might not have named entities, it is
> actually moving back in the direction of hand codability.  Hand coders
> find &copy; much easier to remember than &#169;.

I find option-1 much easier to type than either of those.

Also, I think it's excellent that the HTML WG has been using Relax NG 
instead of DTDs to define the XHTML 2 grammar drafts formally.

>> Real-world text/html browsers
>
> Which is what most people understand by the term web browser.

Mozilla, Safari and Opera can be characterized as Web browsers and
they process and display XHTML delivered as application/xhtml+xml, so 
there are "Web browsers" with which processing of XML (and XHTML in 
particular) is relevant.

>>                               are tag soup processors, so the issues
>
> It is not really possible to implement CSS with a true tag soup
> browser, as CSS requires a well defined parse tree.

A tag soup parser can be used to produce a parse tree that is suitable
for use with a CSS layout engine.
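
A toy sketch of the idea, built on Python's lenient html.parser tokenizer. The recovery rules here (a new p closes an open p, stray end tags are ignored) are assumptions chosen for the example and are not any real browser's algorithm; void elements such as br are not handled:

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # child Nodes and text strings, in document order


class SoupTreeBuilder(HTMLParser):
    """Turns tag soup into a tree using two toy recovery rules."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._close("p")  # toy rule: <p> implicitly ends an open <p>
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        self._close(tag)

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data)

    def _close(self, tag):
        # Find the nearest open element called `tag`; close it together
        # with anything still open inside it. Stray end tags do nothing.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def outline(node):
    """Nested (tag, children) tuples, convenient for inspection."""
    kids = [c if isinstance(c, str) else outline(c) for c in node.children]
    return (node.tag, kids)


def parse_soup(markup):
    builder = SoupTreeBuilder()
    builder.feed(markup)
    builder.close()
    return outline(builder.root)


print(parse_soup("<p>one<p>two <b>bold</p></i>"))
```

Whatever the input, the result is a proper tree with a single root, which is all a style engine needs to match selectors against.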

> I think it may prove commercially very difficult for
> mainstream XML browsers to reject not-well-formed code as well.

Not rejecting ill-formed XML would be commercially shortsighted, in my 
opinion. So far, commercial use of XML outside browsers has worked 
despite (or perhaps thanks to) the well-formedness requirements.

I hope browser makers will firmly reject ill-formed XML even when more 
old tag soupers come along and try their skills with XML.
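
For comparison with tag soup recovery, this is roughly what firm rejection looks like from the application side, assuming Python's xml.etree.ElementTree; is_well_formed() is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc):
    """True if doc parses as XML; any well-formedness error rejects it."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<p><b>bold</b></p>"))
print(is_well_formed("<p><b>bold</p></b>"))  # mis-nested tags
```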

-- 
Henri Sivonen
hsivonen@iki.fi
http://www.iki.fi/hsivonen/
Received on Tuesday, 21 October 2003 11:10:29 UTC