- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Tue, 21 Oct 2003 18:10:26 +0300
- To: David Woolley <david@djwhome.demon.co.uk>
- Cc: www-style@w3.org
On Tuesday, Oct 21, 2003, at 00:32 Europe/Helsinki, David Woolley wrote:

>> I was referring to XML DTDs. Not reading them does not affect the
>> majority of the current Web pages, because the majority is using
>> text/html.
>
> XHTML is XML

For practical parsing purposes, when delivered as text/html, it isn't.
The majority of current Web sites are not delivering what is purported
to be XHTML in a way that would make it XML for processing purposes and
for the purposes of the CSS specs.

"CSS defines different conformance rules for HTML and XML documents; be
aware that the HTML rules apply to XHTML documents delivered as HTML and
the XML rules apply to XHTML documents delivered as XML."
  -- http://www.w3.org/TR/xhtml1/#C_13

> and that does define names for many characters.

The DTDs do define named entities for characters. That doesn't make
referencing entities in XML on the Web a good idea or a reliable
practice. I think defining named entities for characters in the various
XHTML DTDs is harmful, because it confuses people who don't realize
that the XML spec allows non-validating XML processors to leave the
definitions unprocessed.

> Unless the world goes to semantics free, invent your own elements,
> XML, HTML and maybe XHTML, will be the main users of CSS for a long
> time to come.

Processing the DTD and an XML vocabulary having defined semantics are
not coupled. Anything that can be expressed using XHTML delivered as
application/xhtml+xml with a doctype declaration can also be expressed
without the doctype declaration with no loss of semantics.

>> The copyright symbol (or any Unicode character for that matter) can be
>> represented without entities.
>
> There seems strong reluctance, amongst authors, to use the numeric
> values, and the name is defined for XHTML.

Using UTF-8 (or UTF-16) is a nicer approach than typing NCRs.
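
To make the entity point concrete, here is a minimal sketch. It assumes
Python's standard xml.etree.ElementTree as a stand-in for a
non-validating XML processor that has not read any DTD; the helper name
paragraph_text and the sample markup are made up for illustration. The
literal UTF-8 character and the numeric character reference parse to the
same text, while the named entity is rejected as undefined because no
declaration for it was ever processed.

```python
import xml.etree.ElementTree as ET


def paragraph_text(markup):
    """Return the text of the root element, or None if the parser rejects it."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        # Raised for any well-formedness error, including an undefined entity.
        return None


# Copyright sign as a literal character in a UTF-8 encoded document.
literal = paragraph_text("<p>\u00a9 2003</p>".encode("utf-8"))

# The same character as a numeric character reference (NCR).
numeric = paragraph_text("<p>&#169; 2003</p>")

# The named entity, with no doctype declaration and hence no definition.
named = paragraph_text("<p>&copy; 2003</p>")

print(literal == numeric)  # the two spellings yield identical text
print(named)               # None: the reference could not be resolved
```

A processor that does read the XHTML DTD would resolve the named
reference, which is exactly why relying on it is unreliable across
processors.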

> HTML owes its success to hand codability[1], and whilst XHTML 2.0
> might not have named entities, it is actually moving back in the
> direction of hand codability. Hand coders find &copy; much easier to
> remember than &#251;.

I find option-1 much easier to type than either of those. Also, I think
it's excellent that the HTML WG has been using Relax NG instead of DTDs
to define the XHTML 2 grammar drafts formally.

>> Real-world text/html browsers
>
> Which is what most people understand by the term web browser.

Mozilla, Safari and Opera can be characterized as Web browsers, and they
process and display XHTML delivered as application/xhtml+xml, so there
are "Web browsers" for which processing of XML (and XHTML in particular)
is relevant.

>> are tag soup processors, so the issues
>
> It is not really possible to implement CSS with a true tag soup
> browser, as CSS requires a well defined parse tree.

A tag soup parser can be used to produce a parse tree that is suitable
for use with the CSS layout engine.

> I think it may prove commercially very difficult for
> mainstream XML browsers to reject not-well-formed code as well.

Not rejecting ill-formed XML would be commercially shortsighted, in my
opinion. So far, commercial use of XML outside browsers has worked
despite (or perhaps thanks to) the well-formedness requirements. I hope
browser makers will firmly reject ill-formed XML even when more old tag
soupers come along and try their skills with XML.

--
Henri Sivonen
hsivonen@iki.fi
http://www.iki.fi/hsivonen/
Received on Tuesday, 21 October 2003 11:10:29 UTC