- From: Rick Jelliffe <ricko@topologi.com>
- Date: Sun, 6 Apr 2003 20:44:47 +1000
- To: <www-tag@w3.org>
The I18n WG's Charmod draft has been in development for a long time, and it is an excellent work. However, there is still resistance to many of its ideas. Some of the resistance is based on the idea that it is telling people how to write their software, and that this is impractical and none of the WG's business. Michael Kay[1] has voiced several concerns on XML-DEV recently, for example.

Apart from the specific technical concerns about who does what where and to whom, which are appropriate for the I18n IG forum and XML-DEV and XML Plenary, I think there is an architectural question here which the Architecture document should clarify, even if just by providing a vocabulary of terms that specs can use.

The question is how to express that the WWW is complex, but that it can be partitioned in ways that help us understand the appropriateness and fixity of W3C specifications for different uses. To be concrete:

-- Suggestion -------------

Add ideas to the Architecture document to this effect:

1) The "Standard" World Wide Web: representations and protocols use standard specifications with no incompatible extensions or behaviours. The public WWW must conform to Charmod and WAI. Senders may make assumptions that recipients will use particular software processes or have particular files available. For example, we write an HTML page for a browser and expect the browser to know how to process it even if there is no DOCTYPE declaration. "Public Identifiers" may be used for things that are built-in.
2) The "Extended" World Wide Web: where standard representations are used, but there is a layer of user-defined usage. In particular, an XML document that does not use a publicly-available language. Senders and recipients must make no assumptions that the other end has anything other than a complete implementation, nor that the other end will use any particular processing software or have available any particular information.
3) The "Private" World Wide Web: this is where there is private agreement between parties to use WWW protocols, but they have negotiated to use only a profile of a specification, or have agreed that certain processing is expected at the recipient.
4) The "Underworld Wide Web": this is the realm of processing software which creates, maintains, transforms, processes, etc. documents but whose input and output are not directly available to strangers. This includes, for example, data capture software working with incomplete documents.

With these four definitions, TAG then should say:

A) The Standard WWW MUST conform to W3C specs.
B) The Extended WWW MUST conform to W3C specs, or compatible profiles: these profiles are "subsets-of-agreement" (e.g. "we won't send the letter A") rather than "subsets-of-implementation" (e.g. "a processor may fail if it receives the letter A").
C) The Private WWW SHOULD conform to W3C specs, in particular in only using compatible profiles. However, technical practicalities are important.
D) The Underworld Wide Web MAY conform to W3C specs; however, technical practicalities are king.

Then TAG should say:

i) W3C specs SHOULD distinguish which features are appropriate for the Standard, Extended, Private and Under WWW.
ii) W3C specs MAY provide features that are not appropriate for the Standard WWW.

----------

Where does this get us? Well, let's look at XML: it allows us to state the following important and useful architectural directives:

* Any required post-processing based on the recipient implying certain infoset augmentations is not appropriate for the Extended WWW. This means that it is not appropriate to require validation or schema-augmentation of XML files on the Extended WWW. This is especially true with W3C XML Schemas, because there is no way of assuring that the PSVI the sender wants will be the PSVI that the receiver gets. This has a lot of consequences.
* Getting back to Charmod, it allows us to say that Standard WWW and Extended WWW data MUST be early normalized by the sender, and clients MUST fail if they detect data that is not normalized. But Private and Underworld WWW clients MAY choose not to, as suits them.
* It shows why the standard character entities are appropriate for text/xhtml+mathml but not for text/xml.

I think this would go a good way to alleviate the unproductive concerns about the scope of Charmod in particular, but also clarify other topics as well. For example, it enables us to suggest that XML is popular because, while it is intended as "SGML on the Web", it also provides excellent support for Underworld activity (entities, multiple character sets, and PIs).

It also lets us postulate the best practice that representations on the Standard and Extended WWW need to be atomically parseable: they should not require multiple accesses. Under this best practice, XML external entities are not appropriate for the Extended WWW but they are appropriate for the Standard WWW. It is appropriate for HTML to use entities, but not for SOAP data, for example.

I suspect that many criticisms of W3C technology come down to a lack of awareness (whether by the specification developers, by the specification editors, or by the punter outside) that a good specification should either provide broad support for all four of these sectors of the WWW *or* be explicit about which areas it is appropriate for.

This partitioning provides a vocabulary and grounds for profiling. For example:

* XML Schema WG can warn that value defaulting is inherently unreliable for XML Schemas used on the Extended WWW.
* XQuery WG can state that, because Schema validation is not reliable for the Extended WWW, type-reliant queries are not reliable (and therefore not appropriate) for the Extended WWW. (The same is true for XPath2 and XSLT2.)
* XML Core WG can warn that entity boundaries SHOULD NOT be part of the infoset of a document on the Standard and Extended WWW, but they may be part of the infoset for the Underworld Wide Web.

Cheers
Rick Jelliffe
(Invited expert, W3C I18n IG, not speaking for them)

[1] See http://lists.xml.org/archives/xml-dev/200304/maillist.html (not compiled yet)
Received on Sunday, 6 April 2003 06:40:51 UTC
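
The conformance rules A) to D) and the Charmod directive in the message can be illustrated with a small sketch. This is only one possible reading, not part of the original proposal: the names `Sector`, `CONFORMANCE`, `is_nfc` and `receive` are hypothetical, and Unicode NFC is used as a simple stand-in for Charmod's notion of early normalization, which the draft defines in more detail.

```python
import unicodedata
from enum import Enum


class Sector(Enum):
    """The four sectors proposed in the message (identifiers are illustrative)."""
    STANDARD = "Standard WWW"
    EXTENDED = "Extended WWW"
    PRIVATE = "Private WWW"
    UNDERWORLD = "Underworld Wide Web"


# Conformance level each sector owes to W3C specs, following rules A) to D).
CONFORMANCE = {
    Sector.STANDARD: "MUST",
    Sector.EXTENDED: "MUST",
    Sector.PRIVATE: "SHOULD",
    Sector.UNDERWORLD: "MAY",
}


def is_nfc(text: str) -> bool:
    """Return True if text is already in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


def receive(text: str, sector: Sector) -> str:
    """Accept text, failing on non-normalized input where the sector is MUST.

    Assumption: NFC approximates "early normalized"; Private and Underworld
    recipients are modelled here as simply accepting whatever they get.
    """
    if CONFORMANCE[sector] == "MUST" and not is_nfc(text):
        raise ValueError(f"{sector.value}: input is not NFC-normalized")
    return text
```

Under these assumptions, `receive("e\u0301", Sector.EXTENDED)` raises `ValueError` because the decomposed "e" plus combining acute accent is not in NFC, while the same string passes through unchanged for `Sector.UNDERWORLD`.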