- From: Arjun Ray <aray@q2.net>
- Date: Sun, 6 Feb 2000 02:48:17 -0500 (EST)
- To: www-html@w3.org
On Sat, 5 Feb 2000, Murray Altheim wrote:

> [I] have lobbied within the HTML WG to begin work on figuring out
> exactly what all of the data types (ie., notations) currently used
> in XHTML are, and come to some determination on how they can be
> declared in a way that is the same between XHTML DTDs and Schemas.

Hoo boy, that last bit is a biggie... :) Right now, the Schema activity has Second System Syndrome. I still can't get through the new WDs without my eyes glazing over, and I suspect my experience isn't atypical. Even though some sort of "DTD compatibility" is a requirement, I'm guessing this will be the first thing to go when the Schema stuff manages to articulate its direction to the General Public. Which is to say, the best we can hope for, IMHO, is the eventual availability of an explicit conversion program, for the old fogies still out there who Have Not Seen The Way And Thus Realized That DTD Syntax Is Imperishably Ugly And Has Gotta Go.

> We need to understand better the data types we're using anyway,
> esp. as we move into schemas for XHTML.

The -datatypes.mod is a good start. Is there something important missing? IMHO, Schemas are tending too much towards an "ontological" view of datatypes (the 'Datatypes' moiety relegating regular expressions to a "string" type says it all, I think.)

For a long time I was troubled by the set of "datatypes" provided by SGML. Then I realized that the real problem was the illusion of a "datatype" itself - i.e. it's not useful to think of those thingies in these terms, because SGML is basically just a taxonomic formalism. It's all about names. Attribute value literals, regardless of the declared value, are replaceable character data and thus just strings: the only distinction is whether the string can (or should) be tokenized into combinations of name characters - CDATA vs. all the others - because name tokens are the *only* true native notation ("datatype") in SGML, so that particular tokenization service "comes for free" in the parser. This is why notations are so important: one of their taxonomic roles is to "hook" to other tokenizing, structuring or machine-processing schemes (note the typical use of system identifiers for notations pointing to interpreters.)

Beyond that is the black art of judging when enough is enough: how much do you try to encompass within your formalism, and how much are you content simply to point to (i.e. record a reference only)? As an example, suppose the tokenization of strings were taken beyond name tokens, to regular expressions (surely the natural generalization in text processing.) Well, can a URI be described by a regular expression (and therefore be specifiable "internally" to a parser or validator that groks regexps)? Here's one answer:

http://www.deja.com/=dnc/getdoc.xp?AN=513160002
http://www.deja.com/=dnc/getdoc.xp?AN=513219055

Be careful what you ask for! Sometimes the better part of wisdom is to declare a notation, rather than invest hope in a built-in schema validator :) (There's a small sketch of what I mean in the P.S. below.)

> If the 'DATA' attributes feature proves that valuable, then
> perhaps we can lobby for its inclusion in a future version of XML.

We need data attributes and something like DAFE, too.

Arjun
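
P.S. By "declare a notation" I mean something along these lines - a made-up sketch in XML-flavored DTD syntax; the names and the system identifier are mine for illustration only, not anything taken from the XHTML modules:

  <!-- To the parser an href value is just a string: CDATA, untokenized.
       NMTOKEN gets the one "free" tokenization service (name characters).
       The notation merely records where the real URI smarts live; XML as
       it stands gives you no way to formally attach that notation to the
       attribute, which is roughly why data attributes would help. -->
  <!NOTATION URI SYSTEM "http://www.example.org/uri-interpreter" >
  <!ENTITY % URI.datatype "CDATA" >
  <!ELEMENT link EMPTY >
  <!ATTLIST link
    href   %URI.datatype;   #REQUIRED
    rel    NMTOKEN          #IMPLIED
  >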