- From: Jukka Korpela <jkorpela@cc.hut.fi>
- Date: Thu, 26 Mar 1998 16:22:37 +0200 (EET)
- To: david_richmond@nl.compuware.com
- cc: www-html@w3.org
On Thu, 26 Mar 1998 david_richmond@nl.compuware.com wrote:

> I would like to see a formal HTML way of formatting data-type values,
> such as dates and numbers.

This sounds presentation-oriented, and the current trend is to drop such features from HTML itself. I suggest that you rephrase the suggestion in terms of structural markup (which _may_ involve information used by browsers to select a suitable presentation). For example, you might suggest a text-level (phrase) element for marking up some text as consisting of data of some specific kind, such as a date notation. You would then need to explain why such markup would be useful. I think dates are not very interesting in this way, as I'll try to explain. But there might be other reasons for introducing "data type markup", as I'll propose.

> The raw value would be specified using USA
> conventions,

Why? This is the World Wide Web. And there is an international convention for date notations, developed for the presentation of dates in international contexts. See http://www.roguewave.com/resources/exchange/iso8601.html for a description of the standard, ISO 8601:1988. It has been criticized as too strange for normal people to read, but this counterargument does not apply to notations in HTML source. This is the approach adopted in http://www.w3.org/TR/REC-html40/types.html#h-6.11

> but can be reformatted to the user agent's conventions.
> For example, an american date of 3/26/98 would be shown in a European
> user agent as 26/3/98 or even as 26 March 1998.

Don't you find this confusing? If I were reading a document in English on a browser configured for the Finnish locale, I would probably see the date as 26. maaliskuuta 1998. The formatting should take place according to the language of the _document_. Thus, the date can be written that way in the first place. (Quite apart from this, I think it might be useful to switch to the 1998-03-26 style universally.)

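To make the point about the document's language concrete, here is a minimal sketch in Python of taking an ISO 8601 style date from the source and writing it out in the conventions of the document's language rather than the reader's locale. The function name, the two-language tables and the output patterns are assumptions made up for this illustration; real software would draw on proper locale data.

```python
import re
from datetime import date

# Illustrative tables only (an assumption of this sketch, not a standard).
MONTHS = {
    "en": ["January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December"],
    "fi": ["tammikuuta", "helmikuuta", "maaliskuuta", "huhtikuuta",
           "toukokuuta", "kesäkuuta", "heinäkuuta", "elokuuta",
           "syyskuuta", "lokakuuta", "marraskuuta", "joulukuuta"],
}
PATTERNS = {"en": "{day} {month} {year}", "fi": "{day}. {month} {year}"}

# Only the plain calendar-date form YYYY-MM-DD is handled here.
ISO_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def format_iso_date(iso_text, doc_lang):
    """Format a YYYY-MM-DD date according to the language of the document."""
    match = ISO_DATE.fullmatch(iso_text)
    if match is None:
        raise ValueError("not a YYYY-MM-DD date: " + repr(iso_text))
    year, month, day = (int(part) for part in match.groups())
    value = date(year, month, day)  # raises ValueError for e.g. 1998-02-30
    return PATTERNS[doc_lang].format(
        day=value.day, month=MONTHS[doc_lang][value.month - 1], year=value.year
    )


print(format_iso_date("1998-03-26", "en"))  # 26 March 1998
print(format_iso_date("1998-03-26", "fi"))  # 26. maaliskuuta 1998
```

The language code passed in would come from the document (its LANG information), not from the browser configuration, which is the whole point of the argument above.
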
> As for the best way of doing this I am not sure, but adding a
> <DATATYPE> tag would be one way.

Well, that is a rather long name. The element for "data type markup" could be called just DATA. For example, <DATA TYPE="date">1998-03-26</DATA> could just indicate that 1998-03-26 is specifically a date. This might be marginally useful in the sense that various checkers could validate the format of dates against some syntax, preferably one based on ISO 8601:1988 in this case. A style sheet might suggest some particular font face for dates, for example. But for the reasons explained above, it is highly questionable whether the _essentials_ of presentation should be affected. Similarly, something like

> <DATATYPE type=number>1000.00</DATATYPE>

should not affect the use of decimal point versus decimal comma, for example, but it might affect the font face and size, if the browser so decides.

Much more importantly, such markup could be essential in _translation_. I once translated some texts on HTML 3.2 using BabelFish. I noticed that the translation program was too clever: it noticed the 3.2 and converted this number according to the conventions of the target language, making it 3,2 for languages which use a decimal comma! Thus, an author might wish to help translation software realize that 3.2 is a code-like notation, not a number with a decimal point in it. On the other hand, something like <DATA type=realnumber> vs. <DATA type=versionnumber> would make things rather awkward. Perhaps a better solution would be an element (nestable with other text-level markup, of course) for simply indicating that its content is to remain invariant in translations. This would allow us to use, say, the name of an HTML element or a C language keyword in running English text so that a translation program would have a chance of realizing that those names, although found in English glossaries, are not to be translated.

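As a rough illustration of how translation software might honour such an "invariant" element, here is a minimal Python sketch. The element name INVARIANT is purely hypothetical (no name is proposed above), and toy_translate is only a stand-in that imitates the decimal comma conversion described; the sketch also assumes that the bracketed placeholders pass through the translator unchanged and that the elements are not nested.

```python
import re

# Hypothetical element name, invented for this sketch only.
INVARIANT = re.compile(r"<INVARIANT>(.*?)</INVARIANT>", re.IGNORECASE | re.DOTALL)
# Placeholder form assumed to survive translation untouched.
PLACEHOLDER = re.compile(r"\[\[([0-9]+)\]\]")


def translate_preserving(text, translate):
    """Translate text, keeping the content of INVARIANT elements as written."""
    kept = []

    def stash(match):
        # Remember the protected content and leave a numbered placeholder.
        kept.append(match.group(1))
        return "[[{}]]".format(len(kept) - 1)

    masked = INVARIANT.sub(stash, text)
    translated = translate(masked)
    # Put the protected content back; the markup itself is dropped here.
    return PLACEHOLDER.sub(lambda m: kept[int(m.group(1))], translated)


def toy_translate(text):
    """Stand-in for a translator: only switches decimal point to decimal comma."""
    return re.sub(r"([0-9])\.([0-9])", r"\1,\2", text)


source = "HTML <INVARIANT>3.2</INVARIANT> costs 1000.00"
print(translate_preserving(source, toy_translate))  # HTML 3.2 costs 1000,00
```

Without the markup, the same stand-in turns "HTML 3.2" into "HTML 3,2", which is exactly the over-clever behaviour described above.
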
Someone might say that the technically simplest way to achieve this would be to introduce a special language name (LANG attribute value), such as "none" (with the meaning 'no human language'), e.g.

  The <LANG="none">TITLE</LANG> element...

but the problem is that the HTML element name TITLE _is_ an English word in the sense that it is _pronounced_ as an English word.

> In the case of INPUT elements simple datatype validation could also be
> performed by the user agent on any values modified by the user.

This is an interesting question but quite distinct from marking up the document content. It relates to specifying the allowable type of _user input_. The current trend seems to be to handle such checks on the server side, possibly with client-side checking (using JavaScript) before submission. The situation is unsatisfactory, and it would be a cleaner solution to allow the HTML code to specify the expected input format. The problem is that although simple checks (such as the data being numerical) would be easy, authors need all kinds of checks, and any method which provides the necessary universality would inevitably have at least the power of a simple programming language.

Yucca, http://www.hut.fi/u/jkorpela/
Received on Thursday, 26 March 1998 09:22:45 UTC