RE: Fwd: I18N Last call comments on Schema Part 2

Martin wrote:
>However, there are at least two problems with this:

> It does not address localization of anything else than
>   boolean (and maybe NMToken,...). The example with dates

That comment was deliberately minimal. It addressed only the simplest case, where the possible values can be enumerated and a table of substitutions can map each specific lexical representation to the base lexical representation.

A more general capability could be built on the regex engine that is already mandated for a compliant parser. Something like this could define a date type with a US-style lexical representation:

<xsd:simpleType name="USDate" base="xsdt:date">
	<xsd:transform>
		<!--  \3 expands third group, \1 expands first group  -->
		<xsd:replace match="(\d\d)/(\d\d)/(\d\d\d\d)" replace="\3-\1-\2"/>
	</xsd:transform>
</xsd:simpleType>
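To make the proposed <xsd:replace> semantics concrete, here is a minimal Python sketch of what a processor might do with USDate. It assumes the match must cover the whole lexical value (schema patterns are implicitly anchored) and that the replace template uses backreferences as in the comment above. The function name us_date_to_base is hypothetical.

```python
import re

# Hypothetical model of <xsd:replace match="..." replace="\3-\1-\2"/>:
# the whole lexical value must match, then the template rebuilds the
# base xsd:date form (YYYY-MM-DD) from the captured groups.
US_DATE = re.compile(r"(\d\d)/(\d\d)/(\d\d\d\d)")

def us_date_to_base(lexical: str) -> str:
    """Map a MM/DD/YYYY lexical value to the xsd:date form YYYY-MM-DD."""
    m = US_DATE.fullmatch(lexical)
    if m is None:
        raise ValueError(f"not a valid USDate: {lexical!r}")
    # \3 = year, \1 = month, \2 = day
    return m.expand(r"\3-\1-\2")

print(us_date_to_base("06/01/2000"))  # 2000-06-01
```

The transformed string would then be validated as an ordinary xsd:date, so the US form needs no date logic of its own.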

<xsd:simpleType name="EuroDouble" base="xsdt:double">
	<xsd:transform>
		<!--  replace all commas with periods and
			eliminate all periods used as thousands separators,
			like the translate() function in XSLT
		-->
		<xsd:translate match=",." replace="."/>
	</xsd:transform>
</xsd:simpleType>
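The EuroDouble example relies on XPath 1.0 translate() semantics: each character in the first string maps to the character at the same position in the second, and characters with no counterpart are deleted. The mapping is applied character by character, so a comma turned into a period is not then deleted. A Python sketch of that behavior follows; xslt_translate and euro_double_to_base are hypothetical names, and Python's float() stands in for the xsd:double parser.

```python
def xslt_translate(s: str, from_chars: str, to_chars: str) -> str:
    """Character mapping with XPath 1.0 translate() semantics."""
    out = []
    for c in s:
        i = from_chars.find(c)  # first occurrence wins, as in XPath
        if i < 0:
            out.append(c)                # not in from_chars: keep as-is
        elif i < len(to_chars):
            out.append(to_chars[i])      # mapped to its counterpart
        # else: no counterpart in to_chars, so the character is dropped
    return "".join(out)

def euro_double_to_base(lexical: str) -> float:
    # match=",." replace="." : ',' becomes '.', and '.' is removed
    return float(xslt_translate(lexical, ",.", "."))

print(euro_double_to_base("1.234.567,89"))  # 1234567.89
```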


>Unique representations of numerics, points 12-15:
>
>While there may be some benefit of having a unique representation for each 
>value in the numeric values spaces for data signing, it would result in 
>the numeric datatypes being unusable for creating
>schemas for the vast majority of existing XML documents and not be usable 
>with the current generation of XML technologies.

Basically, the current schema datatypes are generally compatible with common XML usage (which is predominantly Anglo) and hostile to non-Anglo lexical representations. The numeric representation you described would be hostile to 99.99% of XML documents that contain numbers. I would much rather expand the schema datatypes to enable non-Anglo representations than restrict them.
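To illustrate what a unique-representation rule would take away, here is a short Python sketch listing several lexical forms that all denote the same double value. Python's float() stands in for a schema parser; its accepted syntax is broader than xsd:double's, so this only illustrates the lexical/value distinction.

```python
# Several lexical forms that all denote the value one hundred.
forms = ["100", "100.0", "0100", "1E2", "1.0e2"]

# Parsing collapses them to a single value in the value space.
values = {float(f) for f in forms}
print(len(values))  # 1
```

A one-representation-per-value rule would accept exactly one of these strings and reject documents that use any of the others.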

>This means you can't handle double or float with XSLT.

Unless you restrict the type with a pattern that forbids the 'E' term, add extension functions to XSLT, or depend on non-standard behavior of the XSLT processor. I certainly hope a version of XSLT after 1.0 adds native support for the 'E' term.
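Here is a Python sketch of the first workaround. XSLT 1.0 converts strings to numbers using the XPath 1.0 rules, which have no exponent syntax, so a value like "1.5E3" becomes NaN. NO_EXPONENT is a hypothetical stand-in for an xsd:pattern facet on a restricted double. Every value it accepts is also readable by XPath 1.0.

```python
import math
import re

# XPath 1.0 string-to-number rule: optional XML whitespace, optional '-',
# then Digits ('.' Digits?)? | '.' Digits, then optional whitespace.
# Anything else, including exponent forms such as "1E3", yields NaN.
XML_WS = "[ \t\r\n]*"
XPATH1_NUMBER = re.compile(XML_WS + r"-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)" + XML_WS)

# Hypothetical pattern facet for a double restricted to plain decimals.
NO_EXPONENT = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")

def xpath1_number(s: str) -> float:
    if XPATH1_NUMBER.fullmatch(s):
        return float(s)  # float() tolerates the surrounding whitespace
    return math.nan

for lexical in ["1000", "1.5E3"]:
    print(lexical, bool(NO_EXPONENT.fullmatch(lexical)), xpath1_number(lexical))
```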

>or guarantee conformance to the formats.

XSLT's decimal-format only guarantees the pattern syntax of the JDK 1.1 DecimalFormat class. Exponential notation in format patterns was introduced after that (JDK 1.3?).

Having the exponential term is highly, highly desirable in engineering applications (my domain). It keeps the data comprehensible to humans, and you don't need 200-300 characters to write a value that carries only 15 digits of precision. If you had to define one and only one lexical representation per value that was also compatible with existing XML infrastructure, the E term would have to go, and I am not in favor of that.
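As a quick check of the 200-300 character figure, this Python sketch writes one small double-precision value with an exponent and then expands it exactly into plain fixed-point notation using the standard decimal module. The specific value is an arbitrary illustration.

```python
from decimal import Decimal

# A small double-precision value written with an exponent...
with_exponent = "1.2345678901234567E-300"
# ...and the same value expanded exactly into fixed-point notation.
plain = format(Decimal(with_exponent), "f")

print(len(with_exponent))  # 23
print(len(plain))          # over 300 characters for 17 significant digits
```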

>Also, please note that actual contracts, although they may express
>things in terms of months or years, actually are up to a specific
>date,...

Looks like the lawyers have been bitten by ambiguity and don't want to be bitten again.  However, I pay my rent based on a specific rate per month.


>>The spec doesn't say anything about this (comparison of timeDurations). 
>>We made our comments on the spec, not on something else.

I agree that the spec should address it.

>>Of course, this doesn't eliminate the problems that months and years
>>are highly calendar-dependent.

Agreed. But without them, timeDuration should simply be derived from double and interpreted as seconds. I have frequently commented on time-related issues, and I agree that repeating durations and the like don't appear to be useful enough to justify the complexity they introduce. However, the Schema WG felt compelled to somehow derive date and time from a common ancestor.
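This Python sketch shows why months make timeDuration comparison calendar-dependent. It compares P1M with P30D when both are added to different start dates. The add_months helper and its clamp-to-month-end rule are assumptions for illustration, not something the schema draft specifies.

```python
import calendar
from datetime import date, timedelta

def add_months(d: date, months: int) -> date:
    """Hypothetical helper: add calendar months, clamping the day to the
    last day of the target month (one of several possible conventions)."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def compare_p1m_p30d(start: date) -> int:
    """Return 1, 0 or -1 as P1M is longer than, equal to, or shorter
    than P30D when both durations are added to `start`."""
    one_month = add_months(start, 1) - start
    thirty_days = timedelta(days=30)
    return (one_month > thirty_days) - (one_month < thirty_days)

for start in (date(2000, 1, 1), date(2000, 2, 1), date(2000, 4, 1)):
    print(start, compare_p1m_p30d(start))

# Without months and years, a duration is just a count of seconds:
print(timedelta(days=1, hours=2).total_seconds())  # 93600.0
```

Depending on the start date, P1M comes out longer than, shorter than, or equal to P30D. A seconds-only duration has no such ambiguity.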

>>By the way, your mail also mentioned the case of time zones producing
>>multiple representations. I forgot about that when writing up my
>>comments, it should be added.

I think this is the worst offender among the multiple-representation cases, since it really does tempt you to communicate something else (the time zone) through the format of a value.
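Here is a Python illustration of the time-zone case. Two lexical forms denote the same instant, but each one also carries a different UTC offset that an application might be tempted to treat as data. The example uses the standard datetime.fromisoformat (Python 3.7+). The offset is written as "+00:00" rather than "Z" because older versions of fromisoformat do not accept "Z".

```python
from datetime import datetime

a = datetime.fromisoformat("2000-06-01T13:23:48+00:00")
b = datetime.fromisoformat("2000-06-01T09:23:48-04:00")

print(a == b)  # True: both denote the same instant
# ...yet the offsets differ, so the lexical form carries extra information
print(a.utcoffset(), b.utcoffset())
```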

Received on Thursday, 1 June 2000 13:23:48 UTC