Proposal of an element for numbers

I'd like to propose a new element for XHTML2 and later to mark up all kinds
of numbers (even written-out ones). The benefit of this element would be
mainly the easier interpretation by screen reading or translation programs
and data gatherers like search engines, but it might also be useful for
styling, e.g. setting "white-space:nowrap;" to ensure the dimension stays on
the same line as the value without having to use &nbsp;.
I thought of the possible names "number", "nr", "no", "n" and "value", but
am open to better suggestions. So far I'll be using "nr".
This element would belong to the Text Module.
The most important thing about this element would be its attributes to
define the format of the enclosed number. The content of those attributes
would be similar to the way you define custom formats in your spreadsheet
program:

---

 Element | Attributes | Minimal Content Model
---------+------------+-----------------------
 nr      | Common     | (PCDATA | Inline)*

Inline
    abbr | acronym | cite | code | dfn | em | kbd | nr | quote | samp |
    span | var

--

8.X The *nr* element

The _nr_ element indicates that a text fragment is of a numeric kind
(e.g. date, measurement, price).

/Attributes/

-

The Common collection
  A collection of other attribute collections, including: Core, Events,
  I18N, and Hypertext

-

system = text [CI]

  This attribute specifies the numeric system used. It contains a
  white-space separated list of keywords. In case of a conflict, only the
  first keyword should be used. It defaults to "decimal arabic".

  Valid keywords are
   · "literal",
   · "arabic",
   · "roman",
   · "binary",
   · "octal",
   · "decimal",
   · "hexadecimal".
   <!-- t.b.c. -->
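
  To illustrate the conflict rule, here is a minimal Python sketch of how a
  UA might resolve the attribute. It assumes that the keywords fall into two
  groups (notation and base) and that a conflict means two keywords from the
  same group; the proposal does not spell this out. The names resolve_system
  and parse_arabic are made up for this example.

```python
# Assumption: keywords belong to one of two groups; this grouping is not
# part of the proposal itself.
NOTATIONS = {"literal", "arabic", "roman"}
BASES = {"binary": 2, "octal": 8, "decimal": 10, "hexadecimal": 16}


def resolve_system(attr=None):
    """Return (notation, base) for the value of a system attribute."""
    notation = None
    base = None
    # The attribute is [CI], so compare keywords case-insensitively.
    for keyword in (attr or "").lower().split():
        if keyword in NOTATIONS and notation is None:
            notation = keyword      # first keyword of a group wins
        elif keyword in BASES and base is None:
            base = keyword
        # Later conflicting keywords and unknown keywords are ignored.
    # Missing groups fall back to the default "decimal arabic".
    return (notation or "arabic", base or "decimal")


def parse_arabic(text, attr=None):
    """Convert arabic digits to an int, honouring the base keyword."""
    notation, base = resolve_system(attr)
    if notation != "arabic":
        raise ValueError("this sketch only handles arabic digits")
    return int(text, BASES[base])
```

  With this, parse_arabic("101", "binary") yields 5, and a value such as
  "roman arabic" resolves to roman notation because "roman" comes first.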

-

format = text [CS]

  This attribute defines the format of the (numerical) string enclosed
  by the current element. It may either contain a space-separated list
  of keywords or a format string.

  The predefined keywords are
   · "date" (e.g. "2002-08-22"),
   · "time" (e.g. "12:00:00"),
   · "currency" (e.g. "$12"),
   · "dimension" (e.g. "12,8 cm", "356.34 kN"),
   · "numeric" (default, generic, e.g. "12", "123.456", "654,321"),
   · "ordinal" (e.g. "3.", "2nd").

  User agents SHOULD provide algorithms to guess the exact format and thus
  be able to extract the value and convert it to the user's preferred
  format. It's suggested to also look at the applicable xml:lang attribute
  to solve this issue.
  A conforming UA MUST support format strings, but MAY ignore keywords.

  Format strings are built as follows.
  <!--quick incomplete design of mine-->
    required digit: "0" (stands for an entire written-out number if the
                    system is "literal")
    optional digit: "#", equals "0?"
    any letter: "*"; more specific letters with no numeral meaning
                have to be surrounded by quotes.
    Square brackets ("[" & "]") define the format of the preceding
    digit with special meaning.
   Digits with special meaning:
    year: "Y", i.e. for a four digit year: "YYYY" or "Y[0000]".
    quarter of year: "Q"
    month of year: "M"
    week of year: "W"
    day of year: "C"
    day of month: "D"
    day of week: "d", starting with Monday = 1
    AM/PM modifier: "H"
    hour of day: "h"
    minute of hour: "m"
    second of minute: "s", millisecond and smaller units are decimal
                      fractions of a second
    Unix time second: "S"
    literal date: "!" suffixed: "M!", "D!" (e.g. Ides of March) and "d!"
    literal date abbreviated: ":" suffixed: "M:" and "d:"
   Strings with special meaning:
    currency: "$" (e.g. "-#*0(.00)?$")
    dimension: "~" (e.g. "00.0' '~")
    decimal point: "." (a comma in some languages)
    omittable grouping point: "," (e.g. the comma each 3 digits in English)
    required grouping point: ";" (e.g. the divider between days and months)
    +/- sign: "-", omittable if positive
    preceding expression occurs
     · zero or one time: "?" (e.g. "'-'?0"), "0?" == "#", ";?" == ","
     · zero to infinite times: "*"
     · zero to specified times: "*" followed by integer, "0*3" == "###".
     · one to infinite times: "+"
     · one to specified times: "+" followed by integer.
    grouped: surrounded by round brackets "(" & ")"
    delimiter between alternatives: "|"
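
  As a rough check that such format strings are machine-readable, here is a
  Python sketch that translates a subset of them into a regular expression.
  Several points are assumptions of mine, not part of the proposal: date
  letters, "$" and square brackets are left out; "*" is always read as a
  repetition, never as "any letter"; a repetition count must start with 1-9
  so that "#*0" still means "any digits, then one required digit"; a
  grouping point matches any single punctuation character; and a dimension
  is a run of letters. The name format_to_regex is hypothetical.

```python
import re

# Format characters handled by this sketch and their regex equivalents.
ATOMS = {
    "0": r"\d",                # required digit
    "#": r"(?:\d?)",           # optional digit
    ".": r"[.,]",              # decimal point (a comma in some languages)
    ",": r"(?:[^\w\s]?)",      # omittable grouping point (assumed: punctuation)
    ";": r"[^\w\s]",           # required grouping point (assumed: punctuation)
    "-": r"(?:[+-]?)",         # sign, omittable if positive
    "~": r"(?:[^\W\d_]+)",     # dimension (assumed: a run of letters)
    "(": "(?:", ")": ")",      # grouping
    "|": "|",                  # delimiter between alternatives
    "?": "?",                  # zero or one time
}


def format_to_regex(fmt):
    """Translate a subset of the proposed format strings to a regex."""
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "'":
            # Quoted literal text: copy it verbatim up to the closing quote.
            end = fmt.index("'", i + 1)
            out.append("(?:" + re.escape(fmt[i + 1:end]) + ")")
            i = end + 1
        elif ch in "*+":
            # Repetition, optionally bounded by a following integer.
            low = "0" if ch == "*" else "1"
            count = re.match(r"[1-9]\d*", fmt[i + 1:])
            if count:
                out.append("{%s,%s}" % (low, count.group()))
                i += 1 + count.end()
            else:
                out.append(ch)
                i += 1
        elif ch in ATOMS:
            out.append(ATOMS[ch])
            i += 1
        else:
            raise ValueError("unsupported format character: %r" % ch)
    return re.compile("".join(out))
```

  For instance, format_to_regex("0.00' '~").fullmatch("2.54 cm") succeeds,
  while format_to_regex("0*3").fullmatch("1234") does not, since "0*3"
  allows at most three digits.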

-

value = text [CI]

  This attribute allows the actual value of the enclosed text to be given
  in a standardized way. This is either
   · the ISO date & time format for such data,
   · a plain decimal arabic number with a dot (".") as decimal point
     and an optional prefixed plus or minus sign ("-0+(.0+)?"), or
   · such a number followed by a whitespace and the literal dimension,
     which must either be SI defined or a three letter currency code.
   <!-- Should a keyword "calculate", which would be the default, be
        provided that told the UA to calculate the value itself? -->
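
  The three permitted forms could be told apart roughly as in the following
  Python sketch. It is only an illustration under some assumptions: the
  name parse_value is made up, the unit is not checked against the SI or
  the currency code list, and only ISO dates and date-times (not bare
  times) are recognized, using datetime.fromisoformat from the standard
  library.

```python
import re
from datetime import datetime
from decimal import Decimal

# Optional sign, digits, optional fraction, then optionally one whitespace
# character and a unit (not validated in this sketch).
NUMBER = re.compile(r"([+-]?\d+(?:\.\d+)?)(?:\s(\S+))?")


def parse_value(value):
    """Return ('number', Decimal, unit_or_None) or ('datetime', datetime)."""
    match = NUMBER.fullmatch(value)
    if match is not None:
        return ("number", Decimal(match.group(1)), match.group(2))
    try:
        return ("datetime", datetime.fromisoformat(value))
    except ValueError:
        raise ValueError("not a recognised value: %r" % value)
```

  Numbers are tried first so that a plain run of digits is never mistaken
  for a compact date.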

-

/Examples/

Henry <nr format="ordinal" system="roman">VIII</nr> had <nr
value="6">six</nr> wives.

<nr system="literal" format="0' '~">One inch</nr> equals <nr format="0.00'
'~">2.54 <abbr title="centimeters">cm</abbr></nr>.

<nr format="##0'.'##0'.'##0;##0">123.0.255.17</nr> is an IP address.

There are many ways to write a single date, for example according to ISO:
<nr format="YYYY,MM,DD">2002-08-22</nr>, US-American: <nr
format="M[#0]'/'D[#0]'/'Y[##00]?">08/22/2002</nr> or German style: <nr
format="D[#0]'.'M[#0]'.'Y[(00|'&apos;')00]">22.8.'02</nr>.

We'll meet on <nr format="date" system="literal">August 22nd</nr> at <nr
format="time" system="literal">twelve o'clock</nr>.

<nr system="literal">Five</nr> is <nr system="binary">101</nr>.

---

Some examples given above aren't so good, because their context requires the
numbers to be shown as typed.

Note that so far only Arabic and Roman numbers have been taken into
consideration. There are mixtures, like writing the month with a Roman
number, that cannot be covered with a single occurrence of the element as
defined in this proposal, so nesting should be allowed:
<nr format="date">
    <nr format="DD'.'">22.</nr>
    <nr system="roman" format="M+'. '">VIII. </nr>
    <nr format="'A.D. 'Y[0000]">
        <abbr title="Anno Domini" xml:lang="la">A.D.</abbr> 2002
    </nr>
</nr>
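
To show that a processor could pick such a nested date apart, here is a small
Python sketch using the standard xml.etree.ElementTree module. The helper name
collect_parts and the fallback to "decimal arabic" for a missing system
attribute (taken from the default stated above) are choices made for this
example only.

```python
import xml.etree.ElementTree as ET

SOURCE = """<nr format="date">
    <nr format="DD'.'">22.</nr>
    <nr system="roman" format="M+'. '">VIII. </nr>
    <nr format="'A.D. 'Y[0000]">
        <abbr title="Anno Domini" xml:lang="la">A.D.</abbr> 2002
    </nr>
</nr>"""


def collect_parts(element):
    """List (format, system, text) for each nr directly inside element."""
    parts = []
    for child in element.findall("nr"):
        # Join all text inside the child and collapse runs of white space.
        text = " ".join("".join(child.itertext()).split())
        system = child.get("system", "decimal arabic")
        parts.append((child.get("format"), system, text))
    return parts


root = ET.fromstring(SOURCE)
parts = collect_parts(root)
for fmt, system, text in parts:
    print(fmt, "|", system, "|", text)
```

Each part keeps its own format and system, so the day, the Roman month and
the year can be interpreted separately and then combined under the outer
"date" element.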

There should be a way to simply define custom formats document-wide --
entities?

The validity of the attributes' contents can't, of course, be checked by a
DTD.

It might be an option to forget about the proposed element and instead
add the attributes to the existing "var".

--
Christoph Päper

Received on Thursday, 22 August 2002 12:20:50 UTC