Lexical representation, Internationalization, Documentation, Versioning from bob_buxton@uk.ibm.com on 1999-08-04 (www-xml-schema-comments@w3.org from July to September 1999)

From: <bob_buxton@uk.ibm.com>
Date: Wed, 4 Aug 1999 18:53:27 +0100
To: www-xml-schema-comments@w3.org, w3c-xml-schema-ig@w3.org
Message-ID: <802567C3.00625BDC.00@d06mta03.portsmouth.uk.ibm.com>

I realize that the authors have deliberately excluded some of these issues
from the current drafts; however, I feel that we at least need to establish
some signposts towards the road ahead! Comments are based on the
6-May-1999 draft.

Lexical representation, Internationalization:

In my view the schema serves (at least) two distinct purposes:

   For a generic schema-aware tool such as an XML editor, the schema should
   allow the editor to provide considerable added value compared to a
   merely DTD-aware application. For example, if it sees a date-based or
   number-based data type, it can accept input in the format appropriate
   for the user's locale and preferences and convert it into the
   appropriate representation for storage in the XML document.
   For user-defined data types and enumerations there needs to be
   additional information in the schema, or some form of side information,
   to allow the editor to know what alternatives can be used. You might
   wish to validate that a post code is 5 or 9 digits for a US user whilst
   it is a mixture of letters and numbers for a British user, and to give a
   French user the option of entering Verte/Rouge in a colour choice
   selection.

   For the writer of an application that uses a validating parser to
   interpret an XML document, the schema removes the need for the
   application itself to validate the contents of the element or
   attribute. The parser is not aware of the date and decimal point
   conventions of the document's creator, so it can only validate that the
   data fits within the declared lexical representation; it cannot be
   expected to interpret ambiguous date or number formats. The application
   does not want to know what options the user was given at document
   creation time; it will expect to handle the colours Green/Red even if the
   user originally entered Verte/Rouge.
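
   A minimal sketch of this division of labour follows. It assumes ISO 8601
   CCYY-MM-DD as the declared lexical representation for dates. The
   per-locale input formats in LOCALE_DATE_FORMATS are hypothetical side
   information, not anything defined in the draft. The editor interprets the
   user's locale-specific input, and the parser only checks the canonical
   form.

```python
import re
from datetime import date, datetime

# Hypothetical per-locale input formats an editor might accept
# (illustrative side information, not part of any schema draft).
LOCALE_DATE_FORMATS = {
    "en-US": "%m/%d/%Y",
    "en-GB": "%d/%m/%Y",
    "fr-FR": "%d/%m/%Y",
}

# Assumed declared lexical representation for a date: CCYY-MM-DD.
CANONICAL_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def editor_to_canonical(user_text: str, locale: str) -> str:
    """Editor side: interpret input using the user's locale conventions,
    then return the canonical form to be stored in the XML document."""
    fmt = LOCALE_DATE_FORMATS[locale]
    return datetime.strptime(user_text, fmt).date().isoformat()


def parser_validate(stored_text: str) -> bool:
    """Parser side: check only that the stored text fits the declared
    lexical representation; no locale conventions are consulted."""
    if not CANONICAL_DATE.match(stored_text):
        return False
    try:
        date.fromisoformat(stored_text)  # rejects e.g. 1999-02-30
    except ValueError:
        return False
    return True
```

   Consider the input "04/08/1999". The editor stores it as 1999-08-04 for
   an en-GB user but as 1999-04-08 for an en-US user. The parser never has
   to resolve that ambiguity, because it only ever sees the canonical form.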

I think there is a need for a Map function to equate the values of one
enumeration with those of another, to allow for aliases. Mapping an
enumeration onto one with an ordered base type would then allow
comparison between enumerated values.

Example: A size enumeration

<datatype name="sizeEnglish">
<basetype name="string"/>
<enumeration>
 <literal>Extra Small</literal>
 <literal>Small</literal>
 <literal>Large</literal>
 <literal>Extra Large</literal>
</enumeration></datatype>

<datatype name="sizeEngAbbr">
<basetype name="string"/>
<enumeration>
 <literal>XS</literal>
 <literal>S</literal>
 <literal>L</literal>
 <literal>XL</literal>
</enumeration></datatype>

<datatype name="sizeFrench">
<basetype name="string"/>
<enumeration>
 <literal>Tres Petite</literal>
 <literal>Petite</literal>
 <literal>Grande</literal>
 <literal>Tres Grande</literal>
</enumeration></datatype>

<datatype name="sizeCode">
<basetype name="integer"/>
<enumeration>
 <literal>10</literal>
 <literal>20</literal>
 <literal>30</literal>
 <literal>40</literal>
</enumeration></datatype>

We need a syntax that allows us to say that the four enumerations are
equivalent, with one of them being the one returned to an application by a
parser, whilst the others can be used as alternatives. There would also
need to be a way for an XML editor application to know that a French user
would wish to see the sizeFrench list in a pull-down selection list, whilst
an English speaker would wish to see sizeEnglish and/or sizeEngAbbr.
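
One way to picture the proposed Map function is as a table from every alias
to a position shared by all the equivalent enumerations. The ordered
sizeCode values then supply the comparison. The sketch below is
hypothetical and is not proposed schema syntax. It assumes that sizeEnglish
is the list returned to applications, and that the editor's
language-to-list table (DISPLAY_LIST) exists as side information.
sizeEngAbbr would be added to SIZE_LISTS in the same way.

```python
# Equivalent enumerations from the example. Position i in each list denotes
# the same size (treating them as parallel lists is an assumption).
SIZE_LISTS = {
    "sizeEnglish": ["Extra Small", "Small", "Large", "Extra Large"],
    "sizeFrench": ["Tres Petite", "Petite", "Grande", "Tres Grande"],
    "sizeCode": [10, 20, 30, 40],
}
CANONICAL = "sizeEnglish"  # assumed: the list a parser returns to applications
ORDERED = "sizeCode"       # integer base type, so its values can be compared

# Hypothetical side information: which list an editor shows per language.
DISPLAY_LIST = {"en": "sizeEnglish", "fr": "sizeFrench"}

# Map every alias, from any list, to its shared position.
ALIASES = {
    value: index
    for values in SIZE_LISTS.values()
    for index, value in enumerate(values)
}


def to_canonical(value):
    """Return the value an application would receive, whichever alias was entered."""
    return SIZE_LISTS[CANONICAL][ALIASES[value]]


def compare(a, b):
    """Order two sizes by the sizeCode they map to; returns -1, 0 or 1."""
    ca = SIZE_LISTS[ORDERED][ALIASES[a]]
    cb = SIZE_LISTS[ORDERED][ALIASES[b]]
    return (ca > cb) - (ca < cb)


def choices_for(language):
    """The pull-down list an editor would show a user of the given language."""
    return SIZE_LISTS[DISPLAY_LIST[language]]
```

With this table, a French user who picks "Grande" produces "Large" for the
application, and "Petite" compares as smaller than "Large" because 20 < 30.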

Similarly there is a need for an XML editor application to be able to
choose the appropriate lexical representation out of several possibilities
for the user's locale and preferences.

Documentation:

There is a need for several different types of documentation associated
with a schema, and for the documentation to exist in the national languages
of the users of the schema. It is probably desirable that some of the
documentation be kept in the schema itself (especially where translation is
not expected to be a requirement), but it should also be possible to keep
documentation in separate documents, with a straightforward way of linking
to the documentation in the appropriate language. I don't regard coding a
URI for each of the French, Spanish, German ... documents in each
elementType definition as straightforward, especially if I want to add
Japanese documentation at a later date.

As for the types of documentation required, I can see the need for the
following:

   Design/programming information for use by the schema designers and those
   writing applications based on the schema (using an HTML subset).

   Short text description: a one-line plain-text title for an
   element/attribute that an application could use in place of the
   element/attribute name as a more meaningful label.

   Long text description: one or more paragraphs formatted using a subset
   of HTML tag language, to be displayed as a result of a context-sensitive
   help button. It might include links to even more detailed information.

   Icons that an application might wish to use to represent the
   element/attribute in, for example, a tool palette.

All documentation is, of course, optional.

Versioning:

Currently there is a single version='M.n' attribute on the schema element,
which does not give any indication of what may have changed since the
previous version of the schema. I would like to see a more formal
change-control methodology introduced, so that a schema can be marked up
to show what was changed, by whom, when and why.

This would have value for human readers of the schema, avoiding the need to
find and compare the old version of the schema, but it is much more
important when two communicating applications might understand different
versions of the schema. To prevent the down-level application from being
sent data that it can't understand, the higher-level application may wish
to send data that fits the previous level of the schema. This is easier to
achieve using change flags than by attempting to compare two schemas at
run time.

We would need to be able to mark what was new in the schema and what was
deleted, and to represent a change as a delete of the old followed by an
add of the new.

A possible syntax might be:

<schema version="1.2" ...>
<changehistory>
<version name="1.1" by="Me" date="1999-08-04">Add the panda
element</version>
<version name="1.2" by="AN other" date="2000-01-01">Remove the widget
attribute from  the panda element</version>
</changehistory>
...
<change version="1.1" type="add">
<elementType name="panda">
...
<change version="1.2" type="delete">
<attrDecl name="widget"/>
</change>
</elementType>
</change>
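
With change flags like these, a higher-level application could derive the
schema as it stood at an earlier version without comparing two schemas. The
following is a minimal sketch using Python's xml.etree.ElementTree. It
assumes the change element and its version/type attributes behave as in the
example above, and that versions compare numerically as dotted integers.
SCHEMA is a hypothetical, completed variant of the example, because the
"..." elisions in the original are not parseable XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed variant of the example above.
SCHEMA = """<schema version="1.2">
  <change version="1.1" type="add">
    <elementType name="panda">
      <change version="1.2" type="delete">
        <attrDecl name="widget"/>
      </change>
    </elementType>
  </change>
</schema>"""


def version_key(v):
    """'1.10' -> (1, 10), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in v.split("."))


def schema_at(element, target):
    """Rewrite element in place to show the schema as of version `target`.

    An add is kept only if made at or before target; a delete takes effect
    only if made at or before target. Kept content is unwrapped from its
    <change> element; dropped content is removed entirely.
    """
    t = version_key(target)
    new_children = []
    for child in list(element):
        if child.tag == "change":
            applied = version_key(child.get("version")) <= t
            kind = child.get("type")
            if kind == "add":
                keep = applied
            elif kind == "delete":
                keep = not applied
            else:
                raise ValueError(f"unknown change type: {kind!r}")
            if keep:
                schema_at(child, target)     # handle nested changes first
                new_children.extend(child)   # splice content in place of wrapper
        else:
            schema_at(child, target)
            new_children.append(child)
    element[:] = new_children
    return element


def names_at(version):
    """Named declarations visible at `version` (parses a fresh copy each time)."""
    root = schema_at(ET.fromstring(SCHEMA), version)
    return [(e.tag, e.get("name")) for e in root.iter() if e.get("name")]
```

Here names_at("1.1") still shows the widget attribute on panda,
names_at("1.2") does not, and names_at("1.0") shows no panda at all. A
1.2-level application could therefore generate data for a 1.1-level partner
from its own schema alone.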


Background:

CPSM is the System Management component of IBM's CICS Transaction Server.
It collects data from CICS transaction servers running in a network and
uses it for automated operations, passing it to application programs, and
formatting it to give human operators a single point of control.
Since the network is potentially global, it is neither realistic nor
desirable to mandate that all the servers run at the same product level;
hence the interest in internationalization and version control.

Currently the data is transported as simple data structures, and we provide
C, COBOL, PL/I and Assembler mappings for applications to use. I am
looking to define a schema for the data structures and then generate the
various programming-language mappings from the schema. I would be
interested in hearing of any existing work in this area.

Bob Buxton
CPSM development, MP 208, Hursley
Ext 248193, External 01962-818193
Received on Wednesday, 4 August 1999 13:54:42 UTC