Comments on Assigning Media Types to Binary Data in XML

I have produced these comments at the request of the XML Schema WG,
but the WG has not yet seen them, much less endorsed them.
Nonetheless, they are intended to be comments specifically from the
perspective of XML Schema, and I hope they may be helpful.

1) Architecture

I'm sympathetic to what you're trying to achieve, i.e.

 a) Enable e.g. a SOAP message to say "this bit of binary encodes an
    image/png image";
 b) Enable e.g. an XML Schema involved in a WSDL doc. to say "the
    binary allowed here should be marked (per (a) above) as encoding
    one or more of the following media types. . ."

But I'm not entirely happy with the way you've separated these two
goals and introduced a new mechanism for cross-validation between
them.

There are already two relevant mechanisms available in XML Schema
which it seems to me would serve your needs much more
straightforwardly.

The first mechanism is the type-derivation hierarchy; one way to use it
for your purposes goes as follows:

 It makes sense to think about e.g. image/png encoded as base64 as a
 _subtype_ of xs:base64Binary.  It even makes sense to think about
 image/png encoded as base64 as a subtype of image/* encoded as
 base64.

 So why not define two parallel families of types in a basic XML
 Schema for the xmlmime namespace, one rooted at xs:base64Binary and
 the other rooted at xs:hexBinary, as follows:

                     xs:base64Binary
                        |
                   xmlmime:base64Binary
                       /|\
                      / | \
                     /  |  \
                    /   |   \
                   /    |    \
                  /     |     \
                 /      |      \
    xmlmime:image xmlmime:text xmlmime:application    . . . audio, video
          .            /|\           .
          .           / | \          .
          .          /  |  \         .
                    /   |   \
                   /    |    \
                  /     |     \
                 /      |      \
                /       |       \
               /        |        \
              /         |         \
             /          |          \
            /           |           \
xmlmime:text_plain xmlmime:text_xml  xmlmime:text_html
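
A minimal sketch of how a few of these types might be declared,
showing only part of the base64 branch.  The namespace URI
http://example.org/xmlmime is a placeholder of mine, not the real
xmlmime namespace, and the empty restrictions simply give each media
type its own named type without constraining the lexical form:

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               xmlns:xmlmime="http://example.org/xmlmime"
               targetNamespace="http://example.org/xmlmime">

      <!-- Root of the base64 family -->
      <xs:simpleType name="base64Binary">
        <xs:restriction base="xs:base64Binary"/>
      </xs:simpleType>

      <!-- image/* encoded as base64 -->
      <xs:simpleType name="image">
        <xs:restriction base="xmlmime:base64Binary"/>
      </xs:simpleType>

      <!-- image/png and image/jpeg encoded as base64 -->
      <xs:simpleType name="image_png">
        <xs:restriction base="xmlmime:image"/>
      </xs:simpleType>
      <xs:simpleType name="image_jpeg">
        <xs:restriction base="xmlmime:image"/>
      </xs:simpleType>

    </xs:schema>

The hexBinary family would presumably mirror this under different
type names, which I have not tried to choose here.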

Then you can get rid of xmlmime:expectedMediaType altogether, so that
e.g.

    <xs:complexType name="JPEGPictureType" 
            type="xs:base64Binary"
            xmlmime:expectedMediaType="image/jpeg"/> 

    <xs:element name="JPEGPicture" type="tns:JPEGPictureType"/>

becomes simply

    <xs:element name="JPEGPicture" type="xmlmime:image_jpeg"/>

and you don't need xmlmime:contentType if you're confident of schema
processing, but _if_ you want to be explicit, you can use xsi:type
instead, e.g.

<Picture xsi:type="xmlmime:image_png">/aWKKapGGyQ=</Picture>

I realise this doesn't cover the full generality of your proposal,
insofar as you appear to be allowing _any_ kind and number of
'parameters' after the type/subtype.  I am not at all sure that's the
right thing to do, and at the very least I think you need some
argumentation to establish the need for that much generality.

Note furthermore that the general rules for type substitution and
unions can be used to establish arbitrary sets of media types, so that
the functionality you now achieve by allowing a list of media types in
xmlmime:expectedMediaType is not lost.
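
As a rough sketch of the union case, a WSDL schema that wanted "PNG
or JPEG here" might write something like the following.  The names
tns:PngOrJpeg and Picture are illustrative only, and the fragment
assumes it sits inside an xs:schema with the xmlmime types above
imported:

    <xs:simpleType name="PngOrJpeg">
      <xs:union memberTypes="xmlmime:image_png xmlmime:image_jpeg"/>
    </xs:simpleType>

    <xs:element name="Picture" type="tns:PngOrJpeg"/>

Since the member types share a lexical space, an instance would
presumably still need xsi:type to say which of the two it actually
carries; without it a validator just picks the first member type
that fits.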

----------------

The other mechanism already present in schema reflects the fact that
this problem has been around for a long time.  A solution was already
present in SGML DTDs and carried forward into XML DTDs and XML Schema,
namely NOTATION attributes (see [1] and [2]).  There is an example in
[2] which directly addresses your concerns, and would allow you to
eliminate xmlmime:expectedMediaType and connect xmlmime:contentType
more clearly to the IETF media type definitions via the URL hierarchy
rooted at http://www.iana.org/assignments/media-types/, e.g. by
changing the example from [2] to read

<xs:notation name="jpeg"
  public="image/jpeg"
  system="http://www.iana.org/assignments/media-types/image/jpeg" />

Again, on this approach the XML Schema for the xmlmime namespace would
contain declarations for many useful Notations and simple types
derived from NOTATION, for use in WSDL schemas.
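
A sketch of what that might look like, modelled on the example in
[2].  The type name imageNotation, the element Picture and the local
attribute contentType are illustrative assumptions, and the fragment
assumes the xmlmime prefix is bound to the schema's target namespace:

    <xs:notation name="png" public="image/png"
      system="http://www.iana.org/assignments/media-types/image/png"/>
    <xs:notation name="jpeg" public="image/jpeg"
      system="http://www.iana.org/assignments/media-types/image/jpeg"/>

    <!-- NOTATION can only be used via an enumerated restriction -->
    <xs:simpleType name="imageNotation">
      <xs:restriction base="xs:NOTATION">
        <xs:enumeration value="xmlmime:png"/>
        <xs:enumeration value="xmlmime:jpeg"/>
      </xs:restriction>
    </xs:simpleType>

    <!-- In a WSDL schema: binary content tagged with a notation -->
    <xs:element name="Picture">
      <xs:complexType>
        <xs:simpleContent>
          <xs:extension base="xs:base64Binary">
            <xs:attribute name="contentType"
                          type="xmlmime:imageNotation"
                          use="required"/>
          </xs:extension>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>

An instance would then carry a QName rather than a bare media type
string, e.g. contentType="xmlmime:png".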

2) Low-level points

If the discussion above has persuaded you to change the current
design, well and good.  If it has not, there are some minor ways in
which the current design could be improved.

2a) Using xs:string is almost certainly not what you want -- that
makes whitespace variation significant, so that e.g. 
  xmlmime:contentType="image/png "
is not the same as
  xmlmime:contentType="image/png"

I would recommend xs:token instead.
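
For instance, assuming the attribute is declared globally in the
xmlmime schema (a guess on my part about how you declare it), the
change would amount to no more than

    <xs:attribute name="contentType" type="xs:token"/>

With xs:token, whitespace is collapsed before validation, so the two
values above would normalise to the same thing.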

2b) Please provide a concrete reference for "IANA media type token".

2c) In example 1, you probably mean

  <xs:restriction base="xmlmime:base64Binary">

2d) In example 4, you probably mean

  <xs:complexType name="JPEGPictureType">
   <xs:complexContent>
    <xs:restriction base="xmlmime:base64Binary"
                    xmlmime:expectedMediaType="image/jpeg"/>
   </xs:complexContent>
  </xs:complexType>

ht

[1] http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes.html#NOTATION
[2] http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/structures.html#declare-notation
-- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]

Received on Wednesday, 24 November 2004 18:48:50 UTC