Alternative XML serialization for TAI from Rick Jelliffe on 2002-06-24 (www-tag@w3.org from June 2002)

From: Rick Jelliffe <ricko@topologi.com>
Date: Mon, 24 Jun 2002 15:21:06 +1000
To: <www-tag@w3.org>
Message-ID: <04d901c21b3e$f13638a0$4bc8a8c0@AlletteSystems.com>
PSVI
-------
The concerns with the PSVI have been raised endlessly before. At the risk
of boring everyone, some are:

* Raises the complexity of understanding processing to a point very distant from 
XML's original mascot, the Desperate Perl Hacker, and so cannot expect to be treated 
sympathetically by people who adopted XML because of its simplicity;

* Does not have a standard relationship-preserving serialization as XML; and

* Is disruptive. We have a generation of tools that are now only starting to work 
properly with the XML infoset. The people who are happy with those tools want 
to be able to use them more and in more circumstances, rather than requiring a
new generation of technology. Saying "we need XML Schemas because
XQuery needs types, so we want to supersede XPath1 with XPath2 and
XSLT 2" is hardly likely to inspire gratitude in someone who is happily
plugging away with XSLT 1.[1]

The first is the nature of the beast, and its answer is tolerance on both sides,
support for plurality in niche uses, and careful excision of unwarranted dependencies. 

TAI Serialization
--------------------

Here is a suggestion for the second: a serialization in XML of the TAI
(Type Augmented Infoset, née PSVI). It is a simple wrapper along the lines of

<!ELEMENT tai ( document, schema, outcomes )>
<!ELEMENT outcomes ( o* ) >
<!ATTLIST outcomes
    defaultValid      %boolean;  #IMPLIED
    defaultValidated  %boolean;  #IMPLIED
    ....
>
<!ELEMENT o ( ??? )>
<!ATTLIST o
    t   %XPATH-OF-TYPE-RELATIVE-TO-SCHEMA;                 #REQUIRED
    ii  %XPATH-OF-INFORMATION-ITEM-RELATIVE-TO-DOCUMENT;   #REQUIRED
    ...
>

Basically, you have the instance to be validated in one element,
the effective schema in another element, and a list of outcomes
linking the two. The list of outcomes is kept short by not reporting
any value that matches the defaults declared on the outcomes element.
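As an illustration of one reading of that reduction, here is a minimal Python sketch (standard library only) that writes an `outcomes` element and puts a property on an `o` element only when it differs from the default declared on `outcomes`. The DTD above elides the per-outcome properties, so the `valid` and `validated` attribute names, the `Outcome` class and `build_outcomes` are all hypothetical, chosen only to mirror `defaultValid` and `defaultValidated`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Outcome:
    ii: str          # XPath of the information item, relative to <document>
    t: str           # XPath of the type, relative to <schema>
    valid: bool      # hypothetical per-outcome property
    validated: bool  # hypothetical per-outcome property


def xsd_bool(value: bool) -> str:
    """Canonical xs:boolean lexical form."""
    return "true" if value else "false"


def build_outcomes(outcomes: Iterable[Outcome],
                   default_valid: bool = True,
                   default_validated: bool = True) -> ET.Element:
    """Build <outcomes>, omitting any property equal to its declared default."""
    root = ET.Element("outcomes", {
        "defaultValid": xsd_bool(default_valid),
        "defaultValidated": xsd_bool(default_validated),
    })
    for o in outcomes:
        # The links into the instance and the schema are always written.
        attrs = {"ii": o.ii, "t": o.t}
        # Properties are written only where they differ from the defaults.
        if o.valid != default_valid:
            attrs["valid"] = xsd_bool(o.valid)
        if o.validated != default_validated:
            attrs["validated"] = xsd_bool(o.validated)
        ET.SubElement(root, "o", attrs)
    return root


if __name__ == "__main__":
    el = build_outcomes([
        Outcome("/*[1]", "//*[@name='xxx']", valid=True, validated=True),
    ])
    print(ET.tostring(el, encoding="unicode"))
```

With both defaults set to true, an outcome that is valid and validated carries nothing but its `ii` and `t` links.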

The main property of this is that all the relationships in the original
document are preserved, except for the trivial one of which element
is the root (and that can be recovered without traversing
the document).
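Because the instance sits as the only element child of `document`, recovering its root is a single child lookup. A minimal Python sketch, assuming the wrapper element names from the DTD above; it matches them by local name so the wrapper's namespace does not matter, and the helper names are hypothetical:

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag."""
    return tag.rsplit("}", 1)[-1]


def original_root(tai: ET.Element) -> ET.Element:
    """Return the instance's root element from a <tai> wrapper.

    Only the direct children of <tai> and of <document> are inspected;
    the instance itself is never traversed.
    """
    for child in tai:
        if local_name(child.tag) == "document":
            elements = list(child)  # element children only; text is ignored
            if len(elements) != 1:
                raise ValueError("<document> must hold exactly one element")
            return elements[0]
    raise ValueError("no <document> child in <tai>")
```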

E.g. 

<tai xmlns="http://www.topologi.com/psvi">

 <document>
  <xxx />
 </document>
 
 <schema xmlns="http://">
  <element name="xxx"><empty/></element>
 </schema>
 
 <outcomes defaultValid="true" defaultValidated="true">
      <o    ii="/*[1]"     t="//*[@name='xxx']"    />
 </outcomes>

</tai>
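To make the linking concrete, here is a sketch in Python (standard library only) that reads a serialization like the one above and follows each `o` element's `ii` and `t` attributes, treating `/` as the `document` or `schema` wrapper respectively. It is an illustration under assumptions, not a full implementation: ElementTree's positional predicates are not a complete XPath 1.0 implementation, so `ii` is restricted to hypothetical `/*[n]` child steps walked by hand, while `t` is handed to ElementTree's XPath subset, which covers `//*[@name='xxx']`. All function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

_STEP = re.compile(r"\*\[([1-9][0-9]*)\]")  # one '*[n]' child step


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]


def child_named(tai: ET.Element, name: str) -> ET.Element:
    """Find a direct child of <tai> by local name, ignoring namespaces."""
    for child in tai:
        if local_name(child.tag) == name:
            return child
    raise ValueError(f"<tai> has no <{name}> child")


def resolve_ii(document: ET.Element, path: str) -> ET.Element:
    """Follow '/*[n]' steps, with the leading '/' meaning <document>."""
    if not path.startswith("/") or path.startswith("//"):
        raise ValueError(f"unsupported ii path {path!r}")
    node = document
    for step in path[1:].split("/"):
        m = _STEP.fullmatch(step)
        if m is None:
            raise ValueError(f"unsupported step {step!r} in {path!r}")
        children = list(node)
        n = int(m.group(1))  # XPath positions are 1-based
        if n > len(children):
            raise ValueError(f"no child {n} at step {step!r}")
        node = children[n - 1]
    return node


def resolve_t(schema: ET.Element, path: str) -> ET.Element:
    """Resolve a type path with ElementTree's XPath subset, '/' meaning <schema>."""
    found = schema.find("." + path if path.startswith("/") else path)
    if found is None:
        raise ValueError(f"type path {path!r} matched nothing")
    return found


def link_outcomes(tai: ET.Element):
    """Yield (instance element, schema element) pairs, one per <o>."""
    document = child_named(tai, "document")
    schema = child_named(tai, "schema")
    for o in child_named(tai, "outcomes"):
        if local_name(o.tag) == "o":
            yield (resolve_ii(document, o.get("ii", "")),
                   resolve_t(schema, o.get("t", "")))
```

Anything richer than `/*[n]` steps in `ii` would need a full XPath 1.0 engine.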

Of course this is interconvertible with Richard's serialization. But I
think Richard's particular serialization rather perpetuates us talking at cross-purposes:
the XSD people say "see, there is a serialization" and the loyal opposition says
"it is not the same document (i.e. its structure has changed), therefore it is no
serialization at all". As such, Richard's serialization (and this is no criticism of it
for its purpose as a dump format) serializes the very problem that is being
complained of!

My suggested serialization of the Type Augmented Infoset is itself an XML infoset.

As such, it partly addresses the third issue: people can continue
to use existing XML tools without requiring any upgrade.

Augmentation considered unfortunate
--------------------------------------------

Accompanying this, I think the TAG/Schema WG should consider (in the
Schema specs and in its discussion) paying more attention to
the terminology of "augmenting" the infoset. We can see that
a TAI can be modelled by several different XML infosets
(Richard's, mine, one using external links that even preserves
the root element). Adding new kinds of information items is
not so much "augmenting" the infoset as "extending" it, to my ear
anyway.

Of course, the PSVI is currently defined in terms of augmentation;
but for some of us, the problem is not the augmentation (adding new information
items) but the extension (adding non-XML information items)!

The XML infoset relates to a data representation that allows generic manipulation
close to the transfer syntax, as strings, links and symbols. The TAI relates to a
data representation closer to storage types, allowing generic manipulation close
to the machine and (if you are lucky and your types match those that the XML Schema WG
adopted) manipulation based on value spaces.
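To make the string versus value-space distinction concrete: under XML Schema's `xs:decimal`, the strings `1.0` and `1` are different lexical forms of the same value, so a string comparison and a value-space comparison disagree. A minimal Python sketch, assuming `decimal.Decimal` as a stand-in for the `xs:decimal` value space (the helper names are hypothetical). The input is first checked against the XSD 1.1 lexical pattern for `xs:decimal`, so forms such as `1e3`, which `Decimal` accepts but `xs:decimal` does not, are rejected.

```python
import re
from decimal import Decimal

# XSD 1.1 lexical pattern for xs:decimal (no exponent, no NaN/INF)
_XSD_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def decimal_value(lexical: str) -> Decimal:
    """Map an xs:decimal lexical form to a value (Decimal as stand-in)."""
    text = lexical.strip()  # xs:decimal has whiteSpace="collapse"
    if not _XSD_DECIMAL.fullmatch(text):
        raise ValueError(f"not an xs:decimal lexical form: {lexical!r}")
    return Decimal(text)


def same_string(a: str, b: str) -> bool:
    """Infoset-level view: attribute values compared as strings."""
    return a == b


def same_value(a: str, b: str) -> bool:
    """Type-aware view: compared in the (approximated) value space."""
    return decimal_value(a) == decimal_value(b)


if __name__ == "__main__":
    print(same_string("1.0", "1"), same_value("1.0", "1"))
```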

I hope this is helpful.

Cheers
Rick Jelliffe

[1] HXP, the Happy XSLT Plugger
Received on Monday, 24 June 2002 01:08:35 UTC