Suggestion: Microparsing support in XML Schema (long) from Anders W. Tell on 2000-05-10 (www-xml-schema-comments@w3.org from April to June 2000)

From: Anders W. Tell <anderst@toolsmiths.se>
Date: Wed, 10 May 2000 09:47:27 +0200
To: WWW XML Schema Comment <www-xml-schema-comments@w3.org>
CC: xml-dev@xml.org
Message-ID: <3919140E.78DBA33F@toolsmiths.se>
Problem:
----------
A common phenomenon which now and then surfaces in the markup
world is the occurrence of what some authors call "Micro-parsing".
This is the situation where Schema writers specify that an XML attribute
should contain structured information, which creates a need
for customized parsers, hence the above term.

Two examples are
XPath expression in XSL:  match="/cars/car[@name='volvo']"
Path in SVG:  <path d="M 100 100 L 140 100 L 120 140 z"/>
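
To make the problem concrete, here is a minimal Python sketch of the
hand-written "micro-parser" an application needs for the SVG path above.
The function name micro_parse_path is hypothetical, and only a simplified,
whitespace-separated subset of the path syntax is handled:

```python
from typing import List, Tuple

def micro_parse_path(d: str) -> List[Tuple[str, List[float]]]:
    """Split a path string into (command, arguments) pairs.

    Assumption: tokens are separated by whitespace, as in the example
    above. The real SVG path syntax is richer than this.
    """
    segments: List[Tuple[str, List[float]]] = []
    for token in d.split():
        if token.isalpha():
            # A letter starts a new command such as M, L or z.
            segments.append((token, []))
        else:
            if not segments:
                raise ValueError("number found before any command")
            # A number belongs to the most recent command.
            segments[-1][1].append(float(token))
    return segments

segments = micro_parse_path("M 100 100 L 140 100 L 120 140 z")
```

An XML parser hands the application the whole string as one opaque
attribute value; everything inside the function is extra, format-specific
parsing that no schema describes.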

Is this not a paradox? A markup language which cannot be used for markup
anymore?
Of course all markup languages have a limit, and maybe XML's limit has
been reached.

Why:
What are the reasons for encoding complex information in a single
attribute?
The reasons I have seen so far are:
* compression, produces smaller XML streams (SVG paths, ...)
* usage of attribute strings for readability (XPath expressions, ...)
* usage of attribute strings for compactness (XPath expressions, ...)
* ...

The following suggestion is an attempt to "internalize" these encoding
scenarios, to capture as much as possible of the encoding information
inside XML Schemas instead of relying on externally created and managed
documentation.

Another side effect of the proposal is that it is now possible to have
DOM access to structured attributes as if they were encoded as XML
elements.

For Grove enthusiasts it is also possible to view (with a little effort
;)) attributes as hierarchical nodes.

So here goes...

Solution:
- - - - - - - - - - - - - - - - - - -
First a few initial short definitions:

* Encoding "Stereotype" <=> something that should be encoded. It
  is defined by an information model, which may be defined in terms of
  one or more information items (nodes/properties, ...).

* Encoding "Form" <=> principles for how nodes/properties in a
  Stereotype's information model must be encoded as strings or XML
  elements.
  (The following suggestion implies two forms, one for attribute
   encoding and one for XML element encoding.)

* "Attribute-Micro-Parser" <=> A software artifact which encodes and
decodes XML attribute strings to/from XML elements.

- - - - - - - - - - - - - - - - - - -
* Add a new XML Schema data type which represents "MicroParsed"
  attribute values.
  Make it a subtype of "string" with all its facets.
  Schema writers can now derive their own MicroParsed data types, one
  for each stereotype they want to encode as an attribute.

* In this new data type add a reference to a complexType. This
  referenced schema defines how to encode the contents (information
  model) of the attribute string ("attribute form") as an XML element
  tree ("element form").
  Note: Maybe this reference should be a new facet for the string data
  type.
  Note: With this design it is possible to encode the same stereotype
  as either an XML attribute string or an XML element tree in documents.

* In this new data type add a reference to the attribute's "form
  specification", i.e. where to find more information on how to
  construct attribute strings from the underlying information model.

* All available information in the stereotype's information model MUST
  be encoded in the "element form", and the information encoded in the
  "attribute form" MUST be a "subset" of the information encoded in the
  element form (similar to applying a grove plan before encoding as an
  attribute).
  The "element form" is considered a "complete" encoding form
  (it contains all information in the information model).

* Information set:
  Add an extra optional property to the attribute information item.
  property: "parsed"  sequence<element-info-item[zero or one]>

* Recommend that all Schema authors first create an information model
  for the stereotype, then create the encoding "form" for the primary
  XML element encoding, and last the corresponding XML attribute string
  form.
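
The recommended order (information model first, then element form, then
attribute form) could look roughly like the following Python sketch. The
Segment class, the element names "path", "segment" and "arg", and the
"note" field are all hypothetical; "note" only illustrates a piece of
information that the complete element form carries and the attribute
form leaves out:

```python
from dataclasses import dataclass
from typing import List
import xml.etree.ElementTree as ET

@dataclass
class Segment:
    """Information model for one path segment (hypothetical)."""
    command: str
    args: List[float]
    note: str = ""  # carried by the element form only

def to_element_form(segments: List[Segment]) -> ET.Element:
    """Complete encoding: every field of the model is written out."""
    root = ET.Element("path")
    for seg in segments:
        node = ET.SubElement(root, "segment", command=seg.command)
        if seg.note:
            node.set("note", seg.note)
        for value in seg.args:
            ET.SubElement(node, "arg").text = format(value, "g")
    return root

def to_attribute_form(segments: List[Segment]) -> str:
    """Subset encoding: commands and arguments only, 'note' is dropped."""
    tokens: List[str] = []
    for seg in segments:
        tokens.append(seg.command)
        tokens.extend(format(value, "g") for value in seg.args)
    return " ".join(tokens)

model = [
    Segment("M", [100, 100], note="start"),
    Segment("L", [140, 100]),
    Segment("L", [120, 140]),
    Segment("z", []),
]
element_form = ET.tostring(to_element_form(model), encoding="unicode")
attribute_form = to_attribute_form(model)
```

Both encoders read from the same model, so the attribute form is a
subset of the element form by construction rather than by convention.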

* DOM framework
Create a new software artifact called  "DOMAttributeMicroParser"

interface DOMAttributeMicroParser {
    readonly  attribute string  name;
    readonly  attribute string  namespace;

    /* parse attribute string and create the corresponding element tree */
    long            parse(in DOMAttribute from, out DOMElement to);

    /* Traverse the element tree and create the corresponding attribute
       string expression */
    long            construct(in DOMElement from, out DOMAttribute to);
};
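
As a rough illustration of how such an interface might be implemented,
here is a Python sketch built on the standard xml.dom.minidom module. The
class name PathMicroParser, the namespace URI and the element names
"path", "segment" and "arg" are assumptions made for this example. Unlike
the IDL above, it returns its results directly instead of using out
parameters and a status code, and construct() returns the attribute
string rather than a DOMAttribute:

```python
from xml.dom.minidom import Attr, Document, Element, parseString

class PathMicroParser:
    """Hypothetical micro-parser for whitespace-separated path strings."""

    name = "path-data"                      # assumed identifier
    namespace = "http://example.org/micro"  # placeholder namespace

    def __init__(self) -> None:
        # Scratch document used as a factory for the element form.
        self._doc = Document()

    def parse(self, attr: Attr) -> Element:
        """Attribute form -> element form."""
        root = self._doc.createElement("path")
        current = None
        for token in attr.value.split():
            if token.isalpha():
                current = self._doc.createElement("segment")
                current.setAttribute("command", token)
                root.appendChild(current)
            else:
                if current is None:
                    raise ValueError("number found before any command")
                arg = self._doc.createElement("arg")
                arg.appendChild(self._doc.createTextNode(token))
                current.appendChild(arg)
        return root

    def construct(self, element: Element) -> str:
        """Element form -> attribute string."""
        tokens = []
        for seg in element.getElementsByTagName("segment"):
            tokens.append(seg.getAttribute("command"))
            for arg in seg.getElementsByTagName("arg"):
                tokens.append(arg.firstChild.data)
        return " ".join(tokens)

doc = parseString('<path d="M 100 100 L 140 100 L 120 140 z"/>')
attr = doc.documentElement.getAttributeNode("d")
parser = PathMicroParser()
tree = parser.parse(attr)
round_trip = parser.construct(tree)
```

Running parse() and then construct() on the example attribute gives back
the original string, which is the round-trip property one would expect
from any pair of attribute and element forms.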


* DOM framework [Optional]
Create a subclass of DOM Attribute called "DOMParsedAttribute"

interface DOMParsedAttribute : DOMAttribute {
     attribute DOMElement  fParsed;  /* parsed attribute */
};


All comments are welcome.

Best Regards
Anders W. Tell
--
/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
/  Financial Toolsmiths AB  /
/  Anders W. Tell           /
/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
Received on Wednesday, 10 May 2000 03:46:29 UTC