Re: Simplifying XML Schema (mixed content?)

Phil Wadler writes:
<snip>

> 2. Simple types vs. complex types
> ---------------------------------
>
> One lack of orthogonality in XML Schema Part 1: Structures is that
> simple types and complex types cannot always be used in the same
> way. We suggest that simple types be permitted wherever complex types
> are.
>
> This would result in a number of simplifications:
>
> * The 'content' attribute (which specifies 'mixed', 'element-only',
> or 'empty') may be eliminated.
>
> * Rather than `mixed', which allows pcdata to appear anywhere, one can
> specify exactly where pcdata is allowed.
>
> * One can specify that the presence of simple types is optional.
>
> * This corresponds more directly to SGML and XML DTDs, which indicate
> mixed content by explicitly mentioning PCDATA.
>
> For example, we can now specify a LETTER element that consists
> of a SALUTATION element, followed by some text, followed by a
> CLOSING element.
>
>
>
>
>
>
>
> This is more precise than using `mixed', and, because it lists
> the components in the order they appear, it is easier to read.
>
> Of course, types must be parseable and serializable.  Usually, values
> of primitive type can be space separated, the exception being strings
> (which may themselves contain spaces).  Therefore, it is not allowed
> to specify two successive occurrences of primitive type if one or both
> of them is a string.
>
<snip>
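
To make the quoted proposal concrete for myself, here is a minimal Python
sketch of the LETTER example and of the adjacency restriction on primitive
types. The names Element, Primitive and adjacent_string_conflicts are
hypothetical, introduced only for illustration. This is neither XML Schema
syntax nor necessarily the notation used in the proposal, and it assumes a
content model is a flat sequence, ignoring optional or repeated components.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A child element in a content model (hypothetical helper)."""
    name: str


@dataclass(frozen=True)
class Primitive:
    """A primitive (simple) type in a content model, e.g. "string" or "integer"."""
    name: str


def adjacent_string_conflicts(model):
    """Return index pairs (i, i + 1) of successive primitive types
    where one or both of them is a string.

    Such pairs cannot be parsed unambiguously, because primitive values
    are space separated and a string may itself contain spaces.
    """
    conflicts = []
    for i in range(len(model) - 1):
        left, right = model[i], model[i + 1]
        both_primitive = isinstance(left, Primitive) and isinstance(right, Primitive)
        if both_primitive and "string" in (left.name, right.name):
            conflicts.append((i, i + 1))
    return conflicts


# LETTER: a SALUTATION element, followed by some text, followed by a CLOSING element.
LETTER = [Element("SALUTATION"), Primitive("string"), Element("CLOSING")]

# Assumed counterexamples, not taken from the proposal.
AMBIGUOUS = [Primitive("integer"), Primitive("string")]
SEPARABLE = [Primitive("integer"), Primitive("integer")]

if __name__ == "__main__":
    for label, model in [("LETTER", LETTER), ("AMBIGUOUS", AMBIGUOUS), ("SEPARABLE", SEPARABLE)]:
        print(label, adjacent_string_conflicts(model))
```

Under this reading, LETTER is allowed because its text is delimited by
elements on both sides, while an integer directly followed by a string is
rejected.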

This appears to be related to issue 7 in the Schema Working Group Issue List
http://www.w3.org/XML/Group/xmlschema-current/issues.html#richerMixed
That issue references a mail by Paul Prescod, which appears to argue along the
same lines as this plea for simplification
http://lists.w3.org/Archives/Public/www-xml-schema-comments/1999AprJun/0015.html

The resolution in
http://www.w3.org/XML/Group/xmlschema-current/issues.html#richerMixed
talks about the "pernicious mixed content problem in SGML", whereas
Paul Prescod says: "There sprung up a superstition that these mixed content
models were evil when the truth is that the particular bug in SGML was the
real problem."

On the other hand, the referenced XDR spec
(http://www.ltg.ed.ac.uk/~ht/XMLData-Reduced.htm, search for "mixed content")
says: "There is no way to constrain an element to have either element content
or text content, the source of the SGML mixed content problem."
This suggests that there is indeed a fundamental problem, not just a bug.

I am confused. What IS the problem with mixed content? Is it so fundamental
that it justifies the rather complicated yet restrictive specification options
that XML Schema has now chosen?

Peter Fankhauser

Received on Thursday, 25 May 2000 08:45:27 UTC