Validating mixed content

While doing some research for an earlier project, it occurred to me that we 
currently cannot restrict the content contained within an element of mixed 
content type. We can restrict the attributes and elements contained within 
it, but not the text. Is there some way (perhaps through regular expressions
or otherwise) of restricting the kind of text such an element can contain?

For instance, something like:

<xs:complexType mixed="true" mixedType="xs:nonNegativeInteger">
   <xs:any processContents="lax" namespace="myNamespace" />
</xs:complexType>

That particular restriction wouldn't be very useful, as it allows for little
variation. A regular expression would be more powerful and could, for
instance, specify that the text must fall entirely within the ASCII
character range, e.g.:

<xs:complexType mixed="true" mixedPattern="\p{IsBasicLatin}*">
   <xs:any processContents="lax" namespace="myNamespace" />
</xs:complexType>

Or perhaps that the text cannot contain certain punctuation characters:

<xs:complexType mixed="true" mixedPattern="[^!.?,:;]*">
   <xs:any processContents="lax" namespace="myNamespace" />
</xs:complexType>

I know that in most scenarios the XML vocabulary could simply be made more
structured to allow for this, but in a markup-style language it would be
useful to be able to place some restrictions on the text that can appear
inside mixed content. In a mathematical language, for instance, you may
want to mark up specific parts of an equation while requiring the equation
itself to follow a fixed format. E.g.:

<xs:complexType mixed="true"
   mixedPattern="([0-9a-zA-Z]+(&#x5E;?[0-9a-zA-Z]+)? [-+/*%] )*[0-9a-zA-Z]+ = ...">
   <xs:any processContents="lax" namespace="mathNamespace" />
</xs:complexType>

My regular expression was getting more and more complicated, so I stopped,
but I hope the point is clear: I want an element to contain only a string
of a fixed format, while still allowing that string to carry additional
markup, perhaps tooltip or emphasis-style elements.
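
To make the intended behaviour concrete, here is a rough Python sketch of
the check I have in mind, done outside the schema after parsing. It assumes
the pattern applies to the element's whole text with the markup stripped out
(including text inside child elements), which is only one possible reading.
The names mixed_text_matches and ASCII_ONLY are made up for illustration,
and Python's regular expression syntax is not identical to XML Schema's.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the proposed mixedPattern attribute: ASCII text only.
ASCII_ONLY = re.compile(r"[\x00-\x7F]*")


def mixed_text_matches(element: ET.Element, pattern: re.Pattern) -> bool:
    """Return True if the element's text, with all tags stripped, fully matches pattern."""
    # itertext() yields every text node in document order, including text
    # inside child elements and the text that follows them.
    text = "".join(element.itertext())
    return pattern.fullmatch(text) is not None


ok = ET.fromstring("<p>x <em>squared</em> plus 1</p>")
bad = ET.fromstring("<p>caf\u00e9 <em>au lait</em></p>")
print(mixed_text_matches(ok, ASCII_ONLY))
print(mixed_text_matches(bad, ASCII_ONLY))
```

The child elements can still be validated by the schema in the usual way;
only the text check would live in application code.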

What do you think? I suspect it would be hellish to implement... The only
way I could think of doing it in XML Schema 1.0 (and I'm not sure if that
would work) is to spell out the element tags inside the regular expression
and put the content in a CDATA section, so that the tags are treated as
plain text.
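
As a rough illustration of that workaround, the Python sketch below matches
an element's character data, tags and all, against a single pattern. The
element name equation, the allowed em tag, and the names OPERAND, EQUATION
and cdata_equation_is_valid are all assumptions for the example. The
pattern is written in Python's syntax rather than XML Schema's.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: operands separated by single-character operators,
# where an operand may be wrapped in a literal <em>...</em> tag pair.
OPERAND = r"(?:[0-9a-zA-Z]+|<em>[0-9a-zA-Z]+</em>)"
EQUATION = re.compile(rf"{OPERAND}(?: [-+/*%=] {OPERAND})*")


def cdata_equation_is_valid(xml_source: str) -> bool:
    """Parse the document and match the root's character data against EQUATION."""
    root = ET.fromstring(xml_source)
    # Inside a CDATA section the tags are plain characters, so they end up
    # in root.text instead of becoming child elements.
    return EQUATION.fullmatch(root.text or "") is not None


print(cdata_equation_is_valid("<equation><![CDATA[x = <em>y</em> + 2]]></equation>"))
print(cdata_equation_is_valid("<equation><![CDATA[x = <b>y</b> + 2]]></equation>"))
```

The obvious drawback is that the tags are no longer real markup, so the
application has to parse the string a second time to get at them.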

-- 
Andrew Polshaw
Editor and Author for Wrox Press Ltd

Received on Monday, 16 December 2002 09:53:35 UTC