
Re: Re: How to allow html tags and restrict number of characters

From: Cheney, Edward A SSG RES USAR <austin.cheney@us.army.mil>
Date: Fri, 08 Oct 2010 06:19:42 -0500
To: Vani Gupta <vani_1974@yahoo.co.uk>
Cc: xmlschema-dev@w3.org
Message-ID: <f733a60517035.4caeb7fe@us.army.mil>

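XSD does have length restrictions (the length, minLength and maxLength facets) when the content of an element is plain text rather than further tags to evaluate. To cap the number of characters, maxLength is the facet you want, since length demands an exact count: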

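<xsd:element name="myInputExample">
        <xsd:simpleType><xsd:restriction base="xsd:string">
                <xsd:maxLength value="50000"/></xsd:restriction></xsd:simpleType>
</xsd:element>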

These length facets apply to simple types only, which means they do not apply to elements with child elements. In that case you will have to use an assertion (xsd:assert, which tests an XPath expression), a feature that exists only in XSD 1.1.
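In XSD 1.1, an assertion along the lines of <xsd:assert test="string-length(string(.)) le 50000"/> on a mixed complex type is one possible way to express this; that exact test is my own assumption, not something from the original question. Where no XSD 1.1 processor is available, the same check can be sketched in application code. The Python below uses only the standard library's xml.etree.ElementTree to count the characters of all text inside an element, ignoring the tags themselves; the function names and the default limit are illustrative.

```python
import xml.etree.ElementTree as ET

# Illustrative limit, matching the maxLength value in the schema example.
MAX_CHARS = 50000

def text_length(element: ET.Element) -> int:
    """Count the characters of all text inside element, including text
    nested in child elements. The markup itself is not counted."""
    return len("".join(element.itertext()))

def within_limit(xml_source: str, limit: int = MAX_CHARS) -> bool:
    """Return True if the root element's text content fits within limit."""
    root = ET.fromstring(xml_source)
    return text_length(root) <= limit

doc = "<myInputExample>Some <strong>bold</strong> text</myInputExample>"
print(text_length(ET.fromstring(doc)))  # 14
print(within_limit(doc, limit=10))      # False
```

The count covers "Some ", "bold" and " text", so child tags such as strong are allowed while the visible text stays limited.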

The problem is how to force input to be treated as a string literal when an instance document is evaluated. I am not sure this can be done with Schema. The same situation occurs in HTML with the textarea element, as exemplified by http://prettydiff.com/, which takes XML as input in one textarea and writes different XML into another textarea without affecting the containing document. The HTML textarea element can hold XML code because its content is limited functionally by the processing application, not by the rules of the HTML language. If your processing application (the one that actually evaluates the schema instance) imposes no such functional limitation, then you may have to escape characters in the input to prevent them from colliding with the evaluation of the containing document.
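Escaping with the standard XML entities can be sketched with Python's standard-library xml.sax.saxutils.escape, which replaces &, < and > with &amp;, &lt; and &gt;. The sketch also measures how much longer the escaped text becomes, which is what matters under a length limit; the helper name escaped_growth is illustrative.

```python
from xml.sax.saxutils import escape

def escaped_growth(text: str) -> int:
    """Return how many characters escaping with standard XML entities adds."""
    return len(escape(text)) - len(text)

sample = "<strong>HTML output here</strong>"
print(len(sample), len(escape(sample)))  # 33 45
```

Each of the four angle brackets grows from one character to four, so this short sample gains 12 characters.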

Escaping characters without altering the character count is unlikely to work unless you know the bounds of the input. In your case preserving the character count is essential, since you are operating under a character length limit. Because of that limit you cannot use the standard XML character entities, since they would significantly increase the length of the input (each < becomes the four-character &lt;, for example). I have encountered this problem before in my XML beautification algorithm, since it tolerates syntactically nested elements such as:

<c:example attribute="<strong>HTML output here</strong>"/>
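As written, that snippet is not well-formed XML: the XML specification does not allow a literal < inside an attribute value, which is exactly why a conforming parser cannot treat it as a single element. A quick check with Python's standard-library parser illustrates this; the c: prefix is dropped here only because an undeclared namespace prefix would be a separate error, and is_well_formed is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Namespace prefix dropped so the only problem left is the "<" in the value.
snippet = '<example attribute="<strong>HTML output here</strong>"/>'

def is_well_formed(source: str) -> bool:
    """Return True if source parses as a well-formed XML document."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True

print(is_well_formed(snippet))                               # False
print(is_well_formed('<example attribute="&lt;strong&gt;"/>'))  # True
```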

In that case I had to ensure that the example element was treated as a single literal, both to prevent interference with the containing document and to keep that element intact as a single element instead of breaking it at each less-than character. I accomplished this by converting each angle bracket into a square bracket if it is contained by quotes and those quotes sit inside a single tag. My evaluating application then builds an array recording the character index of each single-character transformation and what the conversion was. I set that array aside until I am no longer concerned with tag integrity within the string literal, at which point I undo the conversion only at the exact character indexes recorded in the array. This way there is no collision if arbitrary square brackets are supplied in the string literal before my transformation.
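The index-recording approach described above can be sketched roughly as follows. This is a minimal Python illustration under my own assumptions, not the actual code of my beautifier: it treats a tag as running from < to the next > outside quotes, handles both single and double quotes, and the function names are hypothetical. Because each bracket is swapped for exactly one character, the recorded indexes stay valid and the length never changes.

```python
def protect_quoted_brackets(markup: str) -> tuple[str, list[tuple[int, str]]]:
    """Replace '<' and '>' that sit inside quotes within a tag with '[' and
    ']', recording each changed index and its original character."""
    chars = list(markup)
    changes: list[tuple[int, str]] = []
    in_tag = False
    quote = ""  # quote character currently open inside a tag, if any
    for i, ch in enumerate(chars):
        if not in_tag:
            if ch == "<":
                in_tag = True
        elif quote:
            if ch == quote:
                quote = ""
            elif ch in "<>":
                changes.append((i, ch))
                chars[i] = "[" if ch == "<" else "]"
        elif ch in "\"'":
            quote = ch
        elif ch == ">":
            in_tag = False
    return "".join(chars), changes

def restore_brackets(text: str, changes: list[tuple[int, str]]) -> str:
    """Undo the conversion only at the recorded indexes, so square brackets
    that were already in the input are left alone."""
    chars = list(text)
    for i, original in changes:
        chars[i] = original
    return "".join(chars)

src = '<c:example attribute="<strong>[x]</strong>"/>'
protected, changes = protect_quoted_brackets(src)
print(protected)  # <c:example attribute="[strong][x][/strong]"/>
print(restore_brackets(protected, changes) == src)  # True
```

The pre-existing [x] survives the round trip untouched, because only the four recorded indexes are reverted.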

Additionally, you could not use the standard XML character entities for this, because if the input already contains character entities, a processor could not tell the supplied ones from the ones it introduced, and so could not accurately convert the input back into its original form after processing or storage. If length were irrelevant, I would recommend inventing your own character entity syntax, used only temporarily inside your processing logic, to prevent unintended collision with supplied standard entities for the characters you are temporarily transforming.
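One possible shape for such a private syntax is sketched below in Python. The "~" escape character and the ~l, ~g and ~~ codes are purely hypothetical choices for illustration; the point is that the escape character is itself escaped, so decoding is exact, and any &lt; already in the input passes through untouched. The trade-off is that every escaped character grows by one, which is why this only suits the case where length does not matter.

```python
# Hypothetical temporary escape syntax, internal to the processor only.
# "~" is the escape character, so "~" itself must be escaped as "~~".
ENCODE = {"~": "~~", "<": "~l", ">": "~g"}
DECODE = {"~": "~", "l": "<", "g": ">"}

def encode(text: str) -> str:
    """Temporarily hide angle brackets; standard &...; entities are untouched."""
    return "".join(ENCODE.get(ch, ch) for ch in text)

def decode(text: str) -> str:
    """Reverse encode() exactly by reading escape pairs left to right.
    Only meant for strings produced by encode()."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "~":
            out.append(DECODE[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

src = "a < b &lt; c ~l"
print(encode(src))               # a ~l b &lt; c ~~l
print(decode(encode(src)) == src)  # True
```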


Austin Cheney, CISSP
Received on Friday, 8 October 2010 11:20:17 UTC
