Universality of strict mode

Dear colleagues,

In the current version of the spec, strict mode supports the encoding 
and decoding of almost all schema-valid documents. The exceptions 
identified so far are the following (there may be others):
* Documents containing QNames as values: strict mode disallows the use 
of preserve.prefixes, and the current representation of QNames is 
meaningless without prefixes.
* Documents containing elements that carry both xsi:type and xsi:nil, 
when the referenced type does not contain an attribute wildcard.

The first case unfortunately corresponds to a large number of use cases 
that we have to support on small embedded devices:
* Exchange of static metadata information: for instance XML Schema or 
WSDL documents.
* Exchange of more dynamic metadata information through messages: for 
instance WS-Discovery and WS-MetadataExchange messages used in the OASIS 
DPWS (Devices Profile for Web Services) specification.
* SOAP 1.2 fault codes and subcodes.

It seems that this could be fixed in the spec without breaking backward 
compatibility (as the feature is currently not supported), in one of 
two ways:
1. Support preserve.prefixes in strict mode: This would require adding 
a production containing the NS event to the undeclared productions for 
strict=true. It has three drawbacks: it increases the event code size 
of the first production in all element grammars, it adds unnecessary 
prefix declarations to the stream (besides the necessary ones), and it 
requires support for dynamic addition of URIs to the URI table.
2. Use an alternative representation of QName values in strict mode: the 
proposal is to encode both the URI and the local name as String values 
(rather than encoding the first part as a URI). This avoids having to 
update the URI table, and allows caching to be controlled through the 
valuePartitionCapacity and valueMaxLength options that govern the 
String value table. I understand that this solution makes life a 
little more complex for implementers (as appropriate prefix 
definitions may need to be inserted upstream in a streaming API), but 
it does not create additional complexity when using a typed API or 
encoding/decoding directly from a data structure. This second 
alternative is our preferred one.
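
To make the second alternative more concrete, here is a minimal Python 
sketch of a QName value encoded as two String values through a bounded 
value table. It is only an illustration under stated assumptions: the 
names BoundedStringTable and encode_qname_value are hypothetical, the 
"hit"/"miss" tuples stand in for the real bit-level encoding, and a full 
table simply stops caching here, which is a simplification of what the 
spec prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class BoundedStringTable:
    """Toy value table (hypothetical): caches at most `capacity` strings,
    and only strings of at most `max_length` characters."""
    capacity: int
    max_length: int
    entries: list = field(default_factory=list)

    def encode(self, value):
        # A string already in the table is emitted as a compact index.
        if value in self.entries:
            return ("hit", self.entries.index(value))
        # Otherwise it is emitted literally, and cached only if the
        # length and capacity limits allow it (simplified policy).
        if len(value) <= self.max_length and len(self.entries) < self.capacity:
            self.entries.append(value)
        return ("miss", value)


def encode_qname_value(table, uri, local_name):
    """Encode a QName value as two String values: URI, then local name.
    No prefix is involved and no URI table is updated."""
    return [table.encode(uri), table.encode(local_name)]


SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
table = BoundedStringTable(capacity=4, max_length=64)
first = encode_qname_value(table, SOAP_NS, "Sender")
second = encode_qname_value(table, SOAP_NS, "Receiver")
```

In this sketch the second fault code reuses the cached namespace URI, so 
only its local name has to be sent literally.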

The second case is obviously less critical, as it is unusual. However, 
it seems that it could be fixed by simply requiring the xsi:nil 
attribute to occur before the xsi:type one. The processor would then 
know, when encountering the xsi:type attribute, whether to select the 
corresponding Type or TypeEmpty grammar. This would also have the side 
effect of simplifying the spec, by removing the case where AT(*) can 
be matched by xsi:nil. It would however create a (small) backward 
incompatibility with the current version of the spec.
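
The ordering rule proposed above can be sketched as follows in Python. 
This is a hypothetical illustration, not spec text: select_grammar, the 
attribute list format, and the type names are assumptions made for the 
example.

```python
def select_grammar(attributes, declared_type):
    """Return (type name, grammar kind) for an element.

    `attributes` is a list of (name, value) pairs in stream order.
    Under the proposed rule, xsi:nil must come before xsi:type.
    """
    nil = False
    type_name = declared_type
    type_seen = False
    for name, value in attributes:
        if name == "xsi:nil":
            if type_seen:
                # The proposed ordering rule forbids this sequence.
                raise ValueError("xsi:nil must precede xsi:type")
            nil = value in ("true", "1")
        elif name == "xsi:type":
            # The nil state is already known at this point, so the choice
            # between Type and TypeEmpty needs no look-ahead.
            type_seen = True
            type_name = value
    return (type_name, "TypeEmpty" if nil else "Type")


choice = select_grammar(
    [("xsi:nil", "true"), ("xsi:type", "tns:Address")], "tns:Base"
)
```

With xsi:nil seen first, the sketch selects the TypeEmpty grammar of the 
type named by xsi:type as soon as that attribute is read.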

Best regards

Antoine Mensch

Received on Friday, 2 July 2010 10:05:26 UTC