RE: schemas too "lax"

The good news is that we seem to agree on a number of important points: that we need a real XML validator and cannot rely solely on XML Schema. Whether an instance document passes or fails XML Schema validation is indeterminate with respect to actual document conformance, and probably always will be.

 

And, if we can make commercial validators “behave better”, that solves the problem.

 

And, if we can make the schema better, we should try. I’ll try your suggestion and a couple of other construction ideas I have. Maybe, between some more tweaking of product configurations and some schema experiments, we can avoid having to debate the merits of the two schema approaches.

 

Regards,

 

                Mike

 

From: Glenn Adams [mailto:glenn@skynav.com] 
Sent: Tuesday, September 03, 2013 12:36 PM
To: Michael Dolan
Cc: TTWG
Subject: Re: schemas too "lax"

On Tue, Sep 3, 2013 at 11:48 AM, Michael Dolan <mdolan@newtbt.com> wrote:

This issue is about commercial validator behavior and which side effects are least bad when validating TTML documents with foreign namespace attributes. This is not about TTML document conformance; the schema is not authoritative. It’s just a tool. Sorry for any misunderstanding.

 

The issue is only about which side effects you prefer in your schema validation:

 

You present these as an either/or choice when in fact they are semantically unrelated. That use of a strict processing mode also happens to detect undefined attributes in one of the TT namespaces is a coincidental side effect.

 

1. Fail to properly validate the use of foreign namespace attributes unless a schema is available; or

 

Since one cannot validate foreign namespace attributes unless a schema is available, this doesn't make much sense.

2. Fail to properly reject forbidden TTML attributes (e.g. <tt>), or misspelled permitted TTML attributes.

Actually, they aren't forbidden; they are simply not defined.

 

I prefer #1 over #2, but perhaps it is a matter of personal or application preference.

 

I don't see this as a choice between #1 and #2 but two unrelated issues.

 

As long as there is no implication that the W3C schema is authoritative, it doesn’t really matter, I guess.

 

The W3C schema is authoritative for the purpose it was designed to meet, and this purpose is documented in 4.1 as:

 

"may be used to validate a superset/subset of conformant TTML Content Document Instances"

 

In my view, you are asking more from this schema than is intended.

 

My primary position is that a content conformance verification tool should be employed, one that may use the defined XSD or RNC schemas (or not) and that verifies conformance requirements beyond those that can be expressed in XSD.

 

My secondary position is that the TTML schemas we publish should remain as close as possible to our definition of a TTML Abstract Document Type while at the same time doing what makes sense for use of these schemas in practical tools.

 

Notwithstanding the above, we might be able to improve our schemas (at least the XSD) to do a better job of detecting undefined attributes in TT namespaces without sacrificing lax processing of non-TT namespaces, e.g., we might be able to use something like:

 

<xs:choice>
  <xs:anyAttribute namespace="http://www.w3.org/XML/1998/namespace" processContents="strict"/>
  <xs:anyAttribute namespace="http://www.w3.org/ns/ttml#metadata" processContents="strict"/>
  <xs:anyAttribute namespace="http://www.w3.org/ns/ttml#parameter" processContents="strict"/>
  <xs:anyAttribute namespace="http://www.w3.org/ns/ttml#styling" processContents="strict"/>
  <xs:anyAttribute namespace="##other" processContents="lax"/>
</xs:choice>

I haven't checked this yet, but it seems like it might serve your goals of #2.
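Since the fragment above is unchecked, one quick way to try it is to compile it with the XSD 1.0 processor bundled with the JDK (javax.xml.validation). This is a minimal sketch under stated assumptions: the wrapper schema for a bare <tt> element is hypothetical (not the published TTML schema), and the class and method names are illustrative. In the XSD 1.0 schema for schemas, xs:choice may contain only particles (xs:element, xs:group, xs:choice, xs:sequence, xs:any), and a complex type declares at most one xs:anyAttribute directly, so the fragment as written is expected to be rejected, while a single ##other/lax wildcard compiles.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class WildcardProposalCheck {
    // Wraps attribute-wildcard markup in a minimal, hypothetical schema for a bare
    // <tt> element in the TTML namespace (not the published TTML schema).
    static String schemaWith(String wildcards) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
            + " targetNamespace='http://www.w3.org/ns/ttml' elementFormDefault='qualified'>"
            + "<xs:element name='tt'><xs:complexType>" + wildcards
            + "</xs:complexType></xs:element></xs:schema>";
    }

    // The fragment as proposed: several xs:anyAttribute wildcards inside xs:choice.
    static final String PROPOSED = "<xs:choice>"
        + "<xs:anyAttribute namespace='http://www.w3.org/XML/1998/namespace' processContents='strict'/>"
        + "<xs:anyAttribute namespace='http://www.w3.org/ns/ttml#metadata' processContents='strict'/>"
        + "<xs:anyAttribute namespace='http://www.w3.org/ns/ttml#parameter' processContents='strict'/>"
        + "<xs:anyAttribute namespace='http://www.w3.org/ns/ttml#styling' processContents='strict'/>"
        + "<xs:anyAttribute namespace='##other' processContents='lax'/>"
        + "</xs:choice>";

    // A single lax wildcard, i.e. the ##other/lax combination discussed in the thread.
    static final String SINGLE_LAX = "<xs:anyAttribute namespace='##other' processContents='lax'/>";

    // Returns true if the JDK's built-in XSD 1.0 processor compiles the schema text.
    static boolean compiles(String schemaText) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            factory.newSchema(new StreamSource(new StringReader(schemaText)));
            return true;
        } catch (SAXException e) {
            // With no ErrorHandler set, schema errors surface as a thrown SAXException.
            System.out.println("schema rejected: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("proposed compiles:   " + compiles(schemaWith(PROPOSED)));
        System.out.println("single lax compiles: " + compiles(schemaWith(SINGLE_LAX)));
    }
}
```

XSD 1.0 wildcards can only list namespaces positively (##other excludes just the target namespace), so mixing strict handling of TT namespaces with lax handling of everything else needs a different construction. XSD 1.1 adds a notNamespace attribute on wildcards that may fit, but the JDK's built-in processor does not implement XSD 1.1.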

Regards,

 

                Mike

 

From: Andreas Tai [mailto:tai@irt.de] 
Sent: Monday, September 02, 2013 2:50 AM
To: Glenn Adams
Cc: Michael Dolan; TTWG
Subject: Re: schemas too "lax"

 

Hi Glenn, Mike,

I agree with Glenn that the value "lax" best matches the original intention. TTML does not make any provisions about the validation of foreign namespace elements, and an XML schema is not required for them. Therefore, document authors may include foreign namespace elements (e.g. some metadata) that are not specified in any schema or written specification.

It is also probable that implementations take a TTML subset or derivation (e.g. Subset A) as their reference. If such an implementation receives a TTML document that takes Subset B as its reference, where, for example, some extra metadata elements are defined, it will just ignore them and process the document without any errors.

It could be helpful to dig a bit deeper into why some commercial parsers do not conform to the XSD standard. If this cannot be fixed, an additional note may be helpful (e.g. that in some application contexts the processContents attribute of xs:any and xs:anyAttribute may be set to "strict").
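The difference between "strict" and "lax" can be seen directly with the JDK's built-in validator. A minimal sketch, assuming a hypothetical schema in which <tt> carries only an xs:anyAttribute namespace="##other" wildcard (not the published TTML schema): it validates the thread's junk:junk="-3" example with no schema supplied for http://junk.com/junk. Under lax processing the attribute is skipped because no declaration is available; under strict processing the missing declaration is itself an error.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class LaxVsStrict {
    // Minimal, hypothetical schema: <tt> accepts attributes from any namespace other
    // than the TTML target namespace, processed with the given processContents value.
    static String ttSchema(String processContents) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
            + " targetNamespace='http://www.w3.org/ns/ttml' elementFormDefault='qualified'>"
            + "<xs:element name='tt'><xs:complexType>"
            + "<xs:anyAttribute namespace='##other' processContents='" + processContents + "'/>"
            + "</xs:complexType></xs:element></xs:schema>";
    }

    // The thread's foreign attribute; no schema for http://junk.com/junk is supplied.
    static final String INSTANCE = "<tt xmlns='http://www.w3.org/ns/ttml'"
        + " xmlns:junk='http://junk.com/junk' junk:junk='-3'/>";

    // Returns true if the instance is valid; prints the first error otherwise.
    static boolean validates(String schemaText, String instanceText) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schemaText)));
            schema.newValidator().validate(new StreamSource(new StringReader(instanceText)));
            return true;
        } catch (SAXException e) {
            System.out.println("invalid: " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // lax: no declaration is available for junk:junk, so it is skipped.
        System.out.println("lax:    " + validates(ttSchema("lax"), INSTANCE));
        // strict: the missing declaration is reported as an error.
        System.out.println("strict: " + validates(ttSchema("strict"), INSTANCE));
    }
}
```

With strict, every foreign namespace in use would need a schema supplied to the validator.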

Best regards,

Andreas

 
On 01.09.2013 05:20, Glenn Adams wrote:

Sorry, hit the Send button prematurely. See more inline: 

 

On Sat, Aug 31, 2013 at 9:17 PM, Glenn Adams <glenn@skynav.com> wrote:

 

On Sat, Aug 31, 2013 at 11:21 AM, Michael Dolan <mdolan@newtbt.com> wrote:

The current schemas don’t reject attributes on elements that:

 

1. are undefined (e.g. junk:junk="junk"),

Since this is permissible, I'm not sure whether we want to reject them. Keep in mind that TT validity is assessed only after removing foreign namespace elements and attributes. If we reject these by making the schema more restrictive, it may produce a false negative assessment.

 

The current use of processContents="lax" is defined by XSD 1.0 as follows:

"If the item has a uniquely determined declaration available, it must be ·valid· with respect to that definition, that is, ·validate· if you can, don't worry if you can't."

 

IMO, this seems the logically correct choice.

2. typos of valid attributes (e.g. ttm:descr="t"), and

Unfortunately, this is a limitation of relying solely upon an XSD schema to determine validity. This is only one of a number of validity constraints not expressible using XSD 1.0 schemas. However, this particular invalidity is testable outside of XSD, and the TTV tool does test for and report it as an error [1].

 

[1] https://github.com/skynav/ttv/blob/master/tst/resources/com/skynav/ttv/app/ttml10-invalid-metadata-unknown-attributes.xml

3. valid attributes from TTML namespaces that are forbidden (e.g. <tt ttm:desc="t" …>).

Again, this is an XSD 1.0 limitation, and requires testing beyond XSD usage. The TTV tool does test and report this as an error [2].

 

[2] https://github.com/skynav/ttv/blob/master/tst/resources/com/skynav/ttv/app/ttml10-invalid-metadata-disallowed-attributes.xml 
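Undefined TT-namespace attributes such as ttm:descr can be caught outside XSD, as TTV is said to do. The following is a hypothetical illustration of that kind of check, not TTV's implementation: it parses the document with a namespace-aware DOM parser and reports any attribute in a TT namespace whose local name is missing from a lookup table. The DEFINED table is a deliberately incomplete, illustrative subset of TTML attribute names, and the sketch needs Java 9 or later for Map.of/Set.of.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class TtAttributeCheck {
    // Hypothetical, deliberately incomplete table of attribute local names per TT
    // namespace; a real checker would load the full vocabulary from the specification.
    static final Map<String, Set<String>> DEFINED = Map.of(
        "http://www.w3.org/ns/ttml#metadata", Set.of("agent", "role"),
        "http://www.w3.org/ns/ttml#styling", Set.of("extent", "origin", "color", "backgroundColor"));

    static final String SAMPLE = "<tt xmlns='http://www.w3.org/ns/ttml'"
        + " xmlns:ttm='http://www.w3.org/ns/ttml#metadata'"
        + " xmlns:tts='http://www.w3.org/ns/ttml#styling'"
        + " xmlns:junk='http://junk.com/junk'"
        + " tts:extent='640px 480px' ttm:descr='t' junk:junk='-3'/>";

    // Parses the document namespace-aware and returns one message per undefined TT attribute.
    static List<String> check(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Element root = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            List<String> errors = new ArrayList<>();
            walk(root, errors);
            return errors;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
    }

    private static void walk(Element element, List<String> errors) {
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            String ns = attr.getNamespaceURI();
            // Unprefixed attributes have no namespace; Map.of maps reject null lookups.
            if (ns == null) {
                continue;
            }
            Set<String> names = DEFINED.get(ns);
            // Only TT namespaces are checked; foreign and xmlns attributes are left alone.
            if (names != null && !names.contains(attr.getLocalName())) {
                errors.add("undefined attribute " + attr.getName() + " on <" + element.getLocalName() + ">");
            }
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                walk((Element) child, errors);
            }
        }
    }

    public static void main(String[] args) {
        check(SAMPLE).forEach(System.out::println);
    }
}
```

Item 3 (a defined attribute used on an element where it is not permitted) would need a per-element table in place of the per-namespace one.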

 

I understand that this was intentional to enable foreign namespace attributes without requiring their schemas.

 

This negative side effect stems from the use of ##other and processContents="lax" in combination with commercial validators not even trying to validate when the schemaLocation is actually provided (lax is supposed to be “best effort”, not “just forget all about it”).

I just tested lax validation of a "junk:junk" attribute using Ant's <schemavalidate/> task, which uses the platform's JAXP implementation. After a bit of snooping, I found that I'm using:

 

$ java -version
java version "1.6.0_51"
Java(TM) SE Runtime Environment (build 1.6.0_51-b11-457-11M4509)
Java HotSpot(TM) 64-Bit Server VM (build 20.51-b01-457, mixed mode)

$ java com.sun.org.apache.xerces.internal.impl.Version
Xerces-J 2.6.2

 

So, back to "junk:junk": I find that it is indeed being processed under lax processing. I've tested it using two techniques, and both work:

 

Option #1 using xsi:schemaLocation in the instance document, e.g.

 

<tt tts:extent="640px 480px" xml:lang="en"
  xmlns="http://www.w3.org/ns/ttml"
  xmlns:tts="http://www.w3.org/ns/ttml#styling"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://junk.com/junk junk.xsd"
  xmlns:junk="http://junk.com/junk" junk:junk="-3"/>

 

and

 

Option #2 using Ant's <schemavalidate/> task's <schema/> child element (having removed the above xsi:schemaLocation embedded in the instance document):

 

<schemavalidate fullchecking="true" warn="true">
  <schema namespace="http://www.w3.org/ns/ttml" file="${xsd.schema}"/>
  <schema namespace="http://junk.com/junk" file="${examples.dir}/junk.xsd"/>
  <fileset dir="${examples.dir}">
    <include name="ex1.xml"/>
  </fileset>
</schemavalidate>

 

If I don't create a junk.xsd file, then I get the following using Option #1:

 

$ ant validate-example-1
Buildfile: /Users/glenn/work/w3c/ttml/ttml1/spec/build.xml

validate-example-1:
[schemavalidate] /Users/glenn/work/w3c/ttml/ttml1/spec/examples/ex1.xml:6:52: schema_reference.4: Failed to read schema document 'junk.xsd', because 1) could not find the document; 2) the document could not be read; 3) the root element of the document is not <xsd:schema>.

BUILD SUCCESSFUL
Total time: 0 seconds

 

For Option #2, I get:

 

$ ant validate-example-1
Buildfile: /Users/glenn/work/w3c/ttml/ttml1/spec/build.xml

validate-example-1:

BUILD FAILED
/Users/glenn/work/w3c/ttml/ttml1/spec/build.xml:128: File not found: /Users/glenn/work/w3c/ttml/ttml1/spec/examples/junk.xsd

Total time: 0 seconds

 

If I do create junk.xsd with the following contents:

 

<xs:schema targetNamespace="http://junk.com/junk"
  xml:lang="en" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:attribute name="junk" type="xs:positiveInteger"/>
</xs:schema>

 

then retry validation, I get:

 

For Option #1:

 

$ ant validate-example-1
Buildfile: /Users/glenn/work/w3c/ttml/ttml1/spec/build.xml

validate-example-1:
[schemavalidate] /Users/glenn/work/w3c/ttml/ttml1/spec/examples/ex1.xml:6:52: cvc-minInclusive-valid: Value '-3' is not facet-valid with respect to minInclusive '1' for type 'positiveInteger'.
[schemavalidate] /Users/glenn/work/w3c/ttml/ttml1/spec/examples/ex1.xml:6:52: cvc-attribute.3: The value '-3' of attribute 'junk:junk' on element 'tt' is not valid with respect to its type, 'positiveInteger'.

BUILD FAILED
/Users/glenn/work/w3c/ttml/ttml1/spec/build.xml:128: /Users/glenn/work/w3c/ttml/ttml1/spec/examples/ex1.xml is not a valid XML document.

Total time: 0 seconds

For Option #2:

 

$ ant validate-example-1
Buildfile: /Users/glenn/work/w3c/ttml/ttml1/spec/build.xml

validate-example-1:
[schemavalidate] /Users/glenn/work/w3c/ttml/ttml1/spec/examples/ex1.xml:4:52: cvc-minInclusive-valid: Value '-3' is not facet-valid with respect to minInclusive '1' for type 'positiveInteger'.
[schemavalidate] /Users/glenn/work/w3c/ttml/ttml1/spec/examples/ex1.xml:4:52: cvc-attribute.3: The value '-3' of attribute 'junk:junk' on element 'tt' is not valid with respect to its type, 'positiveInteger'.

BUILD FAILED
/Users/glenn/work/w3c/ttml/ttml1/spec/build.xml:128: /Users/glenn/work/w3c/ttml/ttml1/spec/examples/ex1.xml is not a valid XML document.

Total time: 0 seconds

 

In conclusion, there clearly are commercial validators that do respect lax semantics. Perhaps the one you are using should be replaced with a newer model.
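The Option #2 result can be approximated without Ant by passing several schema sources to SchemaFactory.newSchema(Source[]), which combines them into one Schema. This is only an analogue of listing several <schema/> children, not necessarily how Ant's task works internally. A minimal sketch, assuming the same kind of hypothetical <tt> schema with a lax ##other wildcard plus a declaration matching the junk.xsd above: without the junk declaration the attribute is skipped; with it, "-3" fails the xs:positiveInteger type and "5" passes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class LaxWithDeclaration {
    // Minimal, hypothetical stand-in for the TT schema: a lax ##other wildcard on <tt>.
    static final String TT_SCHEMA = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.w3.org/ns/ttml' elementFormDefault='qualified'>"
        + "<xs:element name='tt'><xs:complexType>"
        + "<xs:anyAttribute namespace='##other' processContents='lax'/>"
        + "</xs:complexType></xs:element></xs:schema>";

    // Same declaration as the junk.xsd file in the thread.
    static final String JUNK_SCHEMA = "<xs:schema targetNamespace='http://junk.com/junk'"
        + " xml:lang='en' xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:attribute name='junk' type='xs:positiveInteger'/></xs:schema>";

    static String instance(String junkValue) {
        return "<tt xmlns='http://www.w3.org/ns/ttml' xmlns:junk='http://junk.com/junk'"
            + " junk:junk='" + junkValue + "'/>";
    }

    // Compiles every schema text into a single Schema, then validates the instance.
    static boolean validates(String instanceText, String... schemaTexts) {
        Source[] sources = new Source[schemaTexts.length];
        for (int i = 0; i < schemaTexts.length; i++) {
            sources[i] = new StreamSource(new StringReader(schemaTexts[i]));
        }
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(sources);
            schema.newValidator().validate(new StreamSource(new StringReader(instanceText)));
            return true;
        } catch (SAXException e) {
            System.out.println("invalid: " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("no junk schema, -3: " + validates(instance("-3"), TT_SCHEMA));
        System.out.println("junk schema, -3:    " + validates(instance("-3"), TT_SCHEMA, JUNK_SCHEMA));
        System.out.println("junk schema, 5:     " + validates(instance("5"), TT_SCHEMA, JUNK_SCHEMA));
    }
}
```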

I believe that the negative side effects of this “feature” in practice far outweigh its benefits and would like to change this to “strict”.

 

If a user is going to the trouble of using XML validation with the TTML schema, why wouldn’t they also ensure that they have schemas handy for the foreign namespaces in use?

 

Regards,

 

                Mike

 

Michael A DOLAN

TBT, Inc.    PO Box 190

Del Mar, CA 92014

(m) +1-858-882-7497

mdolan@newtbt.com

-- 
------------------------------------------------
Andreas Tai
Production Systems Television IRT - Institut fuer Rundfunktechnik GmbH
R&D Institute of ARD, ZDF, DRadio, ORF and SRG/SSR
Floriansmuehlstrasse 60, D-80939 Munich, Germany
 
Phone: +49 89 32399-389 | Fax: +49 89 32399-200
http: www.irt.de | Email: tai@irt.de
------------------------------------------------
 
registration court & managing director:
Munich Commercial, RegNo. B 5191
Dr. Klaus Illgner-Fehns
------------------------------------------------

 

Received on Tuesday, 3 September 2013 23:02:55 UTC