possible text on validation

First attempt at more text on validity.
In some sense, more is less! The current text says little, but says what 
is formally required. My text expands on that, with the hope of being 
more useful, but perhaps drifts into being more confused.

I'll follow up with a shorter version (less explanation) and see what 
people think.


Jeremy


After the following text in
http://www.w3.org/TR/2007/WD-grddl-20070302/#txforms
[[
Therefore, it is suggested that GRDDL transformations
be written so that they perform all expected pre-processing, including
processing of related DTDs, Schemas and namespaces.  Such measure can
be avoided for documents which do not require such pre-processing to
yield an infoset that is faithful. That is, for documents which do not
reference XInclude, DTDs, XML Schemas and so on.</p>
]]

I suggest the following:

[[
<p>
More specifically, concerning XML validation:
GRDDL-aware agents may use either validating or
non-validating XML processors (see section 5.1 of
[<a class="norm" href="#XML">XML</a>]), or
even some mix of validating and non-validating XML processors.
Document authors should therefore avoid relying on
an external <a href="http://www.w3.org/TR/2006/REC-xml-20060816/#dt-doctype"
 >DTD subset</a>.
This can be achieved, for example, by
following the rules specified in the <a 
href="http://www.w3.org/TR/2006/REC-xml-20060816/#vc-check-rmd">
Standalone Document Declaration</a> validity constraint.
If all these rules are followed, then adding
</p>
<pre>
     <code>standalone="yes"</code>
</pre>
<p>
to the XML declaration of such documents may reduce the cost of processing
with some GRDDL-aware agents.
In practice, for GRDDL, these rules
may be applied in part only,
depending on what is known
about the licensed GRDDL transforms.
The two issues most likely to cause problems are:
</p>
<ul>
<li>
The use of an attribute default to set the namespace.
This occurs, for example, with the XHTML DTDs.
The GRDDL result is usually ambiguous for
a document that references an XHTML DTD
but does not include
an explicit namespace declaration, for example one that uses:
<pre>
      &lt;html&gt;
</pre>
rather than
<pre>
      &lt;html xmlns="http://www.w3.org/1999/xhtml"&gt;
</pre>
See the tests (@@@TODO, TODO, TODO) which explore this case.
</li>
<li>
The XPath node-set is not well defined for documents
that include references to
external entities,
except via the external DTD,
and so neither are the rules for GRDDL.
In particular, a
non-validating XML processor that does not
read the external DTD subset, if any, cannot reliably
compute GRDDL results when an external entity reference occurs
in a part of the XML document that is relevant to GRDDL processing,
e.g. within the value of a <code>rel</code> attribute in
an XHTML family document, or within element content corresponding
to an XPath text node that is processed as part of a GRDDL transform
of the document.
Even when a reference to an external entity occurs elsewhere
in an XML document, document authors should have no expectation of
interoperable GRDDL processing. An XSLT engine using a
non-validating XML processor is permitted to raise
an unrecoverable error in such a situation.
</li>
</ul>

<p>
In summary, document authors who wish their documents to be used
with GRDDL, particularly XHTML document authors, are encouraged:
</p>
<ul>
<li>
To always explicitly include the XHTML namespace in an XHTML document,
or an appropriate namespace in an XML document.
</li>
<li>
To avoid the use of entity references, except references to the
predefined entities listed
in <a href=
"http://www.w3.org/TR/2006/REC-xml-20060816/#sec-predefined-ent">
section 4.6</a> of [<a class="norm" href="#XML">XML</a>].
</li>
</ul>

]]
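
To make the two problem cases above concrete, here is a rough sketch (not
part of the proposed text) of what a processor that does not read the
external DTD subset sees. It assumes Python's xml.etree.ElementTree as a
stand-in for such a processor; the helper names root_namespace and
parses_without_dtd and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The DOCTYPE is only named in the documents below; the parser used here
# does not read the external DTD subset it points to (assumption of this sketch).
DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
           '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced names as "{uri}local".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


def parses_without_dtd(xml_text):
    """Return True if the document parses without reading the external DTD."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents.
# Namespace left to the attribute default in the DTD:
implicit = DOCTYPE + "<html><head><title>t</title></head></html>"
# Namespace declared explicitly:
explicit = (DOCTYPE + '<html xmlns="%s"><head><title>t</title></head></html>'
            % XHTML_NS)
# Entity declared only via the external DTD:
with_nbsp = (DOCTYPE + '<html xmlns="%s"><body><p>a&nbsp;b</p></body></html>'
             % XHTML_NS)
# Predefined entity (section 4.6 of XML):
with_amp = (DOCTYPE + '<html xmlns="%s"><body><p>a&amp;b</p></body></html>'
            % XHTML_NS)

for name, doc in [("implicit", implicit), ("explicit", explicit)]:
    print(name, "root namespace:", root_namespace(doc))
for name, doc in [("with_nbsp", with_nbsp), ("with_amp", with_amp)]:
    print(name, "parses:", parses_without_dtd(doc))
```

As far as I can tell, under these assumptions the root element of the
first document has no namespace at all, so a transform matching on the
XHTML namespace would find nothing, and the document using &nbsp; is
rejected outright while the one using &amp; parses. A validating
processor that reads the DTD would behave differently in both cases,
which is the interoperability problem the text tries to warn about.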



Received on Friday, 30 March 2007 12:26:24 UTC