XML, namespaces, extensibility and validation from olivier Thereaux on 2007-04-24 (www-tag@w3.org from April 2007)

From: olivier Thereaux <ot@w3.org>
Date: Tue, 24 Apr 2007 15:02:21 +0900
To: www-tag@w3.org
Cc: Tim Berners-Lee <timbl@w3.org>, Liam Quin <liam@w3.org>, Michael Sperberg-McQueen <cmsmcq@w3.org>, Norman Walsh <Norman.Walsh@Sun.COM>, James Clark <jjc@jclark.com>
Message-Id: <F2776BB8-321A-4C12-B0A2-5117A93AA49C@w3.org>
Dear all,

In the context of the development of the Markup Validator, I
sometimes see questions related to XML validation, extensibility and
namespaces, and witness some frustration. I am far from an expert
on these topics, but I think many of you are: would you mind my
picking the collective brain of the TAG mailing list to get a better
idea of the road the Markup Validator should take to be more useful
to more people, while obviously remaining true to the specs it is
checking against?




* Namespaces

Namespaces are the main tool for extensibility. But namespaces are an  
extra layer on top of XML, and XML validity has no concept of  
namespaces, as far as I know.

In the words of James Clark:
[[
It would of course be very useful to have namespace-aware validation:  
to be able to associate each URI used in a universal name with some  
sort of schema (similar to a DTD) and be able to validate a document  
using multiple such URIs with respect to the schemas for all of the  
URIs. The XML Namespaces Recommendation does not provide this. The  
reason is is that DTDs have many other problems and missing features  
in addition to lack of namespace-awareness. So the plan is to come up  
with a new schema mechanism that fixes the problems with DTDs and, as  
part of this, provides namespace-awareness. This work is being done  
by the XML Schema WG.
]] -- http://www.jclark.com/xml/xmlns.htm

And yet at the same time people are bothered by the W3C Markup
Validator not being namespace-aware:
Here: http://www.w3.org/Bugs/Public/show_bug.cgi?id=4475
Or Here: http://www.w3.org/2007/04/23-rdfa-minutes.html
In both cases, the frustration comes from people dealing with
XML-based markup languages, defining them normatively with DTDs, aware
of the limitations of DTDs regarding xmlns, but who still consider the
validating parser of the Markup Validator to be in error when it says
"sorry, xmlns:foo is not a legit attribute for this document type".
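To make the complaint concrete, here is a minimal Python sketch (not
the validator's actual code) of what a parser sees when it does no
namespace processing: xmlns:foo is just another attribute name, so a
DTD-style check flags it unless it is declared. DECLARED_ATTRS is a
hypothetical stand-in for a DTD's ATTLIST declarations, and the
example.org URI is made up.

```python
import xml.parsers.expat

# Hypothetical stand-in for what a DTD's ATTLIST declares for <html>.
DECLARED_ATTRS = {"html": {"lang", "dir", "xmlns"}}

DOC = (
    '<html xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:foo="http://example.org/foo" lang="en"/>'
)


def undeclared_attributes(document):
    """Return (element, attribute) pairs that the declarations do not allow."""
    found = []

    def start(name, attrs):
        allowed = DECLARED_ATTRS.get(name, set())
        for attr in attrs:
            if attr not in allowed:
                found.append((name, attr))

    # No namespace_separator: expat does no namespace processing,
    # so namespace declarations arrive as ordinary attributes.
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(document, True)
    return found


print(undeclared_attributes(DOC))
```

The plain xmlns attribute passes only because the stand-in table
declares it; nothing in a DTD can declare xmlns:foo for every prefix
an author might pick.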

I've read a lot on the issue and am still puzzled. To me, the core
issue is a limitation of XML (and in particular XML DTDs). What I
read here [1] or here [2] seems to show I'm not the only one
recognizing the issue, and makes me hope that XML 2.0 could make
this problem a thing of the past. That said, [3] seems to show that
the issue is about as old as XML is...

[1] http://www.w3.org/2001/tag/issues.html#xmlSW-10
[2] http://norman.walsh.name/2004/11/10/xml20
[3] http://www.stylusstudio.com/xmldev/199808/bythread.html#00249

In the meantime, as a developer of one of the tools doing XML
validation (among others), I would like to find ways to make a better
and more useful tool, but I am not sure how to tackle this. Pointers
to specs, ideas, and thoughts are welcome.



* "Ignore what you do not recognize"

This has been raised in the past, and I don't think it has been  
solved either.
Quoting Michael Sperberg-McQueen a while ago:
[[
Some versions, at least, of the HTML spec have said, as I understand
it:  valid HTML documents are those that conform to the attached
schema (whether in DTD syntax, XML Schema, RelaxNG, or ...).
  => http://www.w3.org/TR/xhtml1/#strict
But a conforming processor should (or must?) also ignore tags for
elements it doesn't understand.
  => http://www.w3.org/TR/xhtml1/#uaconf
]]

So, in the context of a validation / conformance checking tool, which  
should it be:

option 1) Validity according to the schema is part of conformance. A  
conformance checker should _obviously_ report usage of elements and  
attributes not pertaining to the schema as being incorrect.
(somehow that's what I think http://www.w3.org/TR/xhtml1/#well-formed  
means)

option 2) Conformance says: "ignore what you do not recognize". A  
conformance checker should be a conforming agent too, and thus should  
ignore usage of attributes and elements not pertaining to the  
declared schema.

To me, 2) is actually using twisted logic. I don't think a
conformance checker is in the same class of products as other user
agents. While it is useful for normal agents to just shrug and ignore
stuff they don't know, a conformance checker just passing by unknown
elements/attributes and saying "bah" defeats the purpose of
conformance checking.

What might be acceptable would be a 2bis) saying that "unknown
attributes and elements *in foreign namespaces*" may be ignored by
the conformance checker. But this would be a serious change in a very
popular validation tool, so I'd rather it be set into some spec
first, rather than arbitrarily coding it in.
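
As a rough illustration of how option 1 and 2bis would differ in
practice, here is a small Python sketch. It is an assumption-laden toy,
not a proposal for the validator's implementation: KNOWN_ELEMENTS is a
hypothetical, heavily abbreviated vocabulary, the example.org namespace
is made up, and only element names are checked.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Hypothetical, heavily abbreviated stand-in for the XHTML vocabulary.
KNOWN_ELEMENTS = {"html", "head", "title", "body", "p"}

DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:foo="http://example.org/foo">
  <head><title>t</title></head>
  <body><p>text</p><blink>x</blink><foo:note>n</foo:note></body>
</html>"""


def split_qname(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


def check_elements(document, ignore_foreign):
    """List element-level problems; 2bis corresponds to ignore_foreign=True."""
    errors = []
    for elem in ET.fromstring(document).iter():
        uri, local = split_qname(elem.tag)
        if uri == XHTML_NS:
            # Unknown names in the host namespace are always reported.
            if local not in KNOWN_ELEMENTS:
                errors.append("unknown element: " + local)
        elif not ignore_foreign:
            # Option 1: anything outside the schema is an error.
            errors.append("foreign element: {%s}%s" % (uri, local))
    return errors


print(check_elements(DOC, ignore_foreign=False))
print(check_elements(DOC, ignore_foreign=True))
```

Under both settings the unknown element in the XHTML namespace is
still reported; only the element in the foreign namespace is skipped
when ignore_foreign is set, which is the distinction 2bis draws.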

If you have any pointers to specs, ideas or suggestions, I would
gladly take them.


Thank you very much.
-- 
olivier
Received on Tuesday, 24 April 2007 06:02:25 UTC