stripping remote namespace elements from parse tree

Dear opensp developers,

As some of you know, opensp is used as the base sgml/xml parser in  
the W3C markup validator.
http://validator.w3.org/

One of the issues I am facing, as one of the developers of that
application, is the thorny subject of validation (against XML DTDs)
and XML namespaces. The XML namespaces specification came after the
first XML recommendation (and long after SGML), and it did not
address the question of validating XML with namespaces. As a result,
the answer to the question "how can one build a document that is
valid (wrt a DTD) and uses namespaces to define foreign elements and
attributes" is, generally, "you don't".

See e.g.: http://www.rpbourret.com/xml/NamespacesFAQ.htm#dtd_6


This has been the source of a lot of frustration and, I assume, is
one of the reasons why a lot of contemporary XML-based language
design uses one of the more recent schema languages rather than DTDs.
It is also a pity, because it means that a number of XML-based
languages cannot at the same time retain the concept of validity
and be extended with namespaces. This has made the extensibility of
XHTML difficult (making its name a painful irony), and made the
validation of SVG (commonly used in combination with other
languages/namespaces) nearly impossible.

However much I am told that the solution "simply is to make DTD  
validation namespace-aware", I still have no clue how that would be  
done. Another solution, however, which seems to be favored by e.g.
TimBL [1], would be to ignore, while validating a document against
a DTD, anything not in the namespace of the root element.

[1] http://www.w3.org/DesignIssues/Architecture.html


I am thinking of implementing such a thing in the markup validator.  
After all, the validator does know what the root namespace is, and  
uses the opensp API (through Bjoern's excellent Perl wrapper). The  
idea I could think of implementing is to just ignore any message from  
the parser when between a StartElementEvent and EndElementEvent for  
an element not in the root namespace. Ditto for issues with  
attributes not in the root namespace. Make that an option in the  
validator, and people who really want to can extend their XHTML, SVG,  
etc. documents without forfeiting the possibility of checking the  
validity of their core document.
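
As a rough illustration of the filtering idea described above, here is a
minimal Python sketch. It does not use the OpenSP API or the Perl
wrapper: the event tuples ("start", "end", "error"), the function name
filter_messages and the sample messages are all hypothetical, and the
sketch assumes that a parser message concerning an element arrives after
that element's start event. It only covers element subtrees, not
individual foreign attributes on root-namespace elements.

```python
XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"


def filter_messages(events, root_ns):
    """Return the error messages raised outside foreign-namespace subtrees.

    `events` is a hypothetical stream of tuples:
      ("start", qualified_name, attributes_dict)
      ("end", qualified_name)
      ("error", message_text)
    """
    scopes = [{}]       # stack of prefix -> namespace URI maps, one per open element
    foreign_depth = 0   # > 0 while inside a subtree rooted at a foreign element
    kept = []
    for event in events:
        kind = event[0]
        if kind == "start":
            _, name, attrs = event
            # Namespace declarations are inherited, then overridden locally.
            scope = dict(scopes[-1])
            for key, value in attrs.items():
                if key == "xmlns":
                    scope[""] = value
                elif key.startswith("xmlns:"):
                    scope[key[len("xmlns:"):]] = value
            scopes.append(scope)
            prefix = name.split(":", 1)[0] if ":" in name else ""
            if foreign_depth or scope.get(prefix) != root_ns:
                foreign_depth += 1
        elif kind == "end":
            scopes.pop()
            if foreign_depth:
                foreign_depth -= 1
        elif kind == "error" and not foreign_depth:
            kept.append(event[1])
    return kept


# Made-up event stream: an XHTML document embedding an SVG fragment.
sample = [
    ("start", "html", {"xmlns": XHTML_NS}),
    ("start", "p", {}),
    ("error", "problem inside p"),
    ("end", "p"),
    ("start", "svg:svg", {"xmlns:svg": SVG_NS}),
    ("error", "element svg:svg undefined"),
    ("start", "svg:rect", {}),
    ("error", "element svg:rect undefined"),
    ("end", "svg:rect"),
    ("end", "svg:svg"),
    ("error", "problem after the svg fragment"),
    ("end", "html"),
]

print(filter_messages(sample, XHTML_NS))
```

With this approach, everything nested inside a foreign element is
skipped, including any root-namespace elements that appear within it;
whether that is the desired behaviour is a design choice the sketch does
not settle.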

However, before I do that, I'm curious about the following. Apologies
if these are FAQs; I could not find the information anywhere:

* is there a mechanism in opensp to ignore elements in foreign  
namespaces?
* would there be any interest for such a mechanism?
* have any other solutions to the question of DTD validation /
namespaces been considered before that I should be aware of?


Thank you.
-- 
olivier Thereaux - W3C - http://www.w3.org/People/olivier/
W3C Open Source Software: http://www.w3.org/Status

Received on Tuesday, 9 October 2007 12:54:42 UTC