RE: VALID_MARKUP local DTD catalog

Hi,

Yes, keeping a catalog of the common DTDs used by the checker is the
right solution.

As Abel and I pointed out in a study about third-party tools [1], JHOVE
also uses a SAX parser and includes several DTDs as internal resources
for the sake of efficiency. (None of the mobile DTDs are included.)

To do this, JHOVE uses an ad hoc DTDMapper, which we would have to
extend in order to add new DTDs.

On the other hand, JHOVE makes it possible to specify the SAX parser
implementation [2], but we don't know whether the CatalogResolver can be
set externally, without modifying the JHOVE source code (e.g. we don't
have access to the parser object to make this method call:
reader.setProperty("http://apache.org/xml/properties/internal/entity-resolver", resolver);
).
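
A minimal sketch of one possible workaround, using the standard SAX
XMLFilterImpl rather than the Xerces-internal property above. It
assumes, without having checked the JHOVE source, that JHOVE
instantiates the configured parser class by name through a no-argument
constructor. The names LocalDtdReader and LOCAL_DTDS, the example public
identifier and the inline DTD text are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical reader: wraps the platform's default SAX parser and answers
 * DTD lookups from a local table before any other resolver is consulted.
 * In real use it would be declared public, in its own file, so that it can
 * be loaded by class name.
 */
class LocalDtdReader extends XMLFilterImpl {

    // Illustrative entry only; a real table would point at bundled DTD files.
    static final Map<String, String> LOCAL_DTDS = Map.of(
        "-//EXAMPLE//DTD Demo 1.0//EN",
        "<!ELEMENT doc (#PCDATA)> <!ENTITY greeting \"hello\">");

    LocalDtdReader() {
        super(newDefaultReader());
    }

    private static XMLReader newDefaultReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("No SAX parser available", e);
        }
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        String dtd = publicId == null ? null : LOCAL_DTDS.get(publicId);
        if (dtd != null) {
            InputSource local = new InputSource(new StringReader(dtd));
            local.setPublicId(publicId);
            local.setSystemId(systemId);
            return local;
        }
        // Unknown identifier: defer to whatever resolver the caller installed.
        return super.resolveEntity(publicId, systemId);
    }

    public static void main(String[] args) throws Exception {
        StringBuilder text = new StringBuilder();
        LocalDtdReader reader = new LocalDtdReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        String xml = "<!DOCTYPE doc PUBLIC \"-//EXAMPLE//DTD Demo 1.0//EN\" "
            + "\"http://example.invalid/demo.dtd\"><doc>&greeting;</doc>";
        reader.parse(new InputSource(new StringReader(xml)));
        System.out.println(text);
    }
}
```

Because the filter installs itself as the wrapped parser's entity
resolver when parsing starts, the caller never needs a reference to the
underlying parser object. Any resolver the caller sets on the filter is
still consulted for identifiers missing from the table.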

[1] http://docs.google.com/Doc?id=dhbw7zt7_0f8w6bq
[2] http://hul.harvard.edu/jhove/xml-hul.html

Regards,


Miguel
________________________________________
From: public-mobileok-checker-request@w3.org
[mailto:public-mobileok-checker-request@w3.org] On Behalf Of Jo Rabin
Sent: Tuesday, 29 May 2007 12:19
To: Ruadhan O'Donoghue; public-mobileok-checker@w3.org
Subject: RE: VALID_MARKUP local DTD catalog

Good point.

The test is "If the document is an HTML document and it fails to
validate according to its given DOCTYPE, FAIL"
So we need a reasonable catalogue of known HTML and XHTML DTDs. We don't
need any non-HTML DTDs, and I agree that we should not go and fetch
random DTDs.

Jo
________________________________________
From: public-mobileok-checker-request@w3.org
[mailto:public-mobileok-checker-request@w3.org] On Behalf Of Ruadhan
O'Donoghue
Sent: 29 May 2007 11:11
To: public-mobileok-checker@w3.org
Subject: VALID_MARKUP local DTD catalog

Hi,

I'm not sure if anyone has been looking at this, but for validating the
original document, we are going to need a local catalog of DTDs. In
ready.mobi we use the Xerces CatalogResolver class to map between
DOCTYPEs and local copies of the DTDs.
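
A minimal sketch of such a DOCTYPE-to-local-copy mapping. It is not the
ready.mobi code and does not use the Xerces CatalogResolver: it swaps in
the javax.xml.catalog API that ships with JDK 9 and later, reading an
OASIS XML catalog file. The class name LocalCatalogDemo, the temporary
directory and the placeholder DTD file are assumptions made for
illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

class LocalCatalogDemo {

    /** Writes a one-entry catalog plus a placeholder DTD into a temp directory. */
    static Path writeCatalog() throws IOException {
        Path dir = Files.createTempDirectory("dtd-catalog");
        // Placeholder file; a real deployment would ship the actual DTD here.
        Files.writeString(dir.resolve("xhtml-basic11.dtd"),
            "<!-- placeholder DTD -->\n");
        Path catalog = dir.resolve("catalog.xml");
        Files.writeString(catalog,
            "<?xml version=\"1.0\"?>\n"
            + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
            + "  <public publicId=\"-//W3C//DTD XHTML Basic 1.1//EN\"\n"
            + "          uri=\"xhtml-basic11.dtd\"/>\n"
            + "</catalog>\n");
        return catalog;
    }

    /** Returns the local URI that the catalog maps the given DOCTYPE to. */
    static String resolveLocally(Path catalog, String publicId, String systemId) {
        CatalogResolver resolver =
            CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalog.toUri());
        InputSource source = resolver.resolveEntity(publicId, systemId);
        return source.getSystemId();
    }

    public static void main(String[] args) throws IOException {
        Path catalog = writeCatalog();
        System.out.println(resolveLocally(catalog,
            "-//W3C//DTD XHTML Basic 1.1//EN",
            "http://www.w3.org/TR/xhtml-basic/xhtml-basic11.dtd"));
    }
}
```

The relative uri attribute is resolved against the location of the
catalog file, so the DTDs can sit next to it. With the default features
the resolver is documented to be strict, raising a CatalogException when
no entry matches.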


Any thoughts on the following?

(1) We need to validate the document against its stated DOCTYPE and
XHTML Basic 1.1 (and maybe 1.2). So the set of DTDs that we wish to
store locally should include
XHTML Basic*, MP*, HTML* 

Are there others? And do we store variations like the Openwave XHTML
DTDs which turn up quite a bit? Perhaps we should compile an exhaustive
list of the DOCTYPES that we will recognise.


(2) The behaviour when a DOCTYPE specifies an obscure DTD not in the
catalog - fetching a DTD from the wild is not a good idea, so we should
just report an "unrecognised DOCTYPE - will not try to validate"
error... Is this the desired behaviour?
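
The behaviour proposed in (2) can be sketched with a plain SAX
EntityResolver. The important detail is that returning null from
resolveEntity tells the parser to open the system identifier itself, so
an unknown DOCTYPE has to be rejected with an exception instead. This is
only a sketch: the class name StrictDoctypeCheck, the KNOWN table, the
example public identifier and the one-line DTD are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class StrictDoctypeCheck {

    static final String UNRECOGNISED =
        "unrecognised DOCTYPE - will not try to validate";

    // Placeholder DTD text keyed by public identifier (hypothetical entry).
    static final Map<String, String> KNOWN = Map.of(
        "-//EXAMPLE//DTD Known 1.0//EN", "<!ELEMENT html (#PCDATA)>");

    /** Serves known DTDs from memory and refuses everything else. */
    static final EntityResolver LOCAL_ONLY = (publicId, systemId) -> {
        String dtd = publicId == null ? null : KNOWN.get(publicId);
        if (dtd == null) {
            // Throwing, rather than returning null, keeps the parser offline.
            throw new SAXException(UNRECOGNISED + ": " + publicId);
        }
        InputSource source = new InputSource(new StringReader(dtd));
        source.setSystemId(systemId);
        return source;
    };

    /** Returns "PASS", a string starting with "FAIL", or the unrecognised message. */
    static String check(String xml) throws IOException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        boolean[] invalid = {false};
        try {
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setEntityResolver(LOCAL_ONLY);
            reader.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    invalid[0] = true; // validity errors are recoverable
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            String message = String.valueOf(e.getMessage());
            return message.contains(UNRECOGNISED) ? message : "FAIL: " + message;
        }
        return invalid[0] ? "FAIL" : "PASS";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(check(
            "<!DOCTYPE html PUBLIC \"-//EXAMPLE//DTD Obscure//EN\" "
            + "\"http://example.invalid/obscure.dtd\"><html>hi</html>"));
    }
}
```

Reporting the unrecognised case separately from FAIL keeps it
distinguishable from a document that was validated and found invalid.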


Ruadhan

Received on Tuesday, 29 May 2007 11:58:38 UTC