Re: DTD and Entity Resolver

On 22/10/12 04:43, Mansour Al Akeel wrote:
> I am trying to process many xml files. All of these files have DTD
> that doesn't exist.
> I read Norman's Walsh article, and tried to follow it :
>
> http://norman.walsh.name/2009/07/22/xmlCatalogsandXProc
>
> So I downloaded the resolver from
> http://www.java2s.com/Code/Jar/x/Downloadxmlresolverjar.htm
>
> and setup the configuration. Now I don't know how to specify the
> catalog for the path for the dummy.dtd
>
> My question is, how do I specify the catalog file ??

Hi Mansour,

Catalogs can be specified in either of two syntaxes. The first is a
text file format called the TR 9401 format, which goes back many
years, to before XML:

    https://www.oasis-open.org/specs/a401.htm

The second is the XML Catalogs specification:

    https://www.oasis-open.org/committees/download.php/14809/xml-catalogs.html

As far as I can remember they're largely isomorphic, i.e., for the
features they share, a catalog written in one syntax can be
transformed losslessly into the other and back (I can't rightly
remember whether XML Catalogs have any features not supported in
TR 9401).

I would recommend you use the resolver that is part of the Apache
XML Commons project, which is where Norm's resolver ended up. That way
you can be assured of getting the latest release. Norm has released a
newer, experimental resolver, but I'd still recommend the Apache one
if your project is part of a production system. If you can afford to
experiment, then by all means try out Norm's new one, which can be
found at

    http://norman.walsh.name/2007/02/06/xmlresolver

Note, though, that Norm's new resolver doesn't support the TR 9401
syntax. I still like the older syntax since it's easier to edit by
hand, and I generally have to create my catalogs by hand anyway.

In Java you'll need to set a system property called "xml.catalog.files"
whose value is a *URL* pointing to the catalog file's location
(regardless of which syntax you use). Note that you can't simply
use a file path; it must be a URL, even if it's a file: URL.
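Rather than writing the file: URL by hand, you can have Java build it
from a path. The following is a minimal sketch using only the standard
library; the class name CatalogProperty, the method toCatalogUrl and
the file name "catalog.txt" are hypothetical. Path.toUri() produces an
absolute file:/// URI, resolving relative paths against the working
directory and escaping characters such as spaces.

```java
import java.nio.file.Paths;

public class CatalogProperty {

    // Turns a (possibly relative) catalog path into an absolute file: URL string.
    // Path.toUri() resolves relative paths against the working directory and
    // percent-escapes characters such as spaces.
    static String toCatalogUrl(String catalogPath) {
        return Paths.get(catalogPath).toUri().toString();
    }

    public static void main(String[] args) {
        // "catalog.txt" is a hypothetical file name; pass a real path as args[0].
        String catalogPath = args.length > 0 ? args[0] : "catalog.txt";
        // Set the property before the catalog resolver is created.
        System.setProperty("xml.catalog.files", toCatalogUrl(catalogPath));
        System.out.println(System.getProperty("xml.catalog.files"));
    }
}
```

The same property can instead be passed on the command line, e.g.
-Dxml.catalog.files=file:///D:/drive/work/project/catalog.txt (again a
hypothetical path), which keeps the catalog location out of your code.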

In your catalog file you simply need to use one of the two syntaxes
to specify the location of your DTD. As described in both the TR 9401
and XML Catalogs specifications, you can refer to a DTD either by name
(a public identifier, via PUBLIC) or by address/location (a system
identifier, via SYSTEM). If you're using a public identifier you'd
need to come up with either a URN or a Formal Public Identifier (FPI)
to name your DTD. Suppose your document looked like:

    <?xml version="1.0"?>
    <!DOCTYPE doc PUBLIC "-//MansourAlAkeel//DTD My DTD//EN" "dummy.dtd">
    <doc>
    ...

or

    <?xml version="1.0"?>
    <!DOCTYPE doc SYSTEM "dummy.dtd">
    <doc>
    ...

In TR 9401 syntax, the corresponding catalog entries would be something like:

    PUBLIC "-//MansourAlAkeel//DTD My DTD//EN" "file:D:/drive/work/project/dummy.dtd"

    SYSTEM "dummy.dtd"  "file:D:/drive/work/project/dummy.dtd"

This would resolve the public identifier to a file on your D drive,
and the system identifier "dummy.dtd" to the same file. The syntax
for the XML Catalog equivalent of the above can be inferred from
the spec.
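For illustration, here is a sketch of what the XML Catalogs form of
those two entries might look like, exercised with the javax.xml.catalog
API that ships with the JDK (Java 9 and later; Files.writeString needs
Java 11). Note this swaps in the JDK's own resolver, not the Apache one
recommended above, and the JDK resolver reads only the XML Catalogs
syntax, not TR 9401. To make it runnable anywhere, the D: drive path is
replaced by a stub dummy.dtd written to a temporary directory; the class
and method names are hypothetical.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class XmlCatalogSketch {

    static final String PUBLIC_ID = "-//MansourAlAkeel//DTD My DTD//EN";

    // XML Catalogs form of the two TR 9401 entries: one <public>, one <system>.
    static String catalogXml(String dtdUri) {
        return "<?xml version=\"1.0\"?>\n"
             + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
             + "  <public publicId=\"" + PUBLIC_ID + "\" uri=\"" + dtdUri + "\"/>\n"
             + "  <system systemId=\"dummy.dtd\" uri=\"" + dtdUri + "\"/>\n"
             + "</catalog>\n";
    }

    // Writes a stub DTD and a catalog pointing at it into dir, then builds a resolver.
    static CatalogResolver buildResolver(Path dir) throws Exception {
        Path dtd = Files.writeString(dir.resolve("dummy.dtd"), "<!ELEMENT doc ANY>\n");
        Path catalog = Files.writeString(dir.resolve("catalog.xml"),
                                         catalogXml(dtd.toUri().toString()));
        return CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalog.toUri());
    }

    // Parses xml with the resolver supplying external entities such as the DTD.
    static void parse(String xml, CatalogResolver resolver) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        reader.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        CatalogResolver resolver = buildResolver(Files.createTempDirectory("catalog-demo"));

        // The <public> entry matches on the public identifier alone.
        System.out.println(resolver.resolveEntity(PUBLIC_ID, "elsewhere.dtd").getSystemId());
        // The <system> entry matches the literal system identifier "dummy.dtd".
        System.out.println(resolver.resolveEntity(null, "dummy.dtd").getSystemId());

        // The PUBLIC-form document parses with the DTD taken from the catalog,
        // not from its declared location.
        parse("<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE doc PUBLIC \"" + PUBLIC_ID + "\" \"dummy.dtd\">\n"
            + "<doc/>", resolver);
        System.out.println("parsed");
    }
}
```

The parse uses the PUBLIC form on purpose: a parser may hand the
resolver an absolutized system identifier (e.g. resolved against the
working directory) rather than the literal "dummy.dtd", in which case a
literal system entry would not match. Public identifiers don't have
that problem, which is one practical reason to give your DTD an FPI.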

Hope that's helpful.

Murray

............................................................................
Murray Altheim <murray12 at altheim dot com>                       = =  ===
http://www.altheim.com/murray/                                     ===  ===
SGML Grease Monkey, Banjo Player, Wantanabe Zen Monk               = =  ===

     In the evening
     The rice leaves in the garden
     Rustle in the autumn wind
     That blows through my reed hut.
            -- Minamoto no Tsunenobu

Received on Tuesday, 23 October 2012 18:11:03 UTC