p:load and DTD validation

Morning all

I’m seeing some strange behaviour with p:load and was hoping someone could suggest a solution.

Background: I have a large set of documents to process. These documents are defined with a DTD that declares fixed attributes to fake several namespaces*. Therefore, I can’t load them without referencing the DTD. I have Calabash set up with a resolver. It’s working just fine.
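
To illustrate the kind of declaration I mean, here is a minimal sketch. The element name, prefix and namespace URI are made up for the example and are not taken from the real DTD:

<!-- hypothetical: a #FIXED attribute that supplies a namespace declaration -->
<!ATTLIST article
	xmlns:ce CDATA #FIXED "http://example.org/ns/common-elements">

With a declaration like this, the prefix is only bound once the parser has read the DTD and applied the attribute default, which is why the documents can’t be loaded without it.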

The following extract from one of my test scripts works just fine:

<p:option name="document-uri" required="true"/>
<p:load>
	<p:with-option name="href" select="$document-uri"/>
</p:load>

The main intent of the script is to iterate over a set of directories, filter out the files I need and load them. So, I created a couple of steps to wrap this up. The output is something like:

<articles>
	<c:file xmlns:c="http://www.w3.org/ns/xproc-step"
		name="file:/Users/nicg/Projects/british-library/supplied/inputs/BLC00430/13835718/07570001/1300168X/main.xml"/>
	<c:file xmlns:c="http://www.w3.org/ns/xproc-step"
		name="file:/Users/nicg/Projects/british-library/supplied/inputs/BLC00430/13835718/07570001/13001824/main.xml"/>
	<c:file xmlns:c="http://www.w3.org/ns/xproc-step"
		name="file:/Users/nicg/Projects/british-library/supplied/inputs/BLC00430/13835718/07570001/13001848/main.xml"/>
	<c:file xmlns:c="http://www.w3.org/ns/xproc-step"
		name="file:/Users/nicg/Projects/british-library/supplied/inputs/BLC00430/13835718/07570001/13001873/main.xml"/>

	…
</articles>


I’ve tried two approaches to loading this. Firstly, I just used the resulting document in the script, and iterated over it:

	<p:for-each name="iterate-articles">
		
		<p:iteration-source select="//c:file"/>
		
		<p:output port="result" primary="true" sequence="true">
			<p:pipe port="result" step="load-article"/>
		</p:output>
		
		<p:load name="load-article" dtd-validate="true">
			<p:with-option name="href" select="/c:file/@name"/>
		</p:load>
		
		
	</p:for-each>
	
That failed (failure below). So, I wrote the file listing out to disk, loaded it as an input and processed it with the same loop. Failed again. 

Having run Calabash with debug on, I can see that it’s failing to load the DTD because it isn’t resolving it via the catalog. Instead, the parser looks for art510.dtd in the same directory as the document:

[nicg@newt-449 british-library]$ java -Dxml.catalog.files=/Users/nicg/Projects/british-library/dtd/catalog.xml -classpath "/usr/local/share/java/calabash.jar:/usr/local/share/java/xml-resolver-1.2.jar" com.xmlcalabash.drivers.Main -U org.xmlresolver.Resolver -E org.xmlresolver.Resolver -D --input source=files.xml load-files-test.xpl  
com.xmlcalabash.core.XProcException: XProc error err:XD0011
	at com.xmlcalabash.core.XProcException.dynamicError(XProcException.java:176)
	at com.xmlcalabash.util.XProcURIResolver.parse(XProcURIResolver.java:199)
	at com.xmlcalabash.core.XProcRuntime.parse(XProcRuntime.java:838)
	at com.xmlcalabash.util.DefaultXMLCalabashConfigurer.loadDocument(DefaultXMLCalabashConfigurer.java:45)
	at com.xmlcalabash.library.Load.run(Load.java:67)
	at com.xmlcalabash.runtime.XAtomicStep.run(Unknown Source)
	at com.xmlcalabash.runtime.XForEach.run(XForEach.java:116)
	at com.xmlcalabash.runtime.XPipeline.doRun(XPipeline.java:235)
	at com.xmlcalabash.runtime.XPipeline.run(XPipeline.java:135)
	at com.xmlcalabash.drivers.Main.run(Main.java:326)
	at com.xmlcalabash.drivers.Main.run(Main.java:96)
	at com.xmlcalabash.drivers.Main.main(Main.java:80)
Caused by: net.sf.saxon.s9api.SaxonApiException: I/O error reported by XML parser processing file:/Users/nicg/Projects/british-library/supplied/inputs/BLC00430/15709639/184401PB/13003968/main.xml: /Users/nicg/Projects/british-library/supplied/inputs/BLC00430/15709639/184401PB/13003968/art510.dtd (No such file or directory)
	at net.sf.saxon.s9api.DocumentBuilder.build(DocumentBuilder.java:380)
	at com.xmlcalabash.util.XProcURIResolver.parse(XProcURIResolver.java:191)
	... 10 more
Caused by: net.sf.saxon.trans.XPathException: I/O error reported by XML parser processing file:/Users/nicg/Projects/british-library/supplied/inputs/BLC00430/15709639/184401PB/13003968/main.xml: /Users/nicg/Projects/british-library/supplied/inputs/BLC00430/15709639/184401PB/13003968/art510.dtd (No such file or directory)
	at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:427)
	at net.sf.saxon.event.Sender.send(Sender.java:143)
	at net.sf.saxon.Configuration.buildDocument(Configuration.java:3348)
	at net.sf.saxon.s9api.DocumentBuilder.build(DocumentBuilder.java:377)
	... 11 more
Caused by: java.io.FileNotFoundException: /Users/nicg/Projects/british-library/supplied/inputs/BLC00430/15709639/184401PB/13003968/art510.dtd (No such file or directory)
	at java.io.FileInputStream.open(Native Method)
	at java.io.FileInputStream.<init>(FileInputStream.java:146)
	at java.io.FileInputStream.<init>(FileInputStream.java:101)
	at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
	at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:616)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1290)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1257)
	at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:260)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1162)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1050)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:938)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
	at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:846)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:775)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:628)
	at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:396)
	... 14 more
Aug 25, 2014 8:16:43 AM com.xmlcalabash.util.DefaultXProcMessageListener error
SEVERE: err:XC0011:Could not load file:/Users/nicg/Projects/british-library/supplied/inputs/BLC00430/15709639/184401PB/13003968/main.xml (file:/Users/nicg/Projects/british-library/load-files-test.xpl) dtd-validate=true
Aug 25, 2014 8:16:43 AM com.xmlcalabash.drivers.Main error
SEVERE: Unknown error
com.xmlcalabash.core.XProcException: Could not load file:/Users/nicg/Projects/british-library/supplied/inputs/BLC00430/15709639/184401PB/13003968/main.xml (file:/Users/nicg/Projects/british-library/load-files-test.xpl) dtd-validate=true
	at com.xmlcalabash.core.XProcException.stepError(XProcException.java:184)
	at com.xmlcalabash.library.Load.run(Load.java:77)
	at com.xmlcalabash.runtime.XAtomicStep.run(Unknown Source)
	at com.xmlcalabash.runtime.XForEach.run(XForEach.java:116)
	at com.xmlcalabash.runtime.XPipeline.doRun(XPipeline.java:235)
	at com.xmlcalabash.runtime.XPipeline.run(XPipeline.java:135)
	at com.xmlcalabash.drivers.Main.run(Main.java:326)
	at com.xmlcalabash.drivers.Main.run(Main.java:96)
	at com.xmlcalabash.drivers.Main.main(Main.java:80)


Other tests have shown me that the resolver is working. The right DTD is referenced in the catalog. 
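
For anyone unfamiliar with the setup, a catalog that maps a DTD like this is of roughly the following shape. This is only a sketch: the public identifier and the uri values are placeholders, not the entries from my actual catalog.xml.

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
	<!-- placeholder public identifier mapped to a local copy of the DTD -->
	<public publicId="-//EXAMPLE//DTD Article 5.1.0//EN" uri="art510.dtd"/>
	<!-- matches any system identifier ending in art510.dtd, however it was made absolute -->
	<systemSuffix systemIdSuffix="art510.dtd" uri="art510.dtd"/>
</catalog>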

So… anyone got any suggestions as to where I go next?

thanks

nic

* yeah, I know. Not my idea of sensible.
--
Corbas Consulting / @CorbasLtd
Digital Publishing Consultancy and Training
http://www.corbas.co.uk, +44 (0)7718 906817/+44 (0)1273 930765

Received on Monday, 25 August 2014 07:21:44 UTC