Re: p:load and DTD validation

So, after far too much debugging, I discovered an annoying interaction between the shell script running this and my catalog setup, which was causing the problem.
/me hides in the corner
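
For anyone who hits the same failure: the trace shows the parser looking for art510.dtd next to each main.xml, which is what happens when the catalog is never consulted or has no entry matching the system identifier. The catalog isn't shown in this thread, so the following is only a sketch of an entry that would match on the file name alone. The systemSuffix mapping and the location of art510.dtd beside catalog.xml are assumptions, and support for systemSuffix depends on the resolver in use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Assumption: art510.dtd sits in the same directory as this catalog file. -->
  <!-- Matches any system identifier that ends in art510.dtd, whatever
       directory the parser resolved it against. -->
  <systemSuffix systemIdSuffix="art510.dtd" uri="art510.dtd"/>
</catalog>
```

The uri attribute is resolved relative to the catalog file, so the entry keeps working wherever the documents themselves live.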

nic


On 25 Aug 2014, at 08:21, Nic Gibson <nicg@corbas.co.uk> wrote:

> Morning all
> 
> I'm seeing some strange behaviour with p:load and was hoping someone could suggest a solution.
> 
> Background - I have a large set of documents to process. These documents are defined with a DTD that declares fixed attributes to fake several namespaces*. Therefore, I can’t load them without referencing the DTD.  I have calabash set up with a resolver. It’s working just fine.
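> 
> To illustrate the fixed-attribute trick (the element name and namespace here are made up, not taken from the real DTD), a declaration along these lines makes the namespace appear only once the DTD has been read:
> 
> ```xml
> <!DOCTYPE article [
>   <!ELEMENT article (#PCDATA)>
>   <!-- The parser supplies this attribute on every article element,
>        even though the instance document never writes it. -->
>   <!ATTLIST article xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink">
> ]>
> <article>text</article>
> ```
> 
> Without the DTD the attribute is never defaulted, so the namespace declaration is simply absent.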
> 
> The following extract from one of my test scripts works just fine:
> 
> <p:option name="document-uri" required="true"/>
> <p:load>
> 	<p:with-option name="href" select="$document-uri"/>
> </p:load>
> 
> The main intent of the script is to iterate over a set of directories, filter out the files I need and load them. So, I created a couple of steps to wrap this up. The output is something like:
> 
> <articles>
> 	<c:file xmlns:c="http://www.w3.org/ns/xproc-step"
> 		name="file:/Users/nicg/Projects/british-library/supplied/inputs/BLC00430/13835718/07570001/1300168X/main.xml"/>
> 	<c:file xmlns:c="http://www.w3.org/ns/xproc-step"
> 		name="file:/Users/nicg/Projects/british-library/supplied/inputs/BLC00430/13835718/07570001/13001824/main.xml"/>
> 	<c:file xmlns:c="http://www.w3.org/ns/xproc-step"
> 		name="file:/Users/nicg/Projects/british-library/supplied/inputs/BLC00430/13835718/07570001/13001848/main.xml"/>
> 	<c:file xmlns:c="http://www.w3.org/ns/xproc-step"
> 		name="file:/Users/nicg/Projects/british-library/supplied/inputs/BLC00430/13835718/07570001/13001873/main.xml"/>
> 
> 	…
> </articles>
> 
> 
> I’ve tried two approaches to loading this. Firstly, I just used the resulting document in the script, and iterated over it:
> 
> 	<p:for-each name="iterate-articles">
> 		
> 		<p:iteration-source select="//c:file"/>
> 		
> 		<p:output port="result" primary="true" sequence="true">
> 			<p:pipe port="result" step="load-article"/>
> 		</p:output>
> 		
> 		<p:load name="load-article" dtd-validate="true">
> 			<p:with-option name="href" select="/c:file/@name"/>
> 		</p:load>
> 		
> 		
> 	</p:for-each>
> 	
> That failed (failure below). So, I wrote the file listing out to disk, loaded it as an input and processed it with the same loop. Failed again. 
> 
> Having run calabash with debug on, I can see that it’s failing to load the DTD because it isn’t resolving it:
> 
> [nicg@newt-449 british-library]$ java -Dxml.catalog.files=/Users/nicg/Projects/british-library/dtd/catalog.xml -classpath "/usr/local/share/java/calabash.jar:/usr/local/share/java/xml-resolver-1.2.jar" com.xmlcalabash.drivers.Main -U org.xmlresolver.Resolver -E org.xmlresolver.Resolver -D --input source=files.xml load-files-test.xpl  
> com.xmlcalabash.core.XProcException: XProc error err:XD0011
> 	at com.xmlcalabash.core.XProcException.dynamicError(XProcException.java:176)
> 	at com.xmlcalabash.util.XProcURIResolver.parse(XProcURIResolver.java:199)
> 	at com.xmlcalabash.core.XProcRuntime.parse(XProcRuntime.java:838)
> 	at com.xmlcalabash.util.DefaultXMLCalabashConfigurer.loadDocument(DefaultXMLCalabashConfigurer.java:45)
> 	at com.xmlcalabash.library.Load.run(Load.java:67)
> 	at com.xmlcalabash.runtime.XAtomicStep.run(Unknown Source)
> 	at com.xmlcalabash.runtime.XForEach.run(XForEach.java:116)
> 	at com.xmlcalabash.runtime.XPipeline.doRun(XPipeline.java:235)
> 	at com.xmlcalabash.runtime.XPipeline.run(XPipeline.java:135)
> 	at com.xmlcalabash.drivers.Main.run(Main.java:326)
> 	at com.xmlcalabash.drivers.Main.run(Main.java:96)
> 	at com.xmlcalabash.drivers.Main.main(Main.java:80)
> Caused by: net.sf.saxon.s9api.SaxonApiException: I/O error reported by XML parser processing file:/Users/nicg/Projects/british-library/supplied/inputs/BLC00430/15709639/184401PB/13003968/main.xml: /Users/nicg/Projects/british-library/supplied/inputs/BLC00430/15709639/184401PB/13003968/art510.dtd (No such file or directory)
> 	at net.sf.saxon.s9api.DocumentBuilder.build(DocumentBuilder.java:380)
> 	at com.xmlcalabash.util.XProcURIResolver.parse(XProcURIResolver.java:191)
> 	... 10 more
> Caused by: net.sf.saxon.trans.XPathException: I/O error reported by XML parser processing file:/Users/nicg/Projects/british-library/supplied/inputs/BLC00430/15709639/184401PB/13003968/main.xml: /Users/nicg/Projects/british-library/supplied/inputs/BLC00430/15709639/184401PB/13003968/art510.dtd (No such file or directory)
> 	at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:427)
> 	at net.sf.saxon.event.Sender.send(Sender.java:143)
> 	at net.sf.saxon.Configuration.buildDocument(Configuration.java:3348)
> 	at net.sf.saxon.s9api.DocumentBuilder.build(DocumentBuilder.java:377)
> 	... 11 more
> Caused by: java.io.FileNotFoundException: /Users/nicg/Projects/british-library/supplied/inputs/BLC00430/15709639/184401PB/13003968/art510.dtd (No such file or directory)
> 	at java.io.FileInputStream.open(Native Method)
> 	at java.io.FileInputStream.<init>(FileInputStream.java:146)
> 	at java.io.FileInputStream.<init>(FileInputStream.java:101)
> 	at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
> 	at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
> 	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:616)
> 	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1290)
> 	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1257)
> 	at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:260)
> 	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1162)
> 	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1050)
> 	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:938)
> 	at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
> 	at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:116)
> 	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
> 	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:846)
> 	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:775)
> 	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
> 	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
> 	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:628)
> 	at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:396)
> 	... 14 more
> Aug 25, 2014 8:16:43 AM com.xmlcalabash.util.DefaultXProcMessageListener error
> SEVERE: err:XC0011:Could not load file:/Users/nicg/Projects/british-library/supplied/inputs/BLC00430/15709639/184401PB/13003968/main.xml (file:/Users/nicg/Projects/british-library/load-files-test.xpl) dtd-validate=true
> Aug 25, 2014 8:16:43 AM com.xmlcalabash.drivers.Main error
> SEVERE: Unknown error
> com.xmlcalabash.core.XProcException: Could not load file:/Users/nicg/Projects/british-library/supplied/inputs/BLC00430/15709639/184401PB/13003968/main.xml (file:/Users/nicg/Projects/british-library/load-files-test.xpl) dtd-validate=true
> 	at com.xmlcalabash.core.XProcException.stepError(XProcException.java:184)
> 	at com.xmlcalabash.library.Load.run(Load.java:77)
> 	at com.xmlcalabash.runtime.XAtomicStep.run(Unknown Source)
> 	at com.xmlcalabash.runtime.XForEach.run(XForEach.java:116)
> 	at com.xmlcalabash.runtime.XPipeline.doRun(XPipeline.java:235)
> 	at com.xmlcalabash.runtime.XPipeline.run(XPipeline.java:135)
> 	at com.xmlcalabash.drivers.Main.run(Main.java:326)
> 	at com.xmlcalabash.drivers.Main.run(Main.java:96)
> 	at com.xmlcalabash.drivers.Main.main(Main.java:80)
> 
> 
> Other tests have shown me that the resolver is working. The right DTD is referenced in the catalog. 
> 
> So… anyone got any suggestions as to where I go next?
> 
> thanks
> 
> nic
> 
> * yeah, I know. Not my idea of sensible.
> --
> Corbas Consulting / @CorbasLtd
> Digital Publishing Consultancy and Training
> http://www.corbas.co.uk, +44 (0)7718 906817/+44 (0)1273 930765
> 
> 


Received on Wednesday, 3 September 2014 10:19:01 UTC