Re: Baffling error with http-request

From: George Cristian Bina <george@oxygenxml.com>
Date: Wed, 05 Jan 2011 17:05:47 +0200
Message-ID: <4D2488CB.8070007@oxygenxml.com>
To: Tony Rogers <tony@gonk.net>
CC: Romain Deltour <rdeltour@gmail.com>, XProc Dev <xproc-dev@w3.org>

Hi again,

In case this helps, I see that in oXygen we set the source parser class 
Saxon feature to a parser that has the catalog set as entity resolver, 
something like below:

transformerFactory.setAttribute(
    net.sf.saxon.lib.FeatureKeys.SOURCE_PARSER_CLASS,
    CatalogEnabledXMLReader.class.getName());

From what I saw in the Calabash code, it sets only the URI resolver 
on the Saxon configuration, and Saxon most probably does not use that 
when it builds a document. The implementation of p:http-request uses 
Saxon to parse the content; see HttpRequest.java starting at line 872:

if (xmlContentType(partType)) {
    BufferedReader preader = new BufferedReader(
            new InputStreamReader(partStream, charset));
    // Read it as XML
    SAXSource source = new SAXSource(new InputSource(preader));
    DocumentBuilder builder = runtime.getProcessor().newDocumentBuilder();
    tree.addSubtree(builder.build(source));
}

So my guess is that the problem is in Calabash: it does not use the 
configured entity resolver when parsing the result of an HTTP request. It 
should either do something like what we do in oXygen or find another way to 
set the entity resolver on the Saxon document builder that it uses.
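
A minimal sketch of the second option, using only JDK classes: create the 
XMLReader yourself, set an entity resolver on it, and wrap it in the SAXSource 
instead of leaving the reader unset. The class name ResolverAwareParse, the 
LOCAL_ONLY resolver, and the empty stand-in DTD are hypothetical, chosen only 
for illustration; a real fix would plug in a catalog-backed resolver and hand 
the resulting SAXSource to builder.build(source) rather than parsing it 
directly as done here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ResolverAwareParse {
    // Records every system ID the parser asked for (illustration only).
    static final List<String> RESOLVED = new ArrayList<>();

    // Hypothetical stand-in for a catalog resolver: answers every external
    // entity request with empty local content, so nothing is fetched remotely.
    static final EntityResolver LOCAL_ONLY = (publicId, systemId) -> {
        RESOLVED.add(systemId);
        InputSource local = new InputSource(new StringReader(""));
        local.setSystemId(systemId);
        return local;
    };

    // Parses the XML text and returns the local name of the root element.
    static String parseRootName(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(LOCAL_ONLY);

        final String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = localName;
                }
            }
        });

        // The SAXSource carries the reader, and with it the entity resolver.
        SAXSource source =
                new SAXSource(reader, new InputSource(new StringReader(xml)));
        // Assumption: in Calabash this source would go to builder.build(source);
        // here it is parsed directly to keep the sketch free of dependencies.
        source.getXMLReader().parse(source.getInputSource());
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" "
                + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>";
        System.out.println(parseRootName(xml));
        System.out.println(RESOLVED);
    }
}
```

With an empty stand-in DTD, named entities such as &nbsp; would be undefined, 
so a real resolver needs to return the actual local copy of the XHTML DTD and 
its modules.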

Best Regards,
George
-- 
George Cristian Bina
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

On 1/5/11 3:08 PM, George Cristian Bina wrote:
> Hi Tony,
>
> Have you got this working outside oXygen?
>
> I tested putting the content of that URL
> (http://us.battle.net/sc2/en/forum/40568/) in a local file and then
> passing that through an identity step and it works ok in oXygen, the
> catalog gets called and it resolves the DTD to a local copy.
> However, if I use your sample that reads the content directly from the
> original URL I get the same error as you and the catalog is not used. So
> I am wondering if the http-request step uses the entity resolver set on
> Calabash... from these tests it seems it does not use it.
>
> Best Regards,
> George
> --
> George Cristian Bina
> <oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
> http://www.oxygenxml.com
>
> On 12/1/10 10:52 PM, Tony Rogers wrote:
>> Hello Romain & Others,
>>
>> So, according to the thread you linked (quoted below), Oxygen should not
>> be giving me the error that I am getting. It says it should pass its
>> catalogs to Calabash automatically.
>>
>> However, the default catalog options appear to be fine. I was still
>> getting the error so I started explicitly adding every XHTML 1.1
>> catalog. I did this both for the Document Type Associations settings and
>> the general XML Catalog settings. I’m still getting the same old
>> annoying error:
>>
>>     “SystemID: /Users/amrogers/Developer/Projects/oXygen_workspace/edu.umd/terpconnect/model/documents/201008/INFM298I/Final Project/xproc.xpl
>>     Engine name: Calabash XProc
>>     Severity: error
>>     Description: net.sf.saxon.s9api.SaxonApiException:
>>     org.apache.commons.httpclient.HttpException: 404 Not Found for:
>>     http://www.w3.org/TR/xhtml11/DTD/xhtml-datatypes-1.mod 404 Not Found
>>     for: http://www.w3.org/TR/xhtml11/DTD/xhtml-datatypes-1.mod”
>>
>>
>> I don’t know what else to do. This is just a simple HTTP request but
>> everything in my pipeline (and this class project) depends on it. And
>> the fact that it mentions a SaxonApiException does not comfort me at all
>> either…
>>
>> —Tony
>>
>>
>> On Nov 29, 2010, at 11:05 AM, Romain Deltour wrote:
>>
>>> See this thread for how to use an XML catalog with Calabash (also when
>>> used with oXygen):
>>> http://markmail.org/thread/che45zm7vge3p5ka
>>>
>>> BR,
>>> Romain.
>>>
>>> Le 29 nov. 10 à 16:43, Inigo Surguy a écrit :
>>>
>>>> Look at the source of http://us.battle.net/sc2/en/forum/ - it has a
>>>> reference to http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd at the top
>>>> in its doctype:
>>>>
>>>> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
>>>> "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
>>>>
>>>> That file, in turn, references xhtml-datatypes-1.mod.
>>>>
>>>> Calabash (or rather, the underlying XML parser) is trying to load the
>>>> doctype to see if there's anything relevant to the XML within it.
>>>>
>>>> Vojtech is right - you should use an XML catalog so a local version of
>>>> the doctype is used. I'm afraid I don't know offhand how to set that
>>>> up with Calabash, but I hope I've made it a bit clearer why it's going
>>>> wrong at least.
>>>>
>>>> Cheers
>>>>
>>>> Inigo
>>>>
>>>> On Mon, Nov 29, 2010 at 3:32 PM, Tony Rogers <tony@gonk.net> wrote:
>>>>>
>>>>> On Nov 29, 2010, at 12:25 AM, Andrew Welch wrote:
>>>>>
>>>>> There's nothing at the end of the url..... Try getting that file
>>>>> yourself.
>>>>>
>>>>> No, that's just it…I requested no files from w3.org anywhere in my
>>>>> pipeline.
>>>>>
>>>>> I have no idea why that URL is popping up in the error.
>>>>>
>>>>
>>>
>>>
>>
Received on Wednesday, 5 January 2011 15:06:51 GMT