- From: Conal Tuohy <ctuohy@unimelb.edu.au>
- Date: Thu, 18 Feb 2010 08:31:26 +1100 (EST)
- To: Stefanie Haupt <st.haupt@gmail.com>
- Cc: xproc-dev@w3.org
Hi Stefanie,

I think you've misinterpreted p:http-request/@encoding, actually. If you know your HTML files use windows-1252, I suggest you http-request them as binary files (which you will receive as base64-encoded bytestreams), and then pass the result to p:unescape-markup, specifying a charset at that time.

Incidentally, Calabash uses tagsoup to parse HTML, so you may well not need html tidy at all.

Cheers

Con

> Hi all,
>
> I have some messy encoded HTML data which I want to process in a first
> step with html tidy and then do some more operations controlled by a
> xproc pipeline. Since it's more than one file I understand I use
> p:http-request in combination with file protocol (since it's local
> data).
> So I thought of using try/catch but the try group part either is ignored
> or never true as the catch part is invoked for all files. Can you please
> have a look and tell me what I'm doing wrong here?
>
> I'm using Calabash from within <oXygen/> XML Editor 11.1, build
> 2009121712 on Linux (Ubuntu).
>
> <p:try>
>   <p:group>
>     <p:http-request encoding="windows-1252"/>
>     <p:exec command="/usr/bin/tidy" source-is-xml="false"
>             result-is-xml="true" wrap-result-lines="false"
>             encoding="windows-1252">
>       <p:with-option name="args" select="'--quiet yes --show-warnings no --output-xml yes --bare yes --doctype omit --numeric-entities yes --char-encoding win1252'"/>
>     </p:exec>
>     <p:exec name="iconv" command="/usr/bin/iconv" result-is-xml="true"
>             source-is-xml="true" wrap-result-lines="false"
>             encoding="windows-1252">
>       <p:with-option name="args" select="'-f WINDOWS-1252 -t UTF-8'"/>
>     </p:exec>
>   </p:group>
>
>   <p:catch>
>     <p:http-request/>
>     <p:exec command="/usr/bin/tidy" source-is-xml="false"
>             result-is-xml="true" wrap-result-lines="false">
>       <p:with-option name="args" select="'--quiet yes --show-warnings no --output-xml yes --bare yes --doctype omit --numeric-entities yes --char-encoding utf8'"/>
>     </p:exec>
>   </p:catch>
> </p:try>
>
> Many thanks for your help!
> Stefanie
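
For reference, the approach suggested at the top of this message (fetch the file as binary, then decode it with an explicit charset) might look roughly like the following. This is an untested sketch, not a verified pipeline. It assumes XProc 1.0, and that Calabash accepts a file: URI in c:request, honours override-content-type="application/octet-stream" by returning a base64-encoded c:body, and accepts content-type="text/html" on p:unescape-markup. The href is a hypothetical placeholder.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="1.0">
  <p:output port="result"/>

  <!-- Fetch the file as raw bytes. The href is a placeholder;
       override-content-type is assumed to force a base64-encoded c:body. -->
  <p:http-request>
    <p:input port="source">
      <p:inline>
        <c:request method="GET"
                   href="file:///path/to/page.html"
                   override-content-type="application/octet-stream"/>
      </p:inline>
    </p:input>
  </p:http-request>

  <!-- Decode the base64 content as windows-1252 and parse it as HTML.
       Support for text/html here is implementation-dependent. -->
  <p:unescape-markup content-type="text/html"
                     encoding="base64"
                     charset="windows-1252"/>
</p:declare-step>
```

The parsed markup ends up as the content of the c:body wrapper element, so a further step (for example p:unwrap or p:filter) would probably be needed to get at the HTML root element itself. No tidy or iconv step appears here because the charset is handled at parse time.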
Received on Thursday, 18 February 2010 13:06:14 UTC