
Re: Does Calabash p:exec not preserve newline characters?

From: Alex Muir <alex.g.muir@gmail.com>
Date: Wed, 2 Mar 2011 13:26:53 +0000
Message-ID: <AANLkTink1J1FJKfKF56-jMcNwEfhoEPSdQ1UtUSj9ZXQ@mail.gmail.com>
To: XProc Dev <xproc-dev@w3.org>
Cc: Norman Walsh <ndw@nwalsh.com>
Hi,

The following code, which uses ProcessBuilder to call w3m, preserves the
newlines and writes a nicely formatted text file.

When I add this code to Calabash, replacing p.getInputStream() with the
InputStream "is" that is passed into "ProcessOutputReader(InputStream is)",
the newlines are no longer preserved.

Norm, any idea why the InputStream in Calabash is not preserving newlines in
p:exec?

Thanks much



    // Needs java.io.ByteArrayOutputStream, FileOutputStream, InputStream
    // and OutputStream.
    ProcessBuilder pb = new ProcessBuilder("w3m", "-dump", "-T",
            "text/html", tmpfile.getAbsolutePath());
    Process p = pb.start();

    System.out.println("*******************");
    ByteArrayOutputStream outStream = new ByteArrayOutputStream();
    InputStream in = p.getInputStream();
    byte[] b = new byte[1024];
    int nbytes;
    // read() returns -1 at end of stream; a short read does not mean
    // that the process has finished writing.
    while ((nbytes = in.read(b)) != -1) {
        System.out.println("Available: " + in.available());
        outStream.write(b, 0, nbytes);
    }

    // System.out.print(outStream.toString()); /* String has newlines preserved */
    OutputStream outputStream = new FileOutputStream(
            "/mnt/xslt_volume/i4ContentOutput/temp/out3.txt");
    outStream.writeTo(outputStream);
    outputStream.close();
    outStream.close();
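
One way to narrow this down might be to run the same two steps against a
known input: copy the raw bytes of a stream, then split the text on newlines
explicitly rather than on line.separator. The following is only a sketch. The
class name StreamNewlineCheck, the helpers readAll and splitLines, and the
sample text are hypothetical and not part of Calabash, and a
ByteArrayInputStream stands in for the process output.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class for checking where newlines get lost.
class StreamNewlineCheck {

    // Copy every byte from the stream until read() reports end of stream (-1).
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Split text on \r\n or \n, independent of the platform's line.separator.
    static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        try (Scanner s = new Scanner(text)) {
            s.useDelimiter("\\r\\n|\\n");
            while (s.hasNext()) {
                lines.add(s.next());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Invented sample that mixes both newline styles.
        byte[] sample = "first line\nsecond line\r\nthird line\n"
                .getBytes(StandardCharsets.UTF_8);
        String text = new String(readAll(new ByteArrayInputStream(sample)),
                StandardCharsets.UTF_8);
        for (String line : splitLines(text)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

If the bytes returned by readAll still contain newline characters when it is
given the stream from p:exec, the stream itself is intact and the loss happens
in a later step.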




On Wed, Feb 23, 2011 at 4:03 PM, Alex Muir <alex.g.muir@gmail.com> wrote:

> Hi,
>
> With the following w3m exec call, I get back wrapped lines with the
> newlines ignored, so all the formatting that w3m adds, whether in the
> screen output or when redirected with > out.txt, is lost.
>
>     <p:viewport match="/SECDocument/html/content/chunk/html">
>
>       <p:exec name="exexHTML2Text" command="/usr/bin/w3m"
>               source-is-xml="false" result-is-xml="false"
>               wrap-result-lines="true"/>
>
>     </p:viewport>
>
> I tried adjusting the code below to use a Scanner rather than a
> BufferedReader. It seems there are no newlines in the text returned from
> the InputStream "is", because the following code produces one line rather
> than the many that exist in the document.
>
> if (wrapLines) {
>     Scanner s = new Scanner(is);
>     s.useDelimiter(System.getProperty("line.separator"));
>     // s.useDelimiter("\\r\\n|\\n");
>     String line;
>
>     while (s.hasNext()) {
>         line = s.next();
>
>         if (showLines) {
>             System.err.println(line);
>         }
>         tree.addStartElement(c_line);
>         tree.startContent();
>         tree.addText(line);
>         tree.addEndElement();
>         tree.addText("\n");
>     }
>
>
> So I'm wondering whether p:exec isn't preserving newlines or whether I'm
> doing something wrong. Do I need some other option?
>
> Regards
>
> --
> Alex
> -----
> Currently:
> Freelance Software Engineer 6+ yrs exp
>  <http://www.facebook.com/pages/Bafila/125611807494851>
> Previously:
> https://sites.google.com/a/utg.edu.gm/alex/
>
>
> A Bafila, is two rivers flowing together as one:
> http://www.facebook.com/pages/Bafila/125611807494851
>
>
>
>


-- 
Alex
Received on Wednesday, 2 March 2011 13:27:35 GMT
