- From: Andrew Sales <andrew@andrewsales.com>
- Date: Wed, 9 Oct 2024 16:47:33 +0100
- To: "Piez, Wendell A. (Fed)" <wendell.piez@nist.gov>
- Cc: XProc Dev <xproc-dev@w3.org>
- Message-ID: <CAGD-QPzh4Bt1uWDxCicdS59CcBi6EmXCVooYh=sxO7n=MjscnQ@mail.gmail.com>
Hello,

Great stuff - anything that helps with converting OOXML is a boon.

As a Schematronist with a re-kindled interest in XProc (wannabe XProcker??), I've taken the liberty of adding links to Matthieu and Wendell's excellent work to the Awesome Schematron repository[1].

Just let me know if the way they are referred to there should be changed in any way.

Thanks,
Andrew

[1] https://github.com/Schematron/awesome-schematron?tab=readme-ov-file#applications

On Wed, 9 Oct 2024 at 15:31, Piez, Wendell A. (Fed) <wendell.piez@nist.gov> wrote:

> Matthieu:
>
> You make an excellent point about running Schematron over the XProc. I am doing the same in my project at https://github.com/usnistgov/oscal-xproc3 - saving me many, many hours debugging.
>
> The Schematrons are also applied to the XProc files under CI/CD, i.e. whenever they are pushed into the repository, using Morgana under GitHub Actions. Another pipeline under CI/CD runs XSpec test suites in XProc 3.0. Everything is public domain / open source.
>
> So yes! There are lots of good ideas out there … I’ve lifted most of mine from elsewhere. 😊
>
> Although I’m not sure I’ll be using the “./” trick except *in extremis*….
>
> Can I have a Schematron to warn me when I am sending email to you or Geert when I mean to write to the list? (Don’t look now, the AIs are coming.)
>
> Regards, Wendell
>
> *From:* Matthieu RICAUD-DUSSARGET <m.ricaud-dussarget@lefebvre-dalloz.fr>
> *Sent:* Wednesday, October 9, 2024 3:28 AM
> *To:* Piez, Wendell A. (Fed) <wendell.piez@nist.gov>
> *Subject:* RE: Extract XML from docx file with xproc 3.0 p:unarchive
>
> Hi Wendell,
>
> Thanks for your response. Yes, I guess I’ll add a try/catch around the whole process, especially the XSLT, which might crash depending on the Word content.
>
> Using href="./{expr}" looks a bit strange, but I would have seen my mistake sooner ;)
>
> While coding in XSLT within Oxygen, I get a warning when using a variable name without $; this is a Schematron check.
>
> I think a good IDE for developing XProc might help avoid such typos.
>
> I did develop a Schematron to control XSLT quality (https://github.com/mricaud/xslt-quality); maybe doing the same with XProc might help!
>
> Thanks for your feedback and good ideas!
>
> *Cheers*
> *Matthieu Ricaud*
>
> *From:* Piez, Wendell A. (Fed) <wendell.piez@nist.gov>
> *Sent:* Tuesday, 8 October 2024 23:55
> *To:* Matthieu RICAUD-DUSSARGET <m.ricaud-dussarget@lefebvre-dalloz.fr>
> *Subject:* RE: Extract XML from docx file with xproc 3.0 p:unarchive
>
> [EXTERNAL mail]: Check the sender of the email carefully before clicking on links or attachments!
>
> Matthieu -- oops!
>
> I wrote also to suggest try/catch for you, but it appears the email went only to Geert. (Sorry Geert.)
>
> Probably not the last time – and of course it wouldn’t have solved this problem, only helped to mitigate similar problems caused by actual errors in inputs, not errors in the code.
>
> For that matter I have also been bitten by the fallback to read the XProc at path “”.
>
> One thing that occurred to me would be to prepend “./” to any href, making it a relative path:
>
> <p:load href="./{expr}"/>
>
> And this does error out (in Morgana) if ‘expr’ evaluates to the empty string.
>
> But part of me says this is bad form (it feels strange and awkward), and I should just rely on runtime messaging to expose the values for debugging.
>
> Comments?
>
> Regards, Wendell
>
> *From:* Matthieu RICAUD-DUSSARGET <m.ricaud-dussarget@lefebvre-dalloz.fr>
> *Sent:* Tuesday, October 8, 2024 5:00 PM
> *To:* list.mu@c-moria.com; xproc-dev@w3.org
> *Subject:* RE: Extract XML from docx file with xproc 3.0 p:unarchive
>
> Hi all,
>
> Thanks for your responses!
>
> Christophe: yes, the only docx file I have for the moment in the directory is valid: I can open it with 7-zip and get the XML document inside.
> It’s actually a .doc which I have converted with my MS Word (“save as docx”).
>
> Geert, thanks for all the details; yes, I might be interested in your script (though I have about 4 million .docx files to convert!).
> I’ll also have a look at Aspose.
>
> About my pipeline: thanks to your help I found the problem, which was ... so silly!
> I forgot a $ before the docx.uri variable reference in <p:load href="{docx.uri}" … />
> => The href was empty, so I guess the fallback is to take the current XProc file as default, which raises err:XC0085 "Cannot process document with media-type 'application/xproc+xml' as a ZIP archive".
> Then I added an explicit binding to understand what was going on, and specified (and forced) the content-type …
> I finally got err:XC0085 "Error processing ZIP archive: zip END header not found", which made me even more confused!
>
> Sorry for that, guys!
>
> As a reminder for later, here is my really simple xpl that works (it displays the XML inside the docx):
>
> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>     xmlns:c="http://www.w3.org/ns/xproc-step"
>     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>     version="3.0">
>
>   <p:input port="source" sequence="true"/>
>   <p:output port="result" sequence="true"/>
>
>   <p:option name="input-dir" select="resolve-uri('../../test/input-word', static-base-uri())" as="xs:string"/>
>
>   <p:directory-list path="{$input-dir}" include-filter="\.docx$"/>
>
>   <p:for-each>
>     <p:with-input select="//c:file"/>
>     <p:variable name="docx.uri" select="c:file/base-uri(.)"/>
>     <p:identity message="Processing {$docx.uri}"/>
>     <p:load href="{$docx.uri}"/>
>     <p:unarchive format="zip" include-filter="word/document\.xml"/>
>   </p:for-each>
>
> </p:declare-step>
>
> *Cheers,*
> *Matthieu Ricaud*
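
The try/catch idea discussed in the thread could look roughly like the sketch below. It is an illustration only, not a tested pipeline from any of the posters: it reuses the steps and options of Matthieu's working pipeline, and the decision to log a message and emit nothing for a file that fails is an assumption made here for the example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:c="http://www.w3.org/ns/xproc-step"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="3.0">

  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>

  <p:option name="input-dir"
      select="resolve-uri('../../test/input-word', static-base-uri())"
      as="xs:string"/>

  <p:directory-list path="{$input-dir}" include-filter="\.docx$"/>

  <p:for-each>
    <p:with-input select="//c:file"/>
    <p:variable name="docx.uri" select="c:file/base-uri(.)"/>

    <p:try>
      <!-- Steps that may fail on a damaged or unexpected input file -->
      <p:load href="{$docx.uri}"/>
      <p:unarchive format="zip" include-filter="word/document\.xml"/>

      <!-- Assumption for this sketch: on any error, report the file
           and produce an empty result instead of stopping the run -->
      <p:catch>
        <p:identity message="Skipping {$docx.uri}: load or unarchive failed">
          <p:with-input><p:empty/></p:with-input>
        </p:identity>
      </p:catch>
    </p:try>
  </p:for-each>

</p:declare-step>
```

With this shape, one bad .docx is skipped rather than aborting a large batch. As Wendell points out, it would not have fixed the missing `$`: that is an error in the code, and a catch-all like this would only have turned it into a series of skipped files, so the message text matters for spotting it.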
Received on Thursday, 10 October 2024 10:21:27 UTC