- From: Matthieu RICAUD-DUSSARGET <m.ricaud-dussarget@lefebvre-dalloz.fr>
- Date: Tue, 8 Oct 2024 17:30:41 +0000
- To: "xproc-dev@w3.org" <xproc-dev@w3.org>
- Message-ID: <PR3PR03MB6475A1BC91DF7643E52B2338E67E2@PR3PR03MB6475.eurprd03.prod.outlook.com>
Hi,
I have to convert a large number of docx files into a specific XML format.
I wrote the XSLT that converts myFile.docx!/word/document.xml after extracting it manually.
I'd like to use XProc to loop over a whole directory of docx files, extract each document.xml, apply the XSLT and validate the result.
After looping over each file in the directory, I do:
<p:variable name="docx.uri" select="ancestor::c:directory/base-uri(.) || c:file/base-uri(.)"/>
<p:load href="{docx.uri}" name="load" content-type=" application/vnd.openxmlformats-officedocument.wordprocessingml.document "/>
<p:unarchive>
  <p:with-input>
    <p:pipe step="load" port="result"/>
  </p:with-input>
</p:unarchive>
At this point (p:unarchive) I get an XC0085 error: Error processing ZIP archive: zip END header not found
I tried different content types, such as application/zip, but I still get the same error.
Does that mean it's not possible to extract a .docx archive just like a zip archive?
I was confident XProc could do that.
Or did I miss something here?
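For reference, a minimal and untested sketch of the loop described above, under several assumptions: the directory name docx-in/ and the stylesheet name docx2xml.xsl are placeholders, each c:file element returned by p:directory-list is assumed to carry a name attribute that resolves against its base URI, and the processor is assumed to accept a document loaded as application/zip in p:unarchive. The validation step is left out because it depends on the schema language.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="3.0">
  <p:output port="result" sequence="true"/>

  <!-- Hypothetical input directory; adjust to the real location. -->
  <p:option name="dir" select="'docx-in/'"/>

  <!-- List only the .docx files of the directory. -->
  <p:directory-list path="{$dir}" include-filter="\.docx$"/>

  <p:for-each>
    <!-- One iteration per c:file element of the listing. -->
    <p:with-input select="//c:file"/>

    <!-- Assumption: @name resolves against the base URI of c:file. -->
    <p:variable name="docx.uri"
                select="resolve-uri(/c:file/@name, base-uri(/c:file))"/>

    <!-- Note the $ in the attribute value template: {docx.uri} without it
         is an XPath name test for a child element, not a variable reference. -->
    <p:load href="{$docx.uri}" content-type="application/zip"/>

    <!-- Keep only the main document part of the package. -->
    <p:unarchive include-filter="^word/document\.xml$"/>

    <!-- Placeholder stylesheet name. -->
    <p:xslt>
      <p:with-input port="stylesheet" href="docx2xml.xsl"/>
    </p:xslt>
  </p:for-each>
</p:declare-step>
```

The include-filter on p:unarchive is there so that only word/document.xml reaches p:xslt; without it, every entry of the package would be emitted on the result port.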
I'm using MorganaXProc-III 1.2.3
By the way, most of the files I have are .doc, not .docx, so if there is a solution for extracting from docx, I'll first have to convert them to docx (it seems there's a Python script for that; I guess I can't do it from XProc?).
Thanks in advance for your help,
Cheers,
Matthieu Ricaud-Dussarget
Received on Tuesday, 8 October 2024 17:30:50 UTC