- From: Christophe Marchand <cmarchand@clever-age.com>
- Date: Tue, 8 Oct 2024 19:59:09 +0200
- To: xproc-dev@w3.org
- Message-ID: <99f7dc93-d140-44dc-877d-d0e306da3b02@clever-age.com>
I suppose your .docx file is a valid ZIP file that you are able to unzip with the unzip command?

Christophe

On 08/10/2024 at 19:30, Matthieu RICAUD-DUSSARGET wrote:
>
> Hi,
>
> I have to convert a large number of docx files into a specific XML format.
>
> I wrote the XSLT that converts myFile.docx!/word/document.xml after
> extracting it manually.
>
> I'd like to use XProc to loop over a full directory of docx files,
> extract each document.xml, apply the XSLT and validate the result.
>
> After looping over each file of the directory I do:
>
> <p:variable name="docx.uri" select="ancestor::c:directory/base-uri(.) || c:file/base-uri(.)"/>
>
> <p:load href="{docx.uri}" name="load" content-type="application/vnd.openxmlformats-officedocument.wordprocessingml.document"/>
>
> <p:unarchive>
>   <p:with-input>
>     <p:pipe step="load" port="result"/>
>   </p:with-input>
> </p:unarchive>
>
> At this point (p:unarchive) I get an XC0085 error: Error processing
> ZIP archive: zip END header not found
>
> I tried different content types, like application/zip, but I still get
> the same error.
>
> Does that mean it's not possible to extract a .docx archive just like a
> zip archive?
>
> I was confident XProc could do that.
>
> Or did I miss something here?
>
> I'm using MorganaXProc-III 1.2.3
>
> By the way, most of the files I have are .doc, not .docx, so if
> extraction from docx has a solution, I'll first have to convert them
> to docx (it seems there's a Python script for that; I guess I can't do
> it from XProc?)
>
> Thanks in advance for your help,
>
> Cheers,
>
> Matthieu Ricaud-Dussarget
>
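
One way to answer the question at the top of this reply for a whole directory, rather than running unzip by hand on each file, is a small script. What follows is only a sketch in Python, outside XProc: the function name check_docx and the command-line handling are made up for illustration, and it only assumes that a usable .docx is a ZIP archive containing an entry named word/document.xml, as in the original message.

```python
import sys
import zipfile
from pathlib import Path


def check_docx(path):
    """Return (ok, message) saying whether `path` looks like a usable .docx.

    Hypothetical helper for illustration: it checks that the file is a
    ZIP archive, that no entry is corrupt, and that word/document.xml exists.
    """
    p = Path(path)
    if not p.is_file():
        return False, "not a file"
    # is_zipfile looks for the ZIP end-of-central-directory record
    if not zipfile.is_zipfile(str(p)):
        return False, "not a ZIP archive"
    with zipfile.ZipFile(str(p)) as zf:
        # testzip() returns the name of the first corrupt entry, or None
        bad = zf.testzip()
        if bad is not None:
            return False, "corrupt entry: " + bad
        if "word/document.xml" not in zf.namelist():
            return False, "no word/document.xml entry"
    return True, "ok"


if __name__ == "__main__":
    # Usage (assumed): python check_docx.py path/to/directory
    for docx in sorted(Path(sys.argv[1]).glob("*.docx")):
        ok, message = check_docx(docx)
        print(docx.name, "->", message)
```

If every file reports "ok" here but p:unarchive still fails, the archive itself is probably not the cause, and the way the document reaches the step would be the next thing to look at.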
Received on Tuesday, 8 October 2024 17:59:23 UTC