- From: <list.mu@c-moria.com>
- Date: Tue, 8 Oct 2024 20:53:48 +0200
- To: "'Matthieu RICAUD-DUSSARGET'" <m.ricaud-dussarget@lefebvre-dalloz.fr>, <xproc-dev@w3.org>
- Message-ID: <014601db19b3$69c2b510$3d481f30$@c-moria.com>
You don't need to connect the ports in XProc 3. So your example could be as simple as:

    <p:variable name="docx.uri" select="ancestor::c:directory/base-uri(.) || c:file/base-uri(.)"/>
    <p:load href="{$docx.uri}"/>
    <p:unarchive/>

Though, if all you need is the document.xml from the docx package, you could put that one in an include filter, so you get only one document out of the archive into the next step:

    <p:variable name="docx.uri" select="ancestor::c:directory/base-uri(.) || c:file/base-uri(.)"/>
    <p:load href="{$docx.uri}"/>
    <p:unarchive format="zip">
      <p:with-option name="include-filter" select="('word/document\.xml')"/>
    </p:unarchive>

Good luck,
Geert

From: Matthieu RICAUD-DUSSARGET <m.ricaud-dussarget@lefebvre-dalloz.fr>
Sent: Tuesday, 8 October 2024 19:31
To: xproc-dev@w3.org
Subject: Extract XML from docx file with xproc 3.0 p:unarchive

Hi,

I have to convert a large number of docx files into a specific XML format. I wrote the XSLT that converts myFile.docx!/word/document.xml after extracting it manually. I'd like to use XProc to loop over a full directory of docx files, extract each document.xml, apply the XSLT, and validate the result.

After looping over each file of the directory, I do:

    <p:variable name="docx.uri" select="ancestor::c:directory/base-uri(.) || c:file/base-uri(.)"/>
    <p:load href="{docx.uri}" name="load" content-type=" application/vnd.openxmlformats-officedocument.wordprocessingml.document "/>
    <p:unarchive>
      <p:with-input>
        <p:pipe step="load" port="result"/>
      </p:with-input>
    </p:unarchive>

At this point (p:unarchive) I get an XC0085 error:

    Error processing ZIP archive: zip END header not found

I tried different content types, such as application/zip, but I still get the same error. Does that mean it's not possible to extract a .docx archive just like a zip archive? I was confident XProc could do that. Or did I miss something here?
I'm using MorganaXProc-III 1.2.3.

By the way, most of the files I have are .doc, not .docx, so if there is a solution for extracting from docx, I'll first have to convert them to docx (it seems there's a Python script for that; I guess I can't do it from XProc?).

Thanks in advance for your help,
Cheers,
Matthieu Ricaud-Dussarget
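Putting Geert's include-filter approach together with the loop Matthieu describes (list the directory, extract word/document.xml, transform, validate), a complete XProc 3.0 pipeline could look roughly like the untested sketch below. Several names are assumptions, not anything from the thread: the `docx-dir` option, the stylesheet `docx2target.xsl`, the schema `target.xsd`, and the choice to store each result as `.xml` next to its `.docx`. Instead of the `base-uri()` expression used above, it resolves each file from the `name` attribute of the `c:file` elements that `p:directory-list` produces. Note that a variable inside an attribute value template must be written with a `$`, as in `{$docx.uri}`; a bare `{docx.uri}` is parsed as a path step selecting child elements named `docx.uri`, which normally yields an empty sequence.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="3.0">

  <p:output port="result" sequence="true"/>

  <!-- Hypothetical option: folder of .docx files, relative to this
       pipeline. It must end with "/" so resolve-uri() treats it as a folder. -->
  <p:option name="docx-dir" select="'docx/'"/>

  <!-- Absolute URI of the folder, used below to resolve each file name. -->
  <p:variable name="dir" select="resolve-uri($docx-dir, static-base-uri())"/>

  <!-- One c:file element per file whose name matches the regex. -->
  <p:directory-list path="{$dir}" include-filter="\.docx$"/>

  <!-- Each selected c:file becomes its own document in the loop. -->
  <p:for-each>
    <p:with-input select="//c:file"/>

    <!-- Absolute URI of the current .docx; referenced as {$docx.uri}. -->
    <p:variable name="docx.uri" select="resolve-uri(/c:file/@name, $dir)"/>

    <!-- Read the package as a binary zip archive. -->
    <p:load href="{$docx.uri}" content-type="application/zip"/>

    <!-- Keep only the main document part (anchored regex). -->
    <p:unarchive format="zip" include-filter="^word/document\.xml$"/>

    <!-- Hypothetical stylesheet: the XSLT that converts document.xml. -->
    <p:xslt>
      <p:with-input port="stylesheet" href="docx2target.xsl"/>
    </p:xslt>

    <!-- Hypothetical schema describing the target XML format. -->
    <p:validate-with-xml-schema>
      <p:with-input port="schema" href="target.xsd"/>
    </p:validate-with-xml-schema>

    <!-- Hypothetical naming scheme: write foo.xml next to foo.docx. -->
    <p:store href="{replace($docx.uri, '\.docx$', '.xml')}"/>
  </p:for-each>
</p:declare-step>
```

If the target format is described by RELAX NG or Schematron rather than XML Schema, `p:validate-with-relax-ng` or `p:validate-with-schematron` can take the place of `p:validate-with-xml-schema`.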
Received on Tuesday, 8 October 2024 18:53:56 UTC