- From: Paul Tyson <phtyson@sbcglobal.net>
- Date: Thu, 22 Nov 2007 09:00:55 -0600
- To: Valentin Zacharias <Zacharias@fzi.de>
- CC: semantic-web@w3c.org
A colleague of mine wrote a gleaner in Java using HWPF from the POI project (http://poi.apache.org/) to read Word documents and the Jena toolkit (http://jena.sourceforge.net) to create RDF graphs.

In our case we decided to put the gleanable metadata in tables with a specific structure and keywords. Other options would be to put metadata in custom document properties or fields. You will want a system for embedding and reading metadata that is minimally disruptive to authors.

This approach is simple but has the disadvantage of coupling the gleaner code to the specific authoring practices, and also to a specific RDFS vocabulary. It would not be suitable for gleaning arbitrary, complex metadata from a wide variety of documents.

--Paul

Valentin Zacharias wrote:
>Hi!
>
>Are there ideas/concepts/standards/tools on how to embed RDF data into
>Microsoft Office documents (like RDFa, just ppt/xls/doc/pptx/xlsx/docx
>instead of html)?
>
>(Yeah, sounds awful - but in intranet scenarios a lot of the content is MS
>Office documents... and almost none of it is in the new XML-based formats
>pptx/xlsx/docx...)
>
>thanks
>
>valentin
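For illustration only, the table-based approach described in the reply can be sketched roughly as follows. This is not the gleaner mentioned above (that was written in Java with HWPF and Jena and is not shown here); it is a library-free Python sketch that assumes the table rows have already been extracted as (keyword, value) pairs. The keyword-to-property mapping, the function names, and the example.org document URI are all hypothetical; only the Dublin Core property URIs are real.

```python
# Hypothetical sketch: map keyword/value rows from a metadata table to N-Triples.
# The keywords and the choice of Dublin Core properties are assumptions made
# for illustration; they are not the vocabulary used by the original gleaner.

KEYWORD_TO_PROPERTY = {
    "Title": "http://purl.org/dc/elements/1.1/title",
    "Author": "http://purl.org/dc/elements/1.1/creator",
    "Date": "http://purl.org/dc/elements/1.1/date",
}


def escape_literal(text):
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def glean(document_uri, rows):
    """Return one N-Triples line for each row whose keyword is recognized."""
    triples = []
    for keyword, value in rows:
        prop = KEYWORD_TO_PROPERTY.get(keyword.strip())
        if prop is None:
            continue  # unknown keyword: outside the agreed vocabulary, so skip it
        literal = escape_literal(value.strip())
        triples.append('<%s> <%s> "%s" .' % (document_uri, prop, literal))
    return triples


# Rows as they might come out of a two-column metadata table (made-up data).
rows = [
    ("Title", "Quarterly Report"),
    ("Author", "A. Writer"),
    ("Reviewed by", "nobody"),  # not in the mapping, so it is ignored
]
for line in glean("http://example.org/docs/report.doc", rows):
    print(line)
```

The fixed mapping table is where the coupling noted in the reply shows up: the code only understands the keywords and the vocabulary agreed on in advance, so any change in authoring practice means changing the gleaner.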
Received on Friday, 23 November 2007 12:20:35 UTC