- From: Stefano Mazzocchi <stefano@apache.org>
- Date: Thu, 15 Jul 2004 00:40:43 -0400
- To: Simile General <www-rdf-dspace@w3.org>
- Message-ID: <40F60ACB.4080000@apache.org>
One of the recurring tasks for the SIMILE project is translating large quantities of XML into RDF. The problem with large quantities of XML is that you can't simply load it into a browser to see what it looks like and, given the nature of XML, normal text-oriented tools such as split, sort, grep, sed, and uniq don't work so well. Ah, and I have a problem with Perl.

On the other hand, general XML processing tools such as XSLT transformers and DOM-based parsers require too much memory to comfortably load hundreds of MB of XML.

Thus the need for a better tool, and here it is: Gadget is an XML inspector (pun intended :-)

Find more at http://simile.mit.edu/gadget/index.html including the inspections of the following datasets:

 - ARTSTore -> 260 MB
 - Harvard VIA -> 80 MB
 - Ubi Erat Lupa -> 1 MB
 - MIT OpenCourseWare -> 3 MB
 - GCIDE English Dictionary -> 110 MB
 - Gateway of Educational Material -> 105 MB

Comments/feedback/criticism/patches/improvements will be greatly appreciated.

Known limitations:

 1) namespace support is fake: I assume that each prefix is used consistently throughout the corpus. This will go away soon.
 2) no special handling of RDF/XML, again something that will have to be addressed soon.

Enjoy.

--
Stefano Mazzocchi
Research Scientist
Digital Libraries Research Group
Massachusetts Institute of Technology
location: E25-131C 77 Massachusetts Ave, Cambridge, MA 02139-4307
telephone: +1 (617) 253-1096
email: stefanom at mit . edu
-------------------------------------------------------------------

NB: sending with my apache address since apparently this list doesn't like my MIT one.
Received on Thursday, 15 July 2004 00:42:10 UTC