
Announcing SIMILE Gadget

From: Stefano Mazzocchi <stefano@apache.org>
Date: Thu, 15 Jul 2004 00:40:43 -0400
Message-ID: <40F60ACB.4080000@apache.org>
To: Simile General <www-rdf-dspace@w3.org>
One of the recurring tasks for the SIMILE project is the translation of 
large quantities of XML into RDF.

The problem with large quantities of XML is that you can't simply load 
it into a browser to see what it looks like and, given the nature of XML, 
normal text-oriented tools such as split, sort, grep, sed and uniq 
don't work so well. Ah, and I have a problem with Perl.

On the other hand, general XML processing tools such as XSLT 
transformers and DOM-based parsers require too much memory to 
comfortably load hundreds of MB of XML.

Thus the need for a better tool and here it is: Gadget is an XML 
inspector (pun intended :-)
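
To give a sense of how a large file can be inspected without loading it 
whole, here is a minimal sketch in Python using the standard library's 
streaming parser. It is only an illustration of the idea, not Gadget's 
code (this message does not describe how Gadget works internally), and 
the function name count_paths and the sample data are made up.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_paths(source):
    """Tally how often each element path occurs while streaming the input."""
    counts = Counter()
    stack = []  # currently open elements, root first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            counts["/" + "/".join(e.tag for e in stack)] += 1
        else:
            stack.pop()
            if stack:
                # Detach the finished element from its parent so the
                # in-memory tree never grows beyond the open elements.
                stack[-1].remove(elem)
    return counts


# Tiny invented sample; a real run would pass a file name instead.
sample = io.BytesIO(
    b"<records><record><title>a</title></record>"
    b"<record><title>b</title><date>1999</date></record></records>"
)
for path, n in sorted(count_paths(sample).items()):
    print(n, path)
```

Because each element is dropped as soon as its end tag is seen, memory 
use depends on the nesting depth rather than on the file size.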

Find more at

  http://simile.mit.edu/gadget/index.html

including the inspections of the following datasets:

  - ARTstor -> 260MB
  - Harvard VIA -> 80MB
  - Ubi Erat Lupa -> 1MB
  - MIT OpenCourseWare -> 3MB
  - GCIDE English Dictionary -> 110MB
  - Gateway of Educational Material -> 105MB

Comments/feedback/criticism/patches/improvements will be greatly 
appreciated.

Known limitations:

  1) namespace support is fake: I assume that the prefix is consistent 
throughout the corpus. This will go away soon.

  2) no special handling of RDF/XML, again something that will have to 
be addressed soon.
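
On limitation 1: in XML it is the namespace URI, not the prefix, that 
identifies a name, so two files in a corpus may bind different prefixes 
to the same namespace. A small Python sketch of the issue (not part of 
Gadget; both sample fragments are invented):

```python
import xml.etree.ElementTree as ET

# Two invented fragments binding different prefixes to the same namespace URI.
doc_a = '<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">A</dc:title>'
doc_b = '<x:title xmlns:x="http://purl.org/dc/elements/1.1/">B</x:title>'

tag_a = ET.fromstring(doc_a).tag
tag_b = ET.fromstring(doc_b).tag

# ElementTree expands each prefix to "{uri}local", so the two tags compare
# equal even though the prefixes in the source text differ.
print(tag_a)
print(tag_a == tag_b)
```

A tool that keys on the literal prefix would treat dc:title and x:title 
as two different elements.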

Enjoy.

-- 
Stefano Mazzocchi
Research Scientist                 Digital Libraries Research Group
Massachusetts Institute of Technology            location: E25-131C
77 Massachusetts Ave                   telephone: +1 (617) 253-1096
Cambridge, MA  02139-4307              email: stefanom at mit . edu
-------------------------------------------------------------------

NB: sending with my apache address since apparently this list doesn't 
like my MIT one.


Received on Thursday, 15 July 2004 00:42:10 EDT
