- From: Bernhard Haslhofer <bernhard.haslhofer@univie.ac.at>
- Date: Mon, 28 Apr 2008 22:26:45 +0200
- To: Tim Berners-Lee <timbl@w3.org>
- Cc: bernhard.schandl@univie.ac.at, SW-forum Web <semantic-web@w3.org>, MacKenzie Smith <kenzie@MIT.EDU>
Hi Tim,

Thanks for your response. It is great to get some comments on our work :-)

On Apr 27, 2008, at 2:51 AM, Tim Berners-Lee wrote:

> Bernhardt and Bernhardt,
>
> I saw your article chumped on the SWIG IRC channel.
> I had been looking for almost exactly what you have produced, to
> get into dspace and eprints systems.
>
> 1. Is it not practical to make a general gateway which, by
> including the whole URI of the OAI endpoint in the URI in the
> linked data mapping, I could use the gateway to access LOD about
> any OAI resource in the world?
>
> I wonder whether it is the fact that you have to cache most of the
> site. Why is that, for speed, or because you can't get all the
> links you want by asking the OAI server, and so you have to have
> a copy of the data as a graph? Could those aspects of the data
> which can be got from an OAI fetch be proxied at LOD request time,
> and not cached permanently, to save memory?
>

We use the caching approach for two main reasons:

1.) We also want to provide selective access to the metadata via SPARQL. With OAI-PMH you can only fetch individual records or a whole list of records and then apply any selection criteria on the client side, which is not an optimal solution. Of course, OAI-PMH was never meant to be a protocol that supports structured queries; for that, the DL community has other protocols. But the fact is that many small and medium-size institutions run OAI-PMH and will most likely never provide any DL query protocol. Therefore we decided to take what is there and simply cache the metadata.

2.) For linking data, we must analyse the source data set (i.e., the metadata coming from an OAI-PMH provider) and the target data set. For each linked target data source we must fetch the source data set at least once from the OAI-PMH data provider, so we already have the data on the client side and can also keep it stored.
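To make reason 1.) concrete: with plain OAI-PMH, any selection by field value has to happen on the harvesting side. The following is a minimal Python sketch, assuming an oai_dc ListRecords response that has already been fetched into a string (the two-record `SAMPLE` is made up, and paging via resumptionToken is omitted). It parses the response with the standard library and filters locally, which is exactly the work a SPARQL endpoint over the cached metadata takes off the client.

```python
import xml.etree.ElementTree as ET

# Namespaces from the OAI-PMH 2.0 and oai_dc schemas.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical two-record ListRecords response (resumptionToken omitted).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
<ListRecords>
 <record>
  <header><identifier>oai:example.org:1</identifier></header>
  <metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
              xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Map of Vienna</dc:title>
    <dc:subject>Maps</dc:subject>
   </oai_dc:dc>
  </metadata>
 </record>
 <record>
  <header><identifier>oai:example.org:2</identifier></header>
  <metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
              xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Street scene</dc:title>
    <dc:subject>Photographs</dc:subject>
   </oai_dc:dc>
  </metadata>
 </record>
</ListRecords>
</OAI-PMH>"""


def parse_records(xml_text: str) -> list[dict]:
    """Flatten each record into {"oai_identifier": ..., "dc": {field: [values]}}."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        fields: dict[str, list[str]] = {}
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:  # deleted records carry no metadata
            for child in dc:
                name = child.tag.split("}", 1)[1]  # "{ns}title" -> "title"
                fields.setdefault(name, []).append((child.text or "").strip())
        records.append({
            "oai_identifier": rec.findtext("oai:header/oai:identifier",
                                           namespaces=NS),
            "dc": fields,
        })
    return records


def select(records: list[dict], field: str, value: str) -> list[str]:
    """Client-side selection: OAI-PMH has no query by field value."""
    return [r["oai_identifier"] for r in records
            if value in r["dc"].get(field, [])]


if __name__ == "__main__":
    print(select(parse_records(SAMPLE), "subject", "Maps"))
```

Every request of this kind downloads and parses the whole list before discarding most of it. Caching the harvested records once and querying them is what makes selective access cheap.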
Furthermore, we must extend the existing OAI-PMH metadata with links to other data, and we must store these links somewhere.

But you are right: a simpler and probably also more scalable solution would be a gateway approach, where a component exposes "proper" URIs for each item, translates incoming HTTP requests into OAI-PMH-specific HTTP requests, and simply uses a stylesheet to transform the returned data into RDF, e.g.:

http://oai.lcoa1.loc.gov/resources/item/oai:lcoa1.loc.gov:loc.gdc/gcfr
->
http://memory.loc.gov/cgi-bin/oai2_0?verb=GetRecord&identifier=oai:lcoa1.loc.gov:loc.gdc/gcfr.0018_0163&metadataPrefix=oai_dc

However, the idea of OAI2LOD is also to show that HTTP, URIs and RDF (and SPARQL) can cover most of the functionality OAI-PMH provides, and additionally offer a means for structured queries. So if anybody can convince DL solution providers to adopt these technologies directly (e.g., by installing something like D2RQ on top of their relational database), there would no longer be any need for gateways or replicating solutions. But that might take some time :-)

> One interesting issue is the fact that the instance of OAI2LOD
> needs to be started with some background data. That makes an
> automatic gateway difficult, unless there is some way of extracting
> the data from the OAI server itself.

The current version (0.2) is more a demo than a production solution: after startup it simply fetches the data and caches it in memory. Future versions will have a (hopefully scalable) triple store in the backend.

>
> 2. Assuming now that you do have to run a separate OAI2LOD instance
> for each OAI endpoint, do you think it would be a good idea to make
> the convention that the URI
>
> oai:lcoa1.loc.gov:loc.gdc/gcfr.0018_0163
>
> is served from a server at a DNS ("oai" dot (the DNS name in the
> OAI URI))? Like
>
> http://oai.lcoa1.loc.gov/resources/item/oai:lcoa1.loc.gov:loc.gdc/gcfr.
>
> or even maybe like
>
> http://oai.lcoa1.loc.gov/item/loc.gdc/gcfr.
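The gateway idea above, combined with the host-naming convention proposed in point 2, comes down to two string transformations. Here is a minimal Python sketch, assuming one gateway instance per OAI-PMH endpoint; `ITEM_BASE`, `OAI_ENDPOINT` and both function names are illustrative, not part of OAI2LOD. Fetching the record and the stylesheet step that turns it into RDF are left out. The identifier is percent-encoded in the request, which is what the OAI-PMH specification asks for, so the generated URL differs superficially from the unencoded example above.

```python
from urllib.parse import urlencode

# Assumed configuration of one gateway instance, using the Library of
# Congress endpoint from the example above.
ITEM_BASE = "http://oai.lcoa1.loc.gov/resources/item/"
OAI_ENDPOINT = "http://memory.loc.gov/cgi-bin/oai2_0"


def oai_uri_to_http(oai_uri: str) -> str:
    """Proposed convention: oai:<host>:<local> is served by http://oai.<host>/."""
    parts = oai_uri.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[2]:
        raise ValueError(f"not an oai: identifier: {oai_uri}")
    host = parts[1]
    return f"http://oai.{host}/resources/item/{oai_uri}"


def item_uri_to_getrecord(item_uri: str, metadata_prefix: str = "oai_dc") -> str:
    """Translate a gateway item URI into an OAI-PMH GetRecord request URL."""
    if not item_uri.startswith(ITEM_BASE):
        raise ValueError(f"URI not served by this gateway: {item_uri}")
    identifier = item_uri[len(ITEM_BASE):]
    # urlencode percent-encodes ":" and "/" inside the identifier.
    query = urlencode({"verb": "GetRecord",
                       "identifier": identifier,
                       "metadataPrefix": metadata_prefix})
    return f"{OAI_ENDPOINT}?{query}"


if __name__ == "__main__":
    item = oai_uri_to_http("oai:lcoa1.loc.gov:loc.gdc/gcfr.0018_0163")
    print(item)
    print(item_uri_to_getrecord(item))
```

The same `oai_uri_to_http` mapping is what a client-side redirector or a generic proxy for the oai: scheme would apply before handing over to plain HTTP.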
That is actually the goal: institutions that are willing to expose their data as linked data should install an OAI2LOD instance in their own system environment and redirect/rewrite URLs so that they fit the scheme you describe above.

>
> One could build into clients a mapping redirection, or in the short
> term configure a generic proxy to do the redirection and configure
> existing browsers to use that proxy for the oai: scheme. It would
> only happen when following an oai: link, as after that the client
> would be in the world of http: names.
>
>
> 3. The use of "sameAs" to link the same work in different
> repositories. Is that really what you mean? It allows any
> properties of one URI to be associated to the other URI. So you
> can't have any properties about the work which only apply to that
> repository, like curation, persistence, etc
> I have created a sameWorkAs to get around this problem, in the
> generic resource ontology
> http://www.w3.org/2006/gen/ont#sameWorkAs
> SameWorkAs should allow one to transfer properties of the generic
> resource, like copyright holder, author, genre. But not language,
> curator, byte length, delivery format, etc, which vary repository
> by repository would not transfer across sameWorkAs.
>

With version 0.2 of OAI2LOD you can in fact configure which "linking property" you would like to use for a given linking rule. In the demos I set up (one links LOC data with DBpedia, the other links Austrian National Library data with DBpedia) we actually use rdfs:seeAlso, because we haven't found any works that are "sameAs" or "sameWorkAs" other works. The motivation for using "sameAs" was that two repositories might maintain metadata for the same books, for instance. But of course, these would not in fact be the same books but rather the same works. From the library community I know the FRBR model, which provides a set of entities distinguishing between "work", "expression", "manifestation", and "item"...
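The practical difference between the three linking properties can be made concrete. Below is a minimal Python sketch that encodes the transfer behaviour described in the quoted text: owl:sameAs carries every property across, sameWorkAs only work-level ones such as author or genre, and rdfs:seeAlso none. The `LinkingRule` class only mimics the idea of a per-rule linking property; it is hypothetical and not OAI2LOD's configuration format. The short property names in `WORK_LEVEL` are placeholders, not terms from the gen ontology.

```python
from dataclasses import dataclass

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
SAME_WORK_AS = "http://www.w3.org/2006/gen/ont#sameWorkAs"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Placeholder names for the work-level properties mentioned above
# (copyright holder, author, genre); real data would use full URIs.
WORK_LEVEL = {"copyrightHolder", "author", "genre"}

Triple = tuple[str, str, str]


@dataclass
class LinkingRule:
    """Hypothetical linking rule with a configurable link predicate."""
    name: str
    link_property: str = RDFS_SEE_ALSO


def link(rule: LinkingRule, source: str, target: str) -> Triple:
    """Emit the link triple a rule produces for one matched pair."""
    return (source, rule.link_property, target)


def transferred(triples: set[Triple], source: str, predicate: str,
                target: str) -> set[Triple]:
    """Properties of `source` that the link licenses copying onto `target`."""
    if predicate == OWL_SAME_AS:
        allowed = None            # every property transfers
    elif predicate == SAME_WORK_AS:
        allowed = WORK_LEVEL      # only properties of the abstract work
    else:
        return set()              # rdfs:seeAlso licenses no transfer
    return {(target, p, v) for (s, p, v) in triples
            if s == source and (allowed is None or p in allowed)}


if __name__ == "__main__":
    data = {("urn:a", "author", "Jane Doe"), ("urn:a", "curator", "Library A")}
    print(transferred(data, "urn:a", SAME_WORK_AS, "urn:b"))
```

Defaulting the rule to rdfs:seeAlso mirrors the choice made in the demos: it asserts a relationship without licensing any property transfer, so repository-specific facts such as curator stay where they belong.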
> The TAG discussed this issue recently.
>
> I'm on a plane or I would be tempted to try out OAI2LOD directly.
> (MacKenzie, have you tried this on MIT Dspace?)

There are also two running demos at http://www.mediaspaces.info/tools/oai2lod/. If you try it out directly, please let us know any further comments, suggestions, etc.

>
> Tim

Best,
Bernhard

--
_______________________________________________________
Research Group Multimedia Information Systems
Department of Distributed and Multimedia Systems
Faculty of Computer Science
University of Vienna

Postal Address: Liebiggasse 4/3-4, 1010 Vienna, Austria
Phone: +43 1 42 77 39635
Fax: +43 1 4277 39649
E-Mail: bernhard.haslhofer@univie.ac.at
WWW: http://www.cs.univie.ac.at/bernhard.haslhofer
Received on Monday, 28 April 2008 21:33:28 UTC