RE: Bibframe to Schema.org conversion examples

From: Hess, Kirk <khes@loc.gov>
Date: Tue, 5 Mar 2019 19:11:01 +0000
To: "public-bibframe2schema@w3.org" <public-bibframe2schema@w3.org>
Message-ID: <5098efd59a744032b9d7b7ee6831f1ba@LCXEX02.LCDS.LOC.GOV>
First, I wanted to point out that we are continually updating the marc2bibframe2 conversion (https://github.com/lcnetdev/marc2bibframe2). Our most recent release, v1.4.0, includes the specifications in the repository's spec directory; these are Excel spreadsheets that we use to modify the stylesheets. We're also trying to write unit tests for all the permutations in the /test directory. As Osma mentioned, other versions of BIBFRAME are out there, which may or may not have specifications you can follow, but hopefully this helps everyone understand how a particular MARC field was mapped to BIBFRAME.

On Richard's question about mapping to schema:Book: the idea is that you would have a Work of type bf:Text that bf:hasInstance an Instance of type bf:Print, or bf:Electronic for an e-book, whereas a bf:Text with a bf:Manuscript instance would not be a book. Unfortunately, looking into this, I don't think we're using bf:Print that way in our specs: generally you get a bf:Instance without a subclass. So in the sample data set we published last May (http://id.loc.gov/dowloads) you won't find many bf:Print instances, since that class was only used for the reprint 008 value. We're going to make this work better in the next few months; since we're developing the BIBFRAME to MARC conversion, we'll need to generate the correct 008 values.

For now, you could simply map bf:Text to schema:Book when the instance isn't a manuscript, and that would probably be right. To be more precise, you could write a more complex query that looks at instance values such as issuance, form, format, content, media, or carrier type.
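A minimal Python sketch of the rule described above, assuming the BIBFRAME graph has already been loaded as a set of (subject, predicate, object) tuples using prefixed names. The helper names and the `ex:` identifiers are hypothetical. With `strict=True` it requires a bf:Print or bf:Electronic instance (the intended modelling); the default accepts any non-manuscript instance of a bf:Text work, which suits data where most instances are a plain bf:Instance.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def types_of(graph: Set[Triple], node: str) -> Set[str]:
    """All rdf:type values asserted for a node."""
    return {o for (s, p, o) in graph if s == node and p == RDF_TYPE}


def book_instances(graph: Set[Triple], strict: bool = False) -> Set[str]:
    """Hypothetical helper: Instances that should also be typed schema:Book."""
    books = set()
    for work, pred, instance in graph:
        if pred != "bf:hasInstance":
            continue
        if "bf:Text" not in types_of(graph, work):
            continue  # only textual Works are candidates
        inst_types = types_of(graph, instance)
        if "bf:Manuscript" in inst_types:
            continue  # textual, but not a book
        if strict and not inst_types & {"bf:Print", "bf:Electronic"}:
            continue  # strict mode needs the explicit Instance subclass
        books.add(instance)
    return books
```

The two modes reflect the trade-off in the message: strict mode is more precise but finds little in data where the Instance subclass is rarely set.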

Hope that helps!


-----Original Message-----
From: Osma Suominen <osma.suominen@helsinki.fi> 
Sent: Tuesday, March 05, 2019 8:08 AM
To: public-bibframe2schema@w3.org
Subject: Re: Bibframe to Schema.org conversion examples

Hi Richard, all,

Thanks for sharing your SPARQL conversion queries! They are quite similar to the big CONSTRUCT query I'm using in the NLF Fennica conversion pipeline [1], though mine takes the "single big query" approach, which has its problems. For example, I'm not sure how reusable it is (though I know it has been used with some success on Hungarian bibliographic data; see [2]), and its performance can be pretty bad when the source data set contains lots of BIBFRAME entities.

Anyone is free to use my SPARQL query/script for any purpose; the whole GitHub repo is CC0-licensed. What would be the best way to share it with the bibframe2schema.org community?

I'm also interested in the question of types. I've long intended to separate out some more specific types in the Schema.org output we produce for Fennica, but haven't gotten around to it. Just picking out Map, AudioObject and ImageObject would be a good start. I don't currently have any ideas about how to identify Book entities, but perhaps separating out the specific types you know and defaulting to Book for the rest could work, at least for a typical, traditional bibliographic data set. (Then again, such data sets can include brochures and board games, for example; this approach would have to be tested with real data.)
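The "specific types first, Book as the default" idea can be sketched as a lookup table with a fallback. The BIBFRAME-class-to-Schema.org table here is an assumption chosen for illustration (it is not an agreed mapping), and it only looks at the Work's classes.

```python
from typing import Iterable, List

# Assumed, illustrative mapping from BIBFRAME Work classes to Schema.org types.
SPECIFIC_TYPES = {
    "bf:Cartography": "schema:Map",
    "bf:Audio": "schema:AudioObject",
    "bf:StillImage": "schema:ImageObject",
    "bf:MovingImage": "schema:VideoObject",
    "bf:Dataset": "schema:Dataset",
}
DEFAULT_TYPE = "schema:Book"


def schema_types(bf_types: Iterable[str]) -> List[str]:
    """Return the specific Schema.org types we recognise, else fall back to Book."""
    found = {SPECIFIC_TYPES[t] for t in bf_types if t in SPECIFIC_TYPES}
    # sorted() gives a stable order; an empty result falls through to the default
    return sorted(found) or [DEFAULT_TYPE]
```

For example, `schema_types(["bf:Work", "bf:Cartography"])` returns `["schema:Map"]`, while a plain textual Work falls through to `["schema:Book"]`. That fallback is exactly where brochures or board games would be mistyped, so it needs checking against real data.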

One issue with the SPARQL conversion approach is that you have to make some assumptions about how the BIBFRAME data is modelled. The BIBFRAME 2.0 model has quite a few places where you can choose between a simpler model and a more expressive but complex one; representing titles is just one example. In practice I've built my conversion pipeline with the LOC marc2bibframe2 converter as the preceding step, so the Schema.org conversion SPARQL query assumes that it is given the kind of BIBFRAME that marc2bibframe2 produces. But I'm pretty sure that another MARC to BIBFRAME conversion tool, such as the @cult / SHARE-VDE one, would produce somewhat different BIBFRAME that could trip up the conversion, not to mention other (non-MARC) ways of producing BIBFRAME, which could differ even more. So to get a robust conversion, there should be a collection of different BIBFRAME data sets, preferably collected "in the wild", to test against. (I'm involved in organizing the European BIBFRAME Workshop in September, and we're planning to include a "Bring your own BIBFRAME" workshop, which could perhaps help in this area.)
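To illustrate the title point, here is a hedged Python sketch that tolerates two title shapes instead of assuming one. It assumes (1) an expressive form, where the resource has bf:title pointing to a node typed bf:Title with bf:mainTitle and optional bf:subtitle, and (2) a simpler form with just an rdfs:label on the resource. The function names, the ": " joining convention, and the preference order are illustrative choices, not part of any existing converter.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def objects(graph: Set[Triple], subject: str, predicate: str) -> List[str]:
    """Objects of (subject, predicate, ?o), sorted so results are deterministic."""
    return sorted(o for (s, p, o) in graph if s == subject and p == predicate)


def schema_name(graph: Set[Triple], resource: str) -> Optional[str]:
    """Hypothetical helper: derive a schema:name value from either title shape."""
    # Shape A (assumed expressive form): ?r bf:title ?t . ?t a bf:Title ; bf:mainTitle ?m
    for title in objects(graph, resource, "bf:title"):
        if "bf:Title" not in objects(graph, title, "rdf:type"):
            continue  # skip titles typed only with other classes, e.g. bf:VariantTitle
        main = objects(graph, title, "bf:mainTitle")
        if main:
            sub = objects(graph, title, "bf:subtitle")
            return f"{main[0]}: {sub[0]}" if sub else main[0]
    # Shape B (assumed simpler form): a plain rdfs:label on the resource itself
    labels = objects(graph, resource, "rdfs:label")
    return labels[0] if labels else None
```

Running such a function over several "in the wild" data sets is one cheap way to see which shapes actually occur before committing the conversion to either.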



[2] https://github.com/NatLibFi/bib-rdf-pipeline/issues/92

Richard Wallis wrote on 27.2.2019 at 13.51:
> Hi All,
> I have assembled a set of SPARQL scripts that I have used to enhance
> Bibframe 2.0 RDF entities with Schema.org triples and reproduced them
> into a page on our Wiki <SPARQL Conversion Examples
> <https://www.w3.org/community/bibframe2schema/wiki/SPARQL_Conversion_Examples_-_from_RJW>>
> The scripts assume that they are being run against a SPARQL endpoint 
> of a triplestore containing Bibframe 2.0.  They assume that the 
> Schema.org triples will be inserted into the store.  These can easily 
> be changed to just output the triples, as the results of a query, by 
> replacing the INSERT verb with CONSTRUCT.
> A triplestore is not necessarily a prerequisite, as the scripts could
> also be used within an inline processing script. I am working on
> getting such a script that I use into a shareable state; /more of that
> later!/
> My approach has been to produce individual scripts for each Bibframe
> entity or property that can usefully be used to identify Schema.org
> equivalents. I find this produces more maintainable/debuggable
> scripts than producing a humungous single script that 'does
> everything', not that this approach doesn't have its own problems.
> They are there to view and comment upon - please join in.
> Apart from simple conversions such as for Organization
> <https://www.w3.org/community/bibframe2schema/wiki/SPARQL_Conversion_Examples_-_from_RJW#Organization>,
> the page contains more complex examples, such as Work
> <https://www.w3.org/community/bibframe2schema/wiki/SPARQL_Conversion_Examples_-_from_RJW#Work>,
> Instance
> <https://www.w3.org/community/bibframe2schema/wiki/SPARQL_Conversion_Examples_-_from_RJW#Instance>,
> and Item
> <https://www.w3.org/community/bibframe2schema/wiki/SPARQL_Conversion_Examples_-_from_RJW#Item>.
> These attempt to solve some of the more significant challenges 
> encountered with conversion from Bibframe to Schema.org:
>   * Typing - Identifying the correct schema types to assign for audio,
>     video, dataset, etc.
>   * Preserving the relationship between Work, Instance and Item entities
>     using the schema workExample and exampleOfWork properties.
>   * Denormalisation of Work data - copying properties that in Bibframe
>     appear only in the Work from the Work entity into the Instance and
>     Item definitions. (Schema.org expects that a single entity could
>     stand alone containing all relevant properties.)
> One challenge I have not yet been able to solve is '/how the heck do 
> you decide that an entity should be defined as a schema:Book/'.  There 
> is bookFormat & BookFormat, but in my test data they don't appear to 
> be used.  It is a little frustrating that I can easily identify Map, 
> AudioObject, & ImageObject types but can't reliably pick out what is a 
> Book. Any advice gratefully received!
> I also need to work out what to do with Agent entities that are not
> also defined as either a Person or an Organization.
> If anybody has similar scripts that we can share to help us towards 
> our goal, please upload to the wiki.
> ~Richard.
> Richard Wallis
> Founder, Data Liberate
> http://dataliberate.com
> Linkedin: http://www.linkedin.com/in/richardwallis
> Twitter: @rjw

Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
Tel. +358 50 3199529
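Two of the challenges in Richard's list are structural: linking Work and Instance with schema:workExample / schema:exampleOfWork, and denormalising Work-level properties onto the Instance. A minimal Python sketch of both, assuming the BIBFRAME and Schema.org graphs are sets of prefixed-name triples. The choice of which properties count as "Work-level" is a hypothetical example. Both link directions (bf:hasInstance and bf:instanceOf) are accepted, since, as Osma notes, different converters may model the link differently.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical selection of Work-level Schema.org properties to copy onto Instances.
WORK_LEVEL_PROPERTIES = {"schema:author", "schema:about", "schema:inLanguage", "schema:genre"}


def work_instance_pairs(bf_graph: Iterable[Triple]) -> Set[Tuple[str, str]]:
    """Collect (work, instance) pairs from either direction of the BIBFRAME link."""
    pairs = set()
    for s, p, o in bf_graph:
        if p == "bf:hasInstance":
            pairs.add((s, o))
        elif p == "bf:instanceOf":
            pairs.add((o, s))
    return pairs


def link_and_denormalise(bf_graph: Set[Triple], schema_graph: Set[Triple]) -> Set[Triple]:
    """Return schema_graph plus workExample/exampleOfWork links and copied Work properties."""
    out = set(schema_graph)
    for work, inst in work_instance_pairs(bf_graph):
        out.add((work, "schema:workExample", inst))
        out.add((inst, "schema:exampleOfWork", work))
        # Copy selected Work-level statements so the Instance can stand alone
        for s, p, o in schema_graph:
            if s == work and p in WORK_LEVEL_PROPERTIES:
                out.add((inst, p, o))
    return out
```

Items could be handled the same way by running a second pass over bf:hasItem / bf:itemOf pairs once the Instances have received their copied properties.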

Received on Tuesday, 5 March 2019 19:14:16 UTC