W3C home > Mailing lists > Public > public-bibframe2schema@w3.org > March 2019

Re: Bibframe to Schema.org conversion examples

From: Osma Suominen <osma.suominen@helsinki.fi>
Date: Tue, 5 Mar 2019 15:08:19 +0200
To: public-bibframe2schema@w3.org
Message-ID: <ea29dda3-aa99-ea72-1b7f-dd9e2afe2adc@helsinki.fi>
Hi Richard, all,

Thanks for sharing your SPARQL conversion queries! These are quite 
similar to the big CONSTRUCT query I'm using in the NLF Fennica 
conversion pipeline [1], though mine takes the "single big query" 
approach, which has its problems: I'm not sure how reusable it is 
(though I know it has been used with some success on Hungarian 
bibliographic data - see [2]), and its performance can be pretty bad 
when there are many BIBFRAME entities in the source data set.

Anyone is free to use my SPARQL query/script for any purpose - the whole 
GitHub repo is CC0 licensed. What would be the best way of sharing it 
with the bibframe2schema.org community?

I'm also interested in the question of types. I've long intended to 
separate out some more specific types in the Schema.org output we 
produce for Fennica, but haven't gotten around to it. Just picking out 
Map, AudioObject and ImageObject would be a good start. I don't 
currently have any ideas about how to identify Book entities, but 
perhaps just separating out the specific types that you know and 
defaulting to Book for the rest could work, at least for a typical, 
traditional bibliographic data set (then again, such a set could also 
include, for example, brochures and board games... this approach would 
have to be tested with real data).
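
As a rough sketch of that "default to Book" fallback - untested, with 
made-up prefixes and a simplistic type test, so very much an assumption 
rather than working code:

```sparql
PREFIX bf:     <http://id.loc.gov/ontologies/bibframe/>
PREFIX schema: <http://schema.org/>

# Untested sketch: type every Instance as schema:Book unless some
# Schema.org type (Map, AudioObject, ImageObject...) has already
# been assigned to it by an earlier, more specific query.
CONSTRUCT {
  ?instance a schema:Book .
}
WHERE {
  ?instance a bf:Instance .
  FILTER NOT EXISTS {
    ?instance a ?otherType .
    FILTER(STRSTARTS(STR(?otherType), STR(schema:)))
  }
}
```

This of course only works if it runs after the queries that pick out 
the specific types, which is one reason the ordering of such scripts 
matters.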

One issue with the SPARQL conversion approach is that you have to make 
some assumptions about how the BIBFRAME data is modelled. The BIBFRAME 
2.0 model has quite a few places where you can choose between a simpler 
or a more expressive but complex model - representing titles is just one 
example. In practice I've built my conversion pipeline using the LOC 
marc2bibframe2 converter as the previous step in the conversion, so the 
Schema.org conversion SPARQL query assumes that it is given the kind of 
BIBFRAME that marc2bibframe2 produces. But I'm pretty sure that another 
MARC to BIBFRAME conversion tool such as the @cult / SHARE-VDE one would 
produce somewhat different BIBFRAME which could trip up the conversion - 
not to speak of other (non-MARC) ways of producing BIBFRAME which could 
be even more different. So in order to get a robust conversion, there 
should be a collection of different BIBFRAME data sets, preferably 
collected "in the wild", to test against. (I'm involved in organizing 
the European BIBFRAME Workshop in September, and we're planning to 
include a "Bring your own BIBFRAME" session, which could perhaps help 
in this area.)
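
To illustrate the title example: a query written only for 
marc2bibframe2-style titles (bf:title pointing to a bf:Title with a 
bf:mainTitle literal) will miss data where the title is expressed more 
simply. Something like the following untested sketch - the rdfs:label 
fallback is my assumption about what "simpler" data might look like - 
would cover both forms:

```sparql
PREFIX bf:     <http://id.loc.gov/ontologies/bibframe/>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>

# Untested sketch covering two ways a title might be modelled:
# the expressive bf:Title structure, and a plain rdfs:label used
# as a fallback when no bf:title is present.
CONSTRUCT { ?instance schema:name ?name . }
WHERE {
  ?instance a bf:Instance .
  {
    ?instance bf:title/bf:mainTitle ?name .
  } UNION {
    ?instance rdfs:label ?name .
    FILTER NOT EXISTS { ?instance bf:title ?anyTitle . }
  }
}
```

Multiplying every query by the number of modelling variants it has to 
tolerate is exactly why a shared collection of real-world test data 
would help.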

-Osma

[1] 
https://github.com/NatLibFi/bib-rdf-pipeline/blob/master/sparql/bf-to-schema.rq

[2] https://github.com/NatLibFi/bib-rdf-pipeline/issues/92

Richard Wallis wrote on 27.2.2019 at 13:51:
> Hi All,
> 
> I have assembled a set of SPARQL scripts that I have used to enhance 
> Bibframe 2.0 RDF entities with Schema.org triples and reproduced them 
> into a page on our Wiki <SPARQL Conversion Examples 
> <https://www.w3.org/community/bibframe2schema/wiki/SPARQL_Conversion_Examples_-_from_RJW>>
> 
> The scripts assume that they are being run against a SPARQL endpoint of 
> a triplestore containing Bibframe 2.0.  They assume that the Schema.org 
> triples will be inserted into the store.  These can easily be changed to 
> just output the triples, as the results of a query, by replacing the 
> INSERT verb with CONSTRUCT.
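> 
> For example (a minimal sketch, not one of the actual wiki scripts - 
> the prefixes and the Organization mapping are just illustrative):
> 
> ```sparql
> PREFIX bf:     <http://id.loc.gov/ontologies/bibframe/>
> PREFIX schema: <http://schema.org/>
> 
> # INSERT form: writes the Schema.org triples back into the store.
> INSERT { ?org a schema:Organization . }
> WHERE  { ?org a bf:Organization . }
> 
> # CONSTRUCT form of the same script: returns the triples as a
> # query result instead of modifying the store.
> # CONSTRUCT { ?org a schema:Organization . }
> # WHERE     { ?org a bf:Organization . }
> ```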
> 
> A triplestore is not necessarily a prerequisite, as the scripts could 
> also be used within an inline process script.  I am working on getting 
> such a script that I use into a sharable state -- /more of that later!/
> 
> My approach has been to produce individual scripts for each Bibframe 
> entity or property that can usefully be used to identify Schema.org 
> equivalents.  I find this produces more maintainable/debuggable scripts 
> than a humongous single script that 'does everything', though this 
> approach has its own problems too.
> 
> They are there to view and comment upon - please join in.
> 
> Apart from simple conversions such as for Organization 
> <https://www.w3.org/community/bibframe2schema/wiki/SPARQL_Conversion_Examples_-_from_RJW#Organization>,  
> the page contains more complex examples, such as Work 
> <https://www.w3.org/community/bibframe2schema/wiki/SPARQL_Conversion_Examples_-_from_RJW#Work>, 
> Instance 
> <https://www.w3.org/community/bibframe2schema/wiki/SPARQL_Conversion_Examples_-_from_RJW#Instance>, 
> and Item 
> <https://www.w3.org/community/bibframe2schema/wiki/SPARQL_Conversion_Examples_-_from_RJW#Item>.  
> These attempt to solve some of the more significant challenges 
> encountered with conversion from Bibframe to Schema.org:
> 
>   * Typing - Identifying the correct schema types to assign for audio,
>     video, dataset, etc.
>   * Preserving the relationship between Work, Instance and Item entities
>     using the schema workExample and exampleOfWork properties.
>   * Denormalisation of Work data - copying properties that in Bibframe
>     appear only on the Work entity into the Instance and Item
>     definitions. (Schema.org expects that a single entity could stand
>     alone, containing all relevant properties.)
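> 
> As a minimal sketch of the denormalisation idea (illustrative only - 
> the copied property list is made up, and the real Work script on the 
> wiki is considerably more involved):
> 
> ```sparql
> PREFIX bf:     <http://id.loc.gov/ontologies/bibframe/>
> PREFIX schema: <http://schema.org/>
> 
> # Sketch: link Instance and Work in both directions, and copy a few
> # Work-level properties down onto the Instance so that it can stand
> # alone.  Assumes the Work has already been given Schema.org
> # properties by an earlier script; the property list is illustrative.
> CONSTRUCT {
>   ?instance schema:exampleOfWork ?work .
>   ?work     schema:workExample   ?instance .
>   ?instance ?prop ?value .
> }
> WHERE {
>   ?instance bf:instanceOf ?work .
>   ?work ?prop ?value .
>   FILTER(?prop IN (schema:author, schema:about, schema:name))
> }
> ```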
> 
> One challenge I have not yet been able to solve is '/how the heck do you 
> decide that an entity should be defined as a schema:Book/'.  There is 
> bookFormat & BookFormat, but in my test data they don't appear to be 
> used.  It is a little frustrating that I can easily identify Map, 
> AudioObject, & ImageObject types but can't reliably pick out what is a 
> Book - any advice gratefully received!
> 
> I also need to work out what to do with Agent entities that are not 
> also defined as either a Person or an Organization.
> 
> If anybody has similar scripts that we can share to help us towards our 
> goal, please upload to the wiki.
> 
> 
> ~Richard.
> 
> Richard Wallis
> Founder, Data Liberate
> http://dataliberate.com
> Linkedin: http://www.linkedin.com/in/richardwallis
> Twitter: @rjw
> 


-- 
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen@helsinki.fi
http://www.nationallibrary.fi
Received on Tuesday, 5 March 2019 13:08:51 UTC
