Re: Big data applications for general users based on RDF - where are they? from Michael Brunnbauer on 2013-06-26 (public-lod@w3.org from June 2013)

From: Michael Brunnbauer <brunni@netestate.de>
Date: Wed, 26 Jun 2013 14:15:14 +0200
To: public-lod@w3.org
Message-ID: <20130626121513.GA13948@netestate.de>
Hello Dominic,

Thank you very much for your detailed reply, which answers my questions.

Reading it raised a new question that is not directly related to the CIDOC CRM:

When mapping RDF vocabularies in practice, how often will I have to use
rule-based systems instead of reasoning because RDFS/OWL is not expressive
enough for the mappings?

Regards,

Michael Brunnbauer

On Tue, Jun 25, 2013 at 07:10:32AM +0100, Dominic Oldman wrote:
> Hi Michael.
> 
> Thanks for the question. I am glad you asked :-) but sorry in advance for the long answer.
> 
> 
> Before I go further though, a fundamental project objective is to make mapping to the CRM straightforward and simple for organisations that understand their own data, and the CRM-SIG (Special Interest Group) has already made inroads.
> 
> 
> The CIDOC CRM is an ontology that was formed in the 1990s and became an ISO standard in 2006, so it had been around for a long time before many other ontologies existed. In fact, the Europeana Data Model (EDM), used for aggregation in both Europe and the US and covering over 20 million items in Europe, borrows the concepts of time, place and people from the CRM, despite also borrowing from Dublin Core, SKOS and FOAF! The EDM doesn't describe a museum object record, being too general.
> 
> The CRM is used in the method I described in my other email without specialisation, relying instead on vocabulary plugins that type the events and reify properties. If we specialised them we would extend the CRM by another 200 properties and entities, and that would make it complicated. (See the web as literature presentation at http://www.researchspace.org/project-updates/webasliterature-britishlibrary10thjune)
> 
> 
> The ontology was created bottom-up by examining hundreds of data models in the cultural heritage world and abstracting an ontology that would harmonise them. The reality is that museum and cultural heritage data is highly complex. Instead of avoiding this complexity, the CIDOC CRM addresses it, and mapping data to it provides an extremely accessible and easy way to search rich data sets even when they are sourced from many different organisations. While it is comprehensive in its treatment of cultural heritage (it is not confined to museums), it has been careful not to over-generalise like, for example, Dublin Core. It is completely contextual and does not mandate a common set of fields, and thereby does not strip data down from its original sources: it provides a complete knowledge representation. (See http://www.oldman.me.uk/blog/costsofculturalheritage/ )
> 
> 
> Museums and other cultural heritage organisations all have collection catalogues that may be based on standards, but these standards are wide and are in any event highly customised and employ very different vocabularies. In this respect they probably represent one of the hardest use cases for linked data. The British Museum has internal thesauri and authorities that cover object types, materials, techniques, cultures, subjects, periods and languages, as well as bibliographies, biographies and places (both modern and archaic). There are specialist thesauri for particular objects like 'wares' and clocks and watches. Moreover, there are terminologies that describe processes that are unique to the objects. The model itself reflects particular institutional priorities, policies and customs formed over very many years, and we have been digitising the collection for over 30 years. All of this information is important for data harmonisation purposes (particularly for research).
> 
> 
> The processes involved in the production of an object could involve a large number of different methods, with different influences, with associations with different groups (artistic or otherwise) and with different types of production environment. All these things are addressed differently in different cultural heritage organisations. The use of dates and periods can be particularly complicated for different object types, with different opinions about what a period means and how an object fits within a period. Our dating runs from 2 million years BC to the present day, and date classifications can be varied and non-standard.
> 
> 
> In reality the CRM is not a complicated ontology but merely generalises over an extremely rich and complicated data environment, to the extent that it can, for example, harmonise the artificial objects of a museum of antiquities with, say, a natural history museum with natural objects and with classification following a completely different set of standards. Equally it can be used for HEI research (King's College have started using it for their specialised research, such as a prosopography). There are no other ontologies that describe a British Museum object to the level at which it would be useful for research as well as education and engagement. Take object production:
> 
> 
> Production events typically record the process in relation to the person, place, techniques and time spans, with the following variations:
> 
> 1. Production by specific process (place and actor).
> 2. Production by closely related group or pupil.
> 3. Production with no specific process (place and actor).
> 4. Production by different ethnic groups.
> 5. Production involving likelihood and probability (place and actor) (i.e. attributed to or assigned to).
> 6. Production with parts which may have been part of the overall process or created as part of a separate production process.
> 7. Production authorities (the motivation for a production).
> 8. Production influences.
> 9. Production made for a particular place or for a particular person.
> 
> These different variations for production are determined by a British Museum-specific set of internal codes, which differ in other museums. For example:
> 
> 5: Drawn by  
> AU: Author  
> BC: Block cut by  
> CA: Calligrapher  
> D: Designed by, DM: Medal designed and made by  
> DE: Decorated by  
> E: Engraved by  
> I: Issuer  
> ID: Intermediary draughtsman  
> J: Modelled by  
> L: Lustred by  
> M: Made by, DM: Medal designed and made by  
> P: Painted by  
> PH: Photographed by  
> SC: Scribe  
> WR: Written by  
> Z: Published by  
> G: Moneyer 
> T: Mint  
> PA: Print artist, PM: Print made by, R: Printed by  
>   
> AG: Office/studio of  
> AJ: Circle/School of  
> F: Factory of  
> O: Official/Office/Dept  
> W: Workshop of  
> A: Attributed to  
> AA: Attributed to an Apprentice/Pupil of  
> AB: Ascribed to  
> AC: Attributed to the Circle of  
> AD: Assigned to  
> AW: Attributed to the Workshop of  
> CB: Claimed to be by  
> AE: Formerly attributed to  
> IR: Inscription by  
> LE: Lettering engraved by  
> MB: Bell made by  
> MC: Case made by  
> MD: Dial made by  
> ME: Ebauche maker  
> MM: Movement made by  
> MP: Watch pendant made by  
> MQ: Dust-cap maker  
> AF: Attributed to a Follower of  
> AI: Attributed to an Imitator of  
> AL: Manner/Style of  
> AM: Attributed to the Manner of  
> AT: After  
> C: Close to  
> CF: Compare with  
> CM: Connected with the Manner of  
> CW: Connected with  
> S: School of/style of  
> RE: Related to  
> NE: Near  
> RC: Recalls  
> 
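A minimal sketch of how a code list like the one above could be treated as a local vocabulary that is plugged in rather than modelled as ontology extensions. It parses a handful of the codes into entries with a label and a URI. The base URI and the names `RAW_CODES`, `VOCAB_BASE` and `parse_codes` are hypothetical and only serve the illustration; nothing here is the British Museum's actual tooling.

```python
import re

# A few lines copied from the code list above; the rest are omitted for brevity.
RAW_CODES = """\
D: Designed by, DM: Medal designed and made by
E: Engraved by
A: Attributed to
AW: Attributed to the Workshop of
S: School of/style of
"""

# Hypothetical base URI for the local vocabulary (an assumption, not a real namespace).
VOCAB_BASE = "http://example.org/vocab/production-role/"

# One entry looks like "CODE: Label"; a line may hold several entries separated by ", ".
ENTRY = re.compile(r"\s*([A-Z0-9]+):\s*(.+)")


def parse_codes(raw):
    """Turn the flat code list into a dict of code -> {uri, label}."""
    vocab = {}
    for line in raw.splitlines():
        for part in line.split(", "):
            match = ENTRY.fullmatch(part)
            if match:
                code, label = match.groups()
                vocab[code] = {"uri": VOCAB_BASE + code, "label": label.strip()}
    return vocab


VOCAB = parse_codes(RAW_CODES)
```

Each entry can then be co-referenced against other institutions' vocabularies without touching the ontology itself.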
> 
> These principles apply to other parts of the object record including acquisition, inscription, visual representations and so on. I have attached a construct for "Acquisition From", which is one acquisition construct along with 'Acquisition Through', 'Custody From', 'Acquisition Motivated by', 'Legislation', 'Found By' etc., all of which have different semantics that are very important to represent.
> 
> 
> The ontology allows us to take densities of data that have been contextually harmonised and infer new knowledge, correct data that is wrong and co-reference all the different terms and vocabularies. The Museum has terms, people and places that don't appear in central authorities such as Getty, VIAF etc. More obscure artists or artisans, for example, will simply never be co-referenced without the contextualisation that the CRM provides.
> 
> 
> Here (attached) is the record (not the largest) of a significant object: the Rosetta Stone. It is one object out of 2 million records that represent a large number of object types, from art history to archaeology, representing different internal approaches to classification and knowledge representation. I would suggest that the CRM is more of a miracle than an ontology, and it would surely take 15 years for anyone to start from scratch and produce something equivalent. :-)
> 
> Hope this answers the questions a little bit. Sorry for being so long-winded.
> 
> 
> Cheers Dominic
> 
> ________________________________
>  From: Michael Brunnbauer <brunni@netestate.de>
> To: public-lod@w3.org 
> Sent: Sunday, 23 June 2013, 18:21
> Subject: Re: Big data applications for general users based on RDF - where are they?
>  
> 
> 
> Hello Dominic,
> 
> On Sun, Jun 23, 2013 at 09:35:53AM +0100, Dominic Oldman wrote:
> > I take the point about the ability to set things up quickly, but this just points to the fact that we have some way to go on a number of strands. But we all know we are on the right path. I would say that focusing on some of the huge range of potential applications that you couldn't do with a relational database will help move things along more.
> > 
> > On ontology, here is my experience. You need a solid ontology that describes your domain at precisely the right level to represent domain knowledge and to establish key relationships, but which supports specialisation below this level. This level is just at the point above which the domain varies. However, after going down the specialisation route we have found a more accessible and portable approach.
> > 
> > We have used an ontology that does precisely the above, but used it to create a set of ready-made constructs for key domain concepts that are uncontentious. A particular concept may have a number of alternative constructs from which an organisation can select as appropriate. We then avoid the need to specialise the constructs using sub classes and sub properties and instead provide a mechanism for plugging in local vocabularies. This transfers the issue of co-referencing ontology extensions to co-referencing vocabularies. This is far more accessible for two reasons. Firstly, the contextualisation of the non-specialised elements provides enough knowledge representation to perform the co-referencing. Secondly, there are many vocabulary co-referencing initiatives that are becoming more mature and accessible. The plugin approach is supported by typing whole event constructs and by reification of key properties with local terminology and with people and place authorities, but also with terminology unique to the organisation (institutional context).
> > 
> > For example, the production of something may have a generalised property of "carried out by". This could be specialised in a large number of ways. Instead we can look at the local specialisations and use them as a vocabulary to either type the full event or to reify the property itself. E.g. "designed by".
> > 
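A minimal sketch of the two options described above, written as plain tuples so that it needs no RDF library. The CRM names (E12_Production, P14_carried_out_by, P2_has_type, P108i_was_produced_by) follow the published CRM numbering, but whether the project uses exactly these terms is an assumption. The `http://example.org/` URIs, the `in_the_role_of` property and the function names are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
EX = "http://example.org/"  # hypothetical namespace for objects, actors and local terms


def base_production(obj, actor):
    """Generic, unspecialised construct: object -> production event -> actor."""
    event = obj + "/production"
    triples = [
        (obj, CRM + "P108i_was_produced_by", event),
        (event, RDF + "type", CRM + "E12_Production"),
        (event, CRM + "P14_carried_out_by", actor),
    ]
    return event, triples


def type_event(obj, actor, role):
    """Option 1: type the whole event with a local vocabulary term."""
    event, triples = base_production(obj, actor)
    triples.append((event, CRM + "P2_has_type", role))
    return triples


def reify_property(obj, actor, role):
    """Option 2: reify 'carried out by' and attach the local term to the statement."""
    event, triples = base_production(obj, actor)
    stmt = event + "/carried-out-by"
    triples += [
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", event),
        (stmt, RDF + "predicate", CRM + "P14_carried_out_by"),
        (stmt, RDF + "object", actor),
        (stmt, EX + "in_the_role_of", role),  # hypothetical property name
    ]
    return triples


designed_by = EX + "vocab/production-role/D"  # local term for "designed by"
typed = type_event(EX + "object/1", EX + "person/42", designed_by)
reified = reify_property(EX + "object/1", EX + "person/42", designed_by)
```

In both variants the generic "carried out by" triple stays in place, so a query that knows nothing about the local term "designed by" still finds the actor.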
> > This process avoids a whole range of issues and also has the potential to be built into accessible implementation tools useful for organisations without technical resources. It means that we can start producing the applications that we can't do with relational databases and which operate across many different datasets robustly.
> > 
> > How does this sound?
> 
> This sounds complicated :-)
> 
> That the cultural heritage crowd seems to have a need for its own upper-level
> ontology underlines my point about schematic and structural heterogeneity.
> 
> Why does CIDOC define general concepts such as place, event and spatial
> coordinates? Are there no suitable existing ontologies for this?
> 
> Can CIDOC also be used without specialization of properties and classes?
> 
> I think museums have used controlled vocabularies for quite a while. Can you
> give an example that illustrates why the additional effort required for your
> project is justified?
> 
> Regards,
> 
> Michael Brunnbauer
> 



-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail brunni@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
Received on Wednesday, 26 June 2013 12:15:38 UTC