Metashare as used by LingHub from John P. McCrae on 2015-01-27 (public-ld4lt@w3.org from January 2015)

From: John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de>
Date: Tue, 27 Jan 2015 16:49:50 +0100
To: public-ld4lt@w3.org
Message-ID: <CAC5njqrrm2htXO1VeoStRXZPP+P0hQokhcn84_8rPv7G6uYyow@mail.gmail.com>

Hi,

As requested at the last telco, I created a spreadsheet of the MetaShare
model used by LingHub:

https://docs.google.com/spreadsheets/d/1fHazSb3MfyfsgiENM-ZReOqL7mW3H-EsGlgWl-Dnm8g/edit#gid=2129609025

(Public for comments, not for editing.)

Here are some of the issues I noticed while doing this and comparing it
with the previous spreadsheet and the IULA-UPF OWL file:

1) Some names have been shortened, e.g.,
'ConformanceToBestStandardsAndPractices' ->
'StandardsBestPractices'. Should we accept such names or stay true to
MetaShare?

2) A lot of MetaShare names (unnecessarily) include the words 'Info',
'Type' or 'InfoType'; we could eliminate these.

3) IULA have split the AnnotationType class into 5 subclasses
(DiscourseAnnotation, etc.)

4) There are many properties suggested by IULA or in the 'DISTRIBUTION'
model that have no correspondence in the MetaShare data... we should
discuss these on a case-by-case basis, right?

5) The previous Google Doc proposed mapping to both SWRC and BIBO. Do we
need to do BIBO as well (SWRC seems sufficient)?

6) I added the license modelling that LingHub does in ODRL. Could one of
our ODRL experts look at it and fix the last one?

7) Some property values, especially *resource types* such as *ontology* or
*corpus*, were created as classes in the Google Doc. Shall we confirm this
usage pattern?

8) *See attached diagram.* There is a big difference in granularity between
the XSD and IULA-UPF's ontology. For example, there are 4 tags between the
resource and its actual usage in the XML, e.g.,

<resourceInfo> ...
  <usageInfo> ...
    <actualUsageInfo> ....
      <useNLPspecific>parsing</useNLPspecific> ....

Whereas in the IULA model this is considerably simplified to

:resource a ms:Resource ;
  ms:actualUse ms:parsing .

This would be great, but it also loses information. For example, the IULA
schema associates the *availability* with the *Resource*, whereas the XSD
schema associates an *availability* with each *Distribution* (download
file). In fact, there are resources that have different availability for
different downloads (e.g., BabelNet), so there is information loss here.
Thus, LingHub is very conservative and sticks to the XSD, e.g.,

:resource a ms:ResourceInfo ;
  ms:usageInfo [
    ms:actualUsageInfo [
      ms:useNLPspecific ms:parsing ] ] .

What shall we recommend here?
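
To make the information loss in point 8 concrete, here is a minimal Python
sketch. It is only an illustration under stated assumptions: the
Distribution class, its field names, the placeholder availability values
"open" and "restricted", and the example.org URLs are all hypothetical and
are not taken from the MetaShare XSD, the IULA-UPF ontology, or LingHub. It
shows that moving the availability from each distribution up to the
resource only works when all distributions agree.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Distribution:
    """Hypothetical, simplified stand-in for one download of a resource."""
    url: str
    availability: str  # placeholder values, not the real MetaShare vocabulary


def flatten_availability(distributions: List[Distribution]) -> Optional[str]:
    """Collapse per-distribution availability into one resource-level value.

    Returns the shared value when every distribution agrees, and None when
    they disagree (or when there are no distributions), i.e. when a single
    resource-level value cannot represent the data.
    """
    values = {d.availability for d in distributions}
    return values.pop() if len(values) == 1 else None


# All downloads share one availability: flattening is lossless.
uniform = [
    Distribution("http://example.org/a.zip", "open"),
    Distribution("http://example.org/b.zip", "open"),
]

# Downloads differ: no single resource-level value is correct.
mixed = [
    Distribution("http://example.org/a.zip", "open"),
    Distribution("http://example.org/b.zip", "restricted"),
]

print(flatten_availability(uniform))
print(flatten_availability(mixed))
```

In the mixed case the function has nothing sensible to return, which is the
situation described above for resources whose downloads have different
availability.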

Regards,
John

Attachments

image/png attachment: IULA_vs_XSD_MetaShre.png

Received on Tuesday, 27 January 2015 15:50:19 UTC