MetaShare as used by LingHub

From: John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de>
Date: Tue, 27 Jan 2015 16:49:50 +0100
Message-ID: <CAC5njqrrm2htXO1VeoStRXZPP+P0hQokhcn84_8rPv7G6uYyow@mail.gmail.com>
To: public-ld4lt@w3.org

As requested at the last telco, I have created a spreadsheet of the MetaShare
model used by LingHub:


(Public for commenting, not editing.)

Here are some of the issues I found while doing this and comparing it to the
previous spreadsheet and the IULA-UPF OWL file:

1) Some names have been shortened, e.g.,
'ConformanceToBestStandardsAndPractices' -> 'StandardsBestPractices'.
Should we accept such names or stay true to the MetaShare names?

2) Many MetaShare names unnecessarily contain the words 'Info', 'Type' or
'InfoType' (e.g., resourceInfo, usageInfo). We could eliminate these.

3) IULA has split the AnnotationType class into five subclasses
(DiscourseAnnotation, etc.).

4) Many properties suggested by IULA or in the 'DISTRIBUTION' model have no
counterpart in the MetaShare data. We should discuss these on a
case-by-case basis, right?

5) The previous Google Doc proposed mapping to both SWRC and BIBO. Do we
need the BIBO mapping as well? SWRC seems sufficient.

6) I added the ODRL license modelling that LingHub uses. Could one of our
ODRL experts look at it and fix the last one?

7) Some property values, especially *resource types* such as *ontology* or
*corpus*, were created as classes in the Google Doc. Shall we confirm this
usage pattern?

8) *See attached diagram.* There is a big difference in granularity between
the XSD and IULA-UPF's ontology. For example, in the XML there are four
tags between the resource and its actual usage:

<resourceInfo> ...
  <usageInfo> ...
    <actualUsageInfo> ...
      <useNLPspecific>parsing</useNLPspecific> ...
    </actualUsageInfo>
  </usageInfo>
</resourceInfo>

whereas in the IULA model this is considerably simplified to:

:resource a ms:Resource ;
  ms:actualUse ms:parsing .

This would be great, but it also loses information. For example, the IULA
schema associates *availability* with the *Resource*, whereas the XSD
schema associates an *availability* with each *Distribution* (download
file). Some resources do have different availability for different
downloads (e.g., BabelNet), so information is lost here. LingHub is
therefore very conservative and sticks to the XSD structure:

:resource a ms:ResourceInfo ;
  ms:usageInfo [
    ms:actualUsageInfo [
      ms:useNLPspecific ms:parsing ] ] .

What shall we recommend here?
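The following is a minimal Python sketch (standard library only) that makes the trade-off in point 8) concrete by contrasting the two mappings. It assumes a MetaShare-style record whose usage path follows the XML fragment above. The distributionInfo, downloadLocation and availability elements, their values, and the useNLPspecific -> actualUse renaming table are illustrative assumptions. None of them is taken from the actual XSD or the IULA-UPF ontology. The nested mapping turns every wrapper element into a blank node, in the conservative, XSD-shaped style. The flat mapping attaches every leaf value directly to the resource, roughly as in the IULA model.

```python
import itertools
import xml.etree.ElementTree as ET

# Illustrative MetaShare-style record. The usage path mirrors the fragment in
# point 8); the distributionInfo/downloadLocation/availability elements and
# their values are hypothetical, chosen to show per-download availability.
RECORD = """
<resourceInfo>
  <usageInfo>
    <actualUsageInfo>
      <useNLPspecific>parsing</useNLPspecific>
    </actualUsageInfo>
  </usageInfo>
  <distributionInfo>
    <downloadLocation>dump-a</downloadLocation>
    <availability>restricted</availability>
  </distributionInfo>
  <distributionInfo>
    <downloadLocation>dump-b</downloadLocation>
    <availability>unrestricted</availability>
  </distributionInfo>
</resourceInfo>
"""

# Hypothetical renaming for the flat mapping, based only on the ms:actualUse
# example above; a real IULA mapping would need a complete table.
RENAME = {"useNLPspecific": "actualUse"}

Triple = tuple[str, str, str]


def nested_triples(root: ET.Element) -> list[Triple]:
    """XSD-shaped mapping: every wrapper element becomes a blank node."""
    counter = itertools.count()
    triples: list[Triple] = [(":resource", "a", "ms:ResourceInfo")]

    def walk(elem: ET.Element, subject: str) -> None:
        for child in elem:
            prop = f"ms:{child.tag}"
            if len(child):  # wrapper element: link to a fresh blank node
                node = f"_:b{next(counter)}"
                triples.append((subject, prop, node))
                walk(child, node)
            else:  # leaf element: its value hangs off the current node
                triples.append((subject, prop, f"ms:{(child.text or '').strip()}"))

    walk(root, ":resource")
    return triples


def flat_triples(root: ET.Element) -> list[Triple]:
    """Flattened mapping: every leaf value is attached to the resource itself."""
    triples: list[Triple] = [(":resource", "a", "ms:Resource")]
    for elem in root.iter():
        if len(elem) == 0:  # leaf element
            prop = f"ms:{RENAME.get(elem.tag, elem.tag)}"
            triples.append((":resource", prop, f"ms:{(elem.text or '').strip()}"))
    return triples


if __name__ == "__main__":
    root = ET.fromstring(RECORD.strip())

    # Nested: group values per blank node; each distribution still pairs its
    # download location with its own availability.
    per_node: dict[str, dict[str, str]] = {}
    for s, p, o in nested_triples(root):
        per_node.setdefault(s, {})[p] = o
    for node, props in per_node.items():
        if "ms:availability" in props:
            print(node, props["ms:downloadLocation"], props["ms:availability"])

    # Flat: both values now sit on :resource, with nothing linking either
    # one to a particular download.
    print(sorted(o for s, p, o in flat_triples(root) if p == "ms:availability"))
```

Run as a script, it prints one line per distribution node for the nested mapping (_:b2 and _:b3, each with its own download location and availability). It then prints the two availability values that the flat mapping attaches to :resource, where nothing links either one to a particular download. This is the information loss described above. It comes at the cost of the extra blank-node levels that the nested form carries.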


(image/png attachment: IULA_vs_XSD_MetaShre.png)
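Going back to point 2), here is a minimal sketch of what dropping the 'Info', 'Type' and 'InfoType' words could look like. It assumes these words only ever occur as trailing suffixes, as in resourceInfo or AnnotationType. The helper names strip_suffix and collisions are hypothetical. Shortening could map two MetaShare names onto the same new name, so the sketch also reports such clashes, which would then have to be resolved by hand.

```python
# Suffixes named in point 2). "InfoType" is tried first so that a name such as
# "xInfoType" becomes "x" rather than "xInfo".
SUFFIXES = ("InfoType", "Info", "Type")


def strip_suffix(name: str) -> str:
    """Drop one trailing 'InfoType', 'Info' or 'Type'; never return an empty name."""
    for suffix in SUFFIXES:
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)]
    return name


def collisions(names: list[str]) -> dict[str, list[str]]:
    """Group original names that would be shortened to the same new name."""
    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(strip_suffix(name), []).append(name)
    return {short: originals for short, originals in groups.items() if len(originals) > 1}


if __name__ == "__main__":
    for name in ["resourceInfo", "usageInfo", "actualUsageInfo", "AnnotationType"]:
        print(name, "->", strip_suffix(name))
```

The matching is case-sensitive, so a name that merely ends in a lowercase "type" is left alone.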

Received on Tuesday, 27 January 2015 15:50:19 UTC
