Re: Metashare as used by LingHub from John P. McCrae on 2015-01-28 (public-ld4lt@w3.org from January 2015)

From: John P. McCrae <jmccrae@cit-ec.uni-bielefeld.de>
Date: Wed, 28 Jan 2015 11:28:23 +0100
To: Penny Labropoulou <penny@ilsp.gr>
Cc: public-ld4lt@w3.org
Message-ID: <CAC5njqoq8tZqUJAUfOEfE9axOOnm4mpEc6DS=xYHjqySQq2crQ@mail.gmail.com>
On Tue, Jan 27, 2015 at 10:03 PM, Penny Labropoulou <penny@ilsp.gr> wrote:

> Hi John and all.
>
> Thanx for the quick work!
>
> Below are a few comments/replies in between the lines.
>
>
>
> 1) Some names have been shortened, e.g.,
> 'ConformanceToBestStandardsAndPractices' ->
> 'StandardsBestPractices', should we accept such names or stay true to
> MetaShare?
> I think we should decide this on a case-by-case basis; although some names
> are long, they are self-explanatory. In general, at ld4lt we have changed
> some names (e.g. resource to language resource) when it was agreed that the
> new label is better.
>
Hmm... change for the sake of change is difficult, particularly when it
affects only a small part of the vocabulary; that creates gotchas.

>
> 2) A lot of MetaShare names have (unnecessarily) the words 'Info', 'Type'
> or 'InfoType', we could eliminate these.
> All “info” elements are in fact component names: in accordance with the CMDI
> principles, elements (and other components) are grouped into semantically
> coherent components. For instance, the identificationInfo groups together
> elements that are used for the identification of a resource, such as the
> resourceId, a url used as landing page, the resourceName and shortName, the
> description etc. If I have understood well, this structure is not
> needed/not a good practice for RDF and this is why they have been
> eliminated already at the IULA/UPF mapping.
>
> “type” elements are used in MetaShare for components that can be re-used:
> e.g. persons can be licensors, contact points, resource creators etc., but
> in all cases they are encoded using the personInfoType, which groups
> together given name, surname, communication information etc. Again, I think
> this is not mapped in RDF as such, if I understand well.
>
Yeah, that is my feeling too. I would like to shorten the names; however, it
seems hard to do this consistently, as it would create clashes, e.g.,
ActualUse/ActualUseInfo, DocumentType/DocumentInfo.
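
To make the clash problem concrete, here is a rough sketch (not something
LingHub runs) that strips the 'InfoType', 'Info' and 'Type' suffixes and
reports which names collapse to the same short form. The function names and
the sample list are my own illustration; only the class names themselves
come from this thread.

```python
from collections import defaultdict

# Suffixes discussed in point 2, longest first so 'InfoType' wins over 'Type'.
SUFFIXES = ("InfoType", "Info", "Type")


def shorten(name):
    """Strip one trailing suffix, if present (hypothetical helper)."""
    for suffix in SUFFIXES:
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)]
    return name


def find_clashes(names):
    """Group names by shortened form and keep groups with more than one member."""
    groups = defaultdict(list)
    for name in names:
        groups[shorten(name)].append(name)
    return {short: full for short, full in groups.items() if len(full) > 1}


# Sample names taken from this thread; not the full vocabulary.
names = ["ActualUse", "ActualUseInfo", "DocumentType", "DocumentInfo",
         "personInfoType"]
clashes = find_clashes(names)
for short, full in clashes.items():
    print(short, "<-", ", ".join(full))
```

Running something like this over the whole vocabulary would give us the list
of names that cannot be shortened without a manual decision.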

>
> 3) IULA have split the AnnotationType class into 5 subclasses
> (DiscourseAnnotation, etc.)
>
> That’s an improvement from the original model and I suggest we stick to it.
>
> 4) There are many properties suggested by IULA or in the 'DISTRIBUTION'
> model that have no correspondence in the MetaShare data... we should
> discuss these on a case-by-case basis, right?
>
> We have already discussed with Victor the distribution and licensing
> module and have come up with a proposal re-introducing some of the original
> MetaShare elements that were not mapped in the IULA/UPF version and using
> the odrl (mainly) and cc vocabularies; the general ideas are to be found
> at
> https://www.w3.org/community/ld4lt/wiki/Metashare_vocabulary_for_licenses
> and https://www.w3.org/community/ld4lt/wiki/Examples and the mappings
> were documented in the previous googlesheet. I will add these to the new
> googlesheet by next week.
>
I incorporated all the functional (non-documentary) information from the
distribution model already... or at least I tried, let me know if I missed
anything.

> 5) The Prev. Google Doc proposed mapping to both SWRC and BIBO, do we
> need to do BIBO as well (SWRC seems sufficient)?
>
> 6) I added the license modelling that LingHub does in ODRL, could one of
> our ODRL experts look at it and fix the last one?
>
> Please, see also the two wikis on licensing, especially the examples. And
> as discussed, together with Victor we will provide a file with the RDF
> representations in odrl of the licenses used in MetaShare (of course, only
> of those that have not already been RDFized).
>
This refers to "R4 To neatly represent conditions of use"... but I couldn't
find the structured definitions of conditions of use, so I wrote my own in
the sheet titled "License Modelling".

> 7) Some property values, especially *resource types*, such as *ontology*
> or *corpus* were created as classes in the Google Doc, shall we confirm
> this usage pattern?
>
> This needs some more thinking, checking the various cases. Is there a list
> of these?
>
These seem to be, approximately, the individuals of the classes
'ResourceType' and 'LexicalConceptualResourceType'. Here are the lists for
reference:

In Prev. Google Doc: BabelNet*, ComputationalLexicon, Corpus, CorpusAudio*,
CorpusCollection*, CorpusImage*, CorpusText*, CorpusTextNgram*,
CorpusTextNumerical*, CorpusVideo*, Framenet, LexicalConceptualResource,
Lexicon, MachineReadableDictionary, Ontology, TerminologicalResource,
Thesaurus, ToolService*, WordList, WordNet

From Metashare: computationalLexicon, framenet, lexicon,
machineReadableDictionary, ontology, other*, terminologicalResource,
thesaurus, wordList, wordnet, corpus, languageDescription*,
lexicalConceptualResource

*Unique elements
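
The starred entries can be recomputed with a small script. This is only a
sketch: it compares the two lists above ignoring case, the helper name is my
own, and it assumes 'lexicalConceptualResource' is the intended spelling on
the Metashare side.

```python
# Resource type names as listed in this message (stars removed).
PREV_GOOGLE_DOC = [
    "BabelNet", "ComputationalLexicon", "Corpus", "CorpusAudio",
    "CorpusCollection", "CorpusImage", "CorpusText", "CorpusTextNgram",
    "CorpusTextNumerical", "CorpusVideo", "Framenet",
    "LexicalConceptualResource", "Lexicon", "MachineReadableDictionary",
    "Ontology", "TerminologicalResource", "Thesaurus", "ToolService",
    "WordList", "WordNet",
]
METASHARE = [
    "computationalLexicon", "framenet", "lexicon",
    "machineReadableDictionary", "ontology", "other",
    "terminologicalResource", "thesaurus", "wordList", "wordnet", "corpus",
    "languageDescription",
    "lexicalConceptualResource",  # assumed intended spelling
]


def unique_names(own, other):
    """Return names in `own` with no case-insensitive match in `other`."""
    other_keys = {name.lower() for name in other}
    return sorted(name for name in own if name.lower() not in other_keys)


only_in_google_doc = unique_names(PREV_GOOGLE_DOC, METASHARE)
only_in_metashare = unique_names(METASHARE, PREV_GOOGLE_DOC)
print("Only in Prev. Google Doc:", ", ".join(only_in_google_doc))
print("Only in Metashare:", ", ".join(only_in_metashare))
```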

>
>
> 8) *See attached diagram.* There is a big difference in granularity
> between the XSD and IULA-UPF's ontology. For example, there are 4 tags
> between the resource and its actual usage in the XML, e.g.,
>
> <resourceInfo> ...
>
>   <usageInfo> ...
>
>     <actualUsageInfo> ....
>
>       <useNLPspecific>parsing</useNLPspecific> ....
>
> Whereas in the IULA model this is considerably simplified to
>
> :resource a ms:Resource ;
>
>   ms:actualUse ms:parsing
>
>
>
> This would be great, but it also loses information, for example, the IULA
> schema associates the *availability* with the *Resource*. However, the
> XSD schema associates an *availability* with each *Distribution*
> (download file). In fact, there are resources that have different
> availability for different downloads (e.g., BabelNet), so there is
> information loss here. Thus, LingHub is very conservative and sticks to the
> XSD, e.g.,
>
> :resource a ms:ResourceInfo ;
>
>   ms:usageInfo [
>
>     ms:actualUsageInfo [
>
>       ms:useNLPspecific ms:parsing ] ]
>
> What shall we recommend here?
>
>
>
> Again, discuss on a case-by-case basis. For instance, for availability, we
> have re-introduced the distribution element, as  otherwise we lose in
> semantics. For other cases, I think we should see them more closely. The
> grouping into components made sense in XSD because it brought together
> elements. I will have to look at them more closely and explain for each
> case why this grouping was meant, so that we can decide if this should also
> remain in the RDF mapping. Is there an easy way of spotting these cases?
>
OK, we should discuss this in a telco.
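
As input for that discussion, here is a minimal sketch of the two mapping
styles from point 8, applied to the XML fragment quoted above. It is not the
actual LingHub converter: the function names are made up, and the rename
table (useNLPspecific -> actualUse) is an assumption read off the two Turtle
snippets in this thread.

```python
import xml.etree.ElementTree as ET

XML = """
<resourceInfo>
  <usageInfo>
    <actualUsageInfo>
      <useNLPspecific>parsing</useNLPspecific>
    </actualUsageInfo>
  </usageInfo>
</resourceInfo>
"""

# Assumed renaming for the simplified style; not an agreed mapping.
FLAT_PROPERTY = {"useNLPspecific": "actualUse"}


def nested_turtle(element, indent=1):
    """Conservative style: one blank node per XML component element."""
    pad = "  " * indent
    if len(element) == 0:  # leaf element carries the value
        return f"{pad}ms:{element.tag} ms:{(element.text or '').strip()}"
    children = " ;\n".join(nested_turtle(c, indent + 1) for c in element)
    return f"{pad}ms:{element.tag} [\n{children} ]"


def to_nested(root):
    """Keep the full XSD path between the resource and each value."""
    class_name = root.tag[0].upper() + root.tag[1:]
    body = " ;\n".join(nested_turtle(child) for child in root)
    return f":resource a ms:{class_name} ;\n{body} ."


def to_flat(root):
    """Simplified style: attach every leaf value directly to the resource."""
    leaves = [e for e in root.iter() if len(e) == 0]
    lines = [
        f"  ms:{FLAT_PROPERTY.get(e.tag, e.tag)} ms:{(e.text or '').strip()}"
        for e in leaves
    ]
    return ":resource a ms:Resource ;\n" + " ;\n".join(lines) + " ."


root = ET.fromstring(XML.strip())
print(to_nested(root))
print(to_flat(root))
```

The flat version throws away the path to each leaf, which is exactly where
the availability-per-distribution information would get lost.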

>
>
> A final question: how will we add the comments/decisions from the previous
> googlesheet to the current one? As said, I can do this for the
> distribution/licensing module elements but for the rest?
>
Add any comments you want (possibly copied from the previous doc). Apart from
that, I would like to keep the sheet itself clean until the next ld4lt
telco at least.

Regards,
John

>
>
> Best,
>
> Penny
>
>
>
Received on Wednesday, 28 January 2015 10:28:52 UTC