
Re: [ISSUE-2] Module suggestions for META-SHARE RDF vocabulary

From: Marta Villegas <marta.villegas@gmail.com>
Date: Fri, 27 Jun 2014 15:46:48 +0200
Message-ID: <CAPq_VF=wxMUgBM2-sG794yMAe7Qt4CgORdKD9sTbNxfq+Ra5Eg@mail.gmail.com>
To: "Antonio J. Roa Valverde" <antonio.roa@englishbubble.com>
Cc: public-ld4lt@w3.org

Hi all,

I'm sorry I missed most of your discussions ...

I'm sending you some info about the MetaShare2LOD conversion and some
suggestions concerning modules:

The MetaShare schema wraps information into a number of ‘top’ nodes used to
collect semantically related information. These are:












All such elements are common (used in all resources).

The *resourceComponentType* element is defined as a choice element. Thus a
specific ‘choice’ is defined for each resource type (services, corpora,
lexical conceptual resources, …).

When moving to LOD, these wrapping elements simply disappear and their
child elements become properties. Thus, for example, the XML path
*ResourceInfo/InformationInfo/resourceName* generates the property
‘*resourceName*’ (no class is created for identificationInfo).
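
As a rough illustration of this flattening (not the actual conversion code), here is a minimal Python sketch. It assumes a heavily simplified, un-namespaced record whose element names follow the path quoted above. The sample XML and the helper name `leaf_properties` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified record: real MetaShare XML is namespaced
# and far richer. Element names follow the path quoted in the text.
SAMPLE = """
<ResourceInfo>
  <InformationInfo>
    <resourceName>Example corpus</resourceName>
  </InformationInfo>
</ResourceInfo>
"""


def leaf_properties(root):
    """Return (property, value) pairs for leaf elements, dropping wrappers."""
    pairs = []
    for elem in root.iter():
        # A wrapper has child elements; a leaf only carries text.
        if len(elem) == 0 and elem.text and elem.text.strip():
            pairs.append((elem.tag, elem.text.strip()))
    return pairs


properties = leaf_properties(ET.fromstring(SAMPLE))
print(properties)
```

The wrapper names never reach the output. Only the leaf element name survives as a property name, which is the behaviour described above.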

Though, in general, these ‘wrapper’ XML elements do not translate into the
LOD version, we could use them to create the different modules (maybe not
for all, but surely for *distribution/license* and possibly for *version*).

The choice in the XML ‘Component’ element generates the resource
subclasses in the LOD version. I’m not sure whether such subclasses could be
distributed into the corresponding modules (probably not, as they share a
lot of properties). Here an alternative approach could be adopted: to
define modules according to the media type (text, image, audio, …).
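
The choice-to-subclass step could be sketched roughly as follows. Everything named here is an assumption made for illustration: the `ms:` prefix, the parent class `ms:Resource`, the choice element names and the subclass names are not taken from the IULA-UPF mapping.

```python
# Hypothetical names throughout: the "ms:" prefix, the parent class, the
# choice element names and the subclass names are illustrative only.
CHOICE_TO_CLASS = {
    "corpusInfo": "Corpus",
    "toolServiceInfo": "ToolService",
    "lexicalConceptualResourceInfo": "LexicalConceptualResource",
    "languageDescriptionInfo": "LanguageDescription",
}


def subclass_triples(choice_to_class, parent="ms:Resource"):
    """Emit one Turtle-style rdfs:subClassOf statement per XML choice."""
    return [
        f"ms:{cls} rdfs:subClassOf {parent} ."
        for cls in choice_to_class.values()
    ]


triples = subclass_triples(CHOICE_TO_CLASS)
for line in triples:
    print(line)
```

Since all subclasses hang off one parent and share its properties, splitting them across modules would mostly duplicate that shared part, which is why grouping by media type may be the easier cut.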

*NOTE that* the IULA-UPF LOD version does *not* include most image/audio
stuff (you can check the mappings in the Excel file I sent). At this point
I’d like to note:

- Image/audio stuff was not included, as we only had text resources.

- In any case, the information in the MetaShare image/audio elements
(i) is too fine-grained, (ii) overpopulates the dataset with lots of ‘too
technical and specific’ properties/classes and, more importantly, (iii) it
seems that metadata creators *do not use them*.

If you take a look at the MetaShare repository (ELDA's) you will get these:

*Resources by Type:*

*Facets for Audio*

Audio <http://metashare.elda.org/repository/search/?q=> (659)

*Audio Genre*

*Speech Genre*

Broadcast News

*Speech Items*

Free Speech
Isolated Words
Isolated Words, Isolated Digits, Natural Numbers
Isolated Words, Natural Numbers, Other
Read Speech

I hope this helps.

Best regards

2014-06-12 13:13 GMT+02:00 Antonio J. Roa Valverde <

> Hi all,
> I have been following this discussion quite passively until now and I
> think I can give my 2 cents here. I believe the following survey gives a
> good overview of the topic.
> Leonardo Lezcano, Salvador Sánchez-Alonso, Antonio J. Roa-Valverde, (2013)
> "A survey on the exchange of linguistic resources: Publishing linguistic
> linked open data on the Web", Program: electronic library and information
> systems, Vol. 47 Iss: 3, pp.263 - 281
> http://www.emeraldinsight.com/journals.htm?issn=0033-0337&volume=47&issue=3&articleid=17093339&show=html
> I am not sure I can just share a copy in this thread due to copyright
> issues though.
> Best regards,
> Antonio
> On Thu, Jun 12, 2014 at 11:49 AM, Sebastian Hellmann <
> hellmann@informatik.uni-leipzig.de> wrote:
>>  Hi Asun, all,
>> the image looks quite appropriate, here are some things that are an
>> addition to the mentioned names on the image:
>> ## General Metadata
>> The DBpedia community is currently pursuing an implementation and
>> extension of DCAT and VoID called DataID [1]. While DCAT and VoID are
>> vocabularies, DataID will provide some guidelines on exactly how and where
>> to publish the DataID file (similar to the robots.txt or sitemap file).
>> There will be a validator implementation to help adoption.
>> ## Linguistic Specific Metadata
>> Language Codes for 639-1 and 639-2 are provided by the Library of
>> Congress (LoC):
>> http://id.loc.gov/vocabulary/iso639-1/ab
>> http://id.loc.gov/vocabulary/iso639-2/eng
>> Also in RDF:
>> http://id.loc.gov/vocabulary/iso639-2/eng.rdf
>> Sadly, the most popular codes, i.e. ISO 639-3, are not available from LoC;
>> http://lexvo.org is the authority here at the moment:
>> http://www.lexvo.org/page/iso639-3/eng
>> ## Linguistic Data
>> In my opinion, NIF and lemon are able to cover most industrial use cases.
>> lemon for dictionaries and terminological data
>> NIF as an annotation format for text
>> While NIF itself provides mechanisms to model (offset) annotations as
>> linked data, here are the incorporated NIF modules for expressing the
>> annotations themselves:
>> * ITS RDF ontology - http://www.w3.org/2005/11/its/rdf# based on
>> http://www.w3.org/TR/its20/
>> * NERD - for entity classification  (person, location, ...)
>> http://nerd.eurecom.fr/ontology
>> * MARL - for sentiment analysis http://purl.org/marl/0.1/ns
>> * OLiA - for morpho-syntax, POS tag sets, etc.  http://purl.org/olia
>> * DBpedia + DBpedia Ontology for Entity Linking:
>> http://dbpedia.org/resource/Barack_Obama
>> We started to collect them all here:
>> https://github.com/NLP2RDF/ontologies/tree/master/vm
>> ## Limitations of the above:
>> * If the language codes of ISO are not enough http://glottolog.org/ is
>> an option
>> * If you need fine-grained features like annotations of annotations,
>> http://www.openannotation.org/ can be used. The triple count is much
>> higher than with NIF, though, and scalability can be a problem.
>> All the best,
>> Sebastian
>> [1]  http://wiki.dbpedia.org/coop/DataIDUnit
>> Am 31.05.2014 11:49, schrieb Asunción Gómez Pérez:
>> Dear all,
>> Please consider the following picture as a starting point to try to
>> identify different metadata in clusters and  splitting it from the  content
>> oriented part of the LR . Issues related with country codes are not
>> included in this slide, but it should be easy to extend.  In the middle,
>> the white boxes refer to candidate vocabularies to be reused or to
>> initiatives that could help us with the definition of the properties and
>> their values.
>> I hope that it helps
>> Asun
>> El 22/05/2014 14:00, Marta Villegas escribió:
>> Dear Penny Dave and all,
>> non-linguistic things) we could use existing ontologies like foaf, doap,
>> bibo srwc etc.... (just chose the one that fits more your purpose)
>> Also for language names/codes, country names, mime-types (we did not find
>> anything but ...) etc.
>>  Best
>>  2014-05-22 11:55 GMT+02:00 Penny Labropoulou <penny@ilsp.gr>:
>>> Dear Dave and all,
>>> We agree that a separation into modules will help the discussion, and we
>>> basically agree with your proposal.
>>> One point as regards the RESOURCE_TYPE module: all LRs are described via
>>> the same set of "administrative/descriptive" components + an additional
>>> set of more specific components, depending on their resourceType AND
>>> mediaType values - the latter set corresponds to all the components
>>> included in the resourceComponentType part. So, there's a specific set of
>>> components for corpora, lexical/conceptual resources, language
>>> descriptions and tools/services (the four resource types recognized by
>>> META-SHARE); inside these, we have separate components, depending on the
>>> mediaType, so we have text corpora components, video corpora components,
>>> audio corpora components, but also lexical/conceptual text components etc.
>>> Inside each of these combinations, some elements are shared (e.g.
>>> linguality and language, time classification etc.) or can be similar (e.g.
>>> there are similar classification components for text, audio, video and
>>> image). So, it might be more convenient to separate RESOURCE_TYPE and
>>> MEDIA_TYPE modules. What do you think?
>>> We also suggest that we add three further modules: ORGANIZATION, PROJECT
>>> and DOCUMENT - corresponding to the organizationInfo, projectInfo &
>>> documentationInfo parts of the original model.
>>> Best,
>>> Penny
>>> -----Original Message-----
>>> From: Dave Lewis [mailto:dave.lewis@cs.tcd.ie]
>>> Sent: Thursday, May 22, 2014 12:38 PM
>>> To: public-ld4lt@w3.org
>>> Subject: [ISSUE-2] Module suggestions for META-SHARE RDF vocabulary
>>> Hi all,
>>> At the last call we discussed the template for the meta-share ontology as
>>> kindly initiated by Jorge:
>>> https://docs.google.com/spreadsheets/d/15SE4_qAqYFostmD52uKxpkCPZh1f5TrPeoXKNTlDYpQ/edit#gid=0
>>> with further information at:
>>> https://www.w3.org/community/ld4lt/wiki/Meta-Share_OWL_metamodel
>>> We discussed modules for this to help break down the task and to
>>> partition parts that might take more time to agree or need involvement by
>>> different subgroups compared to others.
>>> We already agreed to have a CORE component and split out a LICENSES
>>> module, but had asked for other suggestions.
>>> I'd like to propose two further modules:
>>> RESOURCE_TYPE corresponding to the resourceComponentType part of the
>>> meta-share schema:
>>> http://www.meta-share.org/portal/knowledgebase/Resourcecomponenttype
>>> and
>>> USAGE_TYPE corresponding to the usageInfo part of the meta-share schema:
>>> http://www.meta-share.org/portal/knowledgebase/Usageinfo
>>> These contain large enumerations that could both be subject to ongoing
>>> debate and likely candidates for extension/specialization. By separating
>>> these out we can avoid such debate delaying work on the CORE module.
>>> Should we add these as modules to the spreadsheet?
>>> From an ontology modelling viewpoint, how should we manage the modelling
>>> in these proposed modules? Would a class taxonomy be a better approach
>>> than an enumeration?
>>> Kind Regards,
>>> Dave
>>  --
>> Marta Villegas
>> marta.villegas@gmail.com
>> --
>> Prof. Asunción Gómez-Pérez
>> Catedrática de Universidad
>> Director of the Ontology Engineering Group
>> Facultad de Informática owl:sameAs Escuela Técnica Superior de Ingenieros Informáticos
>> Universidad Politécnica de Madrid
>> Campus de Montegancedo, sn
>> Boadilla del Monte, 28660, Spain
>> Home page: http://www.oeg-upm.net
>> Email: asun@fi.upm.es
>> Phone: (34-91) 336-7417
>> Fax: (34-91) 352-4819

Marta Villegas

(image/png attachment: 01-part)

Received on Friday, 27 June 2014 13:47:18 UTC
