- From: Antonio J. Roa Valverde <antonio.roa@englishbubble.com>
- Date: Thu, 12 Jun 2014 13:13:07 +0200
- To: public-ld4lt@w3.org
- Message-ID: <CALzoxk5cdtL2Gmq2WzSQ-+Eh+1STQ8_U181ogMgevOWd2F2K3g@mail.gmail.com>
Hi all,

I have been following this discussion quite passively until now, and I think I can give my 2 cents here. I believe the following survey gives a good overview of the topic.

Leonardo Lezcano, Salvador Sánchez-Alonso, Antonio J. Roa-Valverde (2013), "A survey on the exchange of linguistic resources: Publishing linguistic linked open data on the Web", Program: electronic library and information systems, Vol. 47 Iss: 3, pp. 263-281
http://www.emeraldinsight.com/journals.htm?issn=0033-0337&volume=47&issue=3&articleid=17093339&show=html

I am not sure I can share a copy in this thread, though, due to copyright issues.

Best regards,
Antonio

On Thu, Jun 12, 2014 at 11:49 AM, Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> wrote:
> Hi Asun, all,
>
> the image looks quite appropriate. Here are some additions to the names mentioned on the image:
>
> ## General Metadata
> The DBpedia community is currently pursuing an implementation and extension of DCat and VOID called DataID [1]. While DCat and VOID are vocabularies, DataID will provide some guidelines on how and where exactly to publish the DataID file (similar to the robots.txt or sitemap file). There will be a validator implementation to help adoption.
>
> ## Linguistic Specific Metadata
>
> Language codes for ISO 639-1 and 639-2 are provided by the Library of Congress (LoC):
> http://id.loc.gov/vocabulary/iso639-1/ab
> http://id.loc.gov/vocabulary/iso639-2/eng
> Also in RDF:
> http://id.loc.gov/vocabulary/iso639-2/eng.rdf
>
> Sadly, the most popular codes, i.e. iso639-3, are not available from LoC.
> http://lexvo.org is the authority here at the moment:
> http://www.lexvo.org/page/iso639-3/eng
>
> ## Linguistic Data
> In my opinion, NIF and lemon are able to cover most industrial use cases.
>
> lemon for dictionaries and terminological data
> NIF as an annotation format for text
>
> While NIF itself provides mechanisms to model (offset) annotations as linked data, here are the incorporated NIF modules for expressing the annotations themselves:
>
> * ITS RDF ontology - http://www.w3.org/2005/11/its/rdf# based on http://www.w3.org/TR/its20/
> * NERD - for entity classification (person, location, ...) http://nerd.eurecom.fr/ontology
> * MARL - for sentiment analysis http://purl.org/marl/0.1/ns
> * OLiA - for morpho-syntax, POS tag sets, etc. http://purl.org/olia
> * DBpedia + DBpedia Ontology for Entity Linking: http://dbpedia.org/resource/Barack_Obama
>
> We started to collect them all here:
> https://github.com/NLP2RDF/ontologies/tree/master/vm
>
> ## Limitations of the above:
>
> * If the ISO language codes are not enough, http://glottolog.org/ is an option.
> * If you need fine-grained features like annotations of annotations, http://www.openannotation.org/ can be used. The triple count is much higher than with NIF, though, and scalability can be a problem.
>
> All the best,
> Sebastian
>
> [1] http://wiki.dbpedia.org/coop/DataIDUnit
>
> On 31.05.2014 11:49, Asunción Gómez Pérez wrote:
>
> Dear all,
>
> Please consider the following picture as a starting point for grouping the different metadata into clusters and separating it from the content-oriented part of the LR. Issues related to country codes are not included in this slide, but it should be easy to extend. In the middle, the white boxes refer to candidate vocabularies to be reused or to initiatives that could help us with the definition of the properties and their values.
>
> I hope that it helps
>
> Asun
>
> On 22/05/2014 14:00, Marta Villegas wrote:
>
> Dear Penny, Dave and all,
>
> For things like ORGANIZATION, PROJECT, DOCUMENT, PEOPLE (i.e. non-linguistic things) we could use existing ontologies like foaf, doap, bibo, swrc etc....
> (just choose the one that best fits your purpose).
> Also for language names/codes, country names, mime-types (we did not find anything but ...) etc.
>
> Best
>
> 2014-05-22 11:55 GMT+02:00 Penny Labropoulou <penny@ilsp.gr>:
>
>> Dear Dave and all,
>>
>> We agree that a separation into modules will help the discussion, and we basically agree with your proposal.
>>
>> One point as regards the RESOURCE_TYPE module: all LRs are described via the same set of "administrative/descriptive" components + an additional set of more specific components, depending on their resourceType AND mediaType values - the latter set corresponds to all the components included in the resourceComponentType part. So, there's a specific set of components for corpora, lexical/conceptual resources, language descriptions and tools/services (the four resource types recognized by META-SHARE); inside these, we have separate components, depending on the mediaType, so we have text corpora components, video corpora components, audio corpora components, but also lexical/conceptual text components etc. Inside each of these combinations, some elements are shared (e.g. linguality and language, time classification etc.) or can be similar (e.g. there are similar classification components for text, audio, video and image). So, it might be more convenient to separate RESOURCE_TYPE and MEDIA_TYPE modules. What do you think?
>>
>> We also suggest that we add three further modules: ORGANIZATION, PROJECT and DOCUMENT - corresponding to the organizationInfo, projectInfo & documentationInfo parts of the original model.
>>
>> Best,
>> Penny
>>
>> -----Original Message-----
>> From: Dave Lewis [mailto:dave.lewis@cs.tcd.ie]
>> Sent: Thursday, May 22, 2014 12:38 PM
>> To: public-ld4lt@w3.org
>> Subject: [ISSUE-2] Module suggestions for META-SHARE RDF vocabulary
>>
>> Hi all,
>> At the last call we discussed the template for the meta-share ontology as kindly initiated by Jorge:
>>
>> https://docs.google.com/spreadsheets/d/15SE4_qAqYFostmD52uKxpkCPZh1f5TrPeoXKNTlDYpQ/edit#gid=0
>>
>> with further information at:
>> https://www.w3.org/community/ld4lt/wiki/Meta-Share_OWL_metamodel
>>
>> We discussed modules for this to help break down the task and to partition off parts that might take more time to agree on, or that need involvement from different subgroups.
>>
>> We already agreed to have a CORE component and split out a LICENSES module, but had asked for other suggestions.
>>
>> I'd like to propose two further modules:
>>
>> RESOURCE_TYPE corresponding to the resourceComponentType part of the meta-share schema:
>> http://www.meta-share.org/portal/knowledgebase/Resourcecomponenttype
>>
>> and
>>
>> USAGE_TYPE corresponding to the usageInfo part of the meta-share schema:
>> http://www.meta-share.org/portal/knowledgebase/Usageinfo
>>
>> These contain large enumerations that could both be subject to ongoing debate and are likely candidates for extension/specialization. By separating these out we can avoid such debate delaying work on the CORE module.
>>
>> Should we add these as modules to the spreadsheet?
>>
>> From an ontology modelling viewpoint, how should we manage the modelling in these proposed modules: would a class taxonomy be a better approach than an enumeration?
>>
>> Kind Regards,
>> Dave
>>
>
> --
> Marta Villegas
> marta.villegas@gmail.com
>
> --
> Prof. Asunción Gómez-Pérez
> Catedrática de Universidad
> Director of the Ontology Engineering Group
> Facultad de Informática owl:sameAs Escuela Técnica Superior de Ingenieros Informáticos
> Universidad Politécnica de Madrid
> Campus de Montegancedo, sn
> Boadilla del Monte, 28660, Spain
> Home page: www.oeg-upm.net
> Email: asun@fi.upm.es
> Phone: (34-91) 336-7417
> Fax: (34-91) 352-4819
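
The language-code URIs quoted in Sebastian's message follow a regular pattern, which could be sketched roughly as below. This is only an illustration under stated assumptions: the helper name `language_code_uri`, the lookup tables, and the validation rules are hypothetical and not part of any LoC or Lexvo API. The URI prefixes are copied from the examples in the thread; the `.rdf` suffix is shown there only for an iso639-2 URI and is assumed here to work the same way for iso639-1. Nothing is fetched, so the sketch does not check that a code is actually assigned.

```python
# URI prefixes copied from the examples quoted in the thread.
# All names below are hypothetical helpers for illustration only.
URI_PREFIXES = {
    "iso639-1": "http://id.loc.gov/vocabulary/iso639-1/",
    "iso639-2": "http://id.loc.gov/vocabulary/iso639-2/",
    "iso639-3": "http://www.lexvo.org/page/iso639-3/",
}

# ISO 639-1 codes have two letters; ISO 639-2 and 639-3 codes have three.
CODE_LENGTHS = {"iso639-1": 2, "iso639-2": 3, "iso639-3": 3}

# Assumption: only the LoC-hosted schemes take the ".rdf" suffix
# (the thread shows it for iso639-2 only).
RDF_SUFFIX_SCHEMES = {"iso639-1", "iso639-2"}


def language_code_uri(scheme: str, code: str, rdf: bool = False) -> str:
    """Build a URI for a language code, following the patterns in the thread."""
    if scheme not in URI_PREFIXES:
        raise ValueError(f"unknown scheme: {scheme!r}")

    # Normalise the code and check its shape (letters only, expected length).
    code = code.strip().lower()
    if not (code.isascii() and code.isalpha()):
        raise ValueError(f"code must consist of ASCII letters: {code!r}")
    if len(code) != CODE_LENGTHS[scheme]:
        raise ValueError(
            f"{scheme} codes have {CODE_LENGTHS[scheme]} letters: {code!r}"
        )

    uri = URI_PREFIXES[scheme] + code
    if rdf:
        if scheme not in RDF_SUFFIX_SCHEMES:
            raise ValueError(f"no .rdf form is assumed for {scheme}")
        uri += ".rdf"
    return uri


if __name__ == "__main__":
    print(language_code_uri("iso639-1", "ab"))
    print(language_code_uri("iso639-2", "eng", rdf=True))
    print(language_code_uri("iso639-3", "eng"))
```

Keeping the prefixes in a single table means that another authority (for example Glottolog, mentioned above as a fallback) could be added as one more entry, although its URI pattern is not given in the thread and would need to be looked up first.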
Attachments
- image/png attachment: 01-part
Received on Thursday, 12 June 2014 11:42:26 UTC