- From: Dave Lewis <dave.lewis@cs.tcd.ie>
- Date: Fri, 23 May 2014 10:06:35 +0100
- To: Felix Sasaki <fsasaki@w3.org>, Marta Villegas <marta.villegas@gmail.com>
- CC: Penny Labropoulou <penny@ilsp.gr>, public-ld4lt@w3.org, Maria Gavriilidou <maria@ilsp.gr>, "Dr. David Filip" <David.Filip@ul.ie>
- Message-ID: <537F0F9B.1030307@cs.tcd.ie>
Hi Felix,

Thanks for those pointers. The W3C I18n activity web page on this is also useful:
http://www.w3.org/International/questions/qa-choosing-language-tags.en

One related topic: yesterday in the BPMLOD community group call we started fleshing out some guidelines for publishing different types of LR data as linked data, see:
https://www.w3.org/community/bpmlod/wiki/Guidelines_for_LD_generation_of_Language_resources_-_previous_notes

Obviously we are keeping a close eye on this LR metadata linked data vocab, since we need to make sure the two are well aligned!

I've already kicked off a document there on representing bitext as LD, but this needs to be grounded in how we can currently represent bitext resources on the web. So I have two questions:

1) Does anyone know the current state of media type registration for existing bitext formats? I'm thinking specifically of XLIFF, TMX, TBX and perhaps PO files.

2) Is there any best practice in language tags that could be used in HTTP content negotiation for bitext files? Obviously they contain two languages, and therefore need two codes, but typically we also care which is the source and which is the target language.

I remember that at the 2012 MLW workshop in Dublin, Mark Davis introduced us to the Unicode BCP47 't' extensions from http://tools.ietf.org/html/rfc6497

Here we can express codes like:
ja-t-it
meaning "The content is Japanese, transformed from Italian".

Is there any best practice out there for using these t codes for negotiating bitext content types?

Regards,
Dave

On 23/05/2014 07:58, Felix Sasaki wrote:
>
> On 22.05.2014 at 14:00, Marta Villegas <marta.villegas@gmail.com> wrote:
>
>> Dear Penny, Dave and all,
>>
>> For things like ORGANIZATION, PROJECT, DOCUMENT, PEOPLE (i.e.
>> non-linguistic things) we could use existing ontologies like foaf,
>> doap, bibo, swrc etc....
>> (just choose the one that best fits your purpose).
>> Also for language names/codes, country names, mime-types (we did not
>> find anything but ...) etc.
>
> Agree. And for mime-types there is the IANA registry, which also comes
> in an XML version:
> http://www.iana.org/assignments/media-types/media-types.xml
> If URIs are needed for each mime type, one could generate an RDF
> version out of that.
>
> For language subtags there is the subtag registry
> http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
> and lingvoj provides a linked data version with a lot of additional
> information:
> http://www.lingvoj.org/languages/all.html
>
> Best,
>
> Felix
>
>>
>> Best
>>
>> 2014-05-22 11:55 GMT+02:00 Penny Labropoulou <penny@ilsp.gr>:
>>
>> Dear Dave and all,
>>
>> We agree that a separation into modules will help the discussion, and
>> we basically agree with your proposal.
>>
>> One point as regards the RESOURCE_TYPE module: all LRs are described
>> via the same set of "administrative/descriptive" components + an
>> additional set of more specific components, depending on their
>> resourceType AND mediaType values - the latter set corresponds to all
>> the components included in the resourceComponentType part. So, there's
>> a specific set of components for corpora, lexical/conceptual
>> resources, language descriptions and tools/services (the four resource
>> types recognized by META-SHARE); inside these, we have separate
>> components, depending on the mediaType, so we have text corpora
>> components, video corpora components, audio corpora components, but
>> also lexical/conceptual text components etc. Inside each of these
>> combinations, some elements are shared (e.g. linguality and language,
>> time classification etc.) or can be similar (e.g. there are similar
>> classification components for text, audio, video and image). So, it
>> might be more convenient to separate the RESOURCE_TYPE and MEDIA_TYPE
>> modules. What do you think?
>>
>> We also suggest that we add three further modules: ORGANIZATION,
>> PROJECT and DOCUMENT - corresponding to the organizationInfo,
>> projectInfo & documentationInfo parts of the original model.
>>
>> Best,
>> Penny
>>
>> -----Original Message-----
>> From: Dave Lewis [mailto:dave.lewis@cs.tcd.ie]
>> Sent: Thursday, May 22, 2014 12:38 PM
>> To: public-ld4lt@w3.org
>> Subject: [ISSUE-2] Module suggestions for META-SHARE RDF vocabulary
>>
>> Hi all,
>> At the last call we discussed the template for the meta-share ontology
>> as kindly initiated by Jorge:
>> https://docs.google.com/spreadsheets/d/15SE4_qAqYFostmD52uKxpkCPZh1f5TrPeoXKNTlDYpQ/edit#gid=0
>>
>> with further information at:
>> https://www.w3.org/community/ld4lt/wiki/Meta-Share_OWL_metamodel
>>
>> We discussed modules for this to help break down the tasks and to
>> partition off parts that might take more time to agree on, or need
>> involvement by different subgroups, compared to others.
>>
>> We already agreed to have a CORE component and split out a LICENSES
>> module, but had asked for other suggestions.
>>
>> I'd like to propose two further modules:
>>
>> RESOURCE_TYPE corresponding to the resourceComponentType part of the
>> meta-share schema:
>> http://www.meta-share.org/portal/knowledgebase/Resourcecomponenttype
>>
>> and
>>
>> USAGE_TYPE corresponding to the usageInfo part of the meta-share
>> schema:
>> http://www.meta-share.org/portal/knowledgebase/Usageinfo
>>
>> These contain large enumerations that could be subject to ongoing
>> debate and are likely candidates for extension/specialization. By
>> separating these out we can avoid such debate delaying work on the
>> CORE module.
>>
>> Should we add these as modules to the spreadsheet?
>>
>> From an ontology modelling viewpoint, how should we manage the
>> modelling in these proposed modules: would a class taxonomy be a
>> better approach than an enumeration?
>>
>> Kind Regards,
>> Dave
>>
>> --
>> Marta Villegas
>> marta.villegas@gmail.com
>
Received on Friday, 23 May 2014 09:03:16 UTC
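
To make the 't' extension question in the message above concrete, here is a minimal Python sketch of how a tag such as ja-t-it could be split into a target (content) language and a source language. It only illustrates the tag syntax from RFC 6497; it is not an established best practice for negotiating bitext, which is exactly the open question raised above. The names BitextTag, split_t_extension and make_bitext_tag are hypothetical, and the parser is deliberately simplified (it does not validate subtags against the IANA registry or handle grandfathered tags).

```python
import re
from typing import NamedTuple, Optional

# In RFC 6497 a field key inside the 't' extension is one letter followed
# by one digit (e.g. "m0"); it marks the end of the source language tag.
_TKEY = re.compile(r"[a-z][0-9]")


class BitextTag(NamedTuple):
    """Hypothetical container: target is the content language,
    source is the language it was transformed from (if stated)."""
    target: str
    source: Optional[str]


def split_t_extension(tag: str) -> BitextTag:
    """Split a BCP 47 tag into its main language tag and the source
    language carried in a 't' extension. Output is lowercased."""
    subtags = tag.lower().split("-")
    # The main language tag ends at the first one-character subtag
    # (a singleton such as "t", "u" or "x").
    first_singleton = next(
        (i for i, s in enumerate(subtags) if i > 0 and len(s) == 1),
        len(subtags),
    )
    target = "-".join(subtags[:first_singleton])
    source = None
    for i in range(first_singleton, len(subtags)):
        if subtags[i] == "x":
            break  # private use: everything after "x" is opaque
        if subtags[i] == "t":
            src = []
            for s in subtags[i + 1:]:
                # Stop at the next singleton or at a field key like "m0".
                if len(s) == 1 or _TKEY.fullmatch(s):
                    break
                src.append(s)
            source = "-".join(src) or None
            break
    return BitextTag(target, source)


def make_bitext_tag(source: str, target: str) -> str:
    """Build a tag meaning 'content in target, transformed from source'."""
    return f"{target}-t-{source}".lower()


print(split_t_extension("ja-t-it"))   # BitextTag(target='ja', source='it')
print(make_bitext_tag("it", "ja"))    # ja-t-it
```

A tag built this way is still a syntactically ordinary BCP 47 language tag, so in principle it could be carried wherever language tags are accepted; whether servers and clients would actually match on the 't' part during content negotiation is an assumption that would need checking.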