Re: [ISSUE-2] Module suggestions for META-SHARE RDF vocabulary

Hi Felix,
Thanks for those pointers. The W3C I18n activity web page on this is 
also useful:
http://www.w3.org/International/questions/qa-choosing-language-tags.en

One related topic: Yesterday in the BPMLOD community group call we 
started fleshing out some guidelines for publishing different types of 
LR data as linked data, see:
https://www.w3.org/community/bpmlod/wiki/Guidelines_for_LD_generation_of_Language_resources_-_previous_notes

Obviously we are keeping a close eye on this LR metadata linked data 
vocab, since we need to make sure the two are well aligned!

I've already kicked off a document there on representing bitext as LD, 
but this needs to be grounded in how we can currently represent bitext 
resources on the web. So I have two questions:

1) Does anyone know the current state of media type registration for 
existing bitext formats? I'm thinking specifically of XLIFF, TMX, TBX 
and perhaps PO files.

2) Is there any best practice for language tags that could be used in 
HTTP content negotiation for bitext files? Obviously they contain two 
languages, and therefore need two codes, but typically we also care 
which is the source and which is the target language. I remember that 
at the 2012 MLW workshop in Dublin, Mark Davis introduced us to the 
Unicode BCP47 't' extensions from http://tools.ietf.org/html/rfc6497
Here we can express codes like:
ja-t-it meaning "The content is Japanese, transformed from Italian"

Is there any best practice out there for using these t codes for 
negotiating bitext content types?
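
To make the source/target reading concrete, here is a rough Python 
sketch of how a client might split such a tag. It is only an 
illustration, not a validating BCP47 parser: the function name 
split_t_extension is made up for this example, and it assumes the 
content language is everything before the 't' singleton and the source 
language is everything after it up to the first field key (such as m0) 
or the next singleton.

```python
def split_t_extension(tag):
    """Split a BCP 47 tag into (content_language, source_language).

    Hypothetical helper for illustration only; it does not validate the tag.
    source_language is None when the tag has no 't' extension or when the
    extension carries only fields (e.g. "en-t-m0-ungegn").
    """
    subtags = tag.lower().split("-")  # language tags are case-insensitive
    # Everything after a private-use "x" singleton is opaque, so only
    # look for the "t" singleton before it.
    end = subtags.index("x") if "x" in subtags else len(subtags)
    if "t" not in subtags[1:end]:
        return tag, None
    t_pos = subtags.index("t", 1, end)
    content = "-".join(subtags[:t_pos])
    source = []
    for sub in subtags[t_pos + 1:end]:
        is_singleton = len(sub) == 1
        is_field_key = len(sub) == 2 and sub[0].isalpha() and sub[1].isdigit()
        if is_singleton or is_field_key:
            break  # end of the source-language part of the extension
        source.append(sub)
    return content, "-".join(source) or None


if __name__ == "__main__":
    print(split_t_extension("ja-t-it"))
```

With the example above, split_t_extension("ja-t-it") gives ("ja", "it"), 
i.e. Japanese content with Italian as the source language.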

Regards,
Dave


On 23/05/2014 07:58, Felix Sasaki wrote:
>
> On 22.05.2014 at 14:00, Marta Villegas 
> <marta.villegas@gmail.com> wrote:
>
>> Dear Penny, Dave and all,
>>
>> For things like ORGANIZATION, PROJECT, DOCUMENT, PEOPLE (i.e. 
>> non-linguistic things) we could use existing ontologies like foaf, 
>> doap, bibo, swrc etc. (just choose the one that best fits your purpose).
>> Also for language names/codes, country names, mime-types (we did not 
>> find anything but ...) etc.
>
> Agree. And for mime-types there is the IANA registry which also comes 
> in an XML version
> http://www.iana.org/assignments/media-types/media-types.xml
> If URIs are needed for each mime type, one could generate an RDF 
> version out of that.
>
> For language subtags there is the subtag registry
> http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
> and lingvoj provides a linked data version with a lot of additional 
> information
> http://www.lingvoj.org/languages/all.html
>
> Best,
>
> Felix
>
>>
>> Best
>>
>> 2014-05-22 11:55 GMT+02:00 Penny Labropoulou <penny@ilsp.gr>:
>>
>>     Dear Dave and all,
>>
>>     We agree that a separation into modules will help the discussion,
>>     and we basically agree with your proposal.
>>
>>     One point as regards the RESOURCE_TYPE module: all LRs are
>>     described via the same set of "administrative/descriptive"
>>     components + an additional set of more specific components,
>>     depending on their resourceType AND mediaType values - the latter
>>     set corresponds to all the components included in the
>>     resourceComponentType part. So, there's a specific set of
>>     components for corpora, lexical/conceptual resources, language
>>     descriptions and tools/services (the four resource types
>>     recognized by META-SHARE); inside these, we have separate
>>     components, depending on the mediaType, so we have text corpora
>>     components, video corpora components, audio corpora components,
>>     but also lexical/conceptual text components etc. Inside each of
>>     these combinations, some elements are shared (e.g. linguality and
>>     language, time classification etc.) or can be similar (e.g. there
>>     are similar classification components for text, audio, video and
>>     image). So, it might be more convenient to separate RESOURCE_TYPE
>>     and MEDIA_TYPE modules. What do you think?
>>
>>     We also suggest that we add three further modules: ORGANIZATION,
>>     PROJECT and DOCUMENT - corresponding to the organizationInfo,
>>     projectInfo & documentationInfo parts of the original model.
>>
>>     Best,
>>     Penny
>>
>>     -----Original Message-----
>>     From: Dave Lewis [mailto:dave.lewis@cs.tcd.ie]
>>     Sent: Thursday, May 22, 2014 12:38 PM
>>     To: public-ld4lt@w3.org
>>     Subject: [ISSUE-2] Module suggestions for META-SHARE RDF vocabulary
>>
>>     Hi all,
>>     At the last call we discussed the template for the meta-share
>>     ontology as kindly initiated by Jorge:
>>     https://docs.google.com/spreadsheets/d/15SE4_qAqYFostmD52uKxpkCPZh1f5TrPeoXKNTlDYpQ/edit#gid=0
>>
>>     with further information at:
>>     https://www.w3.org/community/ld4lt/wiki/Meta-Share_OWL_metamodel
>>
>>     We discussed modules for this to help break down the task and to
>>     partition parts that might take more time to agree on, or need
>>     involvement by different subgroups, compared to others.
>>
>>     We already agreed to have a CORE component and split out a
>>     LICENSES module, but had asked for other suggestions.
>>
>>     I'd like to propose two further modules:
>>
>>     RESOURCE_TYPE corresponding to the resourceComponentType part of the
>>     meta-share schema:
>>     http://www.meta-share.org/portal/knowledgebase/Resourcecomponenttype
>>
>>     and
>>
>>     USAGE_TYPE corresponding to the usageInfo part of the meta-share
>>     schema:
>>     http://www.meta-share.org/portal/knowledgebase/Usageinfo
>>
>>     These contain large enumerations that could both be subject to
>>     ongoing debate and likely candidates for extension/specialization.
>>     By separating these out we can avoid such debate delaying work on
>>     the CORE module.
>>
>>     Should we add these as modules to the spreadsheet?
>>
>>     From an ontology modelling viewpoint, how should we manage the
>>     modelling in these proposed modules? Would a class taxonomy be a
>>     better approach than an enumeration?
>>
>>     Kind Regards,
>>     Dave
>>
>> -- 
>> Marta Villegas
>> marta.villegas@gmail.com
>

Received on Friday, 23 May 2014 09:03:16 UTC