
Re: [All] domain data category section proposal, please review

From: Declan Groves <dgroves@computing.dcu.ie>
Date: Tue, 26 Jun 2012 16:39:07 +0100
Message-ID: <CAOi_1PYOC=+MEq_uucfxqKWmw5QM1mC1JGa-NDYrQoFcTiiTbQ@mail.gmail.com>
To: Felix Sasaki <fsasaki@w3.org>
Cc: Arle Lommel <arle.lommel@dfki.de>, Thomas Ruedesheim <thomas.ruedesheim@lucysoftware.com>, "<public-multilingualweb-lt@w3.org>" <public-multilingualweb-lt@w3.org>

Thanks for your proposal for the domain data category. I think it outlines
the best approach for dealing with this complex data category, so good job!

The data category agnostic approach makes more sense and allows for more
flexibility, particularly for existing commercial MT service providers who
will already have their own lists of pre-defined domain categories. I am not
too familiar with DCR, so I don't feel qualified to comment on Arle's
suggestion.

Dublin Core, however, is a good pointer to use given its fairly wide
adoption (on this: is it worth providing a URL to the relevant Dublin Core
content?). I know that many MT systems that do implement domain metadata
do so using high-level domains either taken directly from Dublin Core or
adapted from it (e.g. I think the LetsMT project uses Dublin Core as a
starting point for defining domain). One thing to keep in mind is that the
proposal should be as clear and concise as possible. In terms of providing
pointers to what codes people can use, I think we are better off limiting
these: promoting interoperability is key, and providing a list of
alternative implementation strategies may over-complicate things.

It is good to emphasise the optional domainMapping attribute. I would
perhaps add to the paragraph explaining domainMapping that, although the
attribute is optional, providing it is recommended. For our implementation,
I expect to do something similar to Thomas: create a mapping from the
provided domain metadata to the domains that are available for our trained
systems.
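
To make the kind of mapping I have in mind concrete, here is a minimal
sketch in Python. It is only an illustration under assumptions, not our
actual implementation: I am assuming the domainMapping value is a
comma-separated list of "content-value tool-value" pairs, with quotes
around values that contain spaces, and that quoted values contain no
commas (the draft may define this differently). The function names, the
engine domain names ("auto", "medicine", "law") and the "general" fallback
are all hypothetical.

```python
import shlex


def parse_domain_mapping(mapping: str) -> dict[str, str]:
    """Parse an assumed 'source target, "multi word" target' string.

    Assumption: pairs are separated by commas, and quoted values do not
    themselves contain commas.
    """
    result: dict[str, str] = {}
    for pair in mapping.split(","):
        if not pair.strip():
            continue  # tolerate a trailing or doubled comma
        parts = shlex.split(pair)  # honours single and double quotes
        if len(parts) != 2:
            raise ValueError(f"expected 'source target', got: {pair!r}")
        source, target = parts
        result[source.lower()] = target
    return result


def map_domains(content_domains, mapping, fallback="general"):
    """Map domain values found in the content to engine domains.

    Unknown values fall back to a hypothetical catch-all domain;
    duplicates are dropped while keeping first-seen order.
    """
    mapped = []
    for value in content_domains:
        target = mapping.get(value.strip().lower(), fallback)
        if target not in mapped:
            mapped.append(target)
    return mapped


# Hypothetical mapping string and content values, for illustration only.
mapping = parse_domain_mapping(
    "automotive auto, medical medicine, 'criminal law' law"
)
engine_domains = map_domains(["Criminal Law", "medical", "sports"], mapping)
```

Matching is case-insensitive here, which is a design choice on my side
rather than anything the proposal requires.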

typo: "In source content... " -> "In the source content..."
      "no agreed upon set of value sets" -> "no agreed upon value sets"


On 25 June 2012 15:43, Felix Sasaki <fsasaki@w3.org> wrote:

> Hi Arle, Thomas, all,
> thanks for your feedback, Thomas, I'll fix the typos you found.
> 2012/6/25 Arle Lommel <arle.lommel@dfki.de>
>> Was this an area where the ISO data category registry might come into
>> play?
> No - this proposal is "data category agnostic". The idea is to provide a
> mechanism to map existing value lists (like the one Thomas mentioned).
>> That is, could we declare an agreed upon selection of fairly broad
>> top-level domains to promote interoperability while still allowing for
>> specification by users?
> After our discussion in Dublin and quite a few mails about this, see e.g.
> the summary at
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012May/0165.html
> or David's proposal at
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012May/0079.html
> I don't see an agreement for even top level domains.
>> Unfortunately there is a lot of complexity around this issue in general
>> that we will not resolve and that may indeed be fundamentally unresolvable.
>> But perhaps using the DCR as a place where domain ontologies can be
>> declared in an authoritative resource and pointed to we could at least
>> provide a way for someone to share what they mean.
> There are so many running systems using their own value lists for domain -
> I wouldn't expect that Lucy software or others would change their systems.
> The benefit they would get with the proposal in this thread is that
> connecting systems (e.g. MT + CMS) gets easier.
> Of course one could point users to what codes they should use. The dublin
> core subject field I have put into the draft is such a pointer. In addition
> I would be happy to name DCR as another area to look into, like TAUS top
> level categories, Let's MT top level categories, etc. That is, of course we
> want people to be aware of DCR.
> I also saw your question wrt DCR in the other thread, but I also don't
> recall an area where we would have a direct dependency. But as I said
> above, it would be good to inform readers of ITS 2.0 about where relying on
> DCR makes sense.
> A related question: if I want to refer to DCR in an HTML "meta" element,
> how would the DCR "scheme" be identified? Here is an example from dublin
> core:
> <meta name="DCTERMS.issued" scheme="DCTERMS.W3CDTF" content="2003-11-01"
> />
> If there is an approach to do that with DCR, I think we should have an
> example about it in ITS 2.0. Maybe you can check with the DCR experts in
> Madrid?
> Best,
> Felix
>> Arle
>> --
>> Arle Lommel
>> Berlin, Germany
>> Skype: arle_lommel
>> Phone (US): +1 707 709 8650
>> Sent from a mobile device. Please excuse any typos.
>> On Jun 25, 2012, at 16:02, "Thomas Ruedesheim" <
>> thomas.ruedesheim@lucysoftware.com> wrote:
>> Hi Felix,
>> I agree with your proposal. (There are just 2 typos in the examples: ""
>> in domainPointer attributes.)
>> Lucy's MT engine accepts a global SUBJECT_AREAS parameter holding a list
>> of domain names. Domains are organized in a hierarchy.
>> Here is a short excerpt (first 2 levels):
>>   General Vocabulary
>>     Common Social Voc.
>>       Art & Literature
>>       Ecology, Environment Protection
>>       Economy & Trade
>>       Law & Legal Science
>>       ...
>>     Common Technical Voc.
>>       Agriculture & Fishing
>>       Civil Engineering
>>       Data Processing
>>       ...
>> We will read the meta data and apply the mapping. Of course, the mapping
>> is specific for the used MT tool.
>> Cheers,
>> Thomas
>>  ------------------------------
>> *From:* Felix Sasaki [mailto:fsasaki@w3.org]
>> *Sent:* Montag, 25. Juni 2012 08:48
>> *To:* public-multilingualweb-lt@w3.org
>> *Subject:* [All] domain data category section proposal, please review
>> Hi all,
>> I have created a proposal for the domain data category, see attachment.
>> This would resolve ISSUE-11, with the input from ACTION-87 taken into
>> account.
>> Declan, Thomas, I think this is esp. important for you - we need to know
>> whether an implementation as described would be feasible and useful for
>> you. Of course, others, feel welcome to contribute.
>> Please make comments in this thread - I will use them to provide another
>> version of the section.
>> Thanks,
>> Felix
>> --
>> Felix Sasaki
>> DFKI / W3C Fellow
> --
> Felix Sasaki
> DFKI / W3C Fellow

Dr. Declan Groves
Research Integration Officer
Centre for Next Generation Localisation (CNGL)
Dublin City University

email: dgroves@computing.dcu.ie
phone: +353 (0)1 700 6906
Received on Tuesday, 26 June 2012 15:39:38 UTC
