
Re: [All] domain data category section proposal, please review

From: Felix Sasaki <fsasaki@w3.org>
Date: Wed, 27 Jun 2012 00:13:32 +0200
Message-ID: <CAL58czrF2TkMz1e_on9k=kLa-smH2ExgSFE-veLiwa=t69rG-w@mail.gmail.com>
To: Declan Groves <dgroves@computing.dcu.ie>
Cc: Arle Lommel <arle.lommel@dfki.de>, Thomas Ruedesheim <thomas.ruedesheim@lucysoftware.com>, "<public-multilingualweb-lt@w3.org>" <public-multilingualweb-lt@w3.org>
Declan, all, thanks a lot for your feedback. I think we are close to
consensus on this, and I have given myself ACTION-144 to put this
into the draft by next week.
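
The mapping step that Declan and Thomas describe below (reading the domain
metadata from the content and translating it into the domains a given MT
engine knows) could look roughly like the following. This is only a sketch
under stated assumptions: the function names, the content-side values, the
case-insensitive lookup and the fallback domain are all hypothetical; only the
engine-side names are taken from the Lucy excerpt quoted in this thread.

```python
# Hypothetical mapping from domain values found in content (keys) to the
# domain names of one MT engine (values). The engine-side names come from
# the Lucy excerpt quoted in this thread; the content-side keys are made up.
DOMAIN_MAPPING = {
    "law": "Law & Legal Science",
    "legal": "Law & Legal Science",
    "it": "Data Processing",
    "software": "Data Processing",
    "agriculture": "Agriculture & Fishing",
}

# Assumption: unknown content values fall back to the engine's top-level domain.
FALLBACK_DOMAIN = "General Vocabulary"


def domains_from_meta(content_attribute):
    """Split a comma-separated keyword string into trimmed, non-empty values."""
    return [part.strip() for part in content_attribute.split(",") if part.strip()]


def map_domains(content_domains, mapping=DOMAIN_MAPPING, fallback=FALLBACK_DOMAIN):
    """Translate content domain values into engine domains.

    Lookup is case-insensitive (an assumption). Order is preserved and
    duplicate engine domains are emitted only once.
    """
    result = []
    for value in content_domains:
        engine_domain = mapping.get(value.strip().lower(), fallback)
        if engine_domain not in result:
            result.append(engine_domain)
    return result
```

For example, map_domains(domains_from_meta("Law, Software, cooking")) yields
the two mapped Lucy domains followed by the fallback. The table itself stays
specific to each MT tool, as Thomas notes; only the lookup is generic.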

Best,

Felix

2012/6/26 Declan Groves <dgroves@computing.dcu.ie>

> Felix,
>
> Thanks for your proposal for the domain data category. I think it outlines
> the best approach for dealing with this complex category, so good job!
>
> The data-category-agnostic approach makes more sense and allows for more
> flexibility, particularly for existing commercial MT service providers who
> will already have their own list of pre-defined domain categories. I am not
> too familiar with DCR, so I don't feel qualified to comment on Arle's
> suggestion.
>
> Dublin Core, however, is a good pointer to use because of its fairly wide
> adoption (on this: is it worth providing a URL to the relevant Dublin Core
> content?). I know that many MT systems that implement domain metadata do so
> using high-level domains either taken directly from Dublin Core or adapted
> from it (e.g. I think the LetsMT project uses Dublin Core as a starting
> point for defining domain). One thing to keep in mind is that the proposal
> should be as clear and concise as possible. In terms of providing pointers
> to what codes people can use, I think we are better off limiting this:
> promoting interoperability is key, and providing a list of alternative
> implementation strategies may over-complicate things.
>
> It is good to emphasise the optional domainMapping attribute, and I would
> perhaps add to the paragraph concerning the explanation of domainMapping
> that although optional, it is recommended that details for the attribute be
> provided. For our implementation, I expect to carry out something similar
> to Thomas - create a mapping from the provided domain metadata to domains
> that are available for our trained systems.
>
> typo: "In source content... " -> "In the source content..."
>       "no agreed upon set of value sets" -> "no agreed upon value sets"
>
> Declan
>
>
>
> On 25 June 2012 15:43, Felix Sasaki <fsasaki@w3.org> wrote:
>
>> Hi Arle, Thomas, all,
>>
>> thanks for your feedback, Thomas, I'll fix the typos you found.
>>
>> 2012/6/25 Arle Lommel <arle.lommel@dfki.de>
>>
>>> Was this an area where the ISO data category registry might come into
>>> play?
>>>
>>
>> No - this proposal is "data category agnostic". The idea is to provide a
>> mechanism to map existing value lists (like the one Thomas mentioned).
>>
>>
>>> That is, could we declare an agreed upon selection of fairly broad
>>> top-level domains to promote interoperability while still allowing for
>>> specification by users?
>>>
>>
>>
>> After our discussion in Dublin and quite a few mails about this, see e.g.
>> the summary at
>>
>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012May/0165.html
>> or David's proposal at
>>
>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012May/0079.html
>>
>> I don't see an agreement for even top level domains.
>>
>>
>>
>>>
>>> Unfortunately there is a lot of complexity around this issue in general
>>> that we will not resolve and that may indeed be fundamentally unresolvable.
>>> But perhaps by using the DCR as a place where domain ontologies can be
>>> declared in an authoritative resource and pointed to, we could at least
>>> provide a way for someone to share what they mean.
>>>
>>
>>
>> There are so many running systems using their own value lists for domain
>> - I wouldn't expect that Lucy software or others would change their
>> systems. The benefit they would get with the proposal in this thread is
>> that connecting systems (e.g. MT + CMS) gets easier.
>>
>> Of course one could point users to what codes they should use. The Dublin
>> Core subject field I have put into the draft is such a pointer. In addition
>> I would be happy to name DCR as another area to look into, like TAUS top
>> level categories, LetsMT top level categories, etc. That is, of course we
>> want people to be aware of DCR.
>>
>> I also saw your question about DCR in the other thread, but I don't
>> recall an area where we would have a direct dependency. But as I said
>> above, it would be good to inform readers of ITS 2.0 about where relying on
>> DCR makes sense.
>>
>> A related question: if I want to refer to DCR in an HTML "meta" element,
>> how would the DCR "scheme" be identified? Here is an example from Dublin
>> Core:
>>
>> <meta name="DCTERMS.issued" scheme="DCTERMS.W3CDTF" content="2003-11-01" />
>>
>>
>> If there is an approach to do that with DCR, I think we should have an
>> example about it in ITS 2.0. Maybe you can check with the DCR experts in
>> Madrid?
>>
>>
>> Best,
>>
>> Felix
>>
>>
>>>
>>> Arle
>>>
>>> --
>>> Arle Lommel
>>> Berlin, Germany
>>> Skype: arle_lommel
>>> Phone (US): +1 707 709 8650
>>>
>>> Sent from a mobile device. Please excuse any typos.
>>>
>>> On Jun 25, 2012, at 16:02, "Thomas Ruedesheim" <
>>> thomas.ruedesheim@lucysoftware.com> wrote:
>>>
>>> Hi Felix,
>>>
>>> I agree with your proposal. (There are just 2 typos in the examples: ""
>>> in domainPointer attributes.)
>>> Lucy's MT engine accepts a global SUBJECT_AREAS parameter holding a list
>>> of domain names. Domains are organized in a hierarchy.
>>> Here is a short excerpt (first 2 levels):
>>>   General Vocabulary
>>>     Common Social Voc.
>>>       Art & Literature
>>>       Ecology, Environment Protection
>>>       Economy & Trade
>>>       Law & Legal Science
>>>       ...
>>>     Common Technical Voc.
>>>       Agriculture & Fishing
>>>       Civil Engineering
>>>       Data Processing
>>>       ...
>>> We will read the metadata and apply the mapping. Of course, the mapping
>>> is specific to the MT tool used.
>>>
>>> Cheers,
>>> Thomas
>>>
>>>
>>>
>>>  ------------------------------
>>> *From:* Felix Sasaki [mailto:fsasaki@w3.org]
>>> *Sent:* Montag, 25. Juni 2012 08:48
>>> *To:* public-multilingualweb-lt@w3.org
>>> *Subject:* [All] domain data category section proposal, please review
>>>
>>> Hi all,
>>>
>>> I have created a proposal for the domain data category, see attachment.
>>> This would resolve ISSUE-11, with the input from ACTION-87 taken into
>>> account.
>>>
>>> Declan, Thomas, I think this is esp. important for you - we need to know
>>> whether an implementation as described would be feasible and useful for
>>> you. Of course, others, feel welcome to contribute.
>>>
>>> Please make comments in this thread - I will use them to provide another
>>> version of the section.
>>>
>>> Thanks,
>>>
>>> Felix
>>>
>>> --
>>> Felix Sasaki
>>> DFKI / W3C Fellow
>>>
>>>
>>
>>
>> --
>> Felix Sasaki
>> DFKI / W3C Fellow
>>
>>
>
>
> --
> Dr. Declan Groves
> Research Integration Officer
> Centre for Next Generation Localisation (CNGL)
> Dublin City University
>
> email: dgroves@computing.dcu.ie
> phone: +353 (0)1 700 6906
>



-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Tuesday, 26 June 2012 22:13:59 UTC