Re: [All] domain data category section proposal, please review

Hi Dave,

2012/7/4 Dave Lewis <dave.lewis@cs.tcd.ie>

>  Hi Felix,
> One question on the domainMapping example you give for the domain data
> category. This assumes the workflow has a single canonical set of IDs
> identifying 'auto', 'medicine', 'law', but this may not always be the case,
> e.g. where SMT engines are trained on a mix of parallel data with their own
> separate corpora domain naming schemes.
>

Couldn't you accommodate that by having several domainRule elements?


> So a simple naming scheme means that the workflow provider must ensure
> consistency of that scheme and that the document editor (often the client)
> has knowledge of that scheme.
>
> So could the data category, as is, accommodate multiple naming schemes
> (e.g. from the client and from third parties) within the workflow by simply
> using a URL instead of a simple name?  e.g.
>
> domainMapping="automotive auto, medical medicine, 'criminal law' http://www.taus.org/domain/law, 'property law' http://www.client.com/domain-names/law"
>

This has to be answered by Thomas and Declan, I think: they (and one
external provider) agreed on the simple scheme. I'm fine with introducing
URIs, but we need implementations making use of them.
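
To make the URI question concrete, here is a minimal sketch in Python of how
a consumer might read a domainMapping value like the one above. It is only an
illustration: the function names (parse_domain_mapping, is_uri) are made up,
and the parsing rules (comma-separated pairs, optional single or double
quotes, no escaping inside quotes) are assumptions, not something the draft
defines.

```python
import re

# One token: a single-quoted string, a double-quoted string, or a run of
# characters containing neither whitespace nor a comma.
_TOKEN = r"""'[^']*'|"[^"]*"|[^\s,]+"""
# One pair: source token, whitespace, target token, then a comma or the end.
_PAIR = re.compile(rf"\s*({_TOKEN})\s+({_TOKEN})\s*(?:,|$)")


def _unquote(token):
    """Strip one pair of matching surrounding quotes, if present."""
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
        return token[1:-1]
    return token


def parse_domain_mapping(value):
    """Turn a domainMapping string into a {source name: target name} dict."""
    value = value.strip()
    mapping = {}
    pos = 0
    while pos < len(value):
        match = _PAIR.match(value, pos)
        if match is None:
            raise ValueError(f"malformed domainMapping near: {value[pos:]!r}")
        source, target = (_unquote(t) for t in match.groups())
        mapping[source] = target
        pos = match.end()
    return mapping


def is_uri(target):
    """Assumed convention: a target starting with http(s):// is a URI."""
    return target.startswith(("http://", "https://"))


# The value from Dave's mail, mixing simple names and URIs.
EXAMPLE = (
    "automotive auto, medical medicine, "
    "'criminal law' http://www.taus.org/domain/law, "
    "'property law' http://www.client.com/domain-names/law"
)
```

Calling parse_domain_mapping(EXAMPLE) gives a dictionary keyed by the names
found in the content; is_uri() then lets an implementation decide whether a
target is a simple name from the agreed scheme or a URI it has to resolve
against some third-party list. The existing syntax can carry both, so the
open point is only whether implementations will consume the URIs.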

Best,

Felix


>
> cheers,
> Dave
>
>
> On 29/06/2012 07:52, Felix Sasaki wrote:
>
> Hi all,
>
>  FYI, I wrote the domain section based on the initial proposal and this
> thread, please have a look at
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#domain
>
>  This closes ACTION-144. I also updated
>
>
> http://www.w3.org/International/multilingualweb/lt/wiki/Implementation_Commitments#New_ITS_2.0_categories
> with a link to the section.
>
>  Best,
>
>  Felix
>
> 2012/6/27 Felix Sasaki <fsasaki@w3.org>
>
>> Declan, all, thanks a lot for your feedback. I think we are close to
>> consensus about this, and I have given myself an ACTION-144 to put this
>> into the draft by next week.
>>
>> Best,
>>
>>  Felix
>>
>>
>> 2012/6/26 Declan Groves <dgroves@computing.dcu.ie>
>>
>>> Felix,
>>>
>>> Thanks for your proposal for domain category, which I think outlines the
>>> best approach for dealing with the complex domain category, so good job!
>>>
>>> The data category agnostic approach makes more sense, and allows for
>>> more flexibility, particularly for existing commercial MT service providers
>>> who will already have their own list of pre-defined domain categories. I am
>>> not too familiar with DCR so I don't feel qualified to comment on Arle's
>>> suggestion.
>>>
>>> Using Dublin Core, however, is a good pointer to use due to its fairly
>>> wide adoption (on this - is it worth providing a URL to the relevant Dublin
>>> Core content?) - I know that many MT systems that do implement domain
>>> metadata do so using high-level domains either taken directly from Dublin
>>> Core or adapted from it (e.g. I think the LetsMT project uses Dublin Core as
>>> a starting point for defining domain).  One thing to keep in mind is
>>> that the proposal should be as clear and concise as possible. In terms of
>>> providing pointers to what codes people can use, I think we are better off
>>> limiting this as promoting interoperability is key and providing a list
>>> of alternative implementation strategies may over-complicate things.
>>>
>>> It is good to emphasise the optional domainMapping attribute, and I
>>> would perhaps add to the paragraph concerning the explanation of
>>> domainMapping that although optional, it is recommended that details for
>>> the attribute be provided. For our implementation, I expect to carry out
>>> something similar to Thomas - create a mapping from the provided domain
>>> metadata to domains that are available for our trained systems.
>>>
>>> typo: "In source content... " -> "In the source content..."
>>>       "no agreed upon set of value sets" -> "no agreed upon value sets"
>>>
>>> Declan
>>>
>>>
>>>
>>> On 25 June 2012 15:43, Felix Sasaki <fsasaki@w3.org> wrote:
>>>
>>>> Hi Arle, Thomas, all,
>>>>
>>>>  thanks for your feedback, Thomas, I'll fix the typos you found.
>>>>
>>>>  2012/6/25 Arle Lommel <arle.lommel@dfki.de>
>>>>
>>>>>  Was this an area where the ISO data category registry might come
>>>>> into play?
>>>>>
>>>>
>>>>  No - this proposal is "data category agnostic". The idea is to
>>>> provide a mechanism to map existing value lists (like the one Thomas
>>>> mentioned).
>>>>
>>>>
>>>>>  That is, could we declare an agreed upon selection of fairly broad
>>>>> top-level domains to promote interoperability while still allowing for
>>>>> specification by users?
>>>>>
>>>>
>>>>
>>>>  After our discussion in Dublin and quite a few mails about this, see
>>>> e.g. the summary at
>>>>
>>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012May/0165.html
>>>> or David's proposal at
>>>>
>>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012May/0079.html
>>>>
>>>>  I don't see an agreement for even top level domains.
>>>>
>>>>
>>>>
>>>>>
>>>>>  Unfortunately there is a lot of complexity around this issue in
>>>>> general that we will not resolve and that may indeed be fundamentally
>>>>> unresolvable. But perhaps using the DCR as a place where domain ontologies
>>>>> can be declared in an authoritative resource and pointed to we could at
>>>>> least provide a way for someone to share what they mean.
>>>>>
>>>>
>>>>
>>>>  There are so many running systems using their own value lists for
>>>> domain - I wouldn't expect that Lucy software or others would change their
>>>> systems. The benefit they would get with the proposal in this thread is
>>>> that connecting systems (e.g. MT + CMS) gets easier.
>>>>
>>>>  Of course one could point users to what codes they should use. The
>>>> Dublin Core subject field I have put into the draft is such a pointer. In
>>>> addition I would be happy to name DCR as another area to look into, like
>>>> TAUS top level categories, Let's MT top level categories, etc. That is, of
>>>> course we want people to be aware of DCR.
>>>>
>>>>  I also saw your question wrt DCR in the other thread, but I also
>>>> don't recall an area where we would have a direct dependency. But as I said
>>>> above, it would be good to inform readers of ITS 2.0 about where relying on
>>>> DCR makes sense.
>>>>
>>>>  A related question: if I want to refer to DCR in an HTML "meta"
>>>> element, how would the DCR "scheme" be identified? Here is an example from
>>>> Dublin Core:
>>>>
>>>>   <meta name="DCTERMS.issued" scheme="DCTERMS.W3CDTF"
>>>> content="2003-11-01" />
>>>>
>>>>
>>>>  If there is an approach to do that with DCR, I think we should have
>>>> an example about it in ITS 2.0. Maybe you can check with the DCR experts in
>>>> Madrid?
>>>>
>>>>
>>>>  Best,
>>>>
>>>>  Felix
>>>>
>>>>
>>>>>
>>>>>  Arle
>>>>>
>>>>> --
>>>>> Arle Lommel
>>>>> Berlin, Germany
>>>>> Skype: arle_lommel
>>>>> Phone (US): +1 707 709 8650
>>>>>
>>>>>  Sent from a mobile device. Please excuse any typos.
>>>>>
>>>>> On Jun 25, 2012, at 16:02, "Thomas Ruedesheim" <
>>>>> thomas.ruedesheim@lucysoftware.com> wrote:
>>>>>
>>>>>     Hi Felix,
>>>>>
>>>>> I agree with your proposal. (There are just 2 typos in the examples:
>>>>> "" in domainPointer attributes.)
>>>>> Lucy's MT engine accepts a global SUBJECT_AREAS parameter holding a
>>>>> list of domain names. Domains are organized in a hierarchy.
>>>>> Here is a short excerpt (first 2 levels):
>>>>>   General Vocabulary
>>>>>     Common Social Voc.
>>>>>       Art & Literature
>>>>>       Ecology, Environment Protection
>>>>>       Economy & Trade
>>>>>       Law & Legal Science
>>>>>       ...
>>>>>     Common Technical Voc.
>>>>>       Agriculture & Fishing
>>>>>       Civil Engineering
>>>>>       Data Processing
>>>>>       ...
>>>>> We will read the metadata and apply the mapping. Of course, the
>>>>> mapping is specific to the MT tool in use.
>>>>>
>>>>> Cheers,
>>>>> Thomas
>>>>>
>>>>>
>>>>>
>>>>>  ------------------------------
>>>>> From: Felix Sasaki [mailto:fsasaki@w3.org]
>>>>> Sent: Monday, 25 June 2012 08:48
>>>>> To: public-multilingualweb-lt@w3.org
>>>>> Subject: [All] domain data category section proposal, please review
>>>>>
>>>>>   Hi all,
>>>>>
>>>>>  I have created a proposal for the domain data category, see
>>>>> attachment. This would resolve ISSUE-11, with the input from ACTION-87
>>>>> taken into account.
>>>>>
>>>>>  Declan, Thomas, I think this is esp. important for you - we need to
>>>>> know whether an implementation as described would be feasible and useful
>>>>> for you. Of course, others, feel welcome to contribute.
>>>>>
>>>>>  Please make comments in this thread - I will use them to provide
>>>>> another version of the section.
>>>>>
>>>>>  Thanks,
>>>>>
>>>>>  Felix
>>>>>
>>>>>  --
>>>>> Felix Sasaki
>>>>> DFKI / W3C Fellow
>>>>>
>>>>>
>>>>
>>>>
>>>>  --
>>>> Felix Sasaki
>>>> DFKI / W3C Fellow
>>>>
>>>>
>>>
>>>
>>>  --
>>> Dr. Declan Groves
>>> Research Integration Officer
>>> Centre for Next Generation Localisation (CNGL)
>>> Dublin City University
>>>
>>> email: dgroves@computing.dcu.ie
>>>  phone: +353 (0)1 700 6906
>>>
>>
>>
>>
>>  --
>> Felix Sasaki
>> DFKI / W3C Fellow
>>
>>
>
>
>  --
> Felix Sasaki
> DFKI / W3C Fellow
>
>
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Wednesday, 4 July 2012 04:45:51 UTC