- From: Felix Sasaki <fsasaki@w3.org>
- Date: Wed, 4 Jul 2012 06:45:25 +0200
- To: Dave Lewis <dave.lewis@cs.tcd.ie>
- Cc: public-multilingualweb-lt@w3.org
- Message-ID: <CAL58czrMqX8oVJiOxr9jO7XsedD92y5LucdMu3KiGQSMpRyR8Q@mail.gmail.com>
Hi Dave,

2012/7/4 Dave Lewis <dave.lewis@cs.tcd.ie>

> Hi Felix,
> One question on the domainMapping example you give for the domain data
> category. This assumes the workflow has a single canonical set of IDs
> identifying 'auto', 'medicine', 'law', but this may not always be the case,
> e.g. where SMT engines are trained on a mix of parallel data with their own
> separate corpora domain naming schemes.
>

Couldn't you accommodate that by having several domainRule elements?

> So a simple naming scheme means that the workflow provider must ensure
> consistency of that scheme and that the document editor (often the client)
> has knowledge of that scheme.
>
> So could the data category as is accommodate multiple naming schemes
> (e.g. from the client and from third parties) within the workflow by simply
> using a URL instead of a simple name? e.g.
>
> domainMapping="automotive auto, medical medicine, 'criminal law' http://www.taus.org/domain/law, 'property law' http://www.client.com/domain-names/law"
>

This has to be answered by Thomas and Declan, I think: they (and one external provider) agreed on the simple scheme. I'm fine with introducing URIs, but we need implementations making use of them.

Best,

Felix

>
> cheers,
> Dave
>
>
> On 29/06/2012 07:52, Felix Sasaki wrote:
>
> Hi all,
>
> FYI, I wrote the domain section based on the initial proposal and this
> thread, please have a look at
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#domain
>
> This closes ACTION-144. I also updated
>
> http://www.w3.org/International/multilingualweb/lt/wiki/Implementation_Commitments#New_ITS_2.0_categories
> With a link to the section.
>
> Best,
>
> Felix
>
> 2012/6/27 Felix Sasaki <fsasaki@w3.org>
>
>> Declan, all, thanks a lot for your feedback. I think we are close to
>> consensus about this, and I have given myself an ACTION-144 to put this
>> into the draft by next week.
>>
>> Best,
>>
>> Felix
>>
>>
>> 2012/6/26 Declan Groves <dgroves@computing.dcu.ie>
>>
>>> Felix,
>>>
>>> Thanks for your proposal for domain category, which I think outlines the
>>> best approach for dealing with the complex domain category so good job!
>>>
>>> The data category agnostic approach makes more sense, and allows for
>>> more flexibility, particularly for existing commercial MT service providers
>>> who will already have their own list of pre-defined domain categories. I am
>>> not too familiar with DCR so I don't feel qualified to comment on Arle's
>>> suggestion.
>>>
>>> Using Dublin Core, however, is a good pointer to use due to its fairly
>>> wide adoption (on this - is it worth providing a URL to the relevant Dublin
>>> Core content?) - I know that many MT systems that do implement domain
>>> metadata do so using high-level domains either taken directly from Dublin
>>> Core or adapted from it (e.g. I think the LetsMT project uses Dublin Core as
>>> a starting point for defining domain). One thing to keep in mind is
>>> that the proposal should be as clear and concise as possible. In terms of
>>> providing pointers to what codes people can use, I think we are better off
>>> limiting this as promoting interoperability is key and providing a list
>>> of alternative implementation strategies may over-complicate things.
>>>
>>> It is good to emphasise the optional domainMapping attribute, and I
>>> would perhaps add to the paragraph concerning the explanation of
>>> domainMapping that although optional, it is recommended that details for
>>> the attribute be provided. For our implementation, I expect to carry out
>>> something similar to Thomas - create a mapping from the provided domain
>>> metadata to domains that are available for our trained systems.
>>>
>>> typo: "In source content... " -> "In the source content..."
>>> "no agreed upon set of value sets" -> "no agreed upon value sets"
>>>
>>> Declan
>>>
>>>
>>> On 25 June 2012 15:43, Felix Sasaki <fsasaki@w3.org> wrote:
>>>
>>>> Hi Arle, Thomas, all,
>>>>
>>>> thanks for your feedback, Thomas, I'll fix the typos you found.
>>>>
>>>> 2012/6/25 Arle Lommel <arle.lommel@dfki.de>
>>>>
>>>>> Was this an area where the ISO data category registry might come
>>>>> into play?
>>>>>
>>>>
>>>> No - this proposal is "data category agnostic". The idea is to
>>>> provide a mechanism to map existing value lists (like the one Thomas
>>>> mentioned).
>>>>
>>>>
>>>>> That is, could we declare an agreed upon selection of fairly broad
>>>>> top-level domains to promote interoperability while still allowing for
>>>>> specification by users?
>>>>>
>>>>
>>>> After our discussion in Dublin and quite a few mails about this, see
>>>> e.g. the summary at
>>>>
>>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012May/0165.html
>>>> or David's proposal at
>>>>
>>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012May/0079.html
>>>>
>>>> I don't see an agreement for even top level domains.
>>>>
>>>>
>>>>>
>>>>> Unfortunately there is a lot of complexity around this issue in
>>>>> general that we will not resolve and that may indeed be fundamentally
>>>>> unresolvable. But perhaps using the DCR as a place where domain ontologies
>>>>> can be declared in an authoritative resource and pointed to we could at
>>>>> least provide a way for someone to share what they mean.
>>>>>
>>>>
>>>> There are so many running systems using their own value lists for
>>>> domain - I wouldn't expect that Lucy software or others would change their
>>>> systems. The benefit they would get with the proposal in this thread is
>>>> that connecting systems (e.g. MT + CMS) gets easier.
>>>>
>>>> Of course one could point users to what codes they should use.
>>>> The dublin core subject field I have put into the draft is such a pointer. In
>>>> addition I would be happy to name DCR as another area to look into, like
>>>> TAUS top level categories, Let's MT top level categories, etc. That is, of
>>>> course we want people to be aware of DCR.
>>>>
>>>> I also saw your question wrt DCR in the other thread, but I also
>>>> don't recall an area where we would have a direct dependency. But as I said
>>>> above, it would be good to inform readers of ITS 2.0 about where relying on
>>>> DCR makes sense.
>>>>
>>>> A related question: if I want to refer to DCR in an HTML "meta"
>>>> element, how would the DCR "scheme" be identified? Here is an example from
>>>> dublin core:
>>>>
>>>> <meta name="DCTERMS.issued" scheme="DCTERMS.W3CDTF"
>>>> content="2003-11-01" />
>>>>
>>>>
>>>> If there is an approach to do that with DCR, I think we should have
>>>> an example about it in ITS 2.0. Maybe you can check with the DCR experts in
>>>> Madrid?
>>>>
>>>>
>>>> Best,
>>>>
>>>> Felix
>>>>
>>>>
>>>>>
>>>>> Arle
>>>>>
>>>>> --
>>>>> Arle Lommel
>>>>> Berlin, Germany
>>>>> Skype: arle_lommel
>>>>> Phone (US): +1 707 709 8650
>>>>>
>>>>> Sent from a mobile device. Please excuse any typos.
>>>>>
>>>>> On Jun 25, 2012, at 16:02, "Thomas Ruedesheim" <
>>>>> thomas.ruedesheim@lucysoftware.com> wrote:
>>>>>
>>>>> Hi Felix,
>>>>>
>>>>> I agree with your proposal. (There are just 2 typos in the examples:
>>>>> "" in domainPointer attributes.)
>>>>> Lucy's MT engine accepts a global SUBJECT_AREAS parameter holding a
>>>>> list of domain names. Domains are organized in a hierarchy.
>>>>> Here is a short excerpt (first 2 levels):
>>>>> General Vocabulary
>>>>> Common Social Voc.
>>>>> Art & Literature
>>>>> Ecology, Environment Protection
>>>>> Economy & Trade
>>>>> Law & Legal Science
>>>>> ...
>>>>> Common Technical Voc.
>>>>> Agriculture & Fishing
>>>>> Civil Engineering
>>>>> Data Processing
>>>>> ...
>>>>> We will read the meta data and apply the mapping. Of course, the
>>>>> mapping is specific for the used MT tool.
>>>>>
>>>>> Cheers,
>>>>> Thomas
>>>>>
>>>>>
>>>>> ------------------------------
>>>>> *From:* Felix Sasaki [mailto:fsasaki@w3.org]
>>>>> *Sent:* Monday, 25 June 2012 08:48
>>>>> *To:* public-multilingualweb-lt@w3.org
>>>>> *Subject:* [All] domain data category section proposal, please review
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I have created a proposal for the domain data category, see
>>>>> attachment. This would resolve ISSUE-11, with the input from ACTION-87
>>>>> taken into account.
>>>>>
>>>>> Declan, Thomas, I think this is esp. important for you - we need to
>>>>> know whether an implementation as described would be feasible and useful
>>>>> for you. Of course, others, feel welcome to contribute.
>>>>>
>>>>> Please make comments in this thread - I will use them to provide
>>>>> another version of the section.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Felix
>>>>>
>>>>> --
>>>>> Felix Sasaki
>>>>> DFKI / W3C Fellow
>>>>>
>>>>
>>>> --
>>>> Felix Sasaki
>>>> DFKI / W3C Fellow
>>>>
>>>
>>> --
>>> Dr. Declan Groves
>>> Research Integration Officer
>>> Centre for Next Generation Localisation (CNGL)
>>> Dublin City University
>>>
>>> email: dgroves@computing.dcu.ie
>>> phone: +353 (0)1 700 6906
>>>
>>
>> --
>> Felix Sasaki
>> DFKI / W3C Fellow
>>
>
> --
> Felix Sasaki
> DFKI / W3C Fellow
>

--
Felix Sasaki
DFKI / W3C Fellow
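
For readers following the thread, here is a minimal sketch of how a consuming tool might read a domainMapping value such as the one Dave quotes and apply it to domain metadata, in the spirit of the mapping step Thomas and Declan describe. It is only an illustration: the function names `parse_domain_mapping` and `map_domains` are hypothetical, and the syntax handled (comma-separated pairs, values with spaces wrapped in single or double quotes) is inferred from the example in this thread rather than taken from the draft.

```python
import re

# One token: a single-quoted value, a double-quoted value, a bare value
# (no whitespace, commas or quotes), or a comma separating two pairs.
# This grammar is an assumption inferred from the example in the thread.
TOKEN = re.compile(r"""\s*(?:'([^']*)'|"([^"]*)"|([^\s,'"]+)|(,))""")


def _flush(current, pairs):
    """Store one collected 'content-value workflow-value' pair."""
    if not current:
        return  # tolerate empty entries such as a trailing comma
    if len(current) != 2:
        raise ValueError(f"expected two values per mapping, got {current!r}")
    pairs[current[0]] = current[1]


def parse_domain_mapping(value):
    """Parse a domainMapping string into {content value: workflow value}."""
    pairs, current, pos = {}, [], 0
    text = value.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}: {text[pos:]!r}")
        pos = match.end()
        if match.group(4):  # a comma closes the current pair
            _flush(current, pairs)
            current = []
        else:
            current.append(next(g for g in match.groups()[:3] if g is not None))
    _flush(current, pairs)
    return pairs


def map_domains(content_domains, mapping):
    """Replace each domain found in the content by its mapped value.

    Unmapped domains are passed through unchanged (a design assumption).
    """
    return [mapping.get(domain, domain) for domain in content_domains]


example = ("automotive auto, medical medicine, "
           "'criminal law' http://www.taus.org/domain/law, "
           "'property law' http://www.client.com/domain-names/law")
mapping = parse_domain_mapping(example)
print(map_domains(["medical", "criminal law", "sports"], mapping))
```

Because the right-hand value is treated as an opaque string, simple names and URIs can coexist in one mapping, which is the situation Dave's question raises.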
Received on Wednesday, 4 July 2012 04:45:51 UTC