- From: Felix Sasaki <fsasaki@w3.org>
- Date: Wed, 9 May 2012 18:09:10 +0200
- To: "Dr. David Filip" <David.Filip@ul.ie>
- Cc: Declan Groves <dgroves@computing.dcu.ie>, Milan Karasek <MilanK@moraviaworldwide.com>, public-multilingualweb-lt@w3.org, Georg Rehm <georg.rehm@dfki.de>
- Message-ID: <CAL58czqu6yzLhLgC4Eq1=97Mw4dT+NRmn=Wk58huxjjjj0P34g@mail.gmail.com>
2012/5/9 Dr. David Filip <David.Filip@ul.ie>

> Thanks Declan, I also think that at least domain should be primarily
> approached as a fixed ontology (it remains to be seen which or which set of
> ontologies)
>
> What I suggested was that some and *maybe all* of the categories should
> allow for user defined extensions.
>

I would rather prefer to say: the metadata we are developing doesn't say anything about other metadata that may be related to the content we are adding it to. That means everybody is free to develop his own private values. Having private values within one field is problematic.

Felix

> Of course the machine to machine automation is hindered if private values
> are being used, but at least the consumers would know that private values
> can occur and would be prepared to display them, eventually map them based
> on user preference..
>
> But as Georg rightly points out the main issue with all the related terms
> (rather than categories) is that the community are using them freely and
> interchangeably although they are distinct and different concepts..
>
> Rgds
> dF
>
> Dr. David Filip
> =======================
> LRC | CNGL | LT-Web | CSIS
> University of Limerick, Ireland
> telephone: +353-6120-2781
> *cellphone: +353-86-0222-158*
> facsimile: +353-6120-2734
> mailto: david.filip@ul.ie
>
>
>
> On Wed, May 9, 2012 at 1:17 PM, Declan Groves <dgroves@computing.dcu.ie> wrote:
>
>> Felix,
>>
>> I would agree with David that it is something that warrants discussion at the
>> meeting in Dublin.
>>
>> In terms of domain/genre, we do have a number of very closely related
>> data categories:
>>
>> - Domain
>> - Genre
>> - Purpose
>> - Register
>>
>> It is important to capture both domain and style of the text (which is
>> determined by both the purpose and register data categories) for
>> contextually-accurate translation. I feel that "genre" may be superfluous
>> to our needs, but that we should retain purpose (reflects the end consumer
>> of the content) and register (reflects the language style of the content).
>> It may be an idea to rename these to 'target audience' and 'type',
>> respectively, as David has suggested, if it makes the distinction clearer.
>>
>> I would suggest domain be mapped to an existing ontology, and therefore
>> restricted (i.e. to NOT allow user defined values), but that for the other
>> two we can leave these as user defined.
>>
>>
>> Declan
>>
>>
>>
>>
>>
>> On 9 May 2012 12:43, Dr. David Filip <David.Filip@ul.ie> wrote:
>>
>>> Felix, I see where you are coming from and see your argumentation line
>>> as simple = more machine to machine interoperability
>>>
>>> My personal experience with large corpora such as TDA was that a single
>>> plain category is not enough to facilitate slicing and dicing needed to
>>> prepare a consistent training corpus from data collected in the wild. MT
>>> tuners often need more orthogonal categories
>>>
>>> In LetsMT!, they were addressing the slicing and dicing need by having 3
>>> orthogonal data categories (from my recollection as WIP, not necessarily
>>> accurate)
>>> domain (ISO categories subset mapped onto TDA), worked quite nicely
>>> target audience (general, expert, channel partner, internal)
>>> type (social, web, UA, printed doc, marcom)
>>>
>>> Milan should be able to provide more up to date detail, as he (Moravia)
>>> is actively in the project..
>>>
>>> And finally domain has pretty much the same vagueness as the other three
>>> you are proposing to drop and the match with existing corpora and trained
>>> machines categorization won't be great, so I would not expect a big gain in
>>> machine to machine automation with domain only..
>>>
>>> I suggest not to drop them at least till Dublin workshop. The interested
>>> parties should be able to come with a workable set of orthogonal
>>> categories, possibly consolidated (but not less than 2 IMHO) and more
>>> in line with what the industry is doing. We should also consider user
>>> defined values as an attractive option for all but domain, or maybe even
>>> for domain..
>>>
>>> That is my two cents :-)
>>>
>>> Rgds
>>> dF
>>>
>>> Dr. David Filip
>>> =======================
>>> LRC | CNGL | LT-Web | CSIS
>>> University of Limerick, Ireland
>>> telephone: +353-6120-2781
>>> *cellphone: +353-86-0222-158*
>>> facsimile: +353-6120-2734
>>> mailto: david.filip@ul.ie
>>>
>>>
>>>
>>> On Wed, May 9, 2012 at 10:51 AM, Felix Sasaki <fsasaki@w3.org> wrote:
>>>
>>>> Hi all,
>>>>
>>>> See ISSUE-11 : I propose to drop the genre, purpose and register data
>>>> category proposals. Main reasons:
>>>> - I don't see a way to come up with an agreeable and interoperable set
>>>> of values.
>>>> - There is no way to generate or check this kind of metadata
>>>> automatically. This will lead to the same path as the "keywords" attribute in
>>>> the HTML meta element.
>>>>
>>>> I propose to have one data category, probably focusing on "domain",
>>>> that can be produced at least to some extent automatically (= Tadej) and
>>>> that can be taken up by planned implementations (=Declan).
>>>>
>>>> Thoughts?
>>>>
>>>> Felix
>>>>
>>>> --
>>>> Felix Sasaki
>>>> DFKI / W3C Fellow
>>>>
>>>
>>>
>>
>>
>> --
>> Dr. Declan Groves
>> Research Integration Officer
>> Centre for Next Generation Localisation (CNGL)
>> Dublin City University
>>
>> email: dgroves@computing.dcu.ie <dgroves@computing.dcu.ie>
>> phone: +353 (0)1 700 6906
>>
>
>

--
Felix Sasaki
DFKI / W3C Fellow
Received on Wednesday, 9 May 2012 16:09:40 UTC