- From: Tadej Stajner <tadej.stajner@ijs.si>
- Date: Wed, 09 May 2012 19:34:35 +0200
- To: public-multilingualweb-lt@w3.org
- Message-ID: <4FAAAAAB.40109@ijs.si>
From my perspective, genre and purpose could use some consolidation.
Looking at the requirements page, it seems that genre is dangerously
close to purpose, even in the examples (genre=advertising,
purpose=advertisement). I'm indifferent to whether genre is called type
(although 'type' is a very overloaded term when dealing with software),
but I'm in favor of targetAudience instead of purpose, as it makes the
distinction more obvious.

-- Tadej

On 5/9/2012 6:09 PM, Felix Sasaki wrote:
>
> 2012/5/9 Dr. David Filip <David.Filip@ul.ie>
>
>     Thanks Declan, I also think that at least domain should be
>     primarily approached as a fixed ontology (it remains to be seen
>     which or which set of ontologies).
>
>     What I suggested was that some and *maybe all* of the categories
>     should allow for user defined extensions.
>
> I would rather prefer to say: the metadata we are developing doesn't
> say anything about other metadata that may be related to the content
> we are adding it to. That means everybody is free to develop their own
> private values. Having private values within one field is problematic.
>
> Felix
>
>     Of course the machine to machine automation is hindered if private
>     values are being used, but at least the consumers would know that
>     private values can occur and would be prepared to display them,
>     eventually map them based on user preference.
>
>     But as George rightly points out, the main issue with all the
>     related terms (rather than categories) is that the community are
>     using them freely and interchangeably although they are distinct
>     and different concepts.
>
>     Rgds
>     dF
>
>     Dr. David Filip
>     =======================
>     LRC | CNGL | LT-Web | CSIS
>     University of Limerick, Ireland
>     telephone: +353-6120-2781
>     *cellphone: +353-86-0222-158*
>     facsimile: +353-6120-2734
>     mailto: david.filip@ul.ie
>
>     On Wed, May 9, 2012 at 1:17 PM, Declan Groves
>     <dgroves@computing.dcu.ie> wrote:
>
>         Felix,
>
>         I would agree with David that it is something that warrants
>         discussion at the meeting in Dublin.
>
>         In terms of domain/genre, we do have a number of very closely
>         related data categories:
>
>           * Domain
>           * Genre
>           * Purpose
>           * Register
>
>         It is important to capture both domain and style of the text
>         (which is determined by both the purpose and register data
>         categories) for contextually-accurate translation. I feel that
>         "genre" may be superfluous to our needs, but that we should
>         retain purpose (reflects the end consumer of the content) and
>         register (reflects the language style of the content). It may
>         be an idea to rename these to 'target audience' and 'type',
>         respectively, as David has suggested, if it makes the
>         distinction clearer.
>
>         I would suggest domain be mapped to an existing ontology, and
>         therefore restricted (i.e. to NOT allow user defined values),
>         but that for the other two we can leave these as user defined.
>
>         Declan
>
>         On 9 May 2012 12:43, Dr. David Filip <David.Filip@ul.ie> wrote:
>
>             Felix, I see where you are coming from and see your
>             argumentation line as simple = more machine to machine
>             interoperability.
>
>             My personal experience with large corpora such as TDA was
>             that a single plain category is not enough to facilitate
>             the slicing and dicing needed to prepare a consistent
>             training corpus from data collected in the wild. MT tuners
>             often need more orthogonal categories.
>
>             In LetsMT!, they were addressing the slicing and dicing
>             need by having 3 orthogonal data categories (from my
>             recollection as WIP, not necessarily accurate):
>             domain (ISO categories subset mapped onto TDA), worked
>             quite nicely
>             target audience (general, expert, channel partner, internal)
>             type (social, web, UA, printed doc, marcom)
>
>             Milan should be able to provide more up to date detail, as
>             he (Moravia) is active in the project.
>
>             And finally, domain has pretty much the same vagueness as
>             the other three you are proposing to drop, and the match
>             with existing corpora and trained machines categorization
>             won't be great, so I would not expect a big gain in
>             machine to machine automation with domain only.
>
>             I suggest not to drop them at least till the Dublin
>             workshop. The interested parties should be able to come up
>             with a workable set of orthogonal categories, possibly
>             consolidated (but not less than 2 IMHO) and more in line
>             with what the industry is doing. We should also consider
>             user defined values as an attractive option for all but
>             domain, or maybe even for domain.
>
>             That is my two cents :-)
>
>             Rgds
>             dF
>
>             Dr. David Filip
>
>             On Wed, May 9, 2012 at 10:51 AM, Felix Sasaki
>             <fsasaki@w3.org> wrote:
>
>                 Hi all,
>
>                 See ISSUE-11: I propose to drop the genre, purpose
>                 and register data category proposals. Main reasons:
>                 - I don't see a way to come up with an agreeable and
>                 interoperable set of values.
>                 - There is no way to generate or check this kind of
>                 metadata automatically. This will lead to the same
>                 path as the "keywords" attribute in the HTML meta
>                 element.
>
>                 I propose to have one data category, probably focusing
>                 on "domain", that can be produced at least to some
>                 extent automatically (= Tadej) and that can be taken
>                 up by planned implementations (= Declan).
>
>                 Thoughts?
>
>                 Felix
>
>                 --
>                 Felix Sasaki
>                 DFKI / W3C Fellow
>
>         --
>         Dr. Declan Groves
>         Research Integration Officer
>         Centre for Next Generation Localisation (CNGL)
>         Dublin City University
>
>         email: dgroves@computing.dcu.ie
>         phone: +353 (0)1 700 6906
>
> --
> Felix Sasaki
> DFKI / W3C Fellow
Received on Wednesday, 9 May 2012 17:35:22 UTC