Re: [All] new issue: propose to drop genre, purpose and register data category proposals

From: Georg Rehm <georg.rehm@dfki.de>
Date: Wed, 9 May 2012 14:27:10 +0200
Cc: Felix Sasaki <fsasaki@w3.org>, "Dr. David Filip" <David.Filip@ul.ie>, Milan Karasek <MilanK@moraviaworldwide.com>, public-multilingualweb-lt@w3.org
Message-Id: <9030E093-FD13-4069-AA96-D29B2AF11430@dfki.de>
To: Declan Groves <dgroves@computing.dcu.ie>

Dear all,

I was involved in genre- and web-genre-related research from 2000 until ca. 2008. As far as I know, there is neither a clear or shared definition of what "genre" (or "style" or "register", for that matter) actually refers to, nor a clearly defined set of actual values. We tried a couple of times to assemble a community that would define a set of values for the concept of genre, but all of these attempts have, long story short, failed. One of the main problems is that "genre", "register", "domain", "text type" and a few other terms are used arbitrarily and interchangeably by the community even though they are completely different concepts.

If you want, I could approach Marina Santini, who is generally regarded as the main person in this field and who is still involved in genre and web genre research. I'm sure Marina would give a short statement in Dublin on that topic.

Best,
Georg


On May 9, 2012, at 14:17, Declan Groves wrote:

> Felix,
> 
> I would agree with David that it is something that warrants discussion at the meeting in Dublin.
> 
> In terms of domain/genre, we do have a number of very closely related data categories:
> - Domain
> - Genre
> - Purpose
> - Register
> It is important to capture both domain and style of the text (which is determined by both the purpose and register data categories) for contextually-accurate translation. I feel that "genre" may be superfluous to our needs, but that we should retain purpose (reflects the end consumer of the content) and register (reflects the language style of the content). It may be an idea to rename these to 'target audience' and 'type', respectively, as David has suggested, if it makes the distinction clearer. 
> 
> 
> I would suggest domain be mapped to an existing ontology, and therefore restricted (i.e. to NOT allow user defined values), but that for the other two we can leave these as user defined.
> 
> 
> 
> Declan
> 
> 
> 
> 
> 
> 
> On 9 May 2012 12:43, Dr. David Filip <David.Filip@ul.ie> wrote:
> Felix, I see where you are coming from and read your line of argument as: simple = more machine-to-machine interoperability.
> 
> My personal experience with large corpora such as TDA was that a single plain category is not enough to facilitate the slicing and dicing needed to prepare a consistent training corpus from data collected in the wild. MT tuners often need more orthogonal categories.
> 
> In LetsMT!, they addressed the slicing-and-dicing need with 3 orthogonal data categories (from my recollection of it as WIP, so not necessarily accurate):
> - domain (ISO categories subset mapped onto TDA), which worked quite nicely
> - target audience (general, expert, channel partner, internal)
> - type (social, web, UA, printed doc, marcom)
> 
> Milan should be able to provide more up-to-date detail, as he (Moravia) is actively involved in the project.
> 
> And finally, domain has pretty much the same vagueness as the other three you are proposing to drop, and the match with the categorization of existing corpora and trained machines won't be great, so I would not expect a big gain in machine-to-machine automation with domain only.
> 
> I suggest not dropping them, at least until the Dublin workshop. The interested parties should be able to come up with a workable set of orthogonal categories, possibly consolidated (but not fewer than 2 IMHO) and more in line with what the industry is doing. We should also consider user-defined values as an attractive option for all but domain, or maybe even for domain.
> 
> That is my two cents :-)
> 
> Rgds
> dF 
> 
> Dr. David Filip
> =======================
> LRC | CNGL | LT-Web | CSIS
> University of Limerick, Ireland
> telephone: +353-6120-2781
> cellphone: +353-86-0222-158
> facsimile: +353-6120-2734
> mailto: david.filip@ul.ie
> 
> 
> 
> On Wed, May 9, 2012 at 10:51 AM, Felix Sasaki <fsasaki@w3.org> wrote:
> Hi all,
> 
> See ISSUE-11: I propose to drop the genre, purpose and register data category proposals. Main reasons:
> - I don't see a way to come up with an agreeable and interoperable set of values.
> - There is no way to generate or check this kind of metadata automatically. This will lead down the same path as the "keywords" attribute in the HTML meta element.
> 
> I propose to have one data category, probably focusing on "domain", that can be produced at least to some extent automatically (= Tadej) and that can be taken up by planned implementations (= Declan).
> 
> Thoughts?
> 
> Felix
> 
> -- 
> Felix Sasaki
> DFKI / W3C Fellow
> 
> 
> 
> 
> -- 
> Dr. Declan Groves
> Research Integration Officer
> Centre for Next Generation Localisation (CNGL)
> Dublin City University
> 
> email: dgroves@computing.dcu.ie
> phone: +353 (0)1 700 6906

-- 
Dr. Georg Rehm
Network Manager META-NET

DFKI GmbH, Alt-Moabit 91c, 10559 Berlin, Germany
Phone: +49 30 23895-1833 – Fax: -1810
Mobile: +49 173 2735829
georg.rehm@dfki.de – georg.rehm@meta-net.eu
Deutsches Forschungszentrum für Künstliche Intelligenz GmbH
Registered office: Trippstadter Strasse 122, D-67663 Kaiserslautern
Management: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Chairman), Dr. Walter Olthoff
Chairman of the Supervisory Board: Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern (district court), HRB 2313
Received on Wednesday, 9 May 2012 21:28:09 UTC