
Re: [All] new issue: propose to drop genre, purpose and register data category proposals

From: Tadej Stajner <tadej.stajner@ijs.si>
Date: Wed, 09 May 2012 19:34:35 +0200
Message-ID: <4FAAAAAB.40109@ijs.si>
To: public-multilingualweb-lt@w3.org

From my perspective, genre and purpose could use some consolidation. 
Looking at the requirements page, it seems that genre is dangerously 
close to purpose, even in the examples (genre=advertising, 
purpose=advertisement). I'm indifferent to whether genre is called type 
(although 'type' is a very overloaded term when dealing with software), 
but I'm in favor of targetAudience instead of purpose, as it makes the 
distinction more obvious.

-- Tadej

On 5/9/2012 6:09 PM, Felix Sasaki wrote:
>
>
> 2012/5/9 Dr. David Filip <David.Filip@ul.ie>
>
>     Thanks Declan, I also think that at least domain should be
>     primarily approached as a fixed ontology (it remains to be seen
>     which or which set of ontologies)
>
>     What I suggested was that some and *maybe all* of the categories
>     should allow for user defined extensions.
>
>
> I would rather say: the metadata we are developing doesn't
> say anything about other metadata that may be related to the content
> we are adding it to. That means everybody is free to develop his own
> private values. Having private values within one field is problematic.
>
> Felix
>
>     Of course machine-to-machine automation is hindered if private
>     values are being used, but at least the consumers would know that
>     private values can occur and would be prepared to display them
>     and eventually map them based on user preference.
>
>     But as George rightly points out, the main issue with all the
>     related terms (rather than categories) is that the community is
>     using them freely and interchangeably although they are distinct
>     and different concepts.
>
>     Rgds
>     dF
>
>     Dr. David Filip
>     =======================
>     LRC | CNGL | LT-Web | CSIS
>     University of Limerick, Ireland
>     telephone: +353-6120-2781
>     cellphone: +353-86-0222-158
>     facsimile: +353-6120-2734
>     mailto: david.filip@ul.ie
>
>
>
>     On Wed, May 9, 2012 at 1:17 PM, Declan Groves
>     <dgroves@computing.dcu.ie> wrote:
>
>         Felix,
>
>         I would agree with David that it is something that warrants
>         discussion at the meeting in Dublin.
>
>         In terms of domain/genre, we do have a number of very closely
>         related data categories:
>
>           * Domain
>           * Genre
>           * Purpose
>           * Register
>
>         It is important to capture both domain and style of the text
>         (which is determined by both the purpose and register data
>         categories) for contextually-accurate translation. I feel that
>         "genre" may be superfluous to our needs, but that we should
>         retain purpose (reflects the end consumer of the content) and
>         register (reflects the language style of the content). It may
>         be an idea to rename these to 'target audience' and 'type',
>         respectively, as David has suggested, if it makes the
>         distinction clearer.
>
>         I would suggest domain be mapped to an existing ontology, and
>         therefore restricted (i.e. to NOT allow user defined values),
>         but that for the other two we can leave these as user defined.
>
>
>         Declan
>
>
>
>
>
>         On 9 May 2012 12:43, Dr. David Filip <David.Filip@ul.ie> wrote:
>
>             Felix, I see where you are coming from, and I read your
>             line of argument as: simpler = more machine-to-machine
>             interoperability.
>
>             My personal experience with large corpora such as TDA was
>             that a single plain category is not enough to facilitate
>             the slicing and dicing needed to prepare a consistent
>             training corpus from data collected in the wild. MT tuners
>             often need more orthogonal categories.
>
>             In LetsMT!, they were addressing the slicing and dicing
>             need by having 3 orthogonal data categories (from my
>             recollection as WIP, not necessarily accurate):
>
>               * domain (ISO categories subset mapped onto TDA), which
>                 worked quite nicely
>               * target audience (general, expert, channel partner,
>                 internal)
>               * type (social, web, UA, printed doc, marcom)
>
>             Milan should be able to provide more up-to-date detail, as
>             he (Moravia) is actively involved in the project.
>
>             And finally, domain has pretty much the same vagueness as
>             the other three you are proposing to drop, and the match
>             with the categorization of existing corpora and trained
>             machines won't be great, so I would not expect a big gain
>             in machine-to-machine automation with domain only.
>
>             I suggest not dropping them, at least until the Dublin
>             workshop. The interested parties should be able to come up
>             with a workable set of orthogonal categories, possibly
>             consolidated (but no fewer than 2 IMHO) and more in line
>             with what the industry is doing. We should also consider
>             user defined values as an attractive option for all but
>             domain, or maybe even for domain.
>
>             That is my two cents :-)
>
>             Rgds
>             dF
>
>             Dr. David Filip
>             =======================
>             LRC | CNGL | LT-Web | CSIS
>             University of Limerick, Ireland
>             telephone: +353-6120-2781
>             cellphone: +353-86-0222-158
>             facsimile: +353-6120-2734
>             mailto: david.filip@ul.ie
>
>
>
>             On Wed, May 9, 2012 at 10:51 AM, Felix Sasaki
>             <fsasaki@w3.org> wrote:
>
>                 Hi all,
>
>                 See ISSUE-11 : I propose to drop the genre, purpose
>                 and register data category proposals. Main reasons:
>                 - I don't see a way to come up with an agreeable and
>                 interoperable set of values.
>                 - There is no way to generate or check this kind of
>                 metadata automatically. This will lead down the same
>                 path as the "keywords" attribute in the HTML meta element.
>
>                 I propose to have one data category, probably focusing
>                 on "domain", that can be produced at least to some
>                 extent automatically (= Tadej) and that can be taken
>                 up by planned implementations (= Declan).
>
>                 Thoughts?
>
>                 Felix
>
>                 -- 
>                 Felix Sasaki
>                 DFKI / W3C Fellow
>
>
>
>
>
>         -- 
>         Dr. Declan Groves
>         Research Integration Officer
>         Centre for Next Generation Localisation (CNGL)
>         Dublin City University
>
>         email: dgroves@computing.dcu.ie
>         phone: +353 (0)1 700 6906
>
>
>
>
>
> -- 
> Felix Sasaki
> DFKI / W3C Fellow
>
Received on Wednesday, 9 May 2012 17:35:22 UTC
