Re: [All] new issue: propose to drop genre, purpose and register data category proposals

 From my perspective, genre and purpose could use some consolidation. 
Looking at the requirements page, it seems that genre is dangerously 
close to purpose, even in the examples (genre=advertising, 
purpose=advertisement). I'm indifferent to whether genre is called type 
(although 'type' is a very overloaded term when dealing with software), 
but I'm in favor of targetAudience instead of purpose, as it makes the 
distinction more obvious.

-- Tadej

On 5/9/2012 6:09 PM, Felix Sasaki wrote:
>
>
> 2012/5/9 Dr. David Filip <David.Filip@ul.ie>
>
>     Thanks Declan, I also think that at least domain should be
>     primarily approached as a fixed ontology (it remains to be seen
>     which or which set of ontologies)
>
>     What I suggested was that some and *maybe all* of the categories
>     should allow for user defined extensions.
>
>
> I would rather say: the metadata we are developing doesn't
> say anything about other metadata that may be related to the content
> we are adding it to. That means everybody is free to develop their own
> private values. Having private values within one field is problematic.
>
> Felix
>
>     Of course machine-to-machine automation is hindered if private
>     values are being used, but at least the consumers would know that
>     private values can occur and would be prepared to display them,
>     and possibly map them based on user preference.
>
>     But as George rightly points out, the main issue with all the
>     related terms (rather than categories) is that the community is
>     using them freely and interchangeably although they are distinct
>     concepts.
>
>     Rgds
>     dF
>
>     Dr. David Filip
>     =======================
>     LRC | CNGL | LT-Web | CSIS
>     University of Limerick, Ireland
>     telephone: +353-6120-2781
>     cellphone: +353-86-0222-158
>     facsimile: +353-6120-2734
>     mailto: david.filip@ul.ie
>
>
>
>     On Wed, May 9, 2012 at 1:17 PM, Declan Groves
>     <dgroves@computing.dcu.ie> wrote:
>
>         Felix,
>
>         I would agree with David that it is something that warrants
>         discussion at the meeting in Dublin.
>
>         In terms of domain/genre, we do have a number of very closely
>         related data categories:
>
>           * Domain
>           * Genre
>           * Purpose
>           * Register
>
>         It is important to capture both domain and style of the text
>         (which is determined by both the purpose and register data
>         categories) for contextually-accurate translation. I feel that
>         "genre" may be superfluous to our needs, but that we should
>         retain purpose (reflects the end consumer of the content) and
>         register (reflects the language style of the content). It may
>         be an idea to rename these to 'target audience' and 'type',
>         respectively, as David has suggested, if it makes the
>         distinction clearer.
>
>         I would suggest domain be mapped to an existing ontology, and
>         therefore restricted (i.e. to NOT allow user defined values),
>         but that for the other two we can leave these as user defined.
>
>
>         Declan
>
>
>
>
>
>         On 9 May 2012 12:43, Dr. David Filip <David.Filip@ul.ie> wrote:
>
>             Felix, I see where you are coming from, and I read your
>             line of argument as: simpler = more machine-to-machine
>             interoperability.
>
>             My personal experience with large corpora such as TDA was
>             that a single plain category is not enough to facilitate
>             the slicing and dicing needed to prepare a consistent
>             training corpus from data collected in the wild. MT tuners
>             often need more orthogonal categories.
>
>             In LetsMT!, they were addressing the slicing and dicing
>             need by having 3 orthogonal data categories (from my
>             recollection as WIP, not necessarily accurate):
>
>               * domain (ISO categories subset mapped onto TDA), worked
>                 quite nicely
>               * target audience (general, expert, channel partner, internal)
>               * type (social, web, UA, printed doc, marcom)
>
>             Milan should be able to provide more up-to-date detail, as
>             he (Moravia) is actively involved in the project.
>
>             And finally, domain has pretty much the same vagueness as
>             the other three you are proposing to drop, and the match
>             with the categorization of existing corpora and trained
>             machines won't be great, so I would not expect a big gain
>             in machine-to-machine automation with domain only.
>
>             I suggest not dropping them, at least until the Dublin
>             workshop. The interested parties should be able to come up
>             with a workable set of orthogonal categories, possibly
>             consolidated (but no fewer than 2 IMHO) and more in line
>             with what the industry is doing. We should also consider
>             user defined values as an attractive option for all but
>             domain, or maybe even for domain.
>
>             That is my two cents :-)
>
>             Rgds
>             dF
>
>
>
>
>             On Wed, May 9, 2012 at 10:51 AM, Felix Sasaki
>             <fsasaki@w3.org> wrote:
>
>                 Hi all,
>
>                 See ISSUE-11 : I propose to drop the genre, purpose
>                 and register data category proposals. Main reasons:
>                 - I don't see a way to come up with an agreeable and
>                 interoperable set of values.
>                 - There is no way to generate or check this kind of
>                 metadata automatically. This will lead down the same
>                 path as the "keywords" attribute in the HTML meta element.
>
>                 I propose to have one data category, probably focusing
>                 on "domain", that can be produced at least to some
>                 extent automatically (= Tadej) and that can be taken
>                 up by planned implementations (= Declan).
>
>                 Thoughts?
>
>                 Felix
>
>                 -- 
>                 Felix Sasaki
>                 DFKI / W3C Fellow
>
>
>
>
>
>         -- 
>         Dr. Declan Groves
>         Research Integration Officer
>         Centre for Next Generation Localisation (CNGL)
>         Dublin City University
>
>         email: dgroves@computing.dcu.ie
>         phone: +353 (0)1 700 6906
>
>
>
>
>
> -- 
> Felix Sasaki
> DFKI / W3C Fellow
>

Received on Wednesday, 9 May 2012 17:35:22 UTC