RE: [ACTION-14]: Pedro to address cache and MT disambiguation with Arle

From: Des Oates <doates@adobe.com>
Date: Thu, 5 Apr 2012 17:00:05 +0100
To: Pedro L. Díez Orzas <pedro.diez@linguaserve.com>, "'Yves Savourel'" <ysavourel@enlaso.com>, "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
Message-ID: <7B8D77012FE36343856B6DE17A307DD284A7D9A5F1@eurmbx01.eur.adobe.com>

Hi Pedro, all,

I tend to agree with Yves that we should have a minimal set of default values for the domain here, since it lets implementers map their own meaningful values onto that set where appropriate. However, it should also be possible to use arbitrary values. Because of the diversity of possible domains employing MT, it will not be possible to define a single enumerated set of tokens that can meaningfully disambiguate in every scenario. Furthermore, implementers will train engines with very specific terminology sets, which would be impossible to capture and define. For example, at Adobe we pass our MT stack a domain selection parameter like this, which maps to a product or product range. It is used to help route the job to the engine best suited to the terminology associated with that product or range.
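
As a rough illustration of the routing idea described above, a minimal sketch follows. Every name in it (the domain values, the engine names, `select_engine`) is hypothetical and chosen only for the example; none of it comes from an ITS draft or from any real MT stack.

```python
from typing import Optional

# Hypothetical routing table. Keys mix a few "shared" domain values with an
# arbitrary, implementer-specific one; the engine names are made up.
ENGINE_BY_DOMAIN = {
    "legal": "engine-legal",
    "software": "engine-software",
    "acme-photo-editor": "engine-photo",  # arbitrary product-level value
}

# Assumed fallback when the domain value is missing or unknown.
FALLBACK_ENGINE = "engine-general"


def select_engine(domain: Optional[str]) -> str:
    """Return the engine name for a domain value, or the fallback engine."""
    if not domain:
        return FALLBACK_ENGINE
    key = domain.strip().lower()  # tolerate case and whitespace differences
    return ENGINE_BY_DOMAIN.get(key, FALLBACK_ENGINE)
```

In this sketch an unknown value is not an error: it simply falls through to a general-purpose engine, which is one way to let shared and arbitrary values coexist.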

I also have another candidate for this data category:

   'Confidentiality'

 This is used to identify content that should not be sent to public MT services such as Google Translate or Bing because of its confidential nature. Instead, the content should be routed to internal MT engines, or not machine-translated at all. In its simplest form it would be a true/false Boolean, but it could also be defined as a scalar.

Do you think this would be a valid candidate to add to this category?
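
To make the Boolean-versus-scalar distinction concrete, here is a minimal sketch. The three levels, the target names, and the rule that Boolean "true" maps to the internal-only level are all assumptions made for the example, not proposed values.

```python
from enum import IntEnum
from typing import List


class Confidentiality(IntEnum):
    """Hypothetical scalar levels; the names and numbers are illustrative."""
    PUBLIC = 0      # may go to any MT service
    INTERNAL = 1    # internal MT engines only
    RESTRICTED = 2  # must not be machine-translated at all


def from_boolean(confidential: bool) -> Confidentiality:
    """Map the simple true/false form onto the scalar (assumed mapping)."""
    return Confidentiality.INTERNAL if confidential else Confidentiality.PUBLIC


def allowed_targets(level: Confidentiality) -> List[str]:
    """Return the kinds of MT service the content may be routed to."""
    if level == Confidentiality.PUBLIC:
        return ["public-mt", "internal-mt"]
    if level == Confidentiality.INTERNAL:
        return ["internal-mt"]
    return []  # RESTRICTED: no MT
```

For instance, `allowed_targets(from_boolean(True))` yields only the internal target, while the restricted level yields an empty list, meaning no MT.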

Thanks
Des

-----Original Message-----
From: Pedro L. Díez Orzas [mailto:pedro.diez@linguaserve.com] 
Sent: 05 April 2012 15:36
To: 'Yves Savourel'; public-multilingualweb-lt@w3.org
Subject: RE: [ACTION-14]: Pedro to address cache and MT disambiguation with Arle

Hi Yves, you are right, but when we talk about ontologies and semantic information it is even harder to agree on any standard reference.

This is the reason why I adopted a neutral position with open values, since web clients and MT providers will define which domain structure and/or semantic features should be used: they do their own mapping, according to their systems and needs.

If this minimal information is not useful, feel free to change it. I would like to know what people involved in MT think about it. Maybe other working groups could also go ahead with standard values for these.

Cheers,
Pedro 


-----Original Message-----
From: Yves Savourel [mailto:ysavourel@enlaso.com]
Sent: Thursday, 5 April 2012 15:44
To: public-multilingualweb-lt@w3.org
Subject: RE: [ACTION-14]: Pedro to address cache and MT disambiguation with Arle

Hi Pedro, all,

> mt disambiguation data
> . domain selector: plain text content to be used by the system.
> This content is not defined and may be application specific, e.g., a 
> code used by the system, a subject name, a pointer to a location in a 
> domain ontology.
> . semantic selector: plain text content to be used by the system.
> This content is not defined and may be application specific, e.g., a 
> code used by the system, a synonym, a pointer to a location in a 
> semantic network.

I've noted from experience that selectors without a set of pre-defined values are usually useless outside the tool that defines them.

If the value carried in a "standard" attribute is not standardized, there is little point in having such an attribute.

To have minimal interoperability, it seems a selector needs to have at least a minimum set of defined values. Then each system can map those to its own corresponding labels.

Cheers,
-yves
Received on Friday, 6 April 2012 07:09:30 UTC
