Comment on ITS 2.0 WD-its20-20121206 - Domain

From: Lieske, Christian <christian.lieske@sap.com>
Date: Thu, 10 Jan 2013 15:52:54 +0100
To: "public-multilingualweb-lt-comments@w3.org" <public-multilingualweb-lt-comments@w3.org>
Message-ID: <8EA44C66E2911C4AB21558F4720695DC60D7D73D87@DEWDFECCR01.wdf.sap.corp>

Please find below comments/observations/questions/ideas concerning the ITS 2.0 working draft dated December 6, 2012 (http://www.w3.org/TR/2012/WD-its20-20121206/).  Please feel free to contact me for clarifications if anything is unclear.

Many of the manual or automated language-related processes I am aware of benefit from information on what is often referred to as "domain" (or subject matter area). Accordingly, ITS 2.0, with its aim of moving ITS 1.0 closer to Natural Language Processing (NLP), may help to address this.

While looking at "domain" from this angle, I started to wonder whether it could benefit from additions or modifications. I apologize in advance if replying to this comment requires summarizing discussions that presumably have already taken place.

Here are my observations/questions/ideas:

1. As it stands, "domain" only allows "pointing". Some scenarios may require "direct encoding" (e.g. via something like its-domain="financials").

2. Currently, "domain" does not seem to take into account the following situations that I have encountered:

a. Domain "systems" may not be harmonized across a processing chain. For example, a Translation Memory component may work with different domains than a Machine Translation system that is part of the same processing chain. Since ITS 2.0 "domain" currently does not allow capturing information such as "This is for component X", these scenarios cannot be addressed.

b. Implementations that work with domain information often apply concepts that are sometimes termed "fallback", "secondary component", or the like. Example: a Terminology Management component that is used for automated term lookup may encode a rule such as "Search in domain A-A1-A1X and all its ancestors (i.e. also A-A1 and A). Hits from domains deeper in the hierarchy should receive a higher score than hits further up; thus, a hit from A would receive a lower score than a hit from A-A1-A1X." This currently cannot be addressed due to the modeling chosen for "domainMapping".
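To make the fallback rule in 2b concrete, here is a minimal sketch of such a term lookup. It assumes (hypothetically, none of this is defined by ITS 2.0) that domains are hierarchical paths written as hyphen-separated segments like "A-A1-A1X", that segment names themselves contain no hyphens, and that a hit's score is the depth of the matching domain relative to the requested one. The names `ancestors_and_self` and `lookup` are illustrative only.

```python
from __future__ import annotations


# Hypothetical sketch of the "fallback" rule from item 2b; not part of ITS 2.0.
# Assumption: domains are hyphen-separated hierarchical paths, e.g. "A-A1-A1X",
# and individual segment names contain no hyphens.

def ancestors_and_self(domain: str) -> list[str]:
    """Return the domain followed by all of its ancestors, deepest first."""
    parts = domain.split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]


def lookup(term: str, domain: str,
           termbase: dict[str, dict[str, str]]) -> list[tuple[str, str, float]]:
    """Search `term` in `domain` and its ancestors; deeper hits score higher.

    `termbase` maps a domain path to {source term: target term}.
    Returns (matching domain, target term, score) tuples, best first.
    """
    chain = ancestors_and_self(domain)
    depth = len(chain)
    hits = []
    for level, candidate in enumerate(chain):
        entries = termbase.get(candidate, {})
        if term in entries:
            # The requested domain scores 1.0; each step up the hierarchy
            # lowers the score by 1/depth (an assumed, linear weighting).
            score = (depth - level) / depth
            hits.append((candidate, entries[term], score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    termbase = {"A": {"balance": "Saldo"}, "A-A1-A1X": {"balance": "Bilanz"}}
    for hit in lookup("balance", "A-A1-A1X", termbase):
        print(hit)
```

With the sample termbase, the hit from A-A1-A1X scores 1.0 and the hit from A scores one third, matching the ordering the rule asks for. The point of the sketch is that the scoring needs the ancestor chain of a domain, which a flat mapping of domain values cannot express.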

Received on Thursday, 10 January 2013 14:53:26 UTC
