Re: mlw-lt-track-ISSUE-42 (tool-and-confidence-related-information): Tool and confidence related information is similar for mtConfidence, textAnalysisAnnotation, quality

Some more information about this issue: a while ago I had a mail exchange
with Christian Lieske who pointed me to what might be an existing solution
for specifying “Configurations for/Information on Linguistic Processors”,
or a "lingProcInfo" data category. The information below is tailored towards
terminology, but it might provide us with some ideas.

Felix

---------- Forwarded message ----------
From: Lieske, Christian <christian.lieske@sap.com>
Date: 2012/7/27
Subject: Meta Data related to Configurations for/Information on Linguistic
Processors
To: "naber@danielnaber.de" <naber@danielnaber.de>, "Nickel, Inna" <
inna.nickel@sap.com>
Cc: "'Felix Sasaki' (felix.sasaki@dfki.de)" <felix.sasaki@dfki.de>


Hi,

As mentioned by Felix, the MultilingualWeb-LT Working Group
<http://www.w3.org/International/multilingualweb/lt/> is discussing what
could be called “Configurations for/Information on Linguistic Processors”:
if, for example, a translation is produced by a Machine Translation engine,
the translation may need to be annotated with information such as “This
translation was produced with version X of engine Y. The dataset that was
used to train the engine was D, and the corresponding language model was
L”. In a world that recognizes the importance of trust and reliability,
this kind of metadata is, from my point of view, of high importance. The
“Configurations for/Information on Linguistic Processors” topic is thus
also related to the discussion surrounding “provenance”.

The MLW-LT discussion reminded me that I did some work on the topic a while
ago: in the context of the Open Lexicon Interchange Format (OLIF; see
http://www.tekom.de/upload/2284/OASIS_40_Lieske.pdf ), I investigated what
type of configuration/information may have to be captured, for example, in
the context of Term Extraction processors. The outcome of the
investigations found its way into the latest version of OLIF in the guise
of the “termExtrInfo” element (see attached screenshot of the OLIF 3.0
schema, and schema at
http://www.olif.net/downloads/OLIF-3.0-Beta-20Feb2008-v5.zip).

The “termExtrInfo” is basically a set of data categories that allow you to
capture for example the following:

1. Tool Info and Features: Info on features of the tool that was used (you
   can for example provide info on which approach to morphological analysis
   is implemented)

2. Input Info: Info on features of the data that was fed into the tool (you
   can for example get a feeling for the quality you can expect)

3. Process Info: Info on the process that involved tool and input (you can
   for example capture that you did something for a specific client)

Possibly, the “termExtrInfo” could serve as a model for a more general
“lingProcInfo”.
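
As a rough illustration of how the three groups (tool, input, process) might
carry over to a more general “lingProcInfo”, here is a minimal Python
sketch. All class and field names are hypothetical; they are not taken from
the OLIF 3.0 schema or from any MultilingualWeb-LT draft. The values reuse
the placeholders X, Y, D and L from the Machine Translation example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToolInfo:
    # Which tool ran, plus the features needed to interpret its output.
    name: str
    version: str
    features: dict = field(default_factory=dict)


@dataclass
class InputInfo:
    # Properties of the data that was fed into the tool.
    description: str
    properties: dict = field(default_factory=dict)


@dataclass
class ProcessInfo:
    # Circumstances of the run, e.g. the client it was done for.
    client: str = ""
    notes: str = ""


@dataclass
class LingProcInfo:
    # Hypothetical container grouping the three kinds of information.
    tool: ToolInfo
    input_data: InputInfo
    process: ProcessInfo

    def to_json(self) -> str:
        # asdict() converts nested dataclasses into plain dictionaries.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


record = LingProcInfo(
    tool=ToolInfo(
        name="engine Y",
        version="X",
        features={"trainingDataset": "D", "languageModel": "L"},
    ),
    input_data=InputInfo(description="source text sent to the engine"),
    process=ProcessInfo(client="example client"),
)

print(record.to_json())
```

Keeping the three groups as separate records mirrors the split described for
“termExtrInfo”; whether a real data category would nest them like this is an
open question.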


Cheers,

Christian

[image: cid:image001.jpg@01CD6A4E.153C0220]

Christian Lieske
Knowledge Architect
SAP Language Services (SLS) - “Translating SAP for the World”
SAP Globalization Services
SAP AG
SAP Allee 15
D-68789 St. Leon-Rot
Germany
T +49 (62 27) 7 - 6 13 03
F +49 (62 27) 7 - 2 54 18
mailto:christian.lieske@sap.com
www.sap.com


Received on Thursday, 9 August 2012 10:57:43 UTC