Further input to issue-42 (Fwd: Meta Data related to Configurations for/Information on Linguistic Processors)

Hi all,

Quite a while ago I had a mail exchange with Christian Lieske, who made
me aware of a potential solution to issue-42. Please have a look below, and
let's also discuss this in Prague.

- Felix


2012/7/27 Lieske, Christian <christian.lieske@sap.com>

> Hi,
>
> As mentioned by Felix, the MultilingualWeb-LT Working Group
> <http://www.w3.org/International/multilingualweb/lt/> is discussing what
> could be called “Configurations for/Information on Linguistic
> Processors”: if, for example, a translation is produced by a Machine
> Translation engine, the translation may need to be annotated with
> information such as “This translation was produced with version X of
> engine Y. The dataset that was used to train the engine was D, and the
> corresponding language model was L”. In a world that recognizes the
> importance of trust/reliability, this kind of metadata is, from my point
> of view, highly important. The “Configurations for/Information on
> Linguistic Processors” topic is thus also related to the discussion
> surrounding “provenance”.
>
> The MLW-LT discussion reminded me that I did some work on the topic a
> while ago: in the context of the Open Lexicon Interchange Format (OLIF;
> see http://www.tekom.de/upload/2284/OASIS_40_Lieske.pdf), I investigated
> what type of configuration/information may have to be captured, for
> example, in the context of Term Extraction processors. The outcome of the
> investigation found its way into the latest version of OLIF in the form
> of the “termExtrInfo” element (see the attached screenshot of the OLIF
> 3.0 schema, and the schema at
> http://www.olif.net/downloads/OLIF-3.0-Beta-20Feb2008-v5.zip).
>
> The “termExtrInfo” element is basically a set of data categories that
> allow you to capture, for example, the following:
>
> 1. Tool Info and Features: Info on features of the tool that was used
>    (you can, for example, provide info on which approach to morphological
>    analysis is implemented)
>
> 2. Input Info: Info on features of the data that was fed into the tool
>    (you can, for example, get a feeling for the quality you can expect)
>
> 3. Process Info: Info on the process that involved the tool and the
>    input (you can, for example, capture that you did something for a
>    specific client)
>
> Possibly, the “termExtrInfo” element could serve as a model for a more
> general “lingProcInfo”.
>
> Cheers,
>
> Christian
>
> Christian Lieske
> Knowledge Architect
> SAP Language Services (SLS) - “Translating SAP for the World”
> SAP Globalization Services
> SAP AG
> SAP Allee 15
> D-68789 St. Leon-Rot
> Germany
> T +49 (62 27) 7 - 6 13 03
> F +49 (62 27) 7 - 2 54 18
> mailto:christian.lieske@sap.com
> www.sap.com
>
>
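
As a rough illustration of the three groups described in the quoted message
(tool, input, process), a more general "lingProcInfo" record might be
sketched as follows. This is only a sketch in Python: the class and field
names are hypothetical and are not taken from the OLIF 3.0 schema, and the
values reuse the placeholders X, Y, D and L from the Machine Translation
example.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict


@dataclass
class ToolInfo:
    """Features of the tool that was used (hypothetical field names)."""
    name: str
    version: str
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class InputInfo:
    """Features of the data that was fed into the tool."""
    description: str
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class ProcessInfo:
    """Features of the process that involved the tool and the input."""
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class LingProcInfo:
    """Hypothetical general record grouping the three kinds of information."""
    tool: ToolInfo
    input_data: InputInfo
    process: ProcessInfo


def example_record() -> dict:
    """Build a record for the MT example, using the placeholder values."""
    info = LingProcInfo(
        tool=ToolInfo(name="engine Y", version="X",
                      features={"language_model": "L"}),
        input_data=InputInfo(description="training dataset D"),
        process=ProcessInfo(features={"client": "example client"}),
    )
    # asdict() converts nested dataclasses into plain dictionaries.
    return asdict(info)
```

Keeping tool, input and process as separate groups mirrors the structure
described for "termExtrInfo"; how such a record would actually be serialized
as metadata is left open here.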

-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Tuesday, 18 September 2012 10:26:30 UTC