- From: Felix Sasaki <fsasaki@w3.org>
- Date: Thu, 9 Aug 2012 12:57:14 +0200
- To: MultilingualWeb-LT Working Group <public-multilingualweb-lt@w3.org>
- Message-ID: <CAL58czoRh9=mgmD_oBd3cver3SNOJ4ULbyXm-AUXuEfnp9fNVA@mail.gmail.com>
Some more information about this issue: a while ago I had a mail exchange with Christian Lieske, who pointed me to what might be an existing solution for specifying “Configurations for/Information on Linguistic Processors”, or a "lingProcInfo" data category. The information below is tailored towards terminology, but it might provide us with some ideas.

Felix

---------- Forwarded message ----------
From: Lieske, Christian <christian.lieske@sap.com>
Date: 2012/7/27
Subject: Meta Data related to Configurations for/Information on Linguistic Processors
To: "naber@danielnaber.de" <naber@danielnaber.de>, "Nickel, Inna" <inna.nickel@sap.com>
Cc: "'Felix Sasaki' (felix.sasaki@dfki.de)" <felix.sasaki@dfki.de>

Hi,

As mentioned by Felix, the MultilingualWeb-LT Working Group <http://www.w3.org/International/multilingualweb/lt/> is discussing what could be called “Configurations for/Information on Linguistic Processors”: if, for example, a translation is produced by a Machine Translation engine, the corresponding translation may need to be annotated with information such as “This translation was produced with version X of engine Y. The dataset that was used to train the engine was D, and the corresponding language model was L”. In a world which recognizes the importance of trust/reliability, this kind of metadata is, from my point of view, of high importance. The “Configurations for/Information on Linguistic Processors” is thus also related to the discussion surrounding “provenance”.

The MLW-LT discussion reminded me of the fact that I did some work on the topic a while ago: in the context of the Open Lexicon Interchange Format (OLIF; see http://www.tekom.de/upload/2284/OASIS_40_Lieske.pdf ), I investigated what type of configuration/information may have to be captured, for example, in the context of Term Extraction processors.
The outcome of the investigations found its way into the latest version of OLIF in the guise of the “termExtrInfo” element (see the attached screenshot of the OLIF 3.0 schema, and the schema at http://www.olif.net/downloads/OLIF-3.0-Beta-20Feb2008-v5.zip).

The “termExtrInfo” is basically a set of data categories that allow you to capture, for example, the following:

1. Tool Info and Features: info on features of the tool that was used (you can, for example, provide info on which approach to morphological analysis is implemented)
2. Input Info: info on features of the data that was fed into the tool (you can, for example, get a feeling for the quality you can expect)
3. Process Info: info on the process that involved tool and input (you can, for example, capture that you did something for a specific client)

Possibly, the “termExtrInfo” could serve as a model for a more general “lingProcInfo”.

Cheers,
Christian

Christian Lieske
Knowledge Architect
SAP Language Services (SLS) - “Translating SAP for the World“
SAP Globalization Services
SAP AG
SAP Allee 15
D-68789 St. Leon-Rot
Germany
T +49 (62 27) 7 - 6 13 03
F +49 (62 27) 7 - 2 54 18
mailto:christian.lieske@sap.com
www.sap.com
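
Editorial illustration (not part of the original message): the three groups of information described above (tool, input, process) could be modelled as a simple record for a more general “lingProcInfo”. This is only a minimal sketch. Every class and field name below is hypothetical and is not taken from the OLIF 3.0 schema or from any MultilingualWeb-LT draft; the sample values reuse the placeholders X, Y, D and L from the Machine Translation example in the message.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, Optional


@dataclass
class ToolInfo:
    """Hypothetical: which tool was used and which features it implements."""
    name: str
    version: str
    features: Dict[str, str] = field(default_factory=dict)


@dataclass
class InputInfo:
    """Hypothetical: properties of the data that was fed into the tool."""
    training_dataset: Optional[str] = None
    language_model: Optional[str] = None


@dataclass
class ProcessInfo:
    """Hypothetical: properties of the process involving tool and input."""
    client: Optional[str] = None


@dataclass
class LingProcInfo:
    """Hypothetical container combining the three groups of information."""
    tool: ToolInfo
    input: InputInfo
    process: ProcessInfo


def describe(info: LingProcInfo) -> str:
    """Render the record as a sentence like the one quoted in the message."""
    return (
        f"This translation was produced with version {info.tool.version} "
        f"of engine {info.tool.name}. The dataset that was used to train "
        f"the engine was {info.input.training_dataset}, and the "
        f"corresponding language model was {info.input.language_model}."
    )


# Placeholder values X, Y, D, L as used in the message.
example = LingProcInfo(
    tool=ToolInfo(name="Y", version="X"),
    input=InputInfo(training_dataset="D", language_model="L"),
    process=ProcessInfo(client=None),
)

print(describe(example))
print(asdict(example))  # nested dict, e.g. for serialization
```

Keeping the three groups as separate types mirrors the split described for “termExtrInfo”; whether a general “lingProcInfo” should be structured this way is an open design question.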
Attachments
- image/jpeg attachment: image003.jpg
Received on Thursday, 9 August 2012 10:57:43 UTC