Re: Further input to issue-42 (Fwd: Meta Data related to Configurations for/Information on Linguistic Processors)

From: Tadej Štajner <tadej.stajner@ijs.si>
Date: Tue, 18 Sep 2012 13:52:01 +0200
Message-ID: <50586061.8070100@ijs.si>
To: public-multilingualweb-lt@w3.org

Hi,
this looks well developed and like something that has seen practical use.
One thing that surprised me a bit is that the termExtrInfo data category
(or the generalized lingProcInfo) doesn't seem to target specific
annotations, but instead annotates the whole document via the header.

My question for everyone using the ISSUE-42 data categories: is it
necessary to point to individual elements, or is a document-level
annotation enough? As it stands, the TA Annotation draft has a
per-instance mechanism via selectors. Could we simplify to a
per-document rule (e.g. 'this document was processed by ACME-Extractor
2.0')?
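
To make the two options concrete, here is a minimal Python sketch. It is
not taken from the TA Annotation draft or from OLIF: the names
ProcessorInfo, Document, header_info, instance_info and info_for are
hypothetical, as are the selector strings and the tool name "OtherTool".
It only illustrates one possible layout, a per-document record in the
header with optional per-instance overrides keyed by selector.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ProcessorInfo:
    """Hypothetical record describing one linguistic processor run."""
    tool: str
    version: str


@dataclass
class Document:
    # Per-document annotation: a single record in the header covers everything.
    header_info: Optional[ProcessorInfo] = None
    # Per-instance annotation: selector string -> record for individual elements.
    instance_info: Dict[str, ProcessorInfo] = field(default_factory=dict)

    def info_for(self, selector: str) -> Optional[ProcessorInfo]:
        """Return the per-instance record if one exists, else the header record."""
        return self.instance_info.get(selector, self.header_info)


# Document-level rule: "this document was processed by ACME-Extractor 2.0".
acme = ProcessorInfo(tool="ACME-Extractor", version="2.0")
doc = Document(header_info=acme)

# Optional per-instance override for one element (selector is illustrative).
doc.instance_info["//span[@id='t1']"] = ProcessorInfo(tool="OtherTool", version="0.1")
```

With a layout like this, a consumer that only needs the document-level
answer reads header_info and can ignore the selectors entirely.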

-- Tadej

On 18. 09. 2012 12:26, Felix Sasaki wrote:
> Hi all,
>
> quite a while ago I had a mail exchange with Christian Lieske who had 
> made me aware of a potential solution to issue-42. Please have a look 
> below and let's discuss this also in Prague.
>
> - Felix
>
>
> 2012/7/27 Lieske, Christian <christian.lieske@sap.com>
>
>     Hi,
>
>     As mentioned by Felix, the MultilingualWeb-LT Working Group
>     <http://www.w3.org/International/multilingualweb/lt/> is
>     discussing what could be called “Configurations
>     for/Information on Linguistic Processors”: if, for example, a
>     translation is produced by a Machine Translation engine, the
>     translation may need to be annotated with information such as
>     “This translation was produced with version X of engine Y. The
>     dataset that was used to train the engine was D, and the
>     corresponding language model was L”. In a world that recognizes
>     the importance of trust/reliability, this kind of metadata is,
>     from my point of view, of high importance. I also see
>     “Configurations for/Information on Linguistic Processors” as
>     related to the discussion surrounding “provenance”.
>
>     The MLW-LT discussion reminded me that I did some work on the
>     topic a while ago: in the context of the Open Lexicon
>     Interchange Format (OLIF; see
>     http://www.tekom.de/upload/2284/OASIS_40_Lieske.pdf), I
>     investigated what type of configuration/information may have to
>     be captured for Term Extraction processors, for example. The
>     outcome of the investigation found its way into the latest
>     version of OLIF in the form of the “termExtrInfo” element (see
>     the attached screenshot of the OLIF 3.0 schema, and the schema at
>     http://www.olif.net/downloads/OLIF-3.0-Beta-20Feb2008-v5.zip).
>
>     The “termExtrInfo” is basically a set of data categories that
>     allow you to capture for example the following:
>
>     1. Tool Info and Features: Info on features of the tool that was
>        used (you can for example provide info on which approach to
>        morphological analysis is implemented)
>
>     2. Input Info: Info on features of the data that was fed into the
>        tool (you can for example get a feeling for the quality you can
>        expect)
>
>     3. Process Info: Info on the process that involved tool and input
>        (you can for example capture that you did something for a
>        specific client)
>
>     Possibly, the “termExtrInfo” could serve as a model for a more
>     general “lingProcInfo”.
>
>     Cheers,
>
>     Christian
>
>     Christian Lieske
>     Knowledge Architect
>     SAP Language Services (SLS) - “Translating SAP for the World”
>     SAP Globalization Services
>     SAP AG
>     SAP Allee 15
>     D-68789 St. Leon-Rot
>     Germany
>     T +49 (62 27) 7 - 6 13 03
>     F +49 (62 27) 7 - 2 54 18
>     mail to: christian.lieske@sap.com
>     www.sap.com
>
> -- 
> Felix Sasaki
> DFKI / W3C Fellow
>
Received on Tuesday, 18 September 2012 11:53:07 UTC
