- From: Serge Gladkoff <serge.gladkoff@gmail.com>
- Date: Wed, 16 Jul 2014 13:10:35 +0200
- To: "'Felix Sasaki'" <felix.sasaki@dfki.de>, "Dave Lewis" <dave.lewis@cs.tcd.ie>
- Cc: <public-i18n-its-ig@w3.org>
- Message-ID: <05b101cfa0e6$923765a0$b6a630e0$@gmail.com>
Hello Felix, Dave and others,

I have made several proposed additions to the document, specifically:

A. Industry sector engagement issues.

1. This document focuses solely on the vision of the "data annotation" community. That is very narrow from the point of view of other future stakeholders, such as the language technology and language service communities, not to mention the public, which, by the way, is the intended audience of Public Automated Translation Services. I would therefore widen the "Interoperability Goals" with: "To ensure presence of interoperability data and service features enabling full and meaningful engagement of language service and language technology communities, as well as necessary prerequisites for the professional and public feedback and participation."

2. I added the keyword "European Language Cloud" to the Interoperability Goals. I think it is quite important to mention the European Language Cloud here. Many proprietary clouds are now going to emerge, so we need to make sure they also use these requirements for interoperability purposes.

B. Quality Management issues.

MT is completely meaningless without a reliable measure of whether its output is good or bad for the intended purpose, that is, whether the output is actually usable. But the aspect of LQA is underplayed in this document, and I would seek to improve this somewhat.

1. I improved the definition of "standards" in this document. I think we should mention ITS2.0 as a standard in this document, as well as the linked data standards developed by W3C.

2. The Reference Model graphic refers to "Trans QA", but there is no such term in the industry. The industry uses the term LQA ("Linguistic Quality Assurance") for the "standard process box", so to speak, that verifies that the language quality meets the requirements. LQA is used as an internal process step to build various translation processes. I therefore added an LQA definition to the Terminology section and changed the name on the graphic accordingly.
3. Since quality assessment is so important for production processes, I would insist on changing data management requirement 5 to M (Mandatory): "It should be possible for third parties to submit error, QA or corrective annotations to published data, provided it is presented in a common format." Unless this requirement is a mandatory recommendation, Pandora's box will be opened to producing unusable content, with no correction or feedback mechanisms.

4. It is also important that any error feedback conforms to common practices, so I would propose to amend the above requirement 5 as follows: "It should be possible for third parties (general public, individual experts and language service providers alike, as well as automated language services) to submit error, QA or corrective annotations to published data, provided it is presented in a common format, with metadata conformant to one of the commonly accepted and documented universal error typologies, and/or appropriate quality metrics." Otherwise we will get all kinds of error annotations that are mutually incompatible and therefore meaningless. We also absolutely need to provide a channel to the feedback and annotation mechanisms of all industry stakeholders. This is required to plug in various essential services and tools.

5. Currently there are no public language quality assessment methodologies. However, there is a work item in development at ASTM that is targeted specifically at "Development of a complete methodology, including a simplified quality metric, for crowd-sourced expert language quality assessment targeted at nonprofit web sites and other documents of public interest". I suggest at least mentioning it in the definitions section. I also invite the group to participate in the development discussion of this public standard, which is intended precisely for assessing the quality of public content.

C. Lack of stakeholders for this document to be published.

I think that we need to seek additional feedback and support from communities such as LT-Innovate and perhaps GALA. The reason is that any proposed framework for machine translation must be supported by wider audiences, who should be able to see how they could participate. I would propose launching an outreach effort seeking further input, so that we get more qualified participation from the industry sectors. Such an outreach would in itself be a means of influence that this group can use to put these ideas forward.

I am looking forward to our conference call.

Regards,
Serge Gladkoff
President, Logrus International
GALA CRISP Lead

From: Felix Sasaki [mailto:felix.sasaki@dfki.de]
Sent: Wednesday, July 16, 2014 11:25 AM
To: Dave Lewis
Cc: public-i18n-its-ig@w3.org
Subject: Re: [Agenda] ITS IG call 16 July noon utc

Hi Dave,

On 16.07.2014 at 11:02, Dave Lewis <dave.lewis@cs.tcd.ie> wrote:

> Hi Felix,
> I probably won't be able to make this today. Two things to note, however:
> i) David has negotiated having a one-day, single-track FEISGILTT with LocWorld on 29th October

It is a bit of a pity that this overlaps with the TPAC technical plenary day, so I probably won't be able to join you. I'll try to prepare input on the ITS topics.

> in Vancouver, so we are now preparing a Call for Papers. Topics we were planning to cover are:
> * TBX-RDF models/migration and integration with open lexical-conceptual resources;
> * new XLIFF2.0 modules, in particular for ITS;
> * object model/APIs for XLIFF/ITS;
> * MQM-ITS-RDF integration;
> * publishing bitext.
> I'll draft some CFP text for comment later today, but if people on the call or the list have any further topics they would like to see addressed, please let me know. I guess it's also a good opportunity to get some wider input from the Microsoft guys, as it will be an easy trip for them.
> ii) Sadly, Leroy will be moving on from TCD on the 4th August to join IBM in Dublin. I hope you will join me in saying a big thank you to him for all the work he did at TCD in helping to develop ITS2.0, and in particular the implementation of the Test Suite.

Indeed! This sounds like a great opportunity for Leroy, congrats on that, and indeed a big thank you for the work on the test suite. Without you we would not have managed to move this forward the way we needed to. Thanks a lot! And as usual: let's stay in touch and keep us posted on what you do. Maybe we can squeeze in some ITS, linked data, or both :)

Best,
Felix

> Regards,
> Dave
>
> On 15/07/2014 19:07, Felix Sasaki wrote:
>> Your time: http://www.timeanddate.com/worldclock/fixedtime.html?iso=20140716T12
>> Dial-in info: https://www.w3.org/International/its/wiki/Dial_in_info_for_regular_call
>> Please join IRC via your client or via http://irc.w3.org
>> Channel: #i18nits
>> Topics: the same as two weeks ago. We will go through this quickly and then see how to continue calls after the summer break.
>> 0) Action items: http://www.w3.org/International/its/track/actions/open
>> 1) Open Data Management position statement; see the latest state at https://www.w3.org/International/its/wiki/Open_Data_Management_for_Public_Automated_Translation_Services
>> 2) ITS and XLIFF (placeholder)
>> 3) MQM and ITS (placeholder)
>> 4) Wiki clean-up and reasonable planning of topics for the next months: https://www.w3.org/International/its/wiki/
>> Anything else?
>> Best,
>> Felix
Attachments
- application/vnd.openxmlformats-officedocument.wordprocessingml.document attachment: Open_Data_Management_for_Public_Automated_Translation_Services-revSerge.docx
Received on Wednesday, 16 July 2014 11:11:22 UTC