- From: Felix Sasaki <fsasaki@w3.org>
- Date: Thu, 5 Jul 2012 13:43:09 +0200
- To: Dave Lewis <dave.lewis@cs.tcd.ie>
- Cc: public-multilingualweb-lt@w3.org
- Message-ID: <CAL58czqd3P-Qvdc-gCG_nfxbvC+oLzShNTuPx8H5bXEc+1LA2A@mail.gmail.com>
2012/7/5 Dave Lewis <dave.lewis@cs.tcd.ie>

> OK. that one is fairly straightforward, as the reference is to meta-data in an existing standard format in the document. It would just require part of the test suite specifying workflow domain to MT engine mappings, e.g.
> auto->MT1
> medicine->MT2
> law->MT3
> and perhaps for good measure:
> medicine AND law->MT4
>
> I was thinking more of Arle's suggestion of a dumb pointer to a document with a set of translation job parameters, for example in a LINPORT format. Do we want the test suite to require correct parsing of this external file and then have some checking criteria for the correct behaviour in the output of the translation process?
>

You mean for the "translation parameter" suggestion? One reason for me to oppose that suggestion is that interoperability is really hard to achieve, as your test suite question shows: "which parameters, which format for the parameters, etc.?" So I'd rather encourage implementors to team up to develop solutions together, and then we document them - as part of ITS 2.0 or a best practice document. We will need to fix ITS 2.0 in November anyway, and then have a year to do testing, best practices documentation etc.

Best,

Felix

>
> I don't see this as a problem either way, I'm just trying to tease out a bit the scope and complexity required for the test suite.
>
> cheers,
> Dave
>
> On 04/07/2012 13:33, Felix Sasaki wrote:
>
> 2012/7/4 Dave Lewis <dave.lewis@cs.tcd.ie>
>
>> I agree - nice summary Arle?
>>
>> And to be clear, for ITS conformance testing of glue type data categories, we only need to test that the correct association is made between the selected portion of the document and the pointer concerned, and that the implementation can fetch what is being pointed at, but not how it parses or interprets that external document - right?
>>
>
> Actually, no ... because otherwise we will have a lot of "glue" data categories that actually do nothing else than gluing - I tried to make that point at
>
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jul/0040.html
>
> The test cases should be as close as possible to what's happening in the system. For domain, I can imagine the following:
>
> given an example like
>
> <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
>   <its:domainRule selector="/html/body"
>     domainPointer="/html/head/meta[@name='DC.subject']/@content"
>     domainMapping="automotive auto, medical medicine, 'criminal law' law, 'property law' law"/>
> </its:rules>
>
> an MT engine selects the proper domain sub engine.
>
> Such a test case needs to be checked manually, but it's much more valuable (also in terms of demonstrating the value of the data category) than pure "glue" conformance testing. Also, we can make sure via "real life" test cases that the MT engine really processes the mapping - be it URIs or keyword lists. This would be different for "glue" test cases.
>
> Best,
>
> Felix
>
>>
>> cheers,
>> Dave
>>
>> On 04/07/2012 12:52, Felix Sasaki wrote:
>>
>> Thanks a lot for the summary, Arle. One additional point below.
>>
>> 2012/7/4 Arle Lommel <arle.lommel@dfki.de>
>>
>>> Hi all,
>>>
>>> Just to follow up on what Felix wrote, I was in discussion with Pedro and we realized that there is a potential issue for our work as we delve into process-related data categories, provenance, and so forth. If we try to define values, our specification will already be obsolete by the time it is out the door. For example, if we define process trigger very well, we will release the specification and immediately we will discover that there was some relevant usage scenario we did not consider that therefore cannot be covered by the values we have. We discussed adopting the "standards as database" approach being taken by ISO TC 37 (hence my frequent references to the ISO Data Category Repository in the past few weeks).
>>>
>>> After discussion with Felix, however, we (Felix and I) see a solution: our work is not to define the permissible values for most of this metadata. Rather we provide a mechanism to point to the values people are using, as we discussed with domain. This is the "glue" idea Felix mentions. That sets aside the issue of *where* to define the values to support interoperability
>>>
>>
>> This of course only makes sense if there are already values being used. From Thomas and Declan I think this is the case for MT systems. In other words, we should not define new data categories saying that they are on the "glue" level and that some day they might play a role in bringing systems together. For a new data category fulfilling this "glue" purpose, there need to be implementations - two, as usual - that can make use of it.
>>
>> Best,
>>
>> Felix
>>
>>> , but by focusing on just the glue it simplifies our implementation requirements and testing greatly. So, for example, Pedro could post the ontology of process trigger he is using and point to it in the implementation with his partners, thus fulfilling the requirement for implementation of the data category. But we do *not* need to agree and standardize as a group on the possible values, a task that would make our project exponentially more difficult and unwieldy, and we do not need to implement specific values for the data category.
>>>
>>> To take another example, in the quality data categories, this principle means we would not define a quality metric ourselves, but rather ways to point and reference external quality metrics.
>>>
>>> So we need to keep this principle in mind for the complex data categories: in most cases, we are defining *reference mechanisms*, not *content/values*. We simply need to provide a way to point to the work of others (either standardized or proprietary). If we are getting into any sort of prescriptive description of what people *should or should not* be doing, we are exceeding our mandate.
>>>
>>> Best,
>>>
>>> Arle
>>>
>>> On Jul 4, 2012, at 12:21 , Felix Sasaki wrote:
>>>
>>> Thanks, and I very much agree. Arle recently told me that there was a discussion at the ISO meeting in Madrid about whether MLW-LT will define or refer to data categories, as provided by DCR. I would go the same route as for domain: in these areas there is already a lot of existing metadata. ITS 2.0 can serve "as a glue" to make it easier to use the metadata in various systems.
>>>
>>
>> --
>> Felix Sasaki
>> DFKI / W3C Fellow
>>
>
> --
> Felix Sasaki
> DFKI / W3C Fellow
>

--
Felix Sasaki
DFKI / W3C Fellow
Received on Thursday, 5 July 2012 11:43:37 UTC
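
For readers following the thread, the domain test case discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not an implementation from any participant: the function names `parse_domain_mapping` and `select_engine` and the `ENGINES` table are hypothetical, the engine labels MT1 to MT4 are taken from Dave's example, the subject values are assumed to have been extracted already from the `DC.subject` meta element, and case-insensitive matching is an assumption rather than something the thread specifies.

```python
import shlex


def parse_domain_mapping(mapping):
    """Parse a domainMapping attribute value into {source value: workflow domain}.

    Hypothetical helper. Entries are comma-separated pairs; a source value
    containing spaces is wrapped in single quotes, as in the example rule.
    Limitation: a comma inside a quoted value is not handled.
    """
    result = {}
    for pair in mapping.split(","):
        parts = shlex.split(pair)  # honours the single-quoted values
        if len(parts) != 2:
            raise ValueError(f"malformed mapping entry: {pair!r}")
        source, target = parts
        result[source.lower()] = target  # assumption: case-insensitive match
    return result


# Workflow domain to MT engine table, following Dave's example mappings.
ENGINES = {
    frozenset({"auto"}): "MT1",
    frozenset({"medicine"}): "MT2",
    frozenset({"law"}): "MT3",
    frozenset({"medicine", "law"}): "MT4",
}


def select_engine(subjects, mapping, engines=ENGINES):
    """Return the engine for the mapped domains of `subjects`, or None."""
    domains = frozenset(
        mapping[s.strip().lower()]
        for s in subjects
        if s.strip().lower() in mapping
    )
    return engines.get(domains)


mapping = parse_domain_mapping(
    "automotive auto, medical medicine, 'criminal law' law, 'property law' law"
)
print(select_engine(["medical", "criminal law"], mapping))
```

Keying the engine table on sets of workflow domains is one way to express the "medicine AND law" case; whether a real MT system combines domains this way is exactly the kind of behaviour the thread says would need manual checking.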