[ISSUE-33]Re: [All] domain data category section proposal, please review from Dave Lewis on 2012-07-05 (public-multilingualweb-lt@w3.org from July 2012)

From: Dave Lewis <dave.lewis@cs.tcd.ie>
Date: Thu, 05 Jul 2012 11:39:47 +0100
To: public-multilingualweb-lt@w3.org
Message-ID: <4FF56EF3.6030808@cs.tcd.ie>
OK, that one is fairly straightforward, as the reference is to meta-data 
in an existing standard format in the document. It would just require 
part of the test suite specifying workflow domain to MT engine mappings, 
e.g.
     auto->MT1
     medicine->MT2
     law->MT3
and perhaps for good measure:
     medicine AND law->MT4
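To make the mapping idea concrete, here is a minimal Python sketch of such a test-suite fixture. The engine names MT1 to MT4 come from the list above. The table and function names are hypothetical, and the policy that the most specific matching combination wins (so "medicine AND law" beats "medicine" alone) is an assumption, not something we have agreed.

```python
from typing import Optional

# Hypothetical test-suite fixture mapping workflow domains to MT engines.
# Keys are frozensets so that a combination such as {medicine, law}
# can have its own engine.
ENGINE_FOR_DOMAINS: dict[frozenset[str], str] = {
    frozenset({"auto"}): "MT1",
    frozenset({"medicine"}): "MT2",
    frozenset({"law"}): "MT3",
    frozenset({"medicine", "law"}): "MT4",
}


def select_engine(domains: set[str]) -> Optional[str]:
    """Return the engine for the largest mapped domain combination that is
    contained in `domains`, or None if nothing matches.

    "Largest combination wins" is an assumed policy; ties are broken by
    the order of ENGINE_FOR_DOMAINS because max() keeps the first maximum.
    """
    candidates = [key for key in ENGINE_FOR_DOMAINS if key <= domains]
    if not candidates:
        return None
    return ENGINE_FOR_DOMAINS[max(candidates, key=len)]


print(select_engine({"medicine"}))         # MT2
print(select_engine({"medicine", "law"}))  # MT4
print(select_engine({"chemistry"}))        # None
```

Under this sketch a document tagged only with auto and law would go to MT1 because of the tie rule, so the test suite would have to state such cases explicitly.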

I was thinking more of Arle's suggestion of a dumb pointer to a document 
with a set of translation job parameters, for example in a LINPORT 
format. Do we want the test suite to require correct parsing of this 
external file and then have some checking criteria for the correct 
behaviour in the output of the translation process?

I don't see this as a problem either way; I'm just trying to tease out 
the scope and complexity required for the test suite.

cheers,
Dave

On 04/07/2012 13:33, Felix Sasaki wrote:
>
>
> 2012/7/4 Dave Lewis <dave.lewis@cs.tcd.ie>
>
>     I agree - nice summary, Arle.
>
>     And to be clear, for ITS conformance testing of glue type data
>     categories, we only need to test that the correct association is
>     made between the selected portion of the document and the pointer
>     concerned, and that the implementation can fetch what is being
>     pointed at, but not how it parses or interprets that external
>     document - right?
>
>
> Actually, no ... because otherwise we will have a lot of "glue" data 
> categories that actually do nothing else than gluing - I tried to make 
> that point at
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jul/0040.html
> The test cases should be as close as possible to what's happening in 
> the system. For domain, I can imagine the following:
>
> Given an example like
>
> <its:rules
>    xmlns:its="http://www.w3.org/2005/11/its"  version="2.0">
>   <its:domainRule selector="/html/body" domainPointer="/html/head/meta[@name='DC.subject']/@content"
>     domainMapping="automotive auto, medical medicine, 'criminal law' law, 'property law' law"/>
> </its:rules>
>
> an MT engine selects the proper domain sub engine.
>
> Such a test case needs to be checked manually, but it's much more 
> valuable (also in terms of demonstrating the value of the data 
> category) than pure "glue" conformance testing. Also, we can make sure 
> via "real life" test cases that the MT engine really processes the 
> mapping - be it URIs or keyword lists. This would be different for 
> "glue" test cases.
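For illustration, a rough Python sketch of what an implementation under test might do with the rule above: read the DC.subject meta content that domainPointer points at, parse domainMapping, and map the document's values to the engine-side domains. It assumes XHTML without a namespace, comma-separated keywords in the meta content, and single-quoted mapping values with no embedded commas. The function names are hypothetical, and the selector (/html/body) is not evaluated here.

```python
import shlex
import xml.etree.ElementTree as ET


def parse_domain_mapping(mapping: str) -> dict[str, str]:
    """Parse a domainMapping value into {source value: target value}.

    Assumes comma-separated pairs whose two parts are separated by
    whitespace, with single quotes around values containing spaces.
    Commas inside quoted values are not handled.
    """
    pairs: dict[str, str] = {}
    for chunk in mapping.split(","):
        parts = shlex.split(chunk)  # honours quoting such as 'criminal law'
        if len(parts) != 2:
            raise ValueError(f"malformed mapping pair: {chunk!r}")
        source, target = parts
        pairs[source] = target
    return pairs


def subject_values(xhtml: str) -> list[str]:
    """Approximate /html/head/meta[@name='DC.subject']/@content.

    ElementTree cannot select attribute nodes, so the meta element is
    found first and its content attribute read afterwards. Splitting the
    content on commas is an assumption about how keywords are listed.
    """
    root = ET.fromstring(xhtml)  # root is the <html> element
    meta = root.find("head/meta[@name='DC.subject']")
    if meta is None:
        return []
    return [v.strip() for v in meta.get("content", "").split(",") if v.strip()]


def mapped_domains(values: list[str], mapping: dict[str, str]) -> set[str]:
    # Unmapped values are passed through unchanged (an assumption).
    return {mapping.get(v, v) for v in values}


doc = """<html><head>
<meta name="DC.subject" content="medical, criminal law"/>
</head><body><p>...</p></body></html>"""

rule = "automotive auto, medical medicine, 'criminal law' law, 'property law' law"
print(sorted(mapped_domains(subject_values(doc), parse_domain_mapping(rule))))
# ['law', 'medicine']
```

A manual check would then confirm that the MT engine picked the sub-engine for those mapped domains.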
>
> Best,
>
> Felix
>
>
>     cheers,
>     Dave
>
>
>     On 04/07/2012 12:52, Felix Sasaki wrote:
>>     Thanks a lot for the summary, Arle. One additional point below.
>>
>>     2012/7/4 Arle Lommel <arle.lommel@dfki.de>
>>
>>         Hi all,
>>
>>         Just to follow up on what Felix wrote, I was in discussion
>>         with Pedro and we realized that there is a potential issue
>>         for our work as we delve into process-related data
>>         categories, provenance, and so forth. If we try to define
>>         values, our specification will already be obsolete by the
>>         time it is out the door. For example, if we define process
>>         trigger very well, we will release the specification and
>>         immediately we will discover that there was some relevant
>>         usage scenario we did not consider that therefore cannot be
>>         covered by the values we have. We discussed adopting the
>>         "standards as database" approach being taken by ISO TC 37
>>         (hence my frequent references to the ISO Data Category
>>         Repository in the past few weeks).
>>
>>         After discussion with Felix, however, we (Felix and I) see a
>>         solution: our work is not to define the permissible values
>>         for most of this metadata. Rather we provide a mechanism to
>>         point to the values people are using, as we discussed with
>>         domain. This is the "glue" idea Felix mentions. That sets
>>         aside the issue of *where* to define the values to support
>>         interoperability
>>
>>
>>     This of course only makes sense if there are already values being
>>     used. From Thomas and Declan I think this is the case for MT
>>     systems. In other words, we should not define new data categories
>>     saying that they are on the "glue" level and that some day they
>>     might play a role in bringing systems together. For a new data
>>     category fulfilling this "glue" purpose, there need to be
>>     implementations - two, as usual - that can make use of it.
>>
>>     Best,
>>
>>     Felix
>>
>>         , but by focusing on just the glue it simplifies our
>>         implementation requirements and testing greatly. So, for
>>         example, Pedro could post the ontology of process trigger he
>>         is using and point to it in the implementation with his
>>         partners, thus fulfilling the requirement for implementation
>>         of the data category. But we do /not/ need to agree and
>>         standardize as a group on the possible values, a task that
>>         would make our project exponentially more difficult and
>>         unwieldy, and we do not need to implement specific values for
>>         the data category.
>>
>>         To take another example, in the quality data categories, this
>>         principle means we would not define a quality metric
>>         ourselves, but rather ways to point to and reference external
>>         quality metrics.
>>
>>         So we need to keep this principle in mind for the complex
>>         data categories: in most cases, we are defining
>>         /reference mechanisms/, not /content/values/. We simply need
>>         to provide a way to point to the work of others (either
>>         standardized or proprietary). If we are getting into any sort
>>         of prescriptive description of what people /should or should
>>         not/ be doing, we are exceeding our mandate.
>>
>>         Best,
>>
>>         Arle
>>
>>
>>         On Jul 4, 2012, at 12:21 , Felix Sasaki wrote:
>>
>>>         Thanks, and I very much agree. Arle recently told me that
>>>         there was a discussion at the ISO meeting in Madrid about
>>>         whether MLW-LT will define or refer to data categories, as
>>>         provided by DCR. I would go the same route as for domain: in
>>>         these areas there is already a lot of existing metadata. ITS
>>>         2.0 can serve "as a glue" to make it easier to use the
>>>         metadata in various systems.
>>
>>
>>
>>
>>     -- 
>>     Felix Sasaki
>>     DFKI / W3C Fellow
>>
>
Received on Thursday, 5 July 2012 10:40:50 UTC