Re: [ISSUE-33]Re: [All] domain data category section proposal, please review from Felix Sasaki on 2012-07-05 (public-multilingualweb-lt@w3.org from July 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 5 Jul 2012 13:43:09 +0200
To: Dave Lewis <dave.lewis@cs.tcd.ie>
Cc: public-multilingualweb-lt@w3.org
Message-ID: <CAL58czqd3P-Qvdc-gCG_nfxbvC+oLzShNTuPx8H5bXEc+1LA2A@mail.gmail.com>
2012/7/5 Dave Lewis <dave.lewis@cs.tcd.ie>

>  OK. that one is fairly straightforward, as the reference is to meta-data
> in an existing standard format in the document. It would just require part
> of the test suite specifying workflow domain to MT engine mappings, e.g.
>     auto->MT1
>     medicine->MT2
>     law->MT3
> and perhaps for good measure:
>     medicine AND law->MT4
>
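
To make the mapping above concrete, here is a minimal sketch of how a test suite could encode such domain-to-engine rules. The engine names come from Dave's list; the `ROUTING_RULES` table, the `select_engine` function and the "most specific rule wins" behaviour are assumptions for illustration only, not something agreed in the group.

```python
# Hypothetical routing table: each rule pairs the workflow domains it
# requires with an MT engine name (names taken from the list above).
ROUTING_RULES = [
    (frozenset({"auto"}), "MT1"),
    (frozenset({"medicine"}), "MT2"),
    (frozenset({"law"}), "MT3"),
    (frozenset({"medicine", "law"}), "MT4"),
]


def select_engine(domains, rules=ROUTING_RULES, default=None):
    """Return the engine of the most specific rule matching the domains.

    A rule matches when all of its required domains are present.
    Assumption: among matching rules, the one requiring the most domains
    wins; ties are resolved in favour of the rule listed first.
    """
    doc_domains = frozenset(domains)
    matching = [(req, engine) for req, engine in rules if req <= doc_domains]
    if not matching:
        return default
    _, engine = max(matching, key=lambda rule: len(rule[0]))
    return engine
```

With this table, a document tagged with both medicine and law would be routed to MT4 rather than MT2 or MT3, and a document with no matching domain falls back to the default.
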
> I was thinking more of Arle's suggestion of a dumb pointer to a document
> with a set of translation job parameters, for example in a LINPORT format.
> Do we want the test suite to require correct parsing of this external file
> and then have some checking criteria for the correct behaviour in the output
> of the translation process?
>

You mean for the "translation parameter" suggestion? One reason for me to
oppose that suggestion is that interoperability is really hard to achieve,
which is shown by your test suite question: "which parameters, which format
for the parameters, etc.?"
So I'd rather encourage implementors to team up to develop solutions
together, and then we document them, as part of ITS 2.0 or a best practice
document. We will need to fix ITS 2.0 in November anyway, and then have a
year to do testing, best practices documentation etc.

Best,

Felix


>
> I don't see this as a problem either way, I'm just trying to tease out a
> bit the scope and complexity required for the test suite.
>
>
> cheers,
> Dave
>
> On 04/07/2012 13:33, Felix Sasaki wrote:
>
>
>
> 2012/7/4 Dave Lewis <dave.lewis@cs.tcd.ie>
>
>>  I agree - nice summary Arle?
>>
>> And to be clear, for ITS conformance testing of glue type data
>> categories, we only need to test that the correct association is made
>> between the selected portion of the document and the pointer concerned, and
>> that the implementation can fetch what is being pointed at, but not how it
>> parses or interprets that external document - right?
>>
>
>  Actually, no ... because otherwise we will have a lot of "glue" data
> categories that actually do nothing other than gluing - I tried to make that
> point at
>
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jul/0040.html
> The test cases should be as close as possible to what's happening in the
> system. For domain, I can imagine the following:
>
>  given an example like
>
>  <its:rules
>   xmlns:its="http://www.w3.org/2005/11/its"  version="2.0">
>  <its:domainRule selector="/html/body" domainPointer="/html/head/meta[@name='DC.subject']/@content"
>    domainMapping="automotive auto, medical medicine, 'criminal law' law, 'property law' law"/>
> </its:rules>
>
>
>  an MT engine selects the proper domain sub engine.
>
>  Such a test case needs to be checked manually, but it's much more
> valuable (also in terms of demonstrating the value of the data category)
> than pure "glue" conformance testing. Also, we can make sure via "real
> life" test cases that the MT engine really processes the mapping - be it
> URIs or keyword lists. This would be different for "glue" test cases.
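
As an illustration of what "processing the mapping" could involve, below is a rough sketch that parses the domainMapping value from the example and applies it to a keyword list such as the DC.subject content. The function names, the pass-through of unmapped keywords and the simple comma splitting (which would not cope with commas inside quoted values) are assumptions, not behaviour defined by ITS 2.0.

```python
import shlex


def parse_domain_mapping(mapping):
    """Parse a domainMapping value into a dict: source value -> workflow value.

    Pairs are separated by commas; within a pair the two values are
    separated by whitespace, with quotes around values containing spaces.
    """
    table = {}
    for item in mapping.split(","):
        parts = shlex.split(item)  # honours '...' and "..." quoting
        if len(parts) != 2:
            raise ValueError(f"expected a 'source target' pair, got {item!r}")
        source, target = parts
        table[source] = target
    return table


def map_domains(subject_content, mapping):
    """Map a comma-separated keyword list to workflow domains.

    Assumption: keywords without a mapping are passed through unchanged,
    and duplicate results are dropped while keeping the original order.
    """
    table = parse_domain_mapping(mapping)
    domains = []
    for keyword in subject_content.split(","):
        keyword = keyword.strip()
        if not keyword:
            continue
        domain = table.get(keyword, keyword)
        if domain not in domains:
            domains.append(domain)
    return domains


MAPPING = ("automotive auto, medical medicine, "
           "'criminal law' law, 'property law' law")
```

For instance, `map_domains("medical, criminal law", MAPPING)` would yield the workflow domains `medicine` and `law`, which an MT system could then use to pick its sub engine.
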
>
>  Best,
>
>  Felix
>
>
>
>>
>> cheers,
>> Dave
>>
>>
>> On 04/07/2012 12:52, Felix Sasaki wrote:
>>
>> Thanks a lot for the summary, Arle. One additional point below.
>>
>> 2012/7/4 Arle Lommel <arle.lommel@dfki.de>
>>
>>> Hi all,
>>>
>>>  Just to follow up on what Felix wrote, I was in discussion with Pedro
>>> and we realized that there is a potential issue for our work as we delve
>>> into process-related data categories, provenance, and so forth. If we try
>>> to define values, our specification will already be obsolete by the time it
>>> is out the door. For example, if we define process trigger very well, we
>>> will release the specification and immediately we will discover that there
>>> was some relevant usage scenario we did not consider that therefore cannot
>>> be covered by the values we have. We discussed adopting the "standards as
>>> database" approach being taken by ISO TC 37 (hence my frequent references
>>> to the ISO Data Category Repository in the past few weeks).
>>>
>>>  After discussion with Felix, however, we (Felix and I) see a solution:
>>> our work is not to define the permissible values for most of this metadata.
>>> Rather we provide a mechanism to point to the values people are using, as
>>> we discussed with domain. This is the "glue" idea Felix mentions. That sets
>>> aside the issue of *where* to define the values to support interoperability
>>>
>>
>>  This of course only makes sense if there are already values being used.
>> From Thomas and Declan I think this is the case for MT systems. In other
>> words, we should not define new data categories saying that they are on the
>> "glue" level and that some day they might play a role in bringing systems
>> together. For a new data category fulfilling this "glue" purpose, there
>> need to be implementations - two, as usual - that can make use of it.
>>
>>  Best,
>>
>>  Felix
>>
>>
>>
>>>  , but by focusing on just the glue it simplifies our implementation
>>> requirements and testing greatly. So, for example, Pedro could post the
>>> ontology of process trigger he is using and point to it in the
>>> implementation with his partners, thus fulfilling the requirement for
>>> implementation of the data category. But we do *not* need to agree and
>>> standardize as a group on the possible values, a task that would make our
>>> project exponentially more difficult and unwieldy, and we do not need to
>>> implement specific values for the data category.
>>>
>>>  To take another example, in the quality data categories, this
>>> principle means we would not define a quality metric ourselves, but rather
>>> ways to point to and reference external quality metrics.
>>>
>>>  So we need to keep this principle in mind for the complex data
>>> categories: in most cases, we are defining *reference mechanisms*, not
>>> *content/values*. We simply need to provide a way to point to the work
>>> of others (either standardized or proprietary). If we are getting into any
>>> sort of prescriptive description of what people *should or should not* be
>>> doing, we are exceeding our mandate.
>>>
>>>  Best,
>>>
>>>  Arle
>>>
>>>
>>>  On Jul 4, 2012, at 12:21 , Felix Sasaki wrote:
>>>
>>> Thanks, and I very much agree. Arle recently told me that there was a
>>> discussion at the ISO meeting in Madrid about whether MLW-LT will define or
>>> refer to data categories, as provided by DCR. I would go the same route as
>>> for domain: in these areas there is already a lot of existing metadata. ITS
>>> 2.0 can serve "as a glue" to make it easier to use the metadata in various
>>> systems.
>>>
>>>
>>>
>>
>>
>>  --
>> Felix Sasaki
>> DFKI / W3C Fellow
>>
>>
>>
>
>
>  --
> Felix Sasaki
> DFKI / W3C Fellow
>
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Thursday, 5 July 2012 11:43:37 UTC