Re: [All] domain data category section proposal, please review from Felix Sasaki on 2012-07-04 (public-multilingualweb-lt@w3.org from July 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Wed, 4 Jul 2012 14:33:54 +0200
To: Dave Lewis <dave.lewis@cs.tcd.ie>
Cc: Arle Lommel <arle.lommel@dfki.de>, public-multilingualweb-lt@w3.org
Message-ID: <CAL58czoWD_-MauAzv5pyq8ueg_0rFDKzZBxidqVWKgp5_XOPpA@mail.gmail.com>
2012/7/4 Dave Lewis <dave.lewis@cs.tcd.ie>

>  I agree - nice summary, Arle.
>
> And to be clear, for ITS conformance testing of glue type data categories,
> we only need to test that the correct association is made between the
> selected portion of the document and the pointer concerned, and that the
> implementation can fetch what is being pointed at, but not how it parses or
> interprets that external document - right?
>

Actually, no ... because otherwise we will have a lot of "glue" data
categories that actually do nothing other than gluing - I tried to make that
point at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jul/0040.html
The test cases should be as close as possible to what's happening in the
system. For domain, I can imagine the following:

given an example like

<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0">
  <its:domainRule selector="/html/body"
    domainPointer="/html/head/meta[@name='DC.subject']/@content"
    domainMapping="automotive auto, medical medicine, 'criminal law' law, 'property law' law"/>
</its:rules>


an MT engine selects the proper domain sub engine.

Such a test case needs to be checked manually, but it's much more valuable
(also in terms of demonstrating the value of the data category) than pure
"glue" conformance testing. Also, we can make sure via "real life" test
cases that the MT engine really processes the mapping, be it URIs or
keyword lists. This would be different for "glue" test cases.
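
To make the idea concrete, here is a minimal sketch in Python of what
"processing the mapping" could look like on the consumer side. It assumes
the domainPointer has already been resolved to a plain string (the content
of the DC.subject meta element), that mapping pairs are separated by commas
and that values containing spaces are wrapped in single quotes, as in the
example above. The function names and the ENGINES table are hypothetical
and only serve the illustration; they are not part of ITS or of any MT
system.

```python
import shlex

def parse_domain_mapping(mapping):
    """Turn a domainMapping string into {source value: target domain}.

    Assumes comma-separated pairs; quoted values may contain spaces
    but not commas.
    """
    pairs = {}
    for entry in mapping.split(","):
        tokens = shlex.split(entry)  # honours 'quoted values'
        if len(tokens) == 2:
            source, target = tokens
            pairs[source.lower()] = target
    return pairs

def map_domains(pointer_value, mapping):
    """Map the comma-separated values found via domainPointer."""
    table = parse_domain_mapping(mapping)
    domains = []
    for raw in pointer_value.split(","):
        value = raw.strip()
        if not value:
            continue
        # Unmapped values are passed through unchanged.
        domain = table.get(value.lower(), value)
        if domain not in domains:
            domains.append(domain)
    return domains

def select_engine(domains, engines, default="mt-general"):
    """Pick the first engine registered for one of the domains."""
    for domain in domains:
        if domain in engines:
            return engines[domain]
    return default

MAPPING = ("automotive auto, medical medicine, "
           "'criminal law' law, 'property law' law")

# Hypothetical table of domain-specific sub engines.
ENGINES = {"auto": "mt-automotive",
           "medicine": "mt-medical",
           "law": "mt-legal"}

if __name__ == "__main__":
    domains = map_domains("Criminal Law, property law", MAPPING)
    print(domains, select_engine(domains, ENGINES))
```

A "real life" test case would then check the last step, that is, which sub
engine was actually chosen for a given input document, rather than only
checking that the pointer and the mapping were read.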

Best,

Felix



>
> cheers,
> Dave
>
>
> On 04/07/2012 12:52, Felix Sasaki wrote:
>
> Thanks a lot for the summary, Arle. One additional point below.
>
> 2012/7/4 Arle Lommel <arle.lommel@dfki.de>
>
>> Hi all,
>>
>>  Just to follow up on what Felix wrote, I was in discussion with Pedro
>> and we realized that there is a potential issue for our work as we delve
>> into process-related data categories, provenance, and so forth. If we try
>> to define values, our specification will already be obsolete by the time it
>> is out the door. For example, if we define process trigger very well, we
>> will release the specification and immediately we will discover that there
>> was some relevant usage scenario we did not consider that therefore cannot
>> be covered by the values we have. We discussed adopting the "standards as
>> database" approach being taken by ISO TC 37 (hence my frequent references
>> to the ISO Data Category Repository in the past few weeks).
>>
>>  After discussion with Felix, however, we (Felix and I) see a solution:
>> our work is not to define the permissible values for most of this metadata.
>> Rather we provide a mechanism to point to the values people are using, as
>> we discussed with domain. This is the "glue" idea Felix mentions. That sets
>> aside the issue of *where* to define the values to support interoperability
>>
>
>  This of course only makes sense if there are already values being used.
> From Thomas and Declan I think this is the case for MT systems. In other
> words, we should not define new data categories saying that they are on the
> "glue" level and that some day they might play a role in bringing systems
> together. For a new data category fulfilling this "glue" purpose, there
> need to be implementations - two, as usual - that can make use of it.
>
>  Best,
>
>  Felix
>
>
>
>>  , but by focusing on just the glue it simplifies our implementation
>> requirements and testing greatly. So, for example, Pedro could post the
>> ontology of process trigger he is using and point to it in the
>> implementation with his partners, thus fulfilling the requirement for
>> implementation of the data category. But we do *not* need to agree and
>> standardize as a group on the possible values, a task that would make our
>> project exponentially more difficult and unwieldy, and we do not need to
>> implement specific values for the data category.
>>
>>  To take another example, in the quality data categories, this principle
>> means we would not define a quality metric ourselves, but rather ways to
>> point and reference external quality metrics.
>>
>>  So we need to keep this principle in mind for the complex data
>> categories: in most cases, we are defining *reference mechanisms*, not
>> *content/values*. We simply need to provide a way to point to the work of
>> others (either standardized or proprietary). If we are getting into any
>> sort of prescriptive description of what people *should or should not* be
>> doing, we are exceeding our mandate.
>>
>>  Best,
>>
>>  Arle
>>
>>
>>  On Jul 4, 2012, at 12:21 , Felix Sasaki wrote:
>>
>> Thanks, and I very much agree. Arle recently told me that there was a
>> discussion at the ISO meeting in Madrid about whether MLW-LT will define or
>> refer to data categories, as provided by DCR. I would go the same route as
>> for domain: in these areas there is already a lot of existing metadata. ITS
>> 2.0 can serve "as a glue" to make it easier to use the metadata in various
>> systems.
>>
>>
>>
>
>
>  --
> Felix Sasaki
> DFKI / W3C Fellow
>
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Wednesday, 4 July 2012 12:34:20 UTC