[Minutes] mlw-lt WG call 2013-01-17 and additional info

Hi all,
minutes are at
and below as text. I hope that I got the attendance right, please check.

At Christian: for the "disambiguation vs. term" discussion, see
For all people attending prague, see
and esp. the objectives
which require some preparations from you.




                                - DRAFT -

                                MLW-LT WG

16 Jan 2013


           felix, karl, Marcis, philr, leroy, Ankit, shaunm, joerg,
           Clemens, Jirka, dave, Des, mdelolmo, renatb, Yves,
           guiseppe, milan, tadej, pablo, dF, Naoto, olaf

           dom, christian


           daveL, fsasaki


      Topics
          1. [5]roll call
          2. [6]Meeting time
          3. [7]state of XLIFF mapping
          4. [8]New value for localization quality type
          5. [9]Regular expression change
          6. [10]Disambiguation and term
          7. [11]annotatorsRef
          8. [12]provenance record ordering
          9. [13]Test suite
         10. [14]prague f2f
         11. [15]xliff mapping implementation update (with David on
             the call)
         12. [16]metadata harvesting
      * [17]Summary of Action Items

roll call

    <fsasaki> checking attendance

    <fsasaki> scribe: daveL


Meeting time

    <fsasaki> [19]http://www.doodle.com/pn6xa86rfbypmd2k

    felix: there is no apparent slot that works. felix will
    distribute a weekly alternating proposal

state of XLIFF mapping

    <fsasaki> scribe: fsasaki

    dave: haven't updated the mapping page a lot
    ... there is more work to be done to formalize the mapping
    ... and come up with examples
    ... I think we want to focus on XLIFF 1.2 mapping first
    ... we were hoping that XLIFF 2 would be stable, but there is a
    ... focus on XLIFF 1.2 also helps with putting a demonstrator

    yves: dave summarized everything right
    ... in okapi we implemented ITS mapping on what we have
    ... it is partially implemented, ongoing

    dave: we will come back shortly on that
    ... wrt to interop between solas and CMS lion, also using okapi
    ... with the preparation for rome

    phil: it is now on our critical path for our implementation
    ... david said he would have a prototype a few weeks ago
    ... even if there is nothing final
    ... even if we would have a rough direction
    ... e.g. yves said that with xliff 1.2, he would use mrk markup
    ... even if we had directions what is easily acceptable
    ... otherwise it could hold up my implementation

    yves: the xliff 1.2 mapping is what we used for implementations
    ... most of the time it made sense
    ... we have tackled some of the standoff stuff
    ... it is also in the git repository (for okapi, scribe

    <Yves_> yes

    phil: provenance and loc quality issue, rating are relevant for
    us here

    <Yves_> Location:

    phil: Yves' page for 1.2. we can certainly use that as our

    dave: will talk to david tomorrow about that

    phil: tx

New value for localization quality type "conformance"

    <daveL> scribe: daveL

    felix: asks if anyone has further thoughts on, or support for,
    this new type

Regular expression change

    felix: no responses yet

    shaun: no update on this

    <fsasaki> ACTION: shaun to work on regex for validating regex
    subset proposal [recorded in

    <trackbot> Created ACTION-385 - Work on regex for validating
    regex subset proposal [on Shaun McCance - due 2013-01-23].

Disambiguation and term

    felix: has been discussed in response to christian comment
    ... any further comments

    marcis: what is the goal?

    felix: christian suggested merging term and disambig data
    ... but response was that both had distinct use cases, that
    could merge but are valid individually

    marcis: would not want to drop data category, term is easier to
    implement and purpose is clear
    ... not so clear on disambiguation category, in terms of what
    is possible to do with this
    ... for example there may be other types that might be useful
    in the disambiguation use case
    ... and doing term management with disambig would make it very
    ... so there might need to be more attributes specifically for
    named entity
    ... referencing input from W3C India received today

    tadej: motivation for separate data category was because it
    covered some use cases that fell out of the scope of
    ... by providing some additional context
    ... but do see that there is some commonality
    ... Also term must remain to keep compatibility with
    terminology in ITS1

    jörg: still in favour of having the two data categories

    scribe: since disambiguation can cover many other tasks in
    content or NLP processing
    ... whereas term is more specific

    pedro: the sort of text we mark up is different in both cases
    so it makes sense to keep the distinction

    tadej: agree granularities are quite limiting, or should we
    have more identifiers to support this

    scribe: but this might be more complicating

    jörg: yes this would be more complicated, clearer as it is

    <fsasaki> [22]http://tinyurl.com/its20-testsuite-dashboard

    felix: christian will dial in to f2f to discuss this and
    resolve the topic next week
    ... we also need to consider number of implementations, which
    are not so many, when considering any possible merger

    Des: agree with jörg, keep them separate as they are distinct
    use cases

    jörg: clarified, attributes as defined currently are clearer
    than making them more fine grained

    felix: reminds that W3C process requires responding which
    involves some work

    <Yves_> could we talk about annotatorsRef
    ssues/71 a bit during this call?

    felix: replying to a question from Dave: the current number of
    comments received is good


annotatorsRef

    yves: two data categories, provenance and loc quality issue, can
    have information from multiple annotators, but we have no way of
    doing this for annotatorsRef
    ... for the current implementation, we assume the most recent
    annotator is the correct one, but this is not ideal
    ... provenance especially has multiple items and requires

    <fsasaki> daveL: will look into this thread

    <scribe> scribe: daveL

provenance record ordering

    phil: let's talk about the ordering of provenance

    <Yves_> provenance data category

    felix: this was a discussion of whether there was any
    implication between ordering and time of record


    <fsasaki> (mails related to the discussion)

    phil: asks about the lack of a date stamp

    <fsasaki> daveL: a date stamp was discussed

    <fsasaki> .. there are two aspects:

    <fsasaki> .. a lot of original requirements didn't have a
    strong need for a time stamp

    <fsasaki> .. the original requirement was about identifying
    rich enough so that we can differentiate

    <fsasaki> .. see e.g. "agent provenance" that used to include

    <fsasaki> .. the 2nd aspect:

    <fsasaki> .. we discussed whether the order in which the
    provenance records are added is significant

    <fsasaki> .. but from an implementation point of view it is
    again complicated

    <fsasaki> .. and there hadn't been much of a call for this during
    requirements gathering

    <fsasaki> .. "time" also has various aspects: start of a
    translation, finish, duration, ...

    <fsasaki> .. it is also a point that the provenance wg in w3c
    had addressed

    <fsasaki> .. so we just provide identifiers of who made the
    translation and revision

    <fsasaki> .. for knowing more there is the provenance model

    <fsasaki> .. more = more about time

    <fsasaki> .. so in summary, there was no big requirement to
    have a time stamp

    <fsasaki> .. and *if* you want to do that, you can use the w3c
    prov model

    <fsasaki> .. I'll reply to that mail thread

    <fsasaki> pablo: I think provenance can stay as is

    <fsasaki> .. adding a time stamp can be useful and interesting -
    if every implementer is fine with that I'm fine too

    <scribe> scribe: daveL

    felix: adding a timestamp is a substantive change and would
    require another call, plus tests etc

Test suite


    felix: from this week on, people should stop using the google
    docs and update the test suite master instead


    felix: we still need some input on tests related to
    assertions (MUSTs), which need suggestions for tests

prague f2f




      [32] http://www.w3.org/International/multilingualweb/lt/wiki/PragueJan2013f2f#Objectives

    felix: thanks to jirka for organising this


      [33] http://www.w3.org/International/multilingualweb/lt/wiki/PragueJan2013f2f#Participants

    jirka: if you are not yet registered, please do so asap. Numbers
    of people need to be known for wifi etc.

    felix: also need to know in advance when people want to dial in
    for organising the agenda


      [34] http://www.w3.org/International/multilingualweb/lt/wiki/PragueJan2013f2f#Objectives

    felix: going through objectives


      [35] http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary

    felix: in particular the relationship between the different
    posters and links to where people can access them and update
    high level summary, adding any new use cases

    <fsasaki> daveL: some time to discuss preparing EU project

    felix: also brainstorm on activities for rest of year and new
    projects and synergy between them
    ... the Rome preparation should cover that.

    <fsasaki> scribe: fsasaki

    <omstefanov> I will not be able to take part in the Prague f2f,
    but definitely intend to come to Rome, so please make sure
    preps for Rome are recorded in writing

xliff mapping implementation update (with David on the call)

    david: phil asked on that, we got good comments from xyz
    ... status of xliff mapping - only written piece is xliff
    mapping wiki



    david: will work on this today, yesterday / today was EC
    ... we should publish this as a note / PC
    ... what is the editorial setup for such a note?
    ... we will need an additional namespace itsx

    felix: update on implementation prototype?

    david: solas is consuming ITS2 categories
    ... like OKAPI does
    ... that is being tested as part of the test suite
    ... that is consumed by various components of solas
    ... one is an MT broker
    ... works with different MT systems
    ... depends on the MT systems whether they can deal with ITS
    ... moravia is contributing to that
    ... m4loc can be used as middleware
    ... in our current prototype the mt services exposes the m4loc
    ... from the deliverable - open source xliff roundtrip
    ... the okapi filter interprets the ITS decoration
    ... then the mapping in the wiki is used
    ... it is consumed by a middleware open source component

    felix: would be good to see a demo

    david: will do, in prague and in rome

metadata harvesting

    ankit: we are waiting for some sort of data from cocomore

    felix: what data?

    ankit: we said that cocomore would provide us with annotated

    ankit will provide module by prague f2f

    pedro: will have annotated data from spanish client
    ... client is the spanish gov tax office
    ... they will annotate with ITS metadata for this show case
    ... spanish content in HTML5
    ... we will generate english content
    ... and annotate it in the output of the real time system

    felix: so ankit could later use the data to test the module?

    ankit: training data is as much as you can get

    pedro: annotated data from cocomore is html content
    ... we will generate content in chinese and french
    ... so ankit can take chinese, french, german into account
    in his system
    ... and spanish
    ... this will be german to english, german to french, german to
    chinese, german to spanish

    <Pedro> Showcase WP3 (Cocomore-Linguaserve) is German to
    Chinese and German to French

    <Clemens> right!

    <Pedro> Showcase WP4 (Linguaserve-Lucy-DCU) is the full demo
    Spanish to English, and partial demo Spanish to French and
    Spanish to German

    thanks to everybody for staying longer, meeting adjourned

Summary of Action Items

    [NEW] ACTION: shaun to work on regex for validating regex
    subset proposal [recorded in

    [End of minutes]

