[Minutes] mlw-lt WG call 2013-01-17 and additional info from Felix Sasaki on 2013-01-17 (public-multilingualweb-lt@w3.org from January 2013)

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 17 Jan 2013 10:03:37 +0100
To: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
Message-ID: <50F7BE69.4060209@w3.org>
Hi all,
minutes are at
http://www.w3.org/2013/01/16-mlw-lt-minutes.html
and below as text. I hope that I got the attendance right, please check.

At Christian: for the "disambiguation vs. term" discussion, see
http://www.w3.org/2013/01/16-mlw-lt-minutes.html#item06
For all people attending Prague, see
http://www.w3.org/2013/01/16-mlw-lt-minutes.html#item10
and esp. the objectives
http://www.w3.org/International/multilingualweb/lt/wiki/PragueJan2013f2f#Objectives
which require some preparations from you.

Best,

Felix



    [1]W3C

       [1] http://www.w3.org/

                                - DRAFT -

                                MLW-LT WG

16 Jan 2013

    [2]Agenda

       [2] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0090.html

    See also: [3]IRC log

       [3] http://www.w3.org/2013/01/16-mlw-lt-irc

Attendees

    Present
           felix, karl, Marcis, philr, leroy, Ankit, shaunm, joerg,
           Clemens, Jirka, dave, Des, mdelolmo, renatb, Yves,
           guiseppe, milan, tadej, pablo, dF, Naoto, olaf

    Regrets
           dom, christian

    Chair
           felix

    Scribe
           daveL, fsasaki

Contents

      * [4]Topics
          1. [5]roll call
          2. [6]Meeting time
          3. [7]state of XLIFF mapping
          4. [8]New value for localization quality type
             "conformance"
          5. [9]Regular expression change
          6. [10]Disambiguation and term
          7. [11]annotatorsRef
          8. [12]provenance record ordering
          9. [13]Test suite
         10. [14]prague f2f
         11. [15]xliff mapping implementation update (with David on
             the call)
         12. [16]metadata harvesting
      * [17]Summary of Action Items
      __________________________________________________________

roll call

    <fsasaki> checking attendance

    <fsasaki> scribe: daveL

    <fsasaki>
    [18]http://lists.w3.org/Archives/Public/public-multilingualweb-
    lt/2013Jan/0090.html

      [18] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0090.html

Meeting time

    <fsasaki> [19]http://www.doodle.com/pn6xa86rfbypmd2k

      [19] http://www.doodle.com/pn6xa86rfbypmd2k

    felix: there is no apparent slot that works. felix will
    distribute a weekly alternating proposal

state of XLIFF mapping

    <fsasaki> scribe: fsasaki

    dave: haven't updated the mapping page a lot
    ... there is more work to be done to formalize the mapping
    ... and come up with examples
    ... I think we want to focus on XLIFF 1.2 mapping first
    ... we were hoping that XLIFF 2 would be stable, but there is a
    delay
    ... focus on XLIFF 1.2 also helps with putting a demonstrator
    together

    yves: dave summarized everything right
    ... in okapi we implemented ITS mapping on what we have
    ... it is partially implemented, ongoing

    dave: we will come back shortly on that
    ... wrt interop between solas and CMS lion, also using okapi
    ... with the preparation for rome

    phil: it is now on our critical path for our implementation
    ... david said he would have a prototype a few weeks ago
    ... even if there is nothing final
    ... even if we would have a rough direction
    ... e.g. yves said that with xliff 1.2, he would use mrk markup
    ... even if we had directions what is easily acceptable
    ... otherwise it could hold up my implementation

    yves: the xliff 1.2 mapping is what we used for implementations
    ... most of the time it made sense
    ... we have tackled some of the standoff stuff
    ... it is also in the git repository (for okapi, scribe
    assumes)?

    <Yves_> yes

    phil: provenance and loc quality issue, rating are relevant for
    us here

    <Yves_> Location:
    [20]http://code.google.com/p/okapi/source/list?name=html5

      [20] http://code.google.com/p/okapi/source/list?name=html5

    phil: Yves' page for 1.2. we can certainly use that as our
    direction

    dave: will talk to david tomorrow about that

    phil: tx

New value for localization quality type "conformance"

    <daveL> scribe: daveL

    felix: asks if anyone has further thoughts on, or support for,
    this new type

Regular expression change

    felix: no responses yet

    shaun: no update on this

    <fsasaki> ACTION: shaun to work on regex for validating regex
    subset proposal [recorded in
    [21]http://www.w3.org/2013/01/16-mlw-lt-minutes.html#action02]

    <trackbot> Created ACTION-385 - Work on regex for validating
    regex subset proposal [on Shaun McCance - due 2013-01-23].

Disambiguation and term

    felix: has been discussed in response to christian's comment
    ... any further comments?

    marcis: what is the goal?

    felix: christian suggested merging term and disambig data
    categories
    ... but the response was that both have distinct use cases,
    which could be merged but are valid individually

    marcis: would not want to drop data category, term is easier to
    implement and purpose is clear
    ... not so clear on disambiguation category, in terms of what
    is possible to do with this
    ... for example there may be other types that might be useful
    in the disambiguation use case
    ... and doing term management with disambig would make it very
    heavy
    ... so there might need to be more attributes specifically for
    named entities
    ... referencing input from W3C India received today

    tadej: motivation for separate data category was because it
    covered some use cases that fell out of the scope of
    terminology
    ... by providing some additional context
    ... but do see that there is some commonality
    ... Also term must remain to keep compatibility with
    terminology in ITS1

    jörg: still in favour of having the two data categories
    ... since disambiguation can cover many other tasks in
    content or NLP processing
    ... whereas term is more specific

    pedro: the sort of text we mark up is different in both cases
    so it makes sense to keep the distinction

    tadej: agree granularities are quite limiting, or should we
    have more identifiers to support this?
    ... but this might be more complicated

    jörg: yes this would be more complicated, clearer as it is

    <fsasaki> [22]http://tinyurl.com/its20-testsuite-dashboard

      [22] http://tinyurl.com/its20-testsuite-dashboard

    felix: christian will dial in to f2f to discuss this and
    resolve the topic next week
    ... we also need to consider number of implementations, which
    are not so many, when considering any possible merger

    Des: agree with jörg, keep them separate as they are distinct
    use cases

    jörg: clarified, attributes as currently defined are clearer
    than making them more fine grained

    felix: reminds that the W3C process requires responding to
    comments, which involves some work

    <Yves_> could we talk about annotatorsRef
    [23]https://www.w3.org/International/multilingualweb/lt/track/i
    ssues/71 a bit during this call?

      [23] https://www.w3.org/International/multilingualweb/lt/track/issues/71

    felix: replying to a question from Dave: the current number of
    comments received is good

annotatorsRef

    yves: two data categories, provenance and loc quality issue,
    can have information from multiple annotators, but we have no
    way of expressing this with annotatorsRef
    ... for the current implementation, we assume the most recent
    annotator is the correct one, but this is not ideal
    ... provenance especially has multiple items and requires
    annotatorsRef

    <fsasaki> daveL: will look into this thread
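
    Illustration (not from the call): a minimal Python sketch of
    the limitation yves describes. It assumes the ITS 2.0 draft
    syntax in which an annotatorsRef value is a space-separated
    list of "data-category|IRI" pairs. The function names and the
    example.com IRIs are hypothetical. Because each data category
    maps to a single IRI, a later declaration for the same category
    replaces the earlier one, so two annotators of the same
    category cannot both be recorded.

```python
def parse_annotators_ref(value):
    """Parse 'category|IRI category|IRI ...' into a dict (assumed syntax)."""
    refs = {}
    for token in value.split():
        category, sep, iri = token.partition("|")
        if not sep or not category or not iri:
            raise ValueError(f"malformed annotatorsRef entry: {token!r}")
        # One IRI per data category: a repeated category overwrites.
        refs[category] = iri
    return refs


def effective_annotators(inherited, local):
    """Combine an inherited value with a local one; local wins per category."""
    merged = parse_annotators_ref(inherited)
    merged.update(parse_annotators_ref(local))
    return merged


# Hypothetical tool IRIs, for illustration only.
outer = "provenance|http://example.com/tool-a mt-confidence|http://example.com/mt"
inner = "provenance|http://example.com/tool-b"
result = effective_annotators(outer, inner)
print(result["provenance"])
```

    In this sketch the reference to tool-a is lost for provenance
    once tool-b is declared, which is the "most recent annotator
    wins" behaviour mentioned above.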

    <scribe> scribe: daveL

provenance record ordering

    phil: let's talk about the ordering of provenance

    <Yves_> provenance data category
    [24]https://www.w3.org/International/multilingualweb/lt/track/i
    ssues/72

      [24] https://www.w3.org/International/multilingualweb/lt/track/issues/72

    <fsasaki>
    [25]http://lists.w3.org/Archives/Public/public-multilingualweb-
    lt/2013Jan/0090.html

      [25] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0090.html

    <Arle_> I am back on the call.

    <fsasaki>
    [26]http://lists.w3.org/Archives/Public/public-multilingualweb-
    lt/2013Jan/0061.html

      [26] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0061.html

    <fsasaki>
    [27]http://lists.w3.org/Archives/Public/public-multilingualweb-
    lt/2013Jan/0066.html

      [27] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0066.html

    felix: this was a discussion of whether there was any
    implication between ordering and time of record

    <fsasaki>
    [28]http://lists.w3.org/Archives/Public/public-multilingualweb-
    lt/2013Jan/0055.html

      [28] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0055.html

    <fsasaki> (mails related to the discussion)

    phil: asks about the lack of a date stamp

    <fsasaki> daveL: a date stamp was discussed

    <fsasaki> .. there are two aspects:

    <fsasaki> .. a lot of original requirements didn't have a
    strong need for a time stamp

    <fsasaki> .. the original requirement was about identification
    rich enough so that we can differentiate

    <fsasaki> .. see e.g. "agent provenance" that used to include
    that

    <fsasaki> .. the 2nd aspect:

    <fsasaki> .. we discussed whether the order in which the
    provenance records are added is significant

    <fsasaki> .. but from an implementation point of view it is
    again complicated

    <fsasaki> .. and there hadn't been much of a call for this
    during requirements gathering

    <fsasaki> .. "time" also has various aspects: start of a
    translation, finish, duration, ...

    <fsasaki> .. it is also a point that the provenance wg in w3c
    had addressed

    <fsasaki> .. so we just provide identifiers of who made the
    translation and revision

    <fsasaki> .. for knowing more there is the provenance model

    <fsasaki> .. more = more about time

    <fsasaki> .. so in summary, there was no big requirement to
    have a time stamp

    <fsasaki> .. and *if* you want to do that, you can use the w3c
    prov model

    <fsasaki> .. I'll reply to that mail thread

    <fsasaki> pablo: I think provenance can stay as is

    <fsasaki> .. adding a time stamp can be useful and interesting -
    if every implementer is fine with that i'm fine too

    <scribe> scribe: daveL

    felix: adding a timestamp is a substantive change and would
    require another call, plus tests etc

Test suite

    <fsasaki>
    [29]http://lists.w3.org/Archives/Public/public-multilingualweb-
    lt/2013Jan/0090.html

      [29] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0090.html

    felix: from this week on, people should stop using the google
    docs and update the test suite master themselves

    <fsasaki>
    [30]http://lists.w3.org/Archives/Public/public-multilingualweb-
    lt/2012Dec/0087.html

      [30] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Dec/0087.html

    felix: we still need some input on tests related to assertions
    (MUSTs), which need suggestions for tests

prague f2f

    <fsasaki>
    [31]http://www.w3.org/International/multilingualweb/lt/wiki/Pra
    gueJan2013f2f

      [31] http://www.w3.org/International/multilingualweb/lt/wiki/PragueJan2013f2f

    <fsasaki>
    [32]http://www.w3.org/International/multilingualweb/lt/wiki/Pra
    gueJan2013f2f#Objectives

      [32] http://www.w3.org/International/multilingualweb/lt/wiki/PragueJan2013f2f#Objectives

    felix: thanks to jirka for organising this

    <fsasaki>
    [33]http://www.w3.org/International/multilingualweb/lt/wiki/Pra
    gueJan2013f2f#Participants

      [33] http://www.w3.org/International/multilingualweb/lt/wiki/PragueJan2013f2f#Participants

    jirka: if you are not yet registered, please do so asap. The
    number of people needs to be known for wifi etc.

    felix: also need to know in advance when people want to dial in
    for organising the agenda

    <fsasaki>
    [34]http://www.w3.org/International/multilingualweb/lt/wiki/Pra
    gueJan2013f2f#Objectives

      [34] http://www.w3.org/International/multilingualweb/lt/wiki/PragueJan2013f2f#Objectives

    felix: going through objectives

    <fsasaki>
    [35]http://www.w3.org/International/multilingualweb/lt/wiki/Use
    _cases_-_high_level_summary

      [35] http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary

    felix: in particular the relationship between the different
    posters and links to where people can access them and update
    high level summary, adding any new use cases

    <fsasaki> daveL: some time to discuss preparing EU project
    review?

    felix: also brainstorm on activities for rest of year and new
    projects and synergy between them
    ... the Rome preparation should cover that.

    <fsasaki> scribe: fsasaki

    <omstefanov> I will not be able to take part in the Prague
    f2f, but definitely intend to come to Rome, so please make
    sure preps for Rome are recorded in writing

xliff mapping implementation update (with David on the call)

    david: phil asked on that, we got good comments from xyz
    ... status of xliff mapping - only written piece is xliff
    mapping wiki

    <dF>
    [36]http://www.w3.org/International/multilingualweb/lt/wiki/XLI
    FF_Mapping

      [36] http://www.w3.org/International/multilingualweb/lt/wiki/XLIFF_Mapping

    david: will work on this today, yesterday / today was EC
    deadline
    ... we should publish this as a note / PC
    ... what is the editorial setup for such a note?
    ... we will need an additional namespace itsx

    felix: update on implementation prototype?

    david: solas is consuming ITS2 categories
    ... like OKAPI does
    ... that is being tested as part of the test suite
    ... that is consumed by various components of solas
    architecture
    ... one is an MT broker
    ... works with different MT systems
    ... depends on the MT systems whether they can deal with ITS
    metadata
    ... moravia is contributing to that
    ... m4loc can be used as middleware
    ... in our current prototype the mt service exposes the m4loc
    service
    ... from the deliverable - open source xliff roundtrip
    ... the okapi filter interprets the ITS decoration
    ... then the mapping in the wiki is used
    ... it is consumed by the open source middleware component

    felix: would be good to see a demo

    david: will do, in prague and in rome

metadata harvesting

    ankit: we are waiting for some sort of data from cocomore

    felix: what data?

    ankit: we said that cocomore would provide us with annotated
    data

    ankit will provide module by prague f2f

    pedro: will have annotated data from spanish client
    ... client is the spanish gov tax office
    ... they will annotate with ITS metadata for this show case
    ... spanish content in HTML5
    ... we will generate english content
    ... and annotate it in the output of the real time system

    felix: so ankit could later use the data to test the module?

    ankit: training data is as much as you can get

    pedro: annotated data from cocomore is html content
    ... we will generate content in chinese and french
    ... so ankit can take chinese, french, german into account
    in his system
    ... and spanish
    ... this will be german to english, german to french, german to
    chinese, german to spanish

    <Pedro> Showcase WP3 (Cocomore-Linguaserve) is German to
    Chinese and German to French

    <Clemens> right!

    <Pedro> Showcase WP4 (Linguaserve-Lucy-DCU) is the full demo
    Spanish to English, and partial demo Spanish to French and
    Spanish to German

    thanks to everybody for staying longer, meeting adjourned

Summary of Action Items

    [NEW] ACTION: shaun to work on regex for validating regex
    subset proposal [recorded in
    [37]http://www.w3.org/2013/01/16-mlw-lt-minutes.html#action02]

    [End of minutes]
      __________________________________________________________


     Minutes formatted by David Booth's [38]scribe.perl version
     1.137 ([39]CVS log)
     $Date: 2013-01-17 09:00:44 $

      [38] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
      [39] http://dev.w3.org/cvsweb/2002/scribe/
Received on Thursday, 17 January 2013 09:04:03 UTC