[Minutes] MLW-LT working group call 2012-08-09

... are at http://www.w3.org/2012/08/09-mlw-lt-minutes.html and below as
text. I changed some parts of the raw minutes; you can still find those at

http://www.w3.org/2012/08/09-mlw-lt-irc.html

Best,

Felix

   [1]W3C

      [1] http://www.w3.org/

                               - DRAFT -

                               MLW-LT WG

09 Aug 2012

   [2]Agenda

      [2] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0148.html

   See also: [3]IRC log

      [3] http://www.w3.org/2012/08/09-mlw-lt-irc

Attendees

   Present
          arle, davidF, dom, fsasaki, leroy, Yves, olaf, phil, des

   Regrets
          Pedro, Shaun, Milan, Raphael, Pablo, Giuseppe, Dave,
          tadej

   Chair
          David

   Scribe
          Arle, DomJones, fsasaki

Contents

     * [4]Topics
         1. [5]agenda review
         2. [6]quality discussion
         3. [7]issue-42
         4. [8]NIF_RDF roundtrip
         5. [9]test suite
         6. [10]mtConfidence
         7. [11]Seattle event
     * [12]Summary of Action Items
     __________________________________________________________

   <dF> Declan and Jan sent regrets on top of the regrets recorded
   on the agenda in advance.

   <dF> The Doodle-based regrets: Pedro, Shaun, Milan, Raphael,
   Pablo, Giuseppe, Dave. Additional regrets: Tadej

agenda review

   <dF> [13]http://www.w3.org/2012/08/02-mlw-lt-minutes.html

     [13] http://www.w3.org/2012/08/02-mlw-lt-minutes.html

   dF: Look at the minutes. Any issues, any objections to these?
   ... accept the minutes and move onto topic 1

quality discussion

   dF: Topic 1 Quality Discussion. Need to discuss issue 42 which
   is a more general issue. Progress of category, time frame etc?
   ... arle please report

   <Arle> Current draft of Quality is here:
   [14]http://dl.dropbox.com/u/223919/dfki/mlw-lt/locQuality.html

     [14] http://dl.dropbox.com/u/223919/dfki/mlw-lt/locQuality.html

   Arle: Just posted link to spec draft, still not in correct form
   but please use for reference. At this point need agreement on
   attributes listed in section 6.x.3. These need to be agreed
   upon, with the exception of ??
   ... all those in the top half of the table are agreed upon by
   Phil, me, Yves, etc.
   ... the second half needs people to commit to implementations.

   Felix: I think you can add me to the list of people who
   agree on the information here but not on whether they will
   become attributes

   Arle: useful distinction. Each "Attribute Name" represents
   pieces of information. Need to nail down and agree upon these.
   Felix, do we need to issue a call for consensus on this?

   Felix: No W3C process for this...

   dF: I think that the quality thing should be addressed in a
   structured way
   ... Arle is the owner of this; if consensus needs to be
   achieved, we should do this

   Felix: But what if a decision is later overruled? All we can do
   is structure the discussion and come back to consensus later.

   dF: Clearly every consensus can be overruled, but structuring a
   discussion ?

   fsasaki: Should start discussing the topic itself, not so much
   about the process

   dF: There is one action item action-168 which does not seem to
   have developed much... Arle can you comment?

   Arle: This has been ongoing, Yves has been active on this.
   Really the last piece of that was writing to ?cilgrave? about
   the XLIFF part.

   dF: Not many recorded emails on this.

   Arle: Lots of discussions going on elsewhere
   ... Very quickly, some info that we have agreement on: we
   started out with the idea of having two separate pieces to
   this. 1) What metric, process, or tool has generated the mark-up.
   This defines a QName with a prefix and a URI with more info
   ... think of it as a tool, metric, or process signature.
   ... 2) Loc quality score allows a process to provide a score
   relevant to a document (95, 32, etc.), applied at document level.
   Some at the moment are more inline, locQualityType, for example.
   ... these are designed for interoperability between tools.
   ... They allow common tagging between different tools.
   ... Loc quality codes allow mapping of implementation tools'
   codes to a common set as well as passing over the original code.
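
   The code-mapping idea Arle describes could be sketched roughly as
   follows. This is only an illustration: the tool codes, the shared
   type names, and the function to_common_annotation are hypothetical
   and are not taken from the draft.

```python
# Hypothetical mapping from one tool's issue codes to a shared set of
# issue types. Neither the codes nor the type names come from the draft.
COMMON_TYPE_BY_TOOL_CODE = {
    "SPELL-01": "misspelling",
    "TERM-07": "terminology",
}


def to_common_annotation(tool_code, profile):
    """Map a tool-specific code to a shared type, keeping the original code."""
    return {
        "profile": profile,  # identifies the tool, metric, or process
        "type": COMMON_TYPE_BY_TOOL_CODE.get(tool_code, "other"),
        "code": tool_code,   # original code is passed along unchanged
    }
```

   A consumer that only knows the shared types can read "type", while
   a consumer that knows the originating tool can still use "code".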

   Arle: These are the ones we have agreement upon; there are five
   there that we don't have agreement upon. I won't go through those
   but please look at the online document.

   dF: Can this be wrapped up in August? Can a cut be made on
   information pieces that have not made progress?

   Arle: I think so, these seem stable. I think we have consensus
   on them.

   dF: Are you prepared to cut those which are not mature enough?

   Arle: Yes. Except in the case of arguments and impl commitments
   from Phil, Yves, etc?

   dF: I would like to formalise this. Set an action to freeze
   number of information pieces. Would you be able to freeze the
   number by the next call, in a week??

   Felix: If you look at issue 42, some of these info pieces are
   the same across data categories... I'm not saying that we would
   disagree on them, but we may disagree on where they belong.

   Arle: That impacts the first two of these.. Whether they are
   here or move we need them. For all but first two (profile and
   score) we'll have a decision by next week?

   ACTION

   <fsasaki> ACTION: arle to freeze the number of information
   items in quality, with the reservation that some items might
   move to other areas [recorded in
   [15]http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action01]

   <trackbot> Created ACTION-192 - Freeze the number of information
   items in quality, with the reservation that some items might
   move to other areas [on Arle Lommel - due 2012-08-16].

   <Arle> scribe: Arle

issue-42

   Felix: I was looking at the proposals we currently have and in
   a number of categories we have data about what generated it and
   the confidence in that. Text analysis, mt confidence, and
   quality all have similar issues. People have to separate issues
   generated by multiple tools. Another common aspect between
   these categories is that these pieces of information are kind
   of general settings that inherit through the tree to where you
   need them, much like the language
   ... In our case, you might specify one tool, or, if needed,
   multiple tools used for creating annotations.
   ... There is one issue: in Quality, you identify the model, but
   in the others it is a tool.

   <fsasaki>
   [16]http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0149.html

     [16] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0149.html

   David: The common aspect is the state of inheritance, and that
   you may need to record multiple tools or models on the local
   level. How does the inheritance relate to global and local
   approaches?

   <DES> +q

   Felix: Global and local are just different ways to specify the
   metadata. But these are separate pieces of metadata. Once you
   have specified them (locally or globally) they inherit
   throughout the document.

   David: Like with translate and the way you can switch it on and
   off. So the issue really is to specify that these inherit,
   correct?

   Felix: I see this not as specific to these data categories, but
   rather as a separate data category. I'm not sure how you would
   describe the relationship from mtConfidence, quality, and text
   analysis to these. I don't yet know how it would work in
   detail.

   David: So you propose to introduce a generalized originator
   category. Isn't that like provenance?

   <fsasaki>
   [17]http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0149.html

     [17] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0149.html

   <fsasaki> lingProcInfo

   Felix: That's a good point. There is a clear relationship. I
   just pasted a link Christian Lieske supplied on this. It might
   be provenance or a subcategory of provenance. It is important
   for at least three categories, but maybe for others. This is so
   specific that I think maybe we need a specific mechanism.
   Provenance is really about more complex information related to
   provenance. This is more about identifying the process used to
   create something. I'd rather see
   ... E.g., pointing to the tool or process.

   David: In Dublin I wanted provenance to be independent. I see
   only two options: (1) subsume it in provenance; (2) specialize
   it in the categories in question. For example, if I use the
   LISA QA Model, is it relevant to anything but quality? I don't
   think it would be problematic to have these done in specialized
   categories.
   ... I think this would work better to modularize ITS. But if we
   make them orthogonal, we should put them in provenance.

   Felix: But if we specialize them, we run into the issue we see
   with quality that the ITS inheritance model.

   David: So are you saying that ITS inheritance is for the
   content only, not the metadata?

   Felix: If you want to apply multiple instances of the same data
   category to the same node, you cannot do it. You can't say that
   Tool A gives one value and Tool B gives another value for the
   same piece of content.
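
   The limitation Felix describes can be illustrated with a minimal
   sketch of nearest-ancestor inheritance. The element names, the
   "tool" attribute, and the function inherited_value are assumptions
   made for this illustration only; they are not ITS markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "tool" is an illustrative attribute, not ITS markup.
DOC = '<doc tool="toolA"><p>one</p><p tool="toolB"><span>two</span></p></doc>'


def inherited_value(root, node, attr):
    """Return attr from the nearest ancestor-or-self of node, or None."""
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    while node is not None:
        if attr in node.attrib:
            return node.attrib[attr]
        node = parent_of.get(node)
    return None


root = ET.fromstring(DOC)
first_p, second_p = root.findall("p")
span = second_p.find("span")
```

   Here the first paragraph resolves to the value set on the root,
   while the span resolves to the value set on its own parent, which
   replaces the root value. Because an attribute can appear only once
   on an element, a node resolves to a single value and cannot carry
   results from two tools at once.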

   David: So you mean that if there are comparable originators,
   you can't apply multiple ones, correct?

   Felix: Yes.

   David: This won't be an issue for mtConfidence, because you are
   generally working with a single candidate at a time. If you
   need more, you should look at XLIFF or something.
   ... If you are composing a document from multiple sources, the
   normal inheritance model would work.

   <fsasaki> scribe: fsasaki

   arle: for quality, the normal inheritance model fails

   <scribe> scribe: Arle

   David: Would it be OK to state that inheritance is cancelled
   when two comparable originators are used on the same node?

   Felix: We need to consider backward compatibility, and also the
   test suite, which has examples where inheritance deletes one
   piece of information. The test suite is just one example where
   this change would go against running implementations.

   Phil: We are talking about child elements inheriting the
   metadata from a parent?

   Felix: Yes. It is CSS-like inheritance.

   Phil: Would it be permitted to replicate certain parts of the
   document when you need to apply multiple pieces to the same
   content? It would be building a pseudo-parent around multiple
   builds.

   David: That would be out of scope for us.

   Yves: What we could do is have a span with an attribute that
   points to an external element. That is stand-off annotation
   that could contain several entries, not just one.
   ... The inheritance model works fine in the document itself.

   Felix: Yves is saying you have a pointer in the document to the
   list of alternatives. By using the stand-off list you can have
   all the annotations you want.
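
   A rough sketch of the stand-off idea, assuming a pointer attribute
   on the span and a separate list element holding several entries.
   The names "ref", "standoff", "entry", and the function
   standoff_entries are hypothetical; the group had not defined any
   markup for this at the time of the call.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a span points to a stand-off list by id.
DOC = """<doc>
  <p>Some <span ref="#a1">text</span></p>
  <standoff id="a1">
    <entry tool="toolA" value="0.9"/>
    <entry tool="toolB" value="0.4"/>
  </standoff>
</doc>"""


def standoff_entries(root, node):
    """Return (tool, value) pairs from the stand-off list node points to."""
    target_id = node.get("ref", "").lstrip("#")
    for holder in root.iter("standoff"):
        if holder.get("id") == target_id:
            return [(e.get("tool"), e.get("value"))
                    for e in holder.findall("entry")]
    return []


root = ET.fromstring(DOC)
span = root.find("./p/span")
```

   The content appears once in the document, and the list it points
   to can hold as many annotations as needed, one per tool.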

   David: You wouldn't duplicate the content, but you would have a
   list of applicable metadata. This is a mechanism to be used for
   when there is clashing inheritance?

   Felix: Arle and I discussed having a separate section in the
   HTML5 document that is not displayed where you put this
   information and then you ship around a single document.

   David: I think we should specify this mechanism in a separate
   discussion.

   Felix: I think this is related to Issue-37. I'll create an
   example.

   <scribe> ACTION: Felix to create an HTML5 example of the
   externalized markup within a single file. [recorded in
   [18]http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action02]

   <trackbot> Created ACTION-193 - Create an HTML5 example of the
   externalized markup within a single file. [on Felix Sasaki -
   due 2012-08-16].

   David: I think the high-level information is whether we keep
   the producer information in a specialized category, or whether
   we put it in provenance. I think we all agreed that in the case
   of clashing producers we have this other mechanism.

   Yves: It's not just about different producers, but also about
   cases where the same information is applied in multiple places.

   Felix: This is not producer-specific, but conflict-specific.

   David: The use case I am thinking of is about two different
   reviewers using the same quality model.

   <fsasaki> felix: or two different text analytics systems

   Phil: The general condition is that you want multiple pieces of
   metadata. Whether they conflict or not, you can accommodate
   both within a single node(?)

   <fsasaki>
   [19]http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0149.html

     [19] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0149.html

   David: From the point of view of MT confidence, I don't think
   we need this special mechanism.

   Felix: One other point (see pasted link). One reason for opening
   Issue-42 is the conflict discussion; another part of the issue is
   that we want to describe tool-specific data. Arle and I need to
   create a way to describe what generated the data.

   David: I think we use the same templated piece about
   inheritance.

   Des: I have a related issue. Quality score is normalized, but the
   agent isn't mandatory there, whereas the agent is mandatory for MT
   and text analytics. We need to be consistent across these.

   <fsasaki> +1 to des

   David: If you had to include multiple MT results, you have to
   replicate content, but text analytics can use multiple tools
   for one piece of content.

   <Yves_> +1 to des

   Felix: There are limits to harmonization, but let me make some
   examples.

   <philr> +1 to des

   David: Is anyone here to tell us anything.

   <fsasaki> ACTION: felix to work on issue-42, provide examples
   and template for various data categories [recorded in
   [20]http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action03]

   <trackbot> Created ACTION-194 - Work on issue-42, provide
   examples and template for various data categories [on Felix
   Sasaki - due 2012-08-16].

NIF_RDF roundtrip

   <fsasaki> [21]http://wiki.nlp2rdf.org/wiki/ITS2NIF2ITS

     [21] http://wiki.nlp2rdf.org/wiki/ITS2NIF2ITS

   <fsasaki> Sebastian chose DBpedia Spotlight (web site) here as
   an example

   Felix: Short update. Sebastian Hellmann did all the work, but
   see the wiki link I posted. It shows how to go from
   HTML/arbitrary XML to RDF in the NIF format. Various tools
   understand this format. One application scenario is to produce
   named entity annotation with the DBpedia Spotlight tool.

   The results can be integrated into the original XML. It
   provides a bridge to language-technology tools that use NIF. It
   does not impact the description of the data categories. I've
   started building a conversion. It will give us a nice bridge to
   other tooling.

test suite

   Dom: I'd like people to look at what we've done. I'm going to
   start looking at the output that tools might produce. So by the
   beginning of September we should have agreed upon input files
   and output formats and we can tie implementations against data
   categories for testing in Prague.
   ... We're happy with progress, but want others to take a look.

mtConfidence

   David: Yves pointed out some deficiencies. I will produce the
   next draft version. I won't touch the inheritance bit and would
   wait for Felix. But I think we only need normal inheritance
   here.

   <scribe> ACTION: dF to produce next draft of mtConfidence.
   [recorded in
   [22]http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action04]

   <trackbot> Created ACTION-195 - Produce next draft of
   mtConfidence. [on David Filip - due 2012-08-16].

Seattle event

   <dF> *Topic 6* > *Seattle event* >
   [23]http://www.localizationworld.com/lwseattle2012/feisgiltt/ >
   Felix's Action-191 >
   [24]https://www.w3.org/International/multilingualweb/lt/track/actions/191 >
   Please tweet and retweet the I18n blog entry >
   [25]http://www.w3.org/blog/International/2012/08/06/speaking-proposals-for-feisgillt-event-open-until-august-14-dont-delay/ >
   Please indicate your attendance on LinkedIn:
   [26]http://linkd.in/Q5Tq7B > Submit speaking and demo proposals
   by August

     [23] http://www.localizationworld.com/lwseattle2012/feisgiltt/
     [24] https://www.w3.org/International/multilingualweb/lt/track/actions/191
     [25] http://www.w3.org/blog/International/2012/08/06/speaking-proposals-for-feisgillt-event-open-until-august-14-dont-delay/
     [26] http://linkd.in/Q5Tq7B

   <fsasaki> please spread the word :) :) :)

   David: Please Tweet, build buzz, etc.

   <fsasaki> thanks to dF for making all this happen!

   David: Thanks to Felix for publishing blog entry, etc.
   ... I'll leave housekeeping topics for the next weeks.
   ... I think they are self-explanatory. No need to extend the
   meeting for now.

   <fsasaki>
   [27]http://lists.w3.org/Archives/Public/public-multilingualweb-lt-commits/

     [27] http://lists.w3.org/Archives/Public/public-multilingualweb-lt-commits/

   Felix: One final item. I've created a list at this URL that
   shows the commits to the W3C CVS. It shows you what changes the
   editors make.

   Meeting closed.

Summary of Action Items

   [NEW] ACTION: arle to freeze the number of information items in
   quality, with the reservation that some items might move to
   other areas [recorded in
   [28]http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action01]
   [NEW] ACTION: dF to produce next draft of mtConfidence.
   [recorded in
   [29]http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action04]
   [NEW] ACTION: Felix to create an HTML5 example of the
   externalized markup within a single file. [recorded in
   [30]http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action02]
   [NEW] ACTION: felix to work on issue-42, provide examples and
   template for various data categories [recorded in
   [31]http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action03]

   [End of minutes]
     __________________________________________________________


    Minutes formatted by David Booth's [32]scribe.perl version
    1.136 ([33]CVS log)
    $Date: 2012/08/09 15:27:19 $

     [32] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
     [33] http://dev.w3.org/cvsweb/2002/scribe/

Received on Thursday, 9 August 2012 15:30:57 UTC