- From: Felix Sasaki <fsasaki@w3.org>
- Date: Thu, 9 Aug 2012 17:30:20 +0200
- To: public-multilingualweb-lt@w3.org
- Message-ID: <CAL58czrUGnDm8-kAv96v9EuDs+ihCM-BE+6ifiaVGbM0U-PGuw@mail.gmail.com>
... are at http://www.w3.org/2012/08/09-mlw-lt-minutes.html and below as
text. I changed some parts of the raw minutes - you can still find these at
http://www.w3.org/2012/08/09-mlw-lt-irc.html
Best,
Felix
- DRAFT -
MLW-LT WG
09 Aug 2012
[2]Agenda
[2] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0148.html
See also: [3]IRC log
[3] http://www.w3.org/2012/08/09-mlw-lt-irc
Attendees
Present
arle, davidF, dom, fsasaki, leroy, Yves, olaf, phil, des
Regrets
Pedro, Shaun, Milan, Raphael, Pablo, Giuseppe, Dave,
tadej
Chair
David
Scribe
Arle, DomJones, fsasaki
Contents
* [4]Topics
1. [5]agenda review
2. [6]quality discussion
3. [7]issue-42
4. [8]NIF_RDF roundtrip
5. [9]test suite
6. [10]mtConfidence
7. [11]Seattle event
* [12]Summary of Action Items
__________________________________________________________
<dF> Declan and Jan sent regrets on top of the regrets recorded
on the agenda in advance.
<dF> The Doodle-based regrets: Pedro, Shaun, Milan, Raphael,
Pablo, Giuseppe, Dave. Additional regrets: Tadej
agenda review
<dF> [13]http://www.w3.org/2012/08/02-mlw-lt-minutes.html
[13] http://www.w3.org/2012/08/02-mlw-lt-minutes.html
dF: Look at the minutes. Any issues or objections to these?
... accept the minutes and move on to topic 1
quality discussion
dF: Topic 1, Quality Discussion. Need to discuss issue 42, which
is a more general issue. Progress of category, time frame, etc.?
... Arle, please report
<Arle> Current draft of Quality is here:
[14]http://dl.dropbox.com/u/223919/dfki/mlw-lt/locQuality.html
[14] http://dl.dropbox.com/u/223919/dfki/mlw-lt/locQuality.html
Arle: Just posted link to spec draft, still not in correct form
but please use for reference. At this point need agreement on
attributes listed in section 6.x.3. These need to be agreed
upon, with the exception of ??
... all those in the top half of the table are agreed upon by
Phil, me, Yves, etc.
... second half needs people to commit to implementations.
Felix: I think you can add me to the list of people who
agree on the information here but not on whether they will
become attributes
Arle: useful distinction. Each "Attribute Name" represents
pieces of information. Need to nail down and agree upon these.
Felix, do we need to issue a call for consensus on this?
Felix: No W3C process for this...
dF: I think that the quality thing should be addressed in a
structured way
... Arle is the owner of this, if consensus needs to be
achieved we should do this
Felix: But what if a decision is later overruled? All we can do
is structure the discussion and come back to consensus later.
dF: Clearly every consensus can be overruled, but structuring a
discussion ?
fsasaki: Should start discussing the topic itself, not so much
about the process
dF: There is one action item action-168 which does not seem to
have developed much... Arle can you comment?
Arle: This has been ongoing, Yves has been active on this.
Really the last piece of that was writing to ?cilgrave? about
the XLIFF part.
dF: Not many recorded emails on this.
Arle: Lots of discussions going on elsewhere
... v. quickly. Some info that we have agreement on - we
started out with the idea of having two separate pieces to
this: 1) What metric, process, or tool has generated the mark-up.
This defines a QName with prefix and URI with more info
... think of it as a tool, metric, or process signature.
... 2) Loc quality score, which allows a process to provide a
score relevant to a document. 95, 32, etc., applied at document
level. Some at the moment are more inline, locQualityType, for
example.
... these are designed for interoperability between tools.
... Allows common tagging between different tools.
... Loc quality codes - allow mapping of implementation tools
to a common set as well as passing over the original code.
Arle: These are the ones we have agreement upon, there are five
there that we don't have agreement upon. I won't go through those
but please look at the online document.
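A minimal sketch of the information pieces Arle describes, for readers who want something concrete. It is not taken from the draft: the class and field names, the example prefix and URI, and the idea of joining prefix and original code into one string are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QualityProfile:
    """Signature of the tool, metric, or process that produced the markup."""
    prefix: str  # short prefix for qualified names (hypothetical value below)
    uri: str     # where more information about the profile can be found


@dataclass
class QualityIssue:
    """One inline annotation: a common type plus the tool's own code."""
    profile: QualityProfile
    common_type: str              # value from a shared, tool-independent set
    original_code: Optional[str]  # the producing tool's code, passed through

    def qualified_code(self) -> Optional[str]:
        # Assumption: prefix and original code are joined QName-style.
        if self.original_code is None:
            return None
        return f"{self.profile.prefix}:{self.original_code}"


profile = QualityProfile(prefix="toolx", uri="http://example.org/toolx-profile")
issue = QualityIssue(profile, common_type="terminology", original_code="T-17")
document_score = 95  # document-level score, as in the "95, 32" remark
print(issue.qualified_code())
```

The split mirrors the discussion: the profile and score apply to the document as a whole, while type and code attach to individual spans.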
dF: Can this be wrapped up in August? Can a cut be made on
information pieces that have not made progress?
Arle: I think so, these seem stable. I think we have consensus
on them.
dF: Are you prepared to cut those which are not mature enough?
Arle: Yes. Except in the case of arguments and impl commitments
from Phil, Yves, etc?
dF: I would like to formalise this. Set an action to freeze
number of information pieces. Would you be able to freeze the
number by the next call, in a week??
Felix: If you look at issue 42 some of these info pieces are
the same across data categories... I'm not saying that we would
disagree, but we may disagree about where they belong.
Arle: That impacts the first two of these.. Whether they are
here or move we need them. For all but first two (profile and
score) we'll have a decision by next week?
ACTION
<fsasaki> ACTION: arle to freeze the number of information
items in quality, with the reservation that some items might
move to other areas [recorded in
[15]http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action01]
<trackbot> Created ACTION-192 - Freeze the number of information
items in quality, with the reservation that some items might
move to other areas [on Arle Lommel - due 2012-08-16].
<Arle> scribe: Arle
issue-42
Felix: I was looking at the proposals we currently have and in
a number of categories we have data about what generated it and
the confidence in that. Text analysis, mt confidence, and
quality all have similar issues. People have to separate issues
generated by multiple tools. Another common aspect between
these categories is that these pieces of information are kind
of general settings that inherit through the tree to where you
need them, much like the language
... In our case, you might specify one tool, or, if needed,
multiple tools used for creating annotations.
... There is one issue: in Quality, you identify the model, but
in the others it is a tool.
<fsasaki>
[16]http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0149.html
[16] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0149.html
David: The common aspect is the state of inheritance, and that
you may need to record multiple tools or models on the local
level. How does the inheritance relate to global and local
approaches?
<DES> +q
Felix: Global and local are just different ways to specify the
metadata. But these are separate pieces of metadata. Once you
have specified them (locally or globally) they inherit
throughout the document.
David: Like with translate and the way it can be switched on and
off. So the issue really is to specify that these inherit,
correct?
Felix: I see this not as specific to these data categories, but
rather as a separate data category. I'm not sure how you would
describe the relationship from mtConfidence, quality, and text
analysis to these. I don't yet know how it would work in
detail.
David: So you propose to introduce a generalized originator
category. Isn't that like provenance?
<fsasaki>
[17]http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0149.html
[17] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0149.html
<fsasaki> lingProcInfo
Felix: That's a good point. There is a clear relationship. I
just pasted a link Christian Lieske supplied on this. It might
be provenance or a subcategory of provenance. It is important
for at least three categories, but maybe for others. This is so
specific that I think maybe we need a specific mechanism.
Provenance is really about more complex information related to
provenance. This is more about identifying the process used to
create something. I'd rather see
... E.g., pointing to the tool or process.
David: In Dublin I wanted provenance to be independent. I see
only two options: (1) subsume it in provenance; (2) specialize
it in the categories in question. For example, if I use the
LISA QA Model, is it relevant to anything but quality? I don't
think it would be problematic to have these done in specialized
categories.
... I think this would work better to modularize ITS. But if we
make them orthogonal, we should put them in provenance.
Felix: But if we specialize them, we run into the issue we see
with quality and the ITS inheritance model.
David: So are you saying that ITS inheritance is for the
content only, not the metadata?
Felix: If you want to apply multiple instances of the same data
category to the same node, you cannot do it. You can't say that
Tool A gives one value and Tool B gives another value for the
same piece of content.
David: So you mean that if there are comparable originators,
you can't apply multiple ones, correct?
Felix: Yes.
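The limitation Felix describes can be sketched as follows. This is only an illustration of single-valued, CSS-like inheritance under stated assumptions: the `tool` attribute and the sample document are hypothetical and are not ITS markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: "tool" stands in for any inherited piece of metadata.
DOC = """
<doc tool="A">
  <p>first</p>
  <p tool="B">second <em>nested</em></p>
</doc>
"""


def resolve(element, inherited=None, out=None):
    """Return (tag, effective tool) pairs in document order."""
    if out is None:
        out = []
    # A local value replaces the inherited one; it is never added to it.
    effective = element.get("tool", inherited)
    out.append((element.tag, effective))
    for child in element:
        resolve(child, effective, out)
    return out


resolved = resolve(ET.fromstring(DOC))
for tag, tool in resolved:
    print(tag, tool)
```

Each node ends up with exactly one value, so there is no slot in which to record that Tool A and Tool B both annotated the same content.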
David: This won't be an issue for mtConfidence, because you are
generally working with a single candidate at a time. If you
need more, you should look at XLIFF or something.
... If you are composing a document from multiple sources, the
normal inheritance model would work.
<fsasaki> scribe: fsasaki
arle: for quality, the normal inheritance model fails
<scribe> scribe: Arle
David: Would it be OK to state that inheritance is cancelled
when two comparable originators are used on the same node?
Felix: We need to consider backward compatibility, and also the
test suite, which has examples where inheritance deletes one
piece of information. The test suite is just one example where
this change would go against running implementations.
Phil: We are talking about child elements inheriting the
metadata from a parent?
Felix: Yes. It is CSS-like inheritance.
Phil: Would it be permitted to replicate certain parts of the
document when you need to apply multiple pieces to the same
content? It would be building a pseudo-parent around multiple
builds.
David: That would be out of scope for us.
Yves: What we could do is have a span with an attribute that
points to an external element. That is stand-off annotation
that could contain several entries, not just one.
... The inheritance model works fine in the document itself.
Felix: Yves is saying you have a pointer in the document to the
list of alternatives. By using the stand-off list you can have
all the annotations you want.
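A rough sketch of the stand-off idea Yves and Felix describe, assuming a span that points to a list held elsewhere in the same file. The element and attribute names (`annotationsRef`, `annotations`, `entry`, `tool`, `value`) are invented for this illustration; the group had not agreed on any markup at this point.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the span points to a stand-off list in the same file.
DOC = """
<doc>
  <p>Some <span annotationsRef="#a1">content</span> here.</p>
  <annotations id="a1">
    <entry tool="A" value="0.7"/>
    <entry tool="B" value="0.4"/>
  </annotations>
</doc>
"""


def standoff_entries(root):
    """Map each referencing span's text to its list of (tool, value) entries."""
    lists = {el.get("id"): el for el in root.iter("annotations")}
    result = {}
    for span in root.iter("span"):
        ref = span.get("annotationsRef")
        if ref is None:
            continue
        target = lists.get(ref.lstrip("#"))
        if target is None:
            continue  # dangling pointer: nothing to attach
        result[span.text] = [
            (entry.get("tool"), float(entry.get("value"))) for entry in target
        ]
    return result


entries = standoff_entries(ET.fromstring(DOC))
print(entries)
```

The inline content carries a single pointer, so inheritance in the document itself is unaffected, while the list can hold as many entries as needed.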
David: You wouldn't duplicate the content, but you would have a
list of applicable metadata. This is a mechanism to be used for
when there is clashing inheritance?
Felix: Arle and I discussed having a separate section in the
HTML5 document that is not displayed where you put this
information and then you ship around a single document.
David: I think we should specify this mechanism in a separate
discussion.
Felix: I think this is related to Issue-37. I'll create an
example.
<scribe> ACTION: Felix to create an HTML5 example of the
externalized markup within a single file. [recorded in
[18]http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action02]
<trackbot> Created ACTION-193 - Create an HTML5 example of the
externalized markup within a single file. [on Felix Sasaki -
due 2012-08-16].
David: I think the high-level information is whether we keep
the producer information in a specialized category, or whether
we put it in provenance. I think we all agreed that in the case
of clashing producers we have this other mechanism.
Yves: It's not just about different producers, but also about
cases where the same information is applied in multiple places.
Felix: This is not producer-specific, but conflict-specific.
David: The use case I am thinking of is about two different
reviewers using the same quality model.
<fsasaki> felix: or two different text analytics systems
Phil: The general condition is that you want multiple pieces of
metadata. Whether they conflict or not, you can accommodate
both within a single node(?)
<fsasaki>
[19]http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0149.html
[19] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0149.html
David: From the point of view of MT confidence, I don't think
we need this special mechanism.
Felix: One other point (see pasted link). One part for opening
Issue-42 is the conflict discussion, part of the issue is that
we want to describe tool-specific data. Arle and I need to
create a way to describe what generated the data.
David: I think we use the same templated piece about
inheritance.
Des: I have a related issue. Quality score is normalized, but
agent isn't mandatory, but agent is mandatory for MT and text
analytics. We need to be consistent across these.
<fsasaki> +1 to des
David: If you had to include multiple MT results, you have to
replicate content, but text analytics can use multiple tools
for one piece of content.
<Yves_> +1 to des
Felix: There are limits to harmonization, but let me make some
examples.
<philr> +1 to des
David: Is anyone here to tell us anything.
<fsasaki> ACTION: felix to work on issue-42, provide examples
and template for various data categories [recorded in
[20]http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action03]
<trackbot> Created ACTION-194 - Work on issue-42, provide
examples and template for various data categories [on Felix
Sasaki - due 2012-08-16].
NIF_RDF roundtrip
<fsasaki> [21]http://wiki.nlp2rdf.org/wiki/ITS2NIF2ITS
[21] http://wiki.nlp2rdf.org/wiki/ITS2NIF2ITS
<fsasaki> Sebastian chose DBpedia Spotlight (web site) here as
an example
Felix: Short update. Sebastian Hellmann did all the work, but
see the wiki link I posted. It shows how to go from
HTML/arbitrary XML to RDF in the NIF format. Various tools
understand this format. One application scenario is to produce
named entity annotation with the DBpedia Spotlight tool.
The results can be integrated into the original XML. It
provides a bridge to language-technology tools that use NIF. It
does not impact the description of the data categories. I've
started building a conversion. It will give us a nice bridge to
other tooling.
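One ingredient of such a roundtrip can be sketched as below: extracting plain text from XML while remembering which element each character range came from, so that a result computed on the text can be placed back into the markup. This is not the ITS2NIF2ITS conversion and produces no RDF; the sample document and function names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

DOC = "<p>Visit <b>Dublin</b> in August.</p>"


def text_with_offsets(root):
    """Return the plain text plus (start, end, tag) for each text run."""
    parts = []
    spans = []
    pos = 0

    def walk(el):
        nonlocal pos
        if el.text:
            spans.append((pos, pos + len(el.text), el.tag))
            parts.append(el.text)
            pos += len(el.text)
        for child in el:
            walk(child)
            if child.tail:
                # Tail text follows the child but belongs to the parent.
                spans.append((pos, pos + len(child.tail), el.tag))
                parts.append(child.tail)
                pos += len(child.tail)

    walk(root)
    return "".join(parts), spans


def locate(spans, start, end):
    """Find the tag of the text run that fully contains [start, end)."""
    for run_start, run_end, tag in spans:
        if run_start <= start and end <= run_end:
            return tag
    return None


text, spans = text_with_offsets(ET.fromstring(DOC))
start = text.index("Dublin")
print(text, locate(spans, start, start + len("Dublin")))
```

An entity reported by an external tool as a character range in the plain text can then be traced back to the element that contains it.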
test suite
Dom: I'd like people to look at what we've done. I'm going to
start looking at the output that tools might produce. So by the
beginning of September we should have agreed upon input files
and output formats and we can tie implementations against data
categories for testing in Prague.
... We're happy with progress, but want others to take a look.
mtConfidence
David: Yves pointed out some deficiencies. I will produce the
next draft version. I won't touch the inheritance bit and would
wait for Felix. But I think we only need normal inheritance
here.
<scribe> ACTION: dF to produce next draft of mtConfidence.
[recorded in
[22]http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action04]
<trackbot> Created ACTION-195 - Produce next draft of
mtConfidence. [on David Filip - due 2012-08-16].
Seattle event
<dF> Topic 6: Seattle event
[23]http://www.localizationworld.com/lwseattle2012/feisgiltt/
Felix's Action-191:
[24]https://www.w3.org/International/multilingualweb/lt/track/actions/191
Please tweet and retweet the I18n blog entry:
[25]http://www.w3.org/blog/International/2012/08/06/speaking-proposals-for-feisgillt-event-open-until-august-14-dont-delay/
Please indicate your attendance on LinkedIn:
[26]http://linkd.in/Q5Tq7B
Submit speaking and demo proposals by August
[23] http://www.localizationworld.com/lwseattle2012/feisgiltt/
[24] https://www.w3.org/International/multilingualweb/lt/track/actions/191
[25] http://www.w3.org/blog/International/2012/08/06/speaking-proposals-for-feisgillt-event-open-until-august-14-dont-delay/
[26] http://linkd.in/Q5Tq7B
<fsasaki> please spread the word :) :) :)
David: Please Tweet, build buzz, etc.
<fsasaki> thanks to dF for making all this happen!
David: Thanks to Felix for publishing blog entry, etc.
... I'll leave housekeeping topics for the next weeks.
... I think they are self-explanatory. No need to extend the
meeting for now.
<fsasaki>
[27]http://lists.w3.org/Archives/Public/public-multilingualweb-lt-commits/
[27] http://lists.w3.org/Archives/Public/public-multilingualweb-lt-commits/
Felix: One final item. I've created a list at this URL that
shows the commits to the W3C CVS. It shows you what changes the
editors make.
Meeting closed.
Summary of Action Items
[NEW] ACTION: arle to freeze the number of information items in
quality, with the reservation that some items might move to
other areas [recorded in
[28]http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action01]
[NEW] ACTION: dF to produce next draft of mtConfidence.
[recorded in
[29]http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action04]
[NEW] ACTION: Felix to create an HTML5 example of the
externalized markup within a single file. [recorded in
[30]http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action02]
[NEW] ACTION: felix to work on issue-42, provide examples and
template for various data categories [recorded in
[31]http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action03]
[End of minutes]
__________________________________________________________
Minutes formatted by David Booth's [32]scribe.perl version
1.136 ([33]CVS log)
$Date: 2012/08/09 15:27:19 $
[32] http://dev.w3.org/cvsweb/~checkout~/2002/scribe/scribedoc.htm
[33] http://dev.w3.org/cvsweb/2002/scribe/
Received on Thursday, 9 August 2012 15:30:57 UTC