- From: Felix Sasaki <fsasaki@w3.org>
- Date: Fri, 2 Nov 2012 00:05:06 +0100
- To: public-multilingualweb-lt@w3.org
- Cc: Arle Lommel <arle.lommel@dfki.de>, Phil Ritchie <philr@vistatec.ie>
- Message-ID: <CAL58czpFhcGYjYubq+MoMYMZySNFvLybGSiuyAJUtOLwEAK_4g@mail.gmail.com>
Hi all,
minutes of today are at
http://www.w3.org/2012/11/01-mlw-lt-minutes.html
and below as text. Slides for the HTML WG meeting are at
http://www.w3.org/International/multilingualweb/lt/wiki/File:Its20-html5-tpac2012.pdf
The idea is that I present the slides, Frederik makes the demo, then we
discuss for about 15 min.
The Monday call
http://www.w3.org/International/multilingualweb/lt/wiki/Main_Page#Upcoming
will be 3 p.m. UTC, which is 4 p.m. in central Europe, see
http://www.timeanddate.com/worldclock/fixedtime.html?iso=20121105T15
Arle, Phil, we had some discussion about the need for its-* attributes to
mimic XML standoff markup, and about localization precis; so it would be
quite helpful to have you on the call to discuss this (hoping you will
recover soon Arle, of course).
Best,
Felix
[1]W3C
[1] http://www.w3.org/
- DRAFT -
MLW-LT Lyon f2f
01 Nov 2012
[2]Agenda
[2] http://www.w3.org/International/multilingualweb/lt/wiki/LyonNov2012#Thursday_1st_Nov:_MLW-LT_WG_meeting_agenda
See also: [3]IRC log
[3] http://www.w3.org/2012/11/01-mlw-lt-irc
Attendees
Present
Ankit Bert(partially) Dave David Dom Felix Fredrik
JonasJacek Karl Leroy Mārcis Matthias Milan Moritz
Naoto(partially) Pablo SebastianSk Tadej(remote)
Yves(remote) Clemens jirka matthiasK mauricio mhellwig
pedro pablo renatb
Regrets
Chair
felix
Scribe
clemens, Yves_, daveL, fsasaki, Dom
Contents
* [4]Topics
1. [5]http://www.w3.org/International/multilingualweb/lt/wiki/LyonNov2012#Thursday_1st_Nov:_MLW-LT_WG_meeting_agenda
2. [6]self intro
3. [7]review implementation issues
4. [8]test suite
5. [9]CMS-TMS demo by linguaserv and cocomore
6. [10]demo of online MT system
* [11]Summary of Action Items
__________________________________________________________
<fsasaki> ACTION: felix to follow-up on discussion about
regular expression for allowed characters, see
[12]http://www.w3.org/2012/10/31-mlw-minutes.html#item02
[recorded in
[13]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action01]
[12] http://www.w3.org/2012/10/31-mlw-minutes.html#item02
<trackbot> Created ACTION-261 - Follow-up on discussion about
regular expression for allowed characters, see
[14]http://www.w3.org/2012/10/31-mlw-minutes.html#item02 on
Felix Sasaki - due 2012-11-08].
[14] http://www.w3.org/2012/10/31-mlw-minutes.html#item02
[15]http://www.w3.org/International/multilingualweb/lt/wiki/LyonNov2012#Thursday_1st_Nov:_MLW-LT_WG_meeting_agenda
[15] http://www.w3.org/International/multilingualweb/lt/wiki/LyonNov2012#Thursday_1st_Nov:_MLW-LT_WG_meeting_agenda
Agenda accepted
self intro
<fsasaki> people doing self-intro
review implementation issues
<fsasaki>
[16]http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#domain-implementation
[16] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#domain-implementation
<Yves_> changes for domain are: removing quotes and duplicates
<fsasaki> Need for a common representation of the ITS data
categories in XLIFF, see XLIFF_Mapping
felix: domain issue is "fine"
<Yves_> ITS to XLIFF working draft mapping:
[17]http://www.w3.org/International/multilingualweb/lt/wiki/XLIFF_Mapping
[17] http://www.w3.org/International/multilingualweb/lt/wiki/XLIFF_Mapping
Implementors are waiting for the XLIFF mapping
XLIFF
current direction is that pointers are not needed in XLIFF
<fsasaki> "mrk", "sm" and "em" elements in XLIFF. "mrk" and
"sm" would be extended
<fsasaki> XML Schema subset of regex for allowed characters
<fsasaki>
[18]http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#allowedchars-implementation
[18] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#allowedchars-implementation
<Yves_> the issue is that not all programming languages support
all XML Schema regex patterns
<fsasaki>
[19]http://www.w3.org/2012/10/31-mlw-minutes.html#item02
[19] http://www.w3.org/2012/10/31-mlw-minutes.html#item02
we keep the draft as it is now and postpone the topic of
regular expression in allowed characters
<fsasaki> Allowed Characters regular expression for not
allowing HTML tags in content nodes where only plain text
content is allowed
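The allowed characters use case above (no HTML tags where only plain text is allowed) can be sketched as follows. This is a minimal illustration, not the group's agreed approach: the pattern "[^<>]" and the function name are assumptions, and Python's re module stands in for an XML Schema regex engine, which is exactly the portability concern raised by Yves. Simple character classes like this one behave the same in both.

```python
import re

def disallowed_characters(text: str, allowed: str) -> list[str]:
    """Return the characters of `text` that the single-character
    pattern `allowed` does not match."""
    one_char = re.compile(allowed)
    return [ch for ch in text if one_char.fullmatch(ch) is None]

# Hypothetical pattern: any character except angle brackets,
# i.e. no HTML tags in a plain-text content node.
NO_MARKUP = "[^<>]"

print(disallowed_characters("plain text", NO_MARKUP))   # []
print(disallowed_characters("<b>bold</b>", NO_MARKUP))  # ['<', '>', '<', '>']
```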
<fsasaki> ACTION: pedro to write note about allowed characters
issue with help from Mauricio and karl - due 8. November
[recorded in
[20]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action02]
<trackbot> Created ACTION-262 - write note about allowed
characters issue with help from Mauricio and karl [on Pedro
Luis Díez Orzas - due 2012-11-08].
<fsasaki> "Need to add all the document HTML tags (wrap the
content with html, head, body tags) so we can add a link to a
global rules XML "
<fsasaki>
[21]http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Nov/0005.html
[21] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Nov/0005.html
<scribe> ACTION: implying some its:rules in html tags [recorded
in
[22]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action03]
<trackbot> Sorry, couldn't find implying. You can review and
register nicknames at
<[23]http://www.w3.org/International/multilingualweb/lt/track/users>.
[23] http://www.w3.org/International/multilingualweb/lt/track/users
<fsasaki>
[24]http://www.w3.org/TR/xml-i18n-bp/#relating-its-plus-xhtml
[24] http://www.w3.org/TR/xml-i18n-bp/#relating-its-plus-xhtml
<fsasaki> 1) input 18_8.xml
<fsasaki> 2) XML parsing of HTML fragment e.g. via validator.nu
library in 18_8.xml
<fsasaki> output of step 2): DOM or XML serialization
<fsasaki> 3) normal ITS processing
<fsasaki> 0) input is "some HTML data"
<fsasaki> next step: step 2)
<fsasaki> next step: step 3)
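The steps listed above (HTML data in, XML-style tree out, then normal ITS processing) can be sketched roughly as below. This is only an illustration under assumptions: Python's standard html.parser stands in for the validator.nu library mentioned in the discussion and is not a conforming HTML5 parser, the "body" wrapper element and the helper names are made up, and "ITS processing" is reduced to collecting text that the HTML translate attribute does not exclude.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

VOID = {"br", "hr", "img", "input", "link", "meta"}

class FragmentToTree(HTMLParser):
    """Step 2: turn an HTML fragment into an ElementTree."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("body")   # wrapper for the fragment
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        el = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(el)

    def handle_endtag(self, tag):
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):                  # text after a child goes to its tail
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data

def parse_fragment(html: str) -> ET.Element:
    parser = FragmentToTree()
    parser.feed(html)
    parser.close()
    return parser.root

def translatable_text(el: ET.Element, inherited: bool = True) -> list[str]:
    """Step 3 (simplified): collect text not excluded by translate="no"."""
    flag = {"yes": True, "no": False}.get(el.get("translate"), inherited)
    out = []
    if flag and el.text and el.text.strip():
        out.append(el.text.strip())
    for child in el:
        out.extend(translatable_text(child, flag))
        if flag and child.tail and child.tail.strip():
            out.append(child.tail.strip())
    return out

tree = parse_fragment('<p>Hello <span translate="no">W3C</span> world</p>')
print(ET.tostring(tree, encoding="unicode"))
print(translatable_text(tree))   # ['Hello', 'world']
```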
Coffee break now
<Yves_> scribe: Yves_
mauricio: one possibility for solving this could be to change
the HTML content to XHTML
dF: that would allow local ITS markup
Felix: so no change for the data category
... maybe some note on best practices?
dF: one problem Karl noted is the validator inside Drupal
Jirka: I can try to look at it and see if it could be fixed
Felix: but do we need guidance for CMS in general?
dF: maybe in a non-normative section
Moritz: yes a best practice
<fsasaki> ACTION: david to summarize the options and the
recommendations related to HTML parsing workflow in the CMS,
see discussion at
[25]http://www.w3.org/2012/11/01-mlw-lt-irc#T10-13-41 recorded
in
[26]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action04]
[25] http://www.w3.org/2012/11/01-mlw-lt-irc#T10-13-41
<trackbot> Sorry, ambiguous username (more than one match) -
david
<trackbot> Try using a different identifier, such as family
name or username (eg. dlewis6, dfilip)
<fsasaki> ACTION: dfilip to summarize the options and the
recommendations related to HTML parsing workflow in the CMS,
see discussion at
[27]http://www.w3.org/2012/11/01-mlw-lt-irc#T10-13-41 recorded
in
[28]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action05]
[27] http://www.w3.org/2012/11/01-mlw-lt-irc#T10-13-41
<trackbot> Created ACTION-263 - Summarize the options and the
recommendations related to HTML parsing workflow in the CMS,
see discussion at
[29]http://www.w3.org/2012/11/01-mlw-lt-irc#T10-13-41 on David
Filip - due 2012-11-08].
[29] http://www.w3.org/2012/11/01-mlw-lt-irc#T10-13-41
Felix: so maybe a best practice for CMS
<fsasaki> "Troubles with namespaces in HTML5. "
<pnietoca> xmlns:h="[30]http://www.w3.org/1999/xhtml"
[30] http://www.w3.org/1999/xhtml
Pedro: Jirka noted we need the namespace definition
<pnietoca>
[31]http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Oct/0226.html
[31] http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Oct/0226.html
Jirka: the namespace declaration is missing
... probably an implementation issue
... the node in the DOM should list the namespaces
... I can help if needed
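To illustrate why the namespace declaration matters for selectors, here is a small sketch using Python's xml.etree.ElementTree. The sample document is made up; only the prefix binding xmlns:h="http://www.w3.org/1999/xhtml" comes from the discussion. Elements in the XHTML namespace are not matched by an unprefixed path.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
doc = ET.fromstring(
    f'<html xmlns="{XHTML}"><body><p>One</p><p>Two</p></body></html>'
)

# Without a namespace binding the selector matches nothing.
print(len(doc.findall(".//p")))                    # 0
# With the prefix "h" bound to the XHTML namespace it matches both paragraphs.
print(len(doc.findall(".//h:p", {"h": XHTML})))    # 2
```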
Richard Ishida comes to discuss logistics/topics with HTML5 WG
scribe: and I18N WG
<fsasaki> tomorrow 11 a.m. meeting with i18n wg
back to implementation issues
scribe: HTML5 namespace issue resolved it seems
<fsasaki> "Need to come to an agreement to map domain values to
be consistent for both Lucy and DCU's MT Systems. "
Felix: important for the showcase that the two systems
communicate
<fsasaki> ACTION: thomas to follow up on domain list topic with
DCU [recorded in
[32]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action06]
<trackbot> Created ACTION-264 - Follow up on domain list topic
with DCU [on Thomas Rüdesheim - due 2012-11-08].
<fsasaki> "Problems to use global rules with the provenance and
quality metadata since ATLAS PW1 cannot place files on the
client server. "
Pedro: our system doesn't allow creating a rules file on the
client
... rules file then points to our server, but that's not the
right way to do it
... also will we drop global rules?
David: important question for some data categories like
provenance, etc.
... some idea is to exclude some pointers
... but some use cases show some non-pointer global rules are
useful
... other issue is how to address attributes
... translatable attributes are not best practice,
... so maybe it's ok to drop global rules despite those use
cases
Felix: what about the quality issue
dF: well some attributes are not going away, like in HTML5
... no way to mark up things like alt or title
Felix: two types of implementers: some using XLIFF extraction,
others work directly with original data
... for XLIFF case this is not an issue
... for original data this is a problem
David: we should be consistent, if an attribute can be
translated, it should be able to get other data categories too
... an approach maybe could be to use a local attribute?
Felix: two topics one is where to put the global rules, the
other is do we need global rules
... with local, you can't address attributes
Tadej: so global rules in script
... ok with that solution
... also concern that a rule should not have pointer and
values at the same time
... basically not doing proxy stand-off annotation
Felix: yes, should we drop this statement?
... one could have combination of both in some cases
Tadej: this would cause clashes in some cases
... not sure if this affects stand-off
Felix: stand-off is different
... the URI points to a piece of info, so it still "adds"
information
Tadej: so the goal is to keep all info in one place rather than
scatter it around
Felix: If we keep the constraint what do we do with cases like
Quality Issue
David: is XLIFF the only use case for pointer in Quality issue
<fsasaki> yves: pointer for the reference attribute to standoff
markup is needed
<fsasaki> .. at least in the XLIFF case
<fsasaki> felix: would that be resolved by a mapping table?
<fsasaki> yves: would not make the stuff processable by an ITS
processor
<fsasaki> yves: need for pointer would be just for reference
attribute, not the other pointers
David: if XLIFF is the only use case, should we make it a
special case?
<fsasaki> yves: can imagine an "Its only" processor to deal
with quality issues
<fsasaki> .. they can have special information for ITS
<fsasaki> .. but big question is: do we want to allow that to
happen
<fsasaki> .. that is have other formats to work without XLIFF
markup
<fsasaki> .. if other formats don't allow the ITS native
mapping attribute, what to do?
<fsasaki> .. if XLIFF mrk allows extension, the problem goes
away
Felix: difficult to discuss because we depend on XLIFF
decision
<fsasaki> yves: we need to wait for next week XLIFF meeting, if
is not resolved, we need to find a different way
<fsasaki> ACTION: daveL to react to decision on extensibility
for mrk in XLIFF and the result for our "pointer" attributes
[recorded in
[33]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action07]
<trackbot> Created ACTION-265 - React to decision on
extensibility for mrk in XLIFF and the result for our "pointer"
attributes [on David Lewis - due 2012-11-08].
Felix: Example 54 and 55
... RDFa
... .. ITS generic processor can then work with RDFa
... Karl provided feedback
... is this a use case for Pointer
Tadej: the example has some small issues
<fsasaki> "typeof=http:/nerd.eurecom.fr/ontology#Place": only
single slash
Tadej: single slash missing and misnamed attribute
<tadej> Example 55: "entityTypeRefPointer" ->
"disambiguationClassRefPointer"
<tadej> "disambigClassRefPointer"
Felix: is that the only way to consume RDFa if you don't know
about it?
... it seems so
Tadej: yes
... main difference with Example 52
... it's much less stable, more prone to error
... using the pointer is more stable
... but see why 52 would not be recommended
... for consistency using local would be better
<fsasaki> ACTION: felix to edit disambig example [recorded in
[34]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action08]
<trackbot> Created ACTION-266 - Edit disambig example [on Felix
Sasaki - due 2012-11-08].
Felix: Currently pointer/non-pointer can be mixed in
Disambiguation
... not good
... you should do either one
Tadej: agreed
<fsasaki> ACTION: felix to try to simplify disambiguation
globally [recorded in
[35]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action09]
<trackbot> Created ACTION-267 - Try to simplify disambiguation
globally [on Felix Sasaki - due 2012-11-08].
<fsasaki> "Language Information: Use-Case? When is xml:lang or
lang not enough or can't be used? "
karl: we have xml:lang or lang
... when using HTML we should use lang
... maybe the content of those attributes should be explicitly
defined
Felix: in section 6.7.1
... we say that BCP47 is the value
Dave: it's there for XML vocabularies where xml:lang is not used
... no need for pointer in HTML5
Pedro: another case is to keep info about what was the original
language
<fsasaki> [36]http://www.w3.org/TR/xml-i18n-bp/
[36] http://www.w3.org/TR/xml-i18n-bp/
<fsasaki> Example 19: Declaring language information with a
non-standard mechanism
Pedro: but that's not for this.
Felix: yes, it's for XML
Pablo: was not able to come up with use case in HTML
Dave: so all three xml:lang, lang and pointer are relevant
<fsasaki> yves: various legacy content has other attributes for
language information, that is a use case for language
Information
<fsasaki> .. all those cases use bcp 47 as a value, so with the
pointer attribute we always get the same value
dF: what about the BCP47 extensions
Felix: ITS does not do validation of content
... could have a note about issue related to extensions
Pedro: maybe ITS 3.0 could have a way to indicate the original
language
Felix: could be done in provenance maybe
Moritz: so can we drop langInfo for HTML5?
<fsasaki> "[Ed. note: Add something about HTML5 lang]"
Felix: there is an editor note about it
<dF> [37]http://tools.ietf.org/html/rfc6497
[37] http://tools.ietf.org/html/rfc6497
<scribe> .. Done with the implementation issues
Felix: could start with test suite
Jirka: we need to start to work on schema
... several data categories are still not stable
... we need that for the publication
Felix: for disambiguation and MT confidence it's about pointer and
tool reference
for LQ Precis it's about implementation commitments
dF: name may prevent adoption
Felix: another detail is the question about global rules
... let's say cut off date in 3 weeks
... including the details with global rules and pointers
Jirka: maybe too short, need 2 weeks at least
... could do it in parallel
felix: Friday 23rd would be the cut off day
... then end of nov we have stable content
... then we have two more weeks for schema and tests
... not sure how realistic it is in December
Leroy: should be ok for tests
Felix: we want to get to LC
... and we have a January f2f where we want to address comments
... if we send this at start of December people would have time
for comments
... should we have a call in week of 26th for test/schema
Jirka: not sure, depends on time I need to work on schema
<fsasaki> ACTION: dom to make sure that schedule for test suite
and schema update discussed at
[38]http://www.w3.org/2012/11/01-mlw-lt-irc#T11-27-30 is taken
into account [recorded in
[39]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action10]
[38] http://www.w3.org/2012/11/01-mlw-lt-irc#T11-27-30
<trackbot> Sorry, couldn't find dom. You can review and
register nicknames at
<[40]http://www.w3.org/International/multilingualweb/lt/track/users>.
[40] http://www.w3.org/International/multilingualweb/lt/track/users
<fsasaki> ACTION: DomJones to make sure that schedule for test
suite and schema update discussed at
[41]http://www.w3.org/2012/11/01-mlw-lt-irc#T11-27-30 is taken
into account [recorded in
[42]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action11]
[41] http://www.w3.org/2012/11/01-mlw-lt-irc#T11-27-30
<trackbot> Sorry, couldn't find DomJones. You can review and
register nicknames at
<[43]http://www.w3.org/International/multilingualweb/lt/track/users>.
[43] http://www.w3.org/International/multilingualweb/lt/track/users
<fsasaki> ACTION: Dominic to make sure that schedule for test
suite and schema update discussed at
[44]http://www.w3.org/2012/11/01-mlw-lt-irc#T11-27-30 is taken
into account [recorded in
[45]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action12]
[44] http://www.w3.org/2012/11/01-mlw-lt-irc#T11-27-30
<trackbot> Created ACTION-268 - Make sure that schedule for
test suite and schema update discussed at
[46]http://www.w3.org/2012/11/01-mlw-lt-irc#T11-27-30 is taken
into account [on Dominic Jones - due 2012-11-08].
[46] http://www.w3.org/2012/11/01-mlw-lt-irc#T11-27-30
Pedro: LQ precis could be LQ metrics
dF: could be LQ score
felix: lunch now. back in one hour
<daveL> scribe: daveL
test suite
DomJones: brief overview of test suite
<DomJones>
[47]https://docs.google.com/spreadsheet/ccc?key=0AgIk0-aoSKOadG5HQmJDT2EybWVvVC1VbnF5alN2S3c#gid=0
[47] https://docs.google.com/spreadsheet/ccc?key=0AgIk0-aoSKOadG5HQmJDT2EybWVvVC1VbnF5alN2S3c#gid=0
DomJones: looking at these tables there are some features that
don't have two implementors
<scribe> ACTION: DomJones to ask Phil to confirm whether or not
he will implement provenance [recorded in
[48]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action13]
<trackbot> Sorry, couldn't find DomJones. You can review and
register nicknames at
<[49]http://www.w3.org/International/multilingualweb/lt/track/users>.
[49] http://www.w3.org/International/multilingualweb/lt/track/users
Yves: not committing yet to provenance as it is not stable
Dom: Yves just confirmed he will support disambiguation
Tadej: will implement text analysis annotation if it is stable
dF: seems to be too unstable, needs to be resolved in
specification section
<fsasaki>
[50]http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-locQualityIssue-html5-local-1
[50] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-locQualityIssue-html5-local-1
Dom: quality issue is partially covered by two implementors,
but different usages of using OKAPI is acceptable.
<fsasaki>
[51]http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-locQualityIssue-html5-local-2
[51] http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-locQualityIssue-html5-local-2
felix: quality has stand-off in script as well as in-line
option. Do we want both, as these are not reflected in the test
suite currently
... this choice has a lot of knock on for test suite
Yves: prefer the stand off in script over inline
<scribe> ACTION: Dave to check with Phil what his preference
was between quality issue locally with inline and script-based
stand off [recorded in
[52]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action14]
<trackbot> Sorry, couldn't find Dave. You can review and
register nicknames at
<[53]http://www.w3.org/International/multilingualweb/lt/track/users>.
[53] http://www.w3.org/International/multilingualweb/lt/track/users
<scribe> ACTION: daveL to check with Phil what his preference
was between quality issue locally with inline and script-based
stand off [recorded in
[54]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action15]
<trackbot> Created ACTION-269 - Check with Phil what his
preference was between quality issue locally with inline and
script-based stand off [on David Lewis - due 2012-11-08].
dF: the script based approach may still receive negative
reaction from the HTML WG
Jirka: HTML WG will say to do this using microdata, but this
doesn't work in this case
... HTML parsing considers XML in script as if it were CDATA
... but best way is the link to an external file
... or use XHTML rather than HTML - which may not suit
everyone
<fsasaki> its-rules and its-standoff references?
Yves: the reference for standoff mark-up could be an external
file
felix: so in html would we have a separate link relation, in
addition to its-rules, for such mark-up
<fsasaki> felix: not sure if above is needed, just *one*
solution
df: prefer a span-based inline solution, vs a standoff in a
script
jirka: if using the inline version, rdfa or microdata would be
better, but HTML WG has stabilised this choice enough for us
to make a decision
felix: but to summarise, for cocomore/linguaserve, the script
is better since propagation back to client can be controlled
more clearly
... so there is slight preference for script solution in the
room
Jirka: agrees it is not a nice solution but the best we can manage
given the problem of coexisting between XML and HTML in general
felix: is there a volunteer to explore the use of external file
for standoff
... back to test suite
DomJones: now have two implementers for quality issues
... quality Precis, has only partial coverage from UL and
vistaTEC
df: only interested if someone produces it
yves: there was interest from Des also in this
df: this seems still quite unstable, for example how valid is
the value of the score without the (optional) tool info
<fsasaki> ACTION: felix to ask phil and des and arle about need
and implementation commitment for localization precis during
next call [recorded in
[55]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action16]
<trackbot> Created ACTION-270 - Ask phil and des and arle about
need and implementation commitment for localization precis
during next call [on Felix Sasaki - due 2012-11-08].
dF: so this needs more consideration especially about
interpretation of score
felix: highlight the feature at risk in a draft, which allows
us to cleanly remove after feature freeze - but not to change
the feature
DomJones: mtconfidence and allowed text
... have also now been supported with implementation
... asks what language people are using
... responses from room, Java, Javascript, php
... so tcd will try and provide some test sample code
... aim to have an online tutorial to help people with coding
the test suite
... suggest week 3rd December, suggest on Tuesday 4th december
yves: need to avoid XLIFF call 4pm GMT
DomJones: so aim for 2.30pm GMT, 3.30 central european time
<fsasaki>
[56]http://www.timeanddate.com/worldclock/fixedtime.html?iso=20121204T14
[56] http://www.timeanddate.com/worldclock/fixedtime.html?iso=20121204T14
DomJones: we will record this anyway for people who miss it
<fsasaki> above worldclock link is the time of the webinar for
the test suite
DomJones: now plan to switch to github for all the files and
use google docs as index sheet and recording
Leroy: test output will change to be tab delimited and order the
output alphabetically as suggested by Yves
... also use its as prefix in all cases
DomJones: when will files be frozen?
felix: after feature freeze, changes are unlikely, though
external comment may require some changes.
<fsasaki> its-term="yes" vs. its:term="yes"
<fsasaki> [57]http://www.w3.org/2012/09/mlw-lt-charter.html
[57] http://www.w3.org/2012/09/mlw-lt-charter.html
felix: according to charter, testing can continue until
october, but we should aim to be completed by March
<fsasaki> its-term="yes" vs. its:term="yes"
felix: clarification on prefixes in all cases is 'its:'
yves: could just drop prefix altogether
felix, leroy: agreed
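As an illustration of the output conventions just discussed (tab delimited, attributes ordered alphabetically, prefix dropped), a line of test output could be produced along these lines. This is a sketch only: the function name, the node path notation and the name="value" cell format are assumptions, not the actual test suite format.

```python
def output_line(node_path: str, attributes: dict[str, str]) -> str:
    """Build one tab-delimited output line for a node.

    The prefix ("its:" in XML, "its-" in HTML5) is dropped so that both
    spellings give the same line; cells are sorted alphabetically.
    """
    cells = []
    for name, value in attributes.items():
        for prefix in ("its:", "its-"):
            if name.startswith(prefix):
                name = name[len(prefix):]
                break
        cells.append(f'{name}="{value}"')
    return "\t".join([node_path] + sorted(cells))

# Both spellings from the discussion normalize to the same output.
print(output_line("/html/body[1]/p[1]", {"its-term": "yes"}))
print(output_line("/html/body[1]/p[1]", {"its:term": "yes"}))
```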
DomJones: we will in new year work on making testing more
accessible to implementors outside of the WG, as part of the
general promotion of the spec and to encourage its
uptake
felix: will there be some documentation on this
Leroy: yes there will be some in the github and on the web page
felix: will the test parser be made available
DomJones: yes, we will release that later, targeted at non
working group members
CMS-TMS demo by linguaserv and cocomore
mhellwig: explains demo
... two use cases
... one where clients have few localisation staff, so they need
tools to add ITS marks
... second is where staff can enhance content with ITS, but can
change the content directly - this is actually higher priority
for client
<fsasaki> scribe: fsasaki
karl: demo - editing content
... in the body I can add the content and add metadata
... demo shows clicking on content, then adding a localization
note
... also specifying concepts, e.g. disambiguation target
... can also mark content that should not be translated or only
for certain languages
... that is locale filter metadata
... now about global metadata
... domain, revision agent, translation agent
... can add translate rules
... one field for the translate selector
... we think about helping the users by helping with creating
selectors
... another example of outputting the content as only HTML(5)
... this is one option of output, the other is the XML file
that we have seen from Linguaserve before
... now there is an example with checking the annotations:
... one can see the page and click on data category buttons,
then you can see the metadata available
... we now press a button and send the data to linguaserve
(switching machines, now working with Linguaserve machine)
Mauricio: now demo environment, internal workflow interface
... here we have received the file from Cocomore
... it has information to transform the drupal XML file to CAT
tool oriented files
... now executed the preparational step
... now simulating the translation of the file
... assuming that the file has been translated now
... setting simulation translation marks, will explain these
later
... the file is now ready to be downloaded by Cocomore
pedro: we don't go into their servers - we act as a server
... we don't go to the client's server
karl: now I am starting a manual cron job
... to check if there are new translations
... the cron job checks regularly if there are new translations
... the user doesn't need to do anything
... issue with cron job is that drupal has cron jobs
internally, so it takes some time until drupal gets to our cron
job
(karl's machine loading)
karl: translation received
... status changed from "in progress" to "needs review"
... now looking at the spanish node
... the language mgmt tab shows that all metadata is still here
... there is also additional metadata: revision agent and
translation agent
(demo continues with Mauricio's machine)
mauricio: now showing what happens with the XML file on our
side
... current page is linked from our use case description in the
wiki
... now looking at the XML file we saw before - 21_11_orig.xml
... that we saw this morning
... now pre-production step
... in the log there is the XPath of the nodes, will be used
for the test suite
... currently there is only "translate" metadata test suite
output in here, others will come later
... now cat-tool oriented XML file
... in this file you have various information pieces, e.g.
domain information that the translator will see
... inside the translatable content there is some marks about
parts that are not translatable
... now showing a really translated file
... now using this for the post-processing step
... now you see the translators file in the original format of
the client
... what has changed: e.g. the "ready to process" in the
readiness has changed
... originally it had four stages: hTranslate, ...
... now the state is: publish
... that means it can be published on the client side
... the time stamp related to readiness also has changed
... also we have now translation provenance information
... local xml:lang attribute has changed from "de" to "es"
... next showing what will happen with the XML that we used to
translate
... content part that was marked as "translate=no" is blocked
so that the translators cannot change it
... also storage size is part of the file
... that is a file used in the transit translation tool
... once the engine does post-processing, the storage size
will be applied
pedro: that is a simple way to allow the translator to control
the number of characters that are possible
... this is just one cat tool, but that can be done with other
cat tools as well
... expansion of global rules has been done before, and will be
done in post-processing
<mdelolmo> Provenance XPath selector value: Does the semantic
combination between the Agent Provenance and Translate data
category rules validate the regular expression for Provenance
(//item)?
<mdelolmo> <its:transProvRule selector="//item"
transOrg="Linguaserve" transPerson="LSTranslator01"
transRevOrg="Linguaserve" transRevPerson="LSRevisor02"/>
<mdelolmo> <item id="18-body-0-format"
its:allowedCharacters="."
its:translate="no"><![CDATA[full_html]]></item>
discussion about interrelation between data categories
yves: you can imagine scenarios where tools would implement
only one data category
... would be better to have an XPath expression that only
selects the relevant nodes for provenance
... but that is probably marginal
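Yves' point about a narrower selector can be illustrated with a small sketch. The sample document below is an assumption built around the item quoted above (the second item and the wrapper element are made up, and the namespace URI is the ITS one). In XPath 1.0 the narrower selector might look like //item[not(@its:translate='no')]; since ElementTree does not support not(), the same filter is written in Python here.

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"
doc = ET.fromstring(
    f'<items xmlns:its="{ITS}">'
    '<item id="18-body-0-format" its:translate="no">full_html</item>'
    '<item id="18-body-0-value">Some translatable text</item>'
    '</items>'
)

translate = f"{{{ITS}}}translate"      # qualified attribute name
all_items = doc.findall(".//item")     # what selector="//item" covers
# Keep only the nodes for which translation provenance is meaningful.
relevant = [el for el in all_items if el.get(translate) != "no"]

print([el.get("id") for el in all_items])
print([el.get("id") for el in relevant])   # ['18-body-0-value']
```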
demo of online MT system
<scribe> scribe: DomJOnes
Pablo: Global use of domain, second domain, mixture of
translate / domain.
… sample is always the same with different use of tags.
… no ITS errors identified. Click translate, same page
presented but translated.
Pedro: Engine is supported by Lucy SW. Thomas absent, behind
this is Lucy S/W.
... Showing translate local
… examples in source text. Tags (translate yes/no) are set
randomly.
… showing example of translation based on ITS tag (translate
yes/no).
… showing file sent to MT and returned result.
… showing pre-mt file. Searches for "translate" tag, adds in
meta-tag for translator to see translate = no in example shown.
… goes to show source code of translated page, searches for
"translate" tag, they are not found, they have been cleaned and
removed from the text.
Felix: so after translation, the translate tags are removed?
yes.
Pablo: Showing example with translate global rules. All nodes
are translated but their children are not translatable.
… shows example being translated. Nodes are translated,
children are not. Showing the tags are being parsed correctly.
Mārcis: Are the whole spans (including those with translate =
no) sent through the translation system?
Pablo: Yes, all tags are sent through MT
Pedro: The reason is that behind the MT engine there can be
post-editors. The PE may need to see them for contextual
reasons. All data is sent to MT regardless of whether text is
to be translated or not.
Mārcis: So it's just kept as is in the translation (translate
=no)
Pablo: Yes. Shows the output that is translated and that which
is not.
… flicks between source code and webpage. Shows tags
(translate) have been removed.
<Yves_> Question: In the global rule it looks like the XPath
expression have no namespace, so how they can work with HTML?
Is that related to the issue you mentioned for Namespace in
HTML5 earlier today?
… shows domain example. Global rules declare domain. Tag,
pointer, mapping.
… multiple "economic" domain mappings are added.
Pedro: These domains are needed by the MT engine.
Pablo: shows the text being translated. Meta-data (domain) has
been removed.
… Shows global domain and a different engine based on the
domain.
<Jirka> Yves, you are right I was going to ask the same
question.
… same rule applies, but different engine is used.
felix: asks a question from Yves and Jirka about domain rule…
Jirka: Selector is not completely right for HTML5 where all
elements are placed in HTML namespace.
… I hope that this is not a final version, just a temporary
one?
Pablo: No problems, I'll change it. But will require some work.
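The namespace point raised by Yves and Jirka can be illustrated with a minimal sketch. It assumes an XML-serialized HTML document and uses Python's standard xml.etree.ElementTree; the sample markup and the prefix "h" are made up for illustration and are not taken from the demo.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative document: every element is in the (X)HTML namespace.
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>Call <code>printf</code> here.</p></body></html>"
)

# Selector without a namespace: selects nothing, because "code" with
# no namespace is a different name from "code" in the HTML namespace.
unprefixed = doc.findall(".//code")

# Selector with a prefix bound to the HTML namespace: selects the element.
prefixed = doc.findall(".//h:code", {"h": XHTML_NS})

print(len(unprefixed), len(prefixed))
```

A rule written like the first selector would silently select no nodes in such a document, which is why the selectors in the global rules need a prefix bound to the HTML namespace.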
... Shows another global rules element. Three different
selectors for elements wrt translate yes/no
… different domain.
… When we use domain meta-data we divide the document into
different parts. What is sent to the MT engine is shown with
tags in place and ITS domain mapping.
… We divide the document into three parts.
… we send three requests to the MT engine.
… it takes longer to translate three as opposed to one.
Pablo: Shows final example applying domain meta-data to
different nodes. What is sent to the MT system has multiple
domain tags added to it. When page is translated it takes a
while…
Pedro: This negatively affects the performance of the MT engine.
We must take into account how the ITS usage adds to MT costs.
Milan: How many requests to the MT engine?
Pablo: around 15.
David F: The overhead of multiple fragments depends on the impl
of the MT. Some engines may perform better than others.
Pedro: This is how it works, you can balance different engines
etc, but this affects the price to the client.
Pablo: as you can see a cache system is present for translation
requests.
Milan: What about creating a file per domain and sending three
requests, one for each file?
Pablo: Maybe it can be done but it's easier to send one file.
Pedro: We are also using post-editors; we have to work with a
standard CAT system.
Pablo: Showing caching system which results in faster
performance.
... Closes demo.
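Milan's suggestion (one request per domain) and the cache Pablo showed can be sketched roughly as follows. This is only an illustration under assumptions: fake_engine stands in for a real MT call, and group_by_domain and CachingTranslator are hypothetical names, not part of the demo code.

```python
def group_by_domain(fragments):
    """Group (domain, text) pairs so each domain needs only one request."""
    groups = {}
    for domain, text in fragments:
        groups.setdefault(domain, []).append(text)
    return groups


class CachingTranslator:
    """Wrap an MT call and remember results for repeated requests."""

    def __init__(self, translate_fn):
        self._translate_fn = translate_fn  # hypothetical MT client call
        self._cache = {}
        self.requests_sent = 0

    def translate(self, domain, texts):
        key = (domain, tuple(texts))
        if key not in self._cache:
            self.requests_sent += 1
            self._cache[key] = self._translate_fn(domain, texts)
        return self._cache[key]


def fake_engine(domain, texts):
    # Stand-in for a real MT engine; it only tags the text with the domain.
    return [f"[{domain}] {t}" for t in texts]


fragments = [
    ("economy", "Interest rates rose."),
    ("sports", "The match ended in a draw."),
    ("economy", "Inflation slowed."),
    ("politics", "The vote is on Monday."),
]

translator = CachingTranslator(fake_engine)
groups = group_by_domain(fragments)
results = {d: translator.translate(d, texts) for d, texts in groups.items()}
```

Four fragments in three domains produce three engine calls here, and a repeated call with the same input is served from the cache. Grouping loses the original document order, so a real pipeline would also have to reassemble the translated fragments, which may be part of why sending one file is easier.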
Yves: a general comment about domain values: when we do the
mapping we don't mention the casing of the domain. Should they be
case-sensitive? Should we edit the spec to show this?
Felix: Are keywords case-sensitive in HTML?
jirka: keyword values are case-sensitive.
?: Maybe present in the algorithm, a lower case mapping
<scribe> ACTION: Yves to add a step regarding the lowercasing
of the domain data category [recorded in
[58]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action17]
<trackbot> Created ACTION-271 - Add a step regarding the
lowercasing of the domain data category [on Yves Savourel - due
2012-11-08].
David F: our general impl is case-insensitive.
… in the contract with the MT provider you may need to address
this, but not in our mapping.
Yves: Does it mean the value we return is lowercase as well or
should they stay as we get them?
David F: should be lowercase
<fsasaki> I assume "STEP 5: Return the resulting string." would
be "STEP 5: lowercase the resulting string and return it."
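A rough sketch of what a lowercasing step could look like in a domain mapping routine. It is not the algorithm text from the draft: the function name resolve_domains, the dict form of the mapping, and the comma and quote handling are assumptions made for illustration.

```python
def resolve_domains(node_values, mapping):
    """Map raw domain values to consumer domains, ignoring case.

    node_values: strings selected from the document (may be comma-separated).
    mapping: dict from source-content domain to consumer-tool domain.
    """
    # Lowercase the left-hand side once so lookups are case-insensitive.
    lowered = {left.lower(): right for left, right in mapping.items()}
    result = []
    for value in node_values:
        for part in value.split(","):
            # Trim whitespace and enclosing quotes, then lowercase.
            part = part.strip().strip("'\"").lower()
            if not part:
                continue
            # Use the mapped value if there is one; lowercase the output too.
            mapped = lowered.get(part, part).lower()
            if mapped not in result:
                result.append(mapped)
    return ", ".join(result)


print(resolve_domains(["Automotive, Medical"], {"automotive": "auto"}))
```

With this sketch, "ECONOMY" and "economy" in the content both resolve through a mapping keyed on "Economy", and the returned string is lowercase, in line with David F's answer.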
Marcis: A general comment on translation: whether something
should not be sent through an MT system at all, and whether it
can be specified that content is passed through the MT engine or
not. Sending a string through MT that is not to be translated
could slow the MT system down. Q: Is it possible to
differentiate between "send through the MT system but do not
translate" and "do not send through the MT system"?
David Lewis: Problematic - locale filter allows you to not pass
through MT for specific locales. What was being shown was stuff
that was not MT'd as it was subsequently PE'd
Marcis: Some things you do not want translated, or passed
through the MT engine.
Pedro: Very impl dependent. Depends on what behaviour you want.
If something is not translated it matters for analysis of the
translation.
Marcis: If you know some text is important for MT and some is
not can you handle both?
Pedro: This is a problem with the Lucy MT provision.
… Matrex SMT works with chunks of data, so maybe we can see
whether there is an improvement but we don't know yet.
Pedro: Different techniques are used but we need to know how
SMT performs.
David F: In ITS we have translate, term, disambiguation which
can all be combined to make a business case. But these problems
are business issues between LSP and MT provider. We should not
spend time on this discussion.
Pedro: This is both true and false. We need to test various
contracts between LSP and MT provider and this should be
documented.
David F: Agrees but not now.
David L: Has no translate mark-up come up with your clients?
Are they interested in this in terms of costs per words?
Pedro: The business model of this contract is not based on
that.
<pnietoca> You can find the DEMO link here
[59]http://www.w3.org/International/multilingualweb/lt/wiki/Use
_cases_-_high_level_summary#More_Information_and_Implementation
_Status.2FIssues_7
[59] http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary#More_Information_and_Implementation_Status.2FIssues_7
Felix: Showing screen. 30 mins to go; would like to take
advantage of Bert's presence here. ITS 1.0 uses XPATH 2.0 uses
extraction. Shows example of translate data cat, which can be
used globally with absolute selectors. 2 subjects, 1: defining
selector, 2: using CSS selector at level 3. Think this is the
right thing to do based on discussions with Bert. Might be
various selectors only relevant to CSS for XPATH.
<pnietoca> and here
[60]https://www.w3.org/International/multilingualweb/lt/wiki/On
line_MT_Systems_Use_Case_Demonstration#Use_Case_Demonstration
[60] https://www.w3.org/International/multilingualweb/lt/wiki/Online_MT_Systems_Use_Case_Demonstration#Use_Case_Demonstration
<pnietoca> might be down sometimes depending on the developing
… shows a rule example where a node is selected based on an
evaluated expression with a parameter. Having the CSS selector is
easier for developers who may not know XPATH in detail.
… we don't have anyone who has yet said they would implement CSS
selectors. This is a feature at risk, as no implementer is
available yet.
… however during the break Bert said people are working on CSS
selectors to XPATH. We could say we support a translation
mechanism from CSS selectors to XPATH. CSS level 3.
Jirka: I thought this was already done?
… the code should exist.
Felix: The developer implemented it in Python; people are working
with other frameworks. Wanted to bring this up while Bert is
here.
… nice to have a mechanism to use CSS selectors and have them
translated into XPATH. Is this something to specify in the spec,
or do we point people to libraries etc.?
Dave L: If there was mapping from CSS to XPATH do we not need
to mention CSS as part of the normative spec?
Felix: If mapping could be part of the spec without being used
directly it would be good.
Cocomore?: people are maybe more familiar with CSS selector so
would be better to allow users to use CSS selectors. makes more
sense if these classes can be selected more easily.
Felix: There is a use-case
… any other thoughts on this?
<mhellwig> s/Cocomore?/kfritsche
Jirka: Is someone working on an in-browser implementation?
Yves: I think so as some impl are using CSS. If we go the CSS
path what does it change for the test cases? Do we need 2
test-cases for both XPATH and CSS?
Dave L: This question has been asked before.
David F: Doubling the test-suite.
Felix: Just a conversion step, conversion test-case.
… other option is we drop the support for CSS selectors. 1)
Full direct support, 2) conversion support or 3) we drop it
Jirka: Who asked for this?
Milan: Maybe it was Phil R?
Dave L: Interoperability is helpful in passing files from one
system to another.
Felix: Another solution: Don't change anything in the spec, not
make it a feature and see if it can be done in an impl
… we would not formally specify this step.
… define the functionality without using it formally.
Jirka: keep CSS with reserved word for query language.
Felix: If we don't change anything we put this feature at risk,
see if someone comes up with solution next year.
David L: Increases volume of test suite but maybe not the
implementation work
Bert: You can only select elements and not attributes.
Jirka: So CSS test cases would be smaller. In CSS the selector
is not case sensitive. HTML not case sensitive, XML would be.
… Some algorithm to transform CSS to XPATH would not be easy
because of the case-sensitive vs. case-insensitive problem.
Felix: For XML there is no strong use-case
… only for those using HTML
Bert: There is no official mapping but everything but the
pseudo elements can be mapped to XPATH.
Jirka: It depends on user-interaction
David L: Therefore these things may need to be removed.
Jirka: No, these user interactions will never be applicable
Felix: For the time being no conclusion on this issue, but if we
don't change anything and no one implements it, it may cause an
issue. We need to mark this as a feature at risk.
David F: For people to understand the commitment they are
making this should be mapped onto the google spreadsheet.
Felix: I will take an action to explain this and see take-up
… this is an opportunity to recruit more rule authors who can
use CSS selectors as opposed to learning XPATH.
<fsasaki> ACTION: felix to make sure that css selectors are
marked as feature at risk in the draft, and explain the
rational in a mail [recorded in
[61]http://www.w3.org/2012/11/01-mlw-lt-minutes.html#action18]
<trackbot> Created ACTION-272 - Make sure that css selectors
are marked as feature at risk in the draft, and explain the
rational in a mail [on Felix Sasaki - due 2012-11-08].
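To make the "conversion step" option concrete, here is a minimal sketch of translating a small subset of CSS selectors into XPath. It is an assumption-laden illustration, not a proposed normative mapping: it covers only type, class and id selectors with descendant and child combinators, lowercases element names as an HTML-only simplification, ignores namespaces, and rejects everything else (including pseudo-elements, which Bert noted cannot be mapped).

```python
import re

# One compound selector: optional element name, then .class / #id parts.
_COMPOUND = re.compile(r"^(?P<tag>[A-Za-z][\w-]*|\*)?(?P<rest>(?:[.#][\w-]+)*)$")


def compound_to_xpath(compound):
    match = _COMPOUND.match(compound)
    if not compound or not match:
        raise ValueError(f"unsupported selector: {compound!r}")
    # HTML element names are matched case-insensitively, so lowercase them.
    tag = (match.group("tag") or "*").lower()
    predicates = []
    for kind, name in re.findall(r"([.#])([\w-]+)", match.group("rest")):
        if kind == "#":
            predicates.append(f"@id='{name}'")
        else:
            # Match one whitespace-separated token of the class attribute.
            predicates.append(
                f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"
            )
    return tag + "".join(f"[{p}]" for p in predicates)


def css_to_xpath(selector):
    xpath = ""
    axis = "//"  # descendant combinator (whitespace) is the default
    for token in selector.replace(">", " > ").split():
        if token == ">":
            axis = "/"  # child combinator
            continue
        xpath += axis + compound_to_xpath(token)
        axis = "//"
    return xpath


print(css_to_xpath("div > p.note"))
print(css_to_xpath("UL li"))
```

A test suite for the conversion approach could then check pairs of CSS input and expected XPath output rather than duplicating every XPath test case, which is close to what Felix described as a conversion test case.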
Felix: Topics you want for the agenda tomorrow.
… we have joint meeting with HTML, planning for next year, XML
Prague for example. Last workshop in Nov, plans for 2014
… editorial discussions for specification.
… please see doodle poll for a few more people to commit to
editing meetings.
<fsasaki> [62]http://doodle.com/heh7k59h7vkvnv88
[62] http://doodle.com/heh7k59h7vkvnv88
Jirka: Will the next meeting be in Prague?
Felix: Yes, 23rd, 24th of January.
... Any other topics? Issues?
David F: XLIFF mapping task-force meeting should happen this
week. We said we'd circulate this yesterday but we didn't. Can
we schedule for tomorrow? Propose 4pm French time.
… 3 pm
Felix: 2-3pm free time tomorrow for implementation discussions.
Jirka: Who will go at 9am to joint meeting with HTML?
Felix: Hope all will join.
… 9am we meet with the HTML group for 30 mins.
Felix: I have asked for a review from the HTML WG on the
section on HTML and ITS written by Jirka. Wanted to ask
Frederick for a demo like the one Yves gave in Prague, so people
see what happens to the HTML.
… would be good to have a 5-10 min demo.
Jirka: Closer to 5min
… @felix do you have examples of HTML markup in your slides?
Felix: Yes I do.
Jirka: Should also show HTML validator for the WG. Simple,
example driven presentation.
... Our goal is to get a commitment to get a review from them.
Felix: Looking for a commitment for a contact
Tadej: Plan to discuss tool-info data category tomorrow, would
be helpful. I would have some comments on this, if we could
discuss
Received on Thursday, 1 November 2012 23:05:33 UTC