Re: ACTION-233: Update quality issue example to use the solution (XML in "script" tag) for standoff markup

From: Phil Ritchie <philr@vistatec.ie>
Date: Tue, 2 Oct 2012 12:22:02 +0100
To: Felix Sasaki <fsasaki@w3.org>
Cc: public-multilingualweb-lt@w3.org
Message-ID: <OFF7F947C4.2D10AFF8-ON80257A8B.003E4BE5-80257A8B.003E71C6@vistatec.ie>
OK, understood. Hmm. I think use of the script element will break my 
implementation. I'll have to check.

Phil.





From:   Felix Sasaki <fsasaki@w3.org>
To:     Phil Ritchie <philr@vistatec.ie>, 
Cc:     public-multilingualweb-lt@w3.org
Date:   02/10/2012 11:04
Subject:        Re: ACTION-233: Update quality issue example to use the 
solution (XML in "script" tag) for standoff markup





2012/10/2 Phil Ritchie <philr@vistatec.ie>
Felix 

Before I can answer the question can you tell me what the motivation for 
using the script tags is?

There are two motivations. One is based on
https://www.w3.org/International/multilingualweb/lt/wiki/LSP_Localization_Chain_Side_Use_Case_Demonstration
where ITS rules files are placed inside HTML5. This seems to be a 
requirement from Linguaserve: rules not linked, but inside the HTML5 
document. So far Linguaserve has put the rules "just somewhere" in the 
file, which makes it invalid HTML5. With the rules in the "script" 
element, it becomes valid again.
The other motivation is that the standoff we had so far for HTML5 looked 
like this:

 <span its-loc-quality-issues-ref=#lq1>c'es</span> le contenu</p>
                <span id=lq1 its-loc-quality-issues=its-loc-quality-issues>
                    <span
                        its-loc-quality-issue=its-loc-quality-issue
                        its-loc-quality-issue-comment="Sentence without capitalization"
                        its-loc-quality-issue-severity=30
                        its-loc-quality-issue-type=typographical></span>
                    <span
                        its-loc-quality-issue=its-loc-quality-issue
                        its-loc-quality-issue-comment="'c'es' is unknown. Could be 'c'est'"
                        its-loc-quality-issue-severity=50
                        its-loc-quality-issue-type=misspelling></span>
                </span>


Here "span" is misused to "transport" standoff metadata in the "body" 
element. It works, but is not very clean. Hence "script", which is 
defined for that purpose; see
http://dev.w3.org/html5/spec/the-script-element.html 
which says about "application/xml" and other types:
"These types are explicitly listed here because they are poorly-defined 
types that are nonetheless likely to be used as formats for data blocks, 
and it would be problematic if they were suddenly to be interpreted as 
script by a user agent."
Jirka mentioned this solution on the afternoon of 26 September; see
http://www.w3.org/2012/09/26-mlw-lt-minutes.html
(search for "current recommendation is to put the tool info xml into 
script in html"). He also pointed us to the related DOM methods:
https://developer.mozilla.org/en-US/docs/DOM/DOMParser

Felix


My demo in Prague used standoff without needing to wrap them in script 
tags. 

Phil.





From:        Felix Sasaki <fsasaki@w3.org> 
To:        public-multilingualweb-lt@w3.org, 
Date:        02/10/2012 09:17 
Subject:        ACTION-233: Update quality issue example to use the 
solution (XML in  "script" tag) for standoff markup 




Hi all, 

I updated the qaissue example to use XML in the script element, see 
http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-locQualityIssue-html5-local-2 

The standoff metadata is now in a dedicated "script" element. See also 
http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/html5/EX-locQualityIssue-html5-local-2.html 

http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/html5/qaissues.js 


So this works, but I have a question for the implementors who use HTML5 
as input for processing outside the browser. 
If you process 
http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/html5/EX-locQualityIssue-html5-local-2.html 

with the validator.nu HTML5 parser, the content of "script" is not "seen" 
as XML. The output then is 

<html xmlns="http://www.w3.org/1999/xhtml">... 
<script type="application/xml" id="its-standoff-1"> 
  &lt;its:locQualityIssues xml:id="lq1" xmlns:its="http://www.w3.org/2005/11/its"&gt; 
   &lt;its:locQualityIssue 
    locQualityIssueType="misspelling" 
    locQualityIssueComment="'c'es' is unknown. Could be 'c'est'" 
    locQualityIssueSeverity="50"/&gt; 
   &lt;its:locQualityIssue 
    locQualityIssueType="typographical" 
    locQualityIssueComment="Sentence without capitalization" 
    locQualityIssueSeverity="30"/&gt; 
  &lt;/its:locQualityIssues&gt; 
</script>...</html> 

So an XML-based tool that wants to pick up the ITS standoff information 
will not find it as markup: it only sees escaped text inside "script". 
Currently, Linguaserve is using this approach 
https://www.w3.org/International/multilingualweb/lt/wiki/LSP_Localization_Chain_Side_Use_Case_Demonstration 

to embed ITS rules into an HTML file. I had hoped that the "script" 
element would be an alternative - is it? 
I'm sure this is not a difficult problem, but we probably need some 
guidance for implementors who are not used to processing HTML5. 
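
As one possible starting point for such guidance, here is a minimal sketch 
of a two-step approach: first collect the text content of each 
script element with type "application/xml", then hand that text to an XML 
parser. It is only an illustration under assumptions that are not from this 
thread: it uses Python's standard library rather than the validator.nu 
parser, and the names StandoffCollector and read_standoff_issues are 
hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


class StandoffCollector(HTMLParser):
    """Hypothetical helper: collect the raw text of every
    <script type="application/xml"> data block in an HTML document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []     # text of each completed data block
        self._buffer = None  # text chunks while inside a data block

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/xml":
            self._buffer = []

    def handle_data(self, data):
        # Script content arrives as plain text, possibly in several chunks.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def read_standoff_issues(html_text):
    """Return {xml:id: [attribute dicts]} for each its:locQualityIssues
    element found in an application/xml script block."""
    collector = StandoffCollector()
    collector.feed(html_text)
    collector.close()
    issues = {}
    for block in collector.blocks:
        # Second step: parse the collected text as XML in its own right.
        root = ET.fromstring(block.strip())
        if root.tag != f"{{{ITS_NS}}}locQualityIssues":
            continue
        issues[root.get(XML_ID)] = [
            dict(item.attrib)
            for item in root.findall(f"{{{ITS_NS}}}locQualityIssue")
        ]
    return issues
```

Calling read_standoff_issues on the text of an HTML5 file would give a 
dictionary keyed by the xml:id of each standoff block (for example "lq1"), 
which a tool could then match against its-loc-quality-issues-ref values in 
the body. A tool built on a different HTML5 parser would take the same two 
steps: read the text node inside "script", then parse it as XML.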

Felix 

-- 
Felix Sasaki 
DFKI / W3C Fellow 


************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the sender immediately by e-mail.
www.vistatec.com
************************************************************



-- 
Felix Sasaki
DFKI / W3C Fellow


Received on Tuesday, 2 October 2012 11:22:38 UTC
