ACTION-233: Update quality issue example to use the solution (XML in "script" tag) for standoff markup

Hi all,

I updated the qaissue example to use XML in the "script" element; see
http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-locQualityIssue-html5-local-2
The standoff metadata is now in a dedicated "script" element. See also
http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/html5/EX-locQualityIssue-html5-local-2.html
http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/html5/qaissues.js

So this works, but I have a question for the implementors who use HTML5 as
input for processing outside the browser.
If you process
http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/html5/EX-locQualityIssue-html5-local-2.html
with the validator.nu HTML5 parser, the content of "script" is not "seen"
as XML. The output is then:

<html xmlns="http://www.w3.org/1999/xhtml">...
<script type="application/xml" id="its-standoff-1">
  &lt;its:locQualityIssues xml:id="lq1" xmlns:its="http://www.w3.org/2005/11/its"&gt;
   &lt;its:locQualityIssue
    locQualityIssueType="misspelling"
    locQualityIssueComment="'c'es' is unknown. Could be 'c'est'"
    locQualityIssueSeverity="50"/&gt;
   &lt;its:locQualityIssue
    locQualityIssueType="typographical"
    locQualityIssueComment="Sentence without capitalization"
    locQualityIssueSeverity="30"/&gt;
  &lt;/its:locQualityIssues&gt;
</script>...</html>

So if we had an XML-based tool that wanted to pick up the ITS standoff
information from this output, it would not work.
Currently, Linguaserve is using this approach
https://www.w3.org/International/multilingualweb/lt/wiki/LSP_Localization_Chain_Side_Use_Case_Demonstration
to embed ITS rules into an HTML file. I had hoped that the "script" element
would be an alternative. Is it?
I'm sure this is not a difficult problem, but we probably need some
guidance for implementors who are not used to processing HTML5.
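
One possible direction for such guidance, as a rough sketch only: HTML
parsers treat the content of "script" as plain text, so a tool outside the
browser could take that text and run it through an XML parser in a second
step. The Python below illustrates the idea using only the standard library
(html.parser instead of validator.nu). The names StandoffCollector,
extract_issues and SAMPLE_HTML are made up for this illustration, and the
sample document is a cut-down stand-in for the real example file.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Cut-down stand-in for EX-locQualityIssue-html5-local-2.html (assumption).
SAMPLE_HTML = """<!DOCTYPE html>
<html>
<head>
<script type="application/xml" id="its-standoff-1">
  <its:locQualityIssues xml:id="lq1" xmlns:its="http://www.w3.org/2005/11/its">
   <its:locQualityIssue
    locQualityIssueType="misspelling"
    locQualityIssueComment="'c'es' is unknown. Could be 'c'est'"
    locQualityIssueSeverity="50"/>
   <its:locQualityIssue
    locQualityIssueType="typographical"
    locQualityIssueComment="Sentence without capitalization"
    locQualityIssueSeverity="30"/>
  </its:locQualityIssues>
</script>
</head>
<body><p>c'es le contenu</p></body>
</html>
"""


class StandoffCollector(HTMLParser):
    """Collect the raw text of every <script type="application/xml">."""

    def __init__(self):
        super().__init__()
        self._in_xml_script = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/xml":
            self._in_xml_script = True
            self._buffer = []

    def handle_data(self, data):
        # Script content arrives as unparsed text, possibly in several chunks.
        if self._in_xml_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_xml_script:
            self.blocks.append("".join(self._buffer))
            self._in_xml_script = False


def extract_issues(html_text):
    """Second pass: parse each collected script text as XML."""
    collector = StandoffCollector()
    collector.feed(html_text)
    collector.close()
    issues = []
    for block in collector.blocks:
        root = ET.fromstring(block.strip())
        for issue in root.iter("{%s}locQualityIssue" % ITS_NS):
            issues.append(dict(issue.attrib))
    return issues


if __name__ == "__main__":
    for issue in extract_issues(SAMPLE_HTML):
        print(issue["locQualityIssueType"], issue["locQualityIssueSeverity"])
```

With an XML toolchain the same two steps would apply: read the string value
of the "script" element, then parse that string as a separate XML document.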

Felix

-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Tuesday, 2 October 2012 08:16:51 UTC