Re: ITS reset for inherited metadata

Hi Nathan,

Thank you very much for your comment. A general remark: if you or others 
propose new features for ITS 2.x, I would suggest that you add them to 
the interest group wiki:
http://www.w3.org/International/its/wiki
To do so you will need a W3C account; see
http://www.w3.org/Help/Account/

Some comments below.

On 06.08.13 20:51, Nathan Glenn wrote:
> Hello,
> In the course of the WICS project I have found need for a mechanism to 
> undo metadata inheritance for a given element. I'd like to suggest it 
> for the next version of ITS.
>
> *Use case:* I am converting ITS-decorated XML into HTML while keeping 
> ITS markup intact.

I think the sentence above describes something other than what you are 
actually doing: what you describe below is not "keeping ITS markup 
intact", but keeping the ITS *information* intact, i.e. a serialization 
of the ITS information that has been created via local markup, global 
rules, or inheritance defaults. The expected output in the test suite is 
an example of such a serialization:
https://github.com/finnle/ITS-2.0-Testsuite/ITS-2.0-Testsuite/its2.0/expected/idvalue/html/idvalue1htmloutput.txt
"Keeping ITS markup intact" during a conversion from XML to HTML is 
relatively simple: convert local its:* attributes to its-*. Jirka Kosek 
has created a set of stylesheets that implement round-tripping between 
HTML and XML:
https://github.com/kosek/html5-its-tools
Global rules need to be adapted manually. But this should not be a big 
issue: normally you will not create one rule set per XML document but 
rather one per XML format (or per template in a given format).
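
As a rough illustration of the local-attribute conversion (not a 
replacement for the stylesheets above), here is a minimal Python sketch. 
It assumes the general ITS 2.0 naming rule for HTML: prefix the local 
name with "its-" and replace each uppercase letter with "-" plus its 
lowercase form. The function names are hypothetical, and the sketch 
deliberately ignores the attributes that map to native HTML attributes 
(translate, dir, language) as well as global rules.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URI of ITS markup in XML documents.
ITS_NS = "http://www.w3.org/2005/11/its"


def html_name(local_name):
    """Map an ITS local name such as 'locNote' to 'its-loc-note'."""
    hyphenated = re.sub(r"[A-Z]", lambda m: "-" + m.group(0).lower(), local_name)
    return "its-" + hyphenated


def convert_local_its(root):
    """Rename every its:* attribute in the tree to its its-* form, in place.

    Assumption: attributes with native HTML counterparts (e.g. its:translate)
    are not special-cased here; a real converter would have to handle them.
    """
    prefix = "{" + ITS_NS + "}"
    for elem in root.iter():
        for key in list(elem.attrib):  # copy keys, since we mutate attrib
            if key.startswith(prefix):
                value = elem.attrib.pop(key)
                elem.set(html_name(key[len(prefix):]), value)
    return root
```

For example, an element carrying its:locNote="x" ends up with 
its-loc-note="x", while attributes outside the ITS namespace are left 
untouched.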

> Any non-element nodes that are somehow involved with ITS markup 
> (attributes selected via global rules, comments/PIs etc. pointed to 
> with *Pointers) are being converted into elements and pasted near the 
> nodes they represent.

In your case, you realize "near the nodes" as "converted into child 
elements". But nobody forces you to store the ITS information that way. 
You could also have a tool-specific mechanism that converts the ITS 
markup in the XML into ITS information in the HTML, serialized as 
something like this:

<html>...
  <script type="application/xml">
   <itsInfos>
    <itsInfo idref="#p1">
     <itsMetadata translate="yes" locNote="..:" ...></itsMetadata>
     <attribute name="id">
      <itsMetadata idValue="p1" translate="no"/>
     </attribute>
    </itsInfo>
   </itsInfos>
  </script>
</head>
<body>
<p id="p1">...</p>
</body></html>


The "p" element carries an "id" attribute. Inside "script" there is an 
"itsInfo" element whose "idref" attribute points to that id value. In 
this way the serialized ITS information is not inline, but "standoff". 
Note that this is a tool-specific standoff mechanism, not the ITS 2.0 
standoff markup used for provenance or localization quality issue. The 
advantage compared to your approach is that putting the serialized ITS 
information into "script" keeps the file valid HTML.
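
To make the idea more concrete, here is a minimal Python sketch of how a 
tool might generate such a standoff block. The element and attribute 
names (itsInfos, itsInfo, itsMetadata, idref) are taken from the example 
above; the function name and the shape of the input mapping are 
assumptions made only for this illustration, and per-attribute entries 
are left out for brevity.

```python
import xml.etree.ElementTree as ET


def build_its_infos(records):
    """Build a tool-specific <itsInfos> standoff element.

    `records` is a hypothetical mapping from an HTML element id to a dict
    of already computed ITS values (all strings), e.g.
    {"p1": {"translate": "yes"}}.
    """
    root = ET.Element("itsInfos")
    for elem_id, metadata in records.items():
        # Each itsInfo points back to its HTML element via "#<id>".
        info = ET.SubElement(root, "itsInfo", idref="#" + elem_id)
        ET.SubElement(info, "itsMetadata", dict(metadata))
    return root


infos = build_its_infos({"p1": {"translate": "yes"}})
print(ET.tostring(infos, encoding="unicode"))
```

The serialized result would then be placed inside the "script" element, 
so the ITS information never shows up as extra elements in the HTML body.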

> In the original document, comments/PIs and usually attributes don't 
> inherit ITS metadata. However, as elements these nodes /do/ inherit 
> ITS metadata. So in order to make the ITS metadata in the new document 
> completely match the original, I have to undo this inheritance. This 
> is possible for categories with default values (I can set dir="ltr", 
> or localeFilter="*") but not for ones that don't. I could set 
> locNote="", but that is not the same as the default (which is the 
> non-existence of locNote). The affected categories are: localization 
> note, language information, domain, provenance, localization quality 
> issue, localization quality rating, MT confidence, and allowed characters.
>
> *Desired ITS markup: *The markup that I really need to make this work 
> is a selective metadata reset. In other words, I need to be able to 
> say "don't inherit the following categories". Something like 
> its:reset="mtConfidence;locNote;language-information". It would also 
> be useful to have a shorthand to prevent inheritance of anything: 
> its:reset="all". A global version would be nice, too.


I think you can implement your use case in two ways:

1) When you convert to HTML, do not convert the ITS information (i.e. 
your processing instructions or comments) into HTML, but rather convert 
the ITS markup (see above). An ITS 2.0 processor that understands HTML 
will then process your files without any issues.

2) If 1) doesn't work for you and you need to serialize the ITS 
information created in XML within the HTML file, choose an approach that 
does not require embedding the ITS information directly inside the HTML 
content. Embedding the information via elements, by the way, might break 
not only ITS inheritance (as you describe), but also other tools that 
check HTML (validators) or process it (inside or outside the browser).

Best,

Felix

Received on Wednesday, 7 August 2013 07:54:13 UTC