Re: ITS reset for inherited metadata

Thanks for the terminology fix (I've been wondering how to say that
properly)! Thanks also for the pointer to Kosek's converters. They do look
useful, however they aren't suitable for my application. Kosek's converters
are for converting between XHTML and HTML5, but I need to convert general
XML to valid HTML (for the purpose of later displaying ITS information with
the document). This means creating legal HTML representations of illegal
HTML that has ITS info attached (e.g. turning attributes into elements and
sticking them in the parent for display). The output HTML has to be
processable by general ITS/HTML tools (or at least the one we're making is
supposed to be general), so I can't use a homemade standoff markup. I'm
just documenting my solution as partial for now.
I'll see about adding this feature suggestion to the wiki.

Nathan



On Tue, Aug 6, 2013 at 11:53 PM, Felix Sasaki <fsasaki@w3.org> wrote:

>  Hi Nathan,
>
> thank you very much for your comment. A general remark: if you or others
> propose new features for ITS 2.x, I would suggest that you add them to the
> interest group wiki
> http://www.w3.org/International/its/wiki
> in order to do so you will need a w3c account, see
> http://www.w3.org/Help/Account/
>
> Some comments below.
>
> Am 06.08.13 20:51, schrieb Nathan Glenn:
>
> Hello,
> In the course of the WICS project I have found a need for a mechanism to
> undo metadata inheritance for a given element. I'd like to suggest it for
> the next version of ITS.
>
>  *Use case:* I am converting ITS-decorated XML into HTML while keeping
> ITS markup intact.
>
>
> I think the above sentence describes something other than what you are
> actually doing: what you describe below is not "keeping ITS markup intact",
> but keeping the ITS *information* intact (= a serialization of ITS
> information that has been created via local markup / global rules /
> inheritance defaults, like e.g. in the test suite
>
> https://github.com/finnle/ITS-2.0-Testsuite/ITS-2.0-Testsuite/its2.0/expected/idvalue/html/idvalue1htmloutput.txt
> ).
> "keeping ITS markup intact" during a conversion from XML to HTML is
> relatively simple: convert local its:* attributes to its-*. Jirka Kosek has
> created a set of stylesheets that realize roundtripping HTML <> XML
> https://github.com/kosek/html5-its-tools
> Global rules need to be adapted manually. But this should not be a big
> issue: normally you will not create one rule set per XML document but
> rather per XML format (or per template in a given format).
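
A minimal sketch of the "convert local its:* attributes to its-*" step, in Python with the standard library. The hyphenation rule (locNote becomes its-loc-note) is my reading of the ITS 2.0 HTML mapping; the function names and the NATIVE_HTML_ATTRS table are made up for illustration, and only translate is handled as a native HTML attribute here. Global rules are not touched.

```python
import re
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Assumption: ITS local attributes that HTML expresses natively.
# Only "translate" is listed; other native cases are out of scope here.
NATIVE_HTML_ATTRS = {"translate": "translate"}


def html_attr_name(local_name):
    """Map an ITS local attribute name to an HTML one, e.g. locNote -> its-loc-note."""
    if local_name in NATIVE_HTML_ATTRS:
        return NATIVE_HTML_ATTRS[local_name]
    hyphenated = re.sub(r"[A-Z]", lambda m: "-" + m.group(0).lower(), local_name)
    return "its-" + hyphenated


def convert_local_its_attributes(root):
    """Rename every attribute in the ITS namespace, in place, on all elements."""
    prefix = "{" + ITS_NS + "}"
    for elem in root.iter():
        # Copy the keys first because the attribute dict is modified in the loop.
        for name in list(elem.attrib):
            if name.startswith(prefix):
                value = elem.attrib.pop(name)
                elem.set(html_attr_name(name[len(prefix):]), value)


doc = ET.fromstring(
    '<doc xmlns:its="http://www.w3.org/2005/11/its">'
    '<p its:locNote="Check the tone" its:translate="no">text</p></doc>'
)
convert_local_its_attributes(doc)
converted_p = doc.find("p")
```

After the call, converted_p carries its-loc-note and translate instead of the namespaced attributes.
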
>
>
>  Any non-element nodes that are somehow involved with ITS markup
> (attributes selected via global rules, comments/PIs etc. pointed to with
> *Pointers) are being converted into elements and pasted near the nodes they
> represent.
>
>
> In your case, you realize "near the nodes" as "converted into child
> elements". But nobody forces you to store the ITS information that way. You
> could also have a tool-specific mechanism that converts the ITS markup in
> XML into ITS information in HTML, something like this:
>
> <html><head>...
>  <script type="application/xml">
>   <itsInfos>
>    <itsInfo idref="#p1">
>     <itsMetadata translate="yes" locNote="..:" ...></itsMetadata>
>     <attribute name="id">
>      <itsMetadata idValue="p1" translate="no"/>
>     </attribute>
>    </itsInfo>
>   </itsInfos>
>  </script>
> </head>
> <body>
> <p id="p1">...</p>
> </body></html>
>
>
> The "p" element contains an "id" attribute. Inside "script" there is an
> "itsInfo" element. Its "idref" attribute points to that id value. In this
> way the serialized ITS information is not inline, but "standoff". Note that
> this is a tool-specific standoff mechanism, not ITS 2.0 standoff like for
> provenance or localization quality issue. The advantage compared to your
> approach is that putting the serialized ITS information into "script" keeps
> the file valid HTML.
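
For reference, a rough Python sketch of how such a tool-specific standoff block could be generated. The element names (itsInfos, itsInfo, itsMetadata, attribute) are the ones from the example above; the function name and the shape of the input dictionary are assumptions of mine, not part of ITS 2.0 or of any existing tool.

```python
import xml.etree.ElementTree as ET


def build_standoff_script(info_by_id):
    """Build a <script type="application/xml"> element holding serialized ITS info.

    info_by_id maps an HTML id to a dict with:
      "element":    ITS metadata for the element itself (attribute name -> value)
      "attributes": optional mapping of attribute name -> ITS metadata dict
    This input layout is hypothetical and only serves the illustration.
    """
    script = ET.Element("script", {"type": "application/xml"})
    infos = ET.SubElement(script, "itsInfos")
    for elem_id, entry in info_by_id.items():
        # idref points at the id of the annotated HTML element.
        info = ET.SubElement(infos, "itsInfo", {"idref": "#" + elem_id})
        ET.SubElement(info, "itsMetadata", entry["element"])
        for attr_name, attr_meta in entry.get("attributes", {}).items():
            attr = ET.SubElement(info, "attribute", {"name": attr_name})
            ET.SubElement(attr, "itsMetadata", attr_meta)
    return script


standoff = build_standoff_script({
    "p1": {
        "element": {"translate": "yes"},
        "attributes": {"id": {"idValue": "p1", "translate": "no"}},
    }
})
serialized = ET.tostring(standoff, encoding="unicode")
```

The serialized string can then be placed in the head of the output document, leaving the body free of extra elements.
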
>
>
>  In the original document, comments/PIs and usually attributes don't
> inherit ITS metadata. However, as elements these nodes *do* inherit ITS
> metadata. So in order to make the ITS metadata in the new document
> completely match the original, I have to undo this inheritance. This is
> possible for categories with default values (I can set dir="ltr", or
> localeFilter="*") but not for ones that don't. I could set locNote="", but
> that is not the same as the default (which is the non-existence of
> locNote). The affected categories are: localization note, language
> information, domain, provenance, localization quality issue, localization
> quality rating, MT confidence, and allowed characters.
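
To make the distinction above concrete, here is a small Python sketch of the lookup a converter might use. The two explicit defaults and the list of categories without a default are taken from the paragraph above; the category labels, the function name, and the return shape are illustrative assumptions only.

```python
# Categories whose inheritance can be undone by writing the default explicitly
# (values as given in the message above).
EXPLICIT_DEFAULTS = {
    "directionality": {"dir": "ltr"},
    "locale filter": {"localeFilter": "*"},
}

# Categories whose default is "no value at all", so nothing can be written.
NO_DEFAULT_VALUE = {
    "localization note",
    "language information",
    "domain",
    "provenance",
    "localization quality issue",
    "localization quality rating",
    "MT confidence",
    "allowed characters",
}


def reset_attributes(inherited_categories):
    """Split inherited categories into attributes to write and categories left stuck."""
    resets = {}
    unresettable = []
    for category in inherited_categories:
        if category in EXPLICIT_DEFAULTS:
            resets.update(EXPLICIT_DEFAULTS[category])
        elif category in NO_DEFAULT_VALUE:
            unresettable.append(category)
        else:
            raise ValueError("unknown category: " + category)
    return resets, unresettable


resets, stuck = reset_attributes(
    ["directionality", "localization note", "locale filter"]
)
```

Here resets holds the attributes that can be written on the new element, and stuck lists the categories for which the inherited value cannot be undone today.
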
>
>  *Desired ITS markup:* The markup that I really need to make this work is
> a selective metadata reset. In other words, I need to be able to say "don't
> inherit the following categories". Something like
> its:reset="mtConfidence;locNote;language-information". It would also be
> useful to have a shorthand to prevent inheritance of anything:
> its:reset="all". A global version would be nice, too.
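
A sketch of how a processor might read the proposed attribute, in Python. Note that its:reset does not exist in ITS 2.0; the semicolon-separated syntax and the "all" shorthand come from the suggestion above, while the function name, the strictness about unknown tokens, and the set of known category tokens are assumptions.

```python
def parse_reset(value, known_categories):
    """Return the set of categories whose inheritance a (hypothetical) its:reset undoes."""
    tokens = [token.strip() for token in value.split(";") if token.strip()]
    if tokens == ["all"]:
        # Shorthand: prevent inheritance of every known category.
        return set(known_categories)
    unknown = [token for token in tokens if token not in known_categories]
    if unknown:
        raise ValueError("unknown categories in reset value: " + ", ".join(unknown))
    return set(tokens)


# Hypothetical token list; the real spelling of category names is undecided.
KNOWN = {"mtConfidence", "locNote", "language-information", "domain"}

selected = parse_reset("mtConfidence;locNote;language-information", KNOWN)
everything = parse_reset("all", KNOWN)
```

An element carrying the attribute would then start from "no value" for each category in the returned set instead of taking the value of its parent.
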
>
>
>
> I think you can implement your use case in two ways:
>
> 1) When you convert to HTML, do not convert the ITS information (i.e.
> your processing instructions or comments) into HTML, but rather convert the
> ITS markup (see above). An ITS 2 processor that understands HTML will then
> process your files without any issues.
>
> 2) If 1) doesn't work for you and you need to serialize the ITS
> information created in XML within the HTML file, choose an approach that
> does not require direct embedding of the ITS information inside HTML.
> Embedding the information via elements, btw., might not only break ITS
> inheritance (as you describe), but also other tools that check HTML
> (validators) or want to process it (inside or outside the browsers).
>
> Best,
>
> Felix
>

Received on Thursday, 8 August 2013 00:28:25 UTC