- From: Felix Sasaki <fsasaki@w3.org>
- Date: Wed, 07 Aug 2013 08:53:55 +0200
- To: Nathan Glenn <garfieldnate@gmail.com>
- CC: public-i18n-its-ig@w3.org
- Message-ID: <5201EF03.5030107@w3.org>
Hi Nathan,

thank you very much for your comment. A general remark: if you or others propose new features for ITS 2.x, I would suggest that you add them to the interest group wiki
http://www.w3.org/International/its/wiki
To do so you will need a W3C account, see
http://www.w3.org/Help/Account/

Some comments below.

On 06.08.13 20:51, Nathan Glenn wrote:
> Hello,
> In the course of the WICS project I have found need for a mechanism to
> undo metadata inheritance for a given element. I'd like to suggest it
> for the next version of ITS.
>
> *Use case:* I am converting ITS-decorated XML into HTML while keeping
> ITS markup intact.

I think the above sentence describes something other than what you are actually doing: what you describe below is not "keeping ITS markup intact", but keeping the ITS *information* intact (= a serialization of ITS information that has been created via local markup / global rules / inheritance defaults, as e.g. in the test suite
https://github.com/finnle/ITS-2.0-Testsuite/ITS-2.0-Testsuite/its2.0/expected/idvalue/html/idvalue1htmloutput.txt
).

"Keeping ITS markup intact" during a conversion from XML to HTML is relatively simple: convert local its:* attributes to its-*. Jirka Kosek has created a set of stylesheets that realize roundtripping HTML <> XML
https://github.com/kosek/html5-its-tools
Global rules need to be adapted manually. But this should not be a big issue: normally you will not create one rule set per XML document but rather per XML format (or per template in a given format).

> Any non-element nodes that are somehow involved with ITS markup
> (attributes selected via global rules, comments/PIs etc. pointed to
> with *Pointers) are being converted into elements and pasted near the
> nodes they represent.

In your case, you realize "near the nodes" as "converted into child elements". But nobody forces you to store the ITS information that way.
You could also have a tool-specific mechanism that converts the ITS markup from XML into ITS information in HTML, something like this:

<html>...
<script type="application/xml">
  <itsInfos>
    <itsInfo idref="#p1">
      <itsMetadata translate="yes" locNote="..:" ...></itsMetadata>
      <attribute name="id">
        <itsMetadata idValue="p1" translate="no"/>
      </attribute>
    </itsInfo>
  </itsInfos>
</script>
</head>
<body>
  <p id="p1">...</p>
</body></html>

The "p" element contains an "id" attribute. Inside "script" there is an "itsInfo" element. Its "idref" attribute points to that id value. In this way the serialized ITS information is not inline, but "standoff". Note that this is a tool-specific standoff mechanism, not the ITS 2.0 standoff markup used for provenance or localization quality issue. The advantage compared to your approach is that putting the serialized ITS information into "script" keeps the file valid HTML.

> In the original document, comments/PIs and usually attributes don't
> inherit ITS metadata. However, as elements these nodes /do/ inherit
> ITS metadata. So in order to make the ITS metadata in the new document
> completely match the original, I have to undo this inheritance. This
> is possible for categories with default values (I can set dir="ltr",
> or localeFilter="*") but not for ones that don't. I could set
> locNote="", but that is not the same as the default (which is the
> non-existence of locNote). The affected categories are: localization
> note, language information, domain, provenance, localization quality
> issue, localization quality rating, MT confidence, and allowed characters.
>
> *Desired ITS markup: *The markup that I really need to make this work
> is a selective metadata reset. In other words, I need to be able to
> say "don't inherit the following categories". Something like
> its:reset="mtConfidence;locNote;language-information". It would also
> be useful to have a shorthand to prevent inheritance of anything:
> its:reset="all".
> A global version would be nice, too.

I think you can implement your use case in two ways:

1) When you convert to HTML, do not convert the ITS information (i.e. your processing instructions or comments) into HTML, but rather convert the ITS markup (see above). An ITS 2 processor that understands HTML will then process your files without any issues.

2) If 1) doesn't work for you and you need to serialize the ITS information created in XML within the HTML file, choose an approach that does not require direct embedding of the ITS information inside HTML. Embedding the information via elements, btw., might not only break ITS inheritance (as you describe), but also other tools that check HTML (validators) or want to process it (inside or outside the browsers).

Best,

Felix
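
P.S. As a rough illustration of option 1), the renaming of local its:* attributes to their its-* HTML counterparts could be sketched in Python as below. This is only a sketch under assumptions, not the approach used by the stylesheets mentioned above: the function names are made up for illustration, the naming rule (prefix "its-", each uppercase letter replaced by a hyphen plus its lowercase form) is my reading of the ITS 2.0 mapping of local data categories to HTML, "translate" is assumed to map to the native HTML attribute, and directionality is deliberately left unsupported because its values do not all carry over to HTML.

```python
import re

# ITS namespace URI, as used for its:* attributes in XML.
ITS_NS = "http://www.w3.org/2005/11/its"


def its_xml_to_html_attr(local_name):
    """Map the local name of an its:* attribute to an HTML attribute name.

    Hypothetical helper: 'locNote' becomes 'its-loc-note'.
    """
    if local_name == "translate":
        # Assumption: HTML's native 'translate' attribute is used instead.
        return "translate"
    if local_name == "dir":
        # Not handled in this sketch; needs a value-aware conversion.
        raise ValueError("directionality is not covered by this sketch")
    # Prefix with 'its-' and turn each uppercase letter into '-' + lowercase.
    hyphenated = re.sub(r"[A-Z]", lambda m: "-" + m.group(0).lower(), local_name)
    return "its-" + hyphenated


def convert_attributes(attrs):
    """Rename its:* attributes in a dict keyed by '{namespace}local' names.

    Attributes outside the ITS namespace are copied unchanged.
    """
    prefix = "{" + ITS_NS + "}"
    converted = {}
    for name, value in attrs.items():
        if name.startswith(prefix):
            converted[its_xml_to_html_attr(name[len(prefix):])] = value
        else:
            converted[name] = value
    return converted
```

The "{namespace}local" key form matches what xml.etree.ElementTree exposes in an element's attrib mapping, so convert_attributes(element.attrib) should work on a parsed XML element. Global rules are not touched by this and still need to be adapted by hand.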
Received on Wednesday, 7 August 2013 07:54:13 UTC