- From: Felix Sasaki <fsasaki@w3.org>
- Date: Thu, 08 Aug 2013 09:16:51 +0200
- To: Nathan Glenn <garfieldnate@gmail.com>
- CC: public-i18n-its-ig@w3.org
- Message-ID: <520345E3.3000901@w3.org>
Hi Nathan,

On 08.08.13 02:27, Nathan Glenn wrote:
> Thanks for the terminology fix (I've been wondering how to say that
> properly)! Thanks also for the pointer to Kosek's converters. They do
> look useful, however they aren't suitable for my application. Kosek's
> converters are for converting between XHTML and HTML5, but I need to
> convert general XML to valid HTML (for the purpose of later displaying
> ITS information with the document).

Jirka's converter is available for general XML as part of the DocBook
stylesheets; see the documentation at
http://xmlguru.cz/2013/05/docbook-and-its2
and the relevant stylesheet at
http://snapshots.docbook.org/xsl-ns/html/its.xsl

So I think you can use that in an XSLT transformation to implement your
use case - without creating invalid HTML.

> This means creating legal HTML representations of illegal HTML that
> has ITS info attached (e.g. turning attributes into elements and
> sticking them in the parent for display). The output HTML has to be
> processable by general ITS/HTML tools (or at least the one we're
> making is supposed to be general), so I can't use a homemade standoff
> markup.

I think the requirement "The output HTML has to be processable by
general ITS/HTML tools" is easier to accommodate by using a general XML
converter than by changing the ITS 2.x spec itself. And again: your
approach of "turning attributes into elements and sticking them in the
parent for display" is just one tool-specific approach to "flatten" ITS
information - there may be many others.

> I'm just documenting my solution as partial for now.
> I'll see about adding this feature suggestion to the wiki.

Thanks. When you do that, could you add a link to this thread
http://lists.w3.org/Archives/Public/public-i18n-its-ig/2013Aug/0001.html
?

Best,

Felix

> Nathan
>
> On Tue, Aug 6, 2013 at 11:53 PM, Felix Sasaki <fsasaki@w3.org> wrote:
>
> > Hi Nathan,
> >
> > thank you very much for your comment. A general remark: if you or
> > others propose new features for ITS 2.x, I would suggest that you
> > add them to the interest group wiki
> > http://www.w3.org/International/its/wiki
> > To do so you will need a W3C account, see
> > http://www.w3.org/Help/Account/
> >
> > Some comments below.
> >
> > On 06.08.13 20:51, Nathan Glenn wrote:
> > > Hello,
> > > In the course of the WICS project I have found need for a
> > > mechanism to undo metadata inheritance for a given element. I'd
> > > like to suggest it for the next version of ITS.
> > >
> > > *Use case:* I am converting ITS-decorated XML into HTML while
> > > keeping ITS markup intact.
> >
> > I think the sentence above describes something other than what you
> > are actually doing: what you describe below is not "keeping ITS
> > markup intact", but keeping the ITS *information* intact (= a
> > serialization of ITS information that has been created via local
> > markup / global rules / inheritance defaults, like e.g. in the test
> > suite
> > https://github.com/finnle/ITS-2.0-Testsuite/ITS-2.0-Testsuite/its2.0/expected/idvalue/html/idvalue1htmloutput.txt
> > ).
> > "Keeping ITS markup intact" during a conversion from XML to HTML
> > is relatively simple: convert local its:* attributes to its-*.
> > Jirka Kosek has created a set of stylesheets that realize
> > roundtripping HTML <> XML
> > https://github.com/kosek/html5-its-tools
> > Global rules need to be adapted manually. But this should not be a
> > big issue: normally you will not create one rule set per XML
> > document but rather per XML format (or per template in a given
> > format).
> >
> > > Any non-element nodes that are somehow involved with ITS markup
> > > (attributes selected via global rules, comments/PIs etc. pointed
> > > to with *Pointers) are being converted into elements and pasted
> > > near the nodes they represent.
> >
> > In your case, you realize "near the nodes" as "converted into
> > child elements". But nobody forces you to store the ITS
> > information that way. You could also have a tool-specific
> > mechanism that converts the ITS markup in XML into ITS
> > information in HTML, something like this:
> >
> > <html>...
> > <script type="application/xml">
> >   <itsInfos>
> >     <itsInfo idref="#p1">
> >       <itsMetadata translate="yes" locNote="..:" ...></itsMetadata>
> >       <attribute name="id">
> >         <itsMetadata idValue="p1" translate="no"/>
> >       </attribute>
> >     </itsInfo>
> >   </itsInfos>
> > </script>
> > </head>
> > <body>
> > <p id="p1">...</p>
> > </body></html>
> >
> > The "p" element contains an "id" attribute. Inside "script" there
> > is an "itsInfo" element. Its "idref" attribute points to that "id"
> > value. In this way the serialized ITS information is not inline,
> > but "standoff". Note that this is a tool-specific standoff
> > mechanism, not ITS 2.0 standoff like for provenance or
> > localization quality issue. The advantage compared to your
> > approach is that putting the serialized ITS information into
> > "script" keeps the file valid HTML.
> >
> > > In the original document, comments/PIs and usually attributes
> > > don't inherit ITS metadata. However, as elements these nodes /do/
> > > inherit ITS metadata. So in order to make the ITS metadata in the
> > > new document completely match the original, I have to undo this
> > > inheritance. This is possible for categories with default values
> > > (I can set dir="ltr", or localeFilter="*") but not for ones that
> > > don't. I could set locNote="", but that is not the same as the
> > > default (which is the non-existence of locNote). The affected
> > > categories are: localization note, language information, domain,
> > > provenance, localization quality issue, localization quality
> > > rating, MT confidence, and allowed characters.
> > >
> > > *Desired ITS markup:* The markup that I really need to make this
> > > work is a selective metadata reset. In other words, I need to be
> > > able to say "don't inherit the following categories". Something
> > > like its:reset="mtConfidence;locNote;language-information". It
> > > would also be useful to have a shorthand to prevent inheritance
> > > of anything: its:reset="all". A global version would be nice, too.
> >
> > I think you can implement your use case in two ways:
> >
> > 1) When you convert to HTML, do not convert the ITS information
> > (i.e. your processing instructions or comments) into HTML, but
> > rather convert the ITS markup (see above). An ITS 2 processor that
> > understands HTML will then process your files without any issues.
> >
> > 2) If 1) doesn't work for you and you need to serialize the ITS
> > information created in XML within the HTML file, choose an
> > approach that does not require direct embedding of the ITS
> > information inside HTML. Embedding the information via elements,
> > btw., might not only break ITS inheritance (as you describe), but
> > also other tools that check HTML (validators) or want to process
> > it (inside or outside the browsers).
> >
> > Best,
> >
> > Felix
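
Illustration (not part of the original message): the markup conversion
mentioned in the thread, "convert local its:* attributes to its-*", can be
sketched in a few lines of Python. This is a minimal sketch under stated
assumptions, not the code of Jirka Kosek's stylesheets. The function names
are hypothetical, and the hyphenated lowercase naming (its:locNote becoming
its-loc-note) reflects a reading of the ITS 2.0 HTML attribute naming rule
that the thread itself does not spell out. Categories that HTML expresses
with native attributes (for example translate) and global rules are not
handled here.

```python
import re
import xml.etree.ElementTree as ET

# ITS namespace URI as used for the its: prefix in XML documents.
ITS_NS = "http://www.w3.org/2005/11/its"


def html_attr_name(local_name):
    """Map an ITS local attribute name to an its-* name (hypothetical helper).

    Assumption: each uppercase letter becomes '-' plus its lowercase form,
    so 'locNote' becomes 'its-loc-note'.
    """
    hyphenated = re.sub(r"[A-Z]", lambda m: "-" + m.group(0).lower(), local_name)
    return "its-" + hyphenated


def convert_local_its_attributes(root):
    """Rename every its:* attribute in the tree to its its-* counterpart.

    Only local attributes are touched; elements in the ITS namespace
    (such as global rules) are left as they are.
    """
    prefix = "{" + ITS_NS + "}"
    for element in root.iter():
        for name in list(element.attrib):
            if name.startswith(prefix):
                value = element.attrib.pop(name)
                element.set(html_attr_name(name[len(prefix):]), value)
    return root


if __name__ == "__main__":
    source = (
        '<doc xmlns:its="http://www.w3.org/2005/11/its">'
        '<p its:locNote="Check this" its:locNoteType="alert">Text</p>'
        "</doc>"
    )
    tree = convert_local_its_attributes(ET.fromstring(source))
    print(ET.tostring(tree, encoding="unicode"))
```

Renaming attributes in place leaves the element structure unchanged, which
is the point made in the thread: no extra child elements are introduced, so
inheritance in the converted document is not disturbed by the conversion.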
Received on Thursday, 8 August 2013 07:17:22 UTC