- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Thu, 3 May 2012 09:53:25 +1000
On Thu, May 3, 2012 at 2:59 AM, Charles Pritchard <chuck at jumis.com> wrote:
> There has been some discussion on the w3c/whatwg mailing lists about how far
> we can mark up content with linguistic tags, such as marking word and/or
> sentence boundaries.
>
> In my authoring of web apps, I often write a short manual into a hidden div,
> so that the vocabulary of my application can be processed by translation
> services such as Google translate. Having content in the DOM seems the most
> appropriate way to handle translation.
>
> I'd like the group to consider the costs/benefits/alternatives to a "lang-"
> attribute.
> Such as <span lang-role="sentence">This is a sentence.</span>
>
> The data- and aria- attributes have worked out well. We may want to make
> room for one more.
>
> Such a structure could be used to markup typical subject/object/verb and
> clause sections; it could also be used to markup poetic texts as well as
> defined meanings of content.
>
> http://www.omegawiki.org/Expression:orange
> This is an <span lang-meaning="DefinedMeaning:orange_(5821)">orange</span>.
> Now this, this is <span
> lang-meaning="DefinedMeaning:orange_(5822)">orange</span>.
>
> In most cases there's no need to define sentence boundary, meaning or
> otherwise. But, it'd sure be nice to have the ability to do so in a standard
> manner.
>
> I'd recommend role, meaning and prosody/pronunciation as the primary
> targets. Character markup may be something to consider as it's come up in
> SVG (rotate) and in CSS before. Doing a span for each character is not
> practical, so we'd want a shorthand much as SVG has shorthand for rotate.
>
> -Charles

Hi Charles,

In one of my companies, we've successfully used <span>, @class and
@data-xxx attributes to support linguistic markup. See
http://www.eopas.org/transcripts/70 for an example (you will need to
agree to a research license checkbox to link through).

Here's a markup excerpt:

  <div class="051-004_w morphemes tier">
    <span>
      <table class="word">
        <tbody><tr>
          <td colspan="1">
            <span class="concordance" data-addr="/p4/w1"
                  data-language-code="erk" data-search="Maarik"
                  data-type="word">
              Maarik
            </span>
          </td></tr><tr>
          <td class="morpheme">
            <span class="concordance" data-addr="/p4/w1/m1"
                  data-language-code="erk" data-search="maarik"
                  data-type="morpheme">
              maarik
            </span>
          </td>
        </tr>
        <tr>
          <td class="gloss">mister</td>
        </tr>
      </tbody></table>
    </span>

It supports multiple levels of linguistic semantic markup:

* phrase
* word
* morpheme
* gloss

If you wanted to make a standard for what levels should be marked up
in which way for linguistic data, you'd first have to get the
linguistic researchers to agree on the required feature-set. Then you
could standardise e.g. data-lang-xxx attributes - or even make up new
linguistic-xxx attributes.
http://www.whatwg.org/specs/web-apps/current-work/#extensibility
describes how to do that.

Hope this helps.

Cheers,
Silvia.
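
As an illustration only, markup like the excerpt above can be consumed by ordinary tooling without any new attribute family. The following is a minimal sketch, assuming Python's standard html.parser module. The names ConcordanceCollector and collect_concordances are hypothetical, the EXCERPT string is a shortened copy of the markup quoted in this message, and none of this is taken from the eopas code base.

```python
from html.parser import HTMLParser

# Shortened copy of the excerpt quoted in the message (illustrative input only).
EXCERPT = """
<div class="051-004_w morphemes tier">
  <span>
    <table class="word"><tbody>
      <tr><td colspan="1">
        <span class="concordance" data-addr="/p4/w1" data-language-code="erk"
              data-search="Maarik" data-type="word"> Maarik </span>
      </td></tr>
      <tr><td class="morpheme">
        <span class="concordance" data-addr="/p4/w1/m1" data-language-code="erk"
              data-search="maarik" data-type="morpheme"> maarik </span>
      </td></tr>
      <tr><td class="gloss">mister</td></tr>
    </tbody></table>
  </span>
"""


class ConcordanceCollector(HTMLParser):
    """Collects <span class="concordance"> elements and their data-* attributes.

    Hypothetical helper; assumes concordance spans do not contain nested spans,
    which holds for the excerpt but is not guaranteed for other documents.
    """

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None  # the concordance span being read, if any

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "span" and "concordance" in classes:
            self._current = {
                "type": attr.get("data-type"),
                "addr": attr.get("data-addr"),
                "language": attr.get("data-language-code"),
                "search": attr.get("data-search"),
                "text": "",
            }

    def handle_data(self, data):
        # Accumulate text only while inside a concordance span.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.items.append(self._current)
            self._current = None


def collect_concordances(html):
    """Return one dict per concordance span, in document order."""
    parser = ConcordanceCollector()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    for item in collect_concordances(EXCERPT):
        print(item["type"], item["addr"], item["language"], item["text"])
```

The same traversal would work unchanged if the attribute names were later standardised under a different prefix; only the keys looked up in handle_starttag would need to change.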
Received on Wednesday, 2 May 2012 16:53:25 UTC