- From: Ian Hickson <ian@hixie.ch>
- Date: Mon, 11 Jun 2012 21:58:49 +0000 (UTC)
- To: whatwg@lists.whatwg.org
- Message-ID: <Pine.LNX.4.64.1206112150120.378@ps20323.dreamhostps.com>
On Mon, 26 Mar 2012, Adam Barth wrote:
> On Mon, Mar 26, 2012 at 3:17 PM, Ian Hickson <ian@hixie.ch> wrote:
> > On Mon, 26 Mar 2012, Adam Barth wrote:
> >>
> >> WebKit recently implemented
> >> http://www.whatwg.org/specs/web-apps/current-work/#attr-translate,
> >> but that caused us to break orange.fr on mobile:
> >>
> >> https://bugs.webkit.org/show_bug.cgi?id=82246
> >>
> >> The problem is that
> >> http://www.winktoolkit.org/documentation/symbols/HTMLElement.html#translate
> >> has the following code:
> >>
> >> if (wink.isUndefined(HTMLElement.prototype.translate))
> >>     HTMLElement.prototype.translate = HTMLElement.prototype.winkTranslate;
> >>
> >> The web site expects HTMLElement.prototype.translate to be Wink's
> >> translate function rather than the HTML translate attribute.
> >>
> >> Would it make sense to change the name of the translate attribute to
> >> avoid this conflict? Should we try to evangelize the Wink Toolkit to
> >> change their code and everyone who uses Wink to update to the fixed
> >> version?
> >
> > How widely used is it? (In particular, how widely used is .translate()
> > rather than .winkTranslate()?)
>
> The documentation lists only .translate(), not .winkTranslate(), so I
> would expect most folks using the library to use the former rather than
> the latter.

On Mon, 26 Mar 2012, Edward O'Connor wrote:
> >
> > It would be unfortunate to have to reserve the use of a name as
> > generic as "translate" for a particular library.
>
> Indeed. That said, the name "translate" already means something in the
> platform -- it's used by CSS transforms and by the <canvas> 2D Context
> API. Wink's usage of the term matches the existing use of the term on
> the platform.
>
> Maybe we should rename the "is this element translatable or not"
> attribute to, say, "translatable".
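
To make the failure mode in the report above concrete, here is a minimal
sketch of why a guard of that shape stops installing the library's function
once the platform defines its own translate property. It is plain JavaScript
with no DOM, and every name in it (isUndefined, makePrototype, callTranslate,
the fake prototype object) is a hypothetical stand-in, not Wink's or WebKit's
actual code. It assumes only that the native property holds a boolean.

```javascript
// Hypothetical stand-in for the library's undefined check.
function isUndefined(value) {
  return typeof value === "undefined";
}

// Build a fake "HTMLElement.prototype". When hasNativeTranslate is true, the
// object already carries a boolean `translate`, standing in for a browser
// that implements the HTML translate attribute (an assumption of this sketch).
function makePrototype(hasNativeTranslate) {
  const proto = {
    winkTranslate(x, y) {
      return `moved to ${x},${y}`;
    },
  };
  if (hasNativeTranslate) {
    proto.translate = true;
  }
  // Same shape as the guard quoted above: install only if the name is free.
  if (isUndefined(proto.translate)) {
    proto.translate = proto.winkTranslate;
  }
  return proto;
}

// Call translate() the way a page written against the library would.
function callTranslate(proto) {
  const element = Object.create(proto);
  try {
    return element.translate(10, 20);
  } catch (err) {
    return err instanceof TypeError ? "TypeError" : "other error";
  }
}

const before = callTranslate(makePrototype(false));
const after = callTranslate(makePrototype(true));
console.log(before); // "moved to 10,20"
console.log(after);  // "TypeError"
```

With the name free, the guard installs the library function and the call
works. With a boolean already present, the guard is skipped and the page ends
up calling a non-function, which is the kind of breakage described for
orange.fr.
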
On Tue, 27 Mar 2012, jerome.giraud@orange.com wrote:
>
> We had already planned on finding a replacement for our HTML Element
> extensions and I think the current discussions will force us to speed
> things up, which is a good thing :)
>
> This was a "legacy" feature that we decided we should get rid of a long
> time ago for these obvious conflict reasons, though we had never
> imagined the "translate" would be used on HTMLElements in an i18n
> context (so +1 for Edward O'Connor's comment if I may)
>
> I already warned the persons in charge of the Orange portal and they
> will replace the HTMLElement.translate calls. We will warn our users and
> prepare the necessary changes for our next release.

Since you are on top of this I have not changed the attribute name in the
spec. Please do let me know if this ends up being a less tractable problem
than it currently appears.

On Wed, 2 May 2012, Charles Pritchard wrote:
>
> There has been some discussion on the w3c/whatwg mailing lists about how
> far we can mark up content with linguistic tags, such as marking word
> and/or sentence boundaries.
>
> In my authoring of web apps, I often write a short manual into a hidden
> div, so that the vocabulary of my application can be processed by
> translation services such as Google translate. Having content in the DOM
> seems the most appropriate way to handle translation.
>
> I'd like the group to consider the costs/benefits/alternatives to a
> "lang-" attribute.
> Such as <span lang-role="sentence">This is a sentence.</span>
>
> The data- and aria- attributes have worked out well. We may want to make
> room for one more.
>
> Such a structure could be used to markup typical subject/object/verb and
> clause sections; it could also be used to markup poetic texts as well as
> defined meanings of content.
>
> http://www.omegawiki.org/Expression:orange
> This is an <span lang-meaning="DefinedMeaning:orange_(5821)">orange</span>.
> Now this, this is <span
> lang-meaning="DefinedMeaning:orange_(5822)">orange</span>.
>
> In most cases there's no need to define sentence boundary, meaning or
> otherwise. But, it'd sure be nice to have the ability to do so in a
> standard manner.
>
> I'd recommend role, meaning and prosody/pronunciation as the primary
> targets. Character markup may be something to consider as it's come up
> in SVG (rotate) and in CSS before. Doing a span for each character is
> not practical, so we'd want a shorthand much as SVG has shorthand for
> rotate.

On Wed, 2 May 2012, Tab Atkins Jr. wrote:
>
> Do you expect outside services to do anything useful with this
> information? If not, the data-* attributes seem appropriate.
>
> If you do expect that, have you evaluated the existing mechanisms for
> embedding custom data in the page and found them wanting? If so, how?

On Wed, 2 May 2012, Charles Pritchard wrote:
>
> Yes, that's the primary reason. "services such as Google translate".
>
> 1. Google translate gets a little loose with some markup, to where the
> translated content may be placed outside the span tag.
>
> Such as: <div id="one">My potato is <span>hot</span></div>.
>
> 2. Some words can be ambiguous to the point that even a human reader may
> not know what the meaning is. It'd be great to have a mechanism to
> disambiguate.
>
> 3. Speech markup is cool, I like it, but we can have something a little
> lighter or even have some interplay with prosody.
> <span>You say <span>potato</span>, I say <span>potato</span></span>.
> (poteitoe, potahtoe)
>
> 4. CSS markup has come up a few times for sentence, word and character
> boundaries. Language is not static, it is very much human, and enabling
> humans to markup their language is what HTML is all about.
>
> I'll put some effort in later this week to dig up a few threads on the
> CSS requests.
>
> 5. Services should never touch data-*; I've had to put all my content
> into markup anyway.
> I've had to add id attributes so I can identify it
> when it's translated by the UA or other service. Since I've done all
> that work, it'd be really nice to have some more options to add in, such
> as disambiguation, part of speech and occasionally, pronunciation and
> translation suggestions.

On Wed, 2 May 2012, Benjamin Hawkes-Lewis wrote:
>
> I don't get how *any* of these are problems with the "existing
> mechanisms for embedding custom data".
>
> 1. New features won't fix Google Translate bugs with existing features,
> and it's more efficient for Google to fix Translate than for the
> community to design, specify, and implement new features.
>
> 2, 3, and 4: Given an appropriate vocabulary, existing mechanisms can
> encode unambiguous meanings, information about how text should be
> spoken, and phrase and sentence boundaries. Unicode describes character
> boundaries.
>
> 5. Tab isn't talking about "data-" here, but about all the various
> mechanisms available to provide custom data for services to consume
> (e.g. microdata, microformats, RDFa).

On Wed, 2 May 2012, Charles Pritchard wrote:
>
> New features do allow services to coalesce around standards. That's what
> the standards are here for. HTML5 just added a translate attribute.
>
> Span does not in and of itself signify any semantic meaning. Doesn't
> that mean that Google Translate is operating correctly?
>
> [...]
>
> Boris brought up that the concept of letter could use some attention:
> http://lists.w3.org/Archives/Public/www-style/2011Nov/0055.html
>
> Yes, we have existing XML mechanisms for how text should be spoken.
>
> What existing mechanism do we have for disambiguation?
>
> [...]
>
> Tab asked directly why data- does not work
>
> Yes, we have a lot of microformats, it's true. And RDFa.
>
> They don't seem to be taking flight for these issues, and language
> translation seems like a high level issue appropriate for HTML.
> Again, look at the translate and lang attributes; those are baked into HTML.
>
> I am approaching the "lang-" proposal as language agnostic, much as
> "aria-" is language agnostic.
>
> This seems to be where we are currently:
> <img lang="es" translate="no" alt="No" />
>
> With alt having ARIA counterparts.
>
> I'm suggesting a "lang-" with counterparts to translate, language code,
> and a vastly enhanced vocabulary, much as ARIA vastly enhanced the UI
> vocabulary. I think it could help in the long run.

On Wed, 2 May 2012, Benjamin Hawkes-Lewis wrote:
> [...]
>
> Moving text in or out of an element that "mean[s] something on its own"
> (as the spec puts it) has potential to break things. But that's also
> true, if less so, for an element that "doesn't mean anything on its
> own". There might be code (clientside JS, CSS selectors, XPointer URIs,
> automation scripts, whatever) that depends on that text being inside or
> outside that element at that position in the DOM.
>
> That's not to say that Google Translate is operating incorrectly.
> Translation inevitably changes the DOM. Text node contents change of
> course. Because different languages may express the same ideas in
> different orders, DOM nodes may need to be reordered. Because different
> languages have different practices around compounding or implying ideas
> with different numbers of words, what might be a separate word in a
> separate element in one language might need to be merged into another
> word outside the element, or vice versa. It's not obvious that there is
> a correct behavior here, and I struggle to see how the markup examples
> you proposed would help. (Perhaps you could elaborate?) Researching and
> recommending authoring practices that make translation less likely to
> break code might be a more immediately fruitful line of enquiry, and
> might help inform the ultimate creation of a vocabulary fit for purpose.
>
> But more importantly, assuming such a vocabulary could be created, this
> is not a reason why it could not be embedded using the existing
> mechanisms. The HTML specification is not the only source of
> standardized vocabulary on the web.
>
> [...]
>
> 1. If you're only using the data yourself, why not data-?
>
> 2. If you want other people to use the data, why not the other
> mechanisms for custom data embedding?
>
> Your 5 points appeared to be in answer to his second question, because
> you placed them as a list in response to it.
>
> [...]
>
> That's just you choosing to use something _other_ than the existing
> mechanisms; it's not a reason why you could not use them.
>
> I'm baffled why you think defining an RDF vocabulary then requiring host
> languages to closely couple their specs to your spec with a set of
> arbitrary and confusing syntactical and behavioural requirements is
> preferable to just defining a vocabulary and letting host languages
> embed it however they like. I would certainly caution against further
> integrations with HTML along the ARIA model, having seen the pain it's
> caused.
>
> I'd suggest instead that the small number of authors interested in this
> markup get together and use and develop vocabularies that can be
> embedded in HTML or XML using microdata or RDFa. You will probably make
> lots of mistakes and learn a lot along the way. If at the end of the
> day, you've got robust vocabularies that solve problems for more authors
> and see non-microscopic levels of adoption, then they could be pulled
> into the mainstream language just as class="nav" got pulled in as <nav>
> and class="datetime" got pulled in as <time>.
>
> Proposing that we conjure such a vocabulary out of the air to solve a
> wide set of mostly unanalysed problems in the absence of documented
> workarounds and then reify that vocabulary in a load of specific
> features seems to me to put the cart way before the horse.

On Thu, 3 May 2012, Silvia Pfeiffer wrote:
>
> In one of my companies, we've successfully used <span>, @class and
> @data-xxx attributes to support linguistic markup. See
> http://www.eopas.org/transcripts/70 for an example (you will need to
> agree to a research license checkbox to link through).
>
> Here's a markup excerpt:
>
> <div class="051-004_w morphemes tier">
>   <span>
>     <table class="word">
>       <tbody><tr>
>         <td colspan="1">
>           <span class="concordance" data-addr="/p4/w1" data-language-code="erk"
>                 data-search="Maarik" data-type="word">
>             Maarik
>           </span>
>         </td></tr><tr>
>         <td class="morpheme">
>           <span class="concordance" data-addr="/p4/w1/m1"
>                 data-language-code="erk" data-search="maarik" data-type="morpheme">
>             maarik
>           </span>
>         </td>
>       </tr>
>       <tr>
>         <td class="gloss">mister</td>
>       </tr>
>     </tbody></table>
>   </span>
>
> It supports multiple levels of linguistic semantic markup:
> * phrase
> * word
> * morpheme
> * gloss
>
> If you wanted to make a standard for what levels should be marked up in
> which way for linguistic data, you'd first have to get the linguistic
> researchers to agree on the required feature-set. Then you could
> standardise e.g. data-lang-xxx attributes - or even make up new
> linguistic-xxx attributes.
> http://www.whatwg.org/specs/web-apps/current-work/#extensibility
> describes how to do that.

Given the existence of solutions that can address this already, I haven't
added anything to the spec to support it. If it turns out that a lot of
people do this, then it would make sense to examine whether we should have
dedicated markup for it.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 11 June 2012 21:59:19 UTC