- From: Vipul S. Chawathe <Engineer@VipulSChawathe.ind.in>
- Date: Sat, 12 Jan 2013 02:23:18 +0530
- To: "'Ian Hickson'" <ian@hixie.ch>
- Cc: whatwg@lists.whatwg.org
>From: Ian Hickson [mailto:ian@hixie.ch]
>
>On Thu, 10 Jan 2013, Thomas A. Fine wrote:
>>
>> Use Cases:
>> 4. Clarifying sentence boundaries would be an aid in machine
>> translation software.
>
>Do you have any evidence supporting this? I've spoken with engineers who
>work on machine translation software and while they've certainly had
>requests (whence the "translate" attribute), they've never asked for a way
>to mark up sentences.

I'm doing some related work that requires machine translation, along the lines of exporting and importing HTML snippets. At the sentence level, the boundaries of human-language content are determined directly by the author's skill with grammatical punctuation. HTML is tied up with everything to do with GUI web browsers, so machine translation, screen readers, and so forth are supported through other "living" standards (GRDDL, XSLT, RDFa) that also work with HTML as one of several possible hosts. However, their dependence on XML serialization for proper functioning might lead browser engine makers to promote sticking to microdata, unless someday we get a Google SilverFlash.java Safari plug-in so that one size fits all.

As HTML is a host language in widespread use (my apologies for lacking statistics, which I compensate for by deriving statements from common sense), perhaps this is a starting point for raising concerns that may be redirected into other specs too. It's the only opening for those rare use cases, as in the story of the Emperor's New Clothes.

Getting back to business: for larger content fragments there's the p element. An immediate example is search results, whose content previews cut off at abrupt fragments. To improve on such fragment indices they've come up with the schema.org vocabulary, which I just had to mention here.
They have a provision for specializing their general predefined types, so Thing>WebPageElement can be extended to get Thing>WebPageElement>Paragraph>Sentence.

This can be expressed using the HTML5 microdata itemtype attribute as:

<span itemscope="itemscope" itemtype="http://www.schema.org/thing/webpage/webpageelement/paragraph/sentence">One whole sentence!</span>

HTML5 without XML serialization also allows skipping ="itemscope"! That saves 12 characters, a saving comparable to those recommended by minifying. :-)
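
To illustrate how a consumer such as a machine translation preprocessor might pick up these sentence boundaries, here is a minimal Python sketch using only the standard library. The itemtype URL is the one proposed in this message and is assumed here for illustration, not checked against the published schema.org vocabulary; the names SentenceExtractor and extract_sentences are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: the type URL proposed in this message, used for illustration only.
SENTENCE_TYPE = (
    "http://www.schema.org/thing/webpage/webpageelement/paragraph/sentence"
)


class SentenceExtractor(HTMLParser):
    """Collect the text of every element marked as a microdata sentence item."""

    def __init__(self):
        super().__init__()
        self.sentences = []  # finished sentences, in document order
        self._tag = None     # tag name of the sentence element being read
        self._depth = 0      # nesting depth of that tag name; 0 means outside
        self._buffer = []    # text fragments of the current sentence

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Count only same-named tags so void elements such as <br>
            # inside a sentence do not unbalance the depth.
            if tag == self._tag:
                self._depth += 1
            return
        attr_map = dict(attrs)
        # Both itemscope and itemscope="itemscope" appear as a key here.
        if "itemscope" in attr_map and attr_map.get("itemtype") == SENTENCE_TYPE:
            self._tag = tag
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self.sentences.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_sentences(html):
    """Return the text of each marked sentence found in the given HTML string."""
    parser = SentenceExtractor()
    parser.feed(html)
    parser.close()
    return parser.sentences


if __name__ == "__main__":
    snippet = (
        '<p><span itemscope itemtype="' + SENTENCE_TYPE + '">'
        "One whole sentence!</span> unmarked text</p>"
    )
    print(extract_sentences(snippet))  # ['One whole sentence!']
```

The sketch accepts both the long form itemscope="itemscope" and the bare itemscope attribute, since the HTML serialization allows either.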
Received on Friday, 11 January 2013 20:52:19 UTC