
Re: [whatwg] Sentence structure

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 11 Jan 2013 22:59:41 +0000 (UTC)
To: "Vipul S. Chawathe" <Engineer@VipulSChawathe.ind.in>
Message-ID: <Pine.LNX.4.64.1301112252550.2101@ps20323.dreamhostps.com>
Cc: whatwg@lists.whatwg.org

On Sat, 12 Jan 2013, Vipul S. Chawathe wrote:
> 
> I'm doing some related work that requires machine translation on the lines
> of export/import HTML snippets. Human language content boundaries are
> directly determined by author's grammatical punctuation skills at the
> sentence level.

Sure, but if the author isn't competent enough to use punctuation, I think 
we're probably not going to be able to rely on them using <sentence> 
correctly either, at the end of the day.


> HTML is everything to-do tied-up with GUI web-browsers, so machine 
> translation, screen readers, & so forth are supported through other 
> "living" standards GRDDL XSLT RDFa that also work with HTML as one of 
> multiple possible host, however their relationship with XML 
> serialization as dependency for proper functioning might cause browser 
> engine makers to promote sticking to microdata, unless someday we get 
> Google SilverFlash.java Safari plug-in so that one size will fit all. As 
> HTML is host language in wide-spread use (my apologies for lacking 
> statistics that I compensate by deriving statements from common sense), 
> perhaps this is starting point for raising concerns that may be 
> redirected into other specs too. It's the only opening for those rare 
> use cases as the story of Emperor's New Clothes.
> Getting back to business, for larger content fragments there's the p 
> element. An immediate citation is search results cut-off abrupt 
> fragments in content preview. For improvising on such fragment indices 
> they've come up with schema.org vocab which I just had to remind here. 
> They've got provision to specialize from their general pre-defined 
> types, so Thing>WebPageElement can be used to get 
> Thing>WebPageElement>Paragraph>Sentence This can be expressed using 
> html5 microdata itemtype attribute as: <span itemscope="itemscope" 
> itemtype="http://www.schema.org/thing/webpage/webpageelement/paragraph/sentence">One
> whole sentence!</span> HTML5 without XML serialization will 
> allow to skip ="itemscope" too! saves 12 characters, savings comparable 
> to those recommended by minifying. :-)

I'm sorry, but I've no idea what you're saying here.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 11 January 2013 23:00:06 GMT
