
[whatwg] sic element, was: Re: Exposing spelling/grammar suggestions in contentEditable

From: timeless <timeless@gmail.com>
Date: Mon, 3 Jan 2011 13:42:09 +0200
Message-ID: <AANLkTikxmHJBZO+Vf4Y8Nn8R_otCALg_s5N64tM4hRms@mail.gmail.com>
On Mon, Jan 3, 2011 at 10:41 AM, Markus Ernst <derernst at gmx.ch> wrote:
>> Would search engines benefit from markup for this?
>
> They could actually benefit, if the correct spelling would be added in an
> attribute, so they could match the misspelled word with a correctly spelled
> search term; somehow like:
> <sic correct="choose">chuse</sic>

I doubt it. At this point search engines have better spell checking
and translation support than just about anything else, since they have
and use the largest corpora in the world.

http://www.google.com/search?q=chuse
http://www.usconstitution.net/constmiss.html

I can't get the cooler versions of this to work, but...

http://www.google.com/search?q=we+chuse
yields a search for "we choose", and completion offers "we choose to
be bounded".
"we chuse" suggests "we choose to go to the moon".

It's hard to do a useful search for this stuff: searching for "But in
chusing the President" or "But in choosing the President" generally
yields the same documents, e.g.
http://www.usconstitution.net/const.html

The reason of course is that the document has:
Article II - The Executive Branch
(... then from the five highest on the List the said House shall in
like Manner chuse the President. But in chusing the President, the
Votes shall be taken by States, ...) (This clause in parentheses was
superseded by the 12th Amendment.)

Amendment 12 - Choosing the President, Vice-President. Ratified
6/15/1804. Note History The Electoral College
... the House of Representatives shall choose immediately, by ballot,
the President. But in choosing the President, the votes shall be taken
by states, ...

I think search engines generally do not award points for attributes
or similar things, because on average they're used more by spammers
than by hammers (if everything you have looks like a nail).

Adding sic's is nowhere near as valuable or useful as adding
replacement language or footnotes or end paragraph notes. sic's merely
interrupt reading.

FWIW, this morning I spent some time searching for "serveral". I
eventually settled on something like:
http://www.google.com/codesearch?hl=en&start=20&sa=N&q=serveral+-serveralert+-serveralias+-serverallowsresponsecachingforrequest+-serveralreadyactive+-serveralreadyregistered+-openssh+-serverallowsspawn
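
A query like that gets tedious to assemble by hand. As a rough sketch
only (the helper names build_query and build_url are made up for
illustration, and the only thing assumed about the search engine is
that it takes a "q" parameter and treats a leading "-" as an
exclusion), it could be built like this:

```python
from urllib.parse import urlencode


def build_query(term, exclusions):
    """Join the search term with '-word' exclusions, space separated."""
    return " ".join([term] + ["-" + word for word in exclusions])


def build_url(base, term, exclusions):
    """Append the encoded query to a base search URL (hypothetical helper)."""
    # urlencode turns spaces into '+' and leaves '-' untouched.
    return base + "?" + urlencode({"q": build_query(term, exclusions)})


if __name__ == "__main__":
    print(build_url("http://www.google.com/codesearch",
                    "serveral",
                    ["serveralert", "serveralias"]))
```

The exclusion list still has to be discovered by trial and error, which
is the tedious part.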

That search isn't particularly useful. It turns out (using another
search engine) that there was a version of tracker which had a message
with this misspelling, but as of 0.7.25 only outdated translations
remember it:
/tracker-0.7.25/po/pt_BR.po
    line 2008 -- #~ "serveral minutes\n"
/tracker-0.7.25/po/sk.po
    line 1988 -- #~ "serveral minutes\n"
/tracker-0.7.25/po/sv.po
    line 2658 -- #~ "serveral minutes\n"
/tracker-0.7.25/po/be@latin.po
    line 1519 -- #~ "serveral minutes\n"
/tracker-0.7.25/po/ca.po
    line 1522 -- #~ "serveral minutes\n"
/tracker-0.7.25/po/ko.po
    line 739 -- "serveral minutes\n"
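
For what it's worth, a listing like the one above can be reproduced
locally without a search engine. The following is a minimal sketch,
not what I actually ran: find_misspelling and the sample text are
invented for illustration. It relies on two things: a whole-word match
avoids hits such as "serveralert", and gettext marks obsolete entries
with a leading "#~".

```python
import re


def find_misspelling(po_text, needle="serveral"):
    """Return (line_number, is_obsolete, line) for whole-word matches."""
    pattern = re.compile(r"\b" + re.escape(needle) + r"\b")
    hits = []
    for number, line in enumerate(po_text.splitlines(), start=1):
        if pattern.search(line):
            # gettext prefixes obsolete (no longer used) entries with "#~".
            obsolete = line.lstrip().startswith("#~")
            hits.append((number, obsolete, line.strip()))
    return hits


# Invented sample: one obsolete hit, one non-match, one active hit.
sample = '#~ "serveral minutes\\n"\nmsgid "serveralert"\n"serveral minutes\\n"\n'

if __name__ == "__main__":
    for number, obsolete, line in find_misspelling(sample):
        print(number, "obsolete" if obsolete else "active", line)
```

To scan a real tree, the same function could be fed the contents of
each .po file in turn.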

Now, you might want to find "several minutes" in tracker, but afaict,
that code doesn't exist, so you wouldn't have been searching for this
that way in the first place.
Received on Monday, 3 January 2011 03:42:09 UTC
