- From: timeless <timeless@gmail.com>
- Date: Mon, 3 Jan 2011 13:42:09 +0200
On Mon, Jan 3, 2011 at 10:41 AM, Markus Ernst <derernst at gmx.ch> wrote:
>> Would search engines benefit from markup for this?
>
> They could actually benefit, if the correct spelling would be added in an
> attribute, so they could match the misspelled word with a correctly spelled
> search term; somehow like:
> <sic correct="choose">chuse</sic>

I doubt it. At this point search engines have better spell checking and
translation support than just about anything else, since they have and use
the largest corpora in the world.

http://www.google.com/search?q=chuse
http://www.usconstitution.net/constmiss.html

I can't get the cooler versions of this to work, but...
http://www.google.com/search?q=we+chuse
yields a search for "we choose", and completion offers:
we choose to be bounded
"we chuse" suggests "we choose to go to the moon"

It's hard to do a useful search for this stuff: searching for
"But in chusing the President" or "But in choosing the President"
generally yields the same documents, e.g.
http://www.usconstitution.net/const.html

The reason of course is that the document has:

Article II - The Executive Branch
(... then from the five highest on the List the said House shall in like
Manner chuse the President. But in chusing the President, the Votes shall
be taken by States, ...)
(This clause in parentheses was superseded by the 12th Amendment.)

Amendment 12 - Choosing the President, Vice-President. Ratified 6/15/1804.
Note History The Electoral College
... the House of Representatives shall choose immediately, by ballot, the
President. But in choosing the President, the votes shall be taken by
states, ...

I think in general search engines do not award points for attributes or
similar things, because on average they're used more by spammers than by
hammers (if everything you have looks like a nail).

Adding sic's is nowhere near as valuable or useful as adding replacement
language or footnotes or end paragraph notes. sic's merely interrupt
reading.
FWIW, this morning I spent some time searching for "serveral"; I eventually
settled on something like:
http://www.google.com/codesearch?hl=en&start=20&sa=N&q=serveral+-serveralert+-serveralias+-serverallowsresponsecachingforrequest+-serveralreadyactive+-serveralreadyregistered+-openssh+-serverallowsspawn

It isn't particularly useful. It turns out (using another search engine)
that there was a version of tracker which had a message with this
misspelled, but as of 0.7.25 only outdated translations remember it:

/tracker-0.7.25/po/pt_BR.po line 2008 -- #~ "serveral minutes\n"
/tracker-0.7.25/po/sk.po line 1988 -- #~ "serveral minutes\n"
/tracker-0.7.25/po/sv.po line 2658 -- #~ "serveral minutes\n"
/tracker-0.7.25/po/be@latin.po line 1519 -- #~ "serveral minutes\n"
/tracker-0.7.25/po/ca.po line 1522 -- #~ "serveral minutes\n"
/tracker-0.7.25/po/ko.po line 739 -- "serveral minutes\n"

Now, you might want to find "several minutes" in tracker, but afaict, that
code doesn't exist, so you wouldn't have been searching for this that way
in the first place.
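
As an aside, the long run of -serveralert -serveralias ... exclusions in the
query above is there to work around substring matching. With a local copy of
the files, a whole-word match gets roughly the same result without the
exclusion list. A minimal sketch, assuming Python; the helper name
find_misspelling and the sample lines are made up for illustration and are not
part of the original search:

```python
import re

# Match "serveral" only as a whole word, so identifiers such as
# "ServerAlias" or "serverAlreadyActive" are not reported.
# (Assumption: the pattern and helper below are illustrative only.)
MISSPELLING = re.compile(r"\bserveral\b", re.IGNORECASE)


def find_misspelling(lines):
    """Return (line_number, text) pairs for lines containing the whole word."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if MISSPELLING.search(line)
    ]


# Hypothetical sample input: one stale translation entry, two identifiers.
sample = [
    '#~ "serveral minutes\\n"\n',
    "ServerAlias www.example.org\n",
    "if (serverAlreadyActive) return;\n",
]

hits = find_misspelling(sample)
for number, text in hits:
    print(number, text)
```

Only the translation entry is reported; the two identifier lines are skipped
because the word boundary fails right after "serveral". Note that \b treats an
underscore as a word character, so something like serveral_foo would not match
either.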
Received on Monday, 3 January 2011 03:42:09 UTC