- From: Sai <w3c@saizai.com>
- Date: Sun, 4 Sep 2011 23:16:26 -0400
- To: www-international@w3.org
Hi all.

I found the W3C personal names draft via my activity in the G+ nymwars (e.g. https://plus.google.com/103112149634414554669/posts/WAu688n8JgZ and https://plus.google.com/103112149634414554669/posts/KGn5ezKLTC4).

I'm thinking of making a simple website that, given a single Unicode string containing a full name as input, outputs a best guess as to:

a) the country or countries of origin, language(s), gender, and other cultural properties of the name
b) a logical-segment breakdown of the name (e.g. given name, patronymic, matronymic, generation name, etc.)
c) the variant forms of the name (e.g. formal, familiar, transliterated) for use in various situations, as Mark Davis described

This is much akin to the IBM commercial product http://www-01.ibm.com/software/data/infosphere/global-name-recognition/. Basically, I see the desire of programmers to chop up names or get them pre-chopped, and would like to provide a model implementation that does it nonstupidly.

Does anyone know where I can find some large, machine-parsable, republishable databases of names from around the world, and/or would any of you be interested in helping with this?

Also, in case you've not already seen it: http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/. I'm not sure whether it's the sort of thing you'd link to in a public W3C spec (it's a bit snarky), but it's fairly incisive about this issue.

Thanks,
- Sai

P.S. Personal investment in this issue: I'm mononymic. :-P
Received on Monday, 5 September 2011 13:07:54 UTC