Here is one way you could generate a list of programming languages.
Look at the bottom of
http://en.wikipedia.org/wiki/D_programming_language
and you'll see categories like “C Programming Language Family”. If you then look at
http://en.wikipedia.org/wiki/Category:C_programming_language_family
you’ll see that it is a member of the category
http://en.wikipedia.org/wiki/Category:Programming_language_families
By traversing this graph you can find both the categories that contain programming languages and the programming languages themselves. All of the category links are in DBpedia, so this is straightforward to do.
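As a rough sketch of that traversal, here is a breadth-first walk down from a seed category. The two dictionaries are a tiny hand-made stand-in for the category links, and the names `SUBCATEGORIES`, `MEMBERS` and `collect` are mine, not anything DBpedia provides. In DBpedia itself, articles point at their categories with dcterms:subject and categories point at their parents with skos:broader, so you would fill these tables from those triples.

```python
from collections import deque

# Toy stand-in for the category links (illustrative data, not a real dump).
# SUBCATEGORIES maps a category to its child categories;
# MEMBERS maps a category to the articles filed directly under it.
SUBCATEGORIES = {
    "Programming_language_families": ["C_programming_language_family"],
    "C_programming_language_family": [],
}
MEMBERS = {
    "Programming_language_families": [],
    "C_programming_language_family": [
        "C_(programming_language)",
        "D_(programming_language)",
    ],
}


def collect(seed, max_depth=3):
    """Walk down from a seed category, breadth first.

    Returns (categories visited, articles found). max_depth limits how far
    the walk drifts, since deep subcategories tend to wander off topic.
    """
    seen = {seed}
    articles = set()
    queue = deque([(seed, 0)])
    while queue:
        category, depth = queue.popleft()
        articles.update(MEMBERS.get(category, []))
        if depth == max_depth:
            continue  # do not expand children beyond the depth limit
        for child in SUBCATEGORIES.get(category, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return seen, articles


categories, languages = collect("Programming_language_families")
```

The depth limit is a judgment call. In practice you would probably also keep a hand-edited blacklist of categories to skip.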
The great thing is you can seed this with a query that gets partial results; for instance, you can use your search for “programming language” in the name. To be fair, you’ll need to put some human effort into this. Some of the categories that turn up will be wrong, and you’ll probably get some items like “Generics in Java” and “Dennis Ritchie”. Still, my experience is that I can create categories of 10,000 or so things (like “things in New York City that don’t have coordinates” or “things related to sex and drugs”) in a few hours of work. It’s helpful to sort results by a subjective importance score so that at least you can see the worst outliers. (At one point I got Hillary Clinton as the top “sex” topic, for instance, because she was the victim of adultery. It’s quite interesting that the perpetrator of adultery didn’t get flagged...)
The graph traversal has a similar structure to Kleinberg’s hubs and authorities algorithm, and there’s probably some way to assign scores to the nodes that are related to the probability of a topic or category being in the set.
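To make the hubs and authorities idea concrete, here is a minimal sketch that treats categories as hubs and articles as authorities and runs the usual alternating update. This is only one guess at how the scoring could work, not something I have validated. The edge list is invented for illustration, and `hits` is my own helper name.

```python
import math

# Hypothetical (category, article) edges, invented for illustration.
EDGES = [
    ("C_family", "C"),
    ("C_family", "D"),
    ("Curly_bracket_languages", "C"),
    ("Curly_bracket_languages", "Java"),
    ("Java_platform", "Java"),
    ("Java_platform", "Generics_in_Java"),
]


def _normalize(scores):
    """Scale scores so that their squares sum to 1."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {k: v / norm for k, v in scores.items()}


def hits(edges, iterations=20):
    """Alternate authority and hub updates over a bipartite edge list."""
    hubs = {category: 1.0 for category, _ in edges}
    auths = {article: 1.0 for _, article in edges}
    for _ in range(iterations):
        # An article's authority is the sum of the hub scores of its categories.
        new_auths = dict.fromkeys(auths, 0.0)
        for category, article in edges:
            new_auths[article] += hubs[category]
        auths = _normalize(new_auths)
        # A category's hub score is the sum of the authorities of its members.
        new_hubs = dict.fromkeys(hubs, 0.0)
        for category, article in edges:
            new_hubs[category] += auths[article]
        hubs = _normalize(new_hubs)
    return hubs, auths


hub_scores, authority_scores = hits(EDGES)
# Lowest-scoring articles first: these are the ones to review by hand.
review_order = sorted(authority_scores, key=authority_scores.get)
```

Articles that sit in several on-topic categories end up with higher authority than ones that appear in only a single category, so the low end of the ranking is where I would expect strays like “Generics in Java” to collect. Turning these scores into actual probabilities would take more work.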
Note also that Freebase has a programming language type; see
http://www.freebase.com/view/en/fortran
and you could get a list of programming languages there and then map the IDs back to DBpedia.