Re: [Dbpedia-discussion] How do I consistently query dbpedia for programming languages by name?

Here is one way you could generate a list of programming languages.

Look at the bottom of

http://en.wikipedia.org/wiki/D_programming_language

and you’ll see categories like “C programming language family”. If you then look at

http://en.wikipedia.org/wiki/Category:C_programming_language_family

you’ll see that it is a member of the category

http://en.wikipedia.org/wiki/Category:Programming_language_families

By traversing this graph you can find both the categories that contain programming languages and the programming languages themselves. All of the category links are in DBpedia, so this is straightforward to do.
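A rough sketch of that traversal in Python, with a tiny hand-made graph standing in for DBpedia. The sample data and the function name `collect` are made up for illustration; in DBpedia itself the category-to-parent links are skos:broader and the article-to-category links are dcterms:subject, and you would load those instead.

```python
from collections import deque

# Hypothetical sample data standing in for DBpedia's category links.
# SUBCATEGORIES maps a category to its child categories;
# MEMBERS maps a category to the articles filed directly under it.
SUBCATEGORIES = {
    "Category:Programming_language_families": [
        "Category:C_programming_language_family",
    ],
    "Category:C_programming_language_family": [],
}
MEMBERS = {
    "Category:C_programming_language_family": [
        "D_programming_language",
        "C_programming_language",
    ],
}


def collect(root, max_depth=3):
    """Breadth-first walk down from root; returns (categories, articles)."""
    seen = {root}
    articles = set()
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        articles.update(MEMBERS.get(category, []))
        if depth == max_depth:
            continue  # do not descend further; deep walks drift off topic
        for child in SUBCATEGORIES.get(category, []):
            if child not in seen:  # the category graph is not a strict tree
                seen.add(child)
                queue.append((child, depth + 1))
    return seen, articles


categories, articles = collect("Category:Programming_language_families")
```

The depth limit is my own addition: the further you walk from the starting category, the more off-topic categories tend to creep in, so a small limit keeps the manual cleanup manageable.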

The great thing is you can seed this with a query that gets partial results; for instance, you can use your search for “programming language” in the name. To be fair, you’ll need to put some human effort into this. Some of the categories that turn up will be wrong, and you’ll probably get some items like “Generics in Java” and “Dennis Ritchie”. Still, my experience is that I can create categories of 10,000 or so things (like “things in New York City that don’t have coordinates” or “things related to sex and drugs”) in a few hours of work. It’s helpful to sort results by a subjective importance score so at least you can see the worst outliers. (At one point I got Hillary Clinton as the top “sex” topic, for instance, because she was the victim of adultery. It’s quite interesting that the perpetrator of adultery didn’t get flagged...)
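To make the seeding and sorting concrete, here is a minimal sketch. The labels and the importance numbers are invented for illustration, and `seed_by_name` and `review_order` are just names I picked; how you compute a real importance score is up to you.

```python
def seed_by_name(labels, phrase="programming language"):
    """Return the labels that contain the phrase, ignoring case."""
    return {label for label in labels if phrase in label.lower()}


def review_order(candidates, importance):
    """Sort candidates by descending importance, ties broken by name."""
    return sorted(candidates, key=lambda name: (-importance.get(name, 0.0), name))


# Hypothetical labels and hand-assigned scores, for illustration only.
labels = [
    "D (programming language)",
    "C (programming language)",
    "Generics in Java",
    "Dennis Ritchie",
]
importance = {
    "C (programming language)": 0.9,
    "Dennis Ritchie": 0.7,
    "D (programming language)": 0.4,
    "Generics in Java": 0.1,
}

seeds = seed_by_name(labels)               # partial result from the name search
to_review = review_order(labels, importance)  # most prominent items first
```

With the most prominent items at the top, a wrong entry that ranks high (like “Dennis Ritchie” here) is the first thing you see when you review the list by hand.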

The graph traversal has a similar structure to Kleinberg’s hubs and authorities algorithm, and there’s probably some way to assign scores to the nodes that reflect the probability of a topic or category belonging to the set.
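For what it’s worth, here is a sketch of what plain hubs-and-authorities scoring could look like on this kind of graph, treating categories as hubs and topics as authorities. This is ordinary HITS iteration, not a worked-out probability model, and the membership data and the name `hits_scores` are hypothetical.

```python
def hits_scores(memberships, iterations=20):
    """memberships maps category -> list of topics.

    Returns (topic_scores, category_scores) after HITS-style iteration.
    """
    topics = {t for members in memberships.values() for t in members}
    topic_score = {t: 1.0 for t in topics}
    category_score = {c: 1.0 for c in memberships}
    for _ in range(iterations):
        # Hub update: a category scores well if it contains good topics.
        category_score = {
            c: sum(topic_score[t] for t in members)
            for c, members in memberships.items()
        }
        norm = sum(v * v for v in category_score.values()) ** 0.5 or 1.0
        category_score = {c: v / norm for c, v in category_score.items()}
        # Authority update: a topic scores well if good categories contain it.
        topic_score = {t: 0.0 for t in topics}
        for c, members in memberships.items():
            for t in members:
                topic_score[t] += category_score[c]
        norm = sum(v * v for v in topic_score.values()) ** 0.5 or 1.0
        topic_score = {t: v / norm for t, v in topic_score.items()}
    return topic_score, category_score


# Hypothetical memberships, for illustration only.
memberships = {
    "C_family": ["C", "D", "Dennis_Ritchie"],
    "Systems_languages": ["C", "D"],
    "Computer_scientists": ["Dennis_Ritchie"],
}
topic_score, category_score = hits_scores(memberships)
```

In this toy example the two languages reinforce each other through the categories they share, so they end up scoring higher than the person, who sits in only one language category. Whether that holds up on the real category graph is something you would have to check.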

Note also that Freebase has a programming language type, see

http://www.freebase.com/view/en/fortran

and you could get a list of programming languages there and then map the IDs back to DBpedia.

Received on Wednesday, 31 October 2012 08:25:30 UTC