Querying DBpedia for state flowers from Terry Brooks on 2009-06-11 (public-lod@w3.org from June 2009)

From: Terry Brooks <tabrooks@u.washington.edu>
Date: Thu, 11 Jun 2009 11:41:47 -0700
To: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <AAA761A2C05F5E4FAD4F0B6433BDD7110CA24EB5A7@sdc-mbx-01.exchange.washington.edu>

I'm preparing course material about querying DBpedia from a web page using Firefox and Greasemonkey, unpacking the returned payload, and patching the information into the page. My sample SPARQL query retrieves the state flowers of the United States; the query is listed on a blog at http://www.craigethomas.com/blog/2009/02/anatomy-of-a-sparql-query-part-1-select/
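For the query step, here is a minimal sketch of building the request. It assumes the public endpoint at http://dbpedia.org/sparql and the standard SPARQL Protocol GET form, where the URL-encoded query text goes in a `query` parameter. The predicate `dbpprop:flower` and the helper `build_query_url` are hypothetical stand-ins, not the query from the blog post. In Greasemonkey the resulting URL would be fetched with GM_xmlhttpRequest.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint (assumed address).
ENDPOINT = "http://dbpedia.org/sparql"

# Hypothetical query: dbpprop:flower is an illustrative predicate,
# not necessarily the one used in the blog post's query.
QUERY = """PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT ?state ?flower WHERE {
  ?state dbpprop:flower ?flower .
}"""


def build_query_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL; the query goes in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```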

Strategies for unpacking the payload are complicated by unpredictable structural irregularities in it. I was wondering if someone could suggest an explanation, or point me to explanatory documentation that I could give my students.

Most of the states have a predictable XML payload that is structured like this:

    <result>
      <binding name="state">
        <uri>http://dbpedia.org/resource/Mississippi</uri>
      </binding>
      <binding name="flower">
        <uri>http://dbpedia.org/resource/Magnolia_Blossom</uri>
      </binding>
    </result>
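One way to unpack rows like this is a sketch using Python's standard ElementTree. It assumes the W3C SPARQL Query Results XML Format, in which each `<binding>` holds exactly one `<uri>`, `<literal>` or `<bnode>` child. The helper names `local` and `parse_results` are hypothetical. Keeping the term type alongside the value lets later code treat URIs and literals differently.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix. Real responses use the namespace
    http://www.w3.org/2005/sparql-results#; the excerpt in the message omits it."""
    return tag.split("}", 1)[-1]


def parse_results(xml_text: str) -> list:
    """Return one dict per <result>: {variable name: (term type, text)}.

    The term type is the tag of the single child of <binding>:
    'uri', 'literal' or 'bnode'.
    """
    rows = []
    for result in ET.fromstring(xml_text).iter():
        if local(result.tag) != "result":
            continue
        row = {}
        for binding in result:
            if local(binding.tag) == "binding" and len(binding):
                term = binding[0]
                row[binding.get("name")] = (local(term.tag), term.text or "")
        rows.append(row)
    return rows


sample = """<result>
  <binding name="state">
    <uri>http://dbpedia.org/resource/Mississippi</uri>
  </binding>
  <binding name="flower">
    <uri>http://dbpedia.org/resource/Magnolia_Blossom</uri>
  </binding>
</result>"""

rows = parse_results(sample)
print(rows)
```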

But West Virginia's state flower is structured as a literal with an embedded HTML tag:

    <literal xml:lang="en">Rhododendron&lt;br&gt;(''Rhododendron maximum'')</literal>
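The `&lt;br&gt;` is only the escaped form of `<br>` inside an XML text node, so a conforming parser returns the string `Rhododendron<br>(''Rhododendron maximum'')`. The paired apostrophes look like MediaWiki italic markup, probably carried over from the Wikipedia source text. If the value has to be displayed anyway, here is a minimal cleanup sketch. The helper name `clean_literal` and its splitting rules are assumptions for illustration, not anything DBpedia specifies.

```python
import re
import xml.etree.ElementTree as ET


def clean_literal(text: str) -> list:
    """Split a literal on <br> tags and strip MediaWiki quote markup.

    Returns the non-empty pieces, e.g. a common name and a Latin name.
    """
    pieces = []
    for part in re.split(r"<br\s*/?>", text, flags=re.IGNORECASE):
        part = re.sub(r"'{2,}", "", part).strip()   # '' = italics, ''' = bold
        if part.startswith("(") and part.endswith(")"):
            part = part[1:-1].strip()                # drop one pair of wrapping parens
        if part:
            pieces.append(part)
    return pieces


# The XML parser already turns &lt;br&gt; back into a literal "<br>".
literal = ET.fromstring(
    "<literal xml:lang=\"en\">Rhododendron&lt;br&gt;(''Rhododendron maximum'')</literal>"
)
print(literal.text)                 # Rhododendron<br>(''Rhododendron maximum'')
print(clean_literal(literal.text))  # ['Rhododendron', 'Rhododendron maximum']
```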

And Florida's state flower URI contains escaped (percent-encoded) characters:

    <uri>http://dbpedia.org/resource/Orange_%28fruit%29</uri>
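`%28` and `%29` are the percent-encodings of `(` and `)` (RFC 3986), so the URI itself is well formed; it escapes the parentheses in the resource name Orange_(fruit). The following is a sketch of turning such a URI into a display label. It assumes the label can be derived from the URI path rather than queried from the data, and `label_from_uri` is a hypothetical helper name.

```python
from urllib.parse import unquote

RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def label_from_uri(uri: str) -> str:
    """Turn a DBpedia resource URI into a display label.

    Percent-escapes such as %28/%29 are decoded and underscores become spaces.
    """
    if uri.startswith(RESOURCE_PREFIX):
        name = uri[len(RESOURCE_PREFIX):]
    else:
        name = uri.rsplit("/", 1)[-1]   # fallback: last path segment
    return unquote(name).replace("_", " ")


print(label_from_uri("http://dbpedia.org/resource/Orange_%28fruit%29"))  # Orange (fruit)
```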

There is also the general problem of multiple listings. For example, California is listed twice with California_Poppy.
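Exact repeats can be removed on the server by writing SELECT DISTINCT, which drops duplicate solution rows. A client-side equivalent is sketched below. It assumes each row is a dict from variable name to a (term type, value) pair and keeps the first occurrence; the helper name `dedupe` is hypothetical. Neither approach merges rows that differ only in form, such as the same flower returned once as a URI and once as a literal.

```python
def dedupe(rows: list) -> list:
    """Drop repeated result rows, keeping first-seen order.

    Rows are dicts {variable: (term type, value)}; identical bindings collapse.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))   # dicts are unhashable; use a sorted tuple
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


california = {
    "state": ("uri", "http://dbpedia.org/resource/California"),
    "flower": ("uri", "http://dbpedia.org/resource/California_Poppy"),
}
unique = dedupe([california, dict(california)])
print(len(unique))  # 1
```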

What is an explanation for these structural irregularities?

Thanks, Terry


Terrence Brooks
Information School
University of Washington
Voice: 206 543-2646
Fax: 206 616-3152
E-mail: tabrooks@u.washington.edu
Web: http://faculty.washington.edu/tabrooks/

Received on Thursday, 11 June 2009 18:42:22 UTC