Querying DBpedia for state flowers

I'm preparing course material about querying DBpedia from a web page using Firefox and Greasemonkey, unpacking the payload received, and patching the information into the page. My sample SPARQL query asks for the state flowers of the states of the United States; it is listed in a blog post at http://www.craigethomas.com/blog/2009/02/anatomy-of-a-sparql-query-part-1-select/

Unpacking the payload is complicated by unpredictable irregularities in its structure. Could someone suggest an explanation, or point me to explanatory documentation that I could give my students?

Most of the states have a predictable XML payload that is structured like this:

    <result>
      <binding name="state">
        <uri>http://dbpedia.org/resource/Mississippi</uri>
      </binding>
      <binding name="flower">
        <uri>http://dbpedia.org/resource/Magnolia_Blossom</uri>
      </binding>
    </result>

But West Virginia's state flower comes back as a literal with an escaped HTML tag and wiki markup embedded in it:

    <literal xml:lang="en">Rhododendron&lt;br&gt;(''Rhododendron maximum'')</literal>
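
To show students the kind of special-casing this forces on the script, here is a minimal sketch. It assumes each binding has already been read out of the XML into a plain object of the form { type, value }, where type is "uri" or "literal" and the XML entities in value have already been decoded by the parser. The names cleanLiteral and bindingToText are hypothetical helpers, not part of any library.

```javascript
// Hypothetical helper: tidy a literal that still carries HTML and wiki markup.
function cleanLiteral(text) {
  return text
    .replace(/<br\s*\/?>/gi, " ") // <br>, <br/> and <br /> become a space
    .replace(/'{2,}/g, "")        // drop wiki quote runs such as '' (italics)
    .replace(/\s+/g, " ")         // collapse any whitespace runs
    .trim();
}

// Hypothetical helper: turn one binding into text for the page.
// Assumed input shape: { type: "uri" | "literal", value: string }.
function bindingToText(binding) {
  if (binding.type === "uri") {
    // Use the last path segment of the resource URI as a rough label.
    const name = binding.value.substring(binding.value.lastIndexOf("/") + 1);
    return name.replace(/_/g, " ");
  }
  return cleanLiteral(binding.value);
}

console.log(bindingToText({
  type: "literal",
  value: "Rhododendron<br>(''Rhododendron maximum'')"
}));
```

Branching on the binding type keeps the URI case and the literal case in one place, so further irregular states only need another rule in cleanLiteral.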

And Florida's state flower URI contains percent-escaped characters:

    <uri>http://dbpedia.org/resource/Orange_%28fruit%29</uri>
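
The sequences %28 and %29 are the percent-encoded forms of "(" and ")". As a minimal sketch of turning such a URI into a readable label, the standard decodeURIComponent function can be applied to the last path segment. The name resourceLabel is a hypothetical helper, and taking the last segment as the label is an assumption about how the page should display the resource.

```javascript
// Hypothetical helper: derive a display label from a DBpedia resource URI.
function resourceLabel(uri) {
  const segment = uri.substring(uri.lastIndexOf("/") + 1);
  let decoded;
  try {
    decoded = decodeURIComponent(segment); // %28 -> "(", %29 -> ")"
  } catch (e) {
    // decodeURIComponent throws URIError on malformed escapes;
    // fall back to the raw segment in that case.
    decoded = segment;
  }
  return decoded.replace(/_/g, " ");
}

console.log(resourceLabel("http://dbpedia.org/resource/Orange_%28fruit%29"));
```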

There is also the general problem of multiple listings.  For example, California is listed with the California_Poppy twice.
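
On the client side, repeated rows can at least be filtered out before the page is patched. This is a minimal sketch, assuming each result has already been reduced to a plain object with state and flower strings; dedupeRows is a hypothetical helper. Adding DISTINCT to the SELECT clause of the query may also remove rows that are exact repeats, though I have not confirmed that this is the cause here.

```javascript
// Hypothetical helper: keep the first occurrence of each (state, flower) pair.
// Assumed input shape: [{ state: string, flower: string }, ...].
function dedupeRows(rows) {
  const seen = new Set();
  const unique = [];
  for (const row of rows) {
    // JSON.stringify of the pair gives an unambiguous key for the Set.
    const key = JSON.stringify([row.state, row.flower]);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(row);
    }
  }
  return unique;
}

const rows = [
  { state: "http://dbpedia.org/resource/California",
    flower: "http://dbpedia.org/resource/California_Poppy" },
  { state: "http://dbpedia.org/resource/California",
    flower: "http://dbpedia.org/resource/California_Poppy" },
  { state: "http://dbpedia.org/resource/Mississippi",
    flower: "http://dbpedia.org/resource/Magnolia_Blossom" }
];
console.log(dedupeRows(rows).length);
```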

What is an explanation for these structural irregularities?

Thanks, Terry


Terrence Brooks
Information School
University of Washington
Voice: 206 543-2646
Fax: 206 616-3152
E-mail: tabrooks@u.washington.edu
Web: http://faculty.washington.edu/tabrooks/

Received on Thursday, 11 June 2009 18:42:22 UTC