Re: Need hint for parsing Open Directory RDF

From: Dave Beckett <dave.beckett@bristol.ac.uk>
Date: Mon, 22 Sep 2003 00:03:58 +0100
To: algermissen@acm.org
Cc: jalgermissen@topicmapping.com, "www-rdf-interest@w3.org" <www-rdf-interest@w3.org>
Message-Id: <20030922000358.07513c1e.dave.beckett@bristol.ac.uk>

On Sun, 21 Sep 2003 20:00:05 +0200
Jan Algermissen <jalgermissen@topicmapping.com> wrote:

> 
> Hello.
> 
> I am trying to parse the Open Directory structures example
> ( http://rdf.dmoz.org/rdf/structure.example.txt ) with raptor 1.0
> and get the following error:
> 
> "Using an element tag without a namespace is forbidden"
> "Literal property element Topic has property attributes"
> 
> The RDF starts like this:
> 
> =============================================
> <RDF xmlns:r="http://www.w3.org/TR/RDF/"
>      xmlns:d="http://purl.org/dc/elements/1.0/"  
>      xmlns="http://directory.mozilla.org/rdf">
> 
> <Topic r:id="Top">
>   <tag catid="1"/>
>   <d:Title>Top</d:Title>
>   <narrow r:resource="Top/Arts"/>
>    ....
> ==============================================
> 
> I tried <r:RDF> but that does not help. I assume that
> the RDF is ok, so could anyone help me to find out what
> I am doing wrong?

The first error message isn't very helpful here; I'll look into that.
The problem is that the 'catid' attribute on element 'tag' has no namespace.
That makes this potential property element 'tag' broken.
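To see this concretely, here is a minimal sketch using Python's standard-library xml.etree.ElementTree that lists every attribute with no namespace. ElementTree spells a namespaced name as '{uri}local', so an attribute name without a leading '{' is unqualified. Note that a default xmlns="..." declaration applies to unprefixed element names but not to unprefixed attributes, which is why 'catid' ends up with no namespace even though the file declares a default one. SAMPLE is a hypothetical trimmed excerpt of the quoted header, with closing tags added so it is well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed excerpt of the dmoz header quoted above; the closing
# tags are added here so the fragment is well-formed XML.
SAMPLE = """<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://directory.mozilla.org/rdf">
<Topic r:id="Top">
  <tag catid="1"/>
  <d:Title>Top</d:Title>
  <narrow r:resource="Top/Arts"/>
</Topic>
</RDF>"""


def unqualified_attributes(xml_text):
    """Return (element tag, attribute name) pairs for attributes with no namespace.

    xmlns declarations are consumed by the namespace-aware parser and do
    not appear in elem.attrib, so they are not reported here.
    """
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        for name in elem.attrib:
            if not name.startswith("{"):
                found.append((elem.tag, name))
    return found


for tag, attr in unqualified_attributes(SAMPLE):
    print(tag, attr)  # {http://directory.mozilla.org/rdf}tag catid
```

Only 'catid' is reported: r:id and r:resource carry a prefix, so they are namespaced (just not in the RDF namespace).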

The second error is about the node element 'Topic' and its r:id
attribute, which is in the namespace "http://www.w3.org/TR/RDF/".

Despite the name, that is not the RDF namespace URI, which is
http://www.w3.org/1999/02/22-rdf-syntax-ns#
(as used in the RDF documents published since 1999).

So in fact the main error is that this is not an RDF/XML document: the
root element isn't RDF in the RDF namespace, and none of the terms
are RDF ones.
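A quick way to confirm that diagnosis is a rough check, sketched below with Python's ElementTree, of whether the root is RDF in the real RDF namespace and whether any element or attribute names fall in that namespace at all. The helper names and the HEADER excerpt are hypothetical illustrations, not part of raptor. It also shows why switching to <r:RDF>, as tried in the question, does not help: r: is still bound to http://www.w3.org/TR/RDF/.

```python
import xml.etree.ElementTree as ET

# The RDF namespace URI used by RDF/XML since 1999.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical minimal excerpt in the style of the dmoz header.
HEADER = """<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns="http://directory.mozilla.org/rdf">
<Topic r:id="Top"/>
</RDF>"""


def looks_like_rdf_xml(xml_text):
    """Rough check: is the root element RDF in the real RDF namespace?"""
    root = ET.fromstring(xml_text)
    return root.tag == "{%s}RDF" % RDF_NS


def rdf_terms_used(xml_text):
    """Return every element or attribute name that is in the RDF namespace."""
    root = ET.fromstring(xml_text)
    prefix = "{%s}" % RDF_NS
    names = []
    for elem in root.iter():
        if elem.tag.startswith(prefix):
            names.append(elem.tag)
        names.extend(a for a in elem.attrib if a.startswith(prefix))
    return names


print(looks_like_rdf_xml(HEADER))  # False: root is {http://directory.mozilla.org/rdf}RDF
print(rdf_terms_used(HEADER))      # []: r:id is in http://www.w3.org/TR/RDF/

# Prefixing the root with r: only moves it into the wrong namespace.
with_prefix = HEADER.replace("<RDF", "<r:RDF").replace("</RDF>", "</r:RDF>")
print(looks_like_rdf_xml(with_prefix))  # False
```

Binding the prefix to the real namespace URI, e.g. xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#", would make the root check pass; the rest of the vocabulary would still need fixing, as noted above.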

I've looked at the dmoz data dumps before and they're pretty bad; there are
usually lots of XML and Unicode encoding problems millions of lines into
the file. Don't try parsing that with a DOM :)
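The usual alternative to a DOM is a streaming parse. Below is a minimal sketch with Python's ElementTree.iterparse, assuming the default element namespace from the quoted header; counting Topic elements is a hypothetical stand-in for whatever per-record processing you actually need. Clearing the root after each finished Topic keeps memory roughly flat, and the first well-formedness or encoding error (iterparse typically reports invalid byte sequences as a ParseError too) is returned as a (line, column) position instead of aborting silently.

```python
import io
import xml.etree.ElementTree as ET

# Default element namespace declared in the dmoz header quoted above.
DMOZ_NS = "{http://directory.mozilla.org/rdf}"


def count_topics(source):
    """Stream through a dmoz-style dump, counting Topic elements.

    Returns (count, error_position); error_position is the (line, column)
    of the first XML error, or None if the whole input parsed cleanly.
    """
    count = 0
    try:
        events = ET.iterparse(source, events=("start", "end"))
        _event, root = next(events)  # first event is the start of the root element
        for event, elem in events:
            if event == "end" and elem.tag == DMOZ_NS + "Topic":
                count += 1
                # Drop every finished child of the root so the in-memory
                # tree never grows, unlike loading the file into a DOM.
                root.clear()
    except ET.ParseError as err:
        # Malformed XML stops the parse here, but the topics counted so far
        # are still reported alongside the error position.
        return count, err.position
    return count, None


good = b'<RDF xmlns="http://directory.mozilla.org/rdf"><Topic/><Topic/></RDF>'
print(count_topics(io.BytesIO(good)))  # (2, None)

bad = b'<RDF xmlns="http://directory.mozilla.org/rdf"><Topic/><Topic></RDF>'
print(count_topics(io.BytesIO(bad)))   # one topic counted, then an error position
```

In practice you would pass an open binary file instead of io.BytesIO; reporting the error position rather than raising makes it easier to locate the bad spot deep inside a very large dump.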

There are scripts out there to find and maybe fix the Unicode, XML and
RDF bugs.  You could try http://rainwaterreptileranch.org/steve/sw/odp/
to start.

Dave
Received on Sunday, 21 September 2003 19:05:43 GMT