W3C home > Mailing lists > Public > public-lod@w3.org > March 2009

Re: Parsing Freebase RDF

From: Giovanni Tummarello <g.tummarello@gmail.com>
Date: Sat, 14 Mar 2009 12:12:08 +0000
Message-ID: <210271540903140512pf3c2069t5f4209afe76ffa0e@mail.gmail.com>
To: Jamie Taylor <jamie@metaweb.com>
Cc: Seo Sanghyeon <sanxiyn@gmail.com>, public-lod@w3.org
Hi Jamie,

I see that the RDF you serve per URI is more "expressive" than the "usual":

instead of giving only the triples out of (or into) the subject of the
page, you also give the descriptions of other notable entities.

For example, for the Blade Runner movie you give the full description of
all the "film performances" (tying together the real actor, the fictional
character and the movie).  Each film performance also has its own URI,
which is itself resolvable, so "in theory" giving the detail of the
"film performance" was not necessary according to LOD, but in
practice it's definitely useful, as we know.

Could you tell us the rule by which you decide to put multiple entities
in the description that you give out?
This has important implications.

On the one hand, if there is a simple rule, always the same, it makes
it easy for me to take your snapshot and index each URI's RDF description
by applying that same rule (which is what we do for LOD datasets: we simply
slice out "all the triples with subject or object X"). Otherwise I can crawl
and do my things internally, under the assumption that what you are
providing is not a bunch of unrelated RDF files, but really
"slices" of the same dataset.

To assert that this is the case (and to allow me to play more freely with
the information), it would be useful to have a Semantic Sitemap linked in
your robots.txt, stating the URI of the dataset, its name, and the
prefix at which you're serving its content as Linked Data.

Example sitemap. Here the "slicing" is set to "subject-object"; in your
case, I guess, not setting it is probably the most appropriate option.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <sc:dataset>
    <sc:datasetLabel>Example Corp. Product Catalog</sc:datasetLabel>
    <sc:linkedDataPrefix slicing="subject-object">http://example.com/products/</sc:linkedDataPrefix>
  </sc:dataset>
</urlset>


In your case, would it be technically simple to also provide an RDF dump?
"No, it's too time-consuming" is a perfectly good answer :-) (it just
means we have to live with it, e.g. by politely crawling)
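Polite crawling essentially means rate-limiting your requests. A minimal sketch of the pacing logic (the class and parameter names are mine, nothing Freebase-specific; the clock and sleep functions are injectable only so the behaviour can be checked without real waiting):

```python
import time

class RateLimiter:
    """Allow at most one request every `delay` seconds.

    `clock` and `sleep` default to the real wall clock but can be
    swapped out; the names here are illustrative.
    """

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous request, if any

    def wait(self):
        """Block until `delay` seconds have passed since the previous
        call, then record the current time as the new request time."""
        now = self.clock()
        if self.last is not None:
            remaining = self.last + self.delay - now
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now
```

A crawler would call `limiter.wait()` before each HTTP fetch, so the server is never hit more often than once per `delay` seconds.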


On Fri, Mar 13, 2009 at 8:37 PM, Jamie Taylor <jamie@metaweb.com> wrote:
> Seo -
> Yes, this is a bug in the current LOD/RDF interface to Freebase.  I believe
> it is fixed in the upcoming release, which can be previewed at
> http://rdftest.mqlx.com/ns/en.blade_runner.
> I checked turtle output with:
> rapper -i turtle http://rdftest.mqlx.com/ns/en.blade_runner
> Please give this sandbox version of the interface a try.  I'm interested in
> feedback from others on the list as well.
> I hope to have the new version in production sometime next week.
> Jamie
> On Mar 10, 2009, at 10:31 PM, Seo Sanghyeon wrote:
>> Hello, new to the list,
>> I am trying to figure out how to use Freebase RDF service.
>> (See http://blog.freebase.com/2008/10/30/introducing_the_rdf_service/)
>> $ curl -L http://rdf.freebase.com/ns/en.blade_runner -o en.blade_runner
>> $ rdfproc freebase parse en.blade_runner turtle
>> It is Turtle, right? Above errors with:
>> rdfproc: Parsing URI
>> file:///home/tinuviel/devel/freebase/en.blade_runner with turtle
>> parser
>> rdfproc: Error - URI
>> file:///home/tinuviel/devel/freebase/en.blade_runner:2: The namespace
>> prefix in "http:" was not declared.
>> URI file:///home/tinuviel/devel/freebase/en.blade_runner:2 raptor
>> fatal error - turtle_qname_to_uri failed
>> rdfproc: Error - URI
>> file:///home/tinuviel/devel/freebase/en.blade_runner:2: syntax error
>> rdfproc: Failed to parse into the graph
>> rdfproc: The parsing returned 2 errors and 0 warnings
>> Help?
>> --
>> Seo Sanghyeon
Received on Saturday, 14 March 2009 12:12:50 UTC
