
Re: Parsing Freebase RDF

From: Rob Styles <rob.styles@talis.com>
Date: Mon, 16 Mar 2009 09:21:09 +0000
Message-ID: <511E67DF-9CAF-4583-9525-AA33D62326A5@talis.com>
To: <giovanni.tummarello@deri.org>, "Jamie Taylor" <jamie@metaweb.com>, "Seo Sanghyeon" <sanxiyn@gmail.com>, <public-lod@w3.org>
This is an interesting question and one which we've been thinking  
about here at Talis as well.

As we build linked data apps, with a view to the linked data being
used as an API for other applications, we've thought it worth putting
more into the response. Typically we try to include everything you'd
need to recreate the HTML representation.

When you say it has important implications, can you expand on those? I
had been thinking it was harmless. As I see it, a client that expects
only a DESCRIBE ?s should simply ignore the additional data provided,
whereas clients that are crawling and merging into a graph will find
they already have some of the data as they expand what they know.
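
To make the two client behaviours concrete, here is a minimal Python
sketch. It assumes triples are plain (subject, predicate, object) tuples;
the helper names and every URI in it are made-up placeholders, not real
Freebase identifiers or any particular library's API.

```python
# Triples are modelled as plain (subject, predicate, object) tuples.
# All URIs and predicate names are illustrative placeholders.

def describe_only(triples, subject):
    """Keep only the triples whose subject is the requested resource."""
    return {t for t in triples if t[0] == subject}

def merge(graph, triples):
    """Merge a response into a local graph; set union drops duplicates."""
    return graph | set(triples)

FILM = "http://example.org/film/blade_runner"
PERF = "http://example.org/performance/1"

# A "rich" response: the film plus an embedded film performance.
film_response = [
    (FILM, "ex:name", "Blade Runner"),
    (FILM, "ex:starring", PERF),
    (PERF, "ex:actor", "http://example.org/person/harrison_ford"),
]
# What dereferencing the performance URI itself might return.
perf_response = [
    (PERF, "ex:actor", "http://example.org/person/harrison_ford"),
    (PERF, "ex:character", "http://example.org/character/rick_deckard"),
]

# A narrow client drops the embedded performance triple.
narrow = describe_only(film_response, FILM)

# A crawling client merges both responses; the shared triple is stored once.
graph = merge(merge(set(), film_response), perf_response)
```

In this toy case the narrow client ends up with two triples and the
crawler with four, so the extra data costs the first nothing and only
duplicates what the second would have fetched anyway.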

I can see that understanding what is likely to come back has big  
optimisation benefits for things like Sindice.
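
As an illustration of why a known rule helps an indexer, the
"subject-object" slicing Giovanni describes (quoted below) could be
applied to a dump roughly like this. This is only a sketch under my own
assumptions: triples are string tuples, an object counts as a URI if it
starts with "http://", and the function name is hypothetical.

```python
from collections import defaultdict

def slice_subject_object(triples):
    """Group triples per URI: each triple is filed under its subject and,
    when the object looks like a URI, under its object as well."""
    slices = defaultdict(set)
    for s, p, o in triples:
        slices[s].add((s, p, o))
        # Crude URI test, an assumption of this sketch; literals are skipped.
        if o.startswith("http://"):
            slices[o].add((s, p, o))
    return slices

X42 = "http://example.com/products/widgets/X42"
K1 = "http://example.com/products/kits/K1"

# A tiny stand-in for a data dump.
dump = [
    (X42, "ex:label", "Widget X42"),
    (X42, "ex:partOf", K1),
    (K1, "ex:label", "Kit K1"),
]

slices = slice_subject_object(dump)
```

With a fixed rule like this, the per-URI descriptions can be rebuilt
from a dump without dereferencing each URI in turn.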

What is the 'correct' thing to do?


On 14 Mar 2009, at 12:12, Giovanni Tummarello wrote:

> Hi Jamie,
> I see that your RDF per URI is more "expressive" than usual:
> instead of just giving the triples out of (or into) the subject of the
> page, you also give the descriptions of other notable entities inside.
> For example, for the Blade Runner movie you give the full description of
> all the "film performances" (tying together the real actor, the
> fictional character and the movie). Each film performance then has its
> own URI, which is itself resolvable, so "in theory" giving the detail of
> the "film performance" was not necessary according to LOD, but in
> practice it's definitely useful, as we know.
> Would you know the rule by which you decide to put multiple entities
> in the description that you give out?
> This has important implications.
> On the one hand, if there was a simple rule, always the same, it would
> make it easy for me to take your snapshot and index each URI's RDF
> description by applying this same rule (as we do for LOD datasets, which
> we simply split into "all the triples with subject or object X").
> Otherwise I can crawl and do my things internally, under the assumption
> that what you are providing is not a bunch of unrelated RDF files but
> really "slices" of the same dataset.
> To assert that this is the case (and allow me to play more freely with
> the information) it would be useful to have a semantic sitemap linked in
> your robots.txt stating the URI of the dataset, with the name and the
> prefix at which you're serving its content as Linked Data.
> An example sitemap follows. Here the "slicing" is set to
> "subject-object"; in your case I guess not setting it is probably the
> most appropriate option.
> <?xml version="1.0" encoding="UTF-8"?>
> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
>         xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
>   <sc:dataset>
>     <sc:datasetLabel>Example Corp. Product Catalog</sc:datasetLabel>
>     <sc:datasetURI>http://example.com/catalog.rdf#catalog</sc:datasetURI>
>     <sc:linkedDataPrefix slicing="subject-object">http://example.com/products/</sc:linkedDataPrefix>
>     <sc:sampleURI>http://example.com/products/widgets/X42</sc:sampleURI>
>     <sc:sparqlEndpointLocation slicing="subject-object">http://example.com/sparql</sc:sparqlEndpointLocation>
>     <sc:dataDumpLocation>http://example.com/data/catalogdump.rdf.gz</sc:dataDumpLocation>
>     <changefreq>weekly</changefreq>
>   </sc:dataset>
> </urlset>
> In your case, would it be technically simple to also provide an RDF
> dump?
> "No, it's too time consuming" is a perfectly good answer :-) (which
> means we have to live with it, e.g. by politely crawling)
> Giovanni
> On Fri, Mar 13, 2009 at 8:37 PM, Jamie Taylor <jamie@metaweb.com> wrote:
>> Seo -
>> Yes, this is a bug in the current LOD/RDF interface to Freebase. I
>> believe it is fixed in the upcoming release, which can be previewed at
>> http://rdftest.mqlx.com/ns/en.blade_runner.
>> I checked turtle output with:
>> rapper -i turtle http://rdftest.mqlx.com/ns/en.blade_runner
>> Please give this sandbox version of the interface a try. I'm
>> interested in feedback from others on the list as well.
>> I hope to have the new version in production sometime next week.
>> Jamie
>> On Mar 10, 2009, at 10:31 PM, Seo Sanghyeon wrote:
>>> Hello, new to the list,
>>> I am trying to figure out how to use Freebase RDF service.
>>> (See http://blog.freebase.com/2008/10/30/introducing_the_rdf_service/)
>>> $ curl -L http://rdf.freebase.com/ns/en.blade_runner -o en.blade_runner
>>> $ rdfproc freebase parse en.blade_runner turtle
>>> It is Turtle, right? Above errors with:
>>> rdfproc: Parsing URI file:///home/tinuviel/devel/freebase/en.blade_runner with turtle parser
>>> rdfproc: Error - URI file:///home/tinuviel/devel/freebase/en.blade_runner:2: The namespace prefix in "http:" was not declared.
>>> URI file:///home/tinuviel/devel/freebase/en.blade_runner:2 raptor fatal error - turtle_qname_to_uri failed
>>> rdfproc: Error - URI file:///home/tinuviel/devel/freebase/en.blade_runner:2: syntax error
>>> rdfproc: Failed to parse into the graph
>>> rdfproc: The parsing returned 2 errors and 0 warnings
>>> Help?
>>> --
>>> Seo Sanghyeon

Rob Styles
tel: +44 (0)870 400 5000
fax: +44 (0)870 400 5001
mobile: +44 (0)7971 475 257
msn: mmmmmrob@yahoo.com
irc: irc.freenode.net/mmmmmrob,isnick
web: http://www.talis.com/
blog: http://www.dynamicorange.com/blog/
blog: http://blogs.talis.com/panlibus/
blog: http://blogs.talis.com/nodalities/
blog: http://blogs.talis.com/n2/
Received on Monday, 16 March 2009 09:21:46 UTC
