- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Mon, 16 Mar 2009 09:26:36 -0400
- CC: Jamie Taylor <jamie@metaweb.com>, public-lod@w3.org
Richard Cyganiak wrote:
> On 16 Mar 2009, at 09:21, Rob Styles wrote:
>
>> This is an interesting question and one which we've been thinking
>> about here at Talis as well.
>>
>> As we build linked data apps, with a view to the linked data being
>> used as an api for other applications, we've thought that it is worth
>> putting more into the response, typically we try to put everything
>> you'd need to recreate the HTML representation.
>
> Yes, I think that's an excellent approach.
>
>> When you say it has important implications, can you expand on those?
>> I had been thinking it was harmless. As I see it a client that
>> expects only a DESCRIBE ?s should simply ignore the additional data
>> provided, whereas clients that are crawling and merging into a graph
>> will find they already have things as they expand what they know about.
>
> The main implication of choosing a less regular pattern is that others
> cannot accurately re-create the linked data view from an RDF dump of
> the dataset. For example, Sindice will index your dataset from your
> RDF dump if you publish one and announce it through a semantic
> sitemap. But Sindice will assume that each of your linked data
> documents only contains the immediate surrounding triples of the
> described resource. If you have additional triples in there, Sindice
> will not know it because that fact is not visible from just looking at
> the dump. The consequence is that searching in Sindice will sometimes
> miss one of your documents even if it contains all the right
> keywords/URIs.
>
> But that shouldn't affect how you publish your linked data, after all
> the dumps are merely an optimization that allows easy bulk processing
> of your linked data.
>
>> I can see that understanding what is likely to come back has big
>> optimisation benefits for things like Sindice.
>
> Yes.
>
>> What is the 'correct' thing to do?
>
> For your linked data, you're doing the correct thing.
>
> If you produce RDF dumps, and you want Sindice and others to be able
> to re-produce the structure of your linked data documents with maximum
> fidelity, then consider producing quad dumps in N-Quads format [1]
> instead of straight RDF dumps.

Jamie,

I would expect Quad Dumps to actually be quite natural for Freebase, right?

Kingsley

>
> Best,
> Richard
>
> [1] http://sw.deri.org/2008/07/n-quads/
>
>>
>> rob
>>
>> On 14 Mar 2009, at 12:12, Giovanni Tummarello wrote:
>>
>>> Hi Jamie,
>>>
>>> i see that your RDF per URI is more "expressive" than the "usual"
>>>
>>> instead of just giving triples out of (or into) the subject of the
>>> page you also give the description of other notable entities inside
>>>
>>> for example in the blade runner movie you give the full description of
>>> all the "film performances" (tying the real actor, the fictional
>>> character and the movie). Each film performance then has its URI
>>> which is itself resolvable so "in theory" to give the detail of the
>>> "film performance" was not necessary, according to LOD, but in
>>> practice it's definitely useful as we know.
>>>
>>> Would you know the rule by which you decide to put multiple entities
>>> in the description that you give out?
>>> this has important implications.
>>>
>>> On the one hand if there was a simple rule, always the same, it makes
>>> it easy for me to get your snapshot and index each URI rdf description
>>> by applying this same rule (what we do for LOD datasets which simply
>>> split "all the triples with subject or object X"). Else i can crawl
>>> and do my things internally, under the assumption that what you are
>>> providing are not a bunch of unrelated RDF files, but are really
>>> "slices" of the same dataset.
>>>
>>> to assert this is the case (and allow me to play more freely with the
>>> information) it would be useful to have a semantic sitemap linked in
>>> your robots.txt stating the URI of the dataset, with the name and the
>>> prefix at which you're serving its content as LinkedData.
>>>
>>> example sitemap. Here the "slicing" is set to "subject-object" in your
>>> case i guess not setting it is the most appropriate option probably.
>>>
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
>>>         xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
>>>   <sc:dataset>
>>>     <sc:datasetLabel>Example Corp. Product Catalog</sc:datasetLabel>
>>>     <sc:datasetURI>http://example.com/catalog.rdf#catalog</sc:datasetURI>
>>>     <sc:linkedDataPrefix slicing="subject-object">http://example.com/products/</sc:linkedDataPrefix>
>>>     <sc:sampleURI>http://example.com/products/widgets/X42</sc:sampleURI>
>>>     <sc:sparqlEndpointLocation slicing="subject-object">http://example.com/sparql</sc:sparqlEndpointLocation>
>>>     <sc:dataDumpLocation>http://example.com/data/catalogdump.rdf.gz</sc:dataDumpLocation>
>>>     <changefreq>weekly</changefreq>
>>>   </sc:dataset>
>>> </urlset>
>>>
>>> in your case would it be technically simple to also provide an RDF
>>> dump?
>>> "no its too time consuming" is a perfectly good answer :-) (which
>>> means we have to live with it, e.g. by politely crawling)
>>>
>>> Giovanni
>>>
>>> On Fri, Mar 13, 2009 at 8:37 PM, Jamie Taylor <jamie@metaweb.com> wrote:
>>>> Seo -
>>>>
>>>> Yes, this is a bug in the current LOD/RDF interface to Freebase. I believe
>>>> it is fixed in the upcoming release, which can be previewed at
>>>> http://rdftest.mqlx.com/ns/en.blade_runner..
>>>>
>>>> I checked turtle output with:
>>>> rapper -i turtle http://rdftest.mqlx.com/ns/en.blade_runner
>>>>
>>>> Please give this sandbox version of the interface a try.
>>>> I'm interested in feedback from others on the list as well.
>>>>
>>>> I hope to have the new version in production sometime next week.
>>>>
>>>> Jamie
>>>>
>>>> On Mar 10, 2009, at 10:31 PM, Seo Sanghyeon wrote:
>>>>
>>>>> Hello, new to the list,
>>>>>
>>>>> I am trying to figure out how to use Freebase RDF service.
>>>>> (See http://blog.freebase.com/2008/10/30/introducing_the_rdf_service/)
>>>>>
>>>>> $ curl -L http://rdf.freebase.com/ns/en.blade_runner -o en.blade_runner
>>>>> $ rdfproc freebase parse en.blade_runner turtle
>>>>>
>>>>> It is Turtle, right? Above errors with:
>>>>>
>>>>> rdfproc: Parsing URI file:///home/tinuviel/devel/freebase/en.blade_runner with turtle parser
>>>>> rdfproc: Error - URI file:///home/tinuviel/devel/freebase/en.blade_runner:2: The namespace prefix in "http:" was not declared.
>>>>> URI file:///home/tinuviel/devel/freebase/en.blade_runner:2 raptor fatal error - turtle_qname_to_uri failed
>>>>> rdfproc: Error - URI file:///home/tinuviel/devel/freebase/en.blade_runner:2: syntax error
>>>>> rdfproc: Failed to parse into the graph
>>>>> rdfproc: The parsing returned 2 errors and 0 warnings
>>>>>
>>>>> Help?
>>>>>
>>>>> --
>>>>> Seo Sanghyeon
>>>>
>>>
>>
>> Rob Styles
>> tel: +44 (0)870 400 5000
>> fax: +44 (0)870 400 5001
>> mobile: +44 (0)7971 475 257
>> msn: mmmmmrob@yahoo.com
>> irc: irc.freenode.net/mmmmmrob,isnick
>> web: http://www.talis.com/
>> blog: http://www.dynamicorange.com/blog/
>> blog: http://blogs.talis.com/panlibus/
>> blog: http://blogs.talis.com/nodalities/
>> blog: http://blogs.talis.com/n2/
>> Please consider the environment before printing this email.
>>
>> Find out more about Talis at www.talis.com
>> shared innovationTM
>>
>> Any views or personal opinions expressed within this email may not be
>> those of Talis Information Ltd or its employees.
>> The content of this email message and any files that may be attached
>> are confidential, and for the usage of the intended recipient only.
>> If you are not the intended recipient, then please return this message
>> to the sender and delete it. Any use of this e-mail by an unauthorised
>> recipient is prohibited.
>>
>> Talis Information Ltd is a member of the Talis Group of companies and
>> is registered in England No 3638278 with its registered office at
>> Knights Court, Solihull Parkway, Birmingham Business Park, B37 7YB.
>>
>

--

Regards,

Kingsley Idehen       Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software     Web: http://www.openlinksw.com
Received on Monday, 16 March 2009 13:27:12 UTC
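
The point made in the thread about triple dumps versus quad dumps can be illustrated with a small sketch. It is not Sindice's or Freebase's actual code: the http://example.org/ namespace, the sample data, and all function names below are hypothetical, chosen only to show why "subject-object" slicing of a plain dump cannot recover a document that also carries triples about neighbouring resources, while a fourth "document" element can.

```python
from collections import defaultdict

EX = "http://example.org/"  # hypothetical namespace, not Freebase's


def iri(local):
    """Wrap a local name as an IRI in N-Triples/N-Quads syntax."""
    return f"<{EX}{local}>"


# Hypothetical quads: (subject, predicate, object, document served).
# The Blade Runner document also describes the film performance.
QUADS = [
    (iri("blade_runner"), iri("starring"), iri("perf1"), iri("doc/blade_runner")),
    (iri("perf1"), iri("actor"), iri("harrison_ford"), iri("doc/blade_runner")),
    (iri("perf1"), iri("character"), iri("rick_deckard"), iri("doc/blade_runner")),
    (iri("blade_runner"), iri("starring"), iri("perf1"), iri("doc/perf1")),
    (iri("perf1"), iri("actor"), iri("harrison_ford"), iri("doc/perf1")),
    (iri("perf1"), iri("character"), iri("rick_deckard"), iri("doc/perf1")),
]


def triples_of(quads):
    """Drop the document element, as a plain RDF (triple) dump would."""
    return {(s, p, o) for s, p, o, _ in quads}


def slice_subject_object(triples, resource):
    """Guess a document: all triples with `resource` as subject or object."""
    return {t for t in triples if t[0] == resource or t[2] == resource}


def documents_from_quads(quads):
    """Rebuild each document exactly by grouping on the fourth element."""
    docs = defaultdict(set)
    for s, p, o, g in quads:
        docs[g].add((s, p, o))
    return dict(docs)


def to_nquads(quads):
    """Serialize as one 'subject predicate object context .' line per quad."""
    return "\n".join(f"{s} {p} {o} {g} ." for s, p, o, g in quads)


if __name__ == "__main__":
    guessed = slice_subject_object(triples_of(QUADS), iri("blade_runner"))
    actual = documents_from_quads(QUADS)[iri("doc/blade_runner")]
    print("triples missed by slicing the plain dump:", len(actual - guessed))
    print(to_nquads(QUADS[:1]))
```

In this toy data the slice for the film finds only the triple that mentions it directly, so the actor and character triples served in the same document are invisible to an indexer working from the triple dump alone; grouping on the fourth element recovers them.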