Re: Linked Data on qdos.com from Steve Harris on 2007-12-04 (semantic-web@w3.org from December 2007)

From: Steve Harris <steve.harris@garlik.com>
Date: Tue, 4 Dec 2007 12:30:01 +0000
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Damian Steer <damian.steer@hp.com>, Chris Bizer <chris@bizer.de>, semantic-web <semantic-web@w3.org>
Message-Id: <CA7922A5-951B-49F8-BFB6-6B902BBFBC92@garlik.com>
On 3 Dec 2007, at 18:18, Richard Cyganiak wrote:
>
> Steve,
>
> Thanks for the interesting background! I'm delighted to learn that  
> qdos does RDF not just on the outside, but also in the backend!

Yeah, though as you can tell, it's a mixed blessing!

Not 100% of the data is kept in RDF: the customer database (with  
crypted passwords and personal information) is kept in SQL, and some  
intermediate score data is cached in SQL, but the users' public data  
and the raw numbers that make up the score are all held in a SPARQL  
store. There aren't that many triples currently - a few tens of millions.

The profile pages on the site and search results are all done in  
SPARQL - you can see the SPARQL query used to provide the search  
results if you look in the HTML source of a search results page -  
mmmm... debugging :)

> On 3 Dec 2007, at 11:03, Steve Harris wrote:
>>> At the moment, unfortunately, the URI scheme is broken. It uses  
>>> the same URI to identify two different things, a web document and  
>>> a person. Web documents are different things than the things  
>>> talked about in the document -- that's a fundamental axiom.
>>
>> Well, not exactly - the URI originally denoted the person, and the  
>> HTML profile page was http://qdos.com/profile?uri=... which was  
>> ugly, and the person's URI did not dereference to anything. So, I  
>> made the person's URI dereference to .../turtle if you Accept:'d  
>> application/x-turtle
>
> Okay, fine until here ...
>
>> and return some HTML if not.
>
> That's the crux -- Returning any content, be it HTML or RDF, makes a  
> URI identify a web document. And it cannot possibly identify a  
> person at the same time. That's the collision.

Yes, sure.

> 2. Using different URIs for person and document, and 303(!)-redirecting
> from the person URI to the web page URI.

I think I'm going to do that - I can change the internal links to  
point to /html, so users should only get forwarded from old links they  
find lying around.
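A minimal sketch of option 2 as planned here, assuming the document URIs are the person URI plus `/turtle` or `/html` (the suffixes mentioned in this thread). The function names are hypothetical, and the Accept check is deliberately crude (it ignores q-values), so treat it as an illustration of the redirect decision rather than a server implementation.

```python
from typing import Tuple

# Media type named in this thread for the Turtle view.
TURTLE_TYPES = ("application/x-turtle",)


def wants_turtle(accept_header: str) -> bool:
    """Crude check: does the Accept header list a Turtle media type?

    Parameters such as q-values are stripped and ignored; a real server
    should do proper content negotiation.
    """
    media_types = [part.split(";")[0].strip().lower()
                   for part in accept_header.split(",")]
    return any(t in TURTLE_TYPES for t in media_types)


def respond_to_person_uri(path: str, accept_header: str) -> Tuple[int, str]:
    """Hypothetical handler for a person URI such as /celeb/<id>.

    The person URI never returns a representation itself; it always answers
    303 See Other and points at a document URI chosen by the Accept header.
    """
    base = path.rstrip("/")
    if wants_turtle(accept_header):
        return 303, base + "/turtle"
    return 303, base + "/html"
```

Because the site's own links would point straight at the `/html` pages, only requests that arrive via old person-URI links pay for the extra round trip.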

>> Consequently there are no statements about the page with
>> <http://qdos.com/celeb/8340a9fc46297f805e66b6f9e89feb80> as the subject.
>>
>> Probably/possibly I should have made the html version .../html or  
>> something, but I didn't. I might change this in the future, but I  
>> want to avoid forwarding normal users all over the place.
>
> Yes, I agree -- redirecting normal users is not good.
>
> How about putting the HTML pages at <http://qdos.com/profile/83...eb80>
> or something? This would be different from /celeb/, and still be a
> nice and clean URI. Always use that URI when linking from/to HTML
> pages -- no need to redirect.

We also have /user/, so I'll go for adding /html on the end. It's not  
too ugly.

>>> DC date and creator refer to the SPARQL graph, not the HTML page.  
>>> The HTML is dynamically generated.
>
> Well, the date and creator certainly do not refer to the person, and  
> you shouldn't publish them attached to the person's URI.

Sure, but like I said they're conceptually attached to the graph, it's  
just that the graph and person have the same URI. I can't remove them,  
or attach them to some other URI. I could suppress them in the linked  
data using a FILTER in the CONSTRUCT, though they'll still be visible  
through the SPARQL endpoint.

I should have done:

<person/graph-uri> q:internalRecord [
    dc:date xxx ;
    ...
] .

But I didn't have much time to think about it, and it's not an easy  
thing to fix right now. I'll fix it at some point, but much of the  
data about people is manually edited, so it won't be fixed until  
there's some reason to change it.
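The FILTER workaround could look roughly like this: a helper (hypothetical name) that builds the public CONSTRUCT query for one person and drops the graph-level metadata. It assumes the predicates are Dublin Core elements (`dc:` = `http://purl.org/dc/elements/1.1/`) and that each person's data lives in a graph named by the person's URI, as described in this thread. It sticks to SPARQL 1.0 syntax, and, as noted, the triples remain visible to anyone querying the store directly.

```python
# Assumed namespace: Dublin Core elements, not dcterms.
DC = "http://purl.org/dc/elements/1.1/"


def person_construct_query(person_uri, hidden=(DC + "date", DC + "creator")):
    """Hypothetical: CONSTRUCT a person's public triples from the graph that
    shares the person's URI, leaving out the graph-level metadata."""
    # One "?prop != <iri>" test per hidden predicate, joined with &&
    # (plain SPARQL 1.0, no NOT IN needed).
    conditions = " && ".join("?prop != <%s>" % p for p in hidden)
    filter_clause = "  FILTER (%s)\n" % conditions if hidden else ""
    return (
        "CONSTRUCT { <%s> ?prop ?obj }\n"
        "WHERE {\n"
        "  GRAPH <%s> { <%s> ?prop ?obj }\n"
        "%s"
        "}" % (person_uri, person_uri, person_uri, filter_clause)
    )
```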

>> You cannot be serious! There's no way I would go through and change  
>> every URI referring to a person in the RDF store behind a live  
>> system.
>
> Sorry -- I didn't know that the URIs come from an RDF store. I  
> naively assumed there was a database in the back, and the triples  
> were generated by a template.

I shouldn't have assumed that you knew what the internal architecture  
was! When you're so deep in it things seem obvious sometimes :)

>> The URI of the graph is the same as the URI of the person. That's  
>> slightly odd, but we have a lot of queries like
>>
>>  GRAPH ?person {
>>    ?person ?prop ?obj .
>>    ...
>>  }
>>
>> that are used to stop queries from spanning graphs. SPARQL doesn't have  
>> a concatenate function we could use to do
>>
>>  GRAPH concat(?person, "/graph") { ?person ... }
>>
>> or whatever.
>
> I see -- makes sense to me. The only problem is when you mix  
> metadata about the graph with data about the person in the public RDF.

Sure - it's definitely bad, I just don't see any way round it.
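If graphs had been named with a derived URI (the hypothetical `/graph` suffix from the quoted query), the missing concatenate function could be sidestepped whenever the application already knows which person it is querying, for example when rendering a single profile page: build the graph URI in application code and inline it. This sketch (hypothetical function name) does not help queries that bind `?person` inside SPARQL, which is the case the shared URI was chosen for.

```python
def profile_query(person_uri: str, graph_suffix: str = "/graph") -> str:
    """Hypothetical: build a per-person query whose graph URI is derived
    from the person URI outside SPARQL, since SPARQL (as of this thread)
    has no string concatenation to do it inside the query."""
    graph_uri = person_uri + graph_suffix
    return (
        "SELECT ?prop ?obj WHERE {\n"
        "  GRAPH <%s> { <%s> ?prop ?obj }\n"
        "}" % (graph_uri, person_uri)
    )
```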

>> The turtle is just the output of a CONSTRUCT query.
>
> Ah, interesting. I like this.

Me too :) SPARQL makes things like this, and ultimately providing an  
API, very easy. You already have a web-accessible, standards-compliant  
access method to provide to 3rd parties.

Cheers,
    Steve

-- 
Steve Harris
Garlik Limited
2 Sheen Road
Richmond  TW9 1AE

T   +44(0)20 8973 2465
F   +44(0)20 8973 2301
www.garlik.com

Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Tuesday, 4 December 2007 12:30:39 UTC