GRDDL Primer

This is a first pass at the Guitar review scenario GRDDL Primer.
http://suda.co.uk/sandbox/GRDDL/Primer.htm

Any suggestions/advice/ideas are more than welcome. I've never written
a primer, so I'm not exactly sure what needs to be in/out. Some of the
SPARQL examples still need to be created, but if there is something you
don't understand or I need to explain more, please let me know.
Hopefully, Wednesday we can discuss this further. On a side note, I am
also trying to get all the software installed on a webserver to
actually replicate what is being described.

Primer: Using GRDDL & Microformats to Aggregate Data

Stephan wishes to buy a guitar, so decides to check reviews. There are
various special interest publications online which feature musical
instrument reviews. There are also blogs which contain reviews by
individuals. Among the reviewers there may be friends of Stephan,
people whose opinion Stephan values (e.g. well-known musicians and
people whose reviews Stephan has found useful in the past). There may
also be reviews planted by instrument manufacturers which offer very
biased views.

First, Stephan needs to get a list of people he considers trusted
sources into some sort of machine-readable document. FoaF and
vCard-RDF are both suitable sources to extract the data from. The
question is how to get these values. Microformats define two simple
formats which can easily be converted from HTML to RDF through the
use of GRDDL. To extract vCard-RDF from HTML you can use
(hCard2vcardrdf.xsl ???), which will transform an hCard-encoded HTML
document.

<address class="vcard" id="smith-stephan">
<a href="http://example.org" class="fn url">Stephan Smith</a>
</address>

This snippet of HTML is converted into RDF with the use of the XSLT:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#"
>
  <rdf:Description rdf:about="http://example.org/">
    <vCard:FN>Stephan Smith</vCard:FN>
    <vCard:URL>http://example.org/</vCard:URL>
  </rdf:Description>
</rdf:RDF>
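
To make the mapping concrete, here is a rough Python sketch of the same
extraction. It is not the XSLT mentioned above (that stylesheet is still
to be decided); it only assumes the simplest hCard shape, a single
element carrying both "fn" and "url". The names HCardParser and
hcard_to_triples are made up for this illustration, and the triples are
plain tuples rather than real RDF/XML.

```python
from html.parser import HTMLParser

# Namespace taken from the RDF example above.
VCARD_NS = "http://www.w3.org/2001/vcard-rdf/3.0#"


class HCardParser(HTMLParser):
    """Collects fn and url from the first hCard in an HTML fragment.

    Simplifications (assumptions of this sketch): the fn element holds
    plain text only, and nested or multiple hCards are not handled.
    """

    def __init__(self):
        super().__init__()
        self.in_vcard = False
        self.capture_fn = False
        self.fn_parts = []
        self.url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "vcard" in classes:
            self.in_vcard = True
        if not self.in_vcard:
            return
        if tag == "a" and "url" in classes and self.url is None:
            self.url = attrs.get("href")
        if "fn" in classes and not self.fn_parts:
            self.capture_fn = True

    def handle_endtag(self, tag):
        # Any closing tag ends the fn text (fn is assumed to be text only).
        self.capture_fn = False

    def handle_data(self, data):
        if self.capture_fn:
            self.fn_parts.append(data)


def hcard_to_triples(html):
    """Return (subject, predicate, object) tuples for the hCard's fn and url."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    subject = parser.url
    name = "".join(parser.fn_parts).strip()
    triples = []
    if subject and name:
        triples.append((subject, VCARD_NS + "FN", name))
    if subject:
        triples.append((subject, VCARD_NS + "URL", subject))
    return triples


sample = """<address class="vcard" id="smith-stephan">
<a href="http://example.org" class="fn url">Stephan Smith</a>
</address>"""

for triple in hcard_to_triples(sample):
    print(triple)
```

Note that the sketch keeps the href exactly as written, so the subject
is "http://example.org" without the trailing slash used in the RDF
above.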

Another microformat that allows more information to be gleaned from
the document is XFN, the XHTML Friends Network. Using values in the
rel attribute, it is possible to assert the types of relationships
between the site owner and their friends, colleagues, co-workers, etc.
Since XFN values are found on 'a' elements, each link gives us another
resource to follow and search for more hCards and more XFN values.
This allows us to extend the circle of trust from our direct friends
to first-order friends of our friends.

<ul>
	<li><a href="http://" rel="met friend colleague">Peter Smith</a></li>
	<li><a href="http://" rel="met">John Doe</a></li>
	<li><a href="http://" rel="met">Paul Revere</a></li>
</ul>

Given a seed URL with XFN data, a GRDDL transformation can extract
FoaF data about all of these people. That FoaF file will then give us
an additional list of URLs that can be spidered for additional GRDDL
vCard-RDF data about each friend.
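
As a sketch of the first half of that step, the following Python pulls
the XFN relationships out of a page so that the linked URLs can be
queued for further crawling. It stands in for the GRDDL transformation
rather than reproducing it, and it stops short of emitting FoaF. The
hrefs in the sample are hypothetical (the list above leaves them
blank), as are the names XFNParser and extract_xfn.

```python
from html.parser import HTMLParser

# The rel values defined by XFN 1.1.
XFN_VALUES = {
    "contact", "acquaintance", "friend", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
    "kin", "muse", "crush", "date", "sweetheart", "me",
}


class XFNParser(HTMLParser):
    """Collects (href, [xfn values]) for every 'a' element with XFN rels."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        # Keep only rel tokens that are XFN values; ignore e.g. "nofollow".
        rels = set((attrs.get("rel") or "").lower().split()) & XFN_VALUES
        if href and rels:
            self.links.append((href, sorted(rels)))


def extract_xfn(html):
    parser = XFNParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical URLs, for illustration only.
xfn_sample = """<ul>
  <li><a href="http://peter.example.org/" rel="met friend colleague">Peter Smith</a></li>
  <li><a href="http://john.example.org/" rel="met nofollow">John Doe</a></li>
  <li><a href="http://example.com/about">No relationship</a></li>
</ul>"""

for href, rels in extract_xfn(xfn_sample):
    print(href, rels)
```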

Another property in XFN is 'me', which is used for identity
consolidation. With this value it is possible to say that the data
over on site 1 is also mine and should be considered as if it were
from my own site. This extends our ability to use different
resources. For instance:

<ul>
	<li><a href="http://del.icio.us/guitar-rocker45" rel="me">My Del.icio.us Link</a></li>
	<li><a href="http://claimid.com/guitar-rocker" rel="me">Me on ClaimID</a></li>
	<li><a href="http://guitar-rocker.com" rel="me">I love guitars</a></li>
</ul>

The power of rel="me" and identity consolidation is that it allows us
to glean data from multiple sources and merge it all into a single RDF
document about a single individual. The Del.icio.us links could be
encoded into RDF and associated with a user "guitar-rocker45", but
because of the rel="me" and any reciprocal link back to "example.org",
assertions can be made that the bookmarks have an owner
"Stephan Smith" who has an RDF-vCard at "example.org" and has data in
other places on other services such as claimid.com and
guitar-rocker.com. All of these can be merged to form a bigger picture
of "Stephan Smith" at "example.org".
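
One way to picture the consolidation is as grouping URLs that claim
each other with rel="me". The Python sketch below is only an
illustration under one assumption: a claim is trusted only when it is
reciprocal (both pages link to each other), which is a conservative
reading of the paragraph above. The function name consolidate and the
link data are invented for the example.

```python
def consolidate(me_links):
    """Group URLs connected by reciprocal rel="me" links.

    me_links: iterable of (from_url, to_url) pairs, one per rel="me" link.
    Returns a list of sets, each set being one consolidated identity.
    """
    links = set(me_links)
    parent = {}

    def find(url):
        # Union-find lookup with path halving.
        parent.setdefault(url, url)
        while parent[url] != url:
            parent[url] = parent[parent[url]]
            url = parent[url]
        return url

    for a, b in links:
        find(a)
        find(b)
        if (b, a) in links:  # only merge when the claim is reciprocal
            parent[find(a)] = find(b)

    groups = {}
    for url in list(parent):
        groups.setdefault(find(url), set()).add(url)
    return list(groups.values())


# Illustrative data: two reciprocal claims and one one-way claim.
me_links = [
    ("http://example.org", "http://del.icio.us/guitar-rocker45"),
    ("http://del.icio.us/guitar-rocker45", "http://example.org"),
    ("http://example.org", "http://guitar-rocker.com"),
    ("http://guitar-rocker.com", "http://example.org"),
    ("http://example.org", "http://claimid.com/guitar-rocker"),
]

for identity in consolidate(me_links):
    print(sorted(identity))
```

With this data the ClaimID page stays in a group of its own, because
nothing in the sample shows it linking back.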

On the guitar site, there are product reviews for each guitar. The
guitars are also marked up with microformats, so it is possible to
extract machine-readable data about each item. In addition to the
manufacturer data, each member of the site can leave feedback about
the item in the form of a review.

Stephan's friend Peter Smith has written several reviews of new
guitars. Each review has a link to the reviewer, which in this case is
a link back to Peter's profile page on the guitar site. We know by
visual inspection that the profile page belongs to Stephan's friend
Peter, but a machine does not. Luckily, Peter's profile page on the
guitar site lets him link back to his own personal site, and this link
has a rel="me" value. Now a machine can assert that the Peter on the
guitar site is the same Peter that is listed in Stephan's XFN list,
which was converted to FoaF, because the URLs resolve to the same
resource.
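
In practice the two URLs are rarely byte-for-byte identical, so some
normalization is needed before comparing them. The Python sketch below
does a purely syntactic comparison (case, default port, empty path,
fragment); it does not follow HTTP redirects, which a real
implementation would probably need. All URLs and the helper names
normalize and trusted_profiles are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize(url):
    """Return a canonical form of url for comparison purposes."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Drop the port when it is absent or the default for the scheme.
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # The fragment is dropped; the query string is kept as-is.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def trusted_profiles(friend_urls, profile_me_links):
    """Return profile pages whose rel="me" target is one of the friends.

    friend_urls: URLs taken from the XFN/FoaF list.
    profile_me_links: mapping of profile page URL -> rel="me" target URL.
    """
    friends = {normalize(u) for u in friend_urls}
    return sorted(
        profile
        for profile, target in profile_me_links.items()
        if normalize(target) in friends
    )


# Hypothetical data for illustration.
friend_urls = ["http://peter.example.org/"]
profile_me_links = {
    "http://guitars.example.com/members/peter": "HTTP://Peter.Example.org:80",
    "http://guitars.example.com/members/mallory": "http://mallory.example.net/",
}

print(trusted_profiles(friend_urls, profile_me_links))
```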

With all of these tools it is possible to find Stephan's friends and
to find additional resources that we know those friends created. Using
GRDDL it is possible to glean information about the guitar in the form
of product specifications supplied by the manufacturer and reviews
from site members. Once we have this data as RDF it can be passed into
a SPARQL engine and queries can be run on it.

If Stephan were looking for a guitar in a specific price range, by a
certain manufacturer, with a specific review rating or higher, from a
selected group of friends, we now have enough data in RDF to do just
that.

EXAMPLE SPARQL QUERY HERE

The first restriction on the data can be a pass on manufacturer data
such as price, type, etc. Once we have all the matching guitars, we
can then restrict them based on Stephan's friends' reviews. Using a
seeded list of XFN URLs given by Stephan that are converted to FoaF,
we can match the URLs to any URL from the vCard-RDF generated from the
profile pages of the guitar site's members. Now we have a list of
members that Stephan trusts relative to the guitar site. We can also
get a list of reviews that those trusted members have written. We can
then execute a UNION on the original data restricted on manufacturer
specs and the data from Stephan's friends' reviews. The resulting set
is a SPARQL result matching our original question.
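
Until the SPARQL examples are written, the two passes can be sketched
in plain Python over ordinary dictionaries. This is not SPARQL and not
RDF; it only illustrates the order of the restrictions. It also
combines them as a join (a guitar must pass both the spec filter and
the trusted-review filter), which is one reading of the step described
above. Every field name and value here is made up.

```python
def shortlist(guitars, reviews, trusted, maker, max_price, min_rating):
    """Return ids of guitars that match the specs and a trusted review."""
    # First pass: restrict on manufacturer data.
    candidates = {
        g["id"]
        for g in guitars
        if g["maker"] == maker and g["price"] <= max_price
    }
    # Second pass: keep candidates with a good enough trusted review.
    picked = {
        r["guitar"]
        for r in reviews
        if r["guitar"] in candidates
        and r["reviewer"] in trusted
        and r["rating"] >= min_rating
    }
    return sorted(picked)


# Hypothetical data for illustration.
guitars = [
    {"id": "g1", "maker": "Acme", "price": 450},
    {"id": "g2", "maker": "Acme", "price": 1200},
    {"id": "g3", "maker": "Other", "price": 400},
]
reviews = [
    {"guitar": "g1", "reviewer": "http://peter.example.org/", "rating": 4},
    {"guitar": "g2", "reviewer": "http://peter.example.org/", "rating": 5},
    {"guitar": "g1", "reviewer": "http://planted.example.net/", "rating": 5},
    {"guitar": "g3", "reviewer": "http://peter.example.org/", "rating": 5},
]
trusted = {"http://peter.example.org/"}

print(shortlist(guitars, reviews, trusted, "Acme", 500, 4))
```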

EXAMPLE SPARQL QUERY HERE

This SPARQL result is in XML or JSON and can easily be consumed by
another application, which can display the results on screen, email
them to Stephan, or search the web for the best prices on the short
list of guitars.
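
For the JSON case, a consuming application might flatten the result
like this. The sketch assumes the standard SPARQL JSON results layout
(a "head" with "vars" and a "results" object with "bindings"); the
variable names and values in the sample document are invented, and
bindings_to_rows is just a name chosen for this example.

```python
import json


def bindings_to_rows(sparql_json):
    """Turn a SPARQL JSON result document into a list of plain dicts."""
    doc = json.loads(sparql_json)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable may be unbound in a given row, so check before reading.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# Invented sample result for illustration.
sample_result = """
{
  "head": {"vars": ["guitar", "rating"]},
  "results": {
    "bindings": [
      {
        "guitar": {"type": "uri", "value": "http://guitars.example.com/g1"},
        "rating": {"type": "literal", "value": "4"}
      },
      {
        "guitar": {"type": "uri", "value": "http://guitars.example.com/g2"}
      }
    ]
  }
}
"""

for row in bindings_to_rows(sample_result):
    print(row)
```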

Brian Suda,
$Id: Primer.html,v 0.01 2006/09/04 $

-- 
brian suda
http://suda.co.uk

Received on Monday, 4 September 2006 22:42:18 UTC