Digg RDFa SIOC - Part 1 - Overview

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Sun, 28 Sep 2008 12:18:48 -0400
Message-ID: <48DFAE68.4060307@digitalbazaar.com>
To: RDFa Community <public-rdfa@w3.org>

What follows is an overview of some example RDFa markup that I've
written for Digg. This is a first cut and needs to be refined. It
captures some basic concepts on the site and shows how RDFa can be
integrated into Digg's existing HTML with relative ease. I made sure not
to change the existing structural layout: RDFa should integrate into
Digg as-is, with no major rework necessary.

The validating XHTML file can be found here:

http://rdfa.digitalbazaar.com/demos/digg/stemcells.html

Here's the process that I went through when adding RDFa to Digg's page:

1. Set the DTD and HTML version to "XHTML+RDFa 1.0".
2. Cleaned up 94 validation errors on the page. The page now validates
   cleanly as XHTML+RDFa 1.0 with minor changes to structural layout[1].
   These pages can then be served to IE6/7 with minor changes to the
   Apache configuration.
3. Read through a decent bit of the SIOC[2] (Semantically Interlinked
   Online Communities) vocabulary to see what concepts it supported
   and whether or not it mapped cleanly to Digg's site concepts.
4. Started adding SIOC to Digg's page using Fuzzbot[3] in Firefox 3 on
   Debian Linux to debug the triples that were being generated. This
   process made it clear that editing pre-existing sites at this level
   is nearly impossible without proper debugging tools. Even though I
   know the RDFa processing rules and have written a conforming RDFa
   parser, I reached a point where I was not sure what the output would
   be and resorted to trial and error, depending on Fuzzbot to tell me
   which triples were actually being generated. More often than I'd
   like to admit, the triples I expected to be emitted were not the
   ones that actually were.
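
As a rough illustration of step 1 (not taken from the Digg demo page),
the sketch below holds a hypothetical minimal page with the XHTML+RDFa
1.0 DOCTYPE and version attribute, and uses Python's standard library to
confirm that it is well-formed XML and declares the expected version.
The page content and the declared_version helper are assumptions made
for this example. It checks well-formedness only; it does not validate
against the DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical minimal page: only the DOCTYPE and the version attribute
# on <html> matter here; the head and body are placeholders.
PAGE = """
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" version="XHTML+RDFa 1.0">
  <head><title>Example story</title></head>
  <body><p>Story text goes here.</p></body>
</html>
""".strip()


def declared_version(markup):
    """Parse markup as XML and return the root element's version attribute."""
    root = ET.fromstring(markup)  # raises ParseError if not well-formed
    if root.tag != "{%s}html" % XHTML_NS:
        raise ValueError("root element is not an XHTML <html> element")
    return root.get("version")


if __name__ == "__main__":
    print(declared_version(PAGE))
```

A check like this catches well-formedness problems early, but it is no
substitute for a real validator or for an RDFa-aware tool such as
Fuzzbot.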

Overall, the process took around 3 hours. The page uses a combination of
dcterms and sioc to express the following concepts:

* The Digg Community (sioc:Community)
* The various sections and sub-sections of Digg: Politics, Technology,
  Technology/Apple, etc. (sioc:Forum)
* Digg stories (sioc:Thread)
* Digg users (sioc:User)
* Digg replies (sioc:Post)

Various attributes for each of these concepts, such as the avatar image,
the relationship between posts and threads, who created each post and
when it was submitted, are also marked up. Follow-up e-mails will cover
each of the concepts above in more detail.
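
To make the mapping above a little more concrete, here is a small
sketch of what a story (sioc:Thread) with one reply (sioc:Post) might
look like in XHTML+RDFa. The markup is hypothetical and is not copied
from the Digg demo page: the example.org URLs, the choice of properties
(sioc:has_container, sioc:has_creator, dcterms:title, dcterms:created)
and the typed_resources helper are all assumptions. The helper only
lists which elements carry a typeof attribute. It is not an RDFa
processor and does not emit triples.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a Digg-like story with a single reply.
FRAGMENT = """
<div xmlns="http://www.w3.org/1999/xhtml"
     xmlns:sioc="http://rdfs.org/sioc/ns#"
     xmlns:dcterms="http://purl.org/dc/terms/"
     about="http://example.org/story/1" typeof="sioc:Thread">
  <h3 property="dcterms:title">Example story</h3>
  <a rel="sioc:has_container"
     href="http://example.org/technology">Technology</a>
  <div about="http://example.org/story/1#c1" typeof="sioc:Post">
    <a rel="sioc:has_creator"
       href="http://example.org/users/alice">alice</a>
    <span property="dcterms:created"
          content="2008-09-28T12:00:00Z">today</span>
    <span rel="sioc:has_container"
          resource="http://example.org/story/1"></span>
  </div>
</div>
""".strip()


def typed_resources(markup):
    """Return (about, typeof) pairs for every element with a typeof."""
    root = ET.fromstring(markup)
    return [
        (el.get("about"), el.get("typeof"))
        for el in root.iter()  # document order, root element included
        if el.get("typeof") is not None
    ]


if __name__ == "__main__":
    for subject, rdf_type in typed_resources(FRAGMENT):
        print(subject, rdf_type)
```

Listing the typed resources is only a crude sanity check on where
subjects are being set. Working out the actual triples still needs a
conforming RDFa parser.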

-- manu

[1]http://rdfa.digitalbazaar.com/demos/digg/stemcells.html
[2]http://sioc-project.org/ontology
[3]http://rdfa.digitalbazaar.com/fuzzbot/

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Bitmunk 3.0 Website Launches
http://blog.digitalbazaar.com/2008/07/03/bitmunk-3-website-launches
Received on Sunday, 28 September 2008 16:19:40 GMT
