Digg RDFa SIOC - Part 3 - Threads, Users and Posts

If you take a look at the last part of the XHTML+RDFa and triples
generated for this URL:

http://rdfa.digitalbazaar.com/demos/digg/stemcells.html

you will notice three additional concepts for each story submitted to
Digg: Threads, Users and Posts.

Each Digg story is effectively an sioc:Thread concept, with each User
reply being an sioc:Post concept. Threads effectively contain Posts.
Each user could be marked up using FOAF, but I tried to keep things
simple and to see what could be done with pure SIOC and Dublin Core.

Each Thread entry looks something like the following:

<h3 id="title">
   <a href="http://www.technologyreview.com43/Biotech/210/"
      rel="dcterms:source" property="dcterms:title">
      At Last-Stem Cells without Side Effects!
   </a>
</h3>
<p>
   <em class="url">technologyreview.com</em>
   <span about="" typeof="sioc:Thread" property="dcterms:abstract">
      Researchers at Harvard University, the Harvard Stem Cell
      Institute, and the MGH Center for Regenerative Medicine have found
      a way to create healthy stem cells from adult cells--no embryo
      required--using an adenovirus.
   </span>
</p>
...
<a rel="dcterms:creator" rev="sioc:creator_of"
   href="http://digg.com/users/JusTuring">
   <div about="http://digg.com/users/JusTuring" typeof="sioc:User">
      <span rel="sioc:avatar">
         <img src="http://digg.com/users/JusTuring/s.png"
              alt="JusTuring" class="user-photo"
              height="16" width="16" />
      </span>
      <span property="sioc:name">JusTuring</span>
   </div>
</a>

Digg was already marking up the source, title and date; the markup above
goes a bit further and states that the page is a Thread. It also marks
up the person who posted it as a sioc:User and identifies their avatar
image (sioc:avatar) and name (sioc:name). Here are the triples that are
generated from the markup above (<> is shorthand for "current page"):

<>
   dcterms:creator
      <http://digg.com/users/JusTuring> .
<http://digg.com/users/JusTuring>
   sioc:creator_of
      <> .
<>
   dcterms:source
      <http://www.technologyreview.com43/Biotech/210/> .
<>
   dcterms:title
      "At Last-Stem Cells without Side Effects!" .
<>
   rdf:type
      sioc:Thread .
<>
   dcterms:abstract
      "Researchers at Harvard University, the..." .
<http://digg.com/users/JusTuring>
   rdf:type
      sioc:User .
<http://digg.com/users/JusTuring>
   sioc:avatar
      <http://digg.com/users/JusTuring/s.png> .
<http://digg.com/users/JusTuring>
   sioc:name
      "JusTuring" .

Comments on each thread are marked up as sioc:Post concepts with markup
that looks like the following:

<li about="http://digg.com/posts/129387" typeof="sioc:Post"
    resource="" class="l0" id="c19182108">
   <div about="http://digg.com/users/VegasKill" typeof="sioc:User"
        rel="sioc:creator_of" resource="http://digg.com/posts/129387"
        property="sioc:name">VegasKill</div>,
   <a about="" rev="sioc:reply_to"
      href="http://digg.com/posts/129387">
         <span property="dcterms:dateSubmitted"
               content="2008-09-26T13:24:31+01:00">
            36 minutes ago
         </span>
   </a>, -3/+1
   <span property="sioc:content">
      Scripts ahoy. But otherwise progress made.
   </span>
</li>

I had to fudge the post URL a bit to make this easier to read, but
basically, each post has an associated sioc:User who is the creator of
the post, and each post is a reply to the parent thread. Each
post has a dcterms:dateSubmitted value as well as the content of the
post (sioc:content). This results in the following triples:

<http://digg.com/posts/129387>
   rdf:type
      sioc:Post .
<http://digg.com/posts/129387>
   dcterms:dateSubmitted
      "2008-09-26T13:24:31+01:00" .
<http://digg.com/posts/129387>
   sioc:content
      "Scripts ahoy. But otherwise progress made." .
<http://digg.com/posts/129387>
   sioc:reply_to
      <> .
<http://digg.com/users/VegasKill>
   rdf:type
      sioc:User .
<http://digg.com/users/VegasKill>
   sioc:creator_of
      <http://digg.com/posts/129387> .
<http://digg.com/users/VegasKill>
   sioc:name
      "VegasKill" .

All of this is a first cut and there are many more relationships that
you can mark up once this first cut is implemented. Some relationships,
such as number of Diggs/votes, don't exist in the SIOC vocabulary.
Perhaps somebody else knows about a vocabulary that has the concept of
social news digging/voting? If not, feel free to create the vocabulary
and put it online somewhere... perhaps after talking with some of the
folks from Newsvine, Reddit and Delicious.

There are other concepts in here that are arguable, such as the idea
that the page is an sioc:Thread and that the person who posted the story
is the sioc:creator_of that page/thread. A reasoning agent could
mistakenly assume that this person is also the creator
of the ads and other items in the page, which isn't true. So, some care
should be taken in making these decisions. In other words, ask yourself
how the triples could be misunderstood by reasoning agents and UIs that
need to display the triples to a data mining agent.
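
One simple way to do that review is to list everything that is being
said about the whole page, since those are the statements an agent is
most likely to over-generalize. The Python sketch below does this for
the thread triples shown earlier; the tuple representation and the two
helper names are assumptions made for the example, not an existing
tool.

```python
# Thread-level triples from earlier in this message, as plain tuples.
PAGE = "<>"  # the current page
USER = "http://digg.com/users/JusTuring"

THREAD_TRIPLES = [
    (PAGE, "dcterms:creator", USER),
    (USER, "sioc:creator_of", PAGE),
    (PAGE, "dcterms:source", "http://www.technologyreview.com43/Biotech/210/"),
    (PAGE, "dcterms:title", "At Last-Stem Cells without Side Effects!"),
    (PAGE, "rdf:type", "sioc:Thread"),
    (PAGE, "dcterms:abstract", "Researchers at Harvard University, the..."),
    (USER, "rdf:type", "sioc:User"),
    (USER, "sioc:avatar", "http://digg.com/users/JusTuring/s.png"),
    (USER, "sioc:name", "JusTuring"),
]


def predicates_from(triples, subject):
    """Predicates of statements whose subject is `subject`."""
    return sorted({p for s, p, o in triples if s == subject})


def predicates_into(triples, obj):
    """Predicates of statements whose object is `obj`."""
    return sorted({p for s, p, o in triples if o == obj})


# Everything asserted about the page as a whole, in either direction.
print("page is subject of:", predicates_from(THREAD_TRIPLES, PAGE))
print("page is object of: ", predicates_into(THREAD_TRIPLES, PAGE))
```

Each predicate in that list is worth a second look: does it really
hold for the entire page, or only for the story inside it?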

Hope this helps, and please do ask questions if any of this doesn't make
sense :)

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Bitmunk 3.0 Website Launches
http://blog.digitalbazaar.com/2008/07/03/bitmunk-3-website-launches

Received on Monday, 29 September 2008 03:26:05 UTC