RDFa Primer 1.0

1 Purpose and Preliminaries

Current web pages, written in HTML, contain significant inherent structured data: calendar events, contact information, photo captions, song titles, copyright licensing information, etc. When publishers can express this data precisely, and when tools can read it robustly, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites. An event on a web page can be directly imported into a desktop calendar. A license on a document can be detected to inform the user of his rights automatically. A photo's creator, camera setting information, resolution, and topic can be published as easily as the original photo itself.

RDFa lets an HTML author express this structured data using extra attributes. Where the data is already present on the page, e.g. the photo's caption, the HTML+RDFa author need not repeat it. RDFa uses RDF, which lets any publisher extend existing vocabularies or create new ones altogether.

Reviewer comment: This is confusing and scary to somebody who hasn't heard about RDF. They came here to learn about RDFa, not about how it came to be or what RDF does.

RDFa builds upon the Resource Description Framework (RDF), but the reader of this document is not expected to understand RDF: if you can write XHTML, you can write XHTML+RDFa.

Reviewer comment: Changed slightly to be a little less technical and more welcoming. What you said was very accurate, but most people don't know the difference between HTML and XHTML.

We note that RDFa makes use of XML namespaces. In this document, we assume, for simplicity's sake, that the following namespaces are defined: dc for Dublin Core, foaf for FOAF, cc for Creative Commons, and xsd for XML Schema Definitions:

dc: http://purl.org/dc/elements/1.1/
foaf: http://xmlns.com/foaf/0.1/
cc: http://web.resource.org/cc/
xsd: http://www.w3.org/2001/XMLSchema#

2 Simple Data: Publishing Events and Contacts

Jo blogs about her work, which involves web development.

2.1 The Basic HTML

Jo has an upcoming talk at the XTech Conference, on May 8th at 10am, where she will be discussing "web widgets". She posts an announcement of her talk on her blog at http://jo-blog.example.org/. Her blog also includes her contact information (Jo has a fantastic spam filter, so she is unafraid of publishing her email address):

Reviewer comment: I know there has been a great deal of work that has been put into this example, but perhaps we should be picking an example that would resonate more with everybody? Normal folks don't speak at conferences and work as web developers, so this example might alienate the general blogging population. Perhaps we could use another example, like planning a large picnic or BBQ for a group of friends? Something that people around the world do, regardless of culture or economic background?

Reviewer comment: We should show what the text would look like before jumping into the HTML.

<html>
    <head><title>Jo's Blog</title></head>
    <body>
...
    <p>
        I'm giving a talk at the XTech Conference about web widgets, on May 8th at 10am.
    </p>
...
    <p class="contactinfo">
        My name is Jo Smith. I'm a distinguished web engineer
        at
        <a href="http://example.org">
            Example.org
        </a>.
        You can contact me
        <a href="mailto:jo@example.org">
            via email
        </a>.
    </p>
...
    </body>
</html>

This short piece of markup contains important structured data.

The markup describes an event: a talk that Jo is giving. This event starts at 10am on May 8th. A summary of the event is "a talk at XTech 2007 on web widgets." We also have contact information for Jo: she works for the organization Example.org, with job title of "Distinguished Web Engineer." She can be contacted at the email address "jo@example.org."

Reviewer comment: use a better organization name, like WebWare, Inc., and remember to change the e-mail address if you change the company name =P

At the moment, it is very difficult for software, like web browsers and search engines, to make use of this data's implicit structure. We need a standard mechanism to explicitly express it, so that it can be extracted consistently. This is where RDFa comes in.

2.2 Publishing An Event

Jo would like to add some structure to this blog entry so that readers might be able to add her talk directly to their calendar. RDFa allows her to express this structure using a small handful of extra attributes. Since this is a calendar event, Jo will specifically use the iCal vocabulary [ICAL-RDF] to denote the data's structure.

Reviewer comment: could we use a term like 'tag' to make this more accessible? "...Jo would like to tag this blog entry so that..."

The first step is to reference the iCal vocabulary within the HTML page, so that a parser may know where to look up the vocabulary terms:

<html xmlns:cal="http://www.w3.org/2002/12/cal/ical#">
    ...

Reviewer comment: can we say "...so that a web browser can understand events..."? That would focus on what we're enabling without focusing on the technical details of what we're doing.

Then, Jo declares a new event:

 <p instanceof="cal:Vevent"> ... </p>

Reviewer comment: We should think about color coding the sections in the HTML that are important to highlight. In this case, it is the text that we're adding to the HTML. It makes picking out the important bits much easier.

Note how the instanceof attribute is used here to define the kind of data being expressed. (If Jo wanted to declare multiple types, she could include more than one value in the instanceof attribute, separated by spaces.) The use of this attribute on the P element ensures that, by default, data expressed on elements contained inside this element refers to the same event. Thus, inside this event declaration, Jo can set up the event fields, reusing the existing HTML. For example, the event summary can be declared as:

       I'm giving <span property="cal:summary">
         a talk at the XTech Conference about web widgets
       </span>,

The property attribute on the span element declares a data field attached to the event declared by instanceof in the enclosing P.

Reviewer comment: Take a step back and re-read that previous sentence... while technically accurate, I think it won't make sense to your run-of-the-mill blogger.

Note how the existing rendered content, "a talk at the XTech Conference about web widgets", is the value of this field. Sometimes, this isn't the desired effect. Specifically, the start time of the event should be displayed nicely ("May 8th"), but should also be represented in a machine-parsable way, the standard iCal format: 20070508T1000+0200.

Reviewer comment: you've probably lost the bloggers at this point, they might be thinking "What in the hell does that big number with the T in the middle of it mean?"

In this case, the markup needs only a slight modification:

       <span property="cal:dtstart" content="20070508T1000+0200">
          May 8th at 10am
       </span>

In this case, the actual content of the span element, "May 8th at 10am", is ignored for structured data purposes: it has been replaced by the explicit content attribute.

Reviewer comment: Make this less technical sounding - am I annoying you yet :) - truth is, I would've written this the same exact way and we don't want to make it so simple that it could be easily misunderstood.

The full markup is then:

<html xmlns:cal="http://www.w3.org/2002/12/cal/ical#">
    <head><title>Jo's Blog</title></head>
    <body>
...
    <p instanceof="cal:Vevent">
        I'm giving
        <span property="cal:summary">
            a talk at the XTech Conference about web widgets
        </span>,
        on 
        <span property="cal:dtstart" content="20070508T1000+0200">
            May 8th at 10am
        </span>.
    </p>
...
    </body>
</html>
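
To make the role of the content attribute concrete, the following Python sketch collects each property field from a fragment of Jo's markup, preferring an explicit content attribute over the rendered text. It is an illustration under simplifying assumptions, not a conforming RDFa parser: the class name PropertyCollector is hypothetical, fields are assumed not to nest inside one another, every element is assumed to have a closing tag, and whitespace in the rendered text is collapsed for readability.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from elements carrying a property attribute."""

    def __init__(self):
        super().__init__()
        self.fields = []     # collected (property, value) pairs
        self._field = None   # state of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._field is not None:
            # An element nested inside the current field: track its depth.
            self._field["depth"] += 1
        elif "property" in attrs:
            self._field = {
                "name": attrs["property"],
                "content": attrs.get("content"),
                "text": [],
                "depth": 0,
            }

    def handle_data(self, data):
        if self._field is not None:
            self._field["text"].append(data)

    def handle_endtag(self, tag):
        if self._field is None:
            return
        if self._field["depth"] > 0:
            self._field["depth"] -= 1
            return
        field, self._field = self._field, None
        if field["content"] is not None:
            # The explicit machine-readable value replaces the rendered text.
            value = field["content"]
        else:
            # Otherwise the rendered text is the value (whitespace collapsed).
            value = " ".join("".join(field["text"]).split())
        self.fields.append((field["name"], value))


markup = """
<p instanceof="cal:Vevent">
    I'm giving
    <span property="cal:summary">
        a talk at the XTech Conference about web widgets
    </span>,
    on
    <span property="cal:dtstart" content="20070508T1000+0200">
        May 8th at 10am
    </span>.
</p>
"""

collector = PropertyCollector()
collector.feed(markup)
collector.close()
for name, value in collector.fields:
    print(name, "=", value)
```

The summary field takes its value from the text a reader sees, while the start time takes its value from the content attribute, so "May 8th at 10am" never reaches the extracted data.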

Note that Jo could have used any other HTML element, not just span, to mark up her event data. In other words, when the event information is already laid out in the HTML using elements such as h1, em, div, etc., Jo can simply add the property attribute, and optionally the content attribute, to mark up the event.

(For the RDF-inclined reader, the RDF triples that correspond to the above markup are available in Section 4.1 Events and Contact Information.)

END OF EDITS

2.3 Publishing Contact Information

Now that Jo has published an event using structured data, she realizes there is much data on her blog that she can mark up in the same way. Her contact information, in particular, is an easy target for structured markup with RDFa:

...
    <p class="contactinfo">
        My name is Jo Smith. I'm a distinguished web engineer
        at
        <a href="http://example.org">
            Example.org
        </a>.
        You can contact me 
        <a href="mailto:jo@example.org">
            via email
        </a>.
    </p>
...

Jo discovers the vCard RDF vocabulary [VCARD-RDF], which she adds to her existing page. Since Jo thinks of vCards as a way to publish her contact information, she uses the prefix contact to designate this vocabulary. Note that, although Jo already imported the iCal vocabulary, adding the vCard vocabulary is just as easy and does not interfere:

<html xmlns:cal="http://www.w3.org/2002/12/cal/ical#"
      xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#">
...

Jo then sets up her vCard using RDFa, by deciding that the P with class set to contactinfo will be the container for her vCard. She notes, however, that the vCard schema does not require declaring a vCard type. Instead, it is recommended that a vCard refer to a web page that identifies the individual. Jo thus uses RDFa's special attribute about for just this purpose, indicating that all contained HTML pertains to Jo's designated URI. Note how the about attribute is inherited from parent elements in the HTML: the about attribute on the nearest ancestor applies to declared structured data.

...
    <p class="contactinfo" about="http://example.org/staff/jo">
        ...everything here pertains to http://example.org/staff/jo...
    </p>
...

"Simple enough!" Jo realizes. She adds her first vCard fields: name, title, organization and email.

...
    <p class="contactinfo" about="http://example.org/staff/jo">
        My name is
        <span property="contact:fn">
            Jo Smith
        </span>.
        I'm a
        <span property="contact:title">
            distinguished web engineer
        </span>
        at
        <a rel="contact:org" href="http://example.org">
            Example.org
        </a>.
        You can contact me
        <a rel="contact:email" href="mailto:jo@example.org">
            via email
        </a>.
    </p>
...

Notice how Jo was able to use the rel attribute directly within the anchor tag for designating her organization and email address. In this case, the rel indicates a relationship between the current URI, designated by about, and the target URI, designated by href. The type of relationship is defined by the rel. In this case, contact:org indicates a relationship of type "vCard organization", while contact:email indicates a relationship of type "vCard email".
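
The way about, rel, and href combine can be sketched in a few lines of Python. The code below is a rough illustration rather than a real RDFa processor: the class name LinkTripleCollector is hypothetical, the markup is a condensed version of Jo's contact paragraph, every element is assumed to be explicitly closed (as in XHTML), and elements with no about on any ancestor are assumed to describe the page itself.

```python
from html.parser import HTMLParser


class LinkTripleCollector(HTMLParser):
    """Collect (subject, rel, href) triples, inheriting about from ancestors."""

    def __init__(self, page_uri):
        super().__init__()
        # Stack of subjects, one entry per open element. The page URI is the
        # assumed default subject when no ancestor carries an about attribute.
        self.subjects = [page_uri]
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # An element either sets a new subject or inherits the nearest one.
        subject = attrs.get("about", self.subjects[-1])
        self.subjects.append(subject)
        if "rel" in attrs and "href" in attrs:
            self.triples.append((subject, attrs["rel"], attrs["href"]))

    def handle_endtag(self, tag):
        if len(self.subjects) > 1:
            self.subjects.pop()


markup = """
<p class="contactinfo" about="http://example.org/staff/jo">
    My name is <span property="contact:fn">Jo Smith</span>.
    I work at <a rel="contact:org" href="http://example.org">Example.org</a>.
    You can contact me
    <a rel="contact:email" href="mailto:jo@example.org">via email</a>.
</p>
"""

collector = LinkTripleCollector("http://jo-blog.example.org/")
collector.feed(markup)
collector.close()
for subject, rel, target in collector.triples:
    print(subject, rel, target)
```

Both links pick up http://example.org/staff/jo as their subject, even though the about attribute appears only once, on the enclosing P.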

Note how, for simplicity's sake, we have slightly abused the vCard vocabulary above: vCard technically requires that the type of the email address be specified, e.g. work or home email. In Section 4.3 Layered Data, we show how rel can be used without a corresponding href, in order to create subresources and provide the correct markup for expressing a true vCard.

2.4 The Complete HTML with RDFa

Jo's complete HTML with RDFa is thus:

<html xmlns:cal="http://www.w3.org/2002/12/cal/ical#"
      xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#">
...
    <p instanceof="cal:Vevent">
        I'm giving
        <span property="cal:summary">
            a talk at the XTech Conference about web widgets
        </span>,
        on
        <span property="cal:dtstart" content="20070508T1000+0200">
            May 8th at 10am
        </span>.
    </p>
...
    <p class="contactinfo" about="http://example.org/staff/jo">
        My name is
        <span property="contact:fn">
            Jo Smith
        </span>.
        I'm a
        <span property="contact:title">
            distinguished web engineer
        </span>
        at
        <a rel="contact:org" href="http://example.org">
            Example.org
        </a>.
        You can contact me 
        <a rel="contact:email" href="mailto:jo@example.org">
            via email
        </a>.
    </p>
...

Note how, if Jo changes her email address link, her organization, or the title of her talk, the RDFa approach will automatically pick up these changes in the marked up, structured data. The only place where this doesn't happen is where the content attribute must override the rendered content, which is inevitable when the human-rendered data and the machine-readable data must differ.

(Once again, the RDF-inclined reader will want to consult the resulting RDF triples in Section 4 RDF Correspondence.)

2.5 Working Within a Fragment of the HTML

What if Jo does not have complete control over the HTML of her blog? For example, she may be using a templating system which makes it particularly difficult to add the vocabularies in the html element at the top of her page without adding them to every page on her site. Or, she may be using a web provider that doesn't allow her to change the header of the page to begin with.

Fortunately, RDFa uses standard XML namespaces, which means that the vocabularies can be imported "locally" to an HTML element. Jo's HTML blog page could express the exact same structured data with the following markup:

<html>
...
    <p instanceof="cal:Vevent"
       xmlns:cal="http://www.w3.org/2002/12/cal/ical#">
        I'm giving
        <span property="cal:summary">
            a talk at the XTech Conference about web widgets
        </span>,
        on
        <span property="cal:dtstart" content="20070508T1000+0200">
            May 8th at 10am
        </span>.
    </p>
...
    <p class="contactinfo" about="http://example.org/staff/jo"
       xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#">
        My name is
        <span property="contact:fn">
            Jo Smith
        </span>.
        I'm a
        <span property="contact:title">
            distinguished web engineer
        </span>
        at
        <a rel="contact:org" href="http://example.org">
            Example.org
        </a>.
        You can contact me 
        <a rel="contact:email" href="mailto:jo@example.org">
            via email
        </a>.
    </p>
...

Of course, just like in the case of the vocabularies defined on the top-level html tag, more than one vocabulary can be imported into any element. In this case, each p only needs one vocabulary: the first uses iCal, the second uses vCard. This approach helps with the desired ability to copy-and-paste HTML from one page to another: the closer the namespace declarations to their relevant statements, the easier it is to copy and paste the content safely.

3 Advanced Concepts: Custom Vocabularies, Document Fragments, Complex Data, ...

RDFa can do much more than the simple examples described above. In this section, we explore some of its advanced capabilities. We consider:

the creation of a custom vocabulary,
the use of precise datatypes,
the description of resources beyond the current web page, and
the definition and annotation of "subresources".

3.1 Creating a Custom Vocabulary and Using Compact URIs

All field names and data types in RDFa are URIs, e.g. http://purl.org/dc/elements/1.1/title is the "Dublin Core title" field. In RDFa, we often use compact versions of those URIs, by defining a prefix using XML namespaces, and using the prefixed notation to designate the URI. This helps keep the markup short and clean:

<div xmlns:dc="http://purl.org/dc/elements/1.1/">
   <span property="dc:title">Yowl</span>,
   created by
   <span property="dc:creator">Mark Birbeck</span>.
</div>
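
The expansion of a compact URI is plain string concatenation: the prefix is looked up and the rest of the name is appended to the namespace URI. A minimal Python sketch, assuming the prefix table declared at the start of this document (the function name expand_curie is hypothetical):

```python
# Prefix table, as assumed in the preliminaries of this document.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "cc": "http://web.resource.org/cc/",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Return the full URI for a compact URI such as 'dc:title'."""
    prefix, separator, local_name = curie.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local_name


print(expand_curie("dc:title"))  # http://purl.org/dc/elements/1.1/title
```

Because the expansion is simple concatenation, the namespace URI must end exactly where the term name begins, which is why vocabulary URIs conventionally end in "/" or "#".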

Because concepts are simply URIs, it is trivial to create one's own vocabulary: simply mint new URIs in a domain you control, and use them in RDFa markup.

Consider a (fictional) photo management web site called Shutr, whose web site is http://www.shutr.net. Users of Shutr can upload their photos at will, annotate them, organize them into albums, and share them with the world. They can choose to keep these photos private, or make them available for public consumption under licensing terms of their choosing.

Shutr chooses to mark up its photos with RDFa, so that client-side tools may be able to extract information automatically. Some concepts, such as dc:title, dc:date, etc. can be clearly reused from the Dublin Core vocabulary, but other concepts, such as lens settings, camera model, and other photographer parameters, may need to be defined from scratch. For this purpose, Shutr defines a vocabulary namespace URI:

http://shutr.net/vocab/1.0/

Shutr can then publish terms such as http://shutr.net/vocab/1.0/takenWithCamera, http://shutr.net/vocab/1.0/aperture, etc.

3.2 Qualifying Other Documents and Document Chunks

Shutr may choose to present many photos in a given HTML page. In particular, at the URI http://www.shutr.net/user/markb/album/12345, all of the album's photos will appear inline. Structured data about each photo can be included simply by specifying an about attribute, which indicates the resource that fields refer to within that HTML element.

<ul>
  <li> <img src="/user/markb/photo/23456_thumbnail" />,
    <span about="/user/markb/photo/23456" property="dc:title">
      Sunset in Nice
    </span>
  </li>

  <li> <img src="/user/markb/photo/34567_thumbnail" />,
    <span about="/user/markb/photo/34567" property="dc:title">
      W3C Meeting in Mandelieu
    </span>
  </li>
</ul>

This same approach applies to statements with URI objects. For example, each photo in the album has a creator and may have its own usage license. We can use the convenient inheritance of the about attribute to refer to the photo once, and add as many fields as we need:

<ul>
  <li about="/user/markb/photo/23456">
    
    <img src="/user/markb/photo/23456_thumbnail" />,
    
    <span property="dc:title">
      Sunset in Nice
    </span>
    
    taken by photographer
    
    <a property="dc:creator"
       href="/user/markb">
      Mark Birbeck
    </a>,
    
    licensed under a
    
    <a rel="cc:license"
       href="http://creativecommons.org/licenses/by-nc/2.5/">
      Creative Commons Non-Commercial License
    </a>.
    
  </li>

  <li about="/user/markb/photo/34567">
    
    <img src="/user/markb/photo/34567_thumbnail" /> 
    
    <span property="dc:title">
      W3C Meeting in Mandelieu
    </span>
    
    taken by photographer
    
    <a property="dc:creator"
       href="/user/stevenp">
         Steven Pemberton
    </a>,
    
    licensed under a
    
    <a rel="cc:license"
       href="http://creativecommons.org/licenses/by/2.5/">
         Creative Commons Commercial License
    </a>.
  
  </li>
</ul>

While it makes sense for Shutr to have a whole web page dedicated to each photo album, it might not make as much sense to have a single page for each camera owned by a user. A single page that describes all cameras belonging to a single user is the more likely scenario. For this purpose, RDFa provides ways to make structured data statements about chunks of documents using natural HTML constructs.

Consider the page http://www.shutr.net/user/markb/cameras, which, as its URI implies, lists Mark Birbeck's cameras. Its HTML includes:

<ul>
  <li id="nikon_d200"> Nikon D200, 3 pictures/second.
  </li>

  <li id="canon_sd550"> Canon Powershot SD550, 5 pictures/second.
  </li>
</ul>

and the photo page will then include information about which camera was used to take each photo:

<ul>
  <li>
    <img src="/user/markb/photo/23456_thumbnail" />
    ...
    using the <a href="/user/markb/cameras#nikon_d200">Nikon D200</a>,
    ...
  </li>
...
</ul>

The RDFa syntax for formally specifying the relationship is exactly the same as before, as expected:

<ul>
  <li about="/user/markb/photo/23456">
    <img src="/user/markb/photo/23456_thumbnail" />
    ...
    using the <a rel="shutr:takenWithCamera" 
         href="/user/markb/cameras#nikon_d200">Nikon D200</a>,
    ...
  </li>
...
</ul>

Then, the HTML snippet at http://www.shutr.net/user/markb/cameras is:

<ul>
  <li id="nikon_d200" about="#nikon_d200">
    
    <span property="dc:title">
      Nikon D200
    </span>
    
    <span property="shutr:shutterSpeed">
      3 pictures/second
    </span>
    
  </li>

  <li id="canon_sd550" about="#canon_sd550">

    <span property="dc:title">
      Canon Powershot SD550
    </span>

    <span property="shutr:shutterSpeed">
      5 pictures/second
    </span>

  </li>
</ul>
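
The link from a photo to its camera works because relative references are resolved against the URI of the page they appear on. The Python sketch below uses the standard library's urljoin to show that the href on the album page and the about on the cameras page end up naming the same resource. The variable names are illustrative only.

```python
from urllib.parse import urljoin

ALBUM_PAGE = "http://www.shutr.net/user/markb/album/12345"
CAMERAS_PAGE = "http://www.shutr.net/user/markb/cameras"

# href on the album page, pointing at a chunk of the cameras page.
link_target = urljoin(ALBUM_PAGE, "/user/markb/cameras#nikon_d200")

# about on the cameras page, naming that same chunk with a fragment only.
camera_subject = urljoin(CAMERAS_PAGE, "#nikon_d200")

# about on the album page, naming an individual photo.
photo_subject = urljoin(ALBUM_PAGE, "/user/markb/photo/23456")

print(link_target)
print(camera_subject)
print(photo_subject)
```

Since link_target and camera_subject are identical once resolved, a tool that reads both pages can connect the photo to the camera's title and other fields.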

3.3 Data Types

When dealing with fields of structured data, one may well want (or need) to specify a data type. Consider the expression of a date. We have already seen how the human-rendered and machine-readable data may not be the same, and how we can use content to provide a machine-readable value. Adding a datatype is only one more attribute: datatype. For example, when expressing the date on which a photo was taken:

    <ul>
      <li about="/user/markb/photo/23456">

        ...
        
        taken on
        <span property="dc:date" content="2007-05-12" datatype="xsd:date">
          May 12th, 2007
        </span>

        ...
        
      </li>

    </ul>

Note how we use XML Schema datatypes, referenced here through the xsd prefix.
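
As a rough illustration of what a datatype buys a consumer, the Python sketch below writes the typed value in the notation used in Section 4 and then reads it back as a real date. The helper names typed_literal and parse_xsd_date are hypothetical, and only the plain YYYY-MM-DD form of xsd:date is handled.

```python
from datetime import date


def typed_literal(lexical, datatype_curie):
    """Write a typed literal as a quoted value followed by ^^ and its datatype."""
    return f'"{lexical}"^^{datatype_curie}'


def parse_xsd_date(lexical):
    """Parse the plain YYYY-MM-DD form of an xsd:date value."""
    return date.fromisoformat(lexical)


literal = typed_literal("2007-05-12", "xsd:date")
taken_on = parse_xsd_date("2007-05-12")
print(literal)
print(taken_on.year, taken_on.month, taken_on.day)
```

The rendered text "May 12th, 2007" would not parse this way, which is why the machine-readable value is carried separately in the content attribute.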

3.4 Layers of Structured Data — Subresources

4 RDF Correspondence

RDF [RDF] is the W3C's standard for interoperable structured data. Though one need not be versed in RDF to understand the basic concepts of RDFa, it helps to know that RDFa is effectively the embedding of RDF in HTML.

Briefly, RDF is an abstract generic data model. An RDF statement is a triple, composed of a subject, a predicate, and an object. For example, the following triple has /photos/123 as subject, dc:title as predicate, and the literal "Beautiful Sunset" as object:

  </photos/123> dc:title "Beautiful Sunset" .

A triple effectively relates its subject and object by its predicate: the document /photos/123 has, as title, "Beautiful Sunset". Structured data in RDF is represented as a set of triples. The notation above is called N3 [N3]. URIs are written using angle brackets, literals are written in quotation marks, and compact URIs are written directly.

All subjects and predicates are nodes, while objects can be nodes or literals. Nodes can be URIs, or they can be blank, in which case they are not addressable by other documents. Blank nodes, denoted _:bnodename, are particularly useful when expressing layered data without having to assign URIs to intermediate nodes.
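
For readers who think in code, a triple can be modeled as a three-field record. The Python sketch below is one possible representation, not a standard API: the class names URI, CURIE, Literal, BlankNode, and Triple are all hypothetical, and the rendering follows only the notation conventions described above.

```python
from typing import NamedTuple


class URI(str):
    """A URI, written in angle brackets."""


class CURIE(str):
    """A compact URI such as dc:title, written directly."""


class Literal(str):
    """A literal value, written in quotation marks."""


class BlankNode(str):
    """A node without a URI, written as _:name."""


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def render_term(term):
    if isinstance(term, URI):
        return f"<{term}>"
    if isinstance(term, BlankNode):
        return f"_:{term}"
    if isinstance(term, Literal):
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return str(term)  # compact URIs are written as-is


def render_triple(triple):
    parts = (render_term(t) for t in triple)
    return " ".join(parts) + " ."


sunset = Triple(URI("/photos/123"), CURIE("dc:title"), Literal("Beautiful Sunset"))
print(render_triple(sunset))
```

A set of such records is all that "structured data in RDF" amounts to at this level. Everything else in RDFa is about where in the HTML each of the three fields comes from.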

4.1 Events and Contact Information

In Section 2.2 Publishing An Event, Jo published an event without giving it a URI. The RDF triples extracted from her markup are:

_:bn0
       rdf:type cal:Vevent; 
       cal:summary "a talk at the XTech Conference about web widgets";
       cal:dtstart "20070508T1000+0200" .
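
The cal:dtstart value above packs a date, a time, and a UTC offset into one string. Assuming it is read as YYYYMMDD, the letter T, HHMM, and a signed offset, Python's standard library can decode it as follows (the function name parse_dtstart is hypothetical):

```python
from datetime import datetime


def parse_dtstart(value):
    """Decode a value such as 20070508T1000+0200 into a timezone-aware datetime."""
    return datetime.strptime(value, "%Y%m%dT%H%M%z")


start = parse_dtstart("20070508T1000+0200")
print(start.isoformat())  # 2007-05-08T10:00:00+02:00
```

Read this way, the value says: May 8th, 2007, at 10:00, in a timezone two hours ahead of UTC.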

In Section 2.3 Publishing Contact Information, Jo published contact information. The RDFa is parsed to generate the following RDF triples:

<http://example.org/staff/jo>
         contact:fn "Jo Smith";
         contact:title "distinguished web engineer";
         contact:org <http://example.org>;
         contact:email <mailto:jo@example.org>.

4.2 Simple Shutr Data

The XHTML+RDFa in the first Shutr example yields the following triples:

</user/markb/photo/23456> dc:title "Sunset in Nice" .
              
</user/markb/photo/34567> dc:title "W3C Meeting in Mandelieu" .

The more complete example, including licensing information, yields the following triples:

</user/markb/photo/23456>
    dc:title "Sunset in Nice" ;
    dc:creator "Mark Birbeck" ;
    cc:license <http://creativecommons.org/licenses/by-nc/2.5/> .

</user/markb/photo/34567>
    dc:title "W3C Meeting in Mandelieu" ;
    dc:creator "Steven Pemberton" ;
    cc:license <http://creativecommons.org/licenses/by/2.5/> .

The example that links a photo to the camera it was taken with corresponds to the following triple:

</user/markb/photo/23456> shutr:takenWithCamera </user/markb/cameras#nikon_d200> .

while the complete camera descriptions yield:

<#nikon_d200>
  dc:title "Nikon D200" ;
  shutr:shutterSpeed "3 pictures/second" .

<#canon_sd550>
  dc:title "Canon Powershot SD550" ;
  shutr:shutterSpeed "5 pictures/second" .

Finally, the datatype attribute indicates a datatype as follows:

</user/markb/photo/23456> dc:date "2007-05-12"^^xsd:date .

4.3 Layered Data

ADD A CHAINING EXAMPLE HERE

4.4 Referring to URIs Without Clickability

ADD A RESOURCE/HREF EXAMPLE HERE

5 Case Studies

FOAF, hAudio, etc...

6 Acknowledgments

This document is the work of the RDF-in-HTML Task Force, including (in alphabetical order) Ben Adida, Mark Birbeck, Jeremy Carroll, Michael Hausenblas, Steven Pemberton, Ralph Swick, and Elias Torres. This work would not have been possible without the help of the Semantic Web Deployment and Best Practices Working Group, in particular chairs Guus Schreiber and David Wood. Earlier versions of this document were officially reviewed by Gary Ng and David Booth, both of whom provided insightful comments that significantly improved the work.

7 Bibliography

FOAF: The Friend of a Friend (FOAF) Project (See http://www.foaf-project.org/.)
RDFHTML: RDF-in-HTML Task Force (See http://www.w3.org/2001/sw/BestPractices/HTML/.)
SWD-WG: Semantic Web Deployment Working Group (See http://www.w3.org/2006/07/SWD/.)
SWBPD-WG: Semantic Web Best Practices and Deployment Working Group (See http://www.w3.org/2001/sw/BestPractices/.)
HTML-WG: HTML Working Group (See http://www.w3.org/MarkUp/Group/.)
ICAL-RDF: RDF Calendar Interest Group Note (See http://www.w3.org/TR/rdfcal/.)
VCARD-RDF: Representing vCard Objects in RDF/XML (See http://www.w3.org/TR/vcard-rdf.)