Copyright © W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use, and software licensing rules apply.
Current web pages, written in HTML, contain significant inherent structured data: calendar events, contact information, photo captions, song titles, copyright licensing information, etc. When publishers can express this data precisely, and when tools can read it robustly, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites. An event on a web page can be directly imported into a desktop calendar. A license on a document can be detected to inform the user of his rights automatically. A photo's creator, camera setting information, resolution, and topic can be published as easily as the original photo itself.
RDFa lets an HTML author express this structured data using extra HTML attributes. Where the data is already present on the page, e.g. the photo's caption, the HTML+RDFa author need not repeat it. RDFa uses RDF, which lets any publisher extend existing vocabularies or create new ones altogether.
This document introduces XHTML authors to RDFa with simple examples. For more detailed syntax specification, please consult the RDFa Syntax Document.
This is an internal draft produced by the Semantic Web Deployment Working Group [SWD-WG], in cooperation with the HTML Working Group [HTML-WG]. Initial work on RDFa began with the Semantic Web Best Practices and Deployment Working Group [SWBPD-WG].
This document is for internal review only and is subject to change without notice. This document has no formal standing within the W3C.
Since Working Draft #3 of this document:
The class attribute is no longer used to declare rdf:type, as this was found to be too confusing. We now use the new instanceof attribute.
1 Purpose and Preliminaries
2 Simple Data: Publishing Events and Contacts
2.1 The Basic HTML
2.2 Publishing An Event
2.3 Publishing Contact Information
2.4 The Complete HTML with RDFa
2.5 Working Within a Fragment of the HTML
3 Advanced Concepts: Custom Vocabularies, Document Fragments, Complex Data, ...
3.1 Creating a Custom Vocabulary and Using Compact URIs
3.2 Qualifying Other Documents and Document Chunks
3.3 Data Types
3.4 Layers of Structured Data — Subresources
4 RDF Correspondence
4.1 Events and Contact Information
4.2 Simple Shutr Data
4.3 Layered Data
4.4 Referring to URIs Without Clickability
5 Case Studies
6 Acknowledgments
7 Bibliography
Current web pages, written in HTML, contain significant inherent structured data: calendar events, contact information, photo captions, song titles, copyright licensing information, etc. When publishers can express this data precisely, and when tools can read it robustly, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites. An event on a web page can be directly imported into a desktop calendar. A license on a document can be detected to inform the user of his rights automatically. A photo's creator, camera setting information, resolution, and topic can be published as easily as the original photo itself.
RDFa lets an HTML author express this structured data using extra attributes. Where the data is already present on the page, e.g. the photo's caption, the HTML+RDFa author need not repeat it. RDFa uses RDF, which lets any publisher extend existing vocabularies or create new ones altogether.
RDFa builds upon the Resource Description Framework (RDF), but the reader of this document is not expected to understand RDF: if you can write XHTML, you can write XHTML+RDFa.
We note that RDFa makes use of XML namespaces. In this document, we assume, for simplicity's sake, that the following namespaces are defined: dc for Dublin Core, foaf for FOAF, cc for Creative Commons, and xsd for XML Schema Definitions:
dc: http://purl.org/dc/elements/1.1/
foaf: http://xmlns.com/foaf/0.1/
cc: http://web.resource.org/cc/
xsd: http://www.w3.org/2001/XMLSchema
Jo blogs about her work, which involves web development. Jo has an upcoming talk at the XTech Conference, on May 8th at 10am, where she will be discussing "web widgets". She posts an announcement of her talk on her blog at http://jo-blog.example.org/. Her blog also includes her contact information (Jo has a fantastic spam filter, so she is unafraid of publishing her email address):
<html>
<head><title>Jo's Blog</title></head>
<body>
...
<p>
I'm giving a talk at the XTech Conference about web widgets, on May 8th at 10am.
</p>
...
<p class="contactinfo">
My name is Jo Smith. I'm a distinguished web engineer
at
<a href="http://example.org">
Example.org
</a>.
You can contact me
<a href="mailto:jo@example.org">
via email
</a>.
</p>
...
</body>
</html>
This short piece of markup already contains important structured data.
The markup describes an event: a talk that Jo is giving. This event starts at 10am on May 8th. A summary of the event is "a talk at XTech 2007 on web widgets." We also have contact information for Jo: she works for the organization Example.org, with job title of "Distinguished Web Engineer." She can be contacted at the email address "jo@example.org."
At the moment, it is very difficult for software, such as web browsers and search engines, to make use of this data's implicit structure. We need a standard mechanism to explicitly express it, so that it can be extracted consistently. This is where RDFa comes in.
Jo would like to add some structure to this blog entry so that readers might be able to add her talk directly to their calendar. RDFa allows her to express this structure using a small handful of extra attributes. Since this is a calendar event, Jo will specifically use the iCal vocabulary [ICAL-RDF] to denote the data's structure.
The first step is to reference the iCal vocabulary within the HTML page, so that a parser may know where to look up the vocabulary terms:
<html xmlns:cal="http://www.w3.org/2002/12/cal/ical#">
...
Then, Jo declares a new event:
<p instanceof="cal:Vevent"> ... </p>
Note how the instanceof attribute is used here to define the kind of data being expressed. (If Jo wanted to declare multiple types, she could include more than one value in the instanceof attribute, separated by spaces.) The use of this attribute on the P element ensures that, by default, data expressed on elements inside this element refers to the same event. Thus, inside this event declaration, Jo can set up the event fields, reusing the existing HTML. For example, the event summary can be declared as:
I'm giving <span property="cal:summary">
a talk at the XTech Conference about web widgets
</span>,
The property attribute on the span element declares a data field attached to the event declared by instanceof in the enclosing P.
Note how the existing rendered content, "a talk at the XTech Conference about web widgets", is the value of this field. Sometimes, this isn't the desired effect. Specifically, the start time of the event should be displayed nicely ("May 8th at 10am"), but should also be represented in a machine-parsable way, using the standard iCal format: 20070508T1000+0200 (that is, May 8th, 2007, at 10:00, in a time zone two hours ahead of UTC). In this case, the markup needs only a slight modification:
<span property="cal:dtstart" content="20070508T1000+0200">
May 8th at 10am
</span>
In this case, the actual content of the span element, "May 8th at 10am", is ignored for structured data purposes: it has been replaced by the explicit content attribute. The full markup is then:
<html xmlns:cal="http://www.w3.org/2002/12/cal/ical#">
<head><title>Jo's Blog</title></head>
<body>
...
<p instanceof="cal:Vevent">
I'm giving
<span property="cal:summary">
a talk at the XTech Conference about web widgets
</span>,
on
<span property="cal:dtstart" content="20070508T1000+0200">
May 8th at 10am
</span>.
</p>
...
</body>
</html>
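As mentioned above, instanceof accepts several space-separated values. The fragment below is a minimal sketch of that case. The ex prefix, its namespace URI, and the ex:Talk type are hypothetical names invented for illustration; they are not part of the iCal vocabulary.

```html
<!-- Hypothetical: the "ex" prefix and ex:Talk are assumptions for illustration only -->
<p instanceof="cal:Vevent ex:Talk"
   xmlns:cal="http://www.w3.org/2002/12/cal/ical#"
   xmlns:ex="http://example.org/vocab#">
  I'm giving
  <span property="cal:summary">
    a talk at the XTech Conference about web widgets
  </span>.
</p>
```

With this markup, the event would be expected to carry both types, while the summary field still attaches to the single event declared on the P element.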
Note that Jo could have used any other HTML element, not just span, to mark up her event data. In other words, when the event information is already laid out in the HTML using elements such as h1, em, div, etc., Jo can simply add the property attribute, and optionally the content attribute, to mark up the event.
(For the RDF-inclined reader, the RDF triples that correspond to the above markup are available in Section ???.)
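To illustrate the point about reusing existing elements, here is a sketch of how the same event might look if Jo's post were laid out with a heading and emphasis instead of span elements. The layout itself (the div, h2, and em elements and the "Join me on" wording) is an assumption made for this example; only the attributes and their values come from the markup above.

```html
<!-- Assumed alternative layout: same event data, carried by h2 and em instead of span -->
<div instanceof="cal:Vevent"
     xmlns:cal="http://www.w3.org/2002/12/cal/ical#">
  <h2 property="cal:summary">
    A talk at the XTech Conference about web widgets
  </h2>
  <p>
    Join me on
    <em property="cal:dtstart" content="20070508T1000+0200">
      May 8th at 10am
    </em>.
  </p>
</div>
```

The structured data should be the same kind as before: one event with a summary and a start time. Only the rendered presentation differs.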
Now that Jo has published an event using structured data, she realizes there is much data on her blog that she can mark up in the same way. Her contact information, in particular, is an easy target for structured markup with RDFa:
...
<p class="contactinfo">
My name is Jo Smith. I'm a distinguished web engineer
at
<a href="http://example.org">
Example.org
</a>.
You can contact me
<a href="mailto:jo@example.org">
via email
</a>.
</p>
...
Jo discovers the vCard RDF vocabulary [VCARD-RDF], which she adds to her existing page. Since Jo thinks of vCards as a way to publish her contact information, she uses the prefix contact to designate this vocabulary. Note that, although Jo already imported the iCal vocabulary, adding the vCard vocabulary is just as easy and does not interfere:
<html xmlns:cal="http://www.w3.org/2002/12/cal/ical#"
xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#">
...
Jo then sets up her vCard using RDFa, by deciding that the P with class set to contactinfo will be the container for her vCard. She notes, however, that the vCard schema does not require declaring a vCard type. Instead, it is recommended that a vCard refer to a web page that identifies the individual. Jo thus uses RDFa's special attribute about just for this purpose, indicating that all contained HTML pertains to Jo's designated URI. Note how the about attribute is inherited from parent elements in the HTML: the about attribute on the nearest ancestor applies to declared structured data.
...
<p class="contactinfo" about="http://example.org/staff/jo">
...everything here pertains to http://example.org/staff/jo...
</p>
...
"Simple enough!" Jo realizes. She adds her first vCard fields: name, title, organization and email.
...
<p class="contactinfo" about="http://example.org/staff/jo">
My name is
<span property="contact:fn">
Jo Smith
</span>.
I'm a
<span property="contact:title">
distinguished web engineer
</span>
at
<a rel="contact:org" href="http://example.org">
Example.org
</a>.
You can contact me
<a rel="contact:email" href="mailto:jo@example.org">
via email
</a>.
</p>
...
Notice how Jo was able to use the rel attribute directly within the anchor tag for designating her organization and email address. In this case, the rel indicates a relationship between the current URI, designated by about, and the target URI, designated by href. The type of relationship is defined by the rel. In this case, contact:org indicates a relationship of type "vCard organization", while contact:email indicates a relationship of type "vCard email".
Note how, for simplicity's sake, we have slightly abused the vCard vocabulary above: vCard technically requires that the type of the email address be specified, e.g. work or home email. In Section 4.3 Layered Data, we show how rel can be used without a corresponding href, in order to create subresources and provide the correct markup for expressing a true vCard.
Jo's complete HTML with RDFa is thus:
<html xmlns:cal="http://www.w3.org/2002/12/cal/ical#"
xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#">
...
<p instanceof="cal:Vevent">
I'm giving
<span property="cal:summary">
a talk at the XTech Conference about web widgets
</span>,
on
<span property="cal:dtstart" content="20070508T1000+0200">
May 8th at 10am
</span>.
</p>
...
<p class="contactinfo" about="http://example.org/staff/jo">
My name is
<span property="contact:fn">
Jo Smith
</span>.
I'm a
<span property="contact:title">
distinguished web engineer
</span>
at
<a rel="contact:org" href="http://example.org">
Example.org
</a>.
You can contact me
<a rel="contact:email" href="mailto:jo@example.org">
via email
</a>.
</p>
...
Note how, if Jo changes her email address link, her organization, or the title of her talk, the RDFa approach will automatically pick up these changes in the marked up, structured data. The only place where this doesn't happen is where the content attribute must override the rendered content, which is inevitable when the human-rendered data and the machine-readable data must differ.
(Once again, the RDF-inclined reader will want to consult the resulting RDF triples in Section 4 RDF Correspondence.)
What if Jo does not have complete control over the HTML of her blog? For example, she may be using a templating system which makes it particularly difficult to add the vocabularies in the html element at the top of her page without adding them to every page on her site. Or, she may be using a web provider that doesn't allow her to change the header of the page to begin with.
Fortunately, RDFa uses standard XML namespaces, which means that the vocabularies can be imported "locally" to an HTML element. Jo's HTML blog page could express the exact same structured data with the following markup:
<html>
...
<p instanceof="cal:Vevent"
xmlns:cal="http://www.w3.org/2002/12/cal/ical#">
I'm giving
<span property="cal:summary">
a talk at the XTech Conference about web widgets
</span>,
on
<span property="cal:dtstart" content="20070508T1000+0200">
May 8th at 10am
</span>.
</p>
...
<p class="contactinfo" about="http://example.org/staff/jo"
xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#">
My name is
<span property="contact:fn">
Jo Smith
</span>.
I'm a
<span property="contact:title">
distinguished web engineer
</span>
at
<a rel="contact:org" href="http://example.org">
Example.org
</a>.
You can contact me
<a rel="contact:email" href="mailto:jo@example.org">
via email
</a>.
</p>
...
Of course, just like in the case of the vocabularies defined on the top-level html tag, more than one vocabulary can be imported into any element. In this case, each p only needs one vocabulary: the first uses iCal, the second uses vCard.
This approach helps with the desired ability to copy-and-paste HTML from one page to another: the closer the namespace declarations to their relevant statements, the easier it is to copy and paste the content safely.
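As a sketch of importing more than one vocabulary into a single element, both namespaces can be declared on one container that wraps the event and the contact information. The wrapping div is an assumption made for this example, and the inner markup is abbreviated from Jo's page above.

```html
<!-- Assumed wrapper: one div declares both prefixes for everything it contains -->
<div xmlns:cal="http://www.w3.org/2002/12/cal/ical#"
     xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#">
  <p instanceof="cal:Vevent">
    I'm giving
    <span property="cal:summary">
      a talk at the XTech Conference about web widgets
    </span>.
  </p>
  <p class="contactinfo" about="http://example.org/staff/jo">
    My name is
    <span property="contact:fn">
      Jo Smith
    </span>.
  </p>
</div>
```

Copying the whole div to another page would carry both declarations along with the statements that depend on them.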
RDFa can do much more than the simple examples described above. In this section, we explore some of its advanced capabilities: custom vocabularies and compact URIs, statements about other documents and document chunks, and data types.
All field names and data types in RDFa are URIs, e.g. http://purl.org/dc/elements/1.1/title is the "Dublin Core title" field. In RDFa, we often use compact versions of those URIs, by defining a prefix using XML namespaces, and using the prefixed notation to designate the URI. This helps keep the markup short and clean:
<div xmlns:dc="http://purl.org/dc/elements/1.1/">
<span property="dc:title">Yowl</span>,
created by
<span property="dc:creator">Mark Birbeck</span>.
</div>
Because concepts are simply URIs, it is trivial to create one's own vocabulary: simply mint new URIs in a domain you control, and use them in RDFa markup.
Consider a (fictional) photo management web site called Shutr, whose web site is http://www.shutr.net. Users of Shutr can upload their photos at will, annotate them, organize them into albums, and share them with the world. They can choose to keep these photos private, or make them available for public consumption under licensing terms of their choosing.
Shutr chooses to mark up its photos with RDFa, so that client-side tools may be able to extract information automatically. Some concepts, such as dc:title, dc:date, etc. can be clearly reused from the Dublin Core vocabulary, but other concepts, such as lens settings, camera model, and other photographer parameters, may need to be defined from scratch. For this purpose, Shutr defines a vocabulary namespace URI:
http://shutr.net/vocab/1.0/
Shutr can then publish terms such as http://shutr.net/vocab/1.0/takenWithCamera, http://shutr.net/vocab/1.0/aperture, etc.
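A minimal sketch of how such custom terms might appear in markup follows. It assumes the prefix shutr is bound to the namespace URI above, so that shutr:aperture stands for http://shutr.net/vocab/1.0/aperture. The aperture value "f/8" and the surrounding wording are made-up sample content, not data from Shutr.

```html
<!-- The aperture value "f/8" is a made-up sample for illustration only -->
<div about="/user/markb/photo/23456"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:shutr="http://shutr.net/vocab/1.0/">
  <span property="dc:title">Sunset in Nice</span>,
  shot at an aperture of
  <span property="shutr:aperture">f/8</span>.
</div>
```

Reused Dublin Core terms and newly minted Shutr terms sit side by side here; the only difference is which prefix they use.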
Shutr may choose to present many photos in a given HTML page. In particular, at the URI http://www.shutr.net/user/markb/album/12345, all of the album's photos will appear inline. Structured data about each photo can be included simply by specifying an about attribute, which indicates the resource that fields refer to within that HTML element.
<ul>
<li> <img src="/user/markb/photo/23456_thumbnail" />,
<span about="/user/markb/photo/23456" property="dc:title">
Sunset in Nice
</span>
</li>
<li> <img src="/user/markb/photo/34567_thumbnail" />,
<span about="/user/markb/photo/34567" property="dc:title">
W3C Meeting in Mandelieu
</span>
</li>
</ul>
This same approach applies to statements with URI objects. For example, each photo in the album has a creator and may have its own usage license. We can use the convenient inheritance of the about attribute to refer to the photo once, and add as many fields as we need:
<ul>
<li about="/user/markb/photo/23456">
<img src="/user/markb/photo/23456_thumbnail" />,
<span property="dc:title">
Sunset in Nice
</span>
taken by photographer
<a property="dc:creator"
href="/user/markb">
Mark Birbeck
</a>,
licensed under a
<a rel="cc:license"
href="http://creativecommons.org/licenses/by-nc/2.5/">
Creative Commons Non-Commercial License
</a>.
</li>
<li about="/user/markb/photo/34567">
<img src="/user/markb/photo/34567_thumbnail" />
<span property="dc:title">
W3C Meeting in Mandelieu
</span>
taken by photographer
<a property="dc:creator"
href="/user/stevenp">
Steven Pemberton
</a>,
licensed under a
<a rel="cc:license"
href="http://creativecommons.org/licenses/by/2.5/">
Creative Commons Commercial License
</a>.
</li>
</ul>
While it makes sense for Shutr to have a whole web page dedicated to each photo album, it might not make as much sense to have a single page for each camera owned by a user. A single page that describes all cameras belonging to a single user is the more likely scenario. For this purpose, RDFa provides ways to make structured data statements about chunks of documents using natural HTML constructs.
Consider the page http://www.shutr.net/user/markb/cameras, which, as its URI implies, lists Mark Birbeck's cameras. Its HTML includes:
<ul>
<li id="nikon_d200"> Nikon D200, 3 pictures/second.
</li>
<li id="canon_sd550"> Canon Powershot SD550, 5 pictures/second.
</li>
</ul>
and the photo page will then include information about which camera was used to take each photo:
<ul>
<li>
<img src="/user/markb/photo/23456_thumbnail" />
...
using the <a href="/user/markb/cameras#nikon_d200">Nikon D200</a>,
...
</li>
...
</ul>
The RDFa syntax for formally specifying the relationship is exactly the same as before, as expected:
<ul>
<li about="/user/markb/photo/23456">
<img src="/user/markb/photo/23456_thumbnail" />
...
using the <a rel="shutr:takenWithCamera"
href="/user/markb/cameras#nikon_d200">Nikon D200</a>,
...
</li>
...
</ul>
Then, the HTML snippet at http://www.shutr.net/user/markb/cameras is:
<ul>
<li id="nikon_d200" about="#nikon_d200">
<span property="dc:title">
Nikon D200
</span>
<span property="shutr:shutterSpeed">
3 pictures/second
</span>
</li>
<li id="canon_sd550" about="#canon_sd550">
<span property="dc:title">
Canon Powershot SD550
</span>
<span property="shutr:shutterSpeed">
5 pictures/second
</span>
</li>
</ul>
When dealing with fields of structured data, one may well want (or need) to specify a data type. Consider the expression of a date. We have already seen how the human-rendered and machine-readable data may not be the same, and how we can use content to provide a machine-readable value. Adding a datatype is only one more attribute: datatype. For example, when expressing the date on which a photo was taken:
<ul>
<li about="/user/markb/photo/23456">
...
taken on
<span property="dc:date" content="2007-05-12" datatype="xsd:date">
May 12th, 2007
</span>
...
</li>
</ul>
Note how we use XML Schema data types, via the xsd prefix.
RDF [RDF] is the W3C's standard for interoperable structured data. Though one need not be versed in RDF to understand the basic concepts of RDFa, it helps to know that RDFa is effectively the embedding of RDF in HTML.
Briefly, RDF is an abstract generic data model. An RDF statement is a triple, composed of a subject, a predicate, and an object. For example, the following triple has /photos/123 as subject, dc:title as predicate, and the literal "Beautiful Sunset" as object:
</photos/123> dc:title "Beautiful Sunset" .
A triple effectively relates its subject and object by its predicate: the document /photos/123 has, as title, "Beautiful Sunset". Structured data in RDF is represented as a set of triples. The notation above is called N3 [N3]. URIs are written using angle brackets, literals are written in quotation marks, and compact URIs are written directly.
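For readers who prefer to start from markup, the following is a sketch of RDFa that would be expected to yield the triple above, using only the about and property attributes introduced earlier. The wrapping div is an assumption, added only to carry the dc namespace declaration.

```html
<!-- about gives the subject, property the predicate, and the element content the literal object -->
<div xmlns:dc="http://purl.org/dc/elements/1.1/">
  <span about="/photos/123" property="dc:title">Beautiful Sunset</span>
</div>
```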
All subjects and predicates are nodes, while objects can be nodes or literals. Nodes can be URIs, or they can be blank, in which case they are not addressable by other documents. Blank nodes, denoted _:bnodename, are particularly useful when expressing layered data without having to assign URIs to intermediate nodes.
In Section 2.2 Publishing An Event, Jo published an event without giving it a URI. The RDF triples extracted from her markup are:
_:bn0
rdf:type cal:Vevent;
cal:summary "a talk at the XTech Conference about web widgets";
cal:dtstart "20070508T1000+0200" .
In Section 2.3 Publishing Contact Information, Jo published contact information. The RDFa is parsed to generate the following RDF triples:
<http://example.org/staff/jo>
contact:fn "Jo Smith";
contact:title "distinguished web engineer";
contact:org <http://example.org>;
contact:email <mailto:jo@example.org>.
The XHTML+RDFa in the first Shutr example yields the following triples:
</user/markb/photo/23456> dc:title "Sunset in Nice" .
</user/markb/photo/34567> dc:title "W3C Meeting in Mandelieu" .
The more complete example, including licensing information, yields the following triples:
</user/markb/photo/23456>
dc:title "Sunset in Nice" ;
dc:creator "Mark Birbeck" ;
cc:license <http://creativecommons.org/licenses/by-nc/2.5/> .
</user/markb/photo/34567>
dc:title "W3C Meeting in Mandelieu" ;
dc:creator "Steven Pemberton" ;
cc:license <http://creativecommons.org/licenses/by/2.5/> .
The example that links a photo to the camera it was taken with corresponds to the following triple:
</user/markb/photo/23456> shutr:takenWithCamera </user/markb/cameras#nikon_d200> .
while the complete camera descriptions yield:
<#nikon_d200>
dc:title "Nikon D200" ;
shutr:shutterSpeed "3 pictures/second" .
<#canon_sd550>
dc:title "Canon Powershot SD550" ;
shutr:shutterSpeed "5 pictures/second" .
Finally, the datatype attribute indicates a datatype as follows:
</user/markb/photo/23456> dc:date "2007-05-12"^^xsd:date .
This document is the work of the RDF-in-HTML Task Force, including (in alphabetical order) Ben Adida, Mark Birbeck, Jeremy Carroll, Michael Hausenblas, Steven Pemberton, Ralph Swick, and Elias Torres. This work would not have been possible without the help of the Semantic Web Best Practices and Deployment Working Group, in particular chairs Guus Schreiber and David Wood. Earlier versions of this document were officially reviewed by Gary Ng and David Booth, both of whom provided insightful comments that significantly improved the work.