RDFa/Microdata -- comments from an outsider (user & consumer) from Reece Dunn on 2011-07-07 (public-rdfa-wg@w3.org from July 2011)

From: Reece Dunn <msclrhd@googlemail.com>
Date: Thu, 7 Jul 2011 19:16:39 +0100
To: public-rdfa-wg@w3.org
Message-ID: <CAGdtn26KcxoK8b8_Jo=0YyFZrxqijB9MsMvirV_ueqhRS716cA@mail.gmail.com>
Hi,

I am not sure where the best place for this is, but given the recent
discussions I thought I'd share my thoughts.

I am a user (author) of RDFa metadata on a website and am writing a
C++ application (document reader) that consumes metadata from multiple
document types (X/HTML, ePub, ODF, DocBook, etc.) and stores that
metadata internally as an RDF graph.


AS AN AUTHOR ...


1 ... I want to express RDF metadata triples in a webpage and take
advantage of HTML5 markup.

I am using RDFa because I find the CURIE syntax easier to read and
more compact than the Microdata format -- especially as I have already
been exposed to it in XML and RDF documents.
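
A minimal sketch of what makes CURIEs compact: a prefix such as dc is
declared once and every later use is expanded against it. The names
PrefixMap and expand_curie are hypothetical, and the choice of namespace
URI for dc in the usage note is an assumption; the message does not say
which URI the prefix is bound to.

```cpp
#include <map>
#include <string>

// Hypothetical type: maps a prefix (e.g. "dc") to its namespace URI.
using PrefixMap = std::map<std::string, std::string>;

// Expand a CURIE such as "dc:title" into a full URI.
// Returns an empty string if there is no colon or the prefix is unknown.
std::string expand_curie(const PrefixMap& prefixes, const std::string& curie)
{
    const std::string::size_type colon = curie.find(':');
    if (colon == std::string::npos)
        return std::string();

    const PrefixMap::const_iterator it = prefixes.find(curie.substr(0, colon));
    if (it == prefixes.end())
        return std::string();

    // Namespace URI followed by the reference part after the colon.
    return it->second + curie.substr(colon + 1);
}
```

With dc bound to http://purl.org/dc/elements/1.1/ (an assumed binding),
expand_curie(prefixes, "dc:title") yields the full property URI, which
is the form that would otherwise be written out at each use.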

I also don't fully understand how the Microdata format maps to an RDF
graph -- the algorithm in the Microdata specification for generating
RDF is difficult to grok, whereas the RDFa representation is easier to
understand.

The Microdata format also reads more verbosely, both because the
namespaces are written out in full and because of its general approach.


2 ... I want to express basic document information about the page
(title, author, etc.) in Dublin Core metadata triples.

But I don't want to mix and match metadata syntaxes (e.g. using the DC
schema syntax in the meta element) and want to keep data repetition to
a minimum.


3 ... I want to express Creative Commons attribution and W3C
validation links as given.

That is: I don't want to work out how to convert the markup between
formats (RDFa <-> Microdata).


4 ... I want to express bibliographical references accurately.

For example:

	<li id="ref1" rel="dct:references">
		<span typeof="foaf:Document"
		      about="http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/">
			<span property="dc:creator">Dave Beckett</span>,
			<a property="dc:title"
			   href="http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/">RDF/XML Syntax Specification (Revised)</a>.
			<span rel="dc:publisher" resource="_:W3C"/>
			<span typeof="foaf:Organization" about="_:W3C">
				<span rel="foaf:homepage" resource="http://www.w3.org"/>
				<span property="foaf:name">World Wide Web Consortium</span>
				(<span property="foaf:name">W3C</span>),
			</span>
			<span property="dc:date" datatype="xsd:date"
			      content="2004-02-10">2004</span>.
		</span>
	</li>

I use the _:W3C blank node so that other dc:publisher elements can
point at that single description:

			<span rel="dc:publisher" resource="_:W3C">World Wide Web Consortium (W3C)</span>,

This means I inherently want to be able to express graphs, even in a
single document.
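
As a rough illustration of why this is a graph rather than a tree, the
sketch below stores the statements as plain triples and looks up every
subject that points at the shared _:W3C node. Triple, subjects_of and
example_graph are hypothetical names, and the second reference URI is
made up for the example.

```cpp
#include <string>
#include <vector>

// One RDF statement; URIs, CURIEs and blank node labels are plain strings here.
struct Triple {
    std::string subject;
    std::string predicate;
    std::string object;
};

// Return the subjects of all triples that have the given predicate and object.
std::vector<std::string> subjects_of(const std::vector<Triple>& graph,
                                     const std::string& predicate,
                                     const std::string& object)
{
    std::vector<std::string> result;
    for (const Triple& t : graph)
        if (t.predicate == predicate && t.object == object)
            result.push_back(t.subject);
    return result;
}

// Build a small graph in which two references share the publisher node _:W3C.
std::vector<Triple> example_graph()
{
    return {
        {"http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/", "dc:publisher", "_:W3C"},
        {"http://example.org/another-reference", "dc:publisher", "_:W3C"},  // hypothetical reference
        {"_:W3C", "foaf:homepage", "http://www.w3.org"},
        {"_:W3C", "foaf:name", "World Wide Web Consortium"},
    };
}
```

Calling subjects_of(example_graph(), "dc:publisher", "_:W3C") returns
both references, so the organisation only needs to be described once.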

My only gripe here with the RDFa syntax is that I cannot say:

			<span rel="dc:publisher" about="_:W3C" typeof="foaf:Organization">
				<span rel="foaf:homepage" resource="http://www.w3.org"/>
				<span property="foaf:name">World Wide Web Consortium</span>
				(<span property="foaf:name">W3C</span>),
			</span>

to avoid the empty span element.


AS AN IMPLEMENTER ...


1 ... I want to process the metadata in a single pass

This is because I want to keep the implementation simple and efficient.


2 ... I don't want to generate a DOM for the HTML document in order to
extract metadata

This is related to point 1 -- keeping the implementation efficient
(especially when extracting metadata from a large document such as
Anna Karenina on Project Gutenberg -- 2.1MB).
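
Points 1 and 2 could be met by something like the sketch below: a
handler driven by start-tag and end-tag events from a streaming
tokenizer, keeping only a stack of subjects instead of a DOM. This is a
heavily simplified subset for illustration, not the RDFa processing
algorithm: it only looks at about, typeof, rel with resource, and
property with content, and it ignores prefix mappings, element text and
everything else. StreamingExtractor, Statement and Attributes are
hypothetical names, and the caller is assumed to report an end tag for
empty elements as well.

```cpp
#include <map>
#include <string>
#include <vector>

struct Statement {
    std::string subject;
    std::string predicate;
    std::string object;
};

using Attributes = std::map<std::string, std::string>;

class StreamingExtractor {
public:
    // The document URI is the subject until an element says otherwise.
    explicit StreamingExtractor(const std::string& base) { subjects_.push_back(base); }

    // Call once for every start tag, in document order.
    void start_element(const Attributes& attrs)
    {
        std::string subject = subjects_.back();  // inherited from the parent element
        const auto about = attrs.find("about");
        if (about != attrs.end())
            subject = about->second;

        const auto type = attrs.find("typeof");
        if (type != attrs.end())
            out_.push_back({subject, "rdf:type", type->second});

        const auto property = attrs.find("property");
        const auto content = attrs.find("content");
        if (property != attrs.end() && content != attrs.end())
            out_.push_back({subject, property->second, content->second});

        // With rel + resource, the resource becomes the subject for child elements.
        std::string child_subject = subject;
        const auto rel = attrs.find("rel");
        const auto resource = attrs.find("resource");
        if (rel != attrs.end() && resource != attrs.end()) {
            out_.push_back({subject, rel->second, resource->second});
            child_subject = resource->second;
        }
        subjects_.push_back(child_subject);
    }

    // Call once for every end tag (and right after an empty element).
    void end_element()
    {
        if (subjects_.size() > 1)
            subjects_.pop_back();
    }

    const std::vector<Statement>& statements() const { return out_; }

private:
    std::vector<std::string> subjects_;  // one entry per open element, plus the base
    std::vector<Statement> out_;         // statements in the order they were seen
};
```

Memory use is bounded by the nesting depth plus the emitted statements,
not by the size of the document, which is the property that matters for
a multi-megabyte book.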


3 ... I don't want to duplicate the implementation for processing HTML documents

Because HTML documents are often not well-formed, I am using a relaxed
parser for HTML, and I use that same parser to handle XHTML documents
(to avoid having two parsers).

- Reece
Received on Friday, 8 July 2011 13:40:27 UTC