RDFa linting/validation [Re: Tomorrow: Social Web XG Meeting cancelled again]

On Tue, 4 May 2010 21:53:23 +0200
Dan Brickley <danbri@danbri.org> wrote:

> Suggested 'homework' for the week off: think about rdfa validators
> and how they could be made modular to help publishers check which
> consumer sites will understand their data...

This is an interesting idea, and one I'd like to work on. I think the
Perl RDF modules on CPAN already cover much of this area, so it would
not be too difficult to get a useful tool working. The basic
approach:

1. We need one module per consumer service: RDF::RDFa::Linter::Google,
RDF::RDFa::Linter::Facebook, etc. Each would provide these methods:

	* RDF::RDFa::Linter::Foo->usual_prefixes

	A list of the vocabs used by the service, with the prefixes
	they're "normally" bound to, as recommended by the service's
	documentation.

	* RDF::RDFa::Linter::Foo->filter

	Given a stream of triples, filters out any that the service is
	not thought to understand. For example,
	RDF::RDFa::Linter::Facebook would keep only the Open Graph
	Protocol (OGP) triples.

	* RDF::RDFa::Linter::Foo->required_predicates($class)

	For any given class URI, returns a list of URIs of predicates
	that the service considers to be required. e.g. some services
	might insist that all foaf:Person instances have a foaf:name.

2. We need RDF::RDFa::Linter, which would use RDF::RDFa::Parser to
parse an RDFa document, auto-correct missing CURIE prefix bindings
using the service's 'usual_prefixes' method (while remembering each
such error), filter the triples using the service's 'filter' method,
and generate warnings based on 'required_predicates'.

3. We need RDF::RDFa::Writer::Pretty to take an RDF graph and write it
out as pretty, human-readable RDFa. It should also accept a collection
of warnings, each associated with a subject resource (a URI or bnode
identifier), and include them in the output alongside that resource.

4. Lastly, we'd need to wrap it up in a web form that asks for a URI
and then presents the results of the various linters in a nice, tabbed
interface.
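To make the division of labour in steps 1 and 2 more concrete, here is
a rough Python sketch (not Perl, and not any existing CPAN API) of a
per-service linter interface plus the filtering and warning part of the
linter. The class names, the made-up FOAF-consuming service and the OGP
namespace URI are assumptions for illustration; RDFa parsing and prefix
auto-correction are left out to keep it short.

```python
# Hypothetical Python analogue of the proposed RDF::RDFa::Linter::* modules.
# Triples are plain (subject, predicate, object) string tuples for brevity.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
OGP = "http://opengraphprotocol.org/schema/"  # assumed 2010-era OGP namespace


class ServiceLinter:
    """Base class: one subclass per consumer site (hypothetical names)."""

    # prefix -> namespace URI, as recommended by the service's documentation
    usual_prefixes = {}

    def filter(self, triples):
        """Keep only triples the service is thought to understand."""
        return list(triples)

    def required_predicates(self, class_uri):
        """Predicates the service requires on instances of class_uri."""
        return []


class FacebookLinter(ServiceLinter):
    usual_prefixes = {"og": OGP}

    def filter(self, triples):
        # Only Open Graph Protocol triples survive
        return [t for t in triples if t[1].startswith(OGP)]


class FoafConsumerLinter(ServiceLinter):
    """A made-up service that reads FOAF and insists people have names."""

    usual_prefixes = {"foaf": FOAF}

    def filter(self, triples):
        # Keep FOAF predicates, plus rdf:type triples pointing at FOAF classes
        return [
            (s, p, o) for s, p, o in triples
            if p.startswith(FOAF) or (p == RDF_TYPE and o.startswith(FOAF))
        ]

    def required_predicates(self, class_uri):
        return [FOAF + "name"] if class_uri == FOAF + "Person" else []


def lint(linter, triples):
    """Filter triples for one service, then warn about missing predicates.

    Returns (kept_triples, warnings); each warning is a (subject, message)
    pair so that a writer can attach it to the right resource.
    """
    kept = linter.filter(triples)
    present = {(s, p) for s, p, _ in kept}
    warnings = []
    for s, p, o in kept:
        if p != RDF_TYPE:
            continue
        for required in linter.required_predicates(o):
            if (s, required) not in present:
                warnings.append((s, f"{o} instance lacks {required}"))
    return kept, warnings
```

Run against a small graph containing a foaf:Person with only a
foaf:mbox and one OGP title triple, FoafConsumerLinter would keep the
two FOAF-related triples and warn about the missing foaf:name, while
FacebookLinter would keep only the OGP triple and produce no warnings.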

That is a lot of "we need" rather than "we have", given that I've
already said the Perl RDF modules cover much of this. What we already
have that should prove useful:

	- An RDFa parser that supports tag-soup HTML and
	  provides the onprefix, oncurie and ontriple
	  callbacks this would need; and

	- A decent framework for querying the resulting
	  graph for the "required predicates".
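As an illustration of how oncurie-style callbacks could support the
"auto-correct, but remember the error" behaviour from step 2, here is a
toy Python handler. The method names on_prefix and on_curie are
hypothetical stand-ins, not RDF::RDFa::Parser's actual callback
signatures; a real parser would invoke the equivalent hooks while
walking the document.

```python
class CurieFixer:
    """Hypothetical handler for onprefix/oncurie-style callbacks.

    NOT the RDF::RDFa::Parser callback API; it only illustrates how a
    missing prefix binding could be repaired from a service's usual
    prefixes while the mistake is still recorded.
    """

    def __init__(self, usual_prefixes):
        self.usual_prefixes = usual_prefixes  # supplied by the service linter
        self.bound = {}                        # prefixes the document declares
        self.errors = []                       # problems to report later

    def on_prefix(self, prefix, namespace):
        # Called for each prefix declaration found in the document
        self.bound[prefix] = namespace

    def on_curie(self, curie):
        # Called for each CURIE; returns the expanded URI, or None
        prefix, sep, reference = curie.partition(":")
        if not sep:
            self.errors.append(f"'{curie}' is not a CURIE")
            return None
        if prefix in self.bound:
            return self.bound[prefix] + reference
        if prefix in self.usual_prefixes:
            # Auto-correct using the service's usual binding, but record it
            self.errors.append(f"prefix '{prefix}' used without being declared")
            return self.usual_prefixes[prefix] + reference
        self.errors.append(f"unknown prefix '{prefix}'")
        return None
```

With usual_prefixes set to {"og": ...} and only foaf declared in the
document, "foaf:name" expands silently, "og:title" expands but adds an
error, and "dc:title" cannot be expanded at all.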

I'm going to have a go at #3 this evening on my train journey, because
I've been wanting something similar for other purposes anyway.
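For a rough idea of what #3 involves, here is a minimal Python sketch
(not the planned Perl module) of a writer that groups triples by
subject, emits simple XHTML+RDFa, and places each warning inside the
block for its subject. The function name, the tuple shapes and the
markup layout are all assumptions. Blank nodes are written as safe
CURIEs such as [_:alice]; predicates that can't be shortened with the
supplied prefixes are emitted as full URIs, which as far as I know only
RDFa 1.1 allows. Literal datatypes and languages are ignored.

```python
import html
from collections import defaultdict


def _ref(node):
    # RDFa expresses blank nodes as safe CURIEs, e.g. "[_:alice]"
    return f"[{node}]" if node.startswith("_:") else node


def _compact(uri, prefixes):
    # Shorten a predicate URI to a CURIE when a known namespace matches
    for prefix, ns in prefixes.items():
        if uri.startswith(ns) and len(uri) > len(ns):
            return f"{prefix}:{uri[len(ns):]}"
    return uri  # full URIs in @property/@rel are an RDFa 1.1 feature


def write_pretty_rdfa(triples, warnings, prefixes):
    """triples: (subject, predicate, object, is_literal) tuples;
    warnings: (subject, message) pairs; prefixes: prefix -> namespace."""
    by_subject = defaultdict(list)
    for s, p, o, is_literal in triples:
        by_subject[s].append((p, o, is_literal))
    notes = defaultdict(list)
    for s, message in warnings:
        notes[s].append(message)

    decls = " ".join(
        f'xmlns:{p}="{html.escape(ns)}"' for p, ns in sorted(prefixes.items())
    )
    out = [f"<div {decls}>" if decls else "<div>"]
    for s in sorted(set(by_subject) | set(notes)):
        out.append(f'  <div about="{html.escape(_ref(s))}">')
        # Warnings go first, right next to the resource they concern
        for message in notes[s]:
            out.append(f'    <p class="warning">{html.escape(message)}</p>')
        for p, o, is_literal in by_subject[s]:
            attr = html.escape(_compact(p, prefixes))
            if is_literal:
                out.append(f'    <span property="{attr}">{html.escape(o)}</span>')
            else:
                out.append(
                    f'    <span rel="{attr}" resource="{html.escape(_ref(o))}"></span>'
                )
        out.append("  </div>")
    out.append("</div>")
    return "\n".join(out)
```

Keeping warnings as (subject, message) pairs means the same output
can be fed to any of the per-service linters without the writer
needing to know which service produced them.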

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>

Received on Wednesday, 5 May 2010 13:32:44 UTC