Re: finding newer versions of W3C Technical Reports [was: trivial question about SPARQL]

+cc: Dom

Bijan Parsia wrote:
> One thing that might be helpful...and doesn't even require W3Cness!...to 
> put together an alternative interface, e.g., exhibit or j/mspace or...
> 
> A simple scrape of TR plus all the prior versions would not only make it 
> easy to find the latest, but see prior versions etc.
> 
> I don't have the time, etc. to do the whole task, but I'd certainly 
> help/host any efforts in this direction. I, personally, would benefit 
> from such a service :)

That's an interesting idea. No scraping needed though.

Anyone who manages to read all the way to the bottom of 
http://www.w3.org/TR/ and sees the RDF logo there, and decides to click 
it, is rewarded with a link to http://www.w3.org/2002/01/tr-automation/

"""Automating the publication of Technical Reports
Abstract

This document presents the "TR Automation" project; this project, based 
on the use of Semantic Web tools and technologies, has allowed to 
streamline the publication paper trail of W3C Technical Reports, to 
maintain an RDF-formalized index of these specifications and to create a 
number of tools using these newly available data."""

There's an RDF version here, so no need to scrape.

http://www.w3.org/2002/01/tr-automation/tr.rdf

This seems to be fresh:

HEAD /2002/01/tr-automation/tr.rdf HTTP/1.1
Host: foo

HTTP/1.1 200 OK
Date: Fri, 25 Jan 2008 18:08:16 GMT
Server: Apache/2
Last-Modified: Thu, 24 Jan 2008 14:39:20 GMT
ETag: "44478ce9cf600"
Accept-Ranges: bytes
Content-Length: 540774
Cache-Control: max-age=21600
Expires: Sat, 26 Jan 2008 00:08:16 GMT
P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
Content-Type: application/rdf+xml; qs=0.9
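
For anyone who wants to script that freshness check, here is a minimal
sketch in Python. It only compares the Date and Last-Modified values
copied from the response above and makes no network request. The
function name age_seconds is my own invention, not part of any W3C
tool.

```python
from email.utils import parsedate_to_datetime


def age_seconds(date_header: str, last_modified_header: str) -> int:
    """Return the seconds elapsed between Last-Modified and Date."""
    served = parsedate_to_datetime(date_header)
    modified = parsedate_to_datetime(last_modified_header)
    return int((served - modified).total_seconds())


# Header values copied from the HEAD response above.
age = age_seconds(
    "Fri, 25 Jan 2008 18:08:16 GMT",
    "Thu, 24 Jan 2008 14:39:20 GMT",
)
print("tr.rdf was %.1f hours old when served" % (age / 3600.0))
```

With those two headers the file was a little over a day old when it
was served.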


This gives about 7.7k triples:

FlipFlop:~ danbri$ rapper --count http://www.w3.org/2002/01/tr-automation/tr.rdf
rapper: Parsing URI http://www.w3.org/2002/01/tr-automation/tr.rdf
rapper: Parsing returned 7675 triples


Quick SPARQL experiment:

FlipFlop:tr-automation danbri$ more recent.rq
PREFIX doc: <http://www.w3.org/2000/10/swap/pim/doc#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT DISTINCT ?title ?manifestation ?date
FROM <tr.rdf>
WHERE {
  ?manifestation doc:versionOf ?work; dc:date ?date; dc:title ?title.
    FILTER ( regex( ?date, "200[8]")).
}
ORDER BY ?date


Running this with Roqet, a Redland utility:

FlipFlop:tr-automation danbri$ roqet recent.rq
roqet: Querying from file recent.rq
roqet: Query has a variable bindings result
result: [title=string("XHTML Access Module"), 
manifestation=uri<http://www.w3.org/TR/2008/WD-xhtml-access-20080107>, 
date=string("2008-01-07")]
result: [title=string("OWL 1.1 Web Ontology Language: Mapping to RDF 
Graphs"), 
manifestation=uri<http://www.w3.org/TR/2008/WD-owl11-mapping-to-rdf-20080108/>, 
date=string("2008-01-08")]
result: [title=string("OWL 1.1 Web Ontology Language: Model-Theoretic 
Semantics"), 
manifestation=uri<http://www.w3.org/TR/2008/WD-owl11-semantics-20080108/>, 
date=string("2008-01-08")]
result: [title=string("OWL 1.1 Web Ontology Language: Structural 
Specification and Functional-Style Syntax"), 
manifestation=uri<http://www.w3.org/TR/2008/WD-owl11-syntax-20080108/>, 
date=string("2008-01-08")]
result: [title=string("SMIL Timesheets 1.0"), 
manifestation=uri<http://www.w3.org/TR/2008/WD-timesheets-20080110/>, 
date=string("2008-01-10")]
result: [title=string("Service Modeling Language, Version 1.1"), 
manifestation=uri<http://www.w3.org/TR/2008/WD-sml-20080114/>, 
date=string("2008-01-14")]
result: [title=string("Service Modeling Language Interchange Format 
Version 1.1"), 
manifestation=uri<http://www.w3.org/TR/2008/WD-sml-if-20080114/>, 
date=string("2008-01-14")]
result: [title=string("Synchronized Multimedia Integration Language 
(SMIL 3.0)"), 
manifestation=uri<http://www.w3.org/TR/2008/CR-SMIL3-20080115/>, 
date=string("2008-01-15")]
result: [title=string("SPARQL Query Results XML Format"), 
manifestation=uri<http://www.w3.org/TR/2008/REC-rdf-sparql-XMLres-20080115/>, 
date=string("2008-01-15")]
result: [title=string("SPARQL Protocol for RDF"), 
manifestation=uri<http://www.w3.org/TR/2008/REC-rdf-sparql-protocol-20080115/>, 
date=string("2008-01-15")]
result: [title=string("SPARQL Query Language for RDF"), 
manifestation=uri<http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/>, 
date=string("2008-01-15")]
result: [title=string("HTML 5"), 
manifestation=uri<http://www.w3.org/TR/2008/WD-html5-20080122/>, 
date=string("2008-01-22")]
result: [title=string("HTML 5 differences from HTML 4"), 
manifestation=uri<http://www.w3.org/TR/2008/WD-html5-diff-20080122/>, 
date=string("2008-01-22")]
result: [title=string("Relationship Between Mobile Web Best Practices 
1.0 and Web Content Accessibility Guidelines"), 
manifestation=uri<http://www.w3.org/TR/2008/WD-mwbp-wcag-20080122/>, 
date=string("2008-01-22")]
roqet: Query returned 14 results


These dates are plain strings, not datatyped literals, so for this
experiment I just asked what W3C had been up to in 2008, expressed as a
regex. A real app would probably have to rummage around a bit more to
find the latest version; I'm not sure what can be done in a single pass
of SPARQL. The data also has author information, but unfortunately no
identifying properties for the authors at present.
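
One way to do the "rummaging" outside SPARQL is to fetch every
(work, manifestation, date, title) row and pick the newest date per
work in application code. The following Python is a rough sketch under
stated assumptions, not a tested tool: the Version type, the
latest_versions function and the example.org URIs are hypothetical, and
the older row is made up for illustration. It relies on the dc:date
strings being in YYYY-MM-DD form, as in the output above, so that plain
string comparison orders them correctly.

```python
from typing import Dict, Iterable, NamedTuple


class Version(NamedTuple):
    work: str           # value of doc:versionOf (placeholder URIs below)
    manifestation: str  # dated URI of one published version
    date: str           # dc:date as a plain YYYY-MM-DD string
    title: str


def latest_versions(rows: Iterable[Version]) -> Dict[str, Version]:
    """Keep, for each work, the row with the greatest date string."""
    latest: Dict[str, Version] = {}
    for row in rows:
        best = latest.get(row.work)
        # ISO-style dates sort correctly as strings, so no parsing needed.
        if best is None or row.date > best.date:
            latest[row.work] = row
    return latest


# Illustrative rows: the 2008 manifestations appear in the roqet output
# above; the work URIs and the 2007 row are hypothetical placeholders.
SAMPLE = [
    Version("http://example.org/work/sparql-query",
            "http://example.org/TR/2007/older-sparql-draft",
            "2007-11-12", "SPARQL Query Language for RDF"),
    Version("http://example.org/work/sparql-query",
            "http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/",
            "2008-01-15", "SPARQL Query Language for RDF"),
    Version("http://example.org/work/html5",
            "http://www.w3.org/TR/2008/WD-html5-20080122/",
            "2008-01-22", "HTML 5"),
]

newest = latest_versions(SAMPLE)
for work, row in sorted(newest.items()):
    print(row.date, row.title, row.manifestation)
```

The rows could come from the same query as above with ?work added to
the SELECT and the 2008 filter dropped.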

If anyone's going to play with this, do look around the project page at 
http://www.w3.org/2002/01/tr-automation/ for tools and ideas. And of 
course if you do make a fancy pretty alternate interface, please take 
care to make it clear that the page is your effort not W3C's, and that 
it itself might not be up to date. Otherwise we could end up back where 
we started here :)

cheers,

Dan

ps. on a related note, 
http://tirania.org/blog/archive/2008/Jan-24-1.html describes a similar 
problem with software downloads and users finding the old version by 
accident...

Received on Friday, 25 January 2008 18:31:20 UTC