Some real graphs

In the interests of grounding the discussion in real examples,
http://inkdroid.org/journal/2012/05/15/diving-into-viaf/ is
interesting. Ed talks about the VIAF dataset, which integrates library
/ cultural heritage metadata about people and their names. Behind the
scenes VIAF aggregates from many sources, but the publication model
simplifies a lot of that. Still, what they publish as RDF is a set of
graphs, rather than one big graph:

"The RDF Cluster Dataset
http://viaf.org/viaf/data/viaf-20120422-clusters.xml.gz is 2.1G gzip
compressed RDF data. Rather than it being one complete RDF/XML file,
each line has a complete RDF/XML document on it, which represents a
single cluster. All in all there are 20,379,541 clusters in the file."

Lots more in Ed's post. I don't know why they chopped the giant graph
into one per person. Or, seen another way, why they composited all the
source graphs for each person into a single flat one.
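
For anyone who wants to poke at the dump, the one-document-per-line
layout quoted above can be read a cluster at a time without loading
the whole file. A rough sketch only: the function name is mine, and it
assumes each line holds nothing but a complete RDF/XML document, as
Ed's post describes. It only parses the XML; turning each document
into triples would need a real RDF parser.

```python
import gzip
import xml.etree.ElementTree as ET


def iter_cluster_documents(path):
    """Yield one parsed XML root element per non-empty line of a gzipped file.

    Assumption: every line is a complete, standalone RDF/XML document
    (one cluster per line). Anything else on a line will raise a ParseError.
    """
    # Binary mode lets the XML parser honour any encoding declaration itself.
    with gzip.open(path, "rb") as dump:
        for raw_line in dump:
            raw_line = raw_line.strip()
            if raw_line:  # skip blank lines
                yield ET.fromstring(raw_line)
```

Looping over iter_cluster_documents("viaf-20120422-clusters.xml.gz")
would then hand back one small graph's worth of XML per iteration,
which is the "set of graphs" shape in practice.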

My hope for this WG's graphs work is that it'll make data manipulation
and sharing at the graph level more natural and fluid, so that
publishers like VIAF could share underlying provenance, and per-person
aggregates, and per-source aggregates, as views into a common
quad-soup...
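
To make "views into a common quad-soup" a bit more concrete, here is a
toy sketch. Everything in it is invented for illustration: the data,
the ex: identifiers, and the helper names. It is not how VIAF works
and not anything the WG has specified. The fourth element of each quad
stands in for the contributing source, and keying the per-person view
on the subject is a simplification.

```python
from collections import defaultdict, namedtuple

Quad = namedtuple("Quad", "subject predicate object graph")

# Made-up example data; the graph name stands for the contributing source.
QUADS = [
    Quad("ex:person1", "foaf:name", "Jane Example", "ex:sourceA"),
    Quad("ex:person1", "foaf:name", "Example, Jane", "ex:sourceB"),
    Quad("ex:person2", "foaf:name", "John Sample", "ex:sourceA"),
]


def view_by(quads, key):
    """Group the soup into sets of triples, keyed by one field of the quad."""
    view = defaultdict(set)
    for q in quads:
        view[getattr(q, key)].add((q.subject, q.predicate, q.object))
    return dict(view)


def provenance(quads, triple):
    """Return the set of graph names (sources) that assert the given triple."""
    return {q.graph for q in quads
            if (q.subject, q.predicate, q.object) == triple}


per_source = view_by(QUADS, "graph")    # per-source aggregates
per_person = view_by(QUADS, "subject")  # per-person aggregates (simplified)
```

The point is just that both aggregates, and the provenance lookup, are
cheap projections of the same underlying quads rather than separately
published datasets.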

Dan

Received on Wednesday, 16 May 2012 09:03:40 UTC