some real-world SOURCE test cases

hi all,

I spent some time collecting a few real-world test cases about SOURCE
clause usage, partly as reported in a previous message

	http://lists.w3.org/Archives/Public/public-rdf-dawg/2004JulSep/0327.html

and partially from other demos we have been preparing.

The test data and manifest (derived from the latest W3C DAWG version)
are available at

	http://www.rdfstore.org/dawg-testcases/data.tar.gz

or test the manifest at
	
	http://www.rdfstore.org/dawg-testcases/data/source-simple2/manifest.n3

The results produced by our software (which partially implements the SPARQL syntax) are at

	http://www.rdfstore.org/dawg-testcases/results.html

Some important points:

	- SOURCE is tested as either a URI or a bNode
	- more SOURCE information (provenance?) is specified at the manifest
	  level using an ad-hoc property called qt:metadata - each qt:metadata
	  contains RDF data explicitly associated with a qt:data source (see
	  source-metadata-003.rdf for an example)
	- if a URI, SOURCE information can also simply be associated
	  (implicitly) using the rdf:about="" RDF/XML construct (canonical case)
	- all data is used for testing purposes only, and some of it is partly
	  real data collected through Google searches about FOAF (do we violate
	  any privacy rules, or any W3C policy, by distributing the collected
	  FOAF files?) --> I am ready to obfuscate the data if necessary
	- mf:result is generated as RDF/XML (my implementation does not  
generate Turtle/N3 yet)
	- all references in the manifest and data/metadata files are relative
	  and should parse fine when fetched remotely
	- last test case (source-query-012) is an attempt to implement the  
'Identity Management' use-case to motivate bNodes as graph-names
	  ( http://www.w3.org/TR/rdf-dawg-uc/#u2.15 )
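
To illustrate the two points above about rdf:about="" and relative references, here is a minimal Python sketch of how such references resolve against the URL a file was fetched from. The manifest URL is the one given in this message; the file name example-data.rdf is hypothetical and only used for illustration.

```python
from urllib.parse import urljoin

# URL the manifest is fetched from (taken from this message).
MANIFEST = "http://www.rdfstore.org/dawg-testcases/data/source-simple2/manifest.n3"


def resolve(base, ref):
    """Resolve a relative (or empty) reference against a base URL."""
    return urljoin(base, ref)


# Hypothetical relative reference, as it could appear in the manifest.
data_url = resolve(MANIFEST, "example-data.rdf")

# rdf:about="" inside that file resolves to the URL of the file itself,
# which is how the SOURCE gets associated implicitly.
implicit_source = resolve(data_url, "")

print(data_url)
print(implicit_source)
```

Because everything is resolved against the retrieval URL, the same tarball should work unchanged whether it is parsed from a local copy or from the remote site.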

The algorithm used to work out the SOURCE identifiers (URI or bNode)
and to associate qt:metadata with qt:data is the following:

	- take all qt:metadata and ingest them into an RDF merged graph
	  (mgraph) - or do nothing
	- for each qt:data, check whether there is any triple in the mgraph
	  describing it
	- if not, ingest/merge the qt:data source into mgraph using the
	  qt:data SOURCE URI (setting the context/graph-name to the given URI)
	- if mgraph contains a match for the i-th qt:data URI, ingest/merge
	  the qt:data into mgraph using the matched SOURCE information
	  (the matched information, as expressed in the qt:metadata, could
	  either be about the SOURCE URI or about a bNode indirectly
	  referring to the SOURCE URI via dc:source)
	- run the query over mgraph

NOTE: the mgraph is defined to be a merge of qt:data and qt:metadata
to make testing easier - but they could be kept separate in the general
case
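
The steps above can be sketched roughly as follows in Python. This is not the code behind runtests.txt, only an illustration under stated assumptions: triples are plain tuples of strings, bNodes are written as "_:label", dc:source is assumed to be the Dublin Core 1.1 element, and the function names and example URIs are made up for this sketch.

```python
# Assumed namespace for dc:source (Dublin Core elements 1.1).
DC_SOURCE = "http://purl.org/dc/elements/1.1/source"


def is_bnode(term):
    """Blank nodes are written N-Triples style in this sketch, e.g. '_:a'."""
    return term.startswith("_:")


def source_name(data_uri, metadata):
    """Pick the graph name (SOURCE) for one qt:data URI."""
    for s, p, o in metadata:
        # A bNode pointing at the data URI via dc:source names the graph.
        if p == DC_SOURCE and o == data_uri and is_bnode(s):
            return s
    # Described directly by its URI, or not described at all: use the URI.
    return data_uri


def build_mgraph(metadata, data_sources):
    """Merge qt:metadata and qt:data into one set of (s, p, o, source) quads."""
    # Metadata triples carry no graph name in this sketch (None).
    mgraph = {(s, p, o, None) for s, p, o in metadata}
    for data_uri, triples in data_sources.items():
        name = source_name(data_uri, metadata)
        mgraph.update((s, p, o, name) for s, p, o in triples)
    return mgraph


def triples_from(mgraph, source):
    """Return the triples whose SOURCE is the given URI or bNode."""
    return {(s, p, o) for s, p, o, g in mgraph if g == source}


# Hypothetical example: one source named by a bNode, one by its own URI.
metadata = [("_:a", DC_SOURCE, "http://example.org/alice.rdf")]
data_sources = {
    "http://example.org/alice.rdf": [("_:x", "foaf:name", "Alice")],
    "http://example.org/bob.rdf": [("_:y", "foaf:name", "Bob")],
}
mgraph = build_mgraph(metadata, data_sources)
print(triples_from(mgraph, "_:a"))
print(triples_from(mgraph, "http://example.org/bob.rdf"))
```

Keeping the graph name as a fourth element of each tuple is only one way to model the context; a store with native named graphs would set it at ingestion time instead.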

(see http://www.rdfstore.org/dawg-testcases/runtests.txt for details if  
of interest)

I hope they will help keep the SOURCE / CNC discussion going
and/or provide some support for SOURCE testing

comments / corrections are more than welcome

cheers

Albe

-
Alberto Reggiori, Senior Partner, R&D @Semantics S.R.L.
alberto@asemantics.com  www.asemantics.com
Milan Office, milano@asemantics.com,   +39 0332 667092

Received on Tuesday, 26 October 2004 00:52:18 UTC