RE: Artstor XML -> RDF from Butler, Mark on 2003-10-06 (www-rdf-dspace@w3.org from October 2003)

From: Butler, Mark <Mark_Butler@hplb.hpl.hp.com>
Date: Mon, 6 Oct 2003 18:53:31 +0100
To: www-rdf-dspace@w3.org
Message-ID: <E864E95CB35C1C46B72FEA0626A2E80820619C@0-mail-br1.hpl.hp.com>
Some comments from Andy on the RDF/XML version of the ArtStor data.

-----Original Message-----
From: Seaborne, Andy 
Sent: 06 October 2003 15:56

Mark,

First - I realise the version is a first stab so read these comments as
"thoughts provoked by", rather than being comments directly on the
transformation you did.  Here are my first thoughts.

1/ As I understand it you used an XSLT transform to turn the ArtStor
example record into RDF/XML.  That is, it is a syntactic conversion.  I'm
not sure that a purely syntactic approach will scale - by reading the XML
into an abstract RDF graph, we can do more checking in the RDF world.  Also,
with a lot of data, there are likely to be duff records and encoding issues so
targeting an RDF graph might be more flexible.

2/ Like any source, we will have some data cleaning to do.  This will
involve processing the data over and above the XML transformation - example:


<Personal_Name>Leonardo,da Vinci,1452-1519</Personal_Name>

Whatever else we do with people and names (see more below), I don't think
Leonardo is a number!
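
As a rough illustration of the kind of cleaning meant here, the following
Python sketch separates a trailing life-date range from the name parts.  It
assumes every value has the comma-separated shape of the single sample above,
which real data may well not follow; split_personal_name and DATE_RANGE are
hypothetical names, not part of any existing tool.

```python
import re

# Assumption: a trailing "YYYY-YYYY" field is a life-date range, as in the
# one sample value quoted in the message.
DATE_RANGE = re.compile(r"^(\d{4})-(\d{4})$")


def split_personal_name(raw):
    """Split a Personal_Name string into name parts and optional dates."""
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    born = died = None
    if parts:
        match = DATE_RANGE.match(parts[-1])
        if match:
            born, died = int(match.group(1)), int(match.group(2))
            parts = parts[:-1]  # the dates are not part of the name
    return {"name_parts": parts, "born": born, "died": died}


record = split_personal_name("Leonardo,da Vinci,1452-1519")
```

Values that do not end in a date range keep all their parts and get no dates,
so nothing is lost for records the pattern does not fit.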

3/ People: structuring names enables us to search over family names.  The
vCard vocabulary divides a name into:

Formatted Name : a string that is how the name should appear.
Structured name:
  [ family "family name"; 
    middle "middle name or initial";
    given "given name"] ;

The vCard vocabulary has other weaknesses (not name related) but this is a
good starting point.  It also makes people 1st class concepts as you have
suggested elsewhere.
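
A minimal Python sketch of why the structured form helps with searching.
Triples are plain tuples, the "vcard:" property names follow the wording
above rather than the exact terms of any published vCard schema, and the
person and URI are made up for illustration.

```python
import itertools

_bnode_ids = itertools.count()  # source of fresh blank-node labels


def person_triples(person_uri, formatted, family=None, given=None, middle=None):
    """Return (subject, predicate, object) tuples describing one person."""
    name_node = "_:n%d" % next(_bnode_ids)
    triples = [
        (person_uri, "vcard:fn", formatted),  # formatted name, as displayed
        (person_uri, "vcard:n", name_node),   # link to the structured name
    ]
    for prop, value in (("family", family), ("given", given), ("middle", middle)):
        if value:
            triples.append((name_node, "vcard:" + prop, value))
    return triples


def find_by_family(triples, family):
    """Return the people whose structured name has the given family name."""
    name_nodes = {s for s, p, o in triples if p == "vcard:family" and o == family}
    return sorted(s for s, p, o in triples if p == "vcard:n" and o in name_nodes)


# Hypothetical person, not taken from the ArtStor data.
graph = person_triples("ex:person1", "Ada Q. Example",
                       family="Example", given="Ada", middle="Q.")
matches = find_by_family(graph, "Example")
```

Because the person is a node in its own right, the same URI can be reused by
every record that refers to that person.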

4/ Can we use other vocabularies, rather than having just one?  Additionally, can we
split 

Things that occur to me are:
	Our version of VRA core
	Local specials (e.g. filename)
	EXIF for JPEG details
	

5/ Use subproperties, not nested structures.

6/ URIs for constants.  Then annotate the resource with rdfs:label, etc.
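
One way point 6 might look, as a Python sketch over plain tuples.  The base
namespace, the "ex:medium" property and the sample value are all placeholders
chosen for illustration, not terms from the ArtStor schema.

```python
import re

BASE = "http://example.org/artstor/term/"  # hypothetical namespace


def constant_uri(value):
    """Mint a URI for a constant by normalising its text into a slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", value.strip().lower()).strip("-")
    return BASE + slug


def constant_triples(subject, predicate, value):
    """Point at the constant's URI and label that URI with the original text."""
    uri = constant_uri(value)
    return [
        (subject, predicate, uri),
        (uri, "rdfs:label", value.strip()),
    ]


triples = constant_triples("ex:rec1", "ex:medium", "Oil on Canvas")
```

Variant spellings that differ only in case or spacing end up on the same URI,
while the label keeps a readable form of the text.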

7/ You have used datatypes (ints, dates).  Depending on the quality of the
input, we may need to just deal in strings. Low level - but if we get
significant malformed data, preserving the appearance might be important.
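
A small Python sketch of that fallback: attach a datatype only when the text
is a clean integer, otherwise keep the string exactly as it arrived.  The
function name and the (lexical form, datatype) tuple layout are assumptions
made for the example.

```python
import re

INTEGER = re.compile(r"[+-]?[0-9]+")


def typed_or_plain(raw, datatype="xsd:integer"):
    """Return (lexical form, datatype or None) for one input value."""
    text = raw.strip()
    if INTEGER.fullmatch(text):
        return (text, datatype)
    return (raw, None)  # malformed: preserve the original appearance


good = typed_or_plain(" 1452 ")
bad = typed_or_plain("c. 1452")
```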



Which suggests to me that we might want the following processes:

A/ XML -> "local RDF" - vocabulary directly reflects the XML schema.  Thus
the RDF is not in the final namespaces, nor has conceptualization occurred,
just translation to low level RDF.  Much checking at the syntactic level.

B/ Verification: checking constants, checking presence of mandatory
properties; other application profile and controlled vocabulary issues.

C/ Mapping file-centric RDF to conceptual model RDF: placement in VRA space,
use of other vocabularies, removal of unwanted properties (we don't have to
map everything).

There may be other steps.  We could do all these at once but we may wish to
repeat (for example) just B and C, and of course there is the support issue of
one large "script".
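
A rough Python sketch of A, B and C as separate functions, so that B and C
can be rerun without repeating A.  Triples are plain tuples; the "local:" and
"vra:" names, the Title and Filename elements and the mapping table are
placeholders for illustration and are not taken from the real schema.

```python
import xml.etree.ElementTree as ET

LOCAL = "local:"  # hypothetical prefix for the schema-shaped vocabulary


def xml_to_local_rdf(xml_text, record_uri):
    """A/ One triple per non-empty leaf element, predicate named after the tag."""
    root = ET.fromstring(xml_text)
    return [
        (record_uri, LOCAL + el.tag, el.text.strip())
        for el in root.iter()
        if len(el) == 0 and el.text and el.text.strip()
    ]


def verify(triples, mandatory):
    """B/ Report mandatory properties that are absent."""
    present = {p for _, p, _ in triples}
    return ["missing " + p for p in sorted(mandatory) if p not in present]


def map_to_conceptual(triples, mapping):
    """C/ Rename mapped properties and drop everything else."""
    return [(s, mapping[p], o) for s, p, o in triples if p in mapping]


SAMPLE = (
    "<Record>"
    "<Personal_Name>Leonardo,da Vinci,1452-1519</Personal_Name>"
    "<Filename>img001.jpg</Filename>"  # made-up element and value
    "</Record>"
)
MAPPING = {"local:Personal_Name": "vra:creator"}  # placeholder target name

local = xml_to_local_rdf(SAMPLE, "ex:rec1")
problems = verify(local, {"local:Personal_Name", "local:Title"})
conceptual = map_to_conceptual(local, MAPPING)
```

Here step B reports the absent Title and step C drops the unmapped Filename,
and either step can be rerun on the stored output of A.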

	Andy

-------- Original Message --------
> From: Butler, Mark
> Date: 6 October 2003 12:11
> 
> Andy,
> 
> Don't worry, this is just a first stab, I'm not proposing some set in
> stone model, in fact after some experiences last week I would be very
> surprised if we don't have to change the model. I'm just trying to avoid more
> than one of us typing in all those xsl:template's, xsl:value-of's and
> xsl:select's :)    
> 
> The enclosed stylesheet transforms the sample, now I need to work
> through the schema to check I have complete coverage. 
> 
> If you have any feedback please let me know, and yes we can discuss
> this at the meeting today. 
> 
> cheers
> 
> Mark
Received on Monday, 6 October 2003 13:54:06 UTC