- From: Butler, Mark <Mark_Butler@hplb.hpl.hp.com>
- Date: Thu, 9 Oct 2003 15:41:30 +0100
- To: www-rdf-dspace@w3.org
- Message-ID: <E864E95CB35C1C46B72FEA0626A2E8082061B0@0-mail-br1.hpl.hp.com>
Hi team, in prep for the telecon today.
Here is an overview of the design decisions I made when creating the
stylesheet and schema for the Artstor data.
Artstor.xsd describes records that contain the following nested elements:
Image, MediaFiles, MediaFile, Collection, MetaData, Relation, Source,
Style_Period, Creator, Date, Description, ID_Number, Location, Title,
Material, Measurements, Subject.
The first problem was that Relation, Source, Style_Period, Creator, Date,
Description, ID_Number, Location, Title, Material, Measurements and Subject
are mixed-mode, i.e. they can contain text or other elements, e.g.
<Creator>
  <Personal_Name>Leonardo,da Vinci,1452-1519</Personal_Name>
  <Corporate_Name>School of Leonardo</Corporate_Name>
</Creator>
or
<Creator>Leonardo,da Vinci,1452-1519</Creator>
are both acceptable. In fact, they can even do both at the same time, as
does <Subject> in the sample instance data
<Subject>Portraits
  <Geographic>Italy</Geographic>
  <Topic>Drawing</Topic>
</Subject>
So the XSLT stylesheet had to use some tricks to get around this. Also,
where a mixed-mode nested element is converted to a class, the schema needs
to define a property as well as the class: e.g. a Creator element with
nested elements is an instance of the Creator class, but if it also contains
text, that text is represented as a creator property.
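To make the mixed-mode rule concrete, here is a minimal Python sketch of the mapping just described (it is not the XSLT itself, whose tricks are not reproduced here): text directly inside an element becomes a literal on a property, while nested elements become an instance of the corresponding class. The lower-cased property names, the _:Creator1 blank-node label and the reuse of the same property to link the class instance are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def mixed_to_triples(subject_id, xml_text):
    """Map one mixed-mode element such as <Creator> to (s, p, o) triples.

    Assumed rule (a sketch, not the actual stylesheet logic):
      - text directly inside the element -> literal on a lower-cased property
      - nested elements -> a node typed with the element name, carrying one
        lower-cased property per child and linked by the same property
    Text that follows a child element (its .tail) is ignored here.
    """
    elem = ET.fromstring(xml_text)
    prop = elem.tag.lower()          # e.g. Creator -> creator
    triples = []

    text = (elem.text or "").strip()
    if text:                         # text content becomes a plain property value
        triples.append((subject_id, prop, text))

    children = list(elem)
    if children:                     # nested elements become a class instance
        node = f"_:{elem.tag}1"      # hypothetical blank-node label
        triples.append((subject_id, prop, node))
        triples.append((node, "rdf:type", elem.tag))
        for child in children:
            triples.append((node, child.tag.lower(), (child.text or "").strip()))
    return triples


if __name__ == "__main__":
    sample = ("<Creator>Leonardo,da Vinci,1452-1519"
              "<Corporate_Name>School of Leonardo</Corporate_Name></Creator>")
    for triple in mixed_to_triples("_:image1", sample):
        print(triple)
```

Run on a Creator that mixes text and a Corporate_Name, the sketch emits both a literal creator value and a link to a typed Creator node, which is why the schema needs the property as well as the class.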
The second problem was deciding if the nested elements indicated
superproperties, classes or context - see earlier email. I decided that
- Image, MediaFiles/MediaFile, Collection, Relation and Creator are classes.
- Source is a superproperty used as a context. For example, there are two
possible instances of the Location element: one is a nested element
occurring in MetaData, the other is a text element occurring in Source. I
inferred that these two uses of Location mean different things, so Source
is being used here as a context. As there are only two uses of Location, I
decided to replace <Source><Location> with a property called sourceLocation.
- MetaData, Style_Period, Date, Description, ID_Number, Title, Material,
Measurements and Subject are super-properties. As they are super-properties,
it is possible to flatten out the subproperties they contain (see Andy's
note about why this is desirable). For example, instead of Style_Period
being a property of MetaData, it becomes a property of the Image class and
a subproperty of MetaData. One issue here is that both ID_Number and Date
can contain the elements Current_Repository and Former_Repository. However,
the meaning of these elements does not seem to change with the nesting, so
this was resolved by creating two properties, current_repository and
former_repository, that are subproperties of both id_number and date.
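The flattening relies on ordinary RDFS subproperty reasoning: a value attached directly to the Image via current_repository should still be found by anyone asking for id_number, date or metadata. Below is a minimal Python sketch of that inference (in the spirit of the RDFS rules for subPropertyOf transitivity and propagation). The lower-cased property names, and the assumption that id_number and date are themselves subproperties of metadata, are illustrative rather than taken from the schema.

```python
# Hypothetical property names, lower-cased from the Artstor element names.
SUBPROPERTY_OF = {
    "style_period": {"metadata"},
    "id_number": {"metadata"},
    "date": {"metadata"},
    # Same meaning under both parents, so one property with two superproperties.
    "current_repository": {"id_number", "date"},
    "former_repository": {"id_number", "date"},
}


def superproperties(prop):
    """All superproperties of prop, following subPropertyOf transitively."""
    seen, stack = set(), [prop]
    while stack:
        for parent in SUBPROPERTY_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def entail(triples):
    """Add (s, q, o) for every (s, p, o) where p is a subproperty of q."""
    out = set(triples)
    for s, p, o in triples:
        for q in superproperties(p):
            out.add((s, q, o))
    return out


if __name__ == "__main__":
    flattened = {("image1", "current_repository", "repository-A")}
    for triple in sorted(entail(flattened)):
        print(triple)
```

With a single flattened current_repository statement, the sketch also yields id_number, date and metadata statements, so nothing is lost by dropping the nesting from the instance data.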
Outstanding issues:
1. Does everybody agree with the choice of classes, as this strongly
influences the model?
2. Andy raised some good issues about sub-properties / super-properties that
I have yet to resolve:
"I don't think there is a uniform approach because the
original specs aren't uniform in use of a "qualifier".
Looking at vra3:title:
vra3:Title.Variant
could be subProperty of title
it's still a title for the work
vra3:Title.Translation
could be subProperty of title
it's still a title for the work
but
vra3:Title.Series
Not a subproperty
Would seem preferable to link to the
"series" description
vra3:Title.LargerEntity
Not a subproperty - this isn't a title for the work"
So my question is: can we solve this using the approach Dave Reynolds
suggested, i.e. create a property called qualifier, which has a subproperty
called subproperty, and then use subproperty in the schema where a
qualifier indicates a subproperty relationship, but qualifier in all other
cases? If not, what other approaches could we take?
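Under that proposal, a tool that only wants true subproperties would follow subproperty links, while one that wants every qualified form of a title would follow qualifier and, via RDFS subproperty inference, pick up the subproperty links too. A minimal Python sketch of the idea, using Andy's classification of the title qualifiers; the ex: prefix and the function names are hypothetical.

```python
# Andy's classification of the vra3:Title qualifiers, from the quoted note:
# True = a genuine subproperty of the title, False = some other relationship.
TITLE_QUALIFIERS = {
    "vra3:Title.Variant": True,
    "vra3:Title.Translation": True,
    "vra3:Title.Series": False,
    "vra3:Title.LargerEntity": False,
}

# Hypothetical names for the two linking properties in the proposed approach.
QUALIFIER = "ex:qualifier"
SUBPROPERTY = "ex:subproperty"


def qualifier_schema(base, qualifiers):
    """Build schema triples: subproperty is declared a subproperty of qualifier,
    and each qualified property is linked to base by the appropriate one."""
    triples = [(SUBPROPERTY, "rdfs:subPropertyOf", QUALIFIER)]
    for prop, is_sub in qualifiers.items():
        triples.append((prop, SUBPROPERTY if is_sub else QUALIFIER, base))
    return triples


def linked_to(schema, base, link):
    """Properties linked to base by link, or by any declared subproperty of link
    (a one-level version of RDFS subproperty inference)."""
    links = {link} | {s for s, p, o in schema
                      if p == "rdfs:subPropertyOf" and o == link}
    return sorted(s for s, p, o in schema if p in links and o == base)


if __name__ == "__main__":
    schema = qualifier_schema("vra3:title", TITLE_QUALIFIERS)
    print("true subproperties:", linked_to(schema, "vra3:title", SUBPROPERTY))
    print("all qualifiers:", linked_to(schema, "vra3:title", QUALIFIER))
```

The design point is that the qualifier link carries no claim that Series or LargerEntity values are titles of the work, while queries on qualifier still see every qualified form.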
3. The next step is to add a DC mapping, based on the information in the VRA
Core 3.0 specification (see John's slides). I have created a version of the
schema with a naive attempt at mapping to DC, although, as Andy also pointed
out, this has a number of problems:
"The vra3 mappings to DC also need thinking about
vra3:measurements is defined to map to dc:format
measurements.{dimensions,format,resolution}
is about the image (actually about the work or about the image)
vra3:material is defined to map to dc:format but is about the
substance of the work."
So should the next step be to review the Artstor to DC mapping?
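One way to start that review is to check mechanically which DC terms receive more than one VRA property, since those are where information gets conflated. A minimal Python sketch follows; the two dc:format entries come from Andy's note, the vra3:title entry is an assumed example added for contrast, and writing each mapping as an rdfs:subPropertyOf statement is an assumption about how the mapping appears in the schema.

```python
from collections import defaultdict

# Naive VRA Core 3.0 -> DC mapping. The first two entries come from Andy's
# note; the title entry is an assumed example added for contrast.
NAIVE_DC_MAPPING = {
    "vra3:measurements": "dc:format",
    "vra3:material": "dc:format",
    "vra3:title": "dc:title",
}


def as_schema_triples(mapping):
    """Express each mapping as an rdfs:subPropertyOf statement (an assumption
    about how a DC mapping might be written into the RDFS schema)."""
    return sorted((vra, "rdfs:subPropertyOf", dc) for vra, dc in mapping.items())


def dc_collisions(mapping):
    """Return DC terms onto which more than one VRA property is mapped.

    A query on such a term cannot tell its sources apart, e.g. dc:format
    mixes the substance of the work (material) with measurements of the
    work or image.
    """
    by_dc = defaultdict(list)
    for vra, dc in mapping.items():
        by_dc[dc].append(vra)
    return {dc: sorted(srcs) for dc, srcs in by_dc.items() if len(srcs) > 1}


if __name__ == "__main__":
    for triple in as_schema_triples(NAIVE_DC_MAPPING):
        print(triple)
    print(dc_collisions(NAIVE_DC_MAPPING))
```

On the naive mapping, the check flags dc:format as the term that both vra3:material and vra3:measurements collapse into, which is the problem Andy raised.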
I enclose the current versions of the stylesheet (artstor.xsl), the schema
with no DC mapping (artstor_nodc.rdfs), and the schema with a DC mapping
(artstor.rdfs). They can also be found in the SIMILE CVS.
For anyone who is interested, one way to examine the schemas and the
transform is to
1. download Protege (http://protege.stanford.edu) and the RDF(S) plugin
2. use a transform engine like Saxon (http://saxon.sourceforge.net) to
transform the example Artstor data into RDF/XML
3. load the schema and the Artstor data into Protege.
Note: I encountered problems in Protege 1.9 because it couldn't cope with
RDF datatyping, so I removed the datatyping by hand from the styled RDF/XML
for the height, width and creation_date properties. This is related to
Andy's point about whether we should use datatyping. In terms of the schema
and instance data, I think we are okay using it for Artstor (although the
proof will be in the Artstor instance data), as the Artstor XML Schema
indicates that these fields come from a database where they are typed as
integers (height, width) and dates (creation_date). However, Andy makes a
valid point, as the VRA Core spec does not mandate a particular format or
datatype for these elements. One practical problem is that not all tools
(e.g. Protege) support datatyping, so it may be desirable to omit it. Does
anybody know if Protege 2.0 supports datatyping?
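The hand edit described above could also be scripted. Here is a minimal Python sketch using the standard library's ElementTree that deletes the rdf:datatype attribute from the height, width and creation_date property elements. It matches on local names because the Artstor property namespace is not stated here; the example.org namespace in the usage is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DATATYPE_ATTR = f"{{{RDF_NS}}}datatype"
UNTYPED_PROPERTIES = {"height", "width", "creation_date"}

# Keep the conventional rdf: prefix when the tree is written back out.
ET.register_namespace("rdf", RDF_NS)


def local_name(tag):
    """Strip the {namespace} part that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def strip_datatypes(rdf_xml, properties=UNTYPED_PROPERTIES):
    """Remove rdf:datatype from the named property elements, turning their
    values into plain literals. Returns (new_xml, number_removed)."""
    root = ET.fromstring(rdf_xml)
    removed = 0
    for elem in root.iter():
        if local_name(elem.tag) in properties and DATATYPE_ATTR in elem.attrib:
            del elem.attrib[DATATYPE_ATTR]
            removed += 1
    return ET.tostring(root, encoding="unicode"), removed


if __name__ == "__main__":
    sample = (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns:a="http://example.org/artstor#">'
        '<rdf:Description rdf:about="http://example.org/image1">'
        '<a:height rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">600</a:height>'
        '</rdf:Description></rdf:RDF>'
    )
    print(strip_datatypes(sample))
```

ElementTree renames unregistered prefixes (e.g. to ns0) on output, so the result is equivalent RDF/XML rather than a byte-for-byte copy of the input; keeping the datatypes in the source and stripping them only for tools like Protege 1.9 avoids losing the information permanently.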
Dr Mark H. Butler
Research Scientist HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/
Attachments
- application/octet-stream attachment: artstor.xsl
- application/octet-stream attachment: artstor.rdfs
- application/octet-stream attachment: artstor_nodc.rdfs
Received on Thursday, 9 October 2003 10:44:19 UTC