- From: Butler, Mark <Mark_Butler@hplb.hpl.hp.com>
- Date: Thu, 9 Oct 2003 15:41:30 +0100
- To: www-rdf-dspace@w3.org
- Message-ID: <E864E95CB35C1C46B72FEA0626A2E8082061B0@0-mail-br1.hpl.hp.com>
Hi team, in prep for the telecon today, here is an overview of the design decisions I made when creating the stylesheet and schema for the Artstor data.

Artstor.xsd describes records that contain the following nested elements: Image, MediaFiles, MediaFile, Collection, MetaData, Relation, Source, Style_Period, Creator, Date, Description, ID_Number, Location, Title, Material, Measurements, Subject.

The first problem was that Relation, Source, Style_Period, Creator, Date, Description, ID_Number, Location, Title, Material, Measurements and Subject are mixed mode, i.e. they can contain text or other elements, e.g.

    <Creator>
      <Personal_Name>Leonardo,da Vinci,1452-1519</Personal_Name>
      <Corporate_Name>School of Leonardo</Corporate_Name>
    </Creator>

or

    <Creator>Leonardo,da Vinci,1452-1519</Creator>

are both acceptable. In fact, they can even do both at the same time, as does <Subject> in the sample instance data:

    <Subject>Portraits
      <Geographic>Italy</Geographic>
      <Topic>Drawing</Topic>
    </Subject>

So the XSLT stylesheet had to use some tricks to get around this. Also, the schema will need to define properties in addition to classes where a mixed mode nested element is converted to a class, e.g. a Creator element with nested elements is an instance of the Creator class, but if it contains text as well this is represented as a creator property.

The second problem was deciding if the nested elements indicated superproperties, classes or context (see earlier email). I decided that

- Image, MediaFiles/MediaFile, Collection, Relation and Creator are classes.
- Source is a superproperty used as a context, e.g. there are two possible instances of the Location element, one which is a nested element occurring in MetaData, the other which is a text element occurring in Source. I inferred that these two uses of Location are different, so Source is being used here as a context. As there are only two uses of Location, I decided to replace <Source><Location> with a property called sourceLocation.
- MetaData, Style_Period, Date, Description, ID_Number, Title, Material, Measurements and Subject are super-properties. As they are super-properties, it is possible to flatten out the subproperties they contain (see Andy's note about why this is desirable), so for example instead of Style_Period being a property of MetaData, it's a property of the Image class and a subproperty of MetaData.

One issue here is that both ID_Number and Date can contain the elements Current_Repository and Former_Repository. However, the meaning of these elements does not seem to change based on the nesting, so this was resolved by creating two properties called current_repository and former_repository that are subproperties of both id_number and date.

Outstanding issues:

1. Does everybody agree with the choice of classes, as this strongly influences the model?

2. Andy raised some good issues about sub-properties / super-properties that I have yet to resolve:

"I don't think there is a uniform approach because the original specs aren't uniform in use of a "qualifier". Looking at vra3:title:

vra3:Title.Variant
    could be subProperty of title
    its still a title for the work
vra3:Title.Translation
    could be subProperty of title
    its still a title for the work
but
vra3:Title.Series
    Not a subproperty
    Would seem preferable to link to the "series" description
vra3:Title.LargerEntity
    Not a subproperty - this isn't a title for the work"

So my question here is: can we solve this using the approach Dave Reynolds suggested, i.e. create a property called qualifier, which has a subproperty called subproperty, and then use subproperty in the schema in the cases where a qualifier indicates subproperty relationships, but qualifier in other cases? If not, what other approaches could we take?

3. The next step is to add a DC mapping, based on the information in the VRA Core 3.0 specification (see John's slides).
I have created a version of the schema with a naive attempt at mapping to DC, although this has a number of problems, also pointed out by Andy:

"The vra3 mappings to DC also need thinking about

vra3:measurements is defined to map to dc:format
    measurements.{dimensions,format,resolution} is about the image
    (actually about the work or about the image)
vra3:material is defined to dc:format
    but is about the substance of the work."

So the next step is to review the Artstor to DC mapping?

I enclose the current versions of the stylesheet, the schema with no DC mapping (artstor_nodc.rdfs), and the schema with a DC mapping. They can also be found in the SIMILE CVS.

For anyone who is interested, one way to examine the schemas and the transform is to

1. download Protege (http://protege.stanford.edu) and the RDF(S) plugin
2. use a transform engine like Saxon (http://saxon.sourceforge.net) to style the example Artstor data to RDF/XML
3. load the schema and the Artstor data into Protege.

Note I encountered problems in Protege 1.9 because it couldn't cope with RDF datatyping, so I removed the datatyping from the styled RDF/XML by hand, for the height, width and creation_date properties.

This is related to Andy's point about whether we should use datatyping. In terms of the schema / instance data, I think we are okay using it for Artstor (although the proof will be in the Artstor instance data), as the Artstor XML Schema indicates these fields are coming from a database where they are datatyped to integers and dates respectively. However, Andy makes a valid point, as the VRA Core spec does not mandate a particular format or datatype for these elements. One practical problem is that not all tools (e.g. Protege) support datatyping, so it may be desirable to omit it. Does anybody know if Protege 2.0 supports datatyping?

Dr Mark H. Butler
Research Scientist
HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/
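
As an illustration of the workaround mentioned above (removing the datatyping from the styled RDF/XML for height, width and creation_date), here is a minimal Python sketch that strips every rdf:datatype attribute instead of doing it by hand. It is not part of the attached stylesheet. The art namespace URI, the sample record and the function name strip_datatypes are assumptions made up for the example; only the rdf:datatype attribute and the three property names come from the message.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace for the Artstor properties (not taken from the real schema).
ART_NS = "http://example.org/artstor#"
DATATYPE_ATTR = f"{{{RDF_NS}}}datatype"

# Keep a readable prefix in the serialized output.
ET.register_namespace("art", ART_NS)


def strip_datatypes(rdf_xml: str) -> str:
    """Return the RDF/XML with all rdf:datatype attributes removed."""
    root = ET.fromstring(rdf_xml)
    for element in root.iter():
        # pop() with a default is a no-op on elements without the attribute.
        element.attrib.pop(DATATYPE_ATTR, None)
    return ET.tostring(root, encoding="unicode")


# Made-up sample resembling styled output; values are illustrative only.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:art="http://example.org/artstor#">
  <art:Image rdf:about="http://example.org/image/1">
    <art:height rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">480</art:height>
    <art:width rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">640</art:width>
    <art:creation_date rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2003-10-09</art:creation_date>
  </art:Image>
</rdf:RDF>"""

if __name__ == "__main__":
    print(strip_datatypes(SAMPLE))
```

The literal values are left untouched, so the result is plain-literal RDF/XML that a tool without datatype support should be able to load. The same effect could presumably be achieved in the stylesheet itself by not emitting rdf:datatype in the first place.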
Attachments
- application/octet-stream attachment: artstor.xsl
- application/octet-stream attachment: artstor.rdfs
- application/octet-stream attachment: artstor_nodc.rdfs
Received on Thursday, 9 October 2003 10:44:19 UTC