Problems with the AMICO rdf files / schema

1. Basic XML / RDF errors spotted by Andy Seaborne

MAA.70.??? needed:
A couple of XML fixes, i.e. illegal XML tags, and &#233 (no terminating
semicolon) should have been &#233; (é).
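
A character reference without its terminating semicolon is enough to make a file ill-formed, and any XML parser will report it. A minimal sketch using Python's standard library follows; the helper name is_well_formed and the sample strings are made up for illustration, and this is not the check that was actually run on the AMICO files.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample strings: the first terminates the character
# reference with ';', the second does not.
good = "<name>caf&#233;</name>"
bad = "<name>caf&#233</name>"

print(is_well_formed(good))
print(is_well_formed(bad))
```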

AIC* needed something but I can't recall what.

In sample.rdf I think there was an inconsistency in the use of rdf:about
versus rdf:ID.  I just fed the file through ARP (Jena's RDF/XML parser; the
command line program is "jena.rdfparse") and fixed the warnings about
relative URIs.

The other thing needed for all the files is to set the XML base (xml:base),
as otherwise relative URIs are resolved against the file: location the
document happened to be loaded from.
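
To illustrate why the base matters, here is a small sketch of relative URI resolution using urljoin from Python's standard library. It implements generic relative-reference resolution, which is in the spirit of what an RDF/XML parser does with rdf:about values. The local file path and the choice of base URI for sample.rdf are assumptions for illustration only.

```python
from urllib.parse import urljoin

def resolve(base, reference):
    """Resolve a possibly relative URI reference against a base URI."""
    return urljoin(base, reference)

# Without xml:base, the base is wherever the file was loaded from.
local_base = "file:///home/user/amico/sample.rdf"  # hypothetical path
# With xml:base set, the same references resolve to stable URIs.
web_base = "http://www.w3.org/2002/04/12-amico/data/sample.rdf"  # assumed

for base in (local_base, web_base):
    print(resolve(base, "#AIC_.E13868.TIF"))
    print(resolve(base, "amico_AIC_.E13868.TIF"))
```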

2. Some usages of plurals in the class names are inconsistent

Books is a subclass of PhysicalObject, and from the instance data it looks
like it can only contain a single item, so surely it should be Book. Ditto
for Photographs, Paintings, Drawingsandwatercolors.

3. Some properties are composite literals which should be avoided e.g.

<creationDate>1930 - 1934</creationDate>

<measurementText>33 1/8 x 60 in. (84.1 x 152.4 cm)</measurementText>

 <dimensions>1024x561</dimensions>

 <fileSize>1.69 MB</fileSize>

 <dateLocation>American; 1882-1967</dateLocation>

Some proposals for better ways of doing this:

<creationDate rdf:parseType="Resource">
  <yearStart>1930</yearStart>
  <yearEnd>1934</yearEnd>
</creationDate>

<measurement rdf:parseType="Resource">
   <widthInch>33 1/8</widthInch>
   <heightInch>60</heightInch>
   <widthCm>84.1</widthCm>
   <heightCm>152.4</heightCm>
</measurement>

Alternatively it should be possible to encode the measurements in just one
unit (either centimetres or inches) and convert to the other when needed.
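
As a rough sketch of that conversion, the following Python uses exact fractions so that values such as "33 1/8" survive parsing without rounding surprises. The function names are hypothetical, and rounding to one decimal place for centimetres is an assumption based on the figures quoted in measurementText.

```python
from fractions import Fraction

CM_PER_INCH = Fraction(254, 100)  # one inch is exactly 2.54 cm

def parse_inches(text):
    """Parse a value such as '33 1/8' or '60' into an exact Fraction."""
    return sum((Fraction(part) for part in text.split()), Fraction(0))

def inches_to_cm(inches):
    """Convert inches to centimetres, rounded to one decimal place."""
    return round(float(Fraction(inches) * CM_PER_INCH), 1)

def cm_to_inches(cm):
    """Convert centimetres to inches, rounded to two decimal places."""
    return round(float(Fraction(str(cm)) / CM_PER_INCH), 2)

print(inches_to_cm(parse_inches("33 1/8")))
print(inches_to_cm(parse_inches("60")))
print(cm_to_inches(152.4))
```

Storing only one unit avoids the two representations drifting apart, at the cost of a rounding convention that has to be agreed on.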

<dimensions rdf:parseType="Resource">
  <widthPixel>1024</widthPixel>
  <heightPixel>561</heightPixel>
</dimensions>
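
Existing composite literals could be migrated to the structured form mechanically. The sketch below splits the creationDate and dimensions literals quoted above; the function names are hypothetical, and the patterns assume the literals always follow exactly the two shapes shown, which the real data may not.

```python
import re

def split_year_range(text):
    """Split a literal such as '1930 - 1934' into (yearStart, yearEnd)."""
    match = re.fullmatch(r"\s*(\d{4})\s*-\s*(\d{4})\s*", text)
    if match is None:
        raise ValueError(f"not a year range: {text!r}")
    return int(match.group(1)), int(match.group(2))

def split_pixel_dimensions(text):
    """Split a literal such as '1024x561' into (widthPixel, heightPixel)."""
    match = re.fullmatch(r"\s*(\d+)\s*x\s*(\d+)\s*", text)
    if match is None:
        raise ValueError(f"not pixel dimensions: {text!r}")
    return int(match.group(1)), int(match.group(2))

print(split_year_range("1930 - 1934"))
print(split_pixel_dimensions("1024x561"))
```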

dateLocation is used to distinguish between different artists with the same
name by indicating their nationality and years of birth and death, e.g.

  <artist>
    <Person>
      <name>Edward Hopper</name>
      <sortName>Hopper, Edward</sortName>
      <nationality>American</nationality>
      <dateLocation>American; 1882-1967</dateLocation>
    </Person>
  </artist>   

So this could be rewritten as follows, removing the duplicated information:

  <artist rdf:parseType="Resource">
    <name rdf:parseType="Resource">
      <firstName>Edward</firstName>
      <familyName>Hopper</familyName>
    </name>
    <biographic rdf:parseType="Resource">
      <yearStart>1882</yearStart>
      <yearEnd>1967</yearEnd>
      <nationality>American</nationality>
    </biographic>
  </artist>
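
A dateLocation literal could be taken apart along the same lines. This is only a sketch: the function name is hypothetical, and it assumes every value has the shape "nationality; birth-death" as in the Hopper example, which will not hold for living artists or uncertain dates.

```python
import re

def split_date_location(text):
    """Split 'American; 1882-1967' into nationality, yearStart and yearEnd."""
    match = re.fullmatch(r"\s*([^;]+?)\s*;\s*(\d{4})\s*-\s*(\d{4})\s*", text)
    if match is None:
        raise ValueError(f"unexpected dateLocation: {text!r}")
    return {
        "nationality": match.group(1),
        "yearStart": int(match.group(2)),
        "yearEnd": int(match.group(3)),
    }

print(split_date_location("American; 1882-1967"))
```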

4. The schema places domain constraints on a class, but rdfs:domain can only
be used on properties:

<rdfs:Class rdf:about = "&amns;Period">
  <rdfs:label>Period</rdfs:label>
  <rdfs:domain rdf:resource = "&amns;style" />
</rdfs:Class>

this should be the other way round e.g.

<rdf:Property rdf:about = "&amns;style">  <!-- STG -->
  <rdfs:label>style</rdfs:label>
  <rdfs:domain rdf:resource = "&amns;Period"/>
</rdf:Property>
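
Mistakes of this kind can be found mechanically. The sketch below lists every rdfs:Class element that carries an rdfs:domain child. The schema fragment is a cut-down stand-in, and http://example.org/amico# is a placeholder for whatever the &amns; entity really expands to.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Stand-in schema; the example.org namespace is an assumption.
SCHEMA = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="http://example.org/amico#Period">
    <rdfs:label>Period</rdfs:label>
    <rdfs:domain rdf:resource="http://example.org/amico#style"/>
  </rdfs:Class>
  <rdf:Property rdf:about="http://example.org/amico#name">
    <rdfs:domain rdf:resource="http://example.org/amico#Person"/>
  </rdf:Property>
</rdf:RDF>
"""

def classes_with_domain(schema_xml):
    """Return the rdf:about URI of each rdfs:Class that has an rdfs:domain."""
    root = ET.fromstring(schema_xml)
    return [
        cls.get(f"{{{RDF}}}about")
        for cls in root.iter(f"{{{RDFS}}}Class")
        if cls.find(f"{{{RDFS}}}domain") is not None
    ]

print(classes_with_domain(SCHEMA))
```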

5. There is an error in the schema where they try to use rdfs:Book, and of
course no such class exists in the RDFS namespace e.g.

<rdfs:Book rdf:about = "&amns;Books">
  <rdfs:label>Books</rdfs:label>
  <rdfs:subClassOf rdf:resource = "&amns;PhysicalObject" />
</rdfs:Book>

this should be

<rdfs:Class rdf:about = "&amns;Book">
  <rdfs:label>Book</rdfs:label>
  <rdfs:subClassOf rdf:resource = "&amns;PhysicalObject" />
</rdfs:Class>
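
A simple lint pass can catch element names that do not exist in the RDFS namespace. The following sketch compares element names against the terms defined by RDF Schema; the sample fragment and the http://example.org/amico# namespace are placeholders, not the real AMICO schema.

```python
import xml.etree.ElementTree as ET

RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Terms defined in the RDF Schema vocabulary.
KNOWN_RDFS = {
    "Resource", "Class", "Literal", "Datatype", "Container",
    "ContainerMembershipProperty", "subClassOf", "subPropertyOf",
    "domain", "range", "label", "comment", "member", "seeAlso",
    "isDefinedBy",
}

# Stand-in fragment; the example.org namespace is an assumption.
FRAGMENT = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Book rdf:about="http://example.org/amico#Books">
    <rdfs:label>Books</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://example.org/amico#PhysicalObject"/>
  </rdfs:Book>
</rdf:RDF>
"""

def unknown_rdfs_elements(xml_text):
    """Return sorted local names used in the RDFS namespace but not defined there."""
    prefix = "{" + RDFS + "}"
    names = {
        el.tag[len(prefix):]
        for el in ET.fromstring(xml_text).iter()
        if el.tag.startswith(prefix)
    }
    return sorted(names - KNOWN_RDFS)

print(unknown_rdfs_elements(FRAGMENT))
```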

6. The property "name" is declared to have a domain of Person, but in the
instance data "name" is applied to artifacts also e.g. 

<rdf:Property rdf:about = "&amns;name">
  <rdfs:label>Name</rdfs:label>
  <rdfs:domain rdf:resource = "&amns;Person" />
</rdf:Property>

from sample.rdf:
  <owner>
    <Organization>
      <name>The Art Institute of Chicago</name>
      <place>Chicago, Illinois, USA</place>
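
Strictly speaking, RDFS treats rdfs:domain as an inference rule rather than a constraint, so a reasoner would conclude that the organization is also a Person instead of rejecting the data. The sketch below flags such cases. Triples are modelled as plain tuples, the node labels are invented, and subclass relationships are ignored for brevity.

```python
RDF_TYPE = "rdf:type"

# Invented triples loosely mirroring the sample.rdf fragment above.
TRIPLES = [
    ("_:owner", RDF_TYPE, "Organization"),
    ("_:owner", "name", "The Art Institute of Chicago"),
    ("_:artist", RDF_TYPE, "Person"),
    ("_:artist", "name", "Edward Hopper"),
]

# Declared domains, as in the schema excerpt for "name".
DOMAINS = {"name": "Person"}

def domain_mismatches(triples, domains):
    """Return (subject, property, expected_class) for each typed subject
    that uses a property whose declared domain is not among its types."""
    types = {}
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            types.setdefault(subject, set()).add(obj)
    mismatches = []
    for subject, predicate, _ in triples:
        expected = domains.get(predicate)
        if expected and subject in types and expected not in types[subject]:
            mismatches.append((subject, predicate, expected))
    return mismatches

print(domain_mismatches(TRIPLES, DOMAINS))
```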

7. The instance data makes use of several classes and properties that are
not defined in the schema e.g. haspart, Work, thumbnailImage, Title,
variationTitle

WMAA.96.209a-uuu.rdf

<haspart>
  <Work rdf:about ...

<thumbnailImage rdf:resource =
"http://www.w3.org/2002/04/12-amico/orig/thumbs/WMAA.209ajpg" />

WMAA.96.209cc.rdf
...
<Books rdf:about = "http://www.w3.org/2002/04/12-amico/data/WMAA.96.209cc">
  <Title>Art....

MAA.70.256.rdf
...
 <variationTitle>Drawing for painting Nighthawks</variationTitle>
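
Undefined terms like these can be listed by comparing the element names used in an instance file with the names the schema defines. A minimal sketch, assuming the vocabulary lives in a single namespace (http://example.org/amico# is a placeholder) and that the set of defined names has already been extracted from the schema; the instance fragment and the defined set are both invented for illustration.

```python
import xml.etree.ElementTree as ET

VOCAB_NS = "http://example.org/amico#"  # placeholder namespace

# Invented instance fragment using a mix of defined and undefined terms.
INSTANCE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://example.org/amico#">
  <Books rdf:about="http://www.w3.org/2002/04/12-amico/data/WMAA.96.209cc">
    <Title>Example title</Title>
    <haspart>
      <Work rdf:about="http://example.org/some-part"/>
    </haspart>
  </Books>
</rdf:RDF>
"""

def undefined_terms(instance_xml, vocab_ns, defined):
    """Return sorted local names used in vocab_ns that are not in 'defined'."""
    prefix = "{" + vocab_ns + "}"
    used = {
        el.tag[len(prefix):]
        for el in ET.fromstring(instance_xml).iter()
        if el.tag.startswith(prefix)
    }
    return sorted(used - set(defined))

# Assumed extract of the schema's defined names.
print(undefined_terms(INSTANCE, VOCAB_NS, {"Books", "title"}))
```

Note that the comparison is case-sensitive, so Title is reported even though title is defined.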

8. The data is not fully normalised as some data is repeated in many files
e.g. 

  <Person rdf:about = "http://www.amico.org/laf/entities/hopper,_edward">
      <name>Edward Hopper</name>
      <sortName>Hopper, Edward</sortName>
      <nationality>American</nationality>
      ...

Why not just refer to the person by URI, e.g. <artist rdf:resource="..."/>,
and give the full definition once?

Also, consider the following:

<Period>
  <rdf:value>North and Central America, North America, United States</rdf:value>
</Period>

And elsewhere:

<Period rdf:about =
"http://www.amico.org/subject/terms#North_and_Central_America,_North_America,_United_States">
  <description>North and Central America, North America, United States</description>
</Period>

9. Why does a 'Period' have geographic information?

10. Due to the striped serialisation of RDF/XML, it is common to see the
typed node construction used when grouping properties together. However, it
is not clear that it is always appropriate to use typed nodes in this way.
Arguably, if a property can only have a single type as its range, then
explicitly typing blank nodes that are the object of the property is
unnecessary because the type can be inferred from the property. This allows
the serialisation to be simplified, as can the vocabulary namespace, both
of which reduce the chance of user error.

Note also that the description logic community takes the approach of
inferring type dynamically rather than making types explicit, so omitting
type information supports this kind of approach. For an example of this see
OilEd.
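
The inference in question is the standard rdfs:range rule: if a property p has range C and a triple (s, p, o) exists, then o has type C. A minimal sketch over triples held as tuples follows. The range declarations are assumptions, since the schema excerpts here do not show them, and the node labels are invented.

```python
RDF_TYPE = "rdf:type"

# Assumed range declarations; not taken from the actual schema.
RANGES = {"artist": "Person", "creationDate": "Date"}

# Untyped blank nodes as objects, as rdf:parseType="Resource" would produce.
TRIPLES = [
    ("_:work", "artist", "_:b1"),
    ("_:b1", "name", "Edward Hopper"),
    ("_:work", "creationDate", "_:b2"),
    ("_:b2", "start", "19420101"),
]

def infer_types_from_range(triples, ranges):
    """Apply the rdfs:range rule and return the inferred rdf:type triples."""
    return {
        (obj, RDF_TYPE, ranges[predicate])
        for _, predicate, obj in triples
        if predicate in ranges
    }

for triple in sorted(infer_types_from_range(TRIPLES, RANGES)):
    print(triple)
```

With the range declared once in the schema, the instance data no longer needs to repeat the type on every node.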

Consider 

  <format>
    <Media>
      <encoding>TIFF</encoding>
      <dimensions>1024x561</dimensions>
      <fileSize>1.69 MB</fileSize>
      <compression>none</compression>
    </Media>
  </format>

does this mean there can be other values for format apart from Media? If the
Media / format distinction is unnecessary, this could be rewritten as
follows:

  <mediaFormat rdf:parseType="Resource">
      <encoding>TIFF</encoding>
      <dimensions>1024x561</dimensions>
      <fileSize>1.69 MB</fileSize>
      <compression>none</compression>
  </mediaFormat>

Some other examples include

  <artist>
    <Person>
      <name>Edward Hopper</name>
      <sortName>Hopper, Edward</sortName>
      <nationality>American</nationality>
      <dateLocation>American; 1882-1967</dateLocation>
    </Person>
  </artist>

can there be an artist who is not a Person?

  <style>
    <Period>
      <description>North and Central America, North America, United States</description>
    </Period>
  </style>

can there be a style which is not Period?

  <creationDate>
    <Date>
      <start>19420101</start>
      <end>19421231</end>
    </Date>
  </creationDate>

can there be a creationDate which is not a Date?

  <catalogedBy> 
    <Person>
      <name>Gregory Tschann</name>
    </Person>
  </catalogedBy>

can something be catalogedBy something other than a Person?

However, there are some occasions where the choice to use this is valid e.g.

  <owner>
    <Organization>
      <name>The Art Institute of Chicago</name>
      <place>Chicago, Illinois, USA</place>
      <accessionNumber rdf:resource = "1942.51" />
      <credit>The Art Institute of Chicago, Friends of American Art Collection</credit>
    </Organization>
  </owner>

as presumably owner could be a Person as opposed to an Organization?

11. There is some unnecessary duplication of information because date fields
have not been broken up. Often a decision needs to be made about the
required granularity of fields. However, in some cases information has been
duplicated so that data can be represented at different levels of
granularity e.g.

<Validation rdf:about = "amico_AIC_.E13868.TIF">
   <describes rdf:resource = "#AIC_.E13868.TIF" />
   <validationDate>20000609</validationDate>
   <validationVersion>1.2</validationVersion>
   <libraryYear>2000</libraryYear>
</Validation>

This could be replaced with

<Validation rdf:about = "amico_AIC_.E13868.TIF">
   <describes rdf:resource = "#AIC_.E13868.TIF" />
   <validationDate rdf:parseType="Resource">
      <year>2000</year>
      <month>06</month>
      <day>09</day>
   </validationDate>
   <validationVersion>1.2</validationVersion>
</Validation>
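
Since the year is already contained in the compact date, libraryYear can be derived rather than stored, on the assumption that it always matches the validation year as it does in this example. A small sketch using Python's datetime module; the function name is hypothetical and the YYYYMMDD layout is assumed from the example value.

```python
from datetime import datetime

def split_compact_date(text):
    """Split a 'YYYYMMDD' literal into year, month and day components."""
    parsed = datetime.strptime(text, "%Y%m%d").date()
    return {"year": parsed.year, "month": parsed.month, "day": parsed.day}

parts = split_compact_date("20000609")
print(parts)
print(parts["year"])  # recovers the value stored separately as libraryYear
```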

 

Received on Thursday, 14 August 2003 11:52:04 UTC