Re: Problems with the AMICO rdf files / schema

Are we anticipating that the AMICO data will be available in other 
formats, or are we constrained to make the best use we can of the data 
that is available?   As a rule I think most users fall into the latter 
category although that may not apply in this case. 

I mapped the schema as defined in the amico.rdfs file and put both a PDF 
and an editable .dia file into the CVS repository that show all of the 
relationships that are specified there.   I'd be interested in 
comments/updates to that diagram if anyone knows more about these 
files.  If we modify the rdfs file, or the 'amico2rdf.pl' script and 
rerun it to address your concerns I'd like to try to keep the diagram in 
synch with those changes.

Cheers,
-kls

Gilbert, John wrote:

>Problems with the AMICO rdf files / schema
>
>1. Basic XML / RDF errors spotted by Andy Seaborne
>
>MAA.70.??? needed:
>A couple of XML fixes, i.e. illegal XML tags, and &#233 should have been
>&#233; (é)
>
>AIC* needed something but I can't recall what.
>
>In sample.rdf I think there was an inconsistency in the use of rdf:about versus
>rdf:ID.  I just fed the file through ARP (Jena's RDF/XML parser - the
>command line program is "jena.rdfparse") and fixed the warnings about
>relative URIs.
>
>The other thing needed for all the files is to set the XML base for the
>files, as otherwise relative URIs are resolved against a file: URI.
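
A minimal sketch of how the missing XML base could be spotted, using only the
Python standard library. The namespace http://example.org/amico# and the
function name relative_uri_report are assumptions made for illustration, and
the "no colon means relative" test is deliberately crude.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def relative_uri_report(rdf_xml):
    """Return (has_xml_base, relative_refs) for an RDF/XML string."""
    root = ET.fromstring(rdf_xml)
    has_base = root.get("{%s}base" % XML_NS) is not None
    relative = []
    for elem in root.iter():
        for attr in ("about", "resource"):
            value = elem.get("{%s}%s" % (RDF_NS, attr))
            # Crude check: a value without a scheme separator is relative.
            if value is not None and ":" not in value:
                relative.append(value)
    return has_base, relative


# Hypothetical namespace; the identifiers are taken from the quoted message.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:a="http://example.org/amico#">
  <a:Validation rdf:about="amico_AIC_.E13868.TIF">
    <a:describes rdf:resource="#AIC_.E13868.TIF"/>
  </a:Validation>
</rdf:RDF>"""

print(relative_uri_report(SAMPLE))
```

Any file reported with has_xml_base False and a non-empty list is one where
the resulting URIs depend on where the file happens to be stored.
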
>
>2. Some usages of plurals in the class names are inconsistent
>
>Books is a subclass of PhysicalObject and from the instance data it looks
>like it can only contain a single item, so surely it should be Book. Ditto for
>Photographs, Paintings, Drawingsandwatercolors.
>
>3. Some properties are composite literals which should be avoided e.g.
>
><creationDate>1930 - 1934</creationDate>
>
><measurementText>33 1/8 x 60 in. (84.1 x 152.4 cm)</measurementText>
>
> <dimensions>1024x561</dimensions>
>
> <fileSize>1.69 MB</fileSize>
>
> <dateLocation>American; 1882-1967</dateLocation>
>
>Some proposals for better ways of doing this:
>
><creationDate rdf:parseType="Resource">
>  <yearStart>1930</yearStart>
>  <yearEnd>1934</yearEnd>
></creationDate>
>
><measurement rdf:parseType="Resource">
>   <widthInch>33 1/8</widthInch>
>   <heightInch>60</heightInch>
>   <widthCm>84.1</widthCm>
>   <heightCm>152.4</heightCm>
></measurement>
>
>Alternatively it should be possible to encode just the centimetre values
>(widthCm and heightCm) and convert them to inches, or vice versa.
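
The conversion itself is trivial; a sketch, with function names of my own
choosing:

```python
CM_PER_INCH = 2.54  # exact by definition of the international inch


def cm_to_inches(cm):
    """Convert centimetres to inches."""
    return cm / CM_PER_INCH


def inches_to_cm(inches):
    """Convert inches to centimetres."""
    return inches * CM_PER_INCH


# 84.1 cm is the value from the quoted measurementText.
print(round(cm_to_inches(84.1), 2))
```

Note that 84.1 cm comes out at about 33.11 in., not exactly the 33 1/8
(33.125) in the literal, so the two stored figures are each rounded and a
round trip will not reproduce the original text exactly.
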
>
><dimensions rdf:parseType="Resource">
>  <widthPixel>1024</widthPixel>
>  <heightPixel>561</heightPixel>
></dimensions>
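
If amico2rdf.pl were changed to emit these structured forms, the composite
literals would have to be split somewhere. A rough sketch of that splitting
in Python (the function names are hypothetical, and only the two literal
shapes quoted above are handled):

```python
import re


def split_year_range(text):
    """'1930 - 1934' -> (1930, 1934); a single year gives start == end."""
    years = [int(y) for y in re.findall(r"\d{4}", text)]
    if not years:
        raise ValueError("no four-digit year in %r" % text)
    return years[0], years[-1]


def split_pixel_dimensions(text):
    """'1024x561' -> (1024, 561), read as (widthPixel, heightPixel)."""
    match = re.fullmatch(r"\s*(\d+)\s*[xX]\s*(\d+)\s*", text)
    if match is None:
        raise ValueError("unrecognised dimensions %r" % text)
    return int(match.group(1)), int(match.group(2))


print(split_year_range("1930 - 1934"))
print(split_pixel_dimensions("1024x561"))
```

Anything that does not match raises an error rather than guessing, so odd
values in the source data would show up when the script is rerun.
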
>
>dateLocation is used to distinguish between artists with the same name by
>indicating their nationality and year of birth and death e.g. 
>
>  <artist>
>    <Person>
>      <name>Edward Hopper</name>
>      <sortName>Hopper, Edward</sortName>
>      <nationality>American</nationality>
>      <dateLocation>American; 1882-1967</dateLocation>
>    </Person>
>  </artist>   
>
>so this could be rewritten as follows, removing duplicated information
>
>  <artist rdf:parseType="Resource">
>      <name rdf:parseType="Resource">
>            <firstName>Edward</firstName> <familyName>Hopper</familyName>
>      </name>
>      <biographic rdf:parseType="Resource">
>          <yearStart>1882</yearStart>
>          <yearEnd>1967</yearEnd>
>          <nationality>American</nationality>
>       </biographic>
>  </artist>   
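
A sketch of splitting the dateLocation literal into the proposed fields,
assuming every value has the "Nationality; YYYY-YYYY" shape seen in the
example (I have not checked this against the full data set; the function name
is made up):

```python
import re

DATE_LOCATION = re.compile(
    r"\s*(?P<nationality>[^;]+?)\s*;\s*(?P<start>\d{4})\s*-\s*(?P<end>\d{4})\s*"
)


def split_date_location(text):
    """'American; 1882-1967' -> nationality, yearStart, yearEnd."""
    match = DATE_LOCATION.fullmatch(text)
    if match is None:
        raise ValueError("unrecognised dateLocation %r" % text)
    return {
        "nationality": match.group("nationality"),
        "yearStart": int(match.group("start")),
        "yearEnd": int(match.group("end")),
    }


print(split_date_location("American; 1882-1967"))
```

Living artists or uncertain dates would presumably not fit this pattern and
would raise an error here, which is probably what we want while exploring.
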
>
>4. The schema places domain constraints on a class, but they can only be used
>on properties:
>
><rdfs:Class rdf:about = "&amns;Period">
>  <rdfs:label>Period</rdfs:label>
>  <rdfs:domain rdf:resource = "&amns;style" />
></rdfs:Class>
>
>this should be the other way round e.g.
>
><rdf:Property rdf:about = "&amns;style">  <!-- STG -->
>  <rdfs:label>style</rdfs:label>
>  <rdfs:domain rdf:resource = "&amns;Period"/>
></rdf:Property>
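
To find any other places where this happens, something like the following
could be run over the schema. It is only a sketch: the &amns; entity is
replaced by a hypothetical namespace so that the standard-library parser does
not need the DOCTYPE, and misplaced_constraints is my own name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def misplaced_constraints(schema_xml):
    """List (class URI, constraint) pairs where rdfs:domain or rdfs:range
    appears inside an rdfs:Class element instead of an rdf:Property."""
    root = ET.fromstring(schema_xml)
    constraint_tags = ("{%s}domain" % RDFS, "{%s}range" % RDFS)
    problems = []
    for cls in root.iter("{%s}Class" % RDFS):
        for child in cls:
            if child.tag in constraint_tags:
                local_name = child.tag.split("}")[1]
                problems.append((cls.get("{%s}about" % RDF), local_name))
    return problems


# http://example.org/amico# is a stand-in for the &amns; entity.
SCHEMA = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="http://example.org/amico#Period">
    <rdfs:label>Period</rdfs:label>
    <rdfs:domain rdf:resource="http://example.org/amico#style"/>
  </rdfs:Class>
</rdf:RDF>"""

print(misplaced_constraints(SCHEMA))
```
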
>
>5. There is an error in the schema where they try to use rdfs:Book, and of
>course no such class exists in the RDFS namespace e.g. 
>
><rdfs:Book rdf:about = "&amns;Books">
>  <rdfs:label>Books</rdfs:label>
>  <rdfs:subClassOf rdf:resource = "&amns;PhysicalObject" /> </rdfs:Book>
>
>this should be
>
><rdfs:Class rdf:about = "&amns;Book">
>  <rdfs:label>Book</rdfs:label>
>  <rdfs:subClassOf rdf:resource = "&amns;PhysicalObject" /> </rdfs:Class>
>
>6. The property "name" is declared to have a domain of Person, but in the
>instance data "name" is also applied to non-Person resources e.g. 
>
><rdf:Property rdf:about = "&amns;name">
>  <rdfs:label>Name</rdfs:label>
>  <rdfs:domain rdf:resource = "&amns;Person" />
></rdf:Property>
>
>from sample.rdf:
>  <owner>
>    <Organization>
>      <name>The Art Institute of Chicago</name>
>      <place>Chicago, Illinois, USA</place>
>
>7. The instance data makes use of several classes and properties that are
>not defined in the schema e.g. haspart, Work, thumbnailImage, Title,
>variationTitle
>
>WMAA.96.209a-uuu.rdf
>
><haspart>
>  <Work rdf:about ...
>
><thumbnailImage rdf:resource =
>"http://www.w3.org/2002/04/12-amico/orig/thumbs/WMAA.209ajpg" />
>
>WMAA.96.209cc.rdf
>...
><Books rdf:about = "http://www.w3.org/2002/04/12-amico/data/WMAA.96.209cc">
>  <Title>Art....
>
>MAA.70.256.rdf
>...
> <variationTitle>Drawing for painting Nighthawks</variationTitle>
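
A rough way to list every such term rather than spotting them by eye: collect
what the schema defines, collect what an instance file uses, and take the
difference. This is a sketch under assumptions: the namespace is a
hypothetical stand-in for &amns;, the sample values are made up, and only
element names are checked (not property attributes).

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
AMNS = "http://example.org/amico#"  # hypothetical stand-in for &amns;


def defined_terms(schema_xml):
    """URIs of the classes and properties declared at the schema top level."""
    root = ET.fromstring(schema_xml)
    wanted = ("{%s}Class" % RDFS, "{%s}Property" % RDF)
    terms = set()
    for elem in root:
        about = elem.get("{%s}about" % RDF)
        if elem.tag in wanted and about:
            terms.add(about)
    return terms


def used_terms(instance_xml, ns=AMNS):
    """URIs of all element names in the given namespace."""
    root = ET.fromstring(instance_xml)
    prefix = "{%s}" % ns
    return {ns + e.tag[len(prefix):] for e in root.iter()
            if e.tag.startswith(prefix)}


def undefined_terms(schema_xml, instance_xml):
    return sorted(used_terms(instance_xml) - defined_terms(schema_xml))


SCHEMA = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="http://example.org/amico#Books"/>
  <rdf:Property rdf:about="http://example.org/amico#name"/>
</rdf:RDF>"""

INSTANCE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://example.org/amico#">
  <Books rdf:about="http://www.w3.org/2002/04/12-amico/data/WMAA.96.209cc">
    <Title>illustrative title</Title>
    <name>illustrative name</name>
  </Books>
</rdf:RDF>"""

print(undefined_terms(SCHEMA, INSTANCE))
```
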
>
>8. The data is not fully normalised as some data is repeated in many files
>e.g. 
>
>  <Person rdf:about = "http://www.amico.org/laf/entities/hopper,_edward">
>      <name>Edward Hopper</name>
>      <sortName>Hopper, Edward</sortName>
>      <nationality>American</nationality>
>      ...
>
>Why not just <artist rdf:resource="..."/> with the Person defined once?
>
>Also, consider the following:
>
><Period>
> <rdf:value>North and Central America, North America, United
>States</rdf:value> </Period>
>
>And elsewhere:
>
><Period rdf:about =
>"http://www.amico.org/subject/terms#North_and_Central_America,_North_America
>,_United_States">
>  <description>North and Central America, North America, United
>States</description> </Period>
>
>9. Why does a 'Period' have geographic information?
>
>10. Due to the striped serialisation of RDF/XML, it is common to see the use
>of the typed node construction when grouping properties together. However it
>is not clear if it is always appropriate to use typed nodes in this way.
>Arguably, if a property can only have a single type as its range, then
>explicitly typing blank nodes that are the object of the property is
>unnecessary because the type can be inferred from the property. This allows
>the serialisation to be simplified, as well as the vocabulary namespace, both
>of which reduce the chance of user error. 
>
>Note also that the description logic community takes the approach of inferring
>types dynamically rather than making them explicit, so omitting type
>information supports this kind of approach. For an example of this see
>OilEd. 
>
>Consider 
>
>  <format>
>    <Media>
>      <encoding>TIFF</encoding>
>      <dimensions>1024x561</dimensions>
>      <fileSize>1.69 MB</fileSize>
>      <compression>none</compression>
>    </Media>
>  </format>
>
>does this mean there can be other values for format apart from Media? If the
>Media / format distinction is unnecessary, this could be rewritten as
>follows:
>
>  <mediaFormat rdf:parseType="Resource">
>      <encoding>TIFF</encoding>
>      <dimensions>1024x561</dimensions>
>      <fileSize>1.69 MB</fileSize>
>      <compression>none</compression>
>  </mediaFormat>
>
>Some other examples include
>
>  <artist>
>    <Person>
>      <name>Edward Hopper</name>
>      <sortName>Hopper, Edward</sortName>
>      <nationality>American</nationality>
>      <dateLocation>American; 1882-1967</dateLocation>
>    </Person>
>  </artist>
>
>can there be an artist who is not a Person?
>
>  <style>
>    <Period>
>      <description>North and Central America, North America, United
>States</description>
>    </Period>
>  </style>
>
>can there be a style which is not Period?
>
>  <creationDate>
>    <Date>
>      <start>19420101</start>
>      <end>19421231</end>
>    </Date>
>  </creationDate>
>
>can there be a creationDate which is not a Date?
>
>  <catalogedBy> 
>    <Person>
>      <name>Gregory Tschann</name>
>    </Person>
>  </catalogedBy>
>
>can something be catalogedBy something other than a Person?
>
>However, there are some occasions where the choice to use this is valid e.g.
>
>  <owner>
>    <Organization>
>      <name>The Art Institute of Chicago</name>
>      <place>Chicago, Illinois, USA</place>
>      <accessionNumber rdf:resource = "1942.51" />
>      <credit>The Art Institute of Chicago, Friends of American Art
>Collection</credit>
>    </Organization>
>  </owner>
>
>as presumably owner could be a Person as opposed to an Organization?
>
>11. There is some unnecessary duplication of information because date fields
>have not been broken up. Often a decision needs to be made about the
>required granularity of fields. However in some cases information has been
>duplicated so that data can be represented at different levels of
>granularity e.g.
>
><Validation rdf:about = "amico_AIC_.E13868.TIF">
>   <describes rdf:resource = "#AIC_.E13868.TIF" />
>   <validationDate>20000609</validationDate>
>   <validationVersion>1.2</validationVersion>
>   <libraryYear>2000</libraryYear>
></Validation>
>
>This could be replaced with
>
><Validation rdf:about = "amico_AIC_.E13868.TIF">
>   <describes rdf:resource = "#AIC_.E13868.TIF" />
>    <validationDate rdf:parseType="Resource">
>      <year>2000</year>
>      <month>06</month>
>      <day>09</day>
>   </validationDate>
>   <validationVersion>1.2</validationVersion>
></Validation>
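
A sketch of the corresponding split, assuming validationDate is always an
eight-digit YYYYMMDD string as in the example, and that libraryYear really is
always the year of validationDate (which the proposal above takes for
granted; I have not verified it). The function name is mine.

```python
from datetime import datetime


def split_compact_date(text):
    """'20000609' -> {'year': 2000, 'month': 6, 'day': 9}.

    strptime also rejects impossible dates such as month 13.
    """
    parsed = datetime.strptime(text, "%Y%m%d").date()
    return {"year": parsed.year, "month": parsed.month, "day": parsed.day}


parts = split_compact_date("20000609")
print(parts)
# The separate libraryYear value could then be derived when needed.
print(parts["year"])
```
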
>


-- 
========================================================
   Kevin Smathers                kevin.smathers@hp.com    
   Hewlett-Packard               kevin@ank.com            
   Palo Alto Research Lab                                 
   1501 Page Mill Rd.            650-857-4477 work        
   M/S 1135                      650-852-8186 fax         
   Palo Alto, CA 94304           510-247-1031 home        
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");

Received on Thursday, 14 August 2003 12:29:15 UTC