- From: Gilbert, John <John.Gilbert@hp.com>
- Date: Thu, 14 Aug 2003 16:51:22 +0100
- To: www-rdf-dspace@w3.org
Problems with the AMICO RDF files / schema

1. Basic XML / RDF errors spotted by Andy Seaborne

MAA.70.??? needed a couple of XML fixes, i.e. illegal XML tags, and an é written in a form that XML does not accept had to be re-encoded.

AIC* needed something, but I can't recall what.

In sample.rdf I think there was an inconsistency in the use of rdf:about versus rdf:ID.

I just fed the file through ARP (Jena's RDF/XML parser; the command-line program is "jena.rdfparse") and fixed the warnings about relative URIs. The other thing needed for all the files is to set the XML base (xml:base), as otherwise URIs are resolved relative to each file's file: location.

2. Some usages of plurals in the class names are inconsistent

Books is a subclass of PhysicalObject, and from the instance data it looks like it can only contain a single item, so surely it should be Book. Ditto for Photographs, Paintings and Drawingsandwatercolors.

3. Some properties are composite literals, which should be avoided, e.g.

    <creationDate>1930 - 1934</creationDate>
    <measurementText>33 1/8 x 60 in. (84.1 x 152.4 cm)</measurementText>
    <dimensions>1024x561</dimensions>
    <fileSize>1.69 MB</fileSize>
    <dateLocation>American; 1882-1967</dateLocation>

Some proposals for better ways of doing this:

    <creationDate rdf:parseType="Resource">
      <yearStart>1930</yearStart>
      <yearEnd>1934</yearEnd>
    </creationDate>

    <measurement rdf:parseType="Resource">
      <widthInch>33 1/8</widthInch>
      <heightInch>60</heightInch>
      <widthCm>84.1</widthCm>
      <heightCm>152.4</heightCm>
    </measurement>

Alternatively, it should be possible to encode the measurements in only one unit (e.g. widthCm and heightCm) and convert them to inches when needed, or vice versa.

    <dimensions rdf:parseType="Resource">
      <widthPixel>1024</widthPixel>
      <heightPixel>561</heightPixel>
    </dimensions>

dateLocation is used to distinguish between artists who share a name, by indicating their nationality and years of birth and death, e.g.
    <artist>
      <Person>
        <name>Edward Hopper</name>
        <sortName>Hopper, Edward</sortName>
        <nationality>American</nationality>
        <dateLocation>American; 1882-1967</dateLocation>
      </Person>
    </artist>

so this could be rewritten as follows, removing the duplicated information:

    <artist rdf:parseType="Resource">
      <name rdf:parseType="Resource">
        <firstName>Edward</firstName>
        <familyName>Hopper</familyName>
      </name>
      <biographic rdf:parseType="Resource">
        <yearStart>1882</yearStart>
        <yearEnd>1967</yearEnd>
        <nationality>American</nationality>
      </biographic>
    </artist>

4. The schema places domain constraints on a class, but rdfs:domain can only be used on properties:

    <rdfs:Class rdf:about = "&amns;Period">
      <rdfs:label>Period</rdfs:label>
      <rdfs:domain rdf:resource = "&amns;style" />
    </rdfs:Class>

This should be the other way round, with the constraint on the property. Since in the instance data a Period is the value of style (e.g. <style><Period>...</Period></style>), the constraint is presumably rdfs:range rather than rdfs:domain:

    <rdf:Property rdf:about = "&amns;style"> <!-- STG -->
      <rdfs:label>style</rdfs:label>
      <rdfs:range rdf:resource = "&amns;Period"/>
    </rdf:Property>

5. There is an error in the schema where they try to use rdfs:Book, and of course no such class exists in the RDFS namespace, e.g.

    <rdfs:Book rdf:about = "&amns;Books">
      <rdfs:label>Books</rdfs:label>
      <rdfs:subClassOf rdf:resource = "&amns;PhysicalObject" />
    </rdfs:Book>

This should be:

    <rdfs:Class rdf:about = "&amns;Book">
      <rdfs:label>Book</rdfs:label>
      <rdfs:subClassOf rdf:resource = "&amns;PhysicalObject" />
    </rdfs:Class>

6. The property "name" is declared to have a domain of Person, but in the instance data "name" is also applied to resources that are not people, such as artifacts and organizations. Under RDFS semantics this does not make the data invalid; instead, a reasoner will infer that every such resource is a Person. The schema says:

    <rdf:Property rdf:about = "&amns;name">
      <rdfs:label>Name</rdfs:label>
      <rdfs:domain rdf:resource = "&amns;Person" />
    </rdf:Property>

whereas sample.rdf has:

    <owner>
      <Organization>
        <name>The Art Institute of Chicago</name>
        <place>Chicago, Illinois, USA</place>
        ...

7. The instance data makes use of several classes and properties that are not defined in the schema, e.g. haspart, Work, thumbnailImage, Title, variationTitle.

WMAA.96.209a-uuu.rdf

    <haspart>
      <Work rdf:about ...
        <thumbnailImage rdf:resource = "http://www.w3.org/2002/04/12-amico/orig/thumbs/WMAA.209ajpg" />

WMAA.96.209cc.rdf

    ...
    <Books rdf:about = "http://www.w3.org/2002/04/12-amico/data/WMAA.96.209cc">
      <Title>Art....

MAA.70.256.rdf

    ...
    <variationTitle>Drawing for painting Nighthawks</variationTitle>

8. The data is not fully normalised, as some data is repeated in many files, e.g.

    <Person rdf:about = "http://www.amico.org/laf/entities/hopper,_edward">
      <name>Edward Hopper</name>
      <sortName>Hopper, Edward</sortName>
      <nationality>American</nationality>
      ...

Why not define the Person once and just refer to it by URI, e.g. <artist rdf:resource="http://www.amico.org/laf/entities/hopper,_edward"/>? (rdf:resource is only allowed on a property element, so <Person rdf:resource="..."/> would not be valid RDF/XML.)

Also, consider the following:

    <Period>
      <rdf:value>North and Central America, North America, United States</rdf:value>
    </Period>

and elsewhere:

    <Period rdf:about = "http://www.amico.org/subject/terms#North_and_Central_America,_North_America,_United_States">
      <description>North and Central America, North America, United States</description>
    </Period>

The same term is carried once as a blank node with rdf:value and once as a named resource with description; the blank node could simply reference the named resource.

9. Why does a 'Period' have geographic information?

10. Due to the striped serialisation of RDF/XML, it is common to see typed node elements used when grouping properties together. However, it is not clear that it is always appropriate to use typed nodes in this way. Arguably, if a property can only have a single type as its range, then explicitly typing the blank nodes that are the objects of the property is unnecessary, because the type can be inferred from the property. This allows both the serialisation and the vocabulary to be simplified, which reduces the chance of user error. Note also that the description logic community takes the approach of inferring types dynamically rather than making them explicit, so omitting type information supports this kind of approach. For an example of this see OilEd.
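The range-based inference argued for above corresponds to RDFS entailment rule rdfs3: if p has rdfs:range C and the triple (s p o) holds, then o has rdf:type C. A minimal Python sketch over plain string triples follows. It assumes a hypothetical declaration that amns:artist has range amns:Person (this message does not say whether the AMICO schema declares any ranges) and shows that an untyped blank node under artist would still be recognised as a Person:

```python
"""Sketch of RDFS range-based type inference (entailment rule rdfs3).

Assumptions: triples are (subject, predicate, object) string tuples,
prefixed names stand in for full URIs, and the amns:artist range used in
the demo is hypothetical, not taken from the AMICO schema.
"""

RDF_TYPE = "rdf:type"
RDFS_RANGE = "rdfs:range"


def infer_types_from_range(triples):
    """Return rdf:type triples implied by rdfs:range that are not already present.

    rdfs3: if (p rdfs:range C) and (s p o) then (o rdf:type C).
    Literal objects are not filtered out; a real reasoner treats them separately.
    """
    # Collect the declared range classes for each property.
    ranges = {}
    for s, p, o in triples:
        if p == RDFS_RANGE:
            ranges.setdefault(s, set()).add(o)

    # Type every object of a property that has a declared range.
    inferred = set()
    for s, p, o in triples:
        for cls in ranges.get(p, ()):
            inferred.add((o, RDF_TYPE, cls))
    return inferred - set(triples)


if __name__ == "__main__":
    # Hypothetical data: the artist's blank node carries no explicit type.
    data = {
        ("amns:artist", RDFS_RANGE, "amns:Person"),  # assumed schema triple
        ("_:work", "amns:artist", "_:b0"),
        ("_:b0", "amns:name", "Edward Hopper"),
    }
    for triple in sorted(infer_types_from_range(data)):
        print(triple)  # prints ('_:b0', 'rdf:type', 'amns:Person')
```

A production system would rely on an RDF toolkit such as Jena rather than hand-written rules; the sketch only illustrates why an explicit <Person> typed node adds no information when the range of artist is fixed.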
Consider:

    <format>
      <Media>
        <encoding>TIFF</encoding>
        <dimensions>1024x561</dimensions>
        <fileSize>1.69 MB</fileSize>
        <compression>none</compression>
      </Media>
    </format>

Does this mean there can be values for format other than Media? If the Media / format distinction is unnecessary, this could be rewritten as follows:

    <mediaFormat rdf:parseType="Resource">
      <encoding>TIFF</encoding>
      <dimensions>1024x561</dimensions>
      <fileSize>1.69 MB</fileSize>
      <compression>none</compression>
    </mediaFormat>

Some other examples include:

    <artist>
      <Person>
        <name>Edward Hopper</name>
        <sortName>Hopper, Edward</sortName>
        <nationality>American</nationality>
        <dateLocation>American; 1882-1967</dateLocation>
      </Person>
    </artist>

Can there be an artist who is not a Person?

    <style>
      <Period>
        <description>North and Central America, North America, United States</description>
      </Period>
    </style>

Can there be a style which is not a Period?

    <creationDate>
      <Date>
        <start>19420101</start>
        <end>19421231</end>
      </Date>
    </creationDate>

Can there be a creationDate which is not a Date?

    <catalogedBy>
      <Person>
        <name>Gregory Tschann</name>
      </Person>
    </catalogedBy>

Can something be catalogedBy something other than a Person?

However, there are some occasions where using a typed node is justified, e.g.

    <owner>
      <Organization>
        <name>The Art Institute of Chicago</name>
        <place>Chicago, Illinois, USA</place>
        <accessionNumber rdf:resource = "1942.51" />
        <credit>The Art Institute of Chicago, Friends of American Art Collection</credit>
      </Organization>
    </owner>

as presumably an owner could be a Person as opposed to an Organization.

11. There is some unnecessary duplication of information because date fields have not been broken up. Often a decision needs to be made about the required granularity of fields; however, in some cases information has been duplicated so that data can be represented at different levels of granularity, e.g.
    <Validation rdf:about = "amico_AIC_.E13868.TIF">
      <describes rdf:resource = "#AIC_.E13868.TIF" />
      <validationDate>20000609</validationDate>
      <validationVersion>1.2</validationVersion>
      <libraryYear>2000</libraryYear>
    </Validation>

Here libraryYear (2000) repeats the year already contained in validationDate. This could be replaced with:

    <Validation rdf:about = "amico_AIC_.E13868.TIF">
      <describes rdf:resource = "#AIC_.E13868.TIF" />
      <validationDate rdf:parseType="Resource">
        <year>2000</year>
        <month>06</month>
        <day>09</day>
      </validationDate>
      <validationVersion>1.2</validationVersion>
    </Validation>
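The splitting proposed in points 3 and 11 is mechanical enough to script when migrating the existing files. Below is a minimal sketch in Python using only the standard library; the input formats are assumed from the sample values quoted in this message, other records may differ, and the helper names are hypothetical. It also checks the suggestion in point 3 that storing one unit suffices: 33 1/8 in and 60 in convert to the 84.1 cm and 152.4 cm already present in the data.

```python
"""Sketch: split composite literals into the structured values proposed above.

Assumptions: input formats are inferred from the quoted sample values,
property names follow the proposals in points 3 and 11, and elements are
left unqualified as in the snippets (the AMICO namespace is not used).
"""
import re
import xml.etree.ElementTree as ET
from fractions import Fraction

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("rdf", RDF_NS)  # serialise as rdf:parseType


def _match(pattern, text, what):
    """Return the regex groups, or raise ValueError if text does not fit."""
    m = re.fullmatch(pattern, text.strip())
    if m is None:
        raise ValueError(f"not a {what}: {text!r}")
    return m.groups()


def split_year_range(text):
    """'1930 - 1934' -> {'yearStart': '1930', 'yearEnd': '1934'}"""
    start, end = _match(r"(\d{4})\s*-\s*(\d{4})", text, "year range")
    return {"yearStart": start, "yearEnd": end}


def split_pixels(text):
    """'1024x561' -> {'widthPixel': '1024', 'heightPixel': '561'}"""
    width, height = _match(r"(\d+)\s*[xX]\s*(\d+)", text, "pixel size")
    return {"widthPixel": width, "heightPixel": height}


def split_yyyymmdd(text):
    """'20000609' -> {'year': '2000', 'month': '06', 'day': '09'}"""
    year, month, day = _match(r"(\d{4})(\d{2})(\d{2})", text, "YYYYMMDD date")
    return {"year": year, "month": month, "day": day}


def inches_to_cm(text):
    """'33 1/8' -> 84.1; 1 in = 2.54 cm exactly, rounded to 0.1 cm."""
    inches = sum(Fraction(part) for part in text.split())
    return round(float(inches * Fraction("2.54")), 1)


def parse_type_resource(prop, fields):
    """Build <prop rdf:parseType="Resource"><k>v</k>...</prop>."""
    elem = ET.Element(prop, {f"{{{RDF_NS}}}parseType": "Resource"})
    for name, value in fields.items():
        ET.SubElement(elem, name).text = value
    return elem


if __name__ == "__main__":
    date = parse_type_resource("validationDate", split_yyyymmdd("20000609"))
    print(ET.tostring(date, encoding="unicode"))
    print(split_year_range("1930 - 1934"), split_pixels("1024x561"))
    print(inches_to_cm("33 1/8"), inches_to_cm("60"))  # prints 84.1 152.4
```

Values that do not match the assumed patterns raise ValueError, so such records can be reviewed by hand rather than silently mis-split.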
Received on Thursday, 14 August 2003 11:52:04 UTC