Re: Newbie frustrations from Richard Newman on 2006-01-04 (semantic-web@w3.org from January 2006)

From: Richard Newman <r.newman@reading.ac.uk>
Date: Wed, 4 Jan 2006 11:47:20 +0000
To: wollman+semantic-web@bimajority.org
Cc: semantic-web@w3.org
Message-Id: <71447E68-64C3-4655-A288-83BBA2F39B09@reading.ac.uk>
> My first difficulty was in how to contain the information explosion.
> In a typical 75-photo gallery, there are 376 distinct resources: one
> index page, 75 thumbnails, and a description page and image file for
> each of two resolutions.  But in the abstract "photo gallery"
> semantics, there are only 151 actual *things*: an index page, possibly
> with extended narrative, 75 photos, and 75 photo captions.

Though you probably want to make a note of the 3 images available for  
each photo, right?
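
For reference, the counts quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are mine, and the breakdown assumes the layout described in the quoted message (one thumbnail per photo, plus a description page and an image file at each of two resolutions).

```python
# Counts for the 75-photo gallery described in the quoted message.
photos = 75
resolutions = 2
files_per_resolution = 2   # one description page + one image file
index_pages = 1
thumbnails = photos        # one thumbnail per photo

# Distinct web resources: index + thumbnails + per-resolution pages and files.
web_resources = index_pages + thumbnails + photos * resolutions * files_per_resolution

# Abstract "things": the index page, the photos, and their captions.
abstract_things = index_pages + photos + photos

# Image files available for each photo: the thumbnail plus one per resolution.
images_per_photo = 1 + resolutions

print(web_resources, abstract_things, images_per_photo)
```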

> It's not at all clear how to represent this.

I had a go for iPhoto, and some day I'll finish it off in the context  
of the FRBR.

(See "The author's existing work", [1])

> My first implementation simply output the
> photographer name as a literal in the dc:creator property, but I felt
> like I ought to be able to do better.

dc:creator is usually used with literals, so this is quite reasonable.

If you know that all of your names are distinct, you can have two rules:

x dc:creator y .
=> x foaf:maker [ a foaf:Person ; foaf:name y ] .

x foaf:name y .
z foaf:name y .
=> x = z .

and apply inference to produce resource-based FOAFy RDF. Note that  
the second rule is equivalent to treating foaf:name as an inverse  
functional property (IFP).
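
A minimal sketch of how the two rules could be applied in Python, assuming triples are plain (subject, predicate, object) tuples and blank nodes are strings starting with "_:". The function names and the sample data are hypothetical, and the merge step assumes each node carries exactly one foaf:name; a real reasoner would handle this more generally.

```python
def apply_creator_rule(triples):
    """Rule 1: x dc:creator y => x foaf:maker [ a foaf:Person ; foaf:name y ]."""
    out = list(triples)
    fresh = 0
    for s, p, o in triples:
        if p == "dc:creator":
            fresh += 1
            person = f"_:p{fresh}"  # new blank node for each creator literal
            out += [(s, "foaf:maker", person),
                    (person, "rdf:type", "foaf:Person"),
                    (person, "foaf:name", o)]
    return out


def smush_by_name(triples):
    """Rule 2: nodes that share a foaf:name are merged into one node."""
    canonical = {}  # name -> first node seen with that name
    rename = {}     # node -> node it should be replaced by
    for s, p, o in triples:
        if p == "foaf:name":
            rename[s] = canonical.setdefault(o, s)
    merged = []
    for s, p, o in triples:
        t = (rename.get(s, s), p, rename.get(o, o))
        if t not in merged:  # drop triples that became duplicates
            merged.append(t)
    return merged


# Hypothetical input: two photos credited to the same name.
photos = [("photo:1", "dc:creator", "A. Photographer"),
          ("photo:2", "dc:creator", "A. Photographer")]
result = smush_by_name(apply_creator_rule(photos))
makers = {o for s, p, o in result if p == "foaf:maker"}
print(makers)
```

After both rules, the two photos point at a single foaf:Person node rather than two separate blank nodes.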

> I knew, in particular, that users
> might want to use the foaf vocabulary to describe individuals depicted
> in their photos, so I decided to represent photographers as instances
> of foaf:Person.  The naive approach of using a foaf:Person as a value
> of the dc:creator property failed to represent the important
> underlying expectation that two photographers with the same name in
> the same gallery are the same person.

This is actually quite fair; you haven't expressed the knowledge that  
people with the same name are the same people, and the SW would be  
unwise to make that assumption. (E.g., I know two people named "David  
Green".)

If you know that this is the case, then you can cheat, either through  
rules applied to the RDF, annotating the properties yourself (e.g.,  
making foaf:name inverse functional) or by generating the RDF with  
the correct values.

> Having made an initial proof-of-concept hack, I started to annotate an
> existing photo gallery with metadata, and quickly ran aground.  There
> are four obvious categories of metadata one might be interested in for
> an individual photograph:
>
> a) Technical: how the photo was taken, at what resolution, in what
> orientation, etc.  I am mostly not concerned with this, since it is of
> no value to my application.

Though I do cover it in my iPhoto work, because the information is  
there.

> b) Temporal: when the photo was taken.  This is easy to accomplish and
> the choice of representation is obvious.  (I used dcterms:created and
> represent the date in DTF.)

Good. I necessarily used archivedOn and lastModified properties,  
because those are what iPhoto gave me.

> c) Geographic: where was the photo taken.  This was much more
> difficult; the obvious schemas all took a very computer-oriented
> approach to geocoding, representing locations as grid coordinates --
> information I do not have.  I searched for hours looking for a good
> representation of an ordinary street address (the only kind of
> geographic location I might have access to for my photos) and didn't
> find anything I would describe as "good".

I've done quite a bit of work on this. [2]

I've ended up doing something like this (probably abusing the  
contact: vocab now I think of it, but hey). The vast array of types  
at the beginning are for interoperability's sake.

:DanaStreetRoastingCompany
   :placename "Dana Street Roasting Company" ;

   a cyc:SpatialThing-Localized,
     whois:Place,
     cyc2:SpatialThing-Localized,
     geo:SpatialThing ;

   rdfs:seeAlso <http://www.live.com/danastreet/> ,
                <http://www.metroactive.com/papers/metro/07.18.02/bars-danastroasting-0229.html> ;
   rdfs:comment "Opening hours: 6:30am-10pm Mon-Thu, 6:30am-11pm Fri, 8am-11pm Sat, 8am-6pm Sun" ;
   contact:street [ a contact:StreetAddress ;
     contact:streetName "W. Dana St." ;
     contact:number "755" ] ;
   contact:phone "+1 650 390 9638" ;
   contact:city "Mountain View" ;
   contact:stateOrProvince "California" ;
   contact:postalCode "94041-1304" ;
   geo:lat "37.392374" ; geo:long "-122.078919" ;
   rdfs:seeAlso <http://maps.google.com/maps?q=744+W+Dana+St,+Mountain+View,+CA,+94041-1304> ;
   contact:country "USA" .

If I were making my own vocabulary, the phone number would be a tel:  
URI.
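
As a rough illustration, converting a number written like the one above into a tel: URI might look like this in Python. The helper name is made up, and the sketch assumes the input is already in international form with a leading "+"; it simply drops spaces and other separators.

```python
def to_tel_uri(phone: str) -> str:
    """Turn an international phone number such as '+1 650 390 9638' into a tel: URI."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    if not phone.strip().startswith("+") or not digits:
        raise ValueError("expected an international number starting with '+'")
    return "tel:+" + digits


uri = to_tel_uri("+1 650 390 9638")
print(uri)
```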

> d) Subject matter: what is this a photo of?  It seems that here I have
> to develop my own ontology, since I don't tend to take photos of
> airports and only rarely take photos of people (where foaf provides
> everything that's required).

<http://xmlns.com/foaf/0.1/#term_depicts>

"The foaf:depicts property is a relationship between a foaf:Image and  
something that the image depicts."

The range is owl:Thing, so you can use foaf:depicts for any object.

> I take pictures of radio towers.  Most towers have a number, assigned
> by the FCC, but some don't.  All I wanted was a way to represent
> "photo X shows tower Y" in a way which would allow me to answer
> queries like "show me all the photos of tower Y in temporal order".
> It shouldn't be this hard!

photo:x foaf:depicts tower:y .
tower:y dc:identifier "some fcc number" .   # gross abuse of dc:identifier!

SPARQL to select thumbnails for all photos of a tower with identifier  
"1234", sorted by date:

SELECT ?xthumb WHERE {
   ?y dc:identifier "1234" .
   ?x foaf:depicts ?y .
   ?x photo:thumbnail ?xthumb .
   ?x photo:archivedOn ?date .
}
ORDER BY ?date
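
To show what the query is doing, here is a small Python sketch of the same join and sort over in-memory triples. It is not how a SPARQL engine works internally; the function name and the sample data are hypothetical, and it assumes one thumbnail and one ISO 8601 date string per photo (so plain string sorting gives temporal order).

```python
def thumbnails_for_tower(triples, identifier):
    """Return thumbnails of photos depicting the tower with this identifier, oldest first."""
    towers = {s for s, p, o in triples if p == "dc:identifier" and o == identifier}
    photos = {s for s, p, o in triples if p == "foaf:depicts" and o in towers}
    thumbs = {s: o for s, p, o in triples if p == "photo:thumbnail"}
    dates = {s: o for s, p, o in triples if p == "photo:archivedOn"}
    # Keep only photos that have both a date and a thumbnail, as the query does.
    rows = [(dates[x], thumbs[x]) for x in photos if x in dates and x in thumbs]
    return [thumb for date, thumb in sorted(rows)]


# Hypothetical data: two photos of tower "1234", one of another tower.
data = [
    ("tower:y", "dc:identifier", "1234"),
    ("tower:z", "dc:identifier", "5678"),
    ("photo:a", "foaf:depicts", "tower:y"),
    ("photo:a", "photo:thumbnail", "thumb/a.jpg"),
    ("photo:a", "photo:archivedOn", "2005-09-02"),
    ("photo:b", "foaf:depicts", "tower:y"),
    ("photo:b", "photo:thumbnail", "thumb/b.jpg"),
    ("photo:b", "photo:archivedOn", "2004-06-15"),
    ("photo:c", "foaf:depicts", "tower:z"),
    ("photo:c", "photo:thumbnail", "thumb/c.jpg"),
    ("photo:c", "photo:archivedOn", "2003-01-01"),
]
ordered = thumbnails_for_tower(data, "1234")
print(ordered)
```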

> Sorry for the long-winded rant.  Am I being unreasonable or is it
> really expected to be this difficult?

It's not meant to be easy -- first, you have to model the domain  
properly, rather than implicitly embedding the semantics in  
arbitrarily-structured XML, and you're also trying to extract some of  
those hidden semantics (such as name uniqueness). Moreover, you're  
trying to do it in a way that other people can reuse.

Hope this helps, though!

-R


[1] <http://www.holygoat.co.uk/projects/images/>
[2] <http://www.holygoat.co.uk/blog/entry/2005-08-11-3>
Received on Wednesday, 4 January 2006 11:47:27 UTC