Re: Newbie frustrations from Richard Cyganiak on 2006-01-04 (semantic-web@w3.org from January 2006)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Wed, 4 Jan 2006 18:25:18 +0100
To: wollman+semantic-web@bimajority.org
Cc: semantic-web@w3.org
Message-Id: <51FEB529-BE98-4669-8546-7245637AE5FF@cyganiak.de>
Hi,

Here's another thought about your points c) and d) and the
dc:creator/foaf:Person issue: You say you want to add metadata to
"assist a search application". That's pretty vague. What kind of
functionality do you want to enable?

    A web page with a list of all radio towers, with the photos of
    each? A Google Maps mashup which shows the location of every
    photo you've taken? A route planning app that lets you find the
    quickest way to get to each tower? An RDF aggregator that pulls
    the FOAF profiles of people shown on your photos into your
    system and displays their data below the photo? The same thing
    for the photographer? RSS feeds for each tower, photographer,
    depicted person, US state? Do you want a timeline of your
    photos? Automatic uploading into Flickr? What else?

Everything becomes easier if you can answer this.

1. Figure out what functionality your metadata is supposed to enable.

2. Figure out what information you would need to collect if you wanted
to build this functionality in your own app.

3. Pick or design RDF vocabularies to expose the information to third  
parties in a way that allows them to build it in a straightforward way.

I find that this kind of thinking helps to "ground" the whole  
modelling and design process.
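
To make that concrete, here is a rough sketch of the three steps applied
to the one query you did name, "show me all the photos of tower Y in
temporal order". It is plain Python with triples as tuples, not a real
RDF toolkit. The photo and tower identifiers and the dates are made-up
placeholders, and foaf:depicts plus dcterms:created is only one possible
choice of properties, not the only reasonable one.

```python
from datetime import date

# Step 1: the target functionality, taken from the original message:
# "show me all the photos of tower Y in temporal order".

# Step 2: the information that query needs, one record per photo.
# All identifiers and dates below are hypothetical placeholders.
photos = [
    {"photo": "photo-3", "tower": "tower-A", "taken": date(2005, 7, 4)},
    {"photo": "photo-1", "tower": "tower-A", "taken": date(2004, 3, 9)},
    {"photo": "photo-2", "tower": "tower-B", "taken": date(2005, 1, 20)},
]


# Step 3: expose exactly that information as (subject, predicate, object)
# triples. The two property names are an assumption, not a recommendation.
def to_triples(records):
    triples = []
    for r in records:
        triples.append((r["photo"], "foaf:depicts", r["tower"]))
        triples.append((r["photo"], "dcterms:created", r["taken"].isoformat()))
    return triples


def photos_of(triples, tower):
    """Return the photos depicting `tower`, oldest first."""
    created = {s: o for s, p, o in triples if p == "dcterms:created"}
    hits = [s for s, p, o in triples if p == "foaf:depicts" and o == tower]
    # ISO 8601 date strings sort chronologically as plain strings.
    return sorted(hits, key=lambda s: created[s])


triples = to_triples(photos)
print(photos_of(triples, "tower-A"))
```

Two statements per photo are enough for this query. Anything else
(addresses, grid coordinates, technical data) can wait until some other
piece of functionality asks for it.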

To summarize: RDF is about machine-processable metadata. Modelling  
machine-processable metadata without knowing what the target machine  
is supposed to do is *hard*.

Richard


On 3 Jan 2006, at 23:48, wollman+semantic-web@bimajority.org wrote:

>
> I have a small application which I use to generate photo galleries for
> my Web site.  I've been meaning for some time to add some semantic
> metadata to the galleries, as having that information
> would greatly assist a search application.  I thought this would be
> easy to accomplish -- the gallery structure and exposition are already
> in an XML representation of my own devising, and the individual pages
> in the gallery are generated using make and XSLT; it would not be
> difficult to add a <metadata> element to the DTD and write another
> XSLT script to extract the properties of each image and format the
> result as RDF/XML.
>
> It turned out to be far more difficult than I had expected.
>
> My photo galleries are fairly typical affairs: for each gallery, there
> is an index page, with a description of the gallery as a whole.  Each
> photo is provided in multiple resolutions, and for each pair (photo
> number, resolution) there is a photo description page in HTML which
> embeds the photo and contains navigational links.
>
> My first difficulty was in how to contain the information explosion.
> In a typical 75-photo gallery, there are 376 distinct resources: one
> index page, 75 thumbnails, and a description page and image file for
> each of two resolutions.  But in the abstract "photo gallery"
> semantics, there are only 151 actual *things*: an index page, possibly
> with extended narrative, 75 photos, and 75 photo captions.  It's not
> at all clear how to represent this.  I ended up (after trying several
> alternatives that were unsatisfactory) representing each abstract
> photo and each abstract caption as named blank nodes, with a
> dcterms:hasFormat property indicating each of the available sizes.  (I
> never figured out how to use rdf:Bag or rdf:Alt for this last bit, so
> each hasFormat property is written separately.)  Then the abstract
> photo can refer to the abstract caption for its dc:description
> property, and at least some of the implicit semantic structure is made
> explicit.
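
A minimal sketch of the layout described above, in plain Python with
triples as tuples, mainly to check the counts. The file paths and the
size names "medium" and "large" are hypothetical, and attaching the
per-resolution description pages to the abstract caption via
dcterms:hasFormat is my assumption about how the pieces fit together,
not something stated in the message.

```python
def gallery_triples(n_photos, sizes):
    """Build triples for a gallery in which each abstract photo and each
    abstract caption is a named blank node, and every concrete file is
    attached with its own dcterms:hasFormat statement."""
    triples = []
    for i in range(1, n_photos + 1):
        photo = f"_:photo{i}"      # abstract photo (blank node label)
        caption = f"_:caption{i}"  # abstract caption (blank node label)
        triples.append((photo, "dc:description", caption))
        # The thumbnail is one more rendition of the abstract photo.
        triples.append((photo, "dcterms:hasFormat", f"thumb/{i}.jpg"))
        for size in sizes:
            # Hypothetical paths: one image file and one description
            # page per (photo number, resolution) pair.
            triples.append((photo, "dcterms:hasFormat", f"{size}/{i}.jpg"))
            triples.append((caption, "dcterms:hasFormat", f"{size}/{i}.html"))
    return triples


triples = gallery_triples(75, ["medium", "large"])
abstract_things = {s for s, _, _ in triples}
concrete_files = {o for _, p, o in triples if p == "dcterms:hasFormat"}

# Add 1 to each count for the gallery index page.
print(len(abstract_things) + 1)
print(len(concrete_files) + 1)
```

With 75 photos and two resolutions this gives 151 abstract things and
376 concrete resources, matching the figures quoted above.
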
>
> The next problem was also an information explosion, and I see from the
> archives of this list that it's a well-traveled road.  Each photo
> description in my source file has a photographer attribute.
> Originally, this was only used to automatically generate copyright
> notices on each page, so all I have is the name of the photographer in
> conversational order.  This is an obvious candidate for inclusion in
> the gallery metadata.  My first implementation simply output the
> photographer name as a literal in the dc:creator property, but I felt
> like I ought to be able to do better.  I knew, in particular, that users
> might want to use the foaf vocabulary to describe individuals depicted
> in their photos, so I decided to represent photographers as instances
> of foaf:Person.  The naive approach of using a foaf:Person as a value
> of the dc:creator property failed to represent the important
> underlying expectation that two photographers with the same name in
> the same gallery are the same person.  So again I used the
> named-blank-node approach (which obviously only works because I
> keep the metadata for the whole gallery in a single document).  But in
> this case it's rather less than satisfactory; I was forced to use the
> generate-id() XPath function to create the names, which means that the
> author of the gallery has no way of adding additional properties to
> one of these automatically-generated foaf:Person instances.  I may end
> up removing this function entirely and requiring the user to handle
> this herself -- which I already do for the textual elements, since
> my DTD doesn't represent authorship of the descriptions.
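
One possible way around the generate-id() problem, sketched in Python
rather than XSLT: derive the node label from the photographer name
itself, so that the same name always yields the same label and the
gallery author can predict it when adding further properties. The
helper names, the label scheme, and the sample names are all
hypothetical, and the sketch bakes in the same assumption as the
message, namely that equal names within one gallery mean the same
person.

```python
import re


def person_label(name):
    """Derive a predictable blank-node label from a photographer name."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return f"_:person-{slug}"


def creator_triples(photos):
    """photos: iterable of (photo_id, photographer_name) pairs."""
    triples = []
    seen = set()
    for photo_id, name in photos:
        person = person_label(name)
        if person not in seen:
            # Describe each photographer once per gallery document.
            seen.add(person)
            triples.append((person, "rdf:type", "foaf:Person"))
            triples.append((person, "foaf:name", name))
        triples.append((photo_id, "dc:creator", person))
    return triples


# Placeholder data: two photos by one photographer, one by another.
sample = [
    ("_:photo1", "Jane Doe"),
    ("_:photo2", "John Roe"),
    ("_:photo3", "Jane Doe"),
]
triples = creator_triples(sample)
for t in triples:
    print(t)
```

Because the label depends only on the name, hand-written statements
about "_:person-jane-doe" elsewhere in the same document would attach
to the same node. Names that differ only in punctuation or case would
collapse into one label, which may or may not be what you want.
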
>
> Having made an initial proof-of-concept hack, I started to annotate an
> existing photo gallery with metadata, and quickly ran aground.  There
> are four obvious categories of metadata one might be interested in for
> an individual photograph:
>
> a) Technical: how the photo was taken, at what resolution, in what
> orientation, etc.  I am mostly not concerned with this, since it is of
> no value to my application.
>
> b) Temporal: when the photo was taken.  This is easy to accomplish and
> the choice of representation is obvious.  (I used dcterms:created and
> represented the date in DTF.)
>
> c) Geographic: where the photo was taken.  This was much more
> difficult; the obvious schemas all took a very computer-oriented
> approach to geocoding, representing locations as grid coordinates --
> information I do not have.  I searched for hours looking for a good
> representation of an ordinary street address (the only kind of
> geographic location I might have access to for my photos) and didn't
> find anything I would describe as "good".
>
> d) Subject matter: what is this a photo of?  It seems that here I have
> to develop my own ontology, since I don't tend to take photos of
> airports and only rarely take photos of people (where foaf provides
> everything that's required).
>
> I take pictures of radio towers.  Most towers have a number, assigned
> by the FCC, but some don't.  All I wanted was a way to represent
> "photo X shows tower Y" in a way which would allow me to answer
> queries like "show me all the photos of tower Y in temporal order".
> It shouldn't be this hard!
>
> Sorry for the long-winded rant.  Am I being unreasonable or is it
> really expected to be this difficult?
>
> -GAWollman
>
Received on Wednesday, 4 January 2006 17:25:39 UTC