Re: Southampton Pub data as linked open data

Chris,

I'll try to answer some of the questions.

On 28 Jul 2008, at 13:18, Chris Wallace wrote:

> Thanks John for this resource - It inspires me to help my students  
> to do a similar data collection exercise in Bristol!
>
> A few things puzzle me though, probably as a newcomer to this field.  
> I'm in the process of RDFing our faculty data so these issues are  
> taxing me too.
>
> 1) The resource URI eg. http://www.johngoodwin.me.uk/pubs/id/pub1
>
> is not humanly readable. Is this considered to be a problem? For
> example, DBpedia would, I think, be less valuable with system-generated
> resource ids, even though natural resource ids require a
> mechanism for disambiguation.

Human-readable unique identifiers are nice, but they are the exception.
It's true that DBpedia would be less valuable without its human-readable
IDs, but DBpedia piggybacks on Wikipedia's identifier scheme, which is
maintained by an army of volunteers. At the end of the day, uniqueness
is more important than human-readability. If the unique identifiers in
your original data source are not human-readable, and you don't have
the resources to curate a new identifier scheme, then using a numeric
scheme is better than not publishing the data at all...
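
To make the trade-off concrete, here is a minimal Python sketch of both
approaches. The base URI is taken from the example above; the function
names and the "-2" disambiguation suffix are purely my own illustration,
not anything John's data set actually does.

```python
import re

# Base taken from the example URI quoted above.
BASE = "http://www.johngoodwin.me.uk/pubs/id/"


def numeric_uri(n):
    """Numeric scheme: unique by construction, needs no curation."""
    return f"{BASE}pub{n}"


def slug(name):
    """Lower-case the name and collapse non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def mint_readable(names):
    """Readable scheme (hypothetical): slug plus a counter on collision."""
    seen = {}
    uris = []
    for name in names:
        s = slug(name)
        seen[s] = seen.get(s, 0) + 1
        suffix = "" if seen[s] == 1 else f"-{seen[s]}"
        uris.append(BASE + s + suffix)
    return uris


if __name__ == "__main__":
    print(numeric_uri(1))
    for uri in mint_readable(["The Alexandra", "The Red Lion", "The Red Lion"]):
        print(uri)
```

Note that the counter depends on input order and can still clash with a
pub that is really called "The Red Lion 2", which is exactly the kind of
curation work the numeric scheme avoids.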

> 2) The pub name has been reformatted to catalogue order, but pub
> names are proper nouns and I'd be laughed at if I asked the way to
> "Alexandra, The".  Perhaps both forms could be included, with a
> different tag for the catalogue format if it is not computable from
> the natural name.

I don't see why pub names are different from movie names, artist  
names, or book names, all of which can often be found reformatted in  
this way.
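
For simple cases the two forms are computable from each other. A rough
Python sketch, assuming only a leading English article is moved (the
ARTICLES list and both function names are my own, hypothetical choices):

```python
# Articles that get moved to the end in catalogue order (assumed list).
ARTICLES = ("The", "A", "An")


def to_catalogue(name):
    """'The Alexandra' -> 'Alexandra, The'; other names are unchanged."""
    first, _, rest = name.partition(" ")
    if rest and first in ARTICLES:
        return f"{rest}, {first}"
    return name


def to_natural(name):
    """'Alexandra, The' -> 'The Alexandra'; other names are unchanged."""
    head, sep, tail = name.rpartition(", ")
    if sep and tail in ARTICLES:
        return f"{tail} {head}"
    return name


if __name__ == "__main__":
    print(to_catalogue("The Alexandra"))
    print(to_natural("Alexandra, The"))
```

Names that do not follow this pattern pass through untouched, so real
data would probably still need the occasional manual fix.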

> 3) Why have both rdfs:label and pub:name since they seem to have
> the same content?

Generic RDF tools (which do not know about the pub vocabulary) often
use rdfs:label for display/headline purposes. So if your domain-specific
vocabulary has its own naming property, it might be a good idea to
add both. In an ideal world, John would declare pub:name a subproperty
of rdfs:label, and the tools would infer the rdfs:label value... But
most clients don't do that yet.
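
What such a client would have to do is small. Here is a minimal Python
sketch of the RDFS subproperty inference over plain (subject, predicate,
object) tuples. It uses no RDF library, the prefixed names are just
strings, and the "pub1" subject and the function name are assumptions of
mine for illustration:

```python
RDFS_SUBPROPERTY_OF = "rdfs:subPropertyOf"


def infer_subproperties(triples):
    """Return the triples plus those entailed by rdfs:subPropertyOf."""
    triples = set(triples)

    # Map each property to its directly declared super-properties.
    parents = {}
    for s, p, o in triples:
        if p == RDFS_SUBPROPERTY_OF:
            parents.setdefault(s, set()).add(o)

    def ancestors(prop):
        """All super-properties of prop, following the chain transitively."""
        found, stack = set(), [prop]
        while stack:
            for parent in parents.get(stack.pop(), ()):
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found

    inferred = set(triples)
    for s, p, o in triples:
        for sup in ancestors(p):
            # If s p o holds and p is a subproperty of sup, then s sup o holds.
            inferred.add((s, sup, o))
    return inferred


if __name__ == "__main__":
    data = {
        ("pub:name", RDFS_SUBPROPERTY_OF, "rdfs:label"),
        ("pub1", "pub:name", "Alexandra, The"),
    }
    for triple in sorted(infer_subproperties(data)):
        print(triple)
```

With the subproperty declaration in place, the publisher only needs to
state pub:name; until clients do this, stating both is the pragmatic
choice.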

> 4) I feel uncomfortable with the non-uniform representation of the
> address - partly with domain-specific tags pub:street and
> pub:postcode, partly with a company-specific (and non-humanly
> decipherable) URI.  I know that this is a can of worms, e.g.
> http://xml.coverpages.org/namesAndAddresses.html#eccma
> and I can’t find a suitable address vocabulary, but this mixture
> doesn’t look very satisfactory.

If only we could finally agree on *one* vCard-in-RDF vocabulary...

> 5) pub:dateSurveyed: isn’t this just the date at which the
> description was authored (if not when it was entered into this
> format), i.e. dc:date?

dc:date could mean many things: when the pub was surveyed, when the  
RDF document was published, when the pub was opened... Using  
pub:dateSurveyed makes the meaning clear to the user of the data.

Best,
Richard



> 6) Generally, these seem such general properties of any place that
> I'm surprised that any local vocabulary is needed at all, given that
> no data is actually domain-specific (like a list of beers served).
>
> This case study seems a great example of the issues in vocabulary  
> and resource reuse. It would be interesting to compare the different  
> solutions which different analysts would use to represent this  
> data.  Perhaps something like it would be a good exercise for the  
> Oxford VoCamp?
>
> Chris
>
>
> Chris Wallace
> Senior Lecturer
> Department of Information Science and Digital Media
> University of the West of England, Bristol
>

Received on Monday, 28 July 2008 14:24:07 UTC