
RE: Top three levels of Dewey Decimal Classification published as linked data

From: Panzer,Michael <panzerm@oclc.org>
Date: Thu, 20 Aug 2009 16:50:09 -0400
Message-ID: <AA3DCFAA4E87BD40BBAA507B1C36CC3D0293ADF3@OAEXCH4SERVER.oa.oclc.org>
To: "Ed Summers" <ehs@pobox.com>
Cc: <public-lod@w3.org>
Hi Ed 

> I haven't fully read the wiki page yet, so I apologize if 
> this question is already answered there. I was wondering why 
> you chose to mint multiple URIs for the same concept in 
> different languages. 

[...]
 
> I kind of expected the assertions to hang off of a language 
> and version agnostic URI, with perhaps dct:hasVersion links 
> to previous versions.
> 
> <http://dewey.info/class/641/>
>     cc:attributionName "OCLC Online Computer Library Center, Inc." ;
>     cc:attributionURL <http://www.oclc.org/dewey/> ;
>     cc:morePermissions <http://www.oclc.org/dewey/about/licensing/> ;
>     dct:hasVersion <http://dewey.info/class/641/2009/08/> ;
>     dct:language "de"^^dct:RFC4646 ;
>     a skos:Concept ;
>     xhtml:license 
> <http://creativecommons.org/licenses/by-nc-nd/3.0/> ;
>     skos:broader <http://dewey.info/class/64/2003/08/about.de> ;
>     skos:inScheme <http://dewey.info/scheme/2003/08/about.de> ;
>     skos:notation "641"^^<http://dewey.info/schema-terms/Notation> ;
>     skos:prefLabel "Food & drink"@en, "Essen und Trinken"@de .

A very good question (and not an easy one to answer). The short answer
would be: Language _is_ an element of the domain to be described (Dewey
concepts), so a different language should generate a different URI,
because it describes a separate instance of a concept. A longer answer: 

My basic premise here was that a URI like http://dewey.info/class/641/
should identify class 641 across all versions/languages of the DDC, not
just the most current version or a multilingual version. Why?

1. Labels can change over time for a given class, which could lead to
inconsistencies with the SKOS model. For example, at one point in time
class 210 had the prefLabel "Natural theology"; now it has the prefLabel
"Philosophy & theory of religion" (which reflects changes to its
semantics as well). This would lead to problems when hanging them from
one concept: 

<http://dewey.info/class/210/>
	skos:prefLabel "Natural theology"@en;
	skos:prefLabel "Philosophy & theory of religion"@en;
...
	skos:prefLabel "Religionsphilosophie, Religionstheorie"@de.

2. The prefLabel is not the only relationship that might be dependent on
the concept language. And many of these relationships are not used with
plain literals in object position that can be disambiguated with a
language tag. Semantic relationships may be different for a concept in
the German version of the DDC. Example: 220.5312 Luther-Bibel und
Revisionen is a concept that, because of an expansion, only exists in
the German edition. There has to be a way to identify this class as a
German edition concept, not only as a Dewey concept that happens to only
have a German caption.

3. One could argue that this would not be a relevant counter-argument if
translations were 100% interoperable (i.e., 220.5312 might one day be
included in the English edition as "Luther Bible and revisions").
(Interoperability in this case would mean that no other translation or
the English edition could have claimed this number to coin a different
concept.) But translations are not always perfectly synchronized. So
http://dewey.info/class/641/2008/01/03/ in Portuguese could in fact be a
translation of an earlier version of the English concept, e.g.
http://dewey.info/class/641/2005/08/09/about.en (that might have been
updated with different index terms, different semantic relationships and
so on in the meantime).

4. Finally, having different identifiers for the same concept in a
different language, or, more precisely, for a concept with the same
Dewey number, makes it possible to answer very useful questions like: 

"Which concepts exist in the German but not in the English edition?"
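Points 1 and 4 can both be checked mechanically. The following is only a rough Python sketch, not anything dewey.info runs: the function names, the tuple layout and the sample edition contents are assumptions made for illustration. The first function flags a class that ends up with more than one prefLabel for the same language tag (SKOS allows at most one per language); the second answers the question from point 4 as a plain set difference.

```python
from collections import defaultdict

# Illustrative data only, not taken from dewey.info.
# Labels as (class notation, label, language tag); the 210 labels are the
# ones quoted in point 1.
LABELS = [
    ("210", "Natural theology", "en"),
    ("210", "Philosophy & theory of religion", "en"),
    ("210", "Religionsphilosophie, Religionstheorie", "de"),
]

# Notations present in each edition, keyed by language tag (assumed sample).
EDITIONS = {
    "en": {"210", "641"},
    "de": {"210", "641", "220.5312"},
}


def pref_label_clashes(labels):
    """Return {(notation, lang): [labels]} for every class that carries
    more than one preferred label in the same language."""
    seen = defaultdict(list)
    for notation, label, lang in labels:
        seen[(notation, lang)].append(label)
    return {key: found for key, found in seen.items() if len(found) > 1}


def only_in(editions, lang, other):
    """Return the notations that exist in edition `lang` but not in `other`."""
    return editions[lang] - editions[other]


clashes = pref_label_clashes(LABELS)
german_only = only_in(EDITIONS, "de", "en")
```

With one language-agnostic concept per Dewey number, the second question has nothing to compare, because there is only a single set of concepts.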

---

The compromise here is to use language as part of the document URIs (as
a dimension of the representation), not as part of the "abstract" URIs.
So, if you want to refer to a class, or timestamped version, you can do
so in an abstract way:

http://dewey.info/class/641/      -> 303 See Other:
http://dewey.info/class/641/about
http://dewey.info/class/641/2003/ -> 303 See Other:
http://dewey.info/class/641/2003/about

A 303 points a user agent to the generic document which is "/about".

Whereas when you want to refer to a specific language or format, you
have to use a specific URI for an information resource, e.g.,
http://dewey.info/class/641/about.de or
http://dewey.info/class/641/2003/about.de. So language _is_ recognized
as part of the domain, but only as part of representations, not of
concepts.
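As a small illustration of the URI pattern just described, here is a Python sketch that maps an abstract URI to its document URI. The helper name about_uri and its suffix parameter are made up for this example; only the "/about" and "/about.de" shapes come from the URIs listed above.

```python
def about_uri(abstract_uri, suffix=None):
    """Map an abstract class URI to a document URI.

    Without a suffix this is the generic "/about" document that the 303
    redirect points to; with a suffix such as "de" it is the specific
    information resource for that language.
    """
    # Abstract URIs end in a slash; add one if the caller left it off.
    base = abstract_uri if abstract_uri.endswith("/") else abstract_uri + "/"
    return base + "about" + (f".{suffix}" if suffix else "")


generic = about_uri("http://dewey.info/class/641/")
german_2003 = about_uri("http://dewey.info/class/641/2003/", "de")
```

The abstract URI never carries a language; the suffix only selects a representation.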

> See how multiple skos:prefLabel assertions can be made using 
> the same subject? To illustrate why I think this is important 
> consider how someone may use the resources at dewey.info:
> 
>     <http://openlibrary.org/b/OL11604988M> dct:subject 
> <http://dewey.info/class/641/> .
> 
> If we follow our nose to http://dewey.info/class/641 we will 
> get back some RDF, but the description we get isn't about the 
> subject <http://dewey.info/class/641/> so what are we to make 
> of the above assertion?

Going back to the premise, I think the question here is: What is
http://dewey.info/class/641/? Is it a SKOS concept? If we analyse a
couple of triples from the response we can determine what a user agent
might learn from the answer of the service:

<http://dewey.info/class/641/2009/08/about.en>
    dct:isVersionOf <http://dewey.info/class/641/> .
<http://dewey.info/class/641/2009/08/about.de>
    dct:isVersionOf <http://dewey.info/class/641/> .
...
<http://dewey.info/class/641/2009/08/about.es>
    dct:isVersionOf <http://dewey.info/class/641/> .

It is true that http://dewey.info/class/641/ is never mentioned in
subject position, but it is in object position several times. So an
agent might deduce the following about http://dewey.info/class/641/ from
the response:

<http://dewey.info/class/641/>
    dct:hasVersion <http://dewey.info/class/641/2009/08/about.en> ;
    dct:hasVersion <http://dewey.info/class/641/2009/08/about.de> ;
...
    dct:hasVersion <http://dewey.info/class/641/2009/08/about.es> .
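The deduction an agent would make here can be sketched in a few lines of Python. This assumes the agent treats dct:hasVersion as the inverse of dct:isVersionOf, which is a choice the agent makes and not something the response states; the function name and the tuple representation of triples are hypothetical.

```python
def infer_has_version(triples):
    """For every (s, dct:isVersionOf, o) triple, produce the flipped
    statement (o, dct:hasVersion, s). Other triples are ignored."""
    return [
        (obj, "dct:hasVersion", subj)
        for subj, pred, obj in triples
        if pred == "dct:isVersionOf"
    ]


response = [
    ("http://dewey.info/class/641/2009/08/about.en", "dct:isVersionOf",
     "http://dewey.info/class/641/"),
    ("http://dewey.info/class/641/2009/08/about.de", "dct:isVersionOf",
     "http://dewey.info/class/641/"),
    ("http://dewey.info/class/641/2009/08/about.es", "dct:isVersionOf",
     "http://dewey.info/class/641/"),
]

inferred = infer_has_version(response)
```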

And this coincides with what we get if we use a SPARQL query directly:

DESCRIBE <http://dewey.info/class/641/>

(http://tinyurl.com/nrlsqm) 

Query result:

<http://dewey.info/class/641/> rdf:type skos:Concept ;
    dct:hasVersion <http://dewey.info/class/641/2009/08/about.en> ,
                   <http://dewey.info/class/641/2003/08/about.de> ,
                   <http://dewey.info/class/641/2009/08/about.fr> ,
                   <http://dewey.info/class/641/2009/08/about.es> ,
                   <http://dewey.info/class/641/2009/08/about.sv> ,
                   <http://dewey.info/class/641/2009/08/about.pt> ,
                   <http://dewey.info/class/641/2009/08/about.ru> ,
                   <http://dewey.info/class/641/2009/08/about.zh> ,
                   <http://dewey.info/class/641/2009/08/about.ar> ;
    skos:notation "641"^^<http://dewey.info/schema-terms/Notation> .

I agree that it is somewhat misleading to not give this information when
dereferencing http://dewey.info/class/641/. So do you think the results
of a DESCRIBE query should always be included for any given URI in a
linked data service? Should (all) triples be included also where the URI
is used in object position? Would you think it is misleading to give
more information?

The class http://dewey.info/class/641/ is not so much a single concept
but a set, an aggregation of all instances of this class. This might be
comparable to the notion of a work in FRBR. A work is very much an
aggregation of manifestations; a "work set". So my thinking was to
respond to such a request by returning all versions that make up this
set. The most current version could be obtained by
http://dewey.info/class/641/current/ or similar.

 
> Another unrelated thing I noticed is that the RDFa doesn't 
> seem to be usable by the RDFa Distiller:
> 
>     http://tinyurl.com/nm8lfa

The problem here is that pyRdfa doesn't seem to send the correct Accept
header. Since dewey.info defaults to RDF/XML if it doesn't see
"text/html" or "application/xhtml+rdfa" in the request, pyRdfa doesn't
see any RDFa at all. You have to skip conneg and use the URL for the
HTML-specific resource directly, e.g.
http://dewey.info/class/641/about.html.

http://tinyurl.com/kjpxmt
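For illustration, the default described above amounts to something like the following Python sketch. It is a simplification and not the actual dewey.info code: the function name and return labels are invented, and q-values and wildcards in the Accept header are deliberately ignored.

```python
# Media types that, per the description above, switch the response to HTML.
HTML_TYPES = {"text/html", "application/xhtml+rdfa"}


def choose_representation(accept_header):
    """Return "HTML+RDFa" if the Accept header names one of HTML_TYPES,
    otherwise fall back to "RDF/XML"."""
    # Split "type/subtype;q=0.9, ..." into bare, lower-cased media types.
    accepted = {
        part.split(";")[0].strip().lower()
        for part in (accept_header or "").split(",")
    }
    return "HTML+RDFa" if accepted & HTML_TYPES else "RDF/XML"


browser = choose_representation("text/html,application/xhtml+xml;q=0.9")
no_html = choose_representation("application/rdf+xml")
```

Under this logic a client that sends no Accept header, or one that never mentions either HTML type, only ever sees RDF/XML, which would explain an RDFa parser finding no RDFa.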


> Sorry if this was too long. Let me just say again how 
> exciting it is to see this work coming out of OCLC.

I am pretty sure my ramblings were too long! Sorry about that.

Thanks, Ed, for your help and encouragement.

Michael
Received on Thursday, 20 August 2009 20:50:57 UTC
