Re: Open Library and RDF from Thomas Baker on 2010-08-16 (public-lld@w3.org from August 2010)

From: Thomas Baker <tbaker@tbaker.de>
Date: Mon, 16 Aug 2010 11:33:39 -0400
To: Dan Brickley <danbri@danbri.org>
Cc: Karen Coyle <kcoyle@kcoyle.net>, "gordon@gordondunsire.com" <gordon@gordondunsire.com>, "Young,Jeff (OR)" <jyoung@oclc.org>, public-lld@w3.org
Message-ID: <20100816153339.GC1156@octavius>
On Mon, Aug 16, 2010 at 09:35:30AM +0200, Dan Brickley wrote:
> We should be careful to manage expectations here - even the most
> 'strongly specified' ontology can only express certain kinds of
> constraint, in RDFS/OWL at least. High quality, precise metadata
> requires additional discipline that comes in at quite another level. I
> think you know this very well but just to spell it out for the record!

That was indeed the intended message, but you make a lot of
the points more elegantly :-)

> Ontologies are more like legal reference literature than police; they
> don't directly enforce anything. 

I like Alistair's characterization: they provide a "license
for inferencing", i.e., for inferring knowledge about the
instances in the world that are being described.
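To make the "license for inferencing" idea concrete, here is a minimal sketch in Python. It assumes triples are plain (subject, predicate, object) string tuples and uses hypothetical `ex:` names rather than any published vocabulary. It applies only the two RDFS entailment rules for rdfs:domain and rdfs:range. Using a property licenses a conclusion about the type of the thing described. That is an inference about the world, not a check on the data.

```python
# Minimal sketch of RDFS domain/range inference over plain tuples.
# Assumption: the "ex:" names are hypothetical and used only for illustration.

RDF_TYPE = "rdf:type"

def infer_types(data, domains, ranges):
    """Return the rdf:type triples licensed by rdfs:domain / rdfs:range.

    data    -- iterable of (s, p, o) triples describing instances
    domains -- dict mapping a property to its declared rdfs:domain class
    ranges  -- dict mapping a property to its declared rdfs:range class
    """
    data = set(data)
    inferred = set()
    for s, p, o in data:
        if p in domains:
            # RDFS rule rdfs2: any subject of p is an instance of p's domain
            inferred.add((s, RDF_TYPE, domains[p]))
        if p in ranges:
            # RDFS rule rdfs3: any object of p is an instance of p's range
            inferred.add((o, RDF_TYPE, ranges[p]))
    # Report only conclusions not already stated in the data
    return inferred - data

data = {("ex:book1", "ex:author", "ex:alice")}
domains = {"ex:author": "ex:Book"}
ranges = {"ex:author": "ex:Person"}

for triple in sorted(infer_types(data, domains, ranges)):
    print(triple)
```

Running it prints two inferred rdf:type triples, one for ex:alice and one for ex:book1. Nothing is rejected: the ontology only adds conclusions; it does not police the data.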

>                                   This is partly why in the Dublin Core
> community we found a need for 'application profiles'; not just to
> combine others' work rather than always having to define new terms,
> but to be able to talk explicitly about the structure of descriptions,
> as well as about the structure of the world those descriptions
> describe.

That is the idea behind the planned joint meeting of the LLD XG
with the DCMI Architecture Forum on Friday afternoon, 22 October,
in Pittsburgh [1], which will focus on "Application Profiles for
Linked Data: models and requirements".  The discussion in the
Dublin Core community resulted in the DCMI Abstract Model (DCAM)
and the Description Set Profile constraint language.  I would like
that meeting to discuss requirements for application profiles in
general and to take a critical look at DCAM in particular.  It
would also be great if someone could come prepared to present
some alternative approaches to application profiles.

If "application profiles" are acknowledged to be an important
piece of the puzzle in communities such as FR and RDA,
then this group should consider whether further work is
needed to standardize formalisms for application profiles,
and if so, where that work might take place -- DCMI?  W3C?
The FR and RDA committees that need application profiles?
Some combination of the above?

[1] http://www.asis.org/Conferences/DC2010/program-sessions.html#jointmeeting

> Regarding the 'Semantic Web', we call it 'semantics' because the rules
> in our ontologies are generalisations about the world; sometimes
> however we want the rules to be talking more directly about the data:
> 'when you mention a person and they are no longer alive, mention their
> date of death'; 'if you mention a group and you know its founder,
> mention their name and give an identifying description of them', etc.
> [top of my head examples]. These sorts of rule aren't directly about
> people or groups, but about the structure of certain kinds of
> description. It is possible to bend and twist...

Nicely put.

>                                            ...semantic technology to
> work like this (some of us have used/abused SPARQL, others OWL, e.g.
> nice work from Clark & Parsia recently) but the main thing to emphasise
> is that - fresh out of the box - even 'strong' world-describing
> ontologies don't express these kinds of rules. They'll tell you about
> types of things, types of property and relationship, alongside
> patterns of agreed meaning for talking and reasoning about them,
> including bundles of facts that can't [or that must] be simultaneously
> true. So they'll tell you about the world but they won't tell you how
> to talk about the world. If we leave things there, the data might be
> logically consistent, expressing no contradictions, but it can still
> be low quality. This is because there are a thousand ways to screw up
> data beyond contradicting yourself.

Great stuff.
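Rules of the kind Dan lists are about the shape of a description rather than about people or groups. A minimal sketch in Python follows. It assumes a description is a plain dict, and the field names ("type", "deceased", "dateOfDeath", "founder", "founderName", "founderDescription") are hypothetical, chosen only to mirror his off-the-cuff examples. This is not DCAM or the Description Set Profile language, just an illustration of a description-level check.

```python
# Minimal sketch of description-level rules (application-profile style),
# as opposed to ontological claims about the world.
# Assumption: all field names below are hypothetical.

def check_description(desc):
    """Return a list of problems with how this description is written."""
    problems = []
    # "When you mention a person and they are no longer alive,
    #  mention their date of death."
    if desc.get("type") == "Person" and desc.get("deceased") and "dateOfDeath" not in desc:
        problems.append("deceased person described without a dateOfDeath")
    # "If you mention a group and you know its founder, mention their
    #  name and give an identifying description of them."
    if desc.get("type") == "Group" and "founder" in desc:
        for field in ("founderName", "founderDescription"):
            if field not in desc:
                problems.append(f"group names a founder but lacks {field}")
    return problems

# Both descriptions are logically consistent, yet incomplete by these rules.
print(check_description({"type": "Person", "name": "Ada", "deceased": True}))
print(check_description({"type": "Group", "founder": "ex:p1", "founderName": "Ada"}))
```

The check treats a missing field as genuinely missing, i.e. it reasons over the description as a closed record. An open-world RDFS/OWL ontology does not do that, which is why such rules sit more naturally in an application profile than in the ontology.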

> >           However, for the data to be good and
> > consistent, it does not follow that the underlying vocabularies
> > themselves must necessarily carry heavy ontological baggage.
> 
> Measuring complexity is hard; it's like a lump in the carpet that pops
> up somewhere else when you try to flatten it away. 

LOL! :-)

> > internationally recognized standards.  I do not see this
> > changing.
...
> itself are all changing. It's happening slowly enough that we don't
> need to panic, but fast enough that I'm wary about asserting that
> anything will remain unchanged.

Okay then, "I do not see this changing" = "I'm not actually
seeing this change as we speak" :-)

> > My point is that it is not necessarily strongly specified
> > ontologies that will buy that precision, whereas strongly
> > specified ontologies _will_ impose their ontological baggage
> > on any downstream consumers of that data.
> 
> Depends what you mean by 'strongly specified ontologies'. Sorry to
> keep blabbing about FOAF but I can share some experience maybe. Lots
> of consumers of FOAF data don't even read the human-facing spec, let
> alone parse the RDF schema to discover the 'strong' OWL claims
> (disjointness, inverses) or weaker RDFS information (domain/range,
> subclass). They often don't even use a 'proper' XML parser, let alone
> get triples via RDF/XML parsing. So the claim that using rich
> ontological modelling upstream makes work for those downstream, I
> don't buy. Usually the underlying ontologies are ignored, and people
> take the data at some kind of face value. 

That's a good point, and I do not doubt that people usually
ignore the rich modeling bits in practice (and ignoring
them does not "make work").  What I had in mind is that if a
consumer downstream were to run some data through a reasoner,
dereferencing strongly specified namespace documents, those
rich modeling bits could kick up some confusing surprises.

>                                             There is a variant issue
> though: if your model in terms of entities and relationships is
> rich/strong/complex/powerful, it probably makes a pile of distinctions
> that show up in your data even if consumers aren't using RDF/OWL, in
> that there will be more terms, identifiers etc (class and property
> names and however they manifest themselves syntactically, in XML,
> HTML, JSON, CSV etc). But that's much more about levels of detail than
> about the 'strength' of some ontological content.

As I understand it, the FR and RDA approach is indeed to coin
separate properties for each of the WEMI (Work, Expression,
Manifestation, Item) entities in parallel, e.g.,
manifestationTitle, expressionTitle...
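The "confusing surprises" a reasoner can kick up can be sketched in Python. The sketch rests on hypothetical assumptions that are not quoted from the FR or RDA element sets: manifestationTitle is assumed to have domain Manifestation, expressionTitle domain Expression, and the two classes are assumed to be declared disjoint. A downstream consumer who flattens records and attaches both titles to one resource then gets an inconsistency rather than a merely redundant record.

```python
# Minimal sketch: declared domains plus class disjointness turning a
# harmless-looking merge into an inconsistency.
# Assumption: the domains and the disjointness axiom below are hypothetical.

DOMAINS = {
    "ex:manifestationTitle": "ex:Manifestation",
    "ex:expressionTitle": "ex:Expression",
}
DISJOINT = {frozenset({"ex:Manifestation", "ex:Expression"})}

def types_by_subject(triples):
    """Infer each subject's classes from the domains of the properties it uses."""
    types = {}
    for s, p, _o in triples:
        if p in DOMAINS:
            types.setdefault(s, set()).add(DOMAINS[p])
    return types

def clashes(triples):
    """Return subjects whose inferred classes include a disjoint pair."""
    found = set()
    for s, classes in types_by_subject(triples).items():
        for pair in DISJOINT:
            if pair <= classes:  # both disjoint classes were inferred for s
                found.add(s)
    return sorted(found)

# A consumer flattening records might attach both titles to one resource.
merged = [
    ("ex:item42", "ex:manifestationTitle", "Moby-Dick"),
    ("ex:item42", "ex:expressionTitle", "Moby-Dick"),
    ("ex:item43", "ex:manifestationTitle", "Walden"),
]
print(clashes(merged))  # ['ex:item42']
```

Without the domain and disjointness declarations, the same merged data would simply be imprecise; with them, a reasoner reports ex:item42 as inconsistent.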

> This seems an important thing to discuss. I've seen things previously
> ascribed to FRBR which sound like they are data integrity / discipline
> rules, but are rather awkwardly manifesting themselves as deeper
> ontological claims (eg. about persons, subjects, and which can be
> what...). 

So nicely put...

>           As rules about what you might find in a certain kind of
> FRBR-approved description, those rules are very valuable; considered
> as observations about a world that will be further described by other
> independent parties, they can seem quirky since they assume a kind of
> closed world in which FRBR is the only party who gets to make
> ontological rules. Sorry not to back this up with a detailed example -
> I think I'm thinking of some of the issues Karen has previously
> blogged on.

Tom

-- 
Thomas Baker <tbaker@tbaker.de>
Received on Monday, 16 August 2010 15:34:21 UTC