Comments on Section 3

Comments on Section 3: Motivating Problems
In section 3.1, metadata versioning is included with metadata
extraction.  Perhaps that area should get its own bullet point?

This mailing list has had somewhat of a straw man debate between
"validation" and "best effort", with both sides agreeing that data should
be validated and that software should be robust to invalid data.  More
contentious, perhaps, is David's statement that "[validation] that forbids
human beings from entering the data they want to enter because it isn't
'valid' is a system humans will be unwilling to use."

I don't believe this is the case; for better or for worse, human beings
certainly are willing to put up with a lot of restrictions from their
computers.  Even if this weren't the case, in a description language as
flexible as RDF, validation should never prevent a user from entering the
data they want to enter; they are always free to create their own ontology
and associated metadata, with the understanding that existing tools may be
inadequate for working with it.  The authors of those tools, though, need
to be able to make certain assumptions about the data and this is where
validation comes into play.  It seems more practical to validate every
time we write data than every time we read it.
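
A minimal sketch of the validate-on-write idea, in Python.  The required
fields and the MetadataStore class are invented for illustration; they are
not taken from the document under discussion.  The point is only that the
check runs once, at write time, so code that reads can assume it passed.

```python
# Hypothetical sketch: validate once on write so readers can trust the store.
REQUIRED_FIELDS = {"title", "creator"}  # assumed schema, for illustration only


class ValidationError(ValueError):
    """Raised when a record does not satisfy the assumed schema."""


def validate(record):
    # Report every missing required field in one error message.
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        raise ValidationError("missing fields: " + ", ".join(sorted(missing)))


class MetadataStore:
    def __init__(self):
        self._records = []

    def write(self, record):
        validate(record)  # the only place validation happens
        self._records.append(dict(record))

    def titles(self):
        # Readers rely on the write-time check; nothing is re-validated here.
        return [r["title"] for r in self._records]
```

A user who wants fields outside this schema is not blocked in principle;
they would write to a store with a different (their own) schema.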

In section 3.2.4, Relationships, controlled vocabularies seem to be
placed on an equal footing with schemas.  However, there is little mention
of controlled vocabularies before this point (N.B. 3.2.2, where schema
discovery, creation, evolution and so forth are discussed).  Is this to
imply that discovery and versioning of controlled vocabularies are not
topics of interest?  Or is a controlled vocabulary merely to be considered
a restricted form of schema, so that the same mechanisms that deal with
versioning schemas can be used to version controlled vocabularies?

Section 3.2.5 discusses merging.  I think it is important to distinguish
the two subproblems in merging: identifying the records to be merged and
merging the components themselves.  This section talks largely about the
second subproblem, which is very similar to the mapping problems outlined
in 3.2.4.  It might be good to give some balance to the first part of the
problem as well.  In particular, you may wish to discuss the
possibilities of automatically detecting records to be merged upon
ingestion.
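
To make the first subproblem concrete, here is a rough Python sketch of
flagging merge candidates at ingestion time.  The Catalogue class and the
title-based match key are assumptions made up for this example; a real
system would need a far better notion of "same record" than a normalised
title.

```python
import re
from collections import defaultdict


def match_key(record):
    """Crude normalised key: lowercase title, punctuation and spacing collapsed."""
    return re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()


class Catalogue:
    def __init__(self):
        # Maps a match key to every record ingested under that key.
        self._by_key = defaultdict(list)

    def ingest(self, record):
        """Store the record and return earlier records that share its key."""
        key = match_key(record)
        candidates = list(self._by_key[key])
        self._by_key[key].append(record)
        return candidates
```

The returned candidates would then be handed to whatever performs the
second subproblem, the merging of the components themselves.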

Section 3.2.7 discusses naming.  Not to add more to the plate, but we may
wish to address the question of distributed naming to some degree.  Unique
names are typically doled out by centralized authorities (e.g. social
security numbers, ISBN numbers, domain names...).  However, how does one
locally come up with a name for a global resource? For example, suppose
two libraries cataloged photographs of the Eiffel Tower.  How do they name
"The Eiffel Tower" in such a way that a user looking for photographs of
the tower can find them? Or, if a community such as the architecture dept.
has a name authority for things like buildings, how do other communities
discover it?

In section 3.5.1, I'm not sure I follow the example of "I have lots of
instances.  Please suggest a schema".  How can we specify instances to the
computer without some schema?  I think the problem is really one of
bootstrapping; we may have a set of instances implicitly labelled with some
weak schema
such as "text files" and may want to promote them to a more specific or
powerful schema (email messages, citations, addresses).
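
A toy Python sketch of that kind of promotion, starting from instances
known only as text.  The two heuristics (header lines for email, a
parenthesised year for citations) and the function name are assumptions
chosen for illustration, not a proposal for how schema suggestion should
actually work.

```python
import re

# Assumed heuristics, for illustration only.
EMAIL_HEADER = re.compile(r"^(From|To|Subject|Date):", re.MULTILINE)
CITATION_YEAR = re.compile(r"\(\d{4}\)")


def suggest_schema(text):
    """Suggest a more specific schema for an instance known only as text."""
    headers = {m.group(1) for m in EMAIL_HEADER.finditer(text)}
    if {"From", "Subject"} <= headers:
        return "email message"
    if CITATION_YEAR.search(text):
        return "citation"
    return "text file"  # nothing matched; keep the weak schema
```

Instances that match neither heuristic simply stay under the weak "text
file" schema they started with.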

Received on Thursday, 10 April 2003 13:40:43 UTC