- From: Nick Matsakis <matsakis@mit.edu>
- Date: Thu, 10 Apr 2003 13:40:42 -0400 (EDT)
- To: www-rdf-dspace@w3.org
Comments on Section 3: Motivating Problems

In section 3.1, metadata versioning is included with metadata extraction. Perhaps that area should get its own bullet point?

This mailing list has had something of a straw man debate between "validation" and "best effort", with both sides agreeing that data should be validated and that software should be robust to invalid data. More contentious, perhaps, is David's statement that "[validation] that forbids human beings from entering the data they want to enter because it isn't 'valid' is a system humans will be unwilling to use." I don't believe this is the case; for better or for worse, human beings certainly are willing to put up with a lot of restrictions from their computers. Even if this weren't the case, in a description language as flexible as RDF, validation should never prevent users from entering the data they want to enter; they are always free to create their own ontology and associated metadata, with the understanding that existing tools may be inadequate for working with it. The authors of those tools, though, need to be able to make certain assumptions about the data, and this is where validation comes into play. It seems more practical to validate every time we write data than every time we read it.

In section 3.2.4, Relationships, controlled vocabularies seem to be placed on an equal footing with schemas. However, there is little mention of controlled vocabularies before this point. N.B. 3.2.2, where schema discovery/creation/evolution and so forth are discussed. Is this to imply that discovery and versioning of controlled vocabularies is not a topic of interest? Or is a controlled vocabulary merely to be considered a restricted form of schema, so that the same mechanisms that deal with versioning schemas can be used to version controlled vocabularies?

Section 3.2.5 discusses merging.
I think it is important to distinguish the two subproblems in merging: identifying records to be merged, and merging the components themselves. This section talks largely about the second subproblem, which is very similar to the mapping problems outlined in 3.2.4. It might be good to give some balance to the first part of the problem as well. In particular, you may wish to discuss the possibilities of automatically detecting records to be merged upon ingestion.

Section 3.2.7 discusses naming. Not to add more to the plate, but we may wish to address the question of distributed naming to some degree. Unique names are typically doled out by centralized authorities (e.g. social security numbers, ISBN numbers, domain names...). However, how does one locally come up with a name for a global resource? For example, suppose two libraries cataloged photographs of the Eiffel Tower. How do they name "The Eiffel Tower" in such a way that a user looking for photographs of the tower can find them? Or, if a community such as the architecture dept. has a name authority for things like buildings, how do other communities discover it?

3.5.1: I'm not sure I follow the example of "I have lots of instances. Please suggest a schema". How can we specify instances to the computer without some schema? I think the problem is really one of bootstrapping; we may have a set of instances implicitly labelled with some weak schema such as "text files" and may want to promote them to a more specific or powerful schema (email messages, citations, addresses).
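
As a rough illustration of the bootstrapping idea, here is a minimal sketch in Python. It takes instances known only as "text" and promotes the ones that look like email messages. The function name `promote`, the header pattern, and the "schema" labels are all invented for this example and do not come from the document under discussion; real detection would need to be far more careful.

```python
import re

# Hypothetical heuristic: a line that starts with a well-known header name.
# [ \t]* (rather than \s*) keeps the match on a single line.
HEADER = re.compile(r"^(From|To|Date|Subject):[ \t]*(.+)$", re.MULTILINE)

def promote(text):
    """Promote a plain-text instance to an 'email' record when possible.

    Assumption for this sketch: anything with both a From and a Date
    header counts as an email message; everything else stays 'text'.
    """
    headers = {name.lower(): value.strip()
               for name, value in HEADER.findall(text)}
    if "from" in headers and "date" in headers:
        return {"schema": "email", **headers}
    return {"schema": "text", "body": text}

if __name__ == "__main__":
    samples = [
        "From: someone@example.org\nDate: Thu, 10 Apr 2003\n\nHello",
        "just some notes",
    ]
    for sample in samples:
        print(promote(sample)["schema"])
```

The point is only that the weak schema ("text") is what lets us hand the instances to the computer in the first place; the promotion step is where a stronger schema gets suggested.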
Received on Thursday, 10 April 2003 13:40:43 UTC