Comments on Motivating problems from Kevin Smathers on 2003-06-16 (www-rdf-dspace@w3.org from June 2003)

From: Kevin Smathers <kevin.smathers@hp.com>
Date: Mon, 16 Jun 2003 16:03:11 -0700
To: www-rdf-dspace <www-rdf-dspace@w3.org>
Message-ID: <3EEE4CAF.70007@hp.com>

Hi all,

More to follow, but here is my first set of comments from a quick 
review of the current document status.


-- 
========================================================
   Kevin Smathers                kevin.smathers@hp.com    
   Hewlett-Packard               kevin@ank.com            
   Palo Alto Research Lab                                 
   1501 Page Mill Rd.            650-857-4477 work        
   M/S 1135                      650-852-8186 fax         
   Palo Alto, CA 94304           510-247-1031 home        
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");

Index: technologies.tex
===================================================================
RCS file: /cvs/simile/docs/relevantTechnologies/technologies.tex,v
retrieving revision 1.1
diff -c -r1.1 technologies.tex
*** technologies.tex	16 Jun 2003 15:56:42 -0000	1.1
--- technologies.tex	16 Jun 2003 23:01:10 -0000
***************
*** 69,82 ****
  of a metadata schema. It is the process of adding to or modifying the metadata 
  of a long-term electronic record without degrading the evidentiary status of the metadata \cite{victoriametadata}.
  \item [Metadata extraction] Metadata extraction refers to the extraction and codification of metadata
! either from other existing metadata and content. An example here is
  the extraction of embedded track information from an MP3 file. 
  \item [Dynamic Metadata] There is a distinction between extracting dynamic 
  metadata, and copying
  the metadata out of an asset. Dynamic metadata may change over time,
  and must be verified by crosscheck with the source for that data for 
  example RSS feeds. Note Haystack allows users to do event
! subscriptions on changes to an underlying RDF statement. If you abstract
  beyond the statement to where the statement came from, then the statement
  should be updated any time the data source is updated. Copying out
  metadata on the other hand is more useful for relatively static assets. Print analogues,
--- 69,82 ----
  of a metadata schema. It is the process of adding to or modifying the metadata 
  of a long-term electronic record without degrading the evidentiary status of the metadata \cite{victoriametadata}.
  \item [Metadata extraction] Metadata extraction refers to the extraction and codification of metadata
! from other existing metadata and content. An example here is
  the extraction of embedded track information from an MP3 file. 
  \item [Dynamic Metadata] There is a distinction between extracting dynamic 
  metadata, and copying
  the metadata out of an asset. Dynamic metadata may change over time,
  and must be verified by crosscheck with the source for that data for 
  example RSS feeds. Note Haystack allows users to do event
! subscriptions on changes to an underlying RDF statement.  If you abstract
  beyond the statement to where the statement came from, then the statement
  should be updated any time the data source is updated. Copying out
  metadata on the other hand is more useful for relatively static assets. Print analogues,
***************
*** 121,142 ****
  set of descriptors. It may optionally provide 
  a thesaurus of descriptors,  synonyms, preferred usage terms, 
  relationships among terms and aids to 
! selecting the best terms \cite{whatisacontrolledvocab}. 
! \item [Ontology] In general terms, an 
! ontology is an explicit specification of a conceptualization. The 
! term is borrowed from philosophy, where an ontology is a systematic 
! account of existence. When the knowledge of a 
! domain is represented in a declarative formalism, the set of objects 
! that can be represented is called the universe of discourse. This 
! set of objects, and the describable relationships among them, are 
! reflected in the representational vocabulary with which a 
! knowledge-based program represents knowledge. Thus, in the context 
! of AI, we can describe the ontology of a program by defining a set 
! of representational terms. In such an ontology, definitions associate 
! the names of entities in the universe of discourse (e.g., classes, 
! relations, functions, or other objects) with human-readable text 
! describing what the names mean, and formal axioms that constrain 
! the interpretation and well-formed use of these terms. Formally, 
  an ontology is the statement of a logical theory \cite{whatisanontology}.
  However as we are using ``ontology languages'', ontology may
  have a very specific meaning that is determined by the particular
--- 121,130 ----
  set of descriptors. It may optionally provide 
  a thesaurus of descriptors,  synonyms, preferred usage terms, 
  relationships among terms and aids to 
! selecting the best terms \cite{whatisacontrolledvocab}.  For example,
! expert systems have long aided human diagnosis of illness in the 
! medical industry.
! \item [Ontology] Formally, 
  an ontology is the statement of a logical theory \cite{whatisanontology}.
  However as we are using ``ontology languages'', ontology may
  have a very specific meaning that is determined by the particular
***************
*** 379,385 ****
  vocabulary items in one vocabulary to the other. There are other possible
  complexities here: for example subdivisions when
  a term in one vocabulary maps onto multiple terms in the other will require
! human intervention to disambiguate. 
  \end{description}
  
  \subsection{Semantic Validation}
--- 367,373 ----
  vocabulary items in one vocabulary to the other. There are other possible
  complexities here: for example subdivisions when
  a term in one vocabulary maps onto multiple terms in the other will require
! human intervention or additional metadata to disambiguate.  
  \end{description}
  
  \subsection{Semantic Validation}
***************
*** 453,459 ****
  about the same thing?
  \end{itemize}
  
! There are two issues here. One is "how are things named" and the
  obvious answer is "URIs".  The second one is "Should there be
  canonical URIs that can be deduced from what you are looking for"?  
  It is proposed that the answer to the second is no. URIs should be 
--- 441,447 ----
  about the same thing?
  \end{itemize}
  
! There are two issues here.  One is "how are things named" and the
  obvious answer is "URIs".  The second one is "Should there be
  canonical URIs that can be deduced from what you are looking for"?  
  It is proposed that the answer to the second is no. URIs should be 
***************
*** 465,472 ****
  metadata on the object, and any knowledge of how to construct the URL
  can then be turned into a specification of its metadata.
  
  One way that two parties can independently
! come up with names for a given resource is to use the MD5 hash of a
  collection of bits. This only applies to static resources that are, in
  some sense, entirely bits. Still, there are a lot of interesting
  resources that could be viewed this way, including audio CDs, DVDs,
--- 453,472 ----
  metadata on the object, and any knowledge of how to construct the URL
  can then be turned into a specification of its metadata.
  
+ [[Notwithstanding the argument above, I'm not aware of any successful
+ use of meaningless URIs on the Internet.  Naming systems are inherently
+ non-arbitrary, and wherever there have been meaningless names, we have
+ had to add a layer of meaningful names that maps onto them.
+ Witness UFS for inodes and DNS for IP addresses; tinyurl.com is also
+ rapidly gaining acceptance for complex web URLs.  Finding a resource is
+ not analogous to naming, since a search can return ambiguous results
+ while a name should unambiguously identify the resource it names.
+ Perhaps this is only an argument that SIMILE needs a layer which maps
+ user-legible names to opaque URLs, but that was the point of this
+ naming discussion in the first place.]]
+ 
  One way that two parties can independently
! come up with names for a given resource is to use the SHA1 hash of a
  collection of bits. This only applies to static resources that are, in
  some sense, entirely bits. Still, there are a lot of interesting
  resources that could be viewed this way, including audio CDs, DVDs,
***************
*** 474,493 ****
  and less formally email messages and digital
  photographs. Dynamic content and non-digital resources, like the
  Effiel Tower, cannot be named in this way.
! The nice thing about MD5 URLs is
  that they provide a canonical naming rule that reduces the odds of
  getting multiple names for the same object, reducing need for
  inference about equivalence.
  
! One problem with MD5 sums is that the contents of the URL become
! immutably linked to the URL itself.  
! Invariant documents have lots of nice features; distribution, cache 
! control, and cache verification become trivial, but on the down side
! there is no consistent address for the top of tree of a document history.
  If you want to be able to modify your document after publishing its
! MD5-sum URL, then you will need other mechanisms to deal with this. 
  
! The other problem is when we are using URLs to describe documents
  and their subcomponents i.e. 
  identifying resources smaller than the atomic document.   
  Doing this with a URL is arguably convenient, in that it permanently 
--- 474,501 ----
  and less formally email messages and digital
  photographs. Dynamic content and non-digital resources, like the
  Effiel Tower, cannot be named in this way.
! The nice thing about SHA1 URLs is
  that they provide a canonical naming rule that reduces the odds of
  getting multiple names for the same object, reducing need for
  inference about equivalence.
  
! One problem with SHA1 sums is that the contents of the URL become
! immutably linked to the URL itself.  A second problem is that URLs
! that consist solely of content-based identifiers are incapable of
! identifying distinct instances of documents with identical
! content.  Last, and perhaps most importantly, there is no consistent
! address for the most recent version of a document, as the URL for
! each version will vary uncontrollably with its content.
  If you want to be able to modify your document after publishing its
! SHA1-sum URL, then you will need other mechanisms to deal with this. 
! 
! On the other hand, invariant documents have lots of nice features:
! distribution, cache control, and cache verification become trivial.
! A combination of an instance-based reference name with an optional
! content-based identifier tacked on the end should provide the best
! of both worlds.
  
! The other naming problem is when we are using URLs to describe documents
  and their subcomponents i.e. 
  identifying resources smaller than the atomic document.   
  Doing this with a URL is arguably convenient, in that it permanently
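
The combined naming scheme proposed in the diff (an instance-based
reference name with an optional content-based identifier tacked on the
end) can be sketched in Python. This is a minimal illustration under
stated assumptions, not SIMILE code: the base URI
http://simile.example.org/instance/ and the ";sha1=" suffix syntax are
hypothetical, and the digest comes from Python's standard hashlib module.

```python
import hashlib
from typing import Optional

# Hypothetical base for instance-based names; not a real SIMILE namespace.
INSTANCE_BASE = "http://simile.example.org/instance/"
# Hypothetical separator introducing the optional content-based identifier.
SHA1_SEP = ";sha1="


def content_id(data: bytes) -> str:
    """Content-based identifier: hex SHA1 digest of the raw bits."""
    return hashlib.sha1(data).hexdigest()


def make_name(instance: str, data: Optional[bytes] = None) -> str:
    """Instance-based reference name, optionally suffixed with the
    content-based identifier of the given bits."""
    name = INSTANCE_BASE + instance
    if data is not None:
        name += SHA1_SEP + content_id(data)
    return name


def instance_part(name: str) -> str:
    """Drop the optional content identifier, leaving the stable instance
    name that can keep addressing the most recent version."""
    return name.split(SHA1_SEP, 1)[0]


def verify(name: str, data: bytes) -> bool:
    """Cache verification: recompute the digest and compare it with the
    identifier embedded in the name. Names without one never verify."""
    _, sep, digest = name.partition(SHA1_SEP)
    return bool(sep) and digest == content_id(data)
```

Two copies of the same bits (make_name("report-17", b"draft text") and
make_name("report-18", b"draft text")) share a digest but keep distinct
instance names, while a revised report-17 keeps its instance prefix and
only its digest changes.  That is how the combination is meant to avoid
the problems listed for pure SHA1 URLs while keeping cache verification
trivial.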

Received on Monday, 16 June 2003 19:04:03 UTC