From: Kevin Smathers <kevin.smathers@hp.com>
Date: Mon, 16 Jun 2003 16:03:11 -0700
Message-ID: <3EEE4CAF.70007@hp.com>
To: www-rdf-dspace <www-rdf-dspace@w3.org>
Hi all,

More to follow, but here is my first set of comments from a quick
review of the current document status.

--
========================================================
Kevin Smathers                kevin.smathers@hp.com
Hewlett-Packard               kevin@ank.com
Palo Alto Research Lab
1501 Page Mill Rd.            650-857-4477 work
M/S 1135                      650-852-8186 fax
Palo Alto, CA 94304           510-247-1031 home
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");

Index: technologies.tex
===================================================================
RCS file: /cvs/simile/docs/relevantTechnologies/technologies.tex,v
retrieving revision 1.1
diff -c -r1.1 technologies.tex
*** technologies.tex	16 Jun 2003 15:56:42 -0000	1.1
--- technologies.tex	16 Jun 2003 23:01:10 -0000
***************
*** 69,82 ****
! either from other existing metadata and content. An example here is
the extraction of embedded track information from an MP3 file.
\item [Dynamic Metadata] There is a distinction between extracting dynamic
the metadata out of an asset. Dynamic metadata may change over time,
and must be verified by crosscheck with the source for that data for
example RSS feeds. Note Haystack allows users to do event
! subscriptions on changes to an underlying RDF statement. If you abstract
beyond the statement to where the statement came from, then the statement
should be updated any time the data source is updated. Copying out
metadata on the other hand is more useful for relatively static assets. Print analogues,
--- 69,82 ----
! from other existing metadata and content. An example here is
the extraction of embedded track information from an MP3 file.
\item [Dynamic Metadata] There is a distinction between extracting dynamic
the metadata out of an asset. Dynamic metadata may change over time,
and must be verified by crosscheck with the source for that data for
example RSS feeds. Note Haystack allows users to do event
! subscriptions on changes to an underlying RDF statement.  If you abstract
beyond the statement to where the statement came from, then the statement
should be updated any time the data source is updated. Copying out
metadata on the other hand is more useful for relatively static assets. Print analogues,
***************
*** 121,142 ****
set of descriptors. It may optionally provide
a thesaurus of descriptors,  synonyms, preferred usage terms,
relationships among terms and aids to
! selecting the best terms \cite{whatisacontrolledvocab}.
! \item [Ontology] In general terms, an
! ontology is an explicit specification of a conceptualization. The
! term is borrowed from philosophy, where an ontology is a systematic
! account of existence. When the knowledge of a
! domain is represented in a declarative formalism, the set of objects
! that can be represented is called the universe of discourse. This
! set of objects, and the describable relationships among them, are
! reflected in the representational vocabulary with which a
! knowledge-based program represents knowledge. Thus, in the context
! of AI, we can describe the ontology of a program by defining a set
! of representational terms. In such an ontology, definitions associate
! the names of entities in the universe of discourse (e.g., classes,
! relations, functions, or other objects) with human-readable text
! describing what the names mean, and formal axioms that constrain
! the interpretation and well-formed use of these terms. Formally,
an ontology is the statement of a logical theory \cite{whatisanontology}.
However, as we are using ``ontology languages'', ontology may
have a very specific meaning that is determined by the particular
--- 121,130 ----
set of descriptors. It may optionally provide
a thesaurus of descriptors,  synonyms, preferred usage terms,
relationships among terms and aids to
! selecting the best terms \cite{whatisacontrolledvocab}.  For example,
! expert systems have long aided human diagnosis of illness in the
! medical industry.
! \item [Ontology] Formally,
an ontology is the statement of a logical theory \cite{whatisanontology}.
However, as we are using ``ontology languages'', ontology may
have a very specific meaning that is determined by the particular
***************
*** 379,385 ****
vocabulary items in one vocabulary to the other. There are other possible
complexities here: for example subdivisions when
a term in one vocabulary maps onto multiple terms in the other will require
! human intervention to disambiguate.
\end{description}

\subsection{Semantic Validation}
--- 367,373 ----
vocabulary items in one vocabulary to the other. There are other possible
complexities here: for example subdivisions when
a term in one vocabulary maps onto multiple terms in the other will require
\end{description}

\subsection{Semantic Validation}
***************
*** 453,459 ****
\end{itemize}

! There are two issues here. One is "how are things named" and the
obvious answer is "URIs".  The second one is "Should there be
canonical URIs that can be deduced from what you are looking for"?
It is proposed that the answer to the second is no. URIs should be
--- 441,447 ----
\end{itemize}

! There are two issues here.  One is "how are things named" and the
obvious answer is "URIs".  The second one is "Should there be
canonical URIs that can be deduced from what you are looking for"?
It is proposed that the answer to the second is no. URIs should be
***************
*** 465,472 ****
metadata on the object, and any knowledge of how to construct the URL
can then be turned into a specification of its metadata.

One way that two parties can independently
! come up with names for a given resource is to use the MD5 hash of a
collection of bits. This only applies to static resources that are, in
some sense, entirely bits. Still, there are a lot of interesting
resources that could be viewed this way, including audio CDs, DVDs,
--- 453,472 ----
metadata on the object, and any knowledge of how to construct the URL
can then be turned into a specification of its metadata.

+ [[Notwithstanding the argument above, I'm not aware of any successful
+ use of meaningless URIs on the Internet.  Naming systems are inherently
+ non-arbitrary, and whenever there have been meaningless names, we have
+ had to add a layer of meaningful names that maps onto them.  Witness
+ UFS path names for inodes and DNS names for IP addresses; likewise,
+ tinyurl.com is rapidly gaining acceptance for complex web URLs.  Finding
+ a resource is not analogous to naming, since a search can return
+ ambiguous results, while a name should unambiguously identify the
+ resource it names.  Perhaps this is only an argument that SIMILE needs
+ a layer which maps user-legible names to opaque URLs, but that was the
+ point of this naming discussion in the first place.]]
+
One way that two parties can independently
! come up with names for a given resource is to use the SHA1 hash of a
collection of bits. This only applies to static resources that are, in
some sense, entirely bits. Still, there are a lot of interesting
resources that could be viewed this way, including audio CDs, DVDs,
***************
*** 474,493 ****
and less formally email messages and digital
photographs. Dynamic content and non-digital resources, like the
Eiffel Tower, cannot be named in this way.
! The nice thing about MD5 URLs is
that they provide a canonical naming rule that reduces the odds of
getting multiple names for the same object, reducing need for

! One problem with MD5 sums is that the contents of the URL become
! immutably linked to the URL itself.
! Invariant documents have lots of nice features; distribution, cache
! control, and cache verification become trivial, but on the down side
! there is no consistent address for the top of tree of a document history.
If you want to be able to modify your document after publishing its
! MD5-sum URL, then you will need other mechanisms to deal with this.

! The other problem is when we are using URLs to describe documents
and their subcomponents i.e.
identifying resources smaller than the atomic document.
Doing this with a URL is arguably convenient, in that it permanently
--- 474,501 ----
and less formally email messages and digital
photographs. Dynamic content and non-digital resources, like the
Eiffel Tower, cannot be named in this way.
! The nice thing about SHA1 URLs is
that they provide a canonical naming rule that reduces the odds of
getting multiple names for the same object, reducing need for

! One problem with SHA1 sums is that the contents of the URL become
! immutably linked to the URL itself.  A second problem is that URLs
! that consist solely of content-based identifiers are incapable of
! identifying distinct instances of documents with identical
! content.  Last, and perhaps most importantly, there is no consistent
! address for the top of the tree of a document history, since the URL
! of each version will vary uncontrollably with its content.
If you want to be able to modify your document after publishing its
! SHA1-sum URL, then you will need other mechanisms to deal with this.
!
! On the other hand, invariant documents have lots of nice features;
! distribution, cache control, and cache verification become trivial.
! A combination of an instance-based reference name with an optional
! content-based identifier tacked on the end should provide the best
! of both worlds.

! The other naming problem is when we are using URLs to describe documents
and their subcomponents i.e.
identifying resources smaller than the atomic document.
Doing this with a URL is arguably convenient, in that it permanently
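
A minimal Python sketch of the SHA1 naming rule and of the suggested combination of an instance-based reference name with an optional content-based identifier tacked on the end. It assumes several illustrative conventions that are not SIMILE or URI-standard conventions: a hypothetical `;sha1=` separator, hypothetical example.org instance URIs, and a hypothetical `NameRegistry` class that stands in for the layer mapping user-legible names to opaque URLs.

```python
import hashlib
from typing import Dict, Optional, Tuple

# Hypothetical separator for appending a content identifier to an instance name.
CONTENT_SEP = ";sha1="


def sha1_content_id(data: bytes) -> str:
    """Content-based identifier: the hex SHA-1 digest of the raw bits."""
    return hashlib.sha1(data).hexdigest()


def make_reference(instance_uri: str, data: Optional[bytes] = None) -> str:
    """Instance-based name, optionally followed by a content-based identifier."""
    if data is None:
        return instance_uri
    return instance_uri + CONTENT_SEP + sha1_content_id(data)


def split_reference(ref: str) -> Tuple[str, Optional[str]]:
    """Split a reference into (instance URI, content id or None)."""
    base, sep, content_id = ref.rpartition(CONTENT_SEP)
    if not sep:
        return ref, None
    return base, content_id


def verify(ref: str, data: bytes) -> bool:
    """True if ref carries no content id, or if its content id matches data."""
    _, content_id = split_reference(ref)
    return content_id is None or content_id == sha1_content_id(data)


class NameRegistry:
    """Hypothetical layer mapping user-legible names to opaque instance URIs,
    in the spirit of DNS names over IP addresses."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}

    def bind(self, name: str, instance_uri: str) -> None:
        self._names[name] = instance_uri

    def resolve(self, name: str) -> str:
        return self._names[name]


if __name__ == "__main__":
    v1, v2 = b"draft one\n", b"draft two\n"
    doc = "http://example.org/doc/42"   # hypothetical instance URI
    copy = "http://example.org/doc/43"  # a distinct instance with the same bits
    ref1 = make_reference(doc, v1)
    assert verify(ref1, v1) and not verify(ref1, v2)
    # Identical content, distinct instances: same content id, different names.
    assert split_reference(make_reference(copy, v1))[1] == split_reference(ref1)[1]
    assert make_reference(copy, v1) != ref1
    # A new version changes the content id but keeps the instance name.
    ref2 = make_reference(doc, v2)
    assert split_reference(ref2)[0] == split_reference(ref1)[0] == doc
    registry = NameRegistry()
    registry.bind("smathers/naming-notes", doc)
    print(registry.resolve("smathers/naming-notes"))
```

The instance name stays fixed across versions, so it can act as the consistent top-of-tree address. The optional content identifier still lets a cache verify exactly which bits it holds.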

Received on Monday, 16 June 2003 19:04:03 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 20:13:06 UTC