- From: Kevin Smathers <kevin.smathers@hp.com>
- Date: Mon, 16 Jun 2003 16:03:11 -0700
- To: www-rdf-dspace <www-rdf-dspace@w3.org>
- Message-ID: <3EEE4CAF.70007@hp.com>
Hi all,

More to follow, but here is my first set of comments from a quick review of the current document status.

--
========================================================
Kevin Smathers                kevin.smathers@hp.com
Hewlett-Packard               kevin@ank.com
Palo Alto Research Lab
1501 Page Mill Rd.            650-857-4477 work
M/S 1135                      650-852-8186 fax
Palo Alto, CA 94304           510-247-1031 home
========================================================
use "Standard::Disclaimer";
carp("This message was printed on 100% recycled bits.");
Index: technologies.tex =================================================================== RCS file: /cvs/simile/docs/relevantTechnologies/technologies.tex,v retrieving revision 1.1 diff -c -r1.1 technologies.tex *** technologies.tex 16 Jun 2003 15:56:42 -0000 1.1 --- technologies.tex 16 Jun 2003 23:01:10 -0000 *************** *** 69,82 **** of a metadata schema. It is the process of adding to or modifying the metadata of a long-term electronic record without degrading the evidentiary status of the metadata \cite{victoriametadata}. \item [Metadata extraction] Metadata extraction refers to the extraction and codification of metadata ! either from other existing metadata and content. An example here is the extraction of embedded track information from an MP3 file. \item [Dynamic Metadata] There is a distinction between extracting dynamic metadata, and copying the metadata out of an asset. Dynamic metadata may change over time, and must be verified by crosscheck with the source for that data for example RSS feeds. Note Haystack allows users to do event ! subscriptions on changes to an underlying RDF statement. If you abstract beyond the statement to where the statement came from, then the statement should be updated any time the data source is updated. Copying out metadata on the other hand is more useful for relatively static assets. Print analogues, --- 69,82 ---- of a metadata schema. It is the process of adding to or modifying the metadata of a long-term electronic record without degrading the evidentiary status of the metadata \cite{victoriametadata}. \item [Metadata extraction] Metadata extraction refers to the extraction and codification of metadata ! from other existing metadata and content. An example here is the extraction of embedded track information from an MP3 file. \item [Dynamic Metadata] There is a distinction between extracting dynamic metadata, and copying the metadata out of an asset. 
Dynamic metadata may change over time, and must be verified by crosscheck with the source for that data for example RSS feeds. Note Haystack allows users to do event ! subscriptions on changes to an underlying RDF statement. If you abstract beyond the statement to where the statement came from, then the statement should be updated any time the data source is updated. Copying out metadata on the other hand is more useful for relatively static assets. Print analogues, *************** *** 121,142 **** set of descriptors. It may optionally provide a thesaurus of descriptors, synonyms, preferred usage terms, relationships among terms and aids to ! selecting the best terms \cite{whatisacontrolledvocab}. ! \item [Ontology] In general terms, an ! ontology is an explicit specification of a conceptualization. The ! term is borrowed from philosophy, where an ontology is a systematic ! account of existence. When the knowledge of a ! domain is represented in a declarative formalism, the set of objects ! that can be represented is called the universe of discourse. This ! set of objects, and the describable relationships among them, are ! reflected in the representational vocabulary with which a ! knowledge-based program represents knowledge. Thus, in the context ! of AI, we can describe the ontology of a program by defining a set ! of representational terms. In such an ontology, definitions associate ! the names of entities in the universe of discourse (e.g., classes, ! relations, functions, or other objects) with human-readable text ! describing what the names mean, and formal axioms that constrain ! the interpretation and well-formed use of these terms. Formally, an ontology is the statement of a logical theory \cite{whatisanontology}. However as we are using ``ontology languages'', ontology may have a very specific meaning that is determined by the particular --- 121,130 ---- set of descriptors. 
It may optionally provide a thesaurus of descriptors, synonyms, preferred usage terms, relationships among terms and aids to ! selecting the best terms \cite{whatisacontrolledvocab}. For example ! Expert Systems have long aided human diagnosis of illness in the ! medical industry. ! \item [Ontology] Formally, an ontology is the statement of a logical theory \cite{whatisanontology}. However as we are using ``ontology languages'', ontology may have a very specific meaning that is determined by the particular *************** *** 379,385 **** vocabulary items in one vocabulary to the other. There are other possible complexities here: for example subdivisions when a term in one vocabulary maps onto multiple terms in the other will require ! human intervention to disambiguate. \end{description} \subsection{Semantic Validation} --- 367,373 ---- vocabulary items in one vocabulary to the other. There are other possible complexities here: for example subdivisions when a term in one vocabulary maps onto multiple terms in the other will require ! human intervention or additional metadata to disambiguate. \end{description} \subsection{Semantic Validation} *************** *** 453,459 **** about the same thing? \end{itemize} ! There are two issues here. One is "how are things named" and the obvious answer is "URIs". The second one is "Should there be canonical URIs that can be deduced from what you are looking for"? It is proposed that the answer to the second is no. URIs should be --- 441,447 ---- about the same thing? \end{itemize} ! There are two issues here. One is "how are things named" and the obvious answer is "URIs". The second one is "Should there be canonical URIs that can be deduced from what you are looking for"? It is proposed that the answer to the second is no. URIs should be *************** *** 465,472 **** metadata on the object, and any knowledge of how to construct the URL can then be turned into a specification of its metadata. 
One way that two parties can independently ! come up with names for a given resource is to use the MD5 hash of a collection of bits. This only applies to static resources that are, in some sense, entirely bits. Still, there are a lot of interesting resources that could be viewed this way, including audio CDs, DVDs, --- 453,472 ---- metadata on the object, and any knowledge of how to construct the URL can then be turned into a specification of its metadata. + [[Notwithstanding the argument above, I'm not aware of any successful + use of meaningless URI's on the Internet. Naming systems are inherently + non-arbitrary and whenever there have been meaningless names, we have + had to add in a layer of meaningful name that maps onto the meaningless + name. Witness UFS for inodes, DNS for IP, also tinyurl.com is rapidly + gaining acceptance for complex web URL's. Finding a resource is not + analogous to naming since a search can return ambiguous results, while + a name should unambiguously identify the resource it names. Perhaps + this is only an argument that SIMILE needs a layer which maps user + legible names to opaque URL's, but that was the point of this naming + discussion in the first place.]] + One way that two parties can independently ! come up with names for a given resource is to use the SHA1 hash of a collection of bits. This only applies to static resources that are, in some sense, entirely bits. Still, there are a lot of interesting resources that could be viewed this way, including audio CDs, DVDs, *************** *** 474,493 **** and less formally email messages and digital photographs. Dynamic content and non-digital resources, like the Effiel Tower, cannot be named in this way. ! The nice thing about MD5 URLs is that they provide a canonical naming rule that reduces the odds of getting multiple names for the same object, reducing need for inference about equivalence. ! One problem with MD5 sums is that the contents of the URL become ! 
immutably linked to the URL itself. ! Invariant documents have lots of nice features; distribution, cache ! control, and cache verification become trivial, but on the down side ! there is no consistent address for the top of tree of a document history. If you want to be able to modify your document after publishing its ! MD5-sum URL, then you will need other mechanisms to deal with this. ! The other problem is when we are using URLs to describe documents and their subcomponents i.e. identifying resources smaller than the atomic document. Doing this with a URL is arguably convenient, in that it permanently --- 474,501 ---- and less formally email messages and digital photographs. Dynamic content and non-digital resources, like the Effiel Tower, cannot be named in this way. ! The nice thing about SHA1 URLs is that they provide a canonical naming rule that reduces the odds of getting multiple names for the same object, reducing need for inference about equivalence. ! One problem with SHA1 sums is that the contents of the URL become ! immutably linked to the URL itself. A second problem is that URLs ! that consist solely of content based identifiers are incapable of ! identifying distinct instances of a documents with identical ! content. Last and perhaps most importantly there is no consistent ! address for the most recent version of a document as the URL for ! each version will vary uncontrollably with its content. If you want to be able to modify your document after publishing its ! SHA1-sum URL, then you will need other mechanisms to deal with this. ! ! On the other hand, invariant documents have lots of nice features; ! distribution, cache control, and cache verification become trivial. ! A combination of an instance based reference name with an optional ! content based identifier tacked on the end should provide the best ! of both worlds. ! The other naming problem is when we are using URLs to describe documents and their subcomponents i.e. 
identifying resources smaller than the atomic document. Doing this with a URL is arguably convenient, in that it permanently
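
The patch above proposes SHA1 content hashes for naming, and suggests combining an instance based reference name with an optional content based identifier tacked on the end. A minimal sketch of that idea follows, purely as illustration. The "urn:sha1:" prefix, the ";sha1=" separator, and the function names are assumptions made for this sketch; none of them come from the SIMILE document.

```python
import hashlib


def content_name(data: bytes) -> str:
    """Return a name derived only from the bits of a static resource.

    The "urn:sha1:" prefix is a hypothetical convention for this sketch.
    Two parties hashing the same bits independently get the same name.
    """
    return "urn:sha1:" + hashlib.sha1(data).hexdigest()


def combined_name(instance_uri: str, data: bytes) -> str:
    """Tack a content based identifier onto an instance based name.

    The ";sha1=" separator is hypothetical, not an established syntax.
    The instance part stays stable across versions; the digest part
    changes whenever the content changes.
    """
    return f"{instance_uri};sha1={hashlib.sha1(data).hexdigest()}"


def verify(name: str, data: bytes) -> bool:
    """Check cached bits against the digest at the end of a name."""
    return name.endswith(hashlib.sha1(data).hexdigest())
```

With names built this way, two copies of the same bits held at different instance URIs remain distinguishable by their instance part, while a cache can still confirm its copy by calling `verify` on the digest suffix.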
Received on Monday, 16 June 2003 19:04:03 UTC