- From: Bryan Bishop <kanzure@gmail.com>
- Date: Tue, 8 Jul 2008 21:00:52 -0500
- To: "W3C HCLSIG hcls" <public-semweb-lifesci@w3.org>
Hey all, I am new to the list, but I really should have known about this group years ago. I am running a semantic web project that is best summarized as apt-get for physical automation. Today, while writing some perl to steal the human cortex data from the Allen Institute, I stumbled upon the AJAX + SPARQL + RDF + Google Maps API + Ruby on Rails implementation that is mentioned in some slides/PDFs. Unfortunately, the hcls1 server on CSAIL at MIT seems to be dead -- I'd be willing to take over some of that code. I think it was here: http://hcls1.csail.mit.edu:8890/map/#Kcnip3@2850,Kcnd1@2800 and the PDF was: http://tinyurl.com/ysqm3z

Anyway, the basis of "SKDB", or the metadata for physical automation:
http://heybryan.org/exp.html
http://heybryan.org/new_exp.html
http://oscomak.net/

I am very happy to see gateways like http://hcls.deri.ie/hcls_demo.html (see the query sketch below): 'The following queries access a SPARQL endpoint hosted at DERI. The underlying triplestore contains over 325 million RDF triples of biomedical information. The information covers a large array of biomedical knowledge: from basic molecular biology over literature annotation up to anatomy and physiology.' Which I suspect is an integration of OBO Foundry, SBML, and other related projects.

I've noticed, however, that the major drawback of the majority of these semantic querying interfaces is that there's no plug-and-play functionality that I have found yet -- I hope I am completely wrong here -- but it's too bad that these databases have their tables and their data structures completely hidden instead of floating around as code. It's one of the reasons that I am a user of YAML and object serialization (see the PyYAML sketch below). http://yaml.org/ "YAML: YAML Ain't Markup Language / What It Is: YAML is a human friendly data serialization standard for all programming languages." And the python implementation (PyYAML): http://pyyaml.org/

I mention this for two strong reasons. First, has anyone seen PyLab? I regret that I keep mentioning python, just because it might show bias, but I actually do a significant amount of work not in python, so it's really just because I was going through the majority of known scientific number crunching packages: http://heybryan.org/num.html like Axiom, derive, macsyma, maple, mathematica, MATLAB, mupad, reduce, R, octave, sage, numpy, scipy, PDL, sympy, and next up on my list is CAD/CAM packages, which I suspect will have significant crossover with the CFD packages out there. Anyway, PyLab: http://scipy.org/PyLab "SciPy (pronounced "Sigh Pie") is open-source software for mathematics, science, and engineering. It is also the name of a very popular conference on scientific programming with Python. The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation. The SciPy library is built to work with NumPy arrays, and provides many user-friendly and efficient numerical routines such as routines for numerical integration and optimization. Together, they run on all popular operating systems, are quick to install, and are free of charge. NumPy and SciPy are easy to use, but powerful enough to be depended upon by some of the world's leading scientists and engineers. If you need to manipulate numbers on a computer and display or publish the results, give SciPy a try!" And PyLab is the "total lab integration" mostly on the software side. But still, as I mentioned, I'm interested in extracting functionality out of the web (put it to work for you) instead of static information just sitting there.
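As a concrete example of the sort of plug-and-play access I'm after, here's a minimal sketch of hitting a SPARQL endpoint from Python with the SPARQLWrapper library. The endpoint URL is only a placeholder (I haven't confirmed what address the DERI demo actually exposes), and the query is just a generic "show me some triples" probe:

# Minimal sketch: pull a few triples back from a SPARQL endpoint.
# The endpoint URL is a placeholder, not a confirmed DERI address.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://hcls.deri.ie/sparql"  # placeholder

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])

The point isn't this particular query; it's that the same dozen lines work against any endpoint that speaks the protocol, which is exactly the property I'd like to see for the data structures behind these databases too.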
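And on the YAML side, here's what I mean by metadata "floating around as code": a package description serialized with PyYAML and round-tripped back into a live object. The field names are invented for illustration, not an actual SKDB schema:

# Sketch: package metadata serialized with PyYAML and loaded back.
# Field names are invented for illustration, not an SKDB schema.
import yaml

package = {
    "name": "thermocycler",
    "version": "0.1",
    "depends": ["peltier-element", "pid-controller"],
    "source": "http://heybryan.org/exp.html",
}

text = yaml.dump(package, default_flow_style=False)
print(text)

restored = yaml.safe_load(text)
assert restored["depends"] == ["peltier-element", "pid-controller"]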
Somehow I ended up finding out about EXPO: http://expo.sf.net/ "EXPO defines over 200 concepts for creating semantic markup about scientific experiments, using the Web Ontology Language OWL. We propose the ontology EXPO to formalise generic knowledge about scientific experimental design, methodology, and results representation. Such a common ontology is both feasible and desirable because all the sciences follow the same experimental principles. The formal description of experiments for efficient analysis, annotation, and sharing of results is a fundamental objective of science."

An interesting example is "The Robot Scientist": http://www.aber.ac.uk/compsci/Research/bio/robotsci/ "The Robot Scientist is perhaps the first physical implementation of the task of Scientific Discovery in a microbiology laboratory. It represents the merging of increasingly automated and remotely controllable laboratory equipment and knowledge discovery techniques. Automation of laboratory equipment (the "Robot" of Robot Scientist) has revolutionised laboratory practice by removing the "drudgery" of constructing many wet lab experiments by hand, allowing an increase in both the scope and scale of potential experiments. Most lab robots only require a simple description of the various chemical/biological entities to be used in the experiments, along with their required volumes and where these entities are stored. Automation has also given rise to significantly increased productivity and a concomitant increase in the production of results and data requiring interpretation, giving rise to an "interpretation bottleneck" where the process of understanding the results is lagging behind the production of results."

So, how are these robots and automation machinery made? Usually in CAD programs. Admittedly, the open source CAD solutions are known to be not the best, but the point is still the same -- designers build and generate information, which is then implemented as physical machinery. Those designs are packages, and could be made accessible in an automatic manner. At the same time, the machines that this information is being fed into are already automated. So the cybernetic loop, as it were, is nearly complete. It's just that the focus on static information tends to ignore the instrumentation and automation hardware that brought in that information in the first place -- the programming and such.

So that's how apt-get is interesting (besides being awesome): http://en.wikipedia.org/wiki/Debian http://debian.org/ "Debian (pronounced [ˈdɛbiən]) is a computer operating system (OS) composed entirely of software which is both free and open source (FOSS). Its primary form, Debian GNU/Linux, is a popular and influential Linux distribution.[1] It is a multipurpose OS; it can be used as a desktop or server operating system. Debian is known for strict adherence to the Unix and free software philosophies.[2] Debian is also known for its abundance of options — the current release includes over twenty-six thousand software packages for eleven computer architectures.
These architectures range from the Intel/AMD 32-bit/64-bit architectures commonly found in personal computers to the ARM architecture commonly found in embedded systems and the IBM eServer zSeries mainframes.[3] Throughout Debian's lifetime, other distributions have taken it as a basis to develop their own, including: Ubuntu, MEPIS, Dreamlinux, Damn Small Linux, Xandros, Knoppix, Linspire, sidux, Kanotix, and LinEx among others.[4] A university's study concluded that Debian's 283 million source code lines would cost US$10 billion to develop by proprietary means.[5]" "Prominent features of Debian are its APT package management system, its strict policies regarding its packages and the quality of its releases.[6] These practices afford easy upgrades between releases and easy automated installation and removal of packages. Debian uses an open development and testing process. It is developed by volunteers from around the world and supported by donations through SPI, a non-profit umbrella organization for various free software projects.[7]"

In particular, apt-get allows users to retrieve packages by unique identifiers, with automatic installation and configuration to the local environment. This involves a significant amount of metadata, and lots of overhead for actually transmitting the packages, which has been known to redline the Cisco root nodes for weeks when Debian releases major updates. Heh. They are considering (or are they already implementing?) debtorrents and debtags to help bring that down to something less destructive.

SKDB/OSCOMAK is a pet project of mine and a handful of other programmers and machine shop enthusiasts interested in making sure that when gEDA and OpenCores happen everywhere else, the same infrastructure can be deployed in a functionally useful manner -- like Gershenfeld's group over at the MIT Media Lab, the 'FabLab' projects. Basically these are quantified shop configurations (much like Linux installations) for the physical floor space, with downloadable tools that would be implemented with whatever tools the system has wired up (obviously some things can't make other things). Behind all of this would be a design compiler, which works just like a regular compiler except that it resolves dependencies between metadata describing projects that are to be implemented, whether by hand or by machine (see the dependency-resolution sketch below).

I'm studying manufacturing engineering down at the University of Texas at Austin, along with some computational neuroscience. But I've realized that the bootstrapping requirements to make all of this happen are difficult -- even NIST had some troubles with their Virtual Manufacturing projects -- to the extent that it would require significant funding and a "leap of faith" from others, which might be somewhat unreasonable. So I've been putting most of the work into a specific implementation in biology: I put together the do-it-yourself biotechnology kit < http://heybryan.org/new_exp.html and http://biohack.sf.net/ and http://heybryan.org/biotech.git >. Genes are already highly unitized across the semantic web, and could be made to do interesting things; see http://partsregistry.org among others in synthetic biology.
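Here's a toy sketch of the dependency-resolution core of such a design compiler: given metadata records like the YAML above, produce a build order by topological sort. The package names and fields are invented for illustration; a real SKDB resolver would also have to check what tools the shop actually has wired up:

# Toy sketch: resolve dependencies between project metadata records
# into a build order (topological sort). Names are illustrative only.
def build_order(packages):
    """packages: dict mapping name -> {"depends": [other names]}"""
    order = []
    visited = set()

    def visit(name, stack=()):
        if name in visited:
            return
        if name in stack:
            raise ValueError("circular dependency at %s" % name)
        for dep in packages.get(name, {}).get("depends", []):
            visit(dep, stack + (name,))
        visited.add(name)
        order.append(name)

    for name in packages:
        visit(name)
    return order

packages = {
    "thermocycler": {"depends": ["peltier-element", "pid-controller"]},
    "peltier-element": {"depends": []},
    "pid-controller": {"depends": []},
}
print(build_order(packages))
# -> ['peltier-element', 'pid-controller', 'thermocycler']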
With some buddies I've been detailing the design strategies required to make a 'writozyme', a biologically replicable system that would allow individuals to very simply synthesize DNA without conventional DNA synthesizers like http://bioinformatics.org/pogo/ (which admittedly already works); the writozyme methodology would (hopefully) inherit the self-replication functionality. The metadata aspects are just the same as in OBO, SBML, the bioinformatics databases, and so on. And even more importantly, it's all 'functional' in that it's not "biobricks" that are being sent across the web, but instead the tools, machinery, and semantic snowball backing it all up and potentially turning into this recursive data acquisition process. Bacteria don't just sit there (unless you did your plate wrong, ugh). Some other interesting guys who are working with me on this: http://diybio.org/ http://openwetware.org/ http://biopunk.org/ and a few others that I am forgetting. I hope I have the right mailing list for talking about these topics :-).

The project that led me to the Allen Institute, and then to Science Commons, and now to W3C's HCLSIG group, was my attention to attention: http://heybryan.org/mediawiki/index.php/Sustained_attention -- specifically because of Henry Markram's combined work on computational neuroscience (microcolumn simulations of the brain in ~2005) and also, surprisingly, on autism: http://heybryan.org/intense_world_syndrome.html "Autism is a devastating neurodevelopmental disorder with a polygenetic predisposition that seems to be triggered by multiple environmental factors during embryonic and/or early postnatal life. While significant advances have been made in identifying the neuronal structures and cells affected, a unifying theory that could explain the manifold autistic symptoms has still not emerged. Based on recent synaptic, cellular, molecular, microcircuit, and behavioral results obtained with the valproic acid (VPA) rat model of autism, we propose here a unifying hypothesis where the core pathology of the autistic brain is hyper-reactivity and hyper-plasticity of local neuronal circuits. Such excessive neuronal processing in circumscribed circuits is suggested to lead to hyper-perception, hyper-attention, and hyper-memory, which may lie at the heart of most autistic symptoms. In this view, the autistic spectrum are disorders of hyper-functionality, which turns debilitating, as opposed to disorders of hypo-functionality, as is often assumed. We discuss how excessive neuronal processing may render the world painfully intense when the neocortex is affected and even aversive when the amygdala is affected, leading to social and environmental withdrawal. Excessive neuronal learning is also hypothesized to rapidly lock down the individual into a small repertoire of secure behavioral routines that are obsessively repeated. We further discuss the key autistic neuropathologies and several of the main theories of autism and re-interpret them in the light of the hypothesized Intense World Syndrome."

When combined with the human cortex datasets from the Allen Institute, things start to get very interesting :-). Throw in some metadata packaging dynamics, like from SKDB or apt-get, and suddenly you're programming simulations of neural slices (as we've done for many years now) -- or the actual physical tissue plates -- and you're able to engineer brains. Sort of :-).
http://heybryan.org/buildingbrains.html
http://heybryan.org/recursion.html

'At least' you're able to do some interesting science + neurofeedback, one of my intentions. So, that's the direction that I'm coming from. It looks like I completely missed Science Commons when it showed up on the map, and I deeply regret this. Are there any other initiatives that I should be made aware of?

I'm also approaching all of this from the aerospace angle: http://heybryan.org/2008-05-09.html It's an email I sent to some presenters at ISDC2008 (since I couldn't attend), the National Space Society, OpenVirgle ( http://google.com/virgle (humor is healthy)), and even some Google Lunar X Prize teams, like Interplanetary Ventures, and Team FREDNET, the open source team.

Cheers,
- Bryan
________________________________________
http://heybryan.org/