Re: Use Case: Chemical Diagram Navigation

Dear Chaals,

> 
> Thanks for this. Can we use the examples under a license that lets us
> play around with them - cc-by, or cc-0 (public domain) or something?

That would currently be CC BY-NC.

> It is certainly very helpful just to play with this. My first question
> is actually "what do you think is missing from each example?"
> 

It would be good to be able to navigate the SVG directly, without having 
to set up a shadow structure.
We have a couple of abstraction layers, which could in principle be 
realised using grouping in SVG. However, individual groups sometimes 
share elements, such as lines or a character group.
In many ways, what one would like is for a group to reference elements 
that live elsewhere in the SVG tree, without actually having to draw 
those elements there again. Maybe that's already possible and I just 
don't know how to do it.

I would imagine this is a general use case: generating different 
abstractions for complex diagrams that cannot necessarily be mapped to 
a common tree structure.
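To make the sharing problem concrete, here is a minimal Python sketch. It 
assumes an abstraction layer is simply a mapping from a group id to the 
ids of the SVG elements it contains; all ids below are hypothetical. Any 
element listed in two groups would need two parents, which a single 
hierarchy of nested SVG <g> elements cannot express without drawing the 
element twice.

```python
from collections import defaultdict


def shared_elements(layer: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return the element ids that belong to more than one group in a layer.

    Each such element would need several parents, so the layer cannot be
    written as nested SVG <g> elements without duplicating it.
    """
    owners: dict[str, list[str]] = defaultdict(list)
    for group_id, element_ids in layer.items():
        for element_id in element_ids:
            owners[element_id].append(group_id)
    return {eid: groups for eid, groups in owners.items() if len(groups) > 1}


# Hypothetical layer: a ring system and an aliphatic chain that share atom "a3".
components = {
    "ring1": ["a1", "a2", "a3", "b1", "b2", "b3"],
    "chain1": ["a3", "a4", "a5", "b4", "b5"],
}

print(shared_elements(components))
```

The check itself is trivial; the point is that as soon as the result is 
non-empty, the layer is a graph rather than a tree.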

> I would also like to grab the SVG and CML sources that end up being
> part of the rendered HTML. I can see them in the DOM, and it seems to
> me that we should try to look at the serialisation as rendered as the
> "primary use case" rather than how it got there.

The relevant data is at
http://progressiveaccess.com/chemistry/data/
It's being pulled into the DOM with XHR.

> I am assuming that the JavaScript collects the CML. Does it draw the
> SVG dynamically or did you do that "manually" somewhere behind the
> scenes and let the script fetch it?

The SVG and the CML are pulled in separately. The SVG is generated 
automatically from the pure CML file.
The CML file itself is generated directly from the input bitmap images 
by our image analysis software.
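For readers unfamiliar with CML, here is a minimal sketch of reading atoms 
and bonds from a CML molecule with Python's standard xml.etree.ElementTree. 
It assumes the common CML layout (atomArray/bondArray, atomRefs2, order) 
in the http://www.xml-cml.org/schema namespace. The sample molecule is 
made up, and the files under progressiveaccess.com may carry additional 
elements and attributes.

```python
import xml.etree.ElementTree as ET

CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hypothetical input: ethene (C=C) in the usual CML layout.
SAMPLE_CML = """<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <atomArray>
    <atom id="a1" elementType="C" x2="0.0" y2="0.0"/>
    <atom id="a2" elementType="C" x2="1.3" y2="0.0"/>
  </atomArray>
  <bondArray>
    <bond id="b1" atomRefs2="a1 a2" order="2"/>
  </bondArray>
</molecule>"""


def read_cml(text: str):
    """Return (atoms, bonds).

    atoms maps atom id -> element symbol; bonds is a list of
    (bond id, first atom id, second atom id, order). The order is kept as
    a string, since CML files may write it as "2" or as a letter code like "D".
    """
    root = ET.fromstring(text)
    atoms = {
        atom.get("id"): atom.get("elementType")
        for atom in root.iterfind(".//cml:atom", CML_NS)
    }
    bonds = []
    for bond in root.iterfind(".//cml:bond", CML_NS):
        first, second = bond.get("atomRefs2").split()
        bonds.append((bond.get("id"), first, second, bond.get("order")))
    return atoms, bonds


atoms, bonds = read_cml(SAMPLE_CML)
print(atoms)  # {'a1': 'C', 'a2': 'C'}
print(bonds)  # [('b1', 'a1', 'a2', '2')]
```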

I use my own SVG generator (built as an extension of the Chemistry 
Development Kit, CDK).
Although there are plenty of chemical drawing programs around, they 
normally do just that: draw, and lose all the chemical information in 
the process. What matters for accessible diagrams is that components 
(like double bonds) are correctly grouped together and retain the 
corresponding CML id. Otherwise we can't coordinate highlighting and 
the like.
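A minimal sketch of the grouping requirement, again in Python with 
xml.etree.ElementTree: every line drawn for a bond goes into a single <g> 
whose id is the CML bond id, so the SVG group for any CML id can be found 
again later, e.g. to highlight it. This is not the CDK-based generator 
itself; the function names, the fixed line offset and the "highlight" 
class are illustrative assumptions, and highlight() is only a stand-in 
for what a page script would do in the browser.

```python
import math
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialise SVG elements without a prefix


def bond_group(parent, bond_id, p1, p2, order, offset=3.0):
    """Draw a bond as `order` parallel lines inside one <g> carrying the CML id."""
    group = ET.SubElement(parent, f"{{{SVG_NS}}}g", {"id": bond_id})
    (x1, y1), (x2, y2) = p1, p2
    length = math.hypot(x2 - x1, y2 - y1)
    # Unit normal to the bond axis, used to spread the parallel lines.
    nx, ny = -(y2 - y1) / length, (x2 - x1) / length
    for i in range(order):
        shift = (i - (order - 1) / 2) * offset
        ET.SubElement(group, f"{{{SVG_NS}}}line", {
            "x1": f"{x1 + nx * shift:g}", "y1": f"{y1 + ny * shift:g}",
            "x2": f"{x2 + nx * shift:g}", "y2": f"{y2 + ny * shift:g}",
            "stroke": "black",
        })
    return group


def highlight(svg_root, cml_id):
    """Mark the SVG element whose id equals a CML id; return False if none."""
    for element in svg_root.iter():
        if element.get("id") == cml_id:
            element.set("class", "highlight")
            return True
    return False


svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "100", "height": "50"})
bond_group(svg, "b1", (10, 25), (60, 25), order=2)  # a double bond
highlight(svg, "b1")
print(ET.tostring(svg, encoding="unicode"))
```

Because both lines of the double bond sit under one id, highlighting the 
bond highlights the whole drawn component rather than one stray line.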

> How did you decide what the navigation through the molecule should be?

That's "decided" by a semantic enrichment process that takes the simple 
CML and produces the enriched CML used in the page.
I built it with input from a chemistry education specialist and a 
blind chemist.
The basic idea is to identify interesting chemical components (ring 
systems, aliphatic chains, functional groups), which form 3-4 layers 
of abstraction (depending on how complex the ring systems are).
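The enrichment rules themselves aren't spelled out here, but as an 
illustration of the first step, here is one standard graph technique for 
separating ring systems from acyclic parts: a bond lies on a ring exactly 
when it is not a bridge, so ring systems are the connected pieces formed 
by the non-bridge bonds. This Python sketch is an assumption about how 
such a step could be done, not a description of the actual enrichment 
code; the (bond id, atom, atom) representation is hypothetical.

```python
from collections import defaultdict


def find_bridges(atoms, bonds):
    """Tarjan's bridge finding: return ids of bonds that lie on no cycle."""
    adjacency = defaultdict(list)
    for bond_id, a, b in bonds:
        adjacency[a].append((b, bond_id))
        adjacency[b].append((a, bond_id))
    discovery, low, bridges = {}, {}, set()
    counter = 0

    def dfs(atom, parent_bond):
        nonlocal counter
        discovery[atom] = low[atom] = counter
        counter += 1
        for neighbour, bond_id in adjacency[atom]:
            if bond_id == parent_bond:
                continue  # don't walk straight back along the same bond
            if neighbour in discovery:
                low[atom] = min(low[atom], discovery[neighbour])
            else:
                dfs(neighbour, bond_id)
                low[atom] = min(low[atom], low[neighbour])
                if low[neighbour] > discovery[atom]:
                    bridges.add(bond_id)

    for atom in atoms:
        if atom not in discovery:
            dfs(atom, None)
    return bridges


def partition(atoms, bonds):
    """Split atoms into ring systems (fused/spiro rings merge) and chain atoms."""
    bridges = find_bridges(atoms, bonds)
    parent = {atom: atom for atom in atoms}

    def root(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    in_ring = set()
    for bond_id, a, b in bonds:
        if bond_id not in bridges:
            in_ring.update((a, b))
            parent[root(a)] = root(b)
    systems = defaultdict(set)
    for atom in in_ring:
        systems[root(atom)].add(atom)
    ring_systems = sorted(systems.values(), key=min)
    return ring_systems, set(atoms) - in_ring


# Hypothetical molecule: a three-membered ring a1-a2-a3 with a chain a3-a4-a5.
ATOMS = ["a1", "a2", "a3", "a4", "a5"]
BONDS = [("b1", "a1", "a2"), ("b2", "a2", "a3"), ("b3", "a3", "a1"),
         ("b4", "a3", "a4"), ("b5", "a4", "a5")]
print(partition(ATOMS, BONDS))
```

Functional groups and the higher abstraction layers would then be built 
on top of this split.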

> Is that handled automatically for any molecule - can I provide
> something with 100 atoms and feed it to your system?

Yes, not a problem. I am still ironing out some kinks for fused ring 
systems, but everything else works.

> And "for extra credit" as my teachers used to say, can we apply this
> to something like a chromosome, breaking it down first into "genes" or
> sequences of DNA before we get to actual molecular structures?

The hard bit here would be the image analysis.
But assuming we start directly from the chemical data, I can see how 
that would work.
An easier next step would be chemical reactions, e.g., something like 
https://en.wikipedia.org/wiki/Chemical_reaction#/media/File:Baeyer-Villiger-Oxidation-V1.svg
where both image analysis and semantic enrichment are doable.
Note that in that example no structure is retained whatsoever. 
Horrible.

Best,
Volker

Received on Tuesday, 12 May 2015 11:41:05 UTC