CCK-10 Hackathon

TL;DR: CCK-10 Hackathon, Part 1 of the 10th in our series of meetings for computational chemists, cheminformaticians, and molecular modelers, is on Wednesday, December 6th, 2017, at 10:30 am – 5 pm in the Computer Suite, Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU. We will be learning about the CSD (Cambridge Structural Database) Python API. Bring your own ideas and programming problems. Pizza provided. Part 2, CCK-10, will take place after the hackathon. Tickets are limited; please book by November 29th, 2017.


Dear Friends and Colleagues,

Please join us for our next “Comp Chem Kitchen” CCK-10 Hackathon, at 10:30 am – 5 pm on Wednesday, December 6th, 2017, in the Computer Suite, Department of Biochemistry, South Parks Road, Oxford. We are pleased to announce that Andrew Maloney from the Cambridge Crystallographic Data Centre will be leading the CSD Python API Hackathon.

Please register in advance, by November 29th, 2017, as spaces are limited.

The hackathon will follow this format:

  • 10.30 am: Introduction. CSD Python API overview
    • Quick survey of areas of interest and experience.
    • Assemble 4-8 teams with some expertise in each.
  • 11.00 am – 12.30 pm: Hacking – design and tools
  • 12.30 – 1.00 pm: Five min updates / requests for input
  • 1.00 – 2.00 pm: Lunch
  • 2.00 – 5.00 pm: Hacking
  • 5.00 – 6.00 pm: CCK-10, including Lightning summaries of hacked projects. Tickets are free.


  • Clustering and visualisation of data points in protein-ligand interaction space.
  • Using PDB data and some protein-ligand interaction descriptors or fingerprints, what can we learn by visualising using PCA or t-SNE clustering?
  • Molecular interaction descriptor space.
  • Using CSD data (and the CSD Python API), can we visualise interaction patterns using simple descriptors of important intermolecular interactions in the solid state? Do these patterns correspond to standard descriptions of molecular packing?
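
As a starting point for the visualisation ideas above, here is a minimal sketch of a PCA projection, assuming each structure has already been reduced to a row of numeric interaction descriptors. The function name pca_project and the random descriptor matrix are hypothetical placeholders, not part of the CSD Python API or of any real dataset.

```python
import numpy as np


def pca_project(descriptors, n_components=2):
    """Project rows of a descriptor matrix onto their leading principal components."""
    X = np.asarray(descriptors, dtype=float)
    centred = X - X.mean(axis=0)  # centre each descriptor column
    # SVD of the centred data: the rows of vt are the principal axes
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = np.dot(centred, vt[:n_components].T)
    # Fraction of the total variance carried by each component
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Hypothetical data: 6 structures x 4 interaction descriptors (random numbers)
rng = np.random.RandomState(0)
fake_descriptors = rng.rand(6, 4)

scores, explained = pca_project(fake_descriptors)
print(scores.shape)
```

The two columns of scores can be passed straight to a scatter plot. A t-SNE embedding would need a dedicated library and is not sketched here.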


  • PLIF fingerprints
  • Chem-17 chemical space visualisation
  • dSNAP style interaction scatter plots
  • Interactions as sound
  • Ligplot interaction diagrams
  • Fragment Mapping
  • Constrained conformer generation
  • Hotspot mapping in protein binding sites using IsoStar & SuperStar-like analyses

Pizza and refreshments will be provided.

Get in touch if you would like to give a 5 minute talk at a future CCK on your latest research or give a quick demo of your latest programming project, or even to nominate someone (students, postdocs, professionals, PIs, Emeritus Professors). The talks usually resemble one of the following styles:

  • an overview of computational chemistry in your research;
  • a (live!) demonstration of some software that you are developing or using; or
  • a summary of a computational chemistry paper, method, programming language, or tool that you’ve seen recently.

We would like to thank the University of Oxford MPLS Network and Interdisciplinary Fund for making CCK possible.

About CCK

Comp Chem Kitchen is a regular forum and seminar series to hear about and discuss computational methods for tackling problems in chemistry, biochemistry and drug discovery. It focuses principally on cheminformatics, computational chemistry, and molecular modelling, and overlaps with neighboring areas such as materials properties and bioinformatics.

We’re keen to encourage people involved in coding and methods development (i.e. hackers, in the original untarnished sense of the word) to join us. Our hope is that we will share best practices, even code snippets and software tools, and avoid re-inventing wheels.

In addition to local researchers, we invite speakers from industry and non-profits from time to time, and occasionally organize software demos and tutorials.

If you’re interested in giving a talk, here are some possible topics:

  • Software development (e.g.: Python, C, C++, CUDA, shell, Matlab);
  • Optimizing force field parameters & EVB models;
  • Cheminformatics (e.g.: RDKit);
  • X-ray and NMR crystallography, including small molecule and macromolecular;
  • Protein & RNA modeling, including Molecular Dynamics;
  • Virtual screening and Docking;
  • Machine Learning;
  • Quantum Methods, including DFT.

Bring your laptops, by the way, if you have something you’d like to show!


Want to speak? Ideas for speakers?

If you have ideas for speakers, or would like to give a talk, let us know. We also invite lightning talks of 5 minutes (or fewer) from attendees, so if you have some cool code you’ve been working on and would like to demo, bring your laptop, smartphone, tablet, (wearable?) and tell us all about it.

Please pass this message on to friends, colleagues, and students who may be interested too!

The main CCK web site is:
Follow us on Twitter: @CompChemKitchen
See you soon! We’re looking forward to seeing and hearing about the diverse range of computational molecular science that you’re cooking up…

—Garrett, Richard, Phil and Rob


CSD Python API example

Number of non-H atoms in molecules reported in the Cambridge Structural Database

An unexplained phenomenon in the CSD collection of molecular crystal structures is shown below.

The code below is included in Jerome Wicker’s talk from CCK-1 as a simple example of how to iterate through and extract information from the CSD using the Python API. The phenomenon has been noted many times by CCDC researchers.

The CSD Python API is used to retrieve each crystal structure entry from the database using the EntryReader() iterator. The number of heavy atoms (non-hydrogen atoms) in the heaviest molecular component of each organic crystal structure is appended to a list heavy_atoms.

Finally, a histogram of these heavy atom counts shows that molecules with even numbers of heavy atoms are observed more frequently than those with odd numbers in the same range.
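
To put a number on the even/odd imbalance rather than reading it off a plot, one option is to tally parities directly. This is a minimal sketch: the helper parity_tally and the sample counts are hypothetical and stand in for a heavy_atoms list built from the CSD.

```python
from collections import Counter


def parity_tally(heavy_atom_counts, low=0, high=None):
    """Count molecules with an even vs. odd number of heavy atoms within [low, high]."""
    tally = Counter()
    for n in heavy_atom_counts:
        # Skip counts outside the requested range
        if n < low or (high is not None and n > high):
            continue
        tally["even" if n % 2 == 0 else "odd"] += 1
    return tally


# Hypothetical counts, not real CSD data
sample_counts = [10, 12, 13, 20, 21, 22, 24, 35, 40, 41]
tally = parity_tally(sample_counts, low=10, high=40)
print(tally["even"], tally["odd"])
```

Restricting the range with low and high keeps the comparison between even and odd counts within the same size window, as in the description above.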

In [6]:
%pylab inline
Populating the interactive namespace from numpy and matplotlib

This page is a copy of an interactive Python 2 notebook exported from Jupyter. In Jupyter, the %pylab inline command above loads numerical and plotting libraries and ensures that plots appear in the notebook instead of in a separate window. It isn’t required if running Python from the command line, and the required libraries (numpy and matplotlib) are reimported below for convenience.

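In [7]:
from ccdc.io import EntryReader
from matplotlib import pyplot as plt
import numpy as np

csd_reader = EntryReader('CSD')
heavy_atoms = []

# Loop body and histogram call reconstructed from the description above.
for entry in csd_reader:
    if entry.is_organic:
        try:
            # Heaviest molecular component of this entry
            mol = entry.molecule.heaviest_component
            heavy_atoms.append(len(mol.heavy_atoms))
        except Exception:
            # Skip entries whose molecule cannot be processed
            continue

# One bin per integer heavy-atom count (the binning is an assumption)
plt.hist(heavy_atoms, bins=np.arange(0, max(heavy_atoms) + 2))
plt.xlabel('Number of heavy atoms', fontsize=20)
plt.ylabel('Hits in CSD', fontsize=20)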

Limiting the x-axis range:

In [11]:
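plt.hist(heavy_atoms, bins=np.arange(0, 102))
plt.xlim(0, 100)  # illustrative limit; the original value is not shown
plt.xlabel('Number of heavy atoms', fontsize=20)
plt.ylabel('Hits in CSD', fontsize=20)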