How does our computing stack up?
January 29, 2019
TL;DR We should all be cloning dinosaurs.
Hello, I'm Josh. In 1993 I was in sixth grade and I picked up a copy of Jurassic Park by Michael Crichton in the Scholastic book order. I started it as soon as I got home from school and finished it that same night.
I’ve reread it many times since then, and something has always stuck out to me. Jurassic Park is filled with details about computers and software. I’ve always wondered how realistic those parts of the book are, as well as how they stack up against general computing today. Let’s examine how the technology of Jurassic Park compares to what we work with at Brand New Box.
Spoiler Alert! This post will include plot details from Jurassic Park. You have been warned!
Jurassic Park used three Cray X-MP supercomputers. These three computers ran all of the systems at the park and took care of the decoding, repair, and encoding of the DNA that allowed for the cloning of dinosaurs.
The Cray X-MP was the world’s fastest computer from 1983 to 1985. Each one cost $15 million, a price that covered only the computer itself and not any kind of data storage. Data storage was possible on disk pack hard drives. Although the technologies involved were very different from what we have today, each X-MP had roughly 2 GB of memory and a clock speed of 105 MHz. The total work output of each unit was 0.8 GFLOPS, or 0.8 billion floating point operations per second.
Here at Brand New Box we write software that runs on public cloud providers like AWS, DigitalOcean, etc. Back in 2008 a study was conducted that showed the potential of using public cloud servers for scientific computing. That study showed that an AWS m1.small instance had a sustained output of 1.96 GFLOPS and the ability to burst to 4.40 GFLOPS. Most of our software runs on servers of comparable size to that m1.small instance. Across our projects, we have about 40 cloud servers up and running. At 1.96 GFLOPS apiece, that puts our sustained output at about 78.4 GFLOPS.
Let’s say the combined 2.4 GFLOPS output of the three Cray X-MP computers is one dinosaur. Brand New Box operates enough hardware to run about 32.7 dinosaurs!
BNB Score: 32.7 Dinosaurs.
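For anyone who wants to check the math, the dinosaur conversion is simple enough to sketch in a few lines of Python. The figures come straight from the paragraphs above; the constant and function names are made up for illustration.

```python
# Figures quoted in the post; the names here are illustrative only.
CRAY_GFLOPS_EACH = 0.8      # output of one Cray supercomputer
CRAY_COUNT = 3              # Jurassic Park ran three of them
M1_SMALL_SUSTAINED = 1.96   # sustained GFLOPS of an AWS m1.small (2008 study)
BNB_SERVER_COUNT = 40       # approximate number of cloud servers we run


def dinosaurs(gflops: float) -> float:
    """Convert GFLOPS into dinosaurs, where one dinosaur is the park's combined output."""
    one_dinosaur = CRAY_GFLOPS_EACH * CRAY_COUNT  # 2.4 GFLOPS
    return gflops / one_dinosaur


bnb_gflops = M1_SMALL_SUSTAINED * BNB_SERVER_COUNT  # 78.4 GFLOPS
print(f"{bnb_gflops:.1f} GFLOPS = {dinosaurs(bnb_gflops):.2f} dinosaurs")
```

Swapping in the 4.40 GFLOPS burst figure would give the same 40 servers a short-lived ceiling of about 73 dinosaurs.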
The software in the book, written by Dennis Nedry and his team, had half a million lines of code. The movie inflates this number to two million. I’m not sure how many lines of code were taken up by the nefarious backdoor that kicks off the collapse of the park systems, but I’m sure the half million would be a smaller number without it. Some roughly contemporary examples were the Hubble telescope, which launched in 1990 with two million lines of code, and Microsoft Windows 3.1, which had 2.5 million lines of code in 1992.
The project I spend most of my time on today has about 116,000 lines of code. A spot check of a few of our projects shows that this is an outlier. I estimate we’re not far from the book’s benchmark of half a million lines of code. We are working mostly in Ruby, so each of our lines is probably much more powerful than the C variant used in the book. In terms of software I rate Brand New Box at one dinosaur.
BNB Score: 1 Dinosaur.
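A spot check like the one above can be done with a short script. This is only a rough sketch, not the tool actually used for the numbers in this post: it assumes a project's Ruby source lives in `.rb` files and counts every non-blank line, comments included. The function name is hypothetical.

```python
from pathlib import Path


def count_lines_of_code(project_root: str, suffix: str = ".rb") -> int:
    """Count non-blank lines in every file under project_root ending in suffix."""
    total = 0
    for path in Path(project_root).rglob(f"*{suffix}"):
        if not path.is_file():
            continue  # skip directories that happen to match the pattern
        with path.open(encoding="utf-8", errors="ignore") as source:
            total += sum(1 for line in source if line.strip())
    return total
```

Calling `count_lines_of_code("path/to/project")` on a real checkout (the path here is a placeholder) returns a single number to compare against Nedry's half million. Counting comments inflates the total a bit, which seems fair given that the book's figure presumably includes that backdoor.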
In the book, the Cray X-MP computers spent most of their time decoding, splicing, and encoding DNA. Today we can run that same kind of genetic work in the cloud. Google has an open source, AI-assisted genetic tool called DeepVariant that can do much of the work detailed in the book. DeepVariant has won an FDA competition for accuracy, can account for the low quality genetic data you might get out of amber, can process an entire human genome in less than two hours, and does so at a cost of less than $3. It makes the investment in $45 million worth of supercomputers sound less than great.
The real kicker here is the database that was created for working with the genetic data. Dennis Nedry was asked to design a database that would support three billion fields, like a table with three billion columns.
“Oh, come on. Nobody could be analyzing a DNA molecule.”
“With three billion records, I don’t know what else it could be.”
At Brand New Box we don’t work with bespoke databases. We prefer rock-solid databases with millions of trouble-free installs around the world and huge support systems to make sure everything stays up all of the time. The problem with these databases is that they top out at around 4,000 columns per table. In this respect, it’s difficult to compete with imaginary software from the late 1980s. Our output is rounded down to zero dinosaurs.
BNB Score: Zero dinosaurs.
However! Luckily, data structures for working with genetic data have since been developed, as the DeepVariant example above shows. Maybe this one is a toss-up.
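One reason this might not be a lost cause: three billion of anything fits far more naturally into rows than into columns. The sketch below is purely illustrative and is not how the book's database or DeepVariant works. It uses Python's built-in `sqlite3` module with a hypothetical `bases` table that stores one DNA base per row, so the table needs only two columns no matter how long the sequence gets.

```python
import sqlite3

# A tiny stand-in sequence; a real genome would have billions of bases.
SEQUENCE = "GATTACA"

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per base, keyed by its position in the sequence.
conn.execute("CREATE TABLE bases (position INTEGER PRIMARY KEY, base TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO bases (position, base) VALUES (?, ?)",
    enumerate(SEQUENCE, start=1),
)


def read_region(start: int, end: int) -> str:
    """Return the bases from start to end (inclusive, 1-indexed) as a string."""
    rows = conn.execute(
        "SELECT base FROM bases WHERE position BETWEEN ? AND ? ORDER BY position",
        (start, end),
    )
    return "".join(base for (base,) in rows)


print(read_region(2, 4))  # ATT
```

Whether a stock database would stay happy with three billion of those rows is a separate question, but at least the column limit stops being the blocker.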
Revised BNB score: ???
Ultimately, we should all be cloning dinosaurs. If you have any mosquitoes in amber, let us know!