Biotech Beginnings
Biotechnology, the use of biology to solve problems and make useful products, stretches back to the earliest human technologies, when brewers and bread-makers used yeast to hone their crafts. The modern biotechnology industry was born much later, in the 1970s. The advent of recombinant DNA technology, first demonstrated by Stanley Cohen and Herbert Boyer at Stanford and UCSF in 1973, marked a dramatic expansion of the scientific horizons for the field. Biologists and engineers could now use DNA like Lego, taking pieces from different sets (organisms) and combining them to make something brand new. Could pigs fly? Could we engineer bioweapons of mass destruction? What if rogue DNA got into the water supply? These were all very real questions at the time, and unsurprisingly they created a unique tension between progress and safety. In 1976, the city of Cambridge, Massachusetts placed a temporary moratorium on recombinant DNA research at Harvard and MIT. As a result, it became the first jurisdiction in the US with its own regulations for recombinant DNA work.
Despite the initial controversy, the potential benefit of recombinant DNA technology was indisputable: scaled therapies, produced by biological factories, using any combination of DNA that could be isolated. By 1978, the first two modern biotechnology companies, Biogen and Genentech, were up and running, both using E. coli to mass produce therapies. Biogen initially manufactured immune system proteins and licensed them to other pharma companies before launching its own treatment for relapsing multiple sclerosis in 1996. Genentech was born from the partnership of an enthusiastic venture capitalist, Robert Swanson, and the biochemist Herbert Boyer. Their lab work was launched in a high-stakes race in which they became the first team to synthesize human insulin. The product entered clinical trials under a licensing deal with Eli Lilly and reached the market in 1983, eliminating the need for animal sources of insulin that had led to unpredictable supply chains, product inconsistencies, and allergies in diabetics, problems that had plagued the industry since insulin's discovery 60 years before. MIT put out a 30-minute documentary called “From Controversy to Cure” with great archival footage and storytelling of the early biotech industry in Boston.
A Genomic Leap Forward
Just seven years after Genentech’s insulin breakthrough, a group of international scientists began work on the Human Genome Project (HGP). It’s important to note that there are 3.2 billion base pairs in the human genome, which, at the time, were being deciphered by hand. Craig Venter, credited with leading the publication of the first human genome in 2001, said that when it started, the project was not unlike monks copying the Bible (at best).
Two scientists manually read a sequence of DNA fragments. Source: Lessons from the Human Genome Project (YouTube)
Unlike the work on recombinant DNA, the Human Genome Project did not have as clearly defined an endpoint. There is a human reference genome that continues to be updated, but the real value of the project was threefold. First, the HGP required thousands of scientists to work together, creating what is known today as “team science”. Second, the nearly instant sharing of data pioneered the open-source era, giving everyone access to hard-won genetic sequences. Lastly, the project created the field of genomics, which merged computation with genetics. The mere realisation that biology is programmed by four bases (a quaternary code: A, T, C, G) means that computers can read and write this code seamlessly. The sequencing technology that turns a physical piece of DNA into code gave science and industry the tools to expand their understanding of complex, convoluted genetic information. The first human genome cost $2.7 billion and took 13 years to complete, but today, you can get your own genome sequenced for less than $200 in a matter of days.
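To make the "quaternary code" idea concrete, here is a minimal Python sketch of how a computer might store DNA: with four possible bases, each one fits in exactly 2 bits. The bit assignment and the function names below are our own assumptions for illustration, not a standard file format, and real sequencing data carries extra information (quality scores, ambiguous bases) that this ignores.

```python
# Map each base to a 2-bit value. This particular assignment is an
# arbitrary choice for illustration, not a standard.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode(sequence: str) -> int:
    """Pack a DNA string into a single integer, 2 bits per base."""
    value = 0
    for base in sequence.upper():
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def decode(value: int, length: int) -> str:
    """Unpack `length` bases from an integer produced by encode()."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


def genome_size_megabytes(base_count: int) -> float:
    """Raw storage needed at 2 bits per base, in megabytes (10**6 bytes)."""
    return base_count * 2 / 8 / 1_000_000


if __name__ == "__main__":
    packed = encode("GATTACA")
    print(bin(packed))                           # 0b10001111000100
    print(decode(packed, 7))                     # GATTACA
    print(genome_size_megabytes(3_200_000_000))  # 800.0
```

Under these assumptions, the 3.2 billion base pairs of one strand of the human genome fit in roughly 800 MB of raw storage, less than a single CD-era movie file.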
Genomic sequencing created a literal blueprint and the “reading” infrastructure to engineer biology. The cutting-edge work that took scientists in the 70s months or years to develop can now be completed in a couple of days by undergrads.
Modern Biotechnology
Today, biotechnology companies continue to apply the principles of recombinant engineering and genetic sequencing as the foundation for complex biological engineering, but once again, something in the water is changing.
That change, of course, is Artificial Intelligence (AI). Recombinant DNA gave us the ability to move Lego bricks between organisms, while the genome project gave us the ability to read billions of base pairs. CRISPR started the age of precision genome editing, and AI is now changing the landscape once again by unlocking a completely new layer of sophistication in the field.
It turns out that only 1-2% of our genome actually codes for proteins, the machines, building blocks and processors of our biology. Add to this the fact that there is an astronomically large number of ways to fold a protein based on its sequence, and just as many ways for those proteins to interact with each other, and it’s no wonder that new computational methods were required to unlock this vast web of 3-D machines.
AlphaFold is the most recent example of the exponential expansion in biotech capacity. Similar to the HGP, it took about 10 years to reach a significant breakthrough. That breakthrough came with the advent of AlphaFold 2, a machine learning model that predicted protein structures with over 90% accuracy. In one fell swoop, the number of protein structures that could be accurately mapped went from 150,000 to 200,000,000. Veritasium recently released a great summary of this technology on YouTube that explains how we went from Peruvian whale meat to AlphaFold. Importantly, this is a single example of how AI is unlocking biotechnology. Even experts in the field struggle to keep up with the latest developments, but one of note is the David Baker Lab's use of AI to model enzyme activity.
Molecular landscape of E. coli - David S. Goodsell
These milestones represent a small percentage of the complexity of modern biotech. To frame this and further discussions in this series, our team came up with the following core principles that define the majority of the work being done in the field. These are matched with companies leading the charge and generic examples to provide context.
The core principles of biotechnology are thus:
1. The manipulation of genetic code to produce or manipulate traits/products.
   - Producing biological molecules (proteins, antibodies, nucleic acids)
   - Genetic modifications (introducing disease resistance in plants/agriculture)
     - Reducing the need for fertilizer with “self-fertilizing” plants
   - Gene editing/engineering therapeutics, including CRISPR and its derivatives
     - Gene therapy to reverse genetic diseases like sickle cell anemia
2. Designing systems to alter pathways and processes innate to biology, towards a desired biological outcome.
   - Small molecules and biologics (Advil, Ozempic, etc.)
     - Targeting pain, appetite and cancer-causing pathways
   - Drug delivery (lipid nanoparticles)
     - Delivering genetic medicine to specific cells
   - De novo molecule generation and iteration models
     - Designing protein variants that can be tested in the lab or in patients
3. The use of data to design, predict, and explain complex biological phenomena.
   - Bioinformatics (predictive analytics, multimodal machine learning)
     - Studying the mutational patterns of a virus to predict the sequence of an emerging variant
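As a rough illustration of the last example in the list, the Python sketch below shows the simplest first step in studying mutational patterns: comparing aligned variant sequences against a reference and counting which positions change most often. The sequences and function names are hypothetical and invented purely for illustration. Real pipelines work on full genomes, handle insertions and deletions, and feed counts like these into statistical or machine learning models; this sketch does only the counting, not the prediction.

```python
from collections import Counter


def point_mutations(reference: str, variant: str) -> list:
    """Return (position, reference_base, variant_base) for each substitution.

    Positions are 1-based. The sequences must already be aligned and
    equal in length; insertions and deletions are not handled.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, ref_base, var_base)
        for index, (ref_base, var_base) in enumerate(zip(reference, variant))
        if ref_base != var_base
    ]


def mutation_hotspots(reference: str, variants: dict) -> Counter:
    """Count how many variants carry a substitution at each position."""
    counts = Counter()
    for sequence in variants.values():
        for position, _, _ in point_mutations(reference, sequence):
            counts[position] += 1
    return counts


# Hypothetical toy sequences, made up for this example only.
REFERENCE = "ATGGCTAGCT"
VARIANTS = {
    "variant_1": "ATGGTTAGCT",
    "variant_2": "ATGGTTAGCA",
    "variant_3": "ATGACTAGCT",
}

if __name__ == "__main__":
    for name, sequence in VARIANTS.items():
        print(name, point_mutations(REFERENCE, sequence))
    # Position 5 changes in two of the three toy variants.
    print(mutation_hotspots(REFERENCE, VARIANTS).most_common(1))  # [(5, 2)]
```

Positions that mutate repeatedly across independent variants are the kind of signal a predictive model would weigh when estimating what an emerging variant might look like.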
With tools that can now operate on the scale of biological systems, biotechnology is entering a golden era. Seemingly basic tasks like visualizing proteins, reading DNA and testing interactions on a molecular level were once hugely laborious and expensive endeavours. Digitizing them has made the early stages of biological processes more transparent and actionable, and has opened up new ways to model enzyme activity, discover the role of non-coding RNA, decipher epigenetic information, and link disparate data sets for better disease classification.
Noubar Afeyan, CEO of Flagship Pioneering and founding member of Moderna, recently wrote about the convergence of polyintelligence as we enter a new golden age of biotechnology.
Our group is most excited about the prospect of harnessing these technologies to unlock the complexity of biology and solve meaningful problems. This series will provide practical insights into the potential, challenges, and opportunities that await a developing biotech ecosystem.
The views, opinions, and ideas expressed on this Substack are solely those of BlueSky Bio and do not represent the views, positions, or policies of our employers, any organization we are affiliated with, or any clients we may serve. Nothing published here should be construed as official communication from or on behalf of our employers or any other entity.
The content provided is for informational and educational purposes only and should not be considered professional advice. Readers should consult with appropriate professionals before making decisions based on any information presented here.
All content is provided "as is" without warranty of any kind, and we assume no responsibility for errors, omissions, or any consequences arising from the use of this information.