AI Model Simulates 500 Million Years of Evolution to Create New Proteins! ESM3 Is an LLM for Biology.

The Nugget

  • ESM3, a large language model (LLM) for biology, generated a novel fluorescent protein whose distance from known natural proteins is likened to 500 million years of evolution, pointing to faster protein engineering and biological research.

Make it stick

  • 🧬 Proteins are the "programs" that run the "system of life."
  • 🏭 The ribosome builds proteins from RNA and is likened to a tiny factory operating at the atomic scale.
  • 🌟 ESM3 generated a green fluorescent protein (GFP) whose sequence is only 58% similar to the closest known natural fluorescent protein, a gap likened to 500 million years of evolution.
  • 🤖 Chain of Thought reasoning in ESM3: the model generates a protein in stages, much like working through a math problem step by step.

Key insights

The Power of ESM3

  • ESM3 is a large language model developed to simulate protein evolution and generate new proteins.
  • 500 Million Years of Evolution: The generated GFP has a sequence only 58% similar to that of the closest known natural fluorescent protein, a divergence likened to 500 million years of natural evolution.
  • AI Tokenization: A protein’s sequence, structure, and function are each converted into tokens so that the model can read and reason over all three.
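
Two ideas from the list above, tokenization and sequence similarity, can be made concrete with a toy sketch. This is not ESM3's tokenizer or the method behind the 58% figure: the vocabulary `TOKEN_ID` and both function names are hypothetical, only the sequence track is shown (the structure and function tracks are omitted), and the similarity measure assumes the two sequences are already aligned to the same length.

```python
# Toy illustration only; not ESM3's actual tokenizer or alignment method.

# The 20 standard amino acids, written as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical vocabulary: each amino acid gets an integer ID.
TOKEN_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def tokenize(sequence: str) -> list[int]:
    """Map each residue of a protein sequence to its integer token ID."""
    return [TOKEN_ID[aa] for aa in sequence.upper()]


def percent_identity(a: str, b: str) -> float:
    """Percent of positions holding the same residue in two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    print(tokenize("MSKGEE"))                            # [10, 15, 8, 5, 3, 3]
    print(percent_identity("MSKGEELFTG", "MSKGAELFSG"))  # 80.0
```

On this measure, a 58% score means that roughly four positions in ten differ from the closest natural protein.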

Biological Complexity and Programmability

  • Proteins as Life’s Building Blocks: Proteins carry out many of life’s functions, acting as molecular engines and handling sensing and information processing.
  • Natural to Synthetic Transition: With ESM3, trial-and-error experiments can be supplanted by computational simulation, offering more precise control over protein engineering.
  • Chain of Thought: Step-by-step reasoning, similar to that used by advanced LLMs such as ChatGPT, improves the accuracy and plausibility of the generated protein designs.

Practical and Ethical Considerations

  • Open Source Commitment: The weights and code for the 1.4-billion-parameter ESM3 open model are publicly released for non-commercial use, fostering collaborative research.
  • Potential Impact: Applications include creating proteins to degrade plastics, developing new medicines, and potentially slowing aging.
  • Ethical Concerns: A technology as powerful as ESM3 draws both enthusiasm and caution over its potential to modify life.

Technological Advancements in AI and Hardware

  • Fastest AI Chip Debate: A new company, Etched, claims its AI chip can process more than 500,000 tokens per second, which would outpace Nvidia’s GPUs.
  • ASIC Specialization: The chip is an application-specific integrated circuit (ASIC) built specifically to run Transformer models, which is how it is claimed to outpace general-purpose graphics accelerators.

Key quotes

  • "If we could learn to read and write in the code of life, it would make biology programmable."
  • "Generating one by pure chance is astronomically unlikely; there are more possibilities than the number of atoms in the visible universe."
  • "Chain of Thought: it’s the same kind of approach that we would use for ChatGPT, improving its reasoning ability by thinking step by step."
  • "ESM3 can provide feedback to itself to improve the quality of its own generations."
  • "They’re releasing the weights and code for the ESM3 1.4 billion open model... it’s all open source."
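
The "more possibilities than atoms" quote can be checked with rough arithmetic. The sketch below counts the possible sequences of a given length over the 20 standard amino acids and compares that with a commonly cited order-of-magnitude estimate for atoms in the observable universe. The length of 238 residues (wild-type GFP) and the 10^80 estimate are assumptions brought in for illustration; neither number appears in the summary.

```python
import math

SEQ_LENGTH = 238      # residues in wild-type GFP (assumption, not from the summary)
ALPHABET_SIZE = 20    # standard amino acids
ATOMS_LOG10 = 80      # rough order-of-magnitude estimate for atoms (assumption)


def log10_sequence_count(length: int, alphabet: int = ALPHABET_SIZE) -> float:
    """Return log10 of alphabet**length, the number of possible sequences."""
    return length * math.log10(alphabet)


# Work in log space so the numbers stay readable.
log10_sequences = log10_sequence_count(SEQ_LENGTH)

# Shortest protein length whose sequence count already exceeds the atom estimate.
min_length = math.ceil(ATOMS_LOG10 / math.log10(ALPHABET_SIZE))

print(f"about 10^{log10_sequences:.0f} possible sequences of length {SEQ_LENGTH}")
print(f"sequence count passes 10^{ATOMS_LOG10} at length {min_length}")
```

Under these assumptions, the number of possible sequences passes the atom estimate at a length of only 62 residues, so a GFP-sized protein sits in a vastly larger search space.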
This summary contains AI-generated information and may have important inaccuracies or omissions.