ESM3, a large language model (LLM) for biology, generates novel proteins as distant from known natural ones as roughly 500 million years of evolution would produce, which could reshape protein engineering and biological research.
🧬 Proteins are the "programs" that run the "system of life."
🏭 Ribosomes build proteins from RNA instructions and are likened to tiny factories working at the atomic scale.
🌟 ESM3 generated a green fluorescent protein (GFP) with just 58% sequence similarity to the closest known natural fluorescent protein, a distance described as equivalent to 500 million years of evolution.
🤖 Chain of Thought reasoning in ESM3: the model generates a protein step by step, much like working through a math problem one step at a time.
Key insights
The Power of ESM3
ESM3 is a large language model developed to simulate protein evolution and generate new proteins.
500 Million Years of Evolution: The generated GFP's sequence is only 58% similar to that of the closest known natural fluorescent protein, a distance described as equivalent to roughly 500 million years of natural evolution.
AI Tokenization: A protein's sequence, structure, and function are converted into discrete tokens so that the model can read and reason over all three.
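The two ideas above, tokenizing a protein and comparing sequences by similarity, can be illustrated with a minimal Python sketch. This is not ESM3's tokenizer, nor the method used to arrive at the 58% figure: the vocabulary ordering, the function names `tokenize` and `percent_identity`, and the toy sequences are all assumptions made for illustration. It covers only the sequence track, mapping each amino-acid letter to an integer token, and it computes percent identity between two sequences that are assumed to be already aligned and of equal length.

```python
# Illustrative only: a toy amino-acid tokenizer and identity metric.
# The vocabulary ordering and function names are hypothetical, not ESM3's.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids
TOKEN_IDS = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def tokenize(sequence: str) -> list[int]:
    """Map each one-letter amino-acid code to an integer token ID."""
    return [TOKEN_IDS[aa] for aa in sequence.upper()]


def percent_identity(a: str, b: str) -> float:
    """Percent of positions holding the same residue in two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Toy sequences invented for this example; they differ at two of ten positions.
seq_a = "ACDEFGHIKL"
seq_b = "ACDQFGHIRL"

tokens = tokenize(seq_a)
identity = percent_identity(seq_a, seq_b)
print(tokens)
print(f"{identity:.1f}% identical")
```

Real comparisons first align the two sequences, which may introduce gaps, so a reported similarity figure depends on the alignment method as well as on the sequences themselves.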
Biological Complexity and Programmability
Proteins as Life’s Building Blocks: Proteins carry out many of life's functions, acting as molecular engines, sensors, and information processors.
Natural to Synthetic Transition: Through ESM3, trial-and-error experiments can be supplanted by logical simulation, offering precise control over protein engineering.
Chain of Thought: Generating a design step by step, the way advanced LLMs like ChatGPT reason step by step, improves the accuracy and plausibility of the proteins ESM3 produces.
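One reason blind trial and error scales poorly is the size of protein sequence space, which the summary's quote about chance generation ("more possibilities than the number of atoms in the visible universe") alludes to. The arithmetic can be sketched in Python under stated assumptions: 20 standard amino acids per position, a chain of 238 residues (the length of wild-type GFP from the jellyfish Aequorea victoria), and a commonly cited rough estimate of 10^80 atoms in the observable universe. The function name `sequence_space_log10` is hypothetical.

```python
import math

NUM_AMINO_ACIDS = 20  # standard amino acids available at each position
ATOMS_LOG10 = 80      # assumption: rough estimate of ~10^80 atoms in the observable universe


def sequence_space_log10(length: int, alphabet: int = NUM_AMINO_ACIDS) -> float:
    """Return log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet)


gfp_length = 238  # assumption: residues in wild-type GFP
space_log10 = sequence_space_log10(gfp_length)
exceeds_atoms = space_log10 > ATOMS_LOG10

# Shortest chain whose sequence space already exceeds the atom estimate.
min_length = math.floor(ATOMS_LOG10 / math.log10(NUM_AMINO_ACIDS)) + 1

print(f"about 10^{space_log10:.0f} possible sequences of length {gfp_length}")
print(f"exceeds atom estimate: {exceeds_atoms}; shortest such length: {min_length}")
```

Under these assumptions, a GFP-sized chain has more than 10^300 possible sequences, and any chain of 62 or more residues already has more possible sequences than the estimated number of atoms.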
Practical and Ethical Considerations
Open Source Commitment: The weights and code for the open ESM3 model are publicly released for non-commercial use, fostering collaborative research.
Potential Impact: Applications include creating proteins to degrade plastics, developing new medicines, and potentially slowing aging.
Ethical Concerns: Technology as powerful as ESM3 draws both enthusiasm and caution about its potential to modify life.
Technological Advancements in AI and Hardware
Fastest AI Chip Debate: A new company, Etched, claims its AI chip processes over 500,000 tokens per second, potentially eclipsing Nvidia’s GPUs in performance.
ASIC Specialization: The chip is an ASIC (application-specific integrated circuit) built specifically to run Transformer models, which is why it is claimed to outpace general-purpose graphics accelerators.
Key quotes
"If we could learn to read and write in the code of life, it would make biology programmable."
"Generating one by pure chance is astronomically unlikely; there are more possibilities than the number of atoms in the visible universe."
"Chain of Thought: it’s the same kind of approach that we would use for ChatGPT, improving its reasoning ability by thinking step by step."
"ESM3 can provide feedback to itself to improve the quality of its own generations."
"They’re releasing the weights and code for the ESM3 1.4 billion open model... it’s all open source."
This summary contains AI-generated information and may have important inaccuracies or omissions.