NEW Open Source AI Video (Multi-Consistent Characters + 30 Second Videos + More)

The Nugget

  • Story Diffusion is a groundbreaking open-source AI video model that generates up to 30-second clips with unprecedented character consistency, adherence to reality and physics, and lifelike animations - a major leap forward in AI video generation.

Make it stick

  • 🎭 Story Diffusion maintains remarkable character consistency in appearance, clothing, and body type across scenes
  • 🌀 It uses consistent self-attention to ensure key attributes are maintained between frames
  • 🎥 Motion prediction is used to animate natural transitions between generated images
  • 💪 Trained on just 8 GPUs (vs a reported 10,000 for the SOTA Sora model) yet achieves comparable results

Key insights

Unparalleled character consistency and realism

  • Generates videos up to 30 seconds long with incredible character consistency in face, clothing, body type
  • Characters remain consistent between shots and scenes, enabling believable AI videos and comics
  • Adheres to reality and physics far better than previous models - e.g. no characters suddenly appearing or objects passing through solid surfaces
  • Lifelike movement and expressive facial animations - characters appear animated vs wooden in other AI videos

Innovative approach using story splitting and consistent self-attention

  • Story splitting breaks a story into multiple text prompts describing parts of the narrative
  • Prompts are processed together in a single batch to produce a sequence of images depicting the story
  • Consistent self-attention ensures each image shares key attributes (e.g. character height, shirt color) to maintain visual coherence
  • A motion predictor model then animates transitions between the generated images to create fluid video
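The consistent self-attention step above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the model's actual implementation: the random token-sampling scheme, the `sample_rate` value, and the single-head attention are all simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def consistent_self_attention(frames, sample_rate=0.5, seed=0):
    """Self-attention where each frame also attends to tokens sampled
    from the other frames in the batch, so shared attributes (face,
    clothing, body type) are pulled toward a common representation."""
    rng = np.random.default_rng(seed)
    B, N, D = frames.shape  # (num_images, tokens_per_image, dim)
    out = np.empty_like(frames)
    for i in range(B):
        q = frames[i]  # queries come only from the current frame
        # Pool tokens from all *other* frames and sample a subset
        others = np.concatenate([frames[j] for j in range(B) if j != i])
        k = int(sample_rate * len(others))
        idx = rng.choice(len(others), size=k, replace=False)
        # Keys/values = this frame's tokens plus the sampled shared tokens
        kv = np.concatenate([frames[i], others[idx]])
        attn = softmax(q @ kv.T / np.sqrt(D))
        out[i] = attn @ kv
    return out

frames = np.random.default_rng(1).normal(size=(4, 16, 8))
mixed = consistent_self_attention(frames)
print(mixed.shape)  # (4, 16, 8)
```

Because every frame's output is a weighted blend that includes tokens drawn from its neighbors, attributes like shirt color or character height can propagate across the whole image sequence without any extra training signal.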

Versatile applications from realistic to animated videos

  • Generates realistic videos of diverse scenes - e.g. handheld tourist footage with natural camera shake, moving and static elements
  • Excels at anime-style animation, enabling full AI-generated animated films
  • Can consistently include multiple characters across different scenes
  • Turns real reference images of people into graphic novel animations

Highly efficient architecture achieves SOTA results with roughly 1,250x fewer GPUs

  • Trained on just 8 GPUs vs a reported 10,000 for OpenAI's SOTA Sora model, yet achieves comparable realism, consistency, and fluidity
  • Indicates an extremely efficient architecture that democratizes access to high-quality AI video generation
  • Currently open source but lacks a polished user interface - requires technical setup via GitHub, or use of the Hugging Face demo

Key quotes

  • "Story Diffusion is the best open-source video model that we've seen and it's creating videos up to 30 seconds long with an unbelievable level of character consistency and adherence to reality and physics."
  • "Story Diffusion is a real step forward in character consistency. We're not just talking about facial consistency, we're actually talking about consistency in clothing and body type."
  • "AI video is taking huge steps forwards right now and we're getting closer and closer to getting SORA-level videos in our hands and Story Diffusion shows a real evolution in character consistency as well as being able to create scenes that make realistic and cohesive sense."
This summary contains AI-generated information and may have important inaccuracies or omissions.