HyperFrames: Write HTML, Render Video, Built for Agents

Video generation and video production are different problems.

A generative model can create a striking clip from a prompt, but a production workflow still needs precise timing, editable text, reusable layouts, consistent branding, audio mixing, and predictable output. Those requirements usually pull teams back toward timeline editors or custom rendering pipelines.

HeyGen’s open-source project HyperFrames takes a different approach: make the video composition an HTML document, then give AI coding agents the tools to edit, inspect, preview, and render it.

[Image: HyperFrames demo]

TL;DR: HyperFrames is most interesting as an authoring model for agent-generated video. Its source files are ordinary HTML, its animations can be seeked frame by frame, and its renderer captures deterministic frames through headless Chrome. The important caveat is that reproducibility depends on both the composition and the rendering environment being deterministic.

This article assumes basic familiarity with HTML, animation timelines, and command-line tools.

The problem HyperFrames is trying to solve

AI coding agents work best with artifacts they can read, modify, diff, and validate. Traditional video project files are poor fits for that workflow:

  • Their formats are often proprietary or difficult to inspect as text
  • Small visual changes may require manual timeline editing
  • Agent changes are difficult to review in Git
  • Rendering behavior may depend on real-time playback
  • Reusing a composition across hundreds of data variants requires extra automation

HyperFrames reframes video production as a software workflow:

Brief → Agent edits HTML → Lint → Preview → Render → Review → HTML diff

The result is not prompt-to-video generation. It is Video as Code: deterministic motion graphics and media compositions authored with web technologies.

The core idea: HTML is the video source file

HyperFrames compositions are plain HTML documents. Timing, layering, duration, and canvas dimensions are declared with data-* attributes.

The following is a simplified composition fragment:

<div
  id="stage"
  data-composition-id="my-video"
  data-start="0"
  data-duration="9"
  data-width="1920"
  data-height="1080"
>
  <video
    id="clip-1"
    data-start="0"
    data-duration="5"
    data-track-index="0"
    src="intro.mp4"
    muted
    playsinline
  ></video>

  <img
    id="overlay"
    data-start="2"
    data-duration="3"
    data-track-index="1"
    src="logo.png"
    alt=""
  />

  <audio
    id="bg-music"
    data-start="0"
    data-duration="9"
    data-track-index="2"
    data-volume="0.5"
    src="music.wav"
  ></audio>
</div>

| Attribute | Purpose |
| --- | --- |
| data-start | When the element enters the composition |
| data-duration | How long the element remains active |
| data-track-index | Layer and track ordering |
| data-width, data-height | Composition resolution |
| data-composition-id | Root composition identifier |
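
To make the timing attributes concrete, here is a minimal sketch of how start, duration, and track index could decide what is on screen at a given moment. The `Clip` type and `activeClips` function are hypothetical names, not part of HyperFrames, and the half-open active interval is an assumption rather than documented behavior.

```typescript
// Hypothetical in-memory model of the timing attributes from the fragment above.
interface Clip {
  id: string;
  start: number; // data-start, in seconds
  duration: number; // data-duration, in seconds
  trackIndex: number; // data-track-index
}

// Assumption: a clip is active on the half-open interval [start, start + duration).
// Results are sorted by track index; which end is "on top" is not asserted here.
function activeClips(clips: Clip[], time: number): Clip[] {
  return clips
    .filter((c) => time >= c.start && time < c.start + c.duration)
    .sort((a, b) => a.trackIndex - b.trackIndex);
}

const clips: Clip[] = [
  { id: "clip-1", start: 0, duration: 5, trackIndex: 0 },
  { id: "overlay", start: 2, duration: 3, trackIndex: 1 },
  { id: "bg-music", start: 0, duration: 9, trackIndex: 2 },
];

console.log(activeClips(clips, 1).map((c) => c.id)); // clip-1, bg-music
console.log(activeClips(clips, 3).map((c) => c.id)); // clip-1, overlay, bg-music
console.log(activeClips(clips, 6).map((c) => c.id)); // bg-music
```

Because the answer depends only on the declared values and the requested time, the same question can be asked for any frame in any order.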

This choice has practical consequences:

  1. Agents already understand the authoring format. They can modify HTML and CSS without learning a proprietary project schema.
  2. Changes are reviewable. A larger title, different color, or later entrance becomes an ordinary Git diff.
  3. Web capabilities remain available. CSS, SVG, Canvas, WebGL, GSAP, Lottie, and Three.js can all participate in a composition.

HTML alone is not the difficult part, however. The real engineering problem is making browser animation seekable and reproducible.

How deterministic rendering actually works

HyperFrames does not record a page playing in real time. Its engine seeks the composition to each output frame independently and captures the compositor's output for that frame.

For each frame index:
  time = frameIndex / fps
  seek every timeline and media element to time
  wait for the frame to be ready
  capture pixels through Chrome BeginFrame
  send the ordered frame to FFmpeg
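
The first step of that loop is worth spelling out. The sketch below derives each timestamp from the frame index instead of accumulating a per-frame increment, so floating-point error cannot build up over a long render. The function names are illustrative, and rounding the frame count is an assumption; the article does not say how HyperFrames computes it.

```typescript
// Total frames for a composition. Rounding is an assumption made for this sketch;
// it guards against products such as 8.999999 that floating point can produce.
function frameCount(durationSeconds: number, fps: number): number {
  return Math.round(durationSeconds * fps);
}

// Time is derived from the index on every call, never accumulated frame to frame.
function frameTime(frameIndex: number, fps: number): number {
  return frameIndex / fps;
}

const fps = 30;
const total = frameCount(9, fps); // the 9-second example composition
const times = Array.from({ length: total }, (_, i) => frameTime(i, fps));

console.log(total); // 270
console.log(times[0], times[total - 1]);
```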

At a higher level, the rendering pipeline looks like this:

HTML composition
→ HyperFrames runtime injection
→ asset readiness gates
→ frame-adapter seek
→ Chrome HeadlessExperimental.beginFrame capture
→ ordered frame buffering
→ FFmpeg encoding and audio mixing
→ MP4

The key abstraction is the Frame Adapter. An adapter must be able to receive a target frame and place its animation runtime at the corresponding point without relying on elapsed wall-clock time. HyperFrames includes this behavior for GSAP and supports adapters for other animation systems.
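
As a rough illustration of that contract, the sketch below defines a hypothetical `FrameAdapter` interface and a toy fade-in that implements it. The interface shape and names are assumptions for this article, not the actual HyperFrames API. The point is that state is a pure function of the target frame.

```typescript
// Hypothetical adapter shape; the real HyperFrames interface may differ.
interface FrameAdapter {
  seekToFrame(frameIndex: number, fps: number): void;
}

interface FadeIn extends FrameAdapter {
  opacity: number;
}

// Toy animation: a linear fade-in whose opacity depends only on the target time.
function createFadeIn(startSeconds: number, durationSeconds: number): FadeIn {
  const adapter: FadeIn = {
    opacity: 0,
    seekToFrame(frameIndex: number, fps: number): void {
      const t = frameIndex / fps;
      const progress = (t - startSeconds) / durationSeconds;
      adapter.opacity = Math.min(1, Math.max(0, progress));
    },
  };
  return adapter;
}

const fade = createFadeIn(1, 2); // fades in between t=1s and t=3s

// Frames can be visited in any order and still land on the same state.
for (const frame of [60, 0, 90, 30, 60]) {
  fade.seekToFrame(frame, 30);
  console.log(frame, fade.opacity);
}
```

Nothing in the adapter reads a clock, which is what allows a renderer to request frame 60 before frame 0 and get the same pixels either way.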

[Image: GSAP animation comparison between HyperFrames and Remotion]

Above: a GSAP timeline rendered through HyperFrames’ seek-driven runtime.

The determinism contract

The phrase “same input, identical output” needs a boundary around it.

HyperFrames can produce reproducible output when:

  • Every animation can be seeked to an arbitrary frame
  • Assets are fully loaded before capture
  • The composition avoids time-dependent side effects
  • Chrome, FFmpeg, fonts, and other rendering dependencies are pinned

The following patterns can break determinism:

| Source of drift | Why it breaks reproducibility |
| --- | --- |
| Math.random() | Produces different values unless replaced with a seeded generator |
| Date.now() | Makes output depend on render time |
| setTimeout() / setInterval() | Depend on wall-clock scheduling |
| requestAnimationFrame() loops | Advance according to real-time browser playback |
| Late async operations | Can change the DOM after a frame is considered ready |
| Local fonts and browser versions | Can produce different line breaks or pixels across machines |
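
For the Math.random() case, one common remedy is a small seeded generator. The sketch below uses mulberry32, a well-known 32-bit generator, as one possible stand-in. The article does not say which generator, if any, HyperFrames itself recommends, so treat the choice as an assumption.

```typescript
// mulberry32: a small seeded 32-bit generator. Not suitable for cryptography.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Same seed, same sequence: "random" positions no longer vary between renders.
const randomA = mulberry32(42);
const randomB = mulberry32(42);
const runA = [randomA(), randomA(), randomA()];
const runB = [randomB(), randomB(), randomB()];

console.log(runA.every((value, i) => value === runB[i])); // true
```

A seeded generator only helps if values are drawn in the same order on every render. Drawing them once at setup, rather than inside per-frame code, keeps that order fixed.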

Local rendering may therefore show small platform-specific differences. When exact reproducibility matters, HyperFrames recommends Docker mode, which pins the rendering environment in addition to the composition source.

That distinction matters: the renderer provides a deterministic mechanism, but composition authors still have to follow the deterministic contract.
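
One way an author or agent might hold up their side of that contract is a quick text scan for the risky calls listed above. The sketch below is an illustration only. The patterns and names are assumptions, and it does not describe what `hyperframes lint` actually checks.

```typescript
// Illustrative patterns only; this is not the rule set of `hyperframes lint`.
const RISKY_PATTERNS: ReadonlyArray<{ name: string; pattern: RegExp }> = [
  { name: "Math.random", pattern: /\bMath\.random\s*\(/ },
  { name: "Date.now", pattern: /\bDate\.now\s*\(/ },
  { name: "setTimeout/setInterval", pattern: /\bset(?:Timeout|Interval)\s*\(/ },
  { name: "requestAnimationFrame", pattern: /\brequestAnimationFrame\s*\(/ },
];

interface Finding {
  line: number; // 1-based line number in the scanned source
  name: string;
}

// Reports every line that contains a call matching one of the patterns.
function findRiskyCalls(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { name, pattern } of RISKY_PATTERNS) {
      if (pattern.test(text)) findings.push({ line: index + 1, name });
    }
  });
  return findings;
}

const sample = [
  "const x = Math.random() * 1920;",
  "const t = frameIndex / fps;",
  "setTimeout(show, 500);",
].join("\n");

console.log(findRiskyCalls(sample)); // flags line 1 and line 3
```

A plain text scan will flag matches inside comments and strings and will miss aliased calls, so it is a prompt for review, not a proof of determinism.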

A reproducibility check you can run

A useful way to evaluate the central claim is to render the same composition twice and compare hashes:

npx hyperframes lint index.html
npx hyperframes render index.html --docker --output render-a.mp4
npx hyperframes render index.html --docker --output render-b.mp4

On Linux (on macOS, substitute shasum -a 256 if sha256sum is not installed):

sha256sum render-a.mp4 render-b.mp4

On Windows PowerShell:

Get-FileHash render-a.mp4
Get-FileHash render-b.mp4

Matching hashes demonstrate byte-identical output for that composition and environment. A stronger evaluation would repeat the test after:

  1. Adding an unseeded random value
  2. Rendering locally on two operating systems
  3. Switching back to Docker mode
  4. Changing one HTML property and inspecting the resulting diff

This separates HyperFrames’ documented guarantee from assumptions about arbitrary browser content.

Why the authoring model fits AI agents

HyperFrames is described as agent-native because the full development loop can be driven through non-interactive commands and text files.

npx skills add heygen-com/hyperframes

The installed skills teach supported coding agents how to author compositions, use the CLI, work with GSAP, and add registry blocks. The important capability is not that an agent can generate a first draft. Many tools can do that. The useful part is the correction loop:

Generate composition
→ run hyperframes lint
→ open preview
→ render a draft
→ inspect visual output
→ edit the source
→ render again
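
The control flow of that loop can be sketched with the individual steps abstracted away. Everything here is hypothetical: the `LoopSteps` callbacks stand in for CLI calls, rendering, and visual review, and the round limit is an assumption added so the loop always terminates.

```typescript
// Hypothetical step signatures; in practice these would wrap CLI calls and a review step.
interface LoopSteps {
  lint(source: string): string[]; // lint errors, empty when clean
  review(source: string): string | null; // requested change, or null when accepted
  edit(source: string, feedback: string): string; // revised source
}

interface LoopResult {
  source: string;
  rounds: number;
  accepted: boolean;
}

function correctionLoop(initial: string, steps: LoopSteps, maxRounds: number): LoopResult {
  let source = initial;
  for (let round = 1; round <= maxRounds; round++) {
    const errors = steps.lint(source);
    if (errors.length > 0) {
      source = steps.edit(source, errors.join("; ")); // fix lint errors before rendering
      continue;
    }
    const feedback = steps.review(source); // stands in for render + visual inspection
    if (feedback === null) return { source, rounds: round, accepted: true };
    source = steps.edit(source, feedback);
  }
  return { source, rounds: maxRounds, accepted: false };
}

// Toy steps: lint wants a duration, review wants a dark palette.
const toySteps: LoopSteps = {
  lint: (s) => (s.includes("data-duration") ? [] : ["missing data-duration"]),
  review: (s) => (s.includes('class="dark"') ? null : "use a dark palette"),
  edit: (s, feedback) =>
    feedback.includes("data-duration")
      ? s.replace("<div", '<div data-duration="9"')
      : s.replace("<div", '<div class="dark"'),
};

const result = correctionLoop('<div id="stage"></div>', toySteps, 5);
console.log(result.accepted, result.rounds); // true 3
```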

Example requests can remain product-level rather than implementation-level:

“Create a 10-second product intro with a fade-in title, background footage, and music.”

“Make the title twice as large, switch to a dark palette, and add a fade-out.”

“Turn this CSV into a 9:16 animated chart with captions and narration.”

Because the result is HTML, each iteration remains inspectable. The agent is not editing an opaque binary project or asking a generative video model to recreate the whole result from scratch.

This does not remove the need for human review. Agents can still produce weak pacing, crowded typography, poor visual hierarchy, or technically valid but uninteresting motion. HyperFrames makes those problems editable; it does not automatically solve them.

Reusable blocks instead of repeated prompting

HyperFrames includes a catalog of reusable blocks and components for common production patterns:

  • Social overlays such as follow cards and lower thirds
  • Shader transitions such as light leaks, glitches, and cinematic zooms
  • Data visualizations and app showcases
  • Captions, notifications, and other interface-inspired motion graphics

[Image: Instagram follow block]

Blocks can be added through the CLI:

npx hyperframes add flash-through-white
npx hyperframes add instagram-follow
npx hyperframes add data-chart

This matters for agent workflows because a reviewed block is more reliable than asking an agent to reinvent the same transition or overlay in every composition. The catalog becomes a design and implementation vocabulary shared by humans and agents.

Browse the current catalog at hyperframes.heygen.com/catalog.

Quick start

HyperFrames requires Node.js 22 or newer and FFmpeg.

npx hyperframes init my-video
cd my-video
npx hyperframes preview

After editing and previewing the composition:

npx hyperframes lint
npx hyperframes render index.html --output demo.mp4

For reproducible CI or cross-machine rendering:

npx hyperframes render index.html --docker --output demo.mp4

The project’s own launch video is also published as a worked example. It combines 17 sub-compositions with CSS, GSAP, Lottie, shaders, Three.js, captions, footage, and sound effects.

HyperFrames vs Remotion: the real distinction

HyperFrames is inspired by Remotion, and both projects use headless Chrome to render deterministic video. The difference is not that one is deterministic and the other is not. The main difference is what authors write and how external animation clocks are controlled.

| Feature | HyperFrames | Remotion |
| --- | --- | --- |
| Primary authoring format | HTML + CSS | React components and TSX |
| Native animation model | Seekable frame adapters, including GSAP | React values driven by the current frame |
| External animation libraries | Designed around adapters for seekable timelines | Require careful integration with Remotion’s frame model |
| Arbitrary HTML/CSS | Direct authoring and passthrough | Usually rewritten as JSX |
| Build step | HTML can play directly | Bundler required |
| Distributed rendering | Supports AWS Lambda; ecosystem is newer | Mature Lambda rendering workflow |
| License | Apache 2.0 | Source-available with commercial licensing thresholds |

[Image: Remotion GSAP comparison (GIF)]

The GIF above demonstrates what happens when a GSAP timeline runs on its own wall clock during a Remotion render. It is a useful illustration of the integration problem, but it should not be read as proof that Remotion animations are generally non-deterministic. Remotion’s native frame-driven animation model is deterministic.

Choose HyperFrames when ordinary HTML, direct CSS authoring, external animation runtimes, and agent editing are central requirements. Choose Remotion when React is already the team’s production language and its mature ecosystem and rendering infrastructure matter more.

Where HyperFrames fits

HyperFrames is a strong fit for:

  • Template-based marketing and product videos
  • Data-driven video variants generated at scale
  • Animated charts and developer explainers
  • Brand systems expressed as reusable HTML blocks
  • Agent-generated compositions that need to remain editable
  • Teams comfortable reviewing source code and rendered output

It is less compelling for:

  • Editors who primarily work through a visual timeline
  • Long-form footage-heavy projects centered on manual editorial judgment
  • Teams already productive with a mature Remotion pipeline
  • Compositions built around scripts that cannot be deterministically seeked
  • Workflows where nobody can maintain the underlying web code

The source format removes one barrier for agents, but it also moves video production closer to software engineering. That is an advantage only when the team wants that trade-off.
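
The data-driven case from the list above shows what that trade-off looks like in practice: generating variants becomes ordinary string templating. The sketch below is a minimal illustration under stated assumptions. The `Variant` type and the template are hypothetical, and applying timing attributes to a heading is assumed rather than confirmed by the article.

```typescript
// Escape text before placing it into HTML so data cannot break the markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

interface Variant {
  id: string;
  title: string;
}

// Hypothetical template; attribute names follow the fragment shown earlier.
function renderVariant(v: Variant): string {
  return [
    `<div id="stage" data-composition-id="${escapeHtml(v.id)}"`,
    `     data-start="0" data-duration="9" data-width="1920" data-height="1080">`,
    `  <h1 id="title" data-start="0" data-duration="9" data-track-index="0">${escapeHtml(v.title)}</h1>`,
    `</div>`,
  ].join("\n");
}

const variants: Variant[] = [
  { id: "promo-1", title: "Q3 & Q4 results" },
  { id: "promo-2", title: "Launch week" },
];

for (const v of variants) {
  console.log(renderVariant(v));
}
```

Each generated string could be written to its own file, diffed in Git, and rendered like any other composition.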

Architecture and package boundaries

| Package | Role |
| --- | --- |
| hyperframes | CLI for initialization, preview, linting, inspection, and rendering |
| @hyperframes/core | Types, parsers, generators, linter, runtime, and frame adapters |
| @hyperframes/engine | Seekable page capture through headless Chrome |
| @hyperframes/producer | Capture, encoding, and audio-mixing pipeline |
| @hyperframes/studio | Browser-based composition editor |
| @hyperframes/player | Embeddable <hyperframes-player> web component |
| @hyperframes/shader-transitions | WebGL shader transition library |

The separation is sensible: the engine captures deterministic frames, the producer turns those frames and audio into finished media, and the higher-level CLI and studio provide authoring workflows.

Final assessment

HyperFrames’ strongest idea is not “HTML can become video.” Browsers have been capable visual renderers for years.

Its stronger contribution is a contract between three systems:

  1. An AI coding agent that can author and revise text files
  2. A seekable browser composition that can represent each frame independently
  3. A pinned rendering pipeline that can turn those frames into reproducible media

That combination makes video behave more like a maintainable software artifact. It is particularly compelling when the primary author is an agent and the output must remain reviewable, reusable, and editable as ordinary web code.

HyperFrames will not replace timeline editors or generative video models. It occupies a more specific and useful space between them: programmatic video production where control matters as much as generation.