HyperFrames: Write HTML, Render Video, Built for Agents
Video generation and video production are different problems.
A generative model can create a striking clip from a prompt, but a production workflow still needs precise timing, editable text, reusable layouts, consistent branding, audio mixing, and predictable output. Those requirements usually pull teams back toward timeline editors or custom rendering pipelines.
HeyGen’s open-source project HyperFrames takes a different approach: make the video composition an HTML document, then give AI coding agents the tools to edit, inspect, preview, and render it.

TL;DR: HyperFrames is most interesting as an authoring model for agent-generated video. Its source files are ordinary HTML, its animations can be seeked frame by frame, and its renderer captures deterministic frames through headless Chrome. The important caveat is that reproducibility depends on both the composition and the rendering environment being deterministic.
This article assumes basic familiarity with HTML, animation timelines, and command-line tools.
The problem HyperFrames is trying to solve
AI coding agents work best with artifacts they can read, modify, diff, and validate. Traditional video project files are a poor fit for that workflow:
- Their formats are often proprietary or difficult to inspect as text
- Small visual changes may require manual timeline editing
- Agent changes are difficult to review in Git
- Rendering behavior may depend on real-time playback
- Reusing a composition across hundreds of data variants requires extra automation
HyperFrames reframes video production as a software workflow:
Brief → Agent edits HTML → Lint → Preview → Render → Review → HTML diff
The result is not prompt-to-video generation. It is Video as Code: deterministic motion graphics and media compositions authored with web technologies.
The core idea: HTML is the video source file
HyperFrames compositions are plain HTML documents. Timing, layering, duration, and canvas dimensions are declared with data-* attributes.
The following is a simplified composition fragment:
<div
  id="stage"
  data-composition-id="my-video"
  data-start="0"
  data-duration="9"
  data-width="1920"
  data-height="1080"
>
  <video
    id="clip-1"
    data-start="0"
    data-duration="5"
    data-track-index="0"
    src="intro.mp4"
    muted
    playsinline
  ></video>
  <img
    id="overlay"
    data-start="2"
    data-duration="3"
    data-track-index="1"
    src="logo.png"
    alt=""
  />
  <audio
    id="bg-music"
    data-start="0"
    data-duration="9"
    data-track-index="2"
    data-volume="0.5"
    src="music.wav"
  ></audio>
</div>
| Attribute | Purpose |
|---|---|
| data-start | When the element enters the composition |
| data-duration | How long the element remains active |
| data-track-index | Layer and track ordering |
| data-width, data-height | Composition resolution |
| data-composition-id | Root composition identifier |
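To make the timing attributes concrete, the sketch below works out which elements are active at a given time. It is a plain JavaScript illustration, not HyperFrames' runtime: the `activeAt` helper and the clip object shape are hypothetical, and the half-open interval `[start, start + duration)` and the ordering by track index are assumptions about how such attributes are usually interpreted.

```javascript
// Clip timing copied from the composition fragment above.
// The object shape is an assumption made for this sketch.
const clips = [
  { id: "clip-1", start: 0, duration: 5, trackIndex: 0 },
  { id: "overlay", start: 2, duration: 3, trackIndex: 1 },
  { id: "bg-music", start: 0, duration: 9, trackIndex: 2 },
];

// Hypothetical helper: ids of the clips active at `time` (in seconds),
// ordered by track index. Assumes start <= time < start + duration.
function activeAt(clipList, time) {
  return clipList
    .filter((c) => time >= c.start && time < c.start + c.duration)
    .sort((a, b) => a.trackIndex - b.trackIndex)
    .map((c) => c.id);
}

console.log(activeAt(clips, 1)); // clip-1 and bg-music
console.log(activeAt(clips, 3)); // all three elements
console.log(activeAt(clips, 6)); // bg-music only
```

Because the result depends only on the declared attributes and the requested time, the same question can be answered for any frame without playing the composition.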
This choice has practical consequences:
- Agents already understand the authoring format. They can modify HTML and CSS without learning a proprietary project schema.
- Changes are reviewable. A larger title, different color, or later entrance becomes an ordinary Git diff.
- Web capabilities remain available. CSS, SVG, Canvas, WebGL, GSAP, Lottie, and Three.js can all participate in a composition.
HTML alone is not the difficult part, however. The real engineering problem is making browser animation seekable and reproducible.
How deterministic rendering actually works
HyperFrames does not record a page playing in real time. Instead, its engine seeks the composition to each output frame independently and captures what the browser compositor produces for that frame.
For each frame index:
time = frameIndex / fps
seek every timeline and media element to time
wait for the frame to be ready
capture pixels through Chrome BeginFrame
send the ordered frame to FFmpeg
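The per-frame steps above can be sketched in JavaScript. This is an illustration under stated assumptions, not the engine's code: `renderFrames`, `seek`, and `capture` are hypothetical names standing in for the real seek and BeginFrame capture steps, and the stand-in composition simply records the time it was seeked to.

```javascript
// Hypothetical per-frame loop. `seek` and `capture` are placeholders for the
// engine's real steps; neither is part of the HyperFrames API.
async function renderFrames({ fps, durationSeconds, seek, capture }) {
  const totalFrames = Math.round(fps * durationSeconds);
  const frames = [];
  for (let frameIndex = 0; frameIndex < totalFrames; frameIndex++) {
    const time = frameIndex / fps; // derived from the index, never from a clock
    await seek(time); // position timelines and media, then wait until ready
    frames.push(await capture(frameIndex)); // frames are kept in index order
  }
  return frames;
}

// Stand-in composition: its state is just the last seeked time.
let currentTime = 0;
const demo = renderFrames({
  fps: 30,
  durationSeconds: 9,
  seek: async (t) => {
    currentTime = t;
  },
  capture: async (i) => ({ frameIndex: i, time: currentTime }),
});

demo.then((frames) => {
  console.log(frames.length); // 270 frames for 9 seconds at 30 fps
  console.log(frames[45].time); // 1.5
});
```

The point of the structure is that nothing in the loop reads elapsed time, so a slow machine produces the same frames as a fast one, only later.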
At a higher level, the rendering pipeline looks like this:
HTML composition
→ HyperFrames runtime injection
→ asset readiness gates
→ frame-adapter seek
→ Chrome HeadlessExperimental.beginFrame capture
→ ordered frame buffering
→ FFmpeg encoding and audio mixing
→ MP4
The key abstraction is the Frame Adapter. An adapter must be able to receive a target frame and place its animation runtime at the corresponding point without relying on elapsed wall-clock time. HyperFrames includes this behavior for GSAP and supports adapters for other animation systems.

Above: a GSAP timeline rendered through HyperFrames’ seek-driven runtime.
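The adapter idea can be illustrated without any animation library. The sketch below is hypothetical: `createFadeAdapter` and `seekToFrame` are names invented for this example and do not describe HyperFrames' actual adapter interface. It shows the property that matters, which is that the state is computed from the target frame alone.

```javascript
// Hypothetical adapter: a linear fade-in whose state depends only on the
// requested frame, not on how much real time has elapsed.
function createFadeAdapter({ fps, startSeconds, durationSeconds }) {
  return {
    seekToFrame(frameIndex) {
      const time = frameIndex / fps;
      const progress = (time - startSeconds) / durationSeconds;
      const opacity = Math.min(1, Math.max(0, progress)); // clamp to [0, 1]
      return { opacity };
    },
  };
}

const adapter = createFadeAdapter({ fps: 30, startSeconds: 2, durationSeconds: 1 });

// Seeking out of order, or seeking the same frame twice, gives the same state.
const first = adapter.seekToFrame(75); // t = 2.5 s, halfway through the fade
adapter.seekToFrame(10); // jump backwards
const second = adapter.seekToFrame(75); // seek the same frame again
console.log(first.opacity, second.opacity); // 0.5 0.5
```

An animation driven by wall-clock time would have to be played from the start to reach frame 75, and its state would depend on how long each step took. That is the behavior an adapter has to replace.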
The determinism contract
The phrase “same input, identical output” needs a boundary around it.
HyperFrames can produce reproducible output when:
- Every animation can be seeked to an arbitrary frame
- Assets are fully loaded before capture
- The composition avoids time-dependent side effects
- Chrome, FFmpeg, fonts, and other rendering dependencies are pinned
The following patterns can break determinism:
| Source of drift | Why it breaks reproducibility |
|---|---|
| Math.random() | Produces different values unless replaced with a seeded generator |
| Date.now() | Makes output depend on render time |
| setTimeout() / setInterval() | Depend on wall-clock scheduling |
| requestAnimationFrame() loops | Advance according to real-time browser playback |
| Late async operations | Can change the DOM after a frame is considered ready |
| Local fonts and browser versions | Can produce different line breaks or pixels across machines |
Local rendering may therefore show small platform-specific differences. HyperFrames recommends Docker mode when exact reproducibility matters because it pins the rendering environment as well as the composition.
That distinction matters: the renderer provides a deterministic mechanism, but composition authors still have to follow the deterministic contract.
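One way to handle the `Math.random()` case is a small seeded generator. The sketch below uses mulberry32, a widely circulated 32-bit PRNG. It is not part of HyperFrames, and the seed value and the `particlePositions` helper are illustrative assumptions.

```javascript
// mulberry32: a small seeded PRNG returning floats in [0, 1).
// The same seed always produces the same sequence.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical use: particle positions that are identical on every render.
function particlePositions(seed, count, width, height) {
  const rand = mulberry32(seed);
  return Array.from({ length: count }, () => ({
    x: Math.round(rand() * width),
    y: Math.round(rand() * height),
  }));
}

const runA = particlePositions(42, 3, 1920, 1080);
const runB = particlePositions(42, 3, 1920, 1080);
console.log(JSON.stringify(runA) === JSON.stringify(runB)); // true
```

The seed becomes part of the composition source, so changing the "random" layout is an explicit, reviewable edit rather than a side effect of rendering again.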
A reproducibility check you can run
A useful way to evaluate the central claim is to render the same composition twice and compare hashes:
npx hyperframes lint index.html
npx hyperframes render index.html --docker --output render-a.mp4
npx hyperframes render index.html --docker --output render-b.mp4
On Linux, or on macOS releases that ship sha256sum (otherwise use shasum -a 256 with the same arguments):
sha256sum render-a.mp4 render-b.mp4
On Windows PowerShell:
Get-FileHash render-a.mp4
Get-FileHash render-b.mp4
Matching hashes demonstrate byte-identical output for that composition and environment. A stronger evaluation would repeat the test after:
- Adding an unseeded random value
- Rendering locally on two operating systems
- Switching back to Docker mode
- Changing one HTML property and inspecting the resulting diff
This separates HyperFrames’ documented guarantee from assumptions about arbitrary browser content.
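For CI, the same comparison can be scripted in Node.js so it behaves identically on every platform. The sketch below is one possible approach: the file name `compare-renders.js` and the helper names are assumptions, and it uses only Node's built-in `crypto` and `fs` modules.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");

// SHA-256 of a file's bytes as a hex string.
// Reads the whole file into memory, which is fine for short renders.
function sha256File(filePath) {
  return crypto.createHash("sha256").update(fs.readFileSync(filePath)).digest("hex");
}

// True when both files have the same SHA-256 digest.
function rendersMatch(pathA, pathB) {
  return sha256File(pathA) === sha256File(pathB);
}

// Usage: node compare-renders.js render-a.mp4 render-b.mp4
if (process.argv.length >= 4) {
  const [fileA, fileB] = process.argv.slice(2);
  const identical = rendersMatch(fileA, fileB);
  console.log(identical ? "identical" : "different");
  process.exitCode = identical ? 0 : 1; // non-zero exit fails a CI step
}
```

A non-zero exit code on mismatch lets a pipeline treat a reproducibility regression like any other failing test.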
Why the authoring model fits AI agents
HyperFrames is described as agent-native because the full development loop can be driven through non-interactive commands and text files. The agent skills are installed with a single command:
npx skills add heygen-com/hyperframes
The installed skills teach supported coding agents how to author compositions, use the CLI, work with GSAP, and add registry blocks. The important capability is not that an agent can generate a first draft. Many tools can do that. The useful part is the correction loop:
Generate composition
→ run hyperframes lint
→ open preview
→ render a draft
→ inspect visual output
→ edit the source
→ render again
Example requests can remain product-level rather than implementation-level:
“Create a 10-second product intro with a fade-in title, background footage, and music.”
“Make the title twice as large, switch to a dark palette, and add a fade-out.”
“Turn this CSV into a 9:16 animated chart with captions and narration.”
Because the result is HTML, each iteration remains inspectable. The agent is not editing an opaque binary project or asking a generative video model to recreate the whole result from scratch.
This does not remove the need for human review. Agents can still produce weak pacing, crowded typography, poor visual hierarchy, or technically valid but uninteresting motion. HyperFrames makes those problems editable; it does not automatically solve them.
Reusable blocks instead of repeated prompting
HyperFrames includes a catalog of reusable blocks and components for common production patterns:
- Social overlays such as follow cards and lower thirds
- Shader transitions such as light leaks, glitches, and cinematic zooms
- Data visualizations and app showcases
- Captions, notifications, and other interface-inspired motion graphics

Blocks can be added through the CLI:
npx hyperframes add flash-through-white
npx hyperframes add instagram-follow
npx hyperframes add data-chart
This matters for agent workflows because a reviewed block is more reliable than asking an agent to reinvent the same transition or overlay in every composition. The catalog becomes a design and implementation vocabulary shared by humans and agents.
Browse the current catalog at hyperframes.heygen.com/catalog.
Quick start
HyperFrames requires Node.js 22 or newer and FFmpeg.
npx hyperframes init my-video
cd my-video
npx hyperframes preview
After editing and previewing the composition:
npx hyperframes lint
npx hyperframes render index.html --output demo.mp4
For reproducible CI or cross-machine rendering:
npx hyperframes render index.html --docker --output demo.mp4
The project’s own launch video is also published as a worked example. It combines 17 sub-compositions with CSS, GSAP, Lottie, shaders, Three.js, captions, footage, and sound effects.
HyperFrames vs Remotion: the real distinction
HyperFrames is inspired by Remotion, and both projects use headless Chrome to render deterministic video. The difference is not that one is deterministic and the other is not. The main difference is what authors write and how external animation clocks are controlled.
| Feature | HyperFrames | Remotion |
|---|---|---|
| Primary authoring format | HTML + CSS | React components and TSX |
| Native animation model | Seekable frame adapters, including GSAP | React values driven by the current frame |
| External animation libraries | Designed around adapters for seekable timelines | Require careful integration with Remotion’s frame model |
| Arbitrary HTML/CSS | Direct authoring and passthrough | Usually rewritten as JSX |
| Build step | HTML can play directly | Bundler required |
| Distributed rendering | Supports AWS Lambda; ecosystem is newer | Mature Lambda rendering workflow |
| License | Apache 2.0 | Source-available with commercial licensing thresholds |

The GIF above demonstrates what happens when a GSAP timeline runs on its own wall clock during a Remotion render. It is a useful illustration of the integration problem, but it should not be read as proof that Remotion animations are generally non-deterministic. Remotion’s native frame-driven animation model is deterministic.
Choose HyperFrames when ordinary HTML, direct CSS authoring, external animation runtimes, and agent editing are central requirements. Choose Remotion when React is already the team’s production language and its mature ecosystem and rendering infrastructure matter more.
Where HyperFrames fits
HyperFrames is a strong fit for:
- Template-based marketing and product videos
- Data-driven video variants generated at scale
- Animated charts and developer explainers
- Brand systems expressed as reusable HTML blocks
- Agent-generated compositions that need to remain editable
- Teams comfortable reviewing source code and rendered output
It is less compelling for:
- Editors who primarily work through a visual timeline
- Long-form footage-heavy projects centered on manual editorial judgment
- Teams already productive with a mature Remotion pipeline
- Compositions built around scripts that cannot be deterministically seeked
- Workflows where nobody can maintain the underlying web code
The source format removes one barrier for agents, but it also moves video production closer to software engineering. That is an advantage only when the team wants that trade-off.
Architecture and package boundaries
| Package | Role |
|---|---|
| hyperframes | CLI for initialization, preview, linting, inspection, and rendering |
| @hyperframes/core | Types, parsers, generators, linter, runtime, and frame adapters |
| @hyperframes/engine | Seekable page capture through headless Chrome |
| @hyperframes/producer | Capture, encoding, and audio-mixing pipeline |
| @hyperframes/studio | Browser-based composition editor |
| @hyperframes/player | Embeddable `<hyperframes-player>` web component |
| @hyperframes/shader-transitions | WebGL shader transition library |
The separation is sensible: the engine captures deterministic frames, the producer turns those frames and audio into finished media, and the higher-level CLI and studio provide authoring workflows.
Final assessment
HyperFrames’ strongest idea is not “HTML can become video.” Browsers have been capable visual renderers for years.
Its stronger contribution is a contract between three systems:
- An AI coding agent that can author and revise text files
- A seekable browser composition that can represent each frame independently
- A pinned rendering pipeline that can turn those frames into reproducible media
That combination makes video behave more like a maintainable software artifact. It is particularly compelling when the primary author is an agent and the output must remain reviewable, reusable, and editable as ordinary web code.
HyperFrames will not replace timeline editors or generative video models. It occupies a more specific and useful space between them: programmatic video production where control matters as much as generation.