Build this with AI in minutes
This tutorial is a great way to understand how CE.SDK works under the hood. But if you just want to get to the result, you can build this entire editor using IMG.LY Agent Skills, no manual setup required.
Install the skills, then run:
/cesdk:build Build a CapCut-like video editor with a dark theme, multi-track timeline, AI video/image/audio generation, background removal, and MP4 export.
Want to understand a specific concept in depth? Use /cesdk:explain — for example:
/cesdk:explain How does the video timeline and block hierarchy work?
Introduction
CapCut has set the standard for accessible video editing. Its dark UI, intuitive timeline, and AI-powered features make it feel like a professional tool that anyone can use. But what if you could build something similar, embedded directly in your own web application, in minutes? Yes, minutes!
In this tutorial, we'll walk through how we built a CapCut-like video editor using IMG.LY's CreativeEditor SDK (CE.SDK) for React. The finished editor includes:
- A multi-track timeline for video, audio, text, and captions
- Trim, split, and join operations on video clips
- A CapCut-inspired dark theme built with CSS custom properties
- AI video generation (text-to-video, image-to-video) via Minimax, Kling, and Pixverse
- AI image generation (text-to-image, image editing) via RecraftV3, IdeogramV3, and GPT Image
- AI audio generation (text-to-speech, sound effects) via ElevenLabs
- AI text generation (copywriting, translation) via Anthropic Claude
- Background removal — client-side, powered by WebAssembly/WebGPU
- MP4 export directly from the browser
The entire editor is a single React component with roughly 180 lines of JavaScript and 100 lines of CSS.
Architecture Overview
CE.SDK has two main layers:
- CreativeEngine: The headless core that manages scenes, blocks, assets, and rendering. It handles the video timeline, playback, and export entirely client-side.
- CreativeEditor UI: A pre-built, customizable UI layer that wraps the engine with a dock, inspector, timeline, canvas, and navigation bar.
Plugins extend both layers. The AiApps plugin, for example, registers AI providers with the engine and injects UI components (dock buttons, canvas menu items, generation panels) into the editor.
Step 1: Project Setup
We scaffolded a React project with Vite and installed CE.SDK alongside the AI plugin packages:
npm create vite@latest capcut-like-editor -- --template react
cd capcut-like-editor
npm install
# Core SDK
npm install @cesdk/cesdk-js
# AI plugins
npm install @imgly/plugin-ai-apps-web
npm install @imgly/plugin-ai-video-generation-web
npm install @imgly/plugin-ai-image-generation-web
npm install @imgly/plugin-ai-audio-generation-web
npm install @imgly/plugin-ai-text-generation-web
# Background removal (runs locally via WASM/WebGPU)
npm install @imgly/plugin-background-removal-web onnxruntime-web@1.21.0
Why so many packages?
CE.SDK follows a modular plugin architecture. Each AI capability is a separate package with its own provider modules. This means you only bundle what you use: if you don't need audio generation, don't install @imgly/plugin-ai-audio-generation-web. The @imgly/plugin-ai-apps-web package is the unifying layer that brings them all together into a single dock panel.
Step 2: The React Component
CE.SDK provides a first-class React wrapper via @cesdk/cesdk-js/react. The <CreativeEditor> component handles mounting, initialization, and cleanup:
import CreativeEditor from '@cesdk/cesdk-js/react';
const config = {
// license: 'YOUR_CESDK_LICENSE_KEY',
};
const init = async (cesdk) => {
// All setup happens here — theme, assets, plugins, UI customization
};
export default function VideoEditor() {
return (
<CreativeEditor
config={config}
init={init}
width="100vw"
height="100vh"
/>
);
}
The config object is passed to the engine at creation time. The init callback fires once the cesdk instance is ready — this is where all our customization lives.
Pitfall: Silent init errors. The <CreativeEditor> component swallows errors thrown inside init. If something fails, the editor loads but appears broken with no console output. Always wrap init in a try/catch.
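A minimal sketch of that pattern — the wrapper name and log message are our own:

```javascript
// Wrap any init callback so failures surface in the console instead of
// being silently swallowed by <CreativeEditor>.
const withErrorLogging = (initFn) => async (cesdk) => {
  try {
    await initFn(cesdk);
  } catch (error) {
    console.error('CE.SDK init failed:', error);
    throw error; // rethrow so dev tooling still sees the failure
  }
};

// Usage: pass the wrapped callback instead of the raw one:
// <CreativeEditor config={config} init={withErrorLogging(init)} ... />
```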
Step 3: Creating the Video Scene
A video editor needs three things at startup: asset sources, a video scene, and a timeline.
// Load built-in asset libraries (stickers, shapes, filters, typefaces, etc.)
await cesdk.addDefaultAssetSources();
// Load demo content (sample videos, images, audio) + enable upload slots
await cesdk.addDemoAssetSources({
sceneMode: 'Video',
withUploadAssetSources: true,
});
// Create a video scene with timeline
await cesdk.createVideoScene();
createVideoScene() sets up the scene hierarchy that powers the timeline:
Scene
└── Page (represents the video canvas — 1920x1080 by default)
├── Track (video track — holds video clips in sequence)
├── Track (overlay track — text, stickers, images)
├── CaptionTrack (subtitles synced to playback)
└── Audio (background music, voiceover, sound effects)
Each element on the timeline is a block with timing properties: timeOffset (when it appears) and duration (how long it plays). The engine handles rendering each frame, compositing layers, and synchronizing audio.
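For example, placing a clip so it starts two seconds into the timeline and plays for five seconds looks roughly like this (a sketch against the engine's block API; verify the setter names against your SDK version's reference):

```javascript
// Sketch: position a clip on the timeline via its timing properties.
// `engine` is the CreativeEngine instance (cesdk.engine).
function placeClip(engine, clip, { timeOffset, duration }) {
  engine.block.setTimeOffset(clip, timeOffset); // seconds from timeline start
  engine.block.setDuration(clip, duration);     // seconds of playback
}

// placeClip(cesdk.engine, clipBlock, { timeOffset: 2, duration: 5 });
```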
Browser support
Video editing relies on modern web codecs (WebCodecs API), which are available in Chromium-based browsers (Chrome, Edge, Brave). Safari and Firefox support is limited.
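It's worth feature-detecting WebCodecs before mounting the editor so unsupported browsers get a friendly notice instead of a broken editor. A simple sketch (the fallback UI is up to you):

```javascript
// Returns true when the WebCodecs VideoEncoder API is available.
// Taking the global object as a parameter keeps this testable outside a browser.
function supportsWebCodecs(globalObj) {
  return typeof globalObj !== 'undefined' && globalObj !== null
    && 'VideoEncoder' in globalObj;
}

// In the app:
// if (!supportsWebCodecs(window)) { /* render an "unsupported browser" notice */ }
```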
Step 4: The CapCut Dark Theme
CapCut's visual identity is defined by its deep charcoal backgrounds, teal accent colors, and subtle elevation layers. CE.SDK's theming system maps perfectly to this through CSS custom properties.
Setting the base theme
cesdk.ui.setTheme('dark');
This activates CE.SDK's built-in dark theme. But we want to go further — we need CapCut's specific color palette.
Custom CSS overrides
CE.SDK scopes all its UI under .ubq-public with data-ubq-theme and data-ubq-scale attributes. We override the CSS custom properties to inject our palette:
.ubq-public[data-ubq-theme='dark'][data-ubq-scale='normal'],
.ubq-public[data-ubq-theme='dark'][data-ubq-scale='modern'] {
/* Deep charcoal backgrounds — darker than CE.SDK's default dark */
--ubq-canvas: hsl(220, 15%, 8%) !important;
--ubq-elevation-1: hsl(220, 13%, 12%) !important;
--ubq-elevation-2: hsl(220, 12%, 15%) !important;
--ubq-elevation-3: hsl(220, 10%, 18%) !important;
/* High-contrast white text */
--ubq-foreground-default: hsla(0, 0%, 100%, 0.92) !important;
--ubq-foreground-light: hsla(0, 0%, 100%, 0.55) !important;
/* CapCut's signature teal/cyan accent */
--ubq-interactive-accent-default: hsl(190, 85%, 48%) !important;
--ubq-interactive-accent-hover: hsl(190, 85%, 42%) !important;
--ubq-interactive-accent-pressed: hsl(190, 85%, 36%) !important;
/* Subtle borders — barely visible, like CapCut */
--ubq-border-default: hsla(0, 0%, 100%, 0.08) !important;
}
The key design decisions:
| Property | Value | Why |
|---|---|---|
| --ubq-canvas | hsl(220, 15%, 8%) | Near-black with a slight blue tint — matches CapCut's canvas area |
| --ubq-elevation-1/2/3 | 12% → 15% → 18% lightness | Subtle elevation steps create depth without harsh contrast |
| --ubq-interactive-accent-* | hsl(190, 85%, 48%) | CapCut's teal — used for buttons, selections, and progress bars |
| --ubq-border-default | 8% opacity white | Nearly invisible borders that only appear on close inspection |
Responsive scale
We also configure responsive scaling so the editor adapts to touch devices:
cesdk.ui.setScale(({ containerWidth, isTouch }) => {
if ((containerWidth && containerWidth < 768) || isTouch) {
return 'large'; // Bigger touch targets on small/touch screens
}
return 'normal'; // Standard desktop sizing
});
Step 5: AI Plugin Integration
This is where the editor transforms from a basic video tool into something that feels like CapCut's AI-powered experience. CE.SDK's plugin system lets us add all AI capabilities through a single unified AiApps plugin.
The unified AiApps approach
Instead of registering each AI plugin separately, @imgly/plugin-ai-apps-web provides a single entry point:
import AiApps from '@imgly/plugin-ai-apps-web';
import FalAiVideo from '@imgly/plugin-ai-video-generation-web/fal-ai';
import FalAiImage from '@imgly/plugin-ai-image-generation-web/fal-ai';
import OpenAiImage from '@imgly/plugin-ai-image-generation-web/open-ai';
import Elevenlabs from '@imgly/plugin-ai-audio-generation-web/elevenlabs';
import Anthropic from '@imgly/plugin-ai-text-generation-web/anthropic';
await cesdk.addPlugin(
AiApps({
dryRun: true, // Simulate for development — no API calls
providers: {
      text2text: Anthropic.AnthropicProvider({ proxyUrl: PROXY_URL }),
      text2image: [ FalAiImage.RecraftV3({ proxyUrl: PROXY_URL }), /* ... */ ],
      image2image: [ FalAiImage.GeminiFlashEdit({ proxyUrl: PROXY_URL }), /* ... */ ],
      text2video: [ FalAiVideo.MinimaxVideo01Live({ proxyUrl: PROXY_URL }), /* ... */ ],
      image2video: [ FalAiVideo.MinimaxVideo01LiveImageToVideo({ proxyUrl: PROXY_URL }), /* ... */ ],
      text2speech: Elevenlabs.ElevenMultilingualV2({ proxyUrl: PROXY_URL }),
      text2sound: Elevenlabs.ElevenSoundEffects({ proxyUrl: PROXY_URL }),
},
})
);
Provider categories explained
| Category | What it does | Models we configured |
|---|---|---|
| text2text | AI copywriting — improve text, translate, change tone | Anthropic Claude |
| text2image | Generate images from text prompts | RecraftV3, IdeogramV3, GPT Image |
| image2image | Transform existing images with AI | Gemini Flash Edit, GPT Image |
| text2video | Generate video clips from text descriptions | Minimax Video, Kling Video, Pixverse |
| image2video | Animate a static image into video | Minimax Video, Kling Video |
| text2speech | Convert text to spoken audio with voice selection | ElevenLabs Multilingual V2 |
| text2sound | Generate sound effects from text | ElevenLabs Sound Effects |
When multiple providers are configured in an array (like text2image), the UI automatically shows a provider/model selection dropdown so users can choose which AI model to use.
The proxy server requirement
Every provider takes a proxyUrl parameter. This is critical for production:
Browser → Your Proxy Server → AI Provider (fal.ai, ElevenLabs, etc.)
                 ↑
        injects API keys server-side
Your API keys should never be in client-side code. The proxy server receives requests from CE.SDK, attaches your API key, and forwards to the AI provider. During development, dryRun: true simulates generation without any API calls.
Background removal
Background removal is a separate plugin because it runs entirely client-side — no proxy needed:
import BackgroundRemovalPlugin from '@imgly/plugin-background-removal-web';
await cesdk.addPlugin(
BackgroundRemovalPlugin({
ui: { locations: ['canvasMenu'] },
})
);
This uses ONNX Runtime (WebAssembly + WebGPU) to run an AI segmentation model directly in the browser. The first run downloads ~40MB of model weights, which are then cached. Select any image on the canvas and click "Remove Background" in the context menu.
Step 6: UI Customization with the Component Order API
CE.SDK's UI is built from five customizable areas: Dock, Inspector Bar, Canvas Menu, Navigation Bar, and Canvas Bar. The Component Order API lets us insert, remove, and reorder components in each area.
Adding the AI button to the dock
We want the AI Apps button to be the first thing users see in the dock:
cesdk.ui.insertOrderComponent(
{ in: 'ly.img.dock', position: 'start' },
'ly.img.ai.apps.dock'
);
insertOrderComponent takes a location specifier and the component(s) to insert. position: 'start' puts it at the top of the dock. Alternative positions include 'end', a numeric index, or relative placement with before/after matchers.
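The relative matchers are easiest to understand as list operations. Here's a sketch of the semantics on a plain array — illustrative only; CE.SDK resolves the real matchers internally, and the exact matcher shape is in the Component Order API docs:

```javascript
// Insert `componentId` immediately after the first entry matching `matchId`.
// Mirrors the idea behind an "after" matcher in the Component Order API.
function insertAfterId(order, matchId, componentId) {
  const index = order.indexOf(matchId);
  if (index === -1) return [...order, componentId]; // no match: fall back to the end
  return [...order.slice(0, index + 1), componentId, ...order.slice(index + 1)];
}
```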
AI options in the canvas context menu
When a user selects a text or image block and right-clicks, we want AI options available:
cesdk.ui.insertOrderComponent(
{ in: 'ly.img.canvas.menu', position: 'start' },
['ly.img.ai.text.canvasMenu', 'ly.img.ai.image.canvasMenu']
);
Passing an array inserts multiple components at once.
Custom export button in the navigation bar
We add an Export button with a real click handler that triggers MP4 export:
cesdk.ui.insertOrderComponent(
{ in: 'ly.img.navigation.bar', position: 'end' },
{
id: 'ly.img.action.navigationBar',
key: 'export',
label: 'Export',
icon: '@imgly/Download',
onClick: async () => {
const engine = cesdk.engine;
const page = engine.block.findByType('page')[0];
if (page) {
const blob = await engine.block.exportVideo(page, {
mimeType: 'video/mp4',
});
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = 'video.mp4';
a.click();
URL.revokeObjectURL(url);
}
},
}
);
The exportVideo method renders every frame of the timeline — compositing video tracks, overlays, text, captions, and audio — into an MP4 blob entirely in the browser.
Step 7: Wiring Generated Audio into the Asset Library
AI-generated audio (speech and sound effects) is stored in provider-specific history sources. To make this audio browsable alongside regular audio assets, we inject the history source into the audio asset library entry:
const audioEntry = cesdk.ui.getAssetLibraryEntry('ly.img.audio');
if (audioEntry != null) {
const existingSourceIds = Array.isArray(audioEntry.sourceIds)
? audioEntry.sourceIds
: audioEntry.sourceIds({});
cesdk.ui.updateAssetLibraryEntry('ly.img.audio', {
sourceIds: [...existingSourceIds, 'ly.img.ai.audio-generation.history'],
});
}
Now when users open the Audio panel in the dock, they'll see their AI-generated audio alongside the sample library.
Step 8: Custom Labels with i18n
CE.SDK's i18n system lets us customize any UI string. We used it to make the AI prompt fields more inviting:
cesdk.i18n.setTranslations({
en: {
'ly.img.plugin-ai-video-generation-web.fal-ai/minimax/video-01-live.property.prompt':
'Describe your video...',
'ly.img.plugin-ai-image-generation-web.fal-ai/recraft-v3.property.prompt':
'Describe your image...',
},
});
Translation keys follow the pattern {plugin-id}.{provider-id}.property.{field}. You can also add multi-language support by including keys for de, fr, es, etc.
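For instance, German prompt labels reuse the same keys under a de block (the translation strings here are our own; pass the object to cesdk.i18n.setTranslations):

```javascript
// German strings for the same prompt fields, keyed by the
// {plugin-id}.{provider-id}.property.{field} pattern described above.
const deTranslations = {
  de: {
    'ly.img.plugin-ai-video-generation-web.fal-ai/minimax/video-01-live.property.prompt':
      'Beschreibe dein Video...',
    'ly.img.plugin-ai-image-generation-web.fal-ai/recraft-v3.property.prompt':
      'Beschreibe dein Bild...',
  },
};

// cesdk.i18n.setTranslations(deTranslations);
```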
The Complete File Structure
capcut-like-editor/
├── src/
│ ├── App.jsx # Root — imports VideoEditor + theme CSS
│ ├── VideoEditor.jsx # The entire editor (single component, ~180 lines)
│ ├── capcut-theme.css # CapCut dark theme overrides (~100 lines)
│ ├── index.css # Global reset (margin/padding/overflow)
│ └── main.jsx # React entry point
├── index.html
├── package.json
└── vite.config.js
That's it. The entire CapCut-like editor is three files: one React component, one CSS file, and a thin App wrapper.
What We Get Out of the Box
Because CE.SDK's video UI is pre-built, we didn't write any code for these features — they come from createVideoScene() and the default UI:
- Multi-track timeline with drag-to-reorder, drag-to-resize
- Trim and split — drag clip edges or use the split tool
- Join and arrange — drag clips between tracks
- Transform — crop, flip, rotate via the inspector
- Playback controls — play, pause, seek, scrub
- Text overlays — add styled text with the text tool
- Stickers and graphics — from the asset library
- Filters and effects — LUT-based color grading
- Undo/redo — full history stack
- Zoom and pan — standard canvas navigation
The AI plugins add their own UI components (dock buttons, generation panels, context menu items) through the plugin system. We just positioned them where we wanted.
Going to Production
To take this from a prototype to production, you need three things:
1. License key
Get a free trial key at https://img.ly/forms/contact-sales and set it in the config (yes, if you want to ship to production you'll have to talk to our lovely colleagues in sales):
const config = {
license: 'YOUR_CESDK_LICENSE_KEY',
};
2. Proxy server
Set up a server that forwards AI requests with your API keys:
const PROXY_URL = 'https://your-server.com/api/ai-proxy';
See the CE.SDK Proxy Server guide for Express.js and other server examples.
3. Disable dry run
Remove dryRun: true from the AiApps configuration to enable real AI generation.
Key Takeaways
- CE.SDK does the heavy lifting. The timeline, playback, rendering, and export are all handled by the engine. We wrote zero video processing code.
- The plugin system is powerful. Six AI capabilities were added with a single addPlugin(AiApps({ ... })) call. Background removal was one more call.
- CSS custom properties make theming painless. We matched CapCut's aesthetic by overriding ~25 CSS variables. No forking, no patching.
- The Component Order API is the customization backbone. insertOrderComponent with position-based placement is the cleanest pattern for adding UI elements.
- Wrap init in a try/catch. CE.SDK's <CreativeEditor> swallows errors silently. This is the single most important debugging tip.
- Pin your package versions. All @imgly/* plugins must match the @cesdk/cesdk-js version exactly. A version mismatch (like 1.68 vs 1.69) will cause peer dependency conflicts.
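For example, a pinned dependency block might look like this — the version numbers are illustrative, so use whichever matching pair is current when you install:

```json
{
  "dependencies": {
    "@cesdk/cesdk-js": "1.68.0",
    "@imgly/plugin-ai-apps-web": "1.68.0",
    "@imgly/plugin-background-removal-web": "1.68.0",
    "onnxruntime-web": "1.21.0"
  }
}
```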