How to Merge Videos in iOS with Swift

Learn how to combine videos in iOS with Swift and jump into advanced creations.


Combining video clips or parts of video clips into a single video may appear complicated, but the things that add complexity also provide flexibility. In this tutorial, you will see how to combine a few short clips into a single video. You will also gain an understanding of building on this basic pattern to make more advanced creations. This tutorial was created and tested with Xcode 13 and Swift 5.

As with most AVFoundation code, the Simulator is not always the best platform for running the code. Test on an actual device if you can. A demo project with code that supports this tutorial is available on GitHub.

Making a Composition

Whenever you want to edit media or add effects, the first place to look is AVComposition. This AVFoundation class arranges different assets, and different types of assets, into a single asset for playback or processing. Asset types can include audio, video, subtitles, metadata, and text. When working with AVComposition, it is helpful to keep AVAsset and AVAssetTrack objects straight. Unfortunately, because these items are so similar (AVComposition is a subclass of AVAsset), it is easy to confuse them. Remember that an AVAssetTrack is a wrapper around the actual pixel, sound-wave, or other data, and that an AVAsset is a collection of AVAssetTrack objects. When you want to combine and manipulate AVAsset objects, an AVComposition is what helps you do that. Changes made to any AVAsset by an AVComposition will not impact the original media file.

As an extra layer, Apple keeps the editing features of an AVComposition separate from the playback features by creating an AVMutableComposition object. You will work with an AVMutableComposition object in this tutorial.

Each track in our composition will have a start time and a duration. The composition will ensure that all of the data it holds displays at the right time and that the tracks stay in sync.
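
Times and durations in AVFoundation are CMTime values, which store time as a rational number rather than a float, and ranges are CMTimeRange values. Here is a minimal sketch of the time types used throughout this tutorial:

  import AVFoundation

  // CMTime stores a rational time: value / timescale seconds.
  // 1200 / 600 is exactly two seconds.
  let twoSeconds = CMTime(value: 1200, timescale: 600)

  // A CMTimeRange pairs a start time with a duration.
  let firstTwoSeconds = CMTimeRange(start: .zero, duration: twoSeconds)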

At its simplest, a video clip is an AVAsset with a single AVAssetTrack of audio data and a single AVAssetTrack of video data.

[Figure: an AVAsset containing one video track and one audio track]

So, to set up an empty AVComposition for combining clips, use code such as this:

    let movie = AVMutableComposition()
    let videoTrack = movie.addMutableTrack(withMediaType: .video, preferredTrackID: kCMPersistentTrackID_Invalid)
    let audioTrack = movie.addMutableTrack(withMediaType: .audio, preferredTrackID: kCMPersistentTrackID_Invalid)

The code above creates a new, empty AVMutableComposition and then adds two tracks. One track will be able to hold .video assets, and the other will hold .audio assets. AVFoundation supports many different kinds of audio and video formats, but the .video track cannot hold .audio formats. The kCMPersistentTrackID_Invalid constant signals that you want a new, unique track ID generated.

[Figure: an empty composition with one empty video track and one empty audio track]

AVFoundation can manipulate any of the ISO media formats. QuickTime (.mov) and MPEG-4 (.mp4) are the most common; .heif, .3gp, and other formats are also supported.

Adding Video Clips

After creation, the AVMutableComposition has a start time of CMTime.zero and a duration of CMTime.zero. To add data, you load an AVAsset, determine what range of its time will go into the new composition, and then call insertTimeRange(_ timeRange: CMTimeRange, of track: AVAssetTrack, at startTime: CMTime) throws on the video track and again on the audio track. Your code might look like this:

  let beachMovie = AVURLAsset(url: Bundle.main.url(forResource: "beach", withExtension: "mov")!) //1
  
  let beachAudioTrack = beachMovie.tracks(withMediaType: .audio).first! //2
  let beachVideoTrack = beachMovie.tracks(withMediaType: .video).first!
  let beachRange = CMTimeRangeMake(start: CMTime.zero, duration: beachMovie.duration) //3
  try videoTrack?.insertTimeRange(beachRange, of: beachVideoTrack, at: CMTime.zero) //4
  try audioTrack?.insertTimeRange(beachRange, of: beachAudioTrack, at: CMTime.zero)

Here is what this code does:

  1. Load a video file from a local URL. In the example, the video is in the application bundle, but it could also be a URL that points to a file somewhere else. This will not work with a streaming URL; it must point to an actual video file.

  2. Extract the main video and audio tracks from the video file. There may be multiple video and audio tracks in a file, but this code just grabs the first of each.

  3. Create a time range. In this example, the range covers the entire duration of the clip we loaded.

  4. Insert the clip's video track at the beginning of the composition's video track, and the clip's audio track at the beginning of the composition's audio track. A common mistake in this step is to use the composition's .duration property to insert the audio and video at the end. However, as soon as media is inserted into any track, the duration of the entire composition changes. The audio and video would then play one after the other: you would see a blank screen while the audio plays and hear nothing while the video plays.

Repeat the steps for each clip that you want to add to the final composition.
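
For example, to append the "mountain" clip from the demo project after the first clip, capture the composition's current duration once, before inserting either track, and use it as the insertion point for both. Here is a sketch following the same pattern as above:

  // Capture the insertion point before inserting anything,
  // because each insert changes the composition's duration.
  let insertAt = movie.duration

  let mountainMovie = AVURLAsset(url: Bundle.main.url(forResource: "mountain", withExtension: "mov")!)
  let mountainVideoTrack = mountainMovie.tracks(withMediaType: .video).first!
  let mountainAudioTrack = mountainMovie.tracks(withMediaType: .audio).first!
  let mountainRange = CMTimeRangeMake(start: .zero, duration: mountainMovie.duration)

  try videoTrack?.insertTimeRange(mountainRange, of: mountainVideoTrack, at: insertAt)
  try audioTrack?.insertTimeRange(mountainRange, of: mountainAudioTrack, at: insertAt)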

[Figure: the completed composition, with the clips laid end to end]

Using the Composition

After all of the clips have been combined into the composition, it can be played in an AVPlayer or exported using an AVAssetExportSession. The code for both is exactly the same as for any other AVAsset.

  self.player = AVPlayer(playerItem: AVPlayerItem(asset: movie)) //1
  let playerLayer = AVPlayerLayer(player: player) //2
  playerLayer.frame = playerView.layer.bounds
  playerLayer.videoGravity = .resizeAspect
  playerView.layer.addSublayer(playerLayer) //3
  player?.play() //4

Here is what the code above is doing:

  1. Create an AVPlayer using the movie object, which is the AVMutableComposition created earlier.
  2. Create a playerLayer to display the video and set its frame and aspect ratio.
  3. Add the playerLayer to playerView, which is a regular UIView in a UIViewController.
  4. Play the composition.

Alternatively, you may want to export the new composition as a new movie file. Again, because the composition is an AVAsset, a standard AVAssetExportSession will write the movie to disk.

  //create exporter
  let exporter = AVAssetExportSession(asset: movie, 
                                 presetName: AVAssetExportPresetHighestQuality) //1
  //configure exporter
  exporter?.outputURL = outputMovieURL //2
  exporter?.outputFileType = .mov
  //export!
  exporter?.exportAsynchronously(completionHandler: { [weak exporter] in
    DispatchQueue.main.async {
      if let error = exporter?.error { //3
        print("failed \(error.localizedDescription)")
      } else {
        print("movie has been exported to \(outputMovieURL)")
      }
    }
  })

Here is what this code is doing:

  1. Create an export session using the movie composition you made earlier.
  2. Set the save location to some file URL on your device.
  3. Wait for the exporter to finish and display the result.
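
The outputMovieURL above can be any writable file URL. Here is a minimal sketch that builds one in the temporary directory (the file name is arbitrary). Note that the exporter will not overwrite an existing file, so remove any leftover from a previous run first:

  // A hypothetical destination in the temporary directory.
  let outputMovieURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("mergedVideo.mov")

  // AVAssetExportSession fails if a file already exists at the destination.
  try? FileManager.default.removeItem(at: outputMovieURL)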

Don't forget that exporting an AVAsset can take a long time and is asynchronous. You will want to display a spinner or a message to your user.

Going Further

You can now combine clips into a longer clip for playback and export. However, trimming the original clips, adding filters, and rearranging the order will require more work before your application is ready for the store. An SDK like IMG.LY's VideoEditor SDK can give your app the features users will expect, in addition to stitching clips together.

The VideoEditorSDK offers a Video Composition Controller that lets users stitch clips together. It also lets them add new clips and edit clips with the other editing tools. Because the Video Composition Controller is a subclass of UIViewController, it can simply be configured and dropped into your application. Instead of the code in our example, you pass the videos to the controller:

  let videoClips = [
    Bundle.main.url(forResource: "stream", withExtension: "mov"),
    Bundle.main.url(forResource: "beach", withExtension: "mov"),
    Bundle.main.url(forResource: "mountain", withExtension: "mov"),
  ].compactMap { $0 }.map { AVURLAsset(url: $0) }
  let video = Video(assets: videoClips)
  let videoEditViewController = VideoEditViewController(videoAsset: video, configuration: Configuration())
  present(videoEditViewController, animated: true, completion: nil)

The code above loads the same video clips as our demo application and then uses the VideoEditViewController to present them. From there, the user can make further edits, then preview and export the composition.

[Figure: stitching clips together in the VideoEditorSDK]

Wrapping Up

In this example, you saw how to combine several video clips, each with one video and one audio track, into a single movie file with one video and one audio track. To make more complex movie files, consider building on this tutorial by:

  • Adding a second audio track. AVFoundation will play sound from both tracks at the times you specify, at equal volume; use an AVAudioMix to adjust the volume of each track over time (the sample project includes an example of adding a second audio track, and a sketch follows this list).

  • Adding more video tracks and using AVMutableVideoCompositionLayerInstruction to determine which track is visible at any moment in the final video and to build transitions between tracks.
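
As a sketch of the first idea, an AVMutableAudioMix can ramp the volume of an extra track. Here, secondAudioTrack is a hypothetical second audio track that has already been added to the composition:

  // secondAudioTrack is a hypothetical extra audio track in the composition.
  let mixParameters = AVMutableAudioMixInputParameters(track: secondAudioTrack)

  // Fade the second track from full volume down to 20% over the first two seconds.
  mixParameters.setVolumeRamp(fromStartVolume: 1.0, toEndVolume: 0.2,
                              timeRange: CMTimeRange(start: .zero,
                                                     duration: CMTime(value: 1200, timescale: 600)))

  let audioMix = AVMutableAudioMix()
  audioMix.inputParameters = [mixParameters]

  // Attach the mix for playback; AVAssetExportSession has a matching audioMix property.
  let playerItem = AVPlayerItem(asset: movie)
  playerItem.audioMix = audioMix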

Thanks for reading! We hope that you found this tutorial helpful. Feel free to reach out to us on Twitter with any questions, comments, or suggestions!

Looking to integrate video capabilities into your app? Check out our Video Editor SDK, Short Video Creation, and Camera SDK!
