FFMPEG overview - Mulepedia

FFMPEG containers

✔️ Containers are specific file formats designed to pack muxed audio/video data streams together and be read by media players.

file format	container type
`.mp3`	audio only
`.adts`	audio only
`.opus`	audio only
`.mp4`	audio/video
`.avi`	audio/video
`.webm`	audio/video
`.hls`	streaming
`.dash`	streaming

Note : constraints exist on data streams that can be muxed together in a given container format
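
One way to check what a given container accepts is to ask ffmpeg for the muxer's help, which usually lists its default audio and video codecs. A minimal sketch, assuming POSIX sh; the `muxer_info` helper and the `FFMPEG` override variable are hypothetical names used for illustration:

```shell
# Print the help of a muxer (container format), e.g. "webm" or "mp4".
# FFMPEG is a hypothetical override (FFMPEG=echo prints the command instead).
muxer_info() {
    "${FFMPEG:-ffmpeg}" -hide_banner -h "muxer=$1"
}
```

For instance, `muxer_info webm` runs `ffmpeg -hide_banner -h muxer=webm`.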

FFMPEG codecs

✔️ Codecs are encoder/decoder pairs : an encoder turns individual media frames into a compressed data stream, a decoder does the reverse, and the encoded streams are then muxed together in a container.

codec name	codec type
`copy`	audio
`aac`	audio
`libopus`	audio
`libfdk_aac`	audio
`copy`	video
`gif`	video
`libx264`	video
`libvpx-vp9`	video

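Note : media frames are also characterized by their raw format (e.g. rgb24, yuv420p, pcm), on which filtering is performed. `copy` is not an actual codec : it passes the stream through without decoding it, so filters cannot be applied to a copied stream.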

FFMPEG bitstream filters

✔️ Bitstream filters perform bit-level operations on data packets without decoding them into frames (example: aac_adtstoasc)
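
As an illustration, aac_adtstoasc is typically needed when AAC audio stored with ADTS headers (e.g. in an MPEG-TS file) is copied into an MP4 container. A minimal sketch, assuming POSIX sh; `remux_ts_to_mp4` and the `FFMPEG` override variable are hypothetical names, and recent ffmpeg versions may insert this filter automatically:

```shell
# Remux without re-encoding: streams are copied, and the audio packets
# go through the aac_adtstoasc bitstream filter on the way out.
# FFMPEG is a hypothetical override (FFMPEG=echo prints the command instead).
remux_ts_to_mp4() {
    "${FFMPEG:-ffmpeg}" -i "$1" -c copy -bsf:a aac_adtstoasc "$2"
}
```

Bitstream filters are selected per output stream with `-bsf`, using the same stream specifiers as `-c`.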

FFMPEG commands

✔️ The typical FFMPEG command has a general syntax that looks like ffmpeg $1 $2 -i $3 $4 $5, where:

# an input / output can be either a file path or a URL
ffmpeg \
    $1 \                                        # global command options
    $2 \                                        # input options
    -i $3 \                                     # input
    $4 \                                        # output options
    $5                                          # output
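
Note : the commands on this page are annotated for reading. Pasted into a shell as is, a command would end at its first `#`, because a backslash only continues a line when it is the very last character. A minimal sketch of one way to keep one comment per option in a runnable script, assuming POSIX sh; `run_annotated`, the option values and the `FFMPEG` override variable are hypothetical and only serve the illustration:

```shell
# Build the argument list one commented line at a time, then run it.
# FFMPEG is a hypothetical override (FFMPEG=echo prints the command instead).
run_annotated() {
    input=$1
    output=$2
    set --                          # start from an empty argument list
    set -- "$@" -hide_banner        # $1: global command options
    set -- "$@" -ss 00:00:05        # $2: input options
    set -- "$@" -i "$input"         # $3: input
    set -- "$@" -c:a copy           # $4: output options
    set -- "$@" "$output"           # $5: output
    "${FFMPEG:-ffmpeg}" "$@"
}
```

Calling `FFMPEG=echo run_annotated in.mp3 out.mp3` prints the assembled arguments without running ffmpeg.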

✔️ Basic usage (inputs can be files or URLs) :

ffmpeg \
  -i <input1> -i <input2> \                       # inputs                            (files/urls)
  -map <input1>:<stream> \                        # select input stream 1             (-map <input1>:v to select all video streams)
  -map <input2>:<stream> \                        # select input stream 2             (-map <input2>:a to select all audio streams)
  -c:a:0 <encoder.audio> -strict experimental \   # output first audio stream encoder (-c:a selects all output audio streams)
  -c:v:0 <encoder.video> -strict experimental \   # output first video stream encoder (-c:v selects all output video streams)
  -f <container.format> \                         # output container format
  <output.file>                                   # output file path

Note : audio and video streams are 0-indexed in a container

✔️ Extract an audio file snippet to an MP3 file :

ffmpeg \
  -ss <timestamp> \                               # start time timestamp (00:00:00.000 format)
  -t <duration> \                                 # snippet duration (in seconds)
  -i <input> \                                    # input file
  -c:a:0 copy \                                   # copy first audio stream as is (the input audio must already be MP3)
  -f mp3 \                                        # output container format
  <output.file>                                   # output file path

✔️ Fine-tune HTTP headers for URL inputs :

ffmpeg \
  -user_agent <agent> \                           # user agent for input 1
  -headers <header> \                             # additional HTTP header for input 1
  -i <input1> \                                   # input 1 (url)
  -user_agent <agent> \                           # user agent for input 2
  -headers <headers> \                            # additional HTTP headers for input 2 (a single value, headers separated by CRLF)
  -i <input2> \                                   # input 2 (url)
  -f <container.format> \                         # output container format
  <output.file>                                   # output file path
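
Several HTTP headers for one input go into a single -headers value, each header terminated by CRLF. A minimal sketch of building that value, assuming POSIX sh; `join_headers`, the `HEADERS` variable and the header names are hypothetical, for illustration only:

```shell
# Join header lines into a single value for -headers.
# Each header is terminated by CRLF; the result is stored in HEADERS.
join_headers() {
    # The trailing "x" protects the final newline from being stripped
    # by command substitution; it is removed right after.
    HEADERS=$(printf '%s\r\n' "$@"; printf x)
    HEADERS=${HEADERS%x}
}

# Hypothetical headers, for illustration only.
join_headers 'Referer: https://example.com/' 'X-Token: abc123'
```

The value is then passed before the input it applies to : `ffmpeg -headers "$HEADERS" -i <input1> ...`.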

✔️ Make pepe dance (loop a GIF over concatenated MP3 files) :

tracklist=$(find -P <mp3.files.directory> -maxdepth 1 -regextype 'posix-extended' -regex '^.+\.mp3$' | tr \\n \| | sed 's/.$//') && \
ffmpeg \
  -i "concat:$tracklist" \                        # input 1 (uses the concat protocol)
  -i /path/to/pepeds.gif \                        # input 2
  -map 0:0 \                                      # select input 1 audio stream
  -map 1:0 \                                      # select input 2 video stream
  -shortest \                                     # stop transcoding at the end of the shortest stream
  -c:a:0 aac \                                    # output audio stream encoder
  -b:a 320k \                                     # set output audio stream bitrate 
  -ar:a:0 44100 \                                 # set output audio stream sample rate (loudnorm upsamples to 192 kHz internally, so the rate is set back here)
  -filter:a:0 "loudnorm" \                        # use filter : normalize output audio stream loudness
  -c:v:0 libx264 \                                # output video stream encoder
  -b:v:0 750k \                                   # set output video stream bitrate
  -s 498x357 \                                    # set video frame size (width x height)
  -r 20 \                                         # set video frame rate (Hz)
  -filter:v:0 "loop=loop=-1:size=32767:start=0" \ # use filter : repeat the video frames from 0 to 32767, to no end
  -f <container.format> \                         # output container format
  <output.file>                                   # output file path
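
The order in which find returns files is not specified, so the tracks above may be joined in any order. A minimal sketch of an alternative that builds the same `concat:` input in file name order, assuming POSIX sh and file names that do not contain `|`; `build_tracklist` is a hypothetical helper:

```shell
# Build the "concat:" protocol input from every .mp3 file in a directory.
# Glob expansion is sorted, so tracks are joined in file name order.
build_tracklist() {
    dir=$1
    tracklist=
    for f in "$dir"/*.mp3; do
        [ -e "$f" ] || continue                  # skip the unmatched pattern
        tracklist="${tracklist:+$tracklist|}$f"  # separate entries with "|"
    done
    printf 'concat:%s\n' "$tracklist"
}
```

It can replace the first line of the command : `ffmpeg -i "$(build_tracklist <mp3.files.directory>)" ...`.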

✔️ Use a complex filtergraph with acopy and scale to resize a video (6) :

ffmpeg \
  -i <input> \                                    # input
  -filter_complex \                               # filtergraph definition
  '[0:a:0] acopy [audio];
  [0:v:0] scale=320:180 [scaled]' \
  -map '[scaled]' \                               # map filtergraph video output
  -map '[audio]' \                                # map filtergraph audio output
  -r 20 \                                         # set video frame rate (Hz)
  -f <container.format> \                         # output container format
  <output.file>                                   # output file path

✔️ Use a complex filtergraph with concat and scale to resize and concatenate two videos (6) :

ffmpeg \
  -i <input1> \                                   # input 1
  -i <input2> \                                   # input 2
  -filter_complex \                               # filtergraph definition
  '[0:a:0] [1:a:0] concat=v=0:a=1 [audio];
   [0:v:0] scale=320:180 [scaled0];
   [1:v:0] scale=320:180 [scaled1];
   [scaled0] [scaled1] concat=v=1:a=0 [scaled]' \ # use the scale and concat filters
  -map '[scaled]' \                               # map filtergraph video output
  -map '[audio]' \                                # map filtergraph audio output
  -r 20 \                                         # set video frame rate (Hz)
  -f <container.format> \                         # output container format
  <output.file>                                   # output file path
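
The same filtergraph can be generated for any number of inputs. The concat filter assumes 2 segments unless n is set, so the sketch below always passes it. This assumes POSIX sh and that every input has at least one audio and one video stream; `concat_graph` is a hypothetical helper:

```shell
# Print a filtergraph that scales the first video stream of n inputs,
# then concatenates audio and video separately, as in the example above.
concat_graph() {
    n=$1           # number of inputs
    size=$2        # target frame size, e.g. 320:180
    scales=        # one scale chain per input
    vpads=         # scaled video pads fed to concat
    apads=         # audio pads fed to concat
    i=0
    while [ "$i" -lt "$n" ]; do
        scales="${scales}[$i:v:0] scale=$size [scaled$i]; "
        vpads="${vpads}[scaled$i] "
        apads="${apads}[$i:a:0] "
        i=$((i + 1))
    done
    printf '%s%sconcat=n=%s:v=0:a=1 [audio]; %sconcat=n=%s:v=1:a=0 [scaled]\n' \
        "$scales" "$apads" "$n" "$vpads" "$n"
}
```

With three inputs, the result is passed as `-filter_complex "$(concat_graph 3 320:180)"` and the outputs are mapped with `-map '[scaled]' -map '[audio]'` as above.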

✔️ Output HTTP live streaming manifest and segments (compatible with adaptive streaming) :

ffmpeg \
  -i <input> \                                    # input
  -c:a aac \                                      # output audio streams encoder
  -b:a <bitrate> \                                # output audio streams bitrate
  -c:v libx264 \                                  # output video streams encoder
  -s <size> \                                     # set video frame size (width x height)
  -r <rate> \                                     # set video frame rate (Hz)
  -bufsize <size> \                               # bitrate control buffer size in bits, usually 2x maxrate   (global)
  -maxrate 1M \                                   # max bitrate in bits/s, approx 1-10 Mbps                   (global) 
  -keyint_min <interval> \                        # min interval between IDR frames (2)                       (global)
  -g <size> \                                     # group of pictures size - max distance between key frames  (global)
  -sc_threshold <threshold> \                     # threshold for scene change detection                      (global)
  -crf <factor> \                                 # set the quality for constant quality mode (1)             (libx264)
  -preset <preset> \                              # encoding speed preset                                     (libx264)
  -f hls \                                        # use apple http live streaming muxer/container
  -hls_time <length> \                            # segments length (usually 2 to 12 seconds)
  -hls_playlist_type vod \                        # vod (static playlist) / event (append segments on the fly)
  <output.file>                                   # output path for m3u8 playlist and segments
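
Footnote (2) ties -g and -keyint_min to the segment duration. The arithmetic can be sketched as follows, assuming POSIX sh and an integer frame rate; `gop_size` is a hypothetical helper:

```shell
# GOP size (in frames) that puts one key frame at every segment boundary.
gop_size() {
    fps=$1          # output frame rate, as passed to -r
    seconds=$2      # segment length, as passed to -hls_time
    echo $((fps * seconds))
}

# Print the matching options for a 25 fps stream cut into 4 s segments.
g=$(gop_size 25 4)
printf '%s\n' "-r 25 -g $g -keyint_min $g -hls_time 4"
```

Here the GOP size is 25 x 4 = 100 frames, which matches the values used in footnote (2).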

✔️ Use HTTP live streaming to create a live playlist using the "sliding window" method :

ffmpeg \
  -re \                                           # read and process inputs in real time (VITAL FOR LIVE STREAMING)
  -i <input> \                                    # input
  -f hls \                                        # use apple http live streaming muxer/container
  -hls_time <length> \                            # segments length (usually 2 to 12 seconds)
  -hls_segment_type mpegts \                      # segments file format (use MPEG-2 Transport Stream by default)
  -hls_segment_filename <segment.file> \          # segments output path (3) (4)
  -hls_list_size <size> \                         # max number of segments present in the playlist at a given time (defaults to 5)
  -hls_flags <flag1+flag2...> \                   # flags as options for playlist creation (5)
  -hls_delete_threshold <threshold> \             # number of unreferenced segments to keep on disk before deletion by hls_flags delete_segments (set to 5)
  -hls_start_number_source generic \              # sets the #EXT-X-MEDIA-SEQUENCE tag in the playlist (defaults to start_number)
  -start_number 0 \                               # initial segment index in the playlist when using hls_start_number_source generic
  -master_pl_publish_rate <period> \              # updates the master playlist after <period> new segments are added
  -hls_allow_cache 1 \                            # allows the client to cache media segments (more fluid)
  -hls_enc 0 \                                    # disables segments encryption
  <output.file>                                   # output path for the m3u8 playlist
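
To picture the sliding window : once more segments exist than -hls_list_size allows, the oldest entries leave the playlist as new ones are appended. The sketch below only illustrates that idea and the segment%03d.ts naming pattern from footnote (4); it is not how ffmpeg implements it. It assumes POSIX sh and a start number of 0; `window_segments` is a hypothetical helper:

```shell
# List the segment names a sliding-window playlist would reference
# once segment number $1 has been written, with a window of $2 segments.
window_segments() {
    last=$1
    size=$2
    first=$((last - size + 1))
    [ "$first" -lt 0 ] && first=0       # fewer segments than the window
    i=$first
    while [ "$i" -le "$last" ]; do
        printf 'segment%03d.ts\n' "$i"  # same pattern as footnote (4)
        i=$((i + 1))
    done
}
```

With a window of 5, `window_segments 7 5` lists segment003.ts to segment007.ts; segments 0 to 2 are no longer referenced.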

✔️ Other relevant options :

ffmpeg \
  -itsoffset <offset> \                           # shift input 1 timestamps by <offset> seconds (a positive value delays its streams)
  -i <input1> \                                   # input 1
  -i <input2> \                                   # input 2
  -map 0:a \                                      # select all audio streams from input 1
  -map 1:v \                                      # select all video streams from input 2
  -filter:a "atempo=<rate>" \                     # use filter : adjusts audio streams speed by <rate>
  -c:v copy \                                     # copy all video streams
  -y \                                            # no-confirm overwrite output file
  -f <container.format> \                         # output container format
  <output.file>                                   # output file path

Footnotes

(1) Some details on -crf

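(2) For HLS playlists, the GOP size must be equal to keyint_min and match the segment duration, i.e. GOP size = frame rate x segment duration : at 25 fps, -g 100 -keyint_min 100 -hls_time 4 yields one key frame every 100/25 = 4 seconds, which is one per segment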

(3) The segment file path specified here will not be included in the manifest files, only the segment file name will

(4) Also, segments names must be unique so the file name specified can be expanded to include its index in the playlist by using a pattern such as segment%03d.ts as a segment file name

(5) Useful hls flags are :

delete_segments: auto delete segments that are no longer in the playlist after (segment duration + playlist duration) seconds - use by default
temp_file : write segment initially to segment.tmp and rename to segment.ts once processing is complete - use by default
omit_endlist : do not append the EXT-X-ENDLIST tag once inputs are exhausted and playlist finishes
split_by_time : allow segments to start on frames other than keyframes (jeopardizes videojs as of now)

(6) Remember that filters operate on streams and not on containers

(7) When using the concat demuxer, keep in mind that all the input files' streams (audio or video) MUST HAVE THE SAME TIMEBASE, which usually is 1 / frame rate for video streams and 1 / sample rate for audio streams.

Thus, THE AUDIO SAMPLE RATE AND VIDEO FRAME RATE HAVE TO BE THE SAME FOR ALL INPUT FILES. Also, -r <framerate> has to be passed as an input option to the concat demuxer so it ignores the timestamps stored in the source files and recalculates new timestamps for the source assuming a constant framerate.
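
The concat demuxer reads its inputs from a text file with one `file '<path>'` line per input. A minimal sketch of writing that list, assuming POSIX sh and paths without single quotes; `write_concat_list` and the example paths are hypothetical:

```shell
# Write a concat demuxer list: one "file '<path>'" line per input.
write_concat_list() {
    list=$1
    shift
    : > "$list"                              # create or empty the list file
    for f in "$@"; do
        printf "file '%s'\n" "$f" >> "$list"
    done
}

# Example (hypothetical paths):
#   write_concat_list list.txt /videos/a.mp4 /videos/b.mp4
#   ffmpeg -f concat -safe 0 -r <framerate> -i list.txt <output.file>
```

-safe 0 lets the demuxer accept absolute paths, and -r is placed before -i so that it applies to the input, as described above.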