Keyframes and encoding

What keyframes are and how they optimize video encoding

✔️ An 'I-frame picture' (MPEG-2) (also known as an 'intra-only frame' (VP9) or a 'coded picture in which all slices are I or SI slices' (H.264)) is a compressed frame that doesn’t depend on the contents of any earlier decoded frame.

✔️ In video coding, a 'key frame' (VP8, VP9) (also known as an 'IDR picture' (H.264) or a 'stream access point' (MPEG-4 part 12)) is a place in the video where the decoder can start decoding. It will start with a coded I-frame, but with additional restrictions to make sure it and subsequent frames can be decoded.
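
As a concrete illustration for H.264, here is a minimal sketch of how a tool might recognize a slice of an IDR picture. It assumes the bitstream has already been split into NAL units and that you hold the first byte of a NAL unit header; the helper names `nal_unit_type` and `is_idr_slice` are made up for this example. In H.264 the low five bits of that byte carry `nal_unit_type`, and the value 5 marks a coded slice of an IDR picture.

```python
# H.264 NAL unit header byte layout (1 byte):
#   forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits)
NAL_TYPE_NON_IDR_SLICE = 1  # coded slice of a non-IDR picture
NAL_TYPE_IDR_SLICE = 5      # coded slice of an IDR picture


def nal_unit_type(header_byte: int) -> int:
    """Extract nal_unit_type from the first byte of an H.264 NAL unit."""
    return header_byte & 0x1F


def is_idr_slice(header_byte: int) -> bool:
    """True if this NAL unit carries a slice of an IDR picture."""
    return nal_unit_type(header_byte) == NAL_TYPE_IDR_SLICE
```

For example, a header byte of `0x65` has `nal_unit_type` 5 (an IDR slice), while `0x41` has `nal_unit_type` 1 (a non-IDR slice). Note that a non-IDR slice can still be an I-slice, which is exactly the distinction this page is about.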

✔️ Whether the coded frame depends on the contents of any previously decoded frame :

  • Video encoding schemes save bits by taking advantage of the similarities between nearby frames. Generally speaking, each frame gets divided into little rectangular pieces (known as macroblocks). For each macroblock, the coded frame first instructs the decoder on how to form a prediction about what the pixel values will be, and then (optionally) tells the decoder what the difference is between the prediction and the intended output.

  • The prediction can generally be formed in one of two ways: (1) based on some part of the same frame (usually required to be above or to the left of the macroblock in question, so the decoder will have already decoded it), or (2) based on some part of a decoded frame that the decoder has already received and saved for later use.

  • A macroblock that is predicted in the first way (within the same frame) is called an 'intra-coded macroblock.' And a coded picture that contains only intra-coded macroblocks (i.e., none of the macroblocks depend on any picture contents outside the current frame) is known as an 'intra-coded picture' or 'I-picture' in MPEG-2 part 2 (H.262) video, or an 'intra-only frame' in VP9. A similar concept is known as an 'I-slice' in MPEG-4 part 10 (H.264) video.
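
The predict-then-correct idea above can be sketched as a toy model. This is not how any real codec lays out its data: the `CodedMacroblock` class, the `"intra"`/`"inter"` mode strings, and the flat lists of pixel values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CodedMacroblock:
    mode: str                             # "intra" or "inter" (hypothetical labels)
    residual: Optional[List[int]] = None  # optional difference from the prediction


def decode_macroblock(mb: CodedMacroblock,
                      same_frame_neighbor: List[int],
                      reference_block: List[int]) -> List[int]:
    """Form the prediction, then apply the residual if one was sent."""
    if mb.mode == "intra":
        prediction = same_frame_neighbor   # (1) predicted from the same frame
    elif mb.mode == "inter":
        prediction = reference_block       # (2) predicted from a saved decoded frame
    else:
        raise ValueError(f"unknown prediction mode: {mb.mode}")
    if mb.residual is None:
        return list(prediction)
    return [p + r for p, r in zip(prediction, mb.residual)]


def is_intra_only(picture: List[CodedMacroblock]) -> bool:
    """A picture is intra-only when no macroblock predicts from another frame."""
    return all(mb.mode == "intra" for mb in picture)
```

With a neighbor of `[10, 10, 10, 10]` and a residual of `[1, -1, 0, 2]`, an intra macroblock decodes to `[11, 9, 10, 12]` without ever looking at the reference block.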

✔️ Bottom line :

  • All of these terms refer to what people mean when they say 'I-frame': a coded frame that (unlike most frames in a compressed video) isn’t predicted from the pieces of any earlier-coded frame.

✔️ Ultrapedantic note :

  • No video format that I’m aware of defines anything called exactly an 'I-frame'; unfortunately, interlacing makes the terminology more complicated.
  • H.262 / MPEG-2 part 2 distinguishes two concepts: (1) a single intra-coded picture that codes a whole frame (which might be interlaced or progressive), and (2) a frame coded in a way that doesn’t depend on any decoded pictures outside itself, whether as one intra-coded picture for the whole frame or as two coded pictures each coding one field.
  • MPEG-2 uses 'I-frame picture' for the first concept, and 'coded I-frame' for the second concept. H.264 gets rid of these terms and defines 'I-slice' instead. VP9 calls it an 'intra-only frame.'

✔️ Whether a decoder can start playing the video at a given point :

  • Just because a coded frame is completely intra-coded (i.e., it’s an I-frame picture in H.262 / MPEG-2 part 2, an intra-only frame in VP9, or consists solely of I-slices in H.264 / MPEG-4 part 10) doesn’t mean that:

    1. it stands alone (i.e., that frame can be correctly decoded without the decoder having seen some earlier part of the bitstream)
    2. it represents an entry point in the video, so that if the decoder starts at the intra-coded frame, it can then successfully decode all the subsequent frames.
  • Even though an intra-only frame doesn’t depend on the image contents of a previously decoded frame, it can still depend on the decoder having been set up with a particular state (e.g. probability tables for compression or quantization matrices) by earlier parts of the bitstream. So even in MPEG-2, an I-frame picture cannot necessarily be decoded all by itself.

  • Furthermore, even after an I-frame picture appears in the bitstream, subsequent frames can continue to depend on the contents of earlier frames, including ones decoded long before the I-frame picture was inserted. So even if the decoder can decode the I-frame picture, that doesn’t mean it can decode any subsequent frames in the video.

  • To represent a place where the decoder can actually join the video and start decoding, MPEG-2 uses the term 'point of random access,' H.264 uses the term 'instantaneous decoding refresh (IDR) picture,' and VP8 and VP9 use the term 'key frame.' These are places in the video that not only start with an intra-only frame, but also make sure that the intra-only frame can be decoded with the 'default' decoder state, and that every subsequent frame can be decoded without reference to anything that came before the decoder joined.

  • The general term (defined in MPEG-4 part 12) is a 'stream access point.'
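
To make the two conditions above concrete, here is a toy model of a bitstream and a check for where a decoder could join. Everything in it is an assumption for illustration: the `CodedFrame` fields, the `uses_default_state` flag standing in for "decodable with the default decoder state", and the rule that references are given as positions in the list. Real formats have more cases (for example, frames that may be skipped after joining), which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CodedFrame:
    intra_only: bool                  # no prediction from other frames
    uses_default_state: bool = False  # decodable with the default decoder state
    refs: List[int] = field(default_factory=list)  # positions of frames predicted from


def can_join_at(frames: List[CodedFrame], start: int) -> bool:
    """True if a decoder starting at `start` can decode that frame and all later ones."""
    first = frames[start]
    if not (first.intra_only and first.uses_default_state):
        return False  # not intra-only, or needs state set up earlier in the bitstream
    decoded = {start}
    for pos in range(start + 1, len(frames)):
        if any(ref not in decoded for ref in frames[pos].refs):
            return False  # depends on a frame from before the decoder joined
        decoded.add(pos)
    return True


frames = [
    CodedFrame(intra_only=True, uses_default_state=True),   # 0: access point
    CodedFrame(intra_only=False, refs=[0]),                  # 1
    CodedFrame(intra_only=True, uses_default_state=True),   # 2: intra-only, but...
    CodedFrame(intra_only=False, refs=[1]),                  # 3: ...this reaches back past it
    CodedFrame(intra_only=True, uses_default_state=True),   # 4: access point
    CodedFrame(intra_only=False, refs=[4]),                  # 5
    CodedFrame(intra_only=True, uses_default_state=False),  # 6: needs earlier state
    CodedFrame(intra_only=False, refs=[6]),                  # 7
]

access_points = [i for i in range(len(frames)) if can_join_at(frames, i)]
print(access_points)  # [0, 4]
```

Frames 2 and 6 are both intra-only, yet neither is a usable entry point: frame 2 is followed by a frame that references something older, and frame 6 relies on decoder state from earlier in the bitstream.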