Another great post from Telestream’s Paul Turner!
How do you compress video?
In a previous blog, we talked about the techniques for compressing a still image. Now it’s time to apply that to moving images. There are a couple of techniques in common use, each designed for a specific set of use cases. In broad terms, these fall into the camps of “Intraframe compression” and “Interframe compression”. Those two terms look and sound similar, but they are quite different in practice. Let’s start with the simpler one – Intraframe compression.
Intraframe compression takes the path of greatest simplicity. As you know, video is made up of a sequence of individual frames, each flashed in front of your eyes for a short amount of time. This makes use of the phenomenon of image retention in the human visual system (flashed images remain on your retina for a small amount of time – look very briefly at a very bright object, and then close your eyes – you’ll still see the outline of the bright object). If you flash pictures frequently enough, our brains interpret the differences in the pictures as movement. So we can compress video by simply compressing each frame of the video as if it were a still image. That’s it, in a nutshell: the term Intraframe comes from the Latin “intra”, which means “within” – so Intraframe compression performs compression “within the frame”. Examples of Intraframe compression are Motion JPEG (MJPEG), DV (and all of its variants), I-frame MPEG-2 (no surprise, the “I” stands for “intra”), AVC-I, ProRes, DNxHD, and even JPEG2000. Intraframe codecs are generally used in image processing activities, as they offer higher-quality images, and each frame is encoded individually, making it easy to create cut points.
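The idea can be sketched in a few lines of code. This is a toy illustration, not a real codec: simple run-length encoding stands in for a proper still-image codec like JPEG, and all the function names are made up for this example. The point is the structure – every frame is compressed on its own, with no reference to any other frame.

```python
def rle_encode(pixels):
    """Run-length encode a flat list of pixel values as (value, count) pairs."""
    encoded = []
    for p in pixels:
        if encoded and encoded[-1][0] == p:
            encoded[-1] = (p, encoded[-1][1] + 1)
        else:
            encoded.append((p, 1))
    return encoded

def rle_decode(encoded):
    """Expand (value, count) pairs back into a flat list of pixels."""
    out = []
    for value, count in encoded:
        out.extend([value] * count)
    return out

def compress_video_intraframe(frames):
    # Each frame is encoded independently -- no frame refers to any
    # other, which is why any frame can serve as a cut point.
    return [rle_encode(f) for f in frames]

# Two tiny 8-pixel "frames" with large flat regions.
frames = [[0, 0, 0, 0, 9, 9, 0, 0],
          [0, 0, 0, 9, 9, 0, 0, 0]]
encoded = compress_video_intraframe(frames)
assert all(rle_decode(e) == f for e, f in zip(encoded, frames))
```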
There’s a practical limit to the amount of compression you can achieve with Intraframe compression before the artifacts become noticeable, so Intraframe codecs tend to run at fairly low compression ratios.
There is one glaring inefficiency in an Intraframe codec: if you actually look at successive frames of video, it is quite common that significant amounts of the current frame are exactly the same as in previous frames. Wouldn’t it be great if we just sent that data once, and then told the receiver “just use the video I sent you last time” for those regions that aren’t changing? That is the idea behind “Interframe compression” – “inter” also coming from Latin, and meaning “between”.
The images below emphasize the point: if you look at much of the frame, there is no difference from one frame to the next – only the car is changing. So if you can identify parts of the first frame that can be reused, you don’t have to resend them.
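A toy sketch makes this concrete. Here a frame is encoded only as the pixels that differ from the previous frame; everything else is “just use what I sent you last time”. This is a deliberate simplification – real codecs work on macroblocks with motion compensation, not individual pixels – and the function names are illustrative.

```python
def encode_delta(prev_frame, cur_frame):
    """Return (index, new_value) pairs for only the pixels that changed."""
    return [(i, cur) for i, (old, cur)
            in enumerate(zip(prev_frame, cur_frame)) if old != cur]

def apply_delta(prev_frame, delta):
    """Rebuild the current frame from the previous frame plus the delta."""
    frame = list(prev_frame)
    for i, value in delta:
        frame[i] = value
    return frame

frame1 = [0, 0, 0, 0, 9, 9, 0, 0]   # the "9 9" is our moving car
frame2 = [0, 0, 0, 9, 9, 0, 0, 0]   # the car has moved one pixel left
delta = encode_delta(frame1, frame2)
assert apply_delta(frame1, delta) == frame2
# Only 2 of the 8 pixels changed, so the delta is far smaller
# than retransmitting the whole frame.
assert len(delta) == 2
```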
But you’ll notice that when the car moves, it exposes part of the bridge that we didn’t see in the previous frame. So we either have to encode that separately, or we can grab this part of the image from a later frame. Why would we use a later frame for this content? Because not all frames are created equal: they are compressed differently depending on circumstances. In technical jargon, we end up with I-frames, P-frames and B-frames, with I-frames being the largest, and B-frames being the smallest.
I, P, and B Frames
I-frames are, in fact, Intra-compressed frames. They are complete in and of themselves, and don’t rely on any of the frames around them. As such they’re quite large, but you have to have them to ensure a transmission error doesn’t affect all subsequent frames forever. You see – if you were to lose a macroblock on any frame (like the first one in the pair above), then any subsequent frame that reuses that part will also have the error – and so an error in the large oval on the left image will magically show up in the right image (because you told the decoder to just use the left image again). Any frame that referenced the oval part of the right picture later in the clip will therefore also have the error – the error propagates forever. To fix this, every so often you send a full new frame to reset all of the references. That’s the I-frame (for completeness, in AVC and HEVC this is called an “IDR frame”, but the principle is the same).
P-frames are predicted from previous frames (either an I-frame or a previous P-frame). So they can be smaller and more heavily compressed than I-frames, as you can reuse parts of previously transmitted frames rather than re-transmitting them.
B-frames are bi-directionally predicted (hence the “B”) – that’s the term for making up parts of this frame from either a previous frame or one later in the video (like grabbing the new image content in the right picture above from an I-frame or P-frame that already has it). So you now have an I-frame, followed by a P-frame, followed by a B-frame, and so on. The sequence of frames from one I-frame to the next is called a “Group of Pictures”, or “GOP”, and the number of P- and B-frames is adjustable by encoder parameters, so you can set the quality level that you want depending on the content itself.
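One practical consequence of B-frames is that the order frames are transmitted and decoded differs from the order they are displayed: a B-frame’s forward anchor (the next I- or P-frame) has to be decoded before the B-frame itself can be. The sketch below illustrates that reordering for a small GOP; the pattern shown is just an example, as real encoders choose GOP length and B-frame count based on the content.

```python
def decode_order(display_order):
    """Reorder a GOP so each B-frame's forward anchor precedes it."""
    out, pending_b = [], []
    for frame in display_order:
        if frame == "B":
            pending_b.append(frame)   # hold until the next anchor arrives
        else:                         # "I" or "P": an anchor frame
            out.append(frame)
            out.extend(pending_b)     # held B-frames can now be decoded
            pending_b = []
    out.extend(pending_b)
    return out

gop = ["I", "B", "B", "P", "B", "B", "P"]   # display order
print(decode_order(gop))                    # ['I', 'P', 'B', 'B', 'P', 'B', 'B']
```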
Phew – that’s a lot of information to absorb in one blog entry! But that’s the basics of all forms of video compression in use today. Now you can impress your friends and family with your in-depth technical knowledge! (careful though – this has been a very shallow discussion of a very complex topic…..)
If you’re interested in a deeper discussion of video compression, I’d recommend “Digital Video Compression” by Peter Symes – available from Amazon and Google.
Next time, we’ll talk about audio compression.