Essentially all transmitted video is first compressed to make it fit within the transmission medium. An easy analogy for how this works is the pictures that come out of a digital camera. I once had a good camera that captured enough detail to give you a clear print even after cropping to a small part of the picture or blowing the whole picture up to poster size. But I remember that the file created for one picture was immense. If you wanted to email the picture you had to first compress it and save it as a JPEG, and doing so chopped off a lot of the detail of the picture.
Video compression works the same way. Video cameras capture far more detail than can ever be retransmitted over a cable TV network or a satellite TV path, so the detail needs to be trimmed in some way to fit the video into the size of the transmission path. The industry has developed compression standards that define certain parameters for compressing video. The most commonly used standards include Motion JPEG, MPEG-4 Part 2 (often referred to simply as MPEG-4), and H.264.
Compressing video means applying an algorithm to the raw video signal to reduce its size, then applying an inverse algorithm to the compressed file in the viewing device (your TV set-top box, computer, or smartphone) to play the video. This combination of coding and then decoding a video signal is called a video codec (coder/decoder). Each of the standards is unique, and the same standard must be used at both ends of the transmission path to function. For example, you can't view an MPEG-4 video using an H.264 decoder.
The two major techniques utilized by a video codec are image compression and video compression.
Image compression uses intraframe techniques to reduce the size of video files. This means that unnecessary information is removed from each frame of the video, with unnecessary defined as detail that cannot be noticed by the human eye. These are the same techniques used in my example above of compressing an image from my digital camera. For instance, there might be very tiny nuances of different shades of blue in the captured image of a sky, and the compression would reduce them to just a few shades of blue in order to shrink the information that needs to be saved.
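To make the "few shades of blue" idea concrete, here is a minimal sketch of one intraframe trick: rounding nearby pixel values together so the frame holds far fewer distinct shades. (Real codecs use transforms such as the DCT followed by quantization; this toy `quantize` function and the sample values are my own illustration, not any standard's actual method.)

```python
def quantize(frame, step=32):
    """Map each pixel value to the nearest lower multiple of `step`,
    collapsing tiny nuances between shades into one shade."""
    return [[(value // step) * step for value in row] for row in frame]

# A 2x4 grayscale "sky" where neighboring pixels differ only slightly.
sky = [
    [200, 201, 203, 198],
    [199, 202, 200, 204],
]

print(quantize(sky))
# [[192, 192, 192, 192], [192, 192, 192, 192]]
```

After quantizing, every pixel in this patch is the same shade, so the frame compresses far better under a follow-on scheme like run-length or entropy coding, at the cost of detail the eye was unlikely to notice.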
Video compression, on the other hand, works by using interframe compression techniques. This compares the images from adjacent frames of a video and tries to capture only the pixels that have changed from one frame to the next. For example, if the camera spent a few seconds looking out into a sunny backyard, most of each frame might be identical except for leaves or flowers moving in the wind. These techniques would transmit only the pixels that change, along with a note to keep the others the same.
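A rough sketch of that idea: record only the pixels that differ between two frames, plus their positions, and rebuild the next frame from the previous one. (The `frame_delta` and `apply_delta` helpers below are hypothetical names of my own for illustration; real interframe coding is considerably more involved.)

```python
def frame_delta(prev, curr):
    """Return a list of (row, col, new_value) for pixels that changed."""
    changes = []
    for r, (prev_row, curr_row) in enumerate(zip(prev, curr)):
        for c, (old, new) in enumerate(zip(prev_row, curr_row)):
            if old != new:
                changes.append((r, c, new))
    return changes

def apply_delta(prev, changes):
    """Rebuild the next frame from the previous frame plus the delta."""
    frame = [row[:] for row in prev]
    for r, c, value in changes:
        frame[r][c] = value
    return frame

# A mostly static backyard: one "leaf" pixel shifts between frames.
frame1 = [[10, 10, 10], [10, 99, 10], [10, 10, 10]]
frame2 = [[10, 10, 10], [10, 10, 99], [10, 10, 10]]

delta = frame_delta(frame1, frame2)
print(delta)  # [(1, 1, 10), (1, 2, 99)] — two changed pixels, not nine
```

Only the two changed pixels travel over the wire; the receiver applies them to its copy of the previous frame to reconstruct the new one.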
These techniques are more sophisticated than that simple example suggests. For instance, if a camera were panning across a view of a house, the details of the house would remain the same even though the house would shift across the video image. A technique called block-based motion compensation effectively draws a 'box' around the moving image, tracks it from frame to frame, and treats its contents the same rather than re-encoding them.
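The core of block-based motion compensation can be sketched as a search: take a block from the previous frame and look for where it landed in the current frame, so only the shift (a motion vector) needs to be encoded. This toy full-search version, with its `find_motion` name and tiny sample frames, is my own illustration under simplified assumptions; real encoders use much larger blocks and faster search strategies.

```python
def find_motion(prev, curr, block, size, search=2):
    """Return the (row, col) shift that best matches a `size` x `size`
    block of `prev` (top-left at `block`) inside `curr`, scanning all
    offsets within `search` pixels and minimizing absolute pixel error."""
    br, bc = block
    best, best_err = None, None
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            r, c = br + dr, bc + dc
            if r < 0 or c < 0 or r + size > len(curr) or c + size > len(curr[0]):
                continue  # candidate block would fall outside the frame
            err = sum(
                abs(prev[br + i][bc + j] - curr[r + i][c + j])
                for i in range(size) for j in range(size)
            )
            if best_err is None or err < best_err:
                best_err, best = err, (dr, dc)
    return best

# The "house" (a bright 2x2 block) pans one pixel to the right.
prev = [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0]]
curr = [[0, 0, 0, 0], [0, 0, 9, 9], [0, 0, 9, 9], [0, 0, 0, 0]]

print(find_motion(prev, curr, block=(1, 1), size=2))  # (0, 1)
```

Instead of retransmitting the four pixels of the block, the encoder can send the single vector `(0, 1)` meaning "same block, shifted one pixel right."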
The techniques also make use of schemes to classify frames into different types to make them faster to decode. For example, an I-frame is an intra-coded frame that stands on its own, such as the first frame of a new scene where everything differs from the previous frame. A P-frame is a predictive inter frame, meaning that it makes reference to and relies on the frame before it. A B-frame is a bi-predictive frame that makes reference to the frames both before and after it.
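One consequence of B-frames referencing a later frame is that the decoder must process frames out of display order: both references have to be decoded before the B-frame between them. Here is a minimal sketch of that reordering for a made-up sequence of frame types (the `decode_order` function and the sample pattern are illustrative assumptions, not any standard's actual algorithm).

```python
def decode_order(frames):
    """Given (display_index, type) pairs in display order, return the
    order a decoder would process them: each B-frame is held back
    until the I- or P-frame after it (its second reference) is decoded."""
    order, pending_b = [], []
    for frame in frames:
        if frame[1] == 'B':
            pending_b.append(frame)   # needs the *next* reference too
        else:
            order.append(frame)       # I- and P-frames decode at once
            order.extend(pending_b)   # then the B-frames in between
            pending_b = []
    return order + pending_b

gop = [(0, 'I'), (1, 'B'), (2, 'B'), (3, 'P'), (4, 'B'), (5, 'P')]
print(decode_order(gop))
# [(0, 'I'), (3, 'P'), (1, 'B'), (2, 'B'), (5, 'P'), (4, 'B')]
```

Frame 3 is decoded before frames 1 and 2 even though it is displayed after them, which is why decoders buffer a few frames before playback.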
Next in this blog series I'll dig a bit deeper into the specific techniques that make these standards work, and I'll start to explain how not all versions of any one compression standard are the same.