Central issue of the JPEG or MPEG decoders is the discrete cosine transform. Very often discussed, quite a variety of different methods exist.
By a lot of mathematical optimizations, its computational costs could be reduced from 1024 multiplication's and 896 additions for a full 8x8 DCT down to 54 multiplication's, 464 additions and 6 left shifts required by Feig's true 2-D method based on the discrete Fourier transform.
Many programs, like the Berkeley MPEG Player, even some MPEG Hardware use a 1-D Algorithm with 12 multiplication's and 32 additions per row/column. This method could be optimized to take advantage of the sparse nature of the dct matrices, so that the question now is "Can the Feig's scaled 2-D DCT method still compete ?".
Data compression methods in general try to compromise between processing
speed and disk space usage/network loading.
Just because of the large amount of data that has to be dealt with when
graphical information are to be stored or transmitted, picture compression
methods have been subject of special interest since ever.
Thus, quite a few people have been working on that field in since 1982. In order to avoid the creation of competing, independently developed standards a joint collaboration between CCITT and ISO took the initiative to establish an international research group back in 1986.
This international group, called JPEG (Joint Photographic Experts Group) selected by competitive contests the Adaptive Discrete Cosine Transform approach as being the basis for their image compression standard out of 12 other different proposals, that have been contributed by the participating teams.
Applying this method, the JPEG Data compression today (which is now officially admitted by ISO, no 10918) can reduce picture data from 24 bits/pixel, in true color mode, to lower than 0.75 bit/pixel, thereby still providing an excellent quality.
In easy words, the basic idea behind the JPEG Compression Standard is to figure out, which information's are worth storing and which can be omitted (which are for example the, for human eyes invisible, high contrast frequencies, the same color information's in a part of a picture etc.).
Readers unfamiliar with the above mentioned Adaptive Discrete Cosine Transform,
it's theory and implementation are encouraged to read
In addition to the JPEG compression, MPEG compares each picture (frame) during the encoding process, making a decision whether the whole picture must be encoded or just the difference (e.g. unchanged parts of pictures or even moving objects inclusive translation vector).
The result of this hybride method is usually 3 times better then a plain JPEG compression.
On the one hand, it is now clear that MPEG decoding means a lot more than
just running the discrete cosine transform on each macroblock.
On the other, this work attempts to show that even after years there still
can be room for improvement (concerning the DCT code) and tries to give
some creative inspirations for further work on the mpeg decoding software.
When taking a first look into the source code, the entire program appears as a product of the programming style of the late 80's. Someone who expected a clear and a practical design is rather confronted with a mess.
Modules are full of global variables (like flags, switches, file pointers,), where it is hard to guess where they are related to. At this level of complexity, for instance, it is strongly recommended to add a configuration structure (or object) and its manager.
The User Interface, ctrlbar was integrated dispassionately, the dither modules as well.
From deep inside the decoder it has to be jumped back into the main.c module to branch crisscross into a dither module right after and back in this circle stack to main.c .
Some people badly played with #ifdef's and #endif's every time they added functionality, maybe not loose the 0.3% speed with this flexibility and all the way they programmed like this:
vid_stream->future->locked |= FUTURE_LOCK;
vid_stream->current = vid_stream->past;
#ifndef NOCONTROLS
ExecuteDisplay(vid_stream, 1, xinfo);
#else
ExecuteDisplay(vid_stream, xinfo);
#endif /* !NOCONTROLS */
}
} else {
#ifndef NOCONTROLS
ExecuteDisplay(vid_stream, 1, xinfo);
#else
ExecuteDisplay(vid_stream, xinfo);
#endif /* !NOCONTROLS */
}
}
Finally no coding convention at all seems to be valid. No one knows, whether
it should be programmed in ANSI or K&R standard. Some rattled people
did both and wrote very readable things like:
#ifdef __STDC__
int MakeFloatClockTime(unsigned char hiBit, unsigned long low4Bytes,
double * floatClockTime)
#else
int MakeFloatClockTime(hiBit,low4Bytes,floatClockTime)
unsigned char hiBit;
unsigned long low4Bytes;
double *floatClockTime;
#endif
{
if (hiBit != 0 && hiBit != 1) {
*floatClockTime = 0.0;
return 1;
}
*floatClockTime
= (double)hiBit*FLOAT_0x10000*FLOAT_0x10000 + (double)low4Bytes;
Result of the 'reengineering' is the picture below. The upper part of the picture shows the basic video decoder block diagram taken from ISO 11172-2,page VIII. The part of the mpeg_play call graph below shows where (which function and which module) the components of the video decoder can be found in mpeg_play:
In readfile.c a buffered bitstream is implemented. It provides the functions get_more_data(), get_bits() ... used by the parsers that are spred over a couple os different modules.
The main() function calls mpegVidRsrc() in video.c, the function which implements the demultiplexer MUX-1 and where the stream is parsed first. Depending on the sequence code (Group Of Pictures,Picture Start, Sequence Start) the related subparsers are called, for instance ParseMacroBlock().
ParseMacroBlock() calls ParseReconBlock() in parseblock.c 6 times, 4 times for the luminance blocks and 2 times for the chrominance blocks (*).
(*) = not in gray dithering.
ParseReconblock() uses the macros in decode.h, which implements the variable length decoder
VLD for sequence headers, motion vectors, dct values, et cetera.
After the dct values are decoded, ParseReconBlock() does the dequantisation
Q-1 and calls j_rev_dct() or j_rev_dct_sparse() in jrevdct.c, the functions
that have to be replaced in this experiment.

To make it short, we have to change at least parseblock.c and jrevdct.c.
But, it turned out that the Feig's scaled 2-D DCT won't work any more, neither with 16 bit DCT values
nor with 8 bit quantization tables. Because the the intra- and nonintra quantization tables are multiplied
with the DFT constants, the result does no more fit into unsigned char
- and after the next multiplication (that is the dequantization step) the
DCT values will not fit into 16 bit.
Because of mpeg_play's design, these major changes will affect not just
parseblock.c and jrevdct.c, but files like proto.h video.h video.c, as
well.
./mpeg_play -framerate 0 -dither none -no_display videos/paris.mpgA possible result would be:
148 Done! Real Time Spent (After Initializations): 1.698300 secs. Avg. Frames/Sec: 87.145969First, it is of course interesting to know, how a speedup of the DCT code would affect the behavior of the entire program, how much time the DCT costs, respectively.
A very easy and effective method is to use the GNU-Profiler. It must be
said that it is not the exactest for fast executed functions, because the
times, which are displayed, are propagated along the edges of the estimated
call graph.
To use it, mpeg_play must be compiled with the '-pg' option and linked with the profiler code in the gmon library.
In the Makefile the following two definitions must be changed:
CC = gcc -pg
LIBS = -L/usr/lib/X11 -lXext -lX11 -lgmonTo get the output from the profiler the program must terminate correctly, but:
#ifndef NOCONTROLS
if (ControlShow == CTRLBAR_NONE) {
while (TRUE) {
for (i=0;i < numInput; i++) {
while (theStream[i]->film_has_ended != TRUE) {
mpegVidRsrc(0, theStream[i], 0, &xinfo[i]);
}
if (loopFlag) {
rewind(theStream[i]->input);
to
#ifndef NOCONTROLS
if (ControlShow == CTRLBAR_NONE) {
while (workToDo) {
workToDo=FALSE;
for (i=0;i < numInput; i++) {
while (theStream[i]->film_has_ended != TRUE) {
workToDo=TRUE;
mpegVidRsrc(0, theStream[i], 0, &xinfo[i]);
}
if (loopFlag) {
After running mpeg_play the results can be examined at with:
gprof mpeg_playThe following table is a summary of several tests with the same video paris.mpg:
| Dither Type | Display video ? | Time Dither (%) | Time DCT-1 (%) | Links to Results |
| Floyd Steinberg 2 | no | 65.84 | 11.91 | Click here for details |
| Floyd Steinberg 2 | yes | 64.11 | 10.98 | Click here for details |
| Ordered | no | 26.48 | 21.92 | Click here for details |
| Mono | no | 49.40 | 19.28 | Click here for details |
| Gray | yes | 9.49 | 34.18 | Click here for details |
| None | no | - | 35.71 | Click here for details |
Conclusions:
![]()
patch -d mpeg_play < patch_dctAfter running the two versions of the program on an Intel PII with Linux 2.0.33, the following table could be filled in:
| Stream | Stream Size | Frames | Resolution | calls to jrevdct() | calls to jdct_sparse | Sparse factor % |
Frames/s Berkeley DCT |
Frams/s Feigs DCT |
Speedup |
| alien.mpg | 366123 | 253 | 160 x 128 | 99826 | 21678 | 21.7 | 297.16 | 323.85 | 1.089 |
| b0.mpg | 719188 | 100 | 368 x 240 | 93786 | 32251 | 34.4 | 77.67 | 95.86 | 1.234 |
| bike.mpg | 642590 | 150 | 352 x 240 | 99352 | 24041 | 24.2 | 104.71 | 121.17 | 1.157 |
| dreh.mpg | 50456 | 15 | 256 x 192 | 5733 | 1309 | 22.8 | 150.47 | 204.40 | 1.358 |
| hula.mpg | 148076 | 40 | 352 x 240 | 34296 | 9712 | 28.3 | 103.22 | 111.40 | 1.079 |
| iicm.mpg | 1761522 | 801 | 160 x 128 | 258738 | 125806 | 48.6 | 263.06 | 310.73 | 1.181 |
| paris.mpg | 690185 | 148 | 320 x 240 | 72251 | 12910 | 17.9 | 103.82 | 122.16 | 1.177 |
As seen on the other picture, the dithering, huffman decoding and file-I/O
consumes a lot of CPU cycles, so that even when no dithering is done, the
mpeg_play works only 35 % on the DCT. The more it is surprising to see,
that mpeg_play can run at a 1.35 times higher frame rate, when changing
the DCT approach and doing some additional code rearrangements. This is
also a very good result, since the quantization tables changed from char[][]
to short [][] and the dct matrices changed from short[][] to long[][].
![]()