image compression algorithms for video stream

This is more of an algorithm question then a JMF question (maybe).

Scenario: User connects to webpage and loads applet. Applet makes a connection to a servlet. Servlet starts webcam and sends image data (in the form of bytes) from a webcam to the applet. The applet displays the image data.

Goals:

1. Try to achieve a fairly high frame rate (ideally >= 10 fps).

2. Keep the bandwidth consumption low (ideally so a person with a 56K modem could use it).

3. Allow the quality to be adjusted.

What I am wondering what is a good process of sending the data to achieve the above. Right now, I see two possible approaches:

1. Encode each frame as an jpeg, compress and send.

2. For each frame, compare to last frame. Record changes (if significant). Organize changes into byte array, compress and send.

Which of 1 or 2 will produce better results? Or is there a better way to do this?

[942 byte] By [rkippena] at [2007-9-29 17:02:50]
# 1

Interesting.

The webcam produces every n seconds a JPEG.

No need for an applet, a HTML meta refresh would do fine.

An applet would be nice though.

There are more than one users out there.

So you have to place the JPEG, and for longer visitors

quicker delta files.

This requires the server to do some intensive work.

Paid back in hopefully less traffic for longer visitors.

In effect the server is playing an MPEG movie, with still frames

and deltas.

You can put every pixel group in a priority queue with factors:

- change in cumulative color

- duration of time not updated

Furthermore it makes sence to let the applet reload the entire frame

once in a while; when the queue would grow too large, or a long period

did pass.

joop_eggena at 2007-7-15 15:39:38 > top of Java-index,Other Topics,Algorithms...
# 2

try here

http://www.mpeg.org/MPEG/video.html#video-software

for a lot of mpeg 1,2,4 encoders/decoders.

This might be a way to go. Perhaps more efficient then trying to write

your own compression standard.

That way you could also stream to "standard" clients such as IE with

MediaPlayer or Real.

Mpeg 1 is pretty much a jazzed up version of what you propose.

10) Encode frame using JPEG.

20) Encode frame difference using JPEG

30) if reconstructed image differs by > errorMargin from inputImage goto 10.

40 goto 20.

This allows an arbitrary number of diff frames (which should be smaller)

between the static frames without the error growing to large (for

example when things move rapidly.

MPEG 1 is a bit more complex then this but not a huge amount.

matfud

matfuda at 2007-7-15 15:39:38 > top of Java-index,Other Topics,Algorithms...
# 3

> The webcam produces every n seconds a JPEG.

> No need for an applet, a HTML meta refresh would do

> fine.

Well, I tried this approach before. The problem was that each time the image is downloaded, the browser had to make a new connection to the server, which produces a limitation on the number of fps. Having an open connection is the way to go.

> An applet would be nice though.

> There are more than one users out there.

> So you have to place the JPEG, and for longer

> visitors

> quicker delta files.

>

delta files? do you mean byte stream?

There are no "files". Actually, writing to the server hard-drive would be more difficult to deal with because you would have to perform some sort of synchronization between writing to a file and the applet making a request to read the file.

rkippena at 2007-7-15 15:39:38 > top of Java-index,Other Topics,Algorithms...
# 4

> try here

> http://www.mpeg.org/MPEG/video.html#video-software

> for a lot of mpeg 1,2,4 encoders/decoders.

>

Here are a couple of links for anyone else:

http://www.hpl.hp.com/techreports/2002/HPL-2002-260.pdf

http://archive.dstc.edu.au/RDU/staff/jane-hunter/video-streaming.html

> This might be a way to go. Perhaps more efficient then

> trying to write

> your own compression standard.

>

Yes, I don't want to write a "new" compression algorithm. However, I want something simple that works with applets without downloading a new JRE or additional JMF packages (ideally the default browser java 1.1 applet would be all that is required). However, the com.sun.jpeg.codec.blah whatever package isn't available until 1.2. So I might have to implement the JPEG compression algorithm (help?).

> That way you could also stream to "standard" clients

> such as IE with

> MediaPlayer or Real.

>

I didn't consider this, but it is probably worthwhile, thanks.

> Mpeg 1 is pretty much a jazzed up version of what you

> propose.

> 10) Encode frame using JPEG.

> 20) Encode frame difference using JPEG

> 30) if reconstructed image differs by > errorMargin

> from inputImage goto 10.

> 40 goto 20.

>

Where does the data get sent? at step 10?

rkippena at 2007-7-15 15:39:38 > top of Java-index,Other Topics,Algorithms...
# 5

I was assuming that the encoded data would be pushed into a stream

if there is a client(s) waiting. Disposed of otherwise. Of course

you would need a decoder on the client. With a scheme like this you

just drop the data if no one is listening. If someone is listening you

go into the pseudo code described above.

You could just stream JPEGs but they tend to be quite large. If not much

is changing in the image then sending a difference image can save

a large number of bytes. The problem is that if you send only difference

images cumulative errors can build up (as JPEG is not lossless). Therefore

the encoder encodes an image [i1] and sends it. this is a marker frame

(an 'I' frame in MPEG I think). Each time this happens a new client can join the stream. After this you find the difference between image [i1]

and [i2] and compress it. This should produce a better compression

ratio as the difference images should contain less info then a full

frame. However you have to decompress the diff image and build the

complete image as the client would see it. If this reconstructed image

differs too much from the original image obtained from the webcam then

you will need to send a new complete frame so that subsequent diff

frames do not cause too much error to creep into the image.

Im not sure how you would do this using MPEG itself cos I havn't read

the specs for a long while. However it should be possible to allow new

clients to join the stream.

This means that each client can have a single connection to the server

(UDP perhaps). This is not ideal but multicast is not reliable on

the internet (gets dropped by most routers)

matfud

matfuda at 2007-7-15 15:39:38 > top of Java-index,Other Topics,Algorithms...
# 6

> If not much is changing in the image then sending a difference image can save a large number of bytes.

This is the part I am most interested in. How is the difference image done?

For example, suppose the following is the image:

aaaa

aaaa

aaaa

aaaa

Then suppose in the next frame the image is:

aaaa

abba

abba

aaab

Then how is the difference image stored? Is it also a JPEG?

rkippena at 2007-7-15 15:39:38 > top of Java-index,Other Topics,Algorithms...
# 7

Yes. In its simplest form it is just a pixel by pixel subtraction of the

two images. In areas where nothing has changed the difference between

the pixel values is 0.

(By the way JPEG can be lossless it is just remarkably rare to find

an implementaion that provides this aspect of the standard (lots of

issues with patents and all)

JPEG's baseline sequntial, which is the most common form of lossy compression, finds

the Discrete Cosine Transform of blocks of the image (blocks of size

8x8 pixels for example). This represntation contains spatial frequency

information. The precision of these blocks is reduced (they are quantized) by dividing by the values in a

a template block (quantisation table of which there can be many) which

reduces the magnitude of the values in the block).

The aim of this stage is to optimise the data for the second stage which is

coding the image. The coding is perfomed by runlength encoding (diagonal zigzag across the block). As this block is the DCT of the

image the last values tend to be zeros and can be compressed well.

The result of this run length encoding is then Huffman encoded (or

artimetically encoded for lossless) which minimises the number of bits

required to store each value.

This therefore is the basic outline of JPEG: Find frequency components

(DCT). Divide result by a quntisation table that minimises the residual.

Runlength encode this in such a way that you minimise the number of

bytes required to represent the residual. Huffman encode the result.

Huffman encoding finds the minimum number of bits that can represent the

data with which it is presented. So it is a case of optimise, optimise,

optimise.

If you diff two images that are similar then most of values in the 8x8 blocks in

the diff image will be close to zero. The spatial frequency components

of the block will therefore be mostly zero. Quantisation with a suitable

table will reduce the residuals even more. The block will be pretty

much all zeros. The run length encoding should

therefore "see" very long runs of the same value which it should be

efficient at compressing. This all results in a diff image being far more compressible

then the original image (as long as not much changed between one image

and the next).

So you transmit the JPEGed original image[i1] then you find the diff

between it and its sunsequent image (image[i2]-image[i1]). This is

encoded and transmitted. you keep transmitting encoded diff images until

the reconstruction of the original (done on the server of course for

just this reason) has to great an error when compared to the image it

should be. then you transmit another "full" image to resynch the clients

(taken from "Graphics File Formats: Reference and Guide". Wayne, C. and

Shepherd, B., 1995 chp 10. ISBN 1-884777-00-7)

MPEG actually uses images called I, P, B and D.

I is Intra-coded: encoded using only information from itself.

P is predictve-coded: picture encoded using motion compensated

prediction from past I or P pictures. (a fancy diff image that improves

compression more then diff does)

B is Bidirectional predictive-coded: picture encoded using motion

compensated prediction from past and/or future I and/or P pictures.

D is DC-coded: coded using only info from itself. Stores only the DC

components of the DCT blocks. This is the odd one out. MPEG streams are

either all D frames or contain none. Streams that are all D frames are

very lossy but fast to process (good for video editing previews).

matfud

matfuda at 2007-7-15 15:39:38 > top of Java-index,Other Topics,Algorithms...
# 8

> If you diff two images that are similar then most of values in the

> 8x8 blocks in the diff image will be close to zero. The spatial

> frequency components of the block will therefore be mostly zero.

> Quantisation with a suitable table will reduce the residuals even

> more. The block will be pretty much all zeros. The run length

> encoding should therefore "see" very long runs of the same value

> which it should be efficient at compressing.

Ok, this is the part that is difficult. I have done some experimentation using a 24-bit 320x240 picture (of a person looking into a webcam) and converting it to a JPEG. The resulting JPEG is about 10 KB. I then use the eraser tool and white-out about half the pixels (basically the back-ground is whited out, leaving only the head). I then save that as a JPEG and the resulting file size is 12 KB!

So, I must be missing something on how the new image is computed, if it is still a JPEG. Is the quality of the difference image lower than the quality of the original image?

rkippena at 2007-7-15 15:39:38 > top of Java-index,Other Topics,Algorithms...
# 9

Are you sure you got them the right way round. When I tried this the

size of the image dropped from 12k to about 8k.

It must be noted that whiting (or blacking) out chuncks of the image

is not quite the same as diffing the image. (but there shouldn't be

too much difference between the two)

It could be an artifact of the quality value you provide or the

shape of the image that has not changed etc. Anyway, as I said, MPEG

actually uses predictive encoding which is a gloryfied version of

diffing.

matfud

matfuda at 2007-7-15 15:39:38 > top of Java-index,Other Topics,Algorithms...
# 10

> Are you sure you got them the right way round. When I

> tried this the

> size of the image dropped from 12k to about 8k.

>

Yes, but as you said it is probably a high quality factory with sharp lines (not good for JPEG).

I have started writing an all 100% Java JPEG decoder (to work with Java 1.1). I found a link for one, but unfortunately it was broken. I also found the source for an encoder, so that will make things easier.

I will post when it's finished, until then if somebody has a (working) link that would be appreciated.

rkippena at 2007-7-15 15:39:38 > top of Java-index,Other Topics,Algorithms...
# 11

Also you can use buffering as in MS Media Player.

1. Applet sets cache-refresh time, for example 5 sec.

2. Initially applet gets 5 sec movie(JPEG or whatever).

3. When applete received JPEG and plays movie, you get another

5 sec movie.

The main relation here is network speed(NS) and cache time(CT).

CT = K / NS, where K - some coefficient.

Good bye, see ya.

grudnevkva at 2007-7-15 15:39:38 > top of Java-index,Other Topics,Algorithms...
# 12

> I will post when it's finished, until then if somebody

> has a (working) link that would be appreciated.

>

I am nearing completion. However, I have noticed that the

decoding of the JPEG image seems to be slower than the encoding because of all the prefix matching that has to be done to decode the Huffman codes.

Is there a simple way to improve the speed of the matching? Suppose the buffer contains 111101101011001. Also, suppose the AC Huffman table is about 256 in length. I don't want to have to check the prefix of 256 code words to decode a single entry. However, I am worried that building a tree won't be that more efficient, plus it will take a lot of time to implement. Does anyone have experience with the speed gain?

rkippena at 2007-7-15 15:39:38 > top of Java-index,Other Topics,Algorithms...
# 13
A tree should offer a large performance increase overa search. howeveryou will have to be careful about the way you implement the tree to achieve the benefit.
matfuda at 2007-7-15 15:39:38 > top of Java-index,Other Topics,Algorithms...
# 14

try this

public class HuffmanTree {

private TreeNode root = new TreeNode(0);

private TreeNode previousState=null;

private int shiftCount=0;

private int numBits=32;

private int code;

private int[] results = new int[32]; // larger then needed

private int resultCount = 0;

/**

* clears any state the tree has

*/

public void clearState() {

code=0;

previousState=null;

}

/**

* adds a node to the tree. All nodes in the path will be created if they don't

* already exist.

*

* @param value the value of the leaf

* @param c the code associated with the value

* @param numBits how many of the high order bits are part of the code

*/

public void addNode(int value,int c, int numBits) {

if(numBits<1) throw new IllegalArgumentException("numBits must be greater then 0");

if(numBits>32) numBits = 32;

System.out.println("Add node called");

TreeNode tn = null;

TreeNode parent = root;

boolean goRight=false;

while(numBits>0) {

goRight=c<0;

if(goRight)

tn=parent.right;

else

tn=parent.left;

if(tn==null) {

tn = new TreeNode(0);

if(goRight)

parent.right=tn;

else

parent.left=tn;

}

if(numBits==1)

tn.value=value;

parent=tn;

c=c<<1;

numBits--;

}

}

/**

* finds the numbers associated with the codes in the first numBits of the <code>code</code>

* parameter

*/

public int[] get(int code, int numBits){

if(numBits<1) throw new IllegalArgumentException("numBits must be greater then 0");

if(numBits>32) numBits = 32;

shiftCount=0;

resultCount = 0;

this.numBits=numBits;

this.code = code;

// if I was partway through a search an ran out of bits then

// continue from where I left off

if(previousState!=null){

TreeNode ps = previousState;

previousState=null;

int tmp = findMatch(ps); //start search at old tree position

//still not enough bits. This should never happen as the

// tree construction only allows a max of 32 bits to be used

if(previousState!=null) return new int[0];

results[resultCount++]=tmp;

}

// general search loop

while(shiftCount<this.numBits) {

System.out.println("finding resultCount "+resultCount);

int tmp = findMatch(root);

if(previousState==null) {

results[resultCount++]=tmp;

}

}

// create new array populate with results and return it

int[] r = new int[resultCount];

for(int i=0;i<resultCount;i++)

r[i]=results[i];

return r;

}

/**

* traverses the tree and returns the value at the terminal leaf. If not enough

* bits remain to find the leaf it saves its state in the previousState variable

*

*/

protected int findMatch(TreeNode node) {

while((node.left!=null)&&(node.right!=null)) {

if(shiftCount==numBits) {

previousState = node;

return 0;

}

boolean goRight = code<0;

if(goRight)

node=node.right;

else

node=node.left;

code=code<<1;

shiftCount++;

}

return node.value;

}

public static void main(String[] args) {

int[][] table = {{'e',0,3},

{'r',64,3},

{'i',112,4},

{'b',160,5},

{'t',224,3},

{'a',32,4},

{'v',96,5},

{'s',128,4},

{'g',168,5},

{' ',192,3},

{'d',48,4},

{'y',104,5},

{'n',144,4},

{'h',176,4}};

HuffmanTree ht = new HuffmanTree();

// populate tree

for(int i=0;i<table.length;i++) {

int code = table[i][1]<<(24);

//System.out.println(Integer.toBinaryString(code));

ht.addNode(table[i][0],table[i][1]><<24,table[i][2]);

}

//set up two ints with a variety of values. The first int cannot

// contain all required bits (has an extra 00 at the end). The

// tree remembers this state and when the second int is passed in it

// continues so the first letter of the second it appear to be "a" (a=0010)

Long bin = Long.valueOf("11101000100110010101110001100000",2); //"travg etc."

Long bin2 = Long.valueOf("10011101001110010011001000010010",2);

int v = bin.intValue();

int[] values = ht.get(v,32);

System.out.println("\nlength of results = "+values.length);

for(int i=0;i<values.length;i++){

System.out.println((char)values[i]);

}

v = bin2.intValue();

values = ht.get(v,32);

System.out.println("\nlength of results = "+values.length);

for(int i=0;i<values.length;i++){

System.out.println((char)values[i]);

}

}

class TreeNode {

protected TreeNode left;

protected TreeNode right;

protected int value;

protected TreeNode(int value) {

this.value = value;

}

}

}

This tree is designed so you can pull ints straight from a stream an

get the encoded values for that int. (each int can contain multiple

values). in general you call the get() method with a bitNum of 32 unless

you know that the huffman encoded stream ends before this. Beware that

the data must be in the MSbits of the int.

The tree allows searches for strings that are over 32 bits in length

(32 levels deep) however there is no facility to create such a tree

as the addNode method only allows 32 bit codes to be used. The addnode

method adds a single code. The code must be in the MSBs of the int.

The test table is taken from

http://www.la.unm.edu/~mbrettin/algorithms/huffman.html

I just modified to codes so they are byte aligned (therefore a 24 bit

left shift ensures the values are in the MSB's for easy addition to the

tree.

matfud

">

matfuda at 2007-7-15 15:39:38 > top of Java-index,Other Topics,Algorithms...
# 15
Any use?
matfuda at 2007-7-19 12:29:19 > top of Java-index,Other Topics,Algorithms...
# 16

Getting there! Here is an overview of the structure:

public static void decodeJPEGData(byte[] data, int offset, JPEGTable jt, ... some additional buffers);

//data is the raw data

//offset is the starting index

int buffer = 0;

// load buffer with data from byte

int[] ret = new int[3]; // ret[0] stores the index, ret[1] stores the num bits to read, ret[2] stores the code

jt.match(buffer, ret); // goes to the jpeg table to find a match, result is stored in ret

if (ret[0] == -1) // no match (error)

else {

// ...

}

Actually, the JPEGTable contains 4 match methods, matchLumAC, matchLumDC, matchChromAC, matchChromDC. It will be interesting to see the difference in speed, but first I must finish doing the inv DCT...

rkippena at 2007-7-19 12:29:19 > top of Java-index,Other Topics,Algorithms...
# 17

for the Huffman tree decoder I posted you can sinply use byte arrays by casting

a byte to an int, right shifting by 24 and calling the get method with the result

and a numBits arg of 8.

It should be a bit faster if you can compose an int from 4 bytes and pass that in

though.

How are you implementing the Cosine Transform. A simple (but not efficient) method

is to use a fourier transform. However there are many techniques. Just interested

to know which one you chose.

matfud

matfuda at 2007-7-19 12:29:19 > top of Java-index,Other Topics,Algorithms...
# 18
Finally I finished the decoder. The JFIF specification is to use the DCT (Discrete Cosine Transform).
rkippena at 2007-7-19 12:29:19 > top of Java-index,Other Topics,Algorithms...
# 19

Here is the implementation, not sure if it is the same one you are talking about:

public class DCT {

private static double[][] c = new double[8][8];

static {

for (int i = 0; i < 8; i++)

c[0][i] = 1.0 / Math.sqrt(8);

for (int i = 1; i < 8; i++)

for (int j = 0; j < 8; j++)

c[i][j] = Math.sqrt(0.25) * Math.cos((2.0 * j + 1.0) * i * Math.PI * 0.0625);

}

/**

All arrays are 8x8. The result is stored in the "out" array. The temp array is used for

temporary storage.

*/

public static void IDCT(double[][] in, double[][] out, double[][] temp) {

for (int i = 0; i < 8; i++) {

for (int j = 0; j < 8; j++) {

temp[i][j] = 0.0;

for (int k = 0; k < 8; k++) {

temp[i][j] += in[i][k] * c[k][j];

}

}

}

for (int i = 0; i < 8; i++) {

for (int j = 0; j < 8; j++) {

out[i][j] = 0.0;

for (int k = 0; k < 8; k++) {

out[i][j] += c[k][i] * temp[k][j];

}

}

}

}

}

Actually, this implementation is considered slow. There is a faster implementation that comes with the

source code provided by the "Independent JPEG Group's software". However, it is written in C and is a

little cryptic. I will have to work on it later.

rkippena at 2007-7-19 12:29:19 > top of Java-index,Other Topics,Algorithms...
# 20

> Finally I finished the decoder. The JFIF

> specification is to use the DCT (Discrete Cosine

> Transform).

That is the spec. I was saying that if you can't find a pre written DCT but can find a pre written FFT algorithm

then you can persuade the FFT to perform the DCT for you. However it is not an efficient approach (but

can save coding time which is irrelevent now as you have written a DCT)

How much more of the decoder you you need to implement?

Did the huffman tree end up being faster then the tables?

matfud

matfuda at 2007-7-19 12:29:19 > top of Java-index,Other Topics,Algorithms...
# 21

> How much more of the decoder you you need to

> implement?

The decoder is finished. But there are some improvements to make, such as implementing the faster IDCT and looking for other ways the overall decompression can be made faster.

> Did the huffman tree end up being faster then the

> tables?

Yes. For a JPEG file of about 125 KB, the decoding part is 10-20 milliseconds faster. However, there were some changes to the HuffmanTree. The HuffmanTree you provided allows for multiple values to be associated with a prefix (it looks like) .However, when dealing with a stream of bytes there can only be one way to decompress the stream (the prefix property).In short, here is the modified Huffman tree.The "index" variable is really just the "value" variable. The "size" variable is the same as the "numBits" variable. I was deciding whether or not to store the size in the node or just use a counter when traversing the tree.

public class HuffmanTree {

private Node root = new Node();

private class Node {

public Node leftChild = null;

public Node rightChild = null;

public int index = 0;

public int size = 0;

public Node() {}

public Node(int index, int size) {

this.index = index;

this.size = size;

}

}

public void add(int index, int size, int code) {

code = code << (32 - size);

Node newNode = new Node(index, size); // create node here because size is changed

Node n = root;

while (size > 1) {

if (code < 0) {

if (n.rightChild == null)

n.rightChild = new Node();

n = n.rightChild;

}

else {

if (n.leftChild == null)

n.leftChild = new Node();

n = n.leftChild;

}

size--;

code = code << 1;

}

if (code < 0)

n.rightChild = newNode;

else

n.leftChild = newNode;

}

public void match(int buffer, int[] ret) {

Node n = root;

while (true) {

if (buffer < 0) { // 1 at start of buffer

n = n.rightChild;

}

else { // 0 at start of buffer

n = n.leftChild;

}

if (n == null) break;

if (n.leftChild == null && n.rightChild == null) {

ret[0] = n.index;

ret[1] = n.size;

//ret[2] == don't care

return;

}

buffer = buffer << 1;

}

ret[0] = -1; // no match found

}

}

rkippena at 2007-7-19 12:29:19 > top of Java-index,Other Topics,Algorithms...
# 22

The tree I posted maps one value to one prefix. Its more complex then your tree as it assumes that there

will be multiple pretfixs in the buffer variable (the int) and that the last prefix in that variable may or may not

be complete. It may require the next int of the buffer to allow the prefix to be fully decoded.

for example say a four bit prefix starts at the 30th bit of the buffer int. It would require 2 bits from the

next int to be able to complete the lookup. A fair bit of the code is there to ensure that a partial state

can be held between calls to the get method.

Also another reason the tree is a bit complicated is that it allows arbitrary length codes to be looked up

(although the tree addition method does not allow codes more then 32 bit long to be entered.)

The reason I chose to handle the lookup in this wierd way (rather then just passing in a byte buffer for

example) is that it keeps the code to represent the tree fairly small by getting rid of the array handling stuff.

It also is suitable for streaming systems as it does not need to store the entire array in memory to start

decoding.

Using an int rather then a byte allows the get method to be called less often (the int stores more prefixes

then the byte could)

To use it properly you will need a method that can take four bytes (from your socket) and squeeze them

into an int.

public static int makeInt(byte b3, byte b2, byte b1, byte b0) {

return (int)((((b3 & 0xff) << 24) |

((b2 & 0xff) << 16) |

((b1 & 0xff) << 8) |

((b0 & 0xff) << 0)));

}

However you might want a slightly more efficient one that takes a byte[] that way you can improve the

performance obtained reading from your socket by using the read(byte[]) style methods.

matfud

matfuda at 2007-7-19 12:29:19 > top of Java-index,Other Topics,Algorithms...