A Problem with My Run-Length Encoding Algorithm
I made a program that performs run-length encoding. Its algorithm is like this. Each letter represents one byte.
AAAAAABCDDCCC will become AA4BCDD0CC1.
I guess this is pretty simple. But I have a problem when the number of repeated letters exceeds 255 since one byte can only hold the maximum value of 255. Are there any ways to solve this problem?
[367 byte] By [
Mjordana] at [2007-11-27 1:01:29]

Say you have 257 'A' chars. Then the sequence would be something like the following (where I use '<number>' to represent a byte value)
AA '255' A '1'
As an alternative you can exclude some characters from the accepted character set and use them for flags. Zero is a good choice
AA '0' '256'
In the above '256' is a two byte value, rather than one byte. The zero('0') indicates that the actual length follows as two bytes.
Thanks, but I have a problem again.
An apostrophe is represented as 27 in ascii. So what happens if there are 29 'A's?
For example, this could be my raw data.
A A A A A A A A A A A A A A A A A A A A A A A A A A A A A ' 255 ' A ' 1 '
If I encoded this, it would become like this since there are 2+27 'A's.
A A ' 255 ' A ' 1 '
This is supposed to represent 2+255 'A's, but my raw data has only 29 'A's. So I guess this doesn't solve the problem.
As for the excluding some characters, my raw data could contain all letters, symbols, and numbers, so it wouldn't work either.
Where did the apostrophe come from?
jschell suggested 256 'A' be represented as 255 + 1 by hex sequence 40 40 ff 40 , or using zero to indicate two byte run as hex 40 40 00 01 00. The beginng of a repeated sequence is indicated by the repeated byte 40. "A\'" would be 40 1d. The sequences are not ambiguous, the notation is.