# 2
I know of nothing that does this, but I think you could build something in an afternoon that would work reasonably well with out too much difficulty.
I based this design on a couple of data items, I looked at the two letter abbrievations for the states developed by the Post Office, and I also looked at the abbrievations for the most common male first names from census data. JIM(James), JOHN, BOB(Robert), MIKE(Michael), BILL(William), DAVE(David), ****(Richard), CHUCK(Charles),JOE(Joseph)...
First design point. Abbrievations should not be developed in isolation. In the interests of not introducing ambiguity ou do not want you two different words to abbrievate to the same string. One could develop some method that looks at a set of words/phrases and develops all the abbrievations simultaneously, or one could simply develop them one at a time, add them to a list and require the abbrievator to come up with an alternative whenever there is a collision.
Furthermore, in the interests of coding theory, where one tends to use short strings to encode the most frequently occuring information, it would be wise to stuff the strings into the abbrievator in frequency order if those frequencies are know. Thus the most frequent words can grab the short strings. It is possible that this concern does not apply in your application.
I view the system as having two components, A - the abbrievator that generates abbrievations and D, the dictionary that looks at candidates and rejects those that don't fit. In the interests of keeping the communication between these two components simple, I assume that A, actually returns not a String, but rather a list of Strings in its perfered order of desirability. D is then permitted to filter and shuffle these strings if it so desires. For example in developing the two letter codes for the states, you wouldn't modify the abbrievator, rather the Dictionary simply throws out all suggestions that were not two letters long.
This split into A and D allows you to deal with essentially linguistic things in A and exteraneous concerns in D. Other examples, in the candidates for two letter state names, two letters was key, pronouncability was not, yet in human names pronouncability is importand. If you were developing user names introduction of numbers, like Marlin314 or Tom17 may be required, in which case, it makes far more sense for the Dictionary that is looking at all the existing Toms to append a number to insure uniqueness than it is to expect A to generate all possible numeric appendages. So both components A and D have their role to play in the design.
Of course in its simplist form, D is nothing other than, grab the first element off of the list supplied by A that does not match another entry, and if that is not possible (where all entries matched) choose an entry near the front of the list and append a number for uniqueness.
Now for A itself I propose you do something simple like this:
You may want to build some elementary linguistic filters along these lines:
respell: internal CH to K, TH to T, PH to F, DoubleVowels to first Vowel, etc. Create a list of these sorts of transformations so that you can add to them when you realsize that you forgot one like OUGH -> UF
Break up word/phrase into list of syllables. Make assumption that your respellings eliminated double consenants that should be kept together and follow simple if linguistically incorrect rules, like if there is only one consenant between two vowels, the second vowel gets it, if there are two consenants, the syllable split is between the two, if there are three give first two to the first syllable and third to the second syllable, etc. Special case silent-E (a 4 letter word like consenant-vowel-consenant-E is a single syllable not two)
Then using those linguistic routines you can generate your abbrievations in some simple order like this:
1: If input is single word and single syllable return word unchanged
2: If input is phrase return first letter from each word ingnoring stop words
3: if input is phrase return first syllable from first word and first syllable from second word
4: return initial portion up to second vowel cluster. i.e. return initial C-V-C or V-C
5: return first syllable
6: If input is single word return first letter
7: return first 2 letters
8: return first 3 letters
9: if input is word return first letter and first letter from second syllable
10: if input is word return all combinations of first letter followed by first letter from some subsequent syllable.
11: if input is phrase return first syllable followed by first syllable from any subsequent words.
12: if input is word return first and last letter
13: if input is word strip all vowels
14: if input is word return last syllable
15: return word - unmodified
16: return phrase - unmodified
clearly you can add any other transformations that you want to the list. The order that I listed may seem strange but it was choosen like this. I assme that first you want to pronounce the abbrievation, so even though those are longer, I put them at the front of the list. Then I list the short abbrievations. IF your D filter is looking for 2 or 3 letter abbrievations it will filter out the longer pronuncable ones and grab the shorter ones further down the list.
In this form my list of names, James, John, Robert, Michael, William, David, Richard, Charles, Joseph, Thomas, Christopher, Daniel, Paul, Mark, Donald, George, Kenneth, Steven, Edward, Brian, Ronald, Anthony, Kevin, Jason, Matthew... becomes
JAM, JOHN, ROB, MICH, WILL, DAV, RICH, CHARL, JOS, THOM, CHRIST, DAN, PAUL, MARK, DON, GEORG, KENN, STEV, EDW, BRIAN, RON, ANTH, KEV, JAS, MATTH,...
Now we test this with some new short name spaces, like days of the week.
MOND,TUESD,WEDN,THURSD,FRID,SAT,SUND
Too long - Dictionary can filter to 2 letters
MO,TU,WE,TH,FR,SA,SU
Still too long D can filter to 2 or less
M,T,W,TH,F,S,SU
months
JAN,FEBR,MARCH,APR,MAY,JUNE,JUL,AUG,SEPT,OCT,NOV,DEC
filter to 2 or less
J,F,M,A,MA,JU,JL,A,S,O,N,D
filter to 2
JA,FE,MA,AP,MY,JU,JL,AU,SE,OC,NO,DE
filter to 3
JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC
zodiac - aries, taurus, gemini, cancer, leo, virgo, libra, scorpio, sagittarius, capricorn, aquarius, pisces
raw
ARIES,TAUR,GEM,CANC,LEO,VIRG,LIBR,SCORP,SAG,CAPR,AQ,PISC
2 or less
A,T,G,C,L,V,LI,S,SA,CA,AQ,P
exactly 2
AR,TA,GE,CA,LE,VI,LI, SC,SA,CR,AQ,PI
Anyway, that is my suggestion for a quick and dirty abbrievation generator.
Enjoy!