Paragraph fill

I hate to reinvent the wheel, so I wondered whether anyone has a decent algorithm for reformatting a string to fill a certain width. By this I mean: given a string and a width of 80, I want to get back a list of lines, each <= 80 characters wide, broken at whitespace in the original string. It sounds pretty straightforward, but the algorithm must handle the following cases:

1) A section of the text with 81+ consecutive non-whitespace characters.

2) Use of the system's line.separator value.

3) Removal of pre-existing separators.

4) Trimming excess whitespace from front and back.

5) Dealing with lines that turn out to be all whitespace (not sure what to do with this). I'm also not sure whether to collapse whitespace that is not at the front or end of a line.

Anyway, does anyone know where there is something close to this, or can anyone suggest good search terms I can use to find such algorithms on the web?

[967 byte] By [mitekea] at [2007-11-26 21:38:07]
# 1

Here's some (untested!) code:

import java.util.ArrayList;
import java.util.List;

/**
 * @param text  the text to divide into separate lines
 * @param width the maximum width of a line
 * @return a List of Strings representing the lines with
 *         a certain width.
 */
List<String> linesOfWidth(final String text, final int width) {
    // Splitting on runs of whitespace also drops existing line breaks and tabs.
    String[] allTokens = text.trim().split("\\s+");
    List<String> lines = new ArrayList<String>();
    StringBuilder builder = new StringBuilder();
    for (String token : allTokens) {
        // builder always ends with a blank, so its length already counts the separator.
        // The emptiness check avoids adding an empty line before an over-long token.
        if (builder.length() > 0 && token.length() + builder.length() > width) {
            lines.add(builder.toString().trim());
            builder = new StringBuilder();
        }
        builder.append(token);
        builder.append(' ');
    }
    lines.add(builder.toString().trim());
    return lines;
}

Note that a lot of things can probably go wrong (what happens with words > 80 chars long?), so you need to test it and adjust it properly.

Good luck.

prometheuzza at 2007-7-10 3:20:52 > top of Java-index,Other Topics,Algorithms...
# 2
> 1) A section of the text with 81+ consecutive
> non-whitespace characters.

What should happen in that case?
kajbja at 2007-7-10 3:20:52
# 3

> 1) A section of the text with 81+ consecutive

> non-whitespace characters.

> What should happen in that case?

If it were my algorithm, I'd allow the user to provide a parameter to specify whether to

1) Cut it off at the edge and continue on the next line.

2) Look for the next whitespace and allow it to go over the limit.

But I'm just interested in finding a sophisticated algorithm out there that someone has come up with. Don't bother trying to whip something up. I can do that myself.
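For reference, the two options above can be expressed as a parameter roughly like this. This is only a minimal sketch, not a polished library routine: the names LongWordFill, LongWordPolicy, SPLIT, OVERFLOW and fill are made up for illustration, and it assumes whitespace is collapsed and that an over-long word always starts on a fresh line before being cut.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; an illustration of the two policies, nothing more.
class LongWordFill {

    /** How to treat a token that is longer than the line width. */
    enum LongWordPolicy { SPLIT, OVERFLOW }

    static List<String> fill(String text, int width, LongWordPolicy policy) {
        if (width < 1) {
            throw new IllegalArgumentException("width must be at least 1");
        }
        List<String> lines = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return lines; // nothing but whitespace: no lines at all
        }
        StringBuilder line = new StringBuilder();
        for (String token : trimmed.split("\\s+")) {
            // Flush the current line if the token plus one separating blank would not fit.
            if (line.length() > 0 && line.length() + 1 + token.length() > width) {
                lines.add(line.toString());
                line.setLength(0);
            }
            if (policy == LongWordPolicy.SPLIT) {
                // Option 1: cut the token at the edge and continue on the next line.
                // The line is always empty here when the token is longer than width.
                while (token.length() > width) {
                    lines.add(token.substring(0, width));
                    token = token.substring(width);
                }
            }
            // Option 2 (OVERFLOW) simply lets an over-long token exceed the limit.
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(token);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

The resulting list can then be joined with System.getProperty("line.separator") if a single string is wanted.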

mitekea at 2007-7-10 3:20:52
# 4
prometheuzz: I had considered something like that, though I wasn't sure what regular expression to use, or whether to preserve whitespace in the middle of a line or to collapse it. You chose to convert things like tabs to blanks. That would work for my purposes, though!
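To make the two whitespace choices concrete, here is a small sketch of each. The class and method names (WhitespaceChoices, collapse, removeLineBreaksOnly) are hypothetical; the only library calls are String.trim and String.replaceAll.

```java
// Hypothetical helper names; shows the two whitespace treatments side by side.
class WhitespaceChoices {

    /** Collapses every run of whitespace (blanks, tabs, line breaks) into one blank. */
    static String collapse(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    /** Replaces only line breaks with a blank; interior blanks and tabs are kept. */
    static String removeLineBreaksOnly(String text) {
        return text.replaceAll("\\r\\n|\\r|\\n", " ").trim();
    }
}
```

Splitting on "\\s+" behaves like the first method: whatever whitespace separated two words in the input, the output has a single blank between them.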
mitekea at 2007-7-10 3:20:52
# 5
You might want to consider java.text.BreakIterator (http://java.sun.com/javase/6/docs/api/java/text/BreakIterator.html).
pbrockway2a at 2007-7-10 3:20:52
# 6
I wouldn't have thought that Java came bundled with one. Wow, a nice surprise, though I don't see any way to set the line width!
mitekea at 2007-7-10 3:20:52
# 7

> I wouldn't have thought that Java came bundled with

> one. Wow, a nice surprise, though I don't see any way

> to set the line width!

Well, no, as the documentation states: "The mechanism correctly handles punctuation and hyphenated words. Actual line breaking needs to also consider the available line width and is handled by higher-level software." That means you have to take care of the width yourself.

So what you would do is to create a suitable BreakIterator, set the text, then scan through the boundaries. The last boundary where the text fits in your width is the boundary you want to use.
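A rough sketch of that scan might look like the following. The class and method names (BreakIteratorFill, wrap) are invented for illustration, and it rests on a few assumptions of its own: whitespace is collapsed to single blanks first (which also removes existing line separators), width is measured in char units, and a segment longer than the width is left on its own over-long line.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical names; one possible way to scan line-break boundaries against a width.
class BreakIteratorFill {

    static List<String> wrap(String text, int width) {
        List<String> lines = new ArrayList<>();
        // Assumption: collapsing whitespace is acceptable; this drops old line breaks.
        String normalized = text.trim().replaceAll("\\s+", " ");

        BreakIterator boundaries = BreakIterator.getLineInstance(Locale.ENGLISH);
        boundaries.setText(normalized);

        int lineStart = boundaries.first(); // index where the current line begins
        int lastFit = lineStart;            // last boundary seen on the current line
        for (int boundary = boundaries.next();
             boundary != BreakIterator.DONE;
             boundary = boundaries.next()) {
            // Measure the candidate line without its trailing blank.
            int candidateLength = normalized.substring(lineStart, boundary).trim().length();
            if (candidateLength > width && lastFit > lineStart) {
                // The text up to the previous boundary was the last thing that fit.
                lines.add(normalized.substring(lineStart, lastFit).trim());
                lineStart = lastFit;
            }
            lastFit = boundary;
        }
        String tail = normalized.substring(lineStart).trim();
        if (!tail.isEmpty()) {
            lines.add(tail);
        }
        return lines;
    }
}
```

Unlike a plain split on whitespace, the line instance may also offer break positions inside a token, for example after a hyphen, so the output can differ from the token-based version for such input.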

DrClapa at 2007-7-10 3:20:52