Requesting Help with Writing Code
While I am not totally new to coding in Java, I have been away from it for some time, so I would put myself at the newbie level or below.
I am trying to write a program that reads a Word document, runs a series of if statements to count how many times certain words are mentioned, and then writes the count for each word into an Excel spreadsheet.
I know how to write the if statements (yes, I realize that means I am saying I know the easy part and asking for help with the hard part).
I promise that if someone could provide me with Java code that opens a Word document (let's call it "worddocument"), runs a series of loops over it, and then writes the resulting values into an Excel file, I will get back in touch with you and thank you profusely, cite you in the paper I am trying to write, etc. If nobody has time to provide sample code, then I would still appreciate it if someone could point me to specific sources on this subject.
Thank you for any help you can provide. I also want to say that I think it is great how much everyone on this forum helps each other out.
I am far from an expert in this particular area, but I do believe that Microsoft Word documents are not simple text files; rather, they have a specific internal structure that must be navigated in order to get at the text. If by "word document" you didn't mean one created by Microsoft Word, then... never mind.
No, you were correct: by word document I meant a document created by Word. I mentioned it because a friend of mine who knows a lot about computer science, though not Java specifically, said it might be easy, since I might be able to break the lines down into individual strings or something like that. I'm not sure how that would be done, though.
I should specify that it does not have to be a Word document; it could be something similar. All I am doing to create the document is copying a piece of literature from a website and pasting it into a document. The code would then read that document (whether it is Word or something else) and output the number of times each word was mentioned into different Excel cells. So please let me know whether I should simply paste the literature into a WordPerfect document, or maybe into Notepad or WordPad. And thank you for pointing that out to me; all of this is very helpful.
It will be much easier if you paste the piece of literature into Notepad or an equivalent editor to make a "plain" text file. Given that, your friend is right. Whether someone here will write the program for you, I don't know; but it's possible.
Even if no one writes the code for me, I still appreciate your advice that it would be easier to paste the literature into Notepad. I will do that instead of pasting it into a Word document.
And I understand if no one is willing to write code for me, even just pieces as helpful examples. I saw a few postings in which people helped out by providing code, so I figured I would ask. That said, I hope asking for someone to provide the code is not improper for this forum. I have been trying to learn this for a while through websites and introductory Java books, and I am still having difficulty finding out how to do what I am looking for.
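The friend's idea of breaking lines down into individual strings can be sketched with `String.split`. This is a minimal illustration, assuming that a "word" is any run of letters or apostrophes and that everything else (spaces, punctuation, digits) separates words; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class WordSplitter {
    // Splits one line of text into lowercase words.
    // Assumption: a word is a run of letters or apostrophes; anything else separates words.
    public static List<String> splitIntoWords(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.split("[^\\p{L}']+")) {
            // split() yields a leading empty string when the line starts with a separator
            if (!token.isEmpty()) {
                words.add(token.toLowerCase());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(splitIntoWords("\"Call me Ishmael.\" Some years ago..."));
        // prints [call, me, ishmael, some, years, ago]
    }
}
```

Lowercasing every word up front means later comparisons are automatically case-insensitive.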
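With a plain text file, the counting part fits in a few lines of standard Java. The sketch below is one possible approach rather than a definitive one: instead of one if statement per word, it keeps a map from each target word to its count. The default file name `literature.txt`, the sample target words, and the assumption that the file is saved as UTF-8 (Notepad's Save As dialog lets you pick the encoding) are all placeholders to adjust.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TargetWordCounter {
    // Counts how often each target word occurs in the given lines, ignoring case.
    // One map entry per target word takes the place of one if statement per word.
    public static Map<String, Integer> countTargets(List<String> lines, List<String> targets) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps the targets in the order given
        for (String target : targets) {
            counts.put(target.toLowerCase(), 0);
        }
        for (String line : lines) {
            // Assumption: anything that is not a letter or an apostrophe separates words.
            for (String token : line.split("[^\\p{L}']+")) {
                String word = token.toLowerCase();
                if (counts.containsKey(word)) { // words we are not tracking are skipped
                    counts.put(word, counts.get(word) + 1);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file name; pass the real path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "literature.txt");
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        Map<String, Integer> counts = countTargets(lines, Arrays.asList("whale", "sea", "ship"));
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Adding a word to track is then a matter of adding it to the target list rather than writing another if statement.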
Regardless, thank you again for your help.
I entered "java word count program" into Google and found this: http://www.faqs.org/docs/javap/source/WordCount.java It's probably overkill, but it should be of some help. You could modify it at the point where it isolates a word, so that it throws away all but the words you're looking for.
This is trivial if you use text files. If you want to read from a .doc (MS Word format) and output the results into a .xls (MS Excel format), it is not so easy. For output you should settle for a .csv (comma-separated values) file, which Excel can open directly.
This code reads a word and a path to a file (like C:\MyFile.txt) from standard input and outputs the number of occurrences of that word in the specified file. The comparison is case-insensitive, so Word == word == WORD. I hope it helps:
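A .csv file is just text with one row per line and commas between cells, so it can be written with the standard library alone. A minimal sketch, assuming a word-to-count map has already been built; the class name, the output file name `counts.csv` and the numbers in `main` are made-up placeholders. Each word lands in column A and its count in column B.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvExport {
    // Builds CSV text: a header row, then one "word,count" row per map entry.
    // Rows end in CRLF, the line ending RFC 4180 specifies.
    public static String toCsv(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder("word,count\r\n");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sb.append(quote(e.getKey())).append(',').append(e.getValue()).append("\r\n");
        }
        return sb.toString();
    }

    // Wraps a field in double quotes (doubling inner quotes) if it contains a comma, quote or line break.
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("whale", 12); // illustrative numbers only
        counts.put("sea", 7);
        Path out = Paths.get("counts.csv"); // hypothetical output file name
        Files.write(out, toCsv(counts).getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```

Double-clicking the resulting file normally opens it in Excel with one word per row.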
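package test;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StreamTokenizer;

public class Main {
    public static void main(final String[] args) {
        BufferedReader console = new BufferedReader(new InputStreamReader(System.in));
        String word;
        String path;
        try {
            System.out.println("Input the word: ");
            word = console.readLine();
            System.out.println("Input the path: ");
            path = console.readLine();
        } catch (IOException e) {
            e.printStackTrace();
            return; // nothing to count without a word and a path
        }
        if (word == null || path == null) {
            System.err.println("No word or path was entered.");
            return;
        }

        int wordcount = 0;
        // try-with-resources closes the file even if reading fails
        try (BufferedReader fileReader = new BufferedReader(new FileReader(path))) {
            StreamTokenizer tokenizer = new StreamTokenizer(fileReader);
            // The default syntax table treats '.' and '-' as number characters that a word
            // keeps running through (so "end." is read as one word), '/' as a comment start
            // and quotes as string delimiters. Rebuild it so that only letters form words.
            tokenizer.resetSyntax();
            tokenizer.wordChars('a', 'z');
            tokenizer.wordChars('A', 'Z');
            tokenizer.wordChars(128 + 32, 255); // accented Latin-1 letters, as in the default table
            tokenizer.whitespaceChars(0, ' ');
            // Everything else (punctuation, digits, apostrophes) is now an ordinary
            // single-character token, so "don't" is read as "don" and "t".
            int type = tokenizer.nextToken();
            while (type != StreamTokenizer.TT_EOF) {
                // Case-insensitive comparison: Word == word == WORD
                if (type == StreamTokenizer.TT_WORD && tokenizer.sval.equalsIgnoreCase(word)) {
                    wordcount++;
                }
                type = tokenizer.nextToken();
            }
            System.out.println("The word \"" + word + "\" appears " + wordcount + " times");
        } catch (IOException e) {
            // also covers FileNotFoundException, a subclass of IOException
            e.printStackTrace();
        }
    }
}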
> I am trying to write a code which would input a word document, allow me to
> run a bunch of if statements to count the number of times certain words were
> mentioned, and then output the number each word was mentioned into an
> excel spreadsheet.
Wouldn't VB be the right language for this? You could just embed a macro in
the document to produce the Excel analysis.
http://jakarta.apache.org/poi/ It is possible to read/write to/from MS formats with it. The new Office 2007 file formats are open and XML-based, so you can even write your own tool to do this.
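A rough sketch of the "write your own tool" idea for the Office 2007 .docx format, using only the JDK (Java 9 or later for `readAllBytes`): a .docx is a ZIP archive whose main text sits in `word/document.xml` inside `<w:t>` elements. This is an illustration under clear limitations, not a replacement for POI: it does not handle the older binary .doc format, skips headers, footers and footnotes, ignores numeric character references, assumes the XML is UTF-8, and uses a regular expression rather than a real XML parser. The class and method names are made up.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class DocxText {
    // Three alternatives: a text run <w:t ...>text</w:t> (group 1),
    // a tab or line-break element, or the end of a paragraph </w:p>.
    private static final Pattern PIECES = Pattern.compile(
            "<w:t(?:\\s[^>]*)?>([^<]*)</w:t>|<w:(?:tab|br|cr)\\b[^>]*/>|</w:p>");

    // Turns the XML of word/document.xml into plain text, one line per paragraph.
    public static String extractText(String documentXml) {
        StringBuilder sb = new StringBuilder();
        Matcher m = PIECES.matcher(documentXml);
        while (m.find()) {
            if (m.group(1) != null) {
                sb.append(unescape(m.group(1)));
            } else if (m.group().startsWith("</w:p")) {
                sb.append('\n');
            } else {
                sb.append(' '); // tab or line break inside a paragraph
            }
        }
        return sb.toString();
    }

    // Decodes the five predefined XML entities; &amp; goes last so "&amp;lt;" becomes "&lt;".
    static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&quot;", "\"")
                .replace("&apos;", "'").replace("&amp;", "&");
    }

    // Reads word/document.xml out of a .docx file and extracts its text.
    public static String readDocx(String path) throws IOException {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("No word/document.xml in " + path);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return extractText(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // e.g. java DocxText worddocument.docx
        System.out.print(readDocx(args[0]));
    }
}
```

The extracted text can then be fed to any plain-text word counter.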