Parse string from many lines.

[nobr]I have a file which contains many lines such as:

<!--

document.writeln("\x3cdiv style=\"text-align:right; font-size:80%\"\x3eYour \x3cb\x3e\x3ca href=\"http://wikimediafoundation.org/wiki/Fundraising\" class=\"extiw\" title=\"wikimedia:Fundraising\"\x3econtinued donations\x3c/a\x3e\x3c/b\x3e keep Wikipedia running!\x26nbsp;\x26nbsp;\x26nbsp;\x26nbsp;\x3c/div\x3e\n");

-->

</script></div><h1 class="firstHeading">God</h1>

<div id="bodyContent">

<h3 id="siteSub">From Wikipedia, the free encyclopedia</h3>

<div id="contentSub">Revision history<br /><a href="/w/index.php?title=Special:Log&page=God" title="Special:Log">View logs for this page</a></div>

<div id="jump-to-nav">Jump to: <a href="#column-one">navigation</a>, <a href="#searchInput">search</a></div><!-- start content -->

(Latest | <a href="/w/index.php?title=God&dir=prev&limit=500&action=history" title="God">Earliest</a>) View (previous 500) (<a href="/w/index.php?title=God&offset=20061113024837&limit=500&action=history" title="God">next 500</a>) (<a href="/w/index.php?title=God&limit=20&action=history" title="God">20</a> | <a href="/w/index.php?title=God&limit=50&action=history" title="God">50</a> | <a href="/w/index.php?title=God&limit=100&action=history" title="God">100</a> | <a href="/w/index.php?title=God&limit=250&action=history" title="God">250</a> | <a href="/w/index.php?title=God&limit=500&action=history" title="God">500</a>).<div id="histlegend">For any version listed below, click on its date to view it. For more help, see <a href="/wiki/Help:Page_history" title="Help:Page history">Help:Page history</a> and <a href="/wiki/Help:Edit_summary" title="Help:Edit summary">Help:Edit summary</a>.<br/>

(cur) = difference from current version, (last) = difference from preceding version, <b>b</b> = <a href="/wiki/Wikipedia:Bot_policy" title="Wikipedia:Bot policy">bot edit</a>, <b>m</b> = <a href="/wiki/Help:Minor_edit" title="Help:Minor edit">minor edit</a>, → = <a href="/wiki/Help:Section#Section_editing" title="Help:Section">section edit</a>, ← = <a href="/wiki/Wikipedia:Automatic_edit_summaries" title="Wikipedia:Automatic edit summaries">automatic edit summary</a></div>

<form action="/w/index.php?title=God&" method="get"><input type='hidden' name='title' value="God" />

<input class="historysubmit" type="submit" accesskey="v" title="See the differences between the two selected versions of this page. [v]" value="Compare selected versions" /><ul id="pagehistory">

<li>(cur) (<a href="/w/index.php?title=God&diff=116678140&oldid=116676472" title="God">last</a>) <input type="radio" value="116678140" style="visibility:hidden" name="oldid" /><input type="radio" value="116678140" checked="checked" name="diff" /> <a href="/w/index.php?title=God&oldid=116678140" title="God">01:36, 21 March 2007</a> <span class='history-user'><a href="/wiki/User:Roy_Brumback" title="User:Roy Brumback">Roy Brumback</a> (<a href="/wiki/User_talk:Roy_Brumback" title="User talk:Roy Brumback">Talk</a> | <a href="/wiki/Special:Contributions/Roy_Brumback" title="Special:Contributions/Roy Brumback">contribs</a>)</span> <span class="comment">(Well then why leave this in, as it's not exactly what he said and its rebutting a point no longer made.)</span></li>

<li>(<a href="/w/index.php?title=God&diff=116678140&oldid=116676472" title="God">cur</a>) (<a href="/w/index.php?title=God&diff=116676472&oldid=116456238" title="God">last</a>) <input type="radio" value="116676472" checked="checked" name="oldid" /><input type="radio" value="116676472" name="diff" /> <a href="/w/index.php?title=God&oldid=116676472" title="God">01:28, 21 March 2007</a> <span class='history-user'><a href="/w/index.php?title=User:BernardZ&action=edit" class="new" title="User:BernardZ">BernardZ</a> (<a href="/wiki/User_talk:BernardZ" title="User talk:BernardZ">Talk</a> | <a href="/wiki/Special:Contributions/BernardZ" title="Special:Contributions/BernardZ">contribs</a>)</span> <span class="comment">(<span class="autocomment"><a href="/wiki/God#Theological_approaches" title="God">→</a>Theological approaches -</span> Marx analysis is irrelevant for this article.)</span></li>

<li>(<a href="/w/index.php?title=God&diff=116678140&oldid=116456238" title="God">cur</a>) (<a href="/w/index.php?title=God&diff=116456238&oldid=116455891" title="God">last</a>) <input type="radio" value="116456238" name="oldid" /><input type="radio" value="116456238" name="diff" /> <a href="/w/index.php?title=God&oldid=116456238" title="God">04:49, 20 March 2007</a> <span class='history-user'><a href="/wiki/User:Bikeable" title="User:Bikeable">Bikeable</a> (<a href="/wiki/User_talk:Bikeable" title="User talk:Bikeable">Talk</a> | <a href="/wiki/Special:Contributions/Bikeable" title="Special:Contributions/Bikeable">contribs</a>)</span> <span class="comment">(<a href="/wiki/WP:AES" title="WP:AES">←</a>Undid revision 116455891 by <a href="/wiki/Special:Contributions/Pgaertner" title="Special:Contributions/Pgaertner">Pgaertner</a> (<a href="/wiki/User_talk:Pgaertner" title="User talk:Pgaertner">talk</a>))</span></li>

<li>(<a href="/w/index.php?title=God&diff=116678140&oldid=116455891" title="God">cur</a>) (<a href="/w/index.php?title=God&diff=116455891&oldid=116374069" title="God">last</a>) <input type="radio" value="116455891" name="oldid" /><input type="radio" value="116455891" name="diff" /> <a href="/w/index.php?title=God&oldid=116455891" title="God">04:47, 20 March 2007</a> <span class='history-user'><a href="/w/index.php?title=User:Pgaertner&action=edit" class="new" title="User:Pgaertner">Pgaertner</a> (<a href="/wiki/User_talk:Pgaertner" title="User talk:Pgaertner">Talk</a> | <a href="/wiki/Special:Contributions/Pgaertner" title="Special:Contributions/Pgaertner">contribs</a>)</span> <span class="comment">(<span class="autocomment"><a href="/wiki/God#Names_of_God" title="God">→</a>Names of God</span>)</span></li>

There are a lot of junk strings in it. If I just want to get the useful strings, such as <input type="radio" value="116456238" name="diff">,

how can I pick them out?[/nobr]

[11024 byte] By [ardmorea] at [2007-11-26 22:25:33]
# 1

You can parse it line by line:

BufferedReader br = new BufferedReader(new FileReader(file));
String line;
while ((line = br.readLine()) != null) {
    // search the line for interesting occurrences and perform the respective actions
}
br.close();

Or you can try using an existing parser, such as an XML parser.

hellbindera at 2007-7-10 11:26:30 > top of Java-index,Java Essentials,Java Programming...
# 2
How do I search for interesting occurrences?
ardmorea at 2007-7-10 11:26:30
# 3
http://java.sun.com/javase/6/docs/api/java/lang/String.html
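For instance, String.contains() from that API page can serve as the test inside the read loop. The following is only a minimal sketch: the class name LineFilter, the method linesContaining and the inline sample text are made up for illustration, and it reads from a StringReader instead of your file so that it runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class LineFilter {
    // Collects every line that contains the given marker text.
    public static List<String> linesContaining(Reader source, String marker) {
        List<String> hits = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.contains(marker)) {
                    hits.add(line.trim());
                }
            }
        } catch (IOException ex) {
            // Rethrow unchecked so callers do not need a throws clause.
            throw new UncheckedIOException(ex);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical three-line stand-in for the real file.
        String data = "<div id=\"bodyContent\">\n"
                    + "<li><input type=\"radio\" value=\"116456238\" name=\"diff\" /></li>\n"
                    + "<h3 id=\"siteSub\">From Wikipedia</h3>\n";
        for (String hit : linesContaining(new StringReader(data), "<input type=\"radio\"")) {
            System.out.println(hit);
        }
    }
}
```

To use it on a real file, pass new FileReader(fileName) as the source. Note that this keeps the whole line; cutting out just the tag needs indexOf() and substring(), or a regular expression.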
hellbindera at 2007-7-10 11:26:30
# 4
How can I define a string that contains double quotes, such as <a href="/w/index.php?title=God&oldid=..">?
String s1 = "<a href="/w/index.php?title=God&oldid=....">";
ardmorea at 2007-7-10 11:26:30
# 5
Escape each inner double quote with a backslash:
String s1 = "<a href=\"/w/index.php?title=God&oldid=....\">";
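Once the quotes are escaped, such a string can be used as a search key. Here is a rough sketch under a few assumptions: the class name OldIdFinder, the method oldId and the shortened sample line are hypothetical, and the key stops right after "oldid=" so that the digits following it can be read off.

```java
public class OldIdFinder {
    // The search key; each inner double quote is written as \"
    private static final String KEY = "<a href=\"/w/index.php?title=God&oldid=";

    // Returns the digits that follow KEY in the line, or null if there are none.
    public static String oldId(String line) {
        int start = line.indexOf(KEY);
        if (start < 0) {
            return null;
        }
        start += KEY.length();
        int end = start;
        while (end < line.length() && Character.isDigit(line.charAt(end))) {
            end++;
        }
        return end > start ? line.substring(start, end) : null;
    }

    public static void main(String[] args) {
        // Shortened sample line, not the full line from the file.
        String line = "<a href=\"/w/index.php?title=God&oldid=116678140\" title=\"God\">01:36, 21 March 2007</a>";
        System.out.println(KEY);          // the backslashes are not part of the string
        System.out.println(oldId(line));
    }
}
```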
hellbindera at 2007-7-10 11:26:30
# 6

When I compile it, I get: File input error

BufferedReader br = new BufferedReader(new FileReader(args[0]));
String line;
String s1 = "<a href=\"/w/index.php?title=God&oldid=\"";
while ((line = br.readLine()) != null) {
    // searching for interesting occurrences and performing respective actions
}

ardmorea at 2007-7-10 11:26:30
# 7
As far as I remember, you pass a File to the FileReader constructor, and you are passing a String. Isn't that surprising?
hellbindera at 2007-7-10 11:26:30
# 8
Strike my last post. You can pass a file name. What is your argument when launching the app?
hellbindera at 2007-7-10 11:26:30
# 9
Just one file name. Suppose it is named fname.dat. So you mean this?
BufferedReader br = new BufferedReader(new FileReader("fname.dat"));
ardmorea at 2007-7-10 11:26:30
# 10
Is the file fname.dat in the same folder as your .class file?
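Strictly speaking, a relative name like fname.dat is resolved against the directory the JVM was started from (the user.dir system property), which is not necessarily the folder of the .class file. A small diagnostic sketch, assuming a made-up class name WhereIsMyFile and helper describe, prints where Java actually looks:

```java
import java.io.File;

public class WhereIsMyFile {
    // Reports the absolute path a name resolves to and whether a file exists there.
    public static String describe(String name) {
        File f = new File(name);
        return f.getAbsolutePath() + (f.isFile() ? " (found)" : " (not found)");
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: java WhereIsMyFile <file>");
            return;
        }
        System.out.println("working directory: " + System.getProperty("user.dir"));
        System.out.println(describe(args[0]));
    }
}
```

Running it with the same argument as the real program shows whether the name points where you expect.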
hellbindera at 2007-7-10 11:26:30
# 11
yes.
ardmorea at 2007-7-10 11:26:30
# 12

I think your simplest option is to use Scanner's findWithinHorizon() method:

import java.io.*;
import java.util.*;

public class Test
{
    public static void main(String... args)
    {
        // Matches only the plain radio inputs: type, value and name, no other attributes
        String regex = "<input type=\"radio\" value=\"\\d+\" name=\"\\w+\" />";

        // Test.dat contains the sample data from your post
        try (Scanner scan = new Scanner(new File("Test.dat")))
        {
            String str;
            // A horizon of 0 means the search is not bounded
            while ((str = scan.findWithinHorizon(regex, 0)) != null)
            {
                System.out.println(str);
            }
        }
        catch (FileNotFoundException ex)
        {
            ex.printStackTrace();
        }
    }
}

I don't know how specific you want your regex to be; if you just want to find all INPUT tags, you could use this:

String regex = "<input [^>]++>";
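If what you are after is the value and name rather than the whole tag, capturing groups with java.util.regex.Pattern and Matcher are one way to pull them out. This is only a sketch: the class name InputTagExtractor, the method extract and the pattern itself are assumptions for illustration, and the pattern expects value to come before name, as it does in the sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InputTagExtractor {
    // Group 1 captures the numeric value, group 2 the name.
    // [^>] keeps every part of the match inside a single tag.
    private static final Pattern RADIO = Pattern.compile(
        "<input\\s[^>]*?value=\"(\\d+)\"[^>]*?name=\"(\\w+)\"[^>]*>");

    // Returns one "name=value" entry per matching INPUT tag.
    public static List<String> extract(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = RADIO.matcher(html);
        while (m.find()) {
            found.add(m.group(2) + "=" + m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Shortened from the first <li> of the sample data.
        String sample =
            "<li>(cur) <input type=\"radio\" value=\"116678140\" style=\"visibility:hidden\" name=\"oldid\" />"
            + "<input type=\"radio\" value=\"116678140\" checked=\"checked\" name=\"diff\" /></li>";
        for (String entry : extract(sample)) {
            System.out.println(entry);
        }
    }
}
```

Unlike the stricter regex above, this version also accepts tags that carry extra attributes such as style or checked between value and name.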

uncle_alicea at 2007-7-10 11:26:30