Internet search engine

Hello, I am trying to find a simple Java applet that searches multiple Internet search engines. Can anyone help me do this in a simple way?
[169 byte] By [7tr7tr] at [2007-9-26 1:38:44]
# 1
Hi, Please refer to this URL: http://www.babbage.demon.co.uk/HomePageSearch.html I hope this will help you. Thanks, Bakrudeen
bakrudeen_indts at 2007-6-29 2:27:03 > top of Java-index,Archived Forums,Java Programming...
# 2
Hi, Also refer to this URL: http://freewarejava.com/applets/search.shtml which contains various applet search engines. Thanks, Bakrudeen
bakrudeen_indts at 2007-6-29 2:27:03
# 3
Hi bakrudeen_indts, thanks for your quick reply. I need code for this part. Can you help me?
7tr7tr at 2007-6-29 2:27:03
# 4

Hi,

Please go through the above site; you will find excellent code there.

Anyhow, here is the sample applet search engine code.

import java.awt.*;

import java.applet.*;

import java.net.*;

import java.io.*;

import java.util.*;

// v1.5 Home Page Search applet

// 15th February 1998

/*

* This applet provides search facilities for Web sites with no CGI access

*

* Copyright (c) 1997 Richard Everitt G4ZFE

*richard@babbage.demon.co.uk

*

* This program is free software; you can redistribute it and/or modify it

* under the terms of the GNU General Public License as published by the

* Free Software Foundation; either version 2 of the License, or (at your

* option) any later version.

*

* This program is distributed in the hope that it will be useful, but

* WITHOUT ANY WARRANTY; without even the implied warranty of

* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU

* General Public License for more details.

*

* You should have received a copy of the GNU General Public License along

* with this program; if not, write to the Free Software Foundation, Inc.,

* 675 Mass Ave, Cambridge, MA 02139, USA.

*

*

*/

/* Applet parameters:

* This applet takes the following parameters

* a. hostname, the name of the Demon Home Page (e.g babbage). The name is

* converted to lower case and used to create the URL of the

* pages to search i.e. http://www.<hostname>.demon.co.uk.

* (this parameter is required for Demon Internet users only)

* b. IPaddress, the corresponding IP address for the Home Page. I plan to

*use it as it will allow the search applet to run from

*behind a firewall. Demon have stated in the HomePage AUP

*that the IP address should not be directly used. I do not

*recommend its use (the www.babbage.demon.co.uk IP address

*has already changed once)

*(this parameter is optional)

* c. maxSearch, the maximum number of pages to search. If your site is vast

*then the search will take a long time so people will give

*up. This parameter limits the number of pages to be searched

*to a sensible value reducing the search time (but also

*reducing its usefulness)

*(this parameter is optional. Defaults to 100)

* d. debug,this parameter is for my use. Set to true to display

*parameter and debug information.

*(this parameter is optional)

* e. server,this parameter allows the search applet to be used on non-

*Demon home pages. This parameter should point to the name

*of the site, e.g "http://www.myisp.com/me/" (note use of

*trailing "/" character).

*(this parameter is required for Non-Demon Internet users)

* f. indexName, this parameter allows the search applet to be used on non-

*Demon home pages. This parameter should point to the name

*of the index page (e.g home.htm). If not set then

*"index.htm" or "index.html" is assumed.

*(this parameter is optional)

* g. bgColour, The background colour for the applet in RGB hex format

*(rrggbb). The default is light grey.

* h. fgColour, The foreground colour for the applet in RGB hex format

*(rrggbb). The default is black.

*

* Example of applet use on a Demon Home Page - www.babbage.demon.co.uk

* <APPLET CODE="HomePageSearch.class" WIDTH=650 HEIGHT=400>

* <PARAM NAME="hostname" VALUE="babbage">

* <EM>Sorry but the Search applet requires a Java aware

* Web browser </EM>

* </APPLET>

*

* Example of applet use on a non-Demon Home Page - www.myisp.co.uk/fred/

* and the "index" page is called home.htm

* <APPLET CODE="HomePageSearch.class" WIDTH=650 HEIGHT=400>

* <PARAM NAME="server" VALUE="http://www.myisp.co.uk/fred/">

* <PARAM NAME="indexName" VALUE="home.htm">

* <EM>Sorry but the Search applet requires a Java aware

* Web browser </EM>

* </APPLET>

*/
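
The parameter rules above decide which index pages the search starts from. A minimal sketch of that rule, assuming a hypothetical helper class `StartUrlSketch` and method `startUrls` that are not part of the original applet:

```java
// Hypothetical helper, not part of the original applet: builds the candidate
// index-page URLs from the hostname, server and indexName parameters.
final class StartUrlSketch {

    static String[] startUrls(String server, String hostname, String indexName) {
        if (server == null && hostname == null) {
            throw new IllegalArgumentException("either server or hostname is required");
        }
        // Demon home pages are addressed by (lower-cased) hostname;
        // other sites pass the full "server" URL with a trailing "/".
        String base = (server != null)
                ? server
                : "http://www." + hostname.toLowerCase() + ".demon.co.uk/";
        if (indexName != null) {
            return new String[] { base + indexName };
        }
        // No index page named: try index.html first, then index.htm.
        return new String[] { base + "index.html", base + "index.htm" };
    }

    public static void main(String[] args) {
        for (String u : startUrls(null, "Babbage", null)) {
            System.out.println(u);
        }
        for (String u : startUrls("http://www.myisp.co.uk/fred/", null, "home.htm")) {
            System.out.println(u);
        }
    }
}
```

The sketch leaves out the optional IPAddress parameter, which the applet only uses when it suspects a proxy or firewall.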

/* Modification history

* xxxx 12th February - alpha version

* v0.9 19th February - first beta version

* v0.91 26th February - tidy up

* - added maxSearch functionality

* - added debugMode parameter

* v0.92 03rd March- added server and indexname parameters to

*allow use on non-Demon home pages

* v0.93 09th March- fixed bug with lowercase filenames

*added case insensitive/sensitive/match whole

*word functionality

* v0.94 12th March- fixed bug which resulted in "cannot connect"

*error on non-demon sites.

* v0.95 15th March- Removed some uses of debugMode. Server parameter

*can be set to http://localhost/ to simulate this.

* - Added support for working behind proxy servers/

*firewalls. This uses the IP Address rather than

*the hostname of the server for connections.

* v0.96 17th March- Corrected code to parse HREFs. It was not

*understanding framed format or spaces.

* - Match whole word not working properly

* - HREF="http://server/" was not being followed

*correctly

* v0.97 20th March- fixed bug where incorrect page name was being

*displayed for a match. This was due to the use

*of a global variable for the page name. As the

*stack unwound then this variable was lost. Stack

*used to stored page name instead.

* v0.98 23rd March- if match found on index page (using HTTP) then URL to

*jump to was created incorrectly.

* v0.99 25th March- allow to be resized < 600 pixels

*allow handling of links such as

*"/www/page.html"

* v1.0 8th April- Added bgColour, fgColour applet parameters

*Set default of 100 for maximum number pages to search

*Added option menu for number of pages to search

*Allow handling of framed links such as

*<FRAME SRC="framepage.html">

* v1.1 18th April- Display Page title rather than page name in list of

*matches.

*If match found on index page (using FILE://) then

*URL to jump to was created incorrectly.

*Broke the <= 600 pixels code by adding the "Max

*Pages" option menu. Size of buttons adjusted to

*allow all widgets to be displayed in < 600 pixels

* v1.2 9th May- Removed hard limits by using vectors rather than

*arrays.

*Search the index page and index page links first.

*Added internationalisation support for titles. A subset

*of the special character entity names (e.g &egrave;)

*are converted into Unicode characters so that they

*are displayed correctly.

*Fixed bug - "match word" did not match the last word

*on a line.

* v1.3 8th July- Bug fix release

*Links with single quotes e.g

*<a href='page.html'>Test Page</a>

*were not being searched

*fgColour and bgColour only worked with UNIX browsers!

*Fixed to allow usage with MS Windows browsers,

*although due to limitations in Win32 AWT the colour

*of buttons and their text cannot be changed.

* v1.4 4th August- Bug fix release

*Single quote HREF fix in v1.3 broke some normal

*HREF link code (no </A> on same line as HREF).

* v1.5 15th February - Applet now searches .txt files

*Fixed bug for demon internet users who use index

*pages other than index.htm and index.html

*Added further lower case localisation support

*

*/

public class HomePageSearch extends Applet

{

final int MAX_NUMBER_PAGES = 100;// default limit of number

// pages to read

final int BACKSPACE_CHARACTER = 8;// ASCII backspace

final int NUMBER_SPECIAL_CHARS = 45;// Number of special character

// entity names supported

Button search, clear, abort;// GUI buttons

TextField inputArea;// TextField used to enter

// search text in

TextField statusArea;// TextField used to display

// search status

List resultsArea; // List to display matches in

public String hostName; // Host name parameter read by

// applet (required)

public String IPAddress;// IP address parameter read by

// applet (optional)

public int maxSearch = MAX_NUMBER_PAGES;

// Maximum number of pages to

// search (optional)

public boolean debugMode;// TRUE = localhost

// FALSE = on-line

Vector pageNames;// Pages that have been visited

public String server;// Non-Demon home page starting point

public String indexName;// Name of index page (defaults to

// index.html or index.htm)

SearchPages cp = null; // Search thread

Checkbox caseSensitive;

Checkbox caseInsensitive;

Checkbox matchWholeWord;

public boolean matchCase = false;// Flag to indicate if we

// need to match case.

public boolean matchWord = false;// Flag to indicate if we need

// to match the whole word

String versionNumber = "v1.5";

boolean packComponents;// Set to true if size < 600

Color bgColour;// Background colour of applet

Color fgColour;// Foreground colour of applet

Choice numPagesChoice;// Option menu to select max

// number of pages to search

Vector pageMatch;// Pages that contain the

// search word

public void init ()

{

Panel p;

getParameters ();// Read the applet parameters

setLayout (new BorderLayout ());

// If applet size is <= 600 pixels then reduce the length

// of text fields, labels etc so that the applet will

// display OK

if (size().width <= 600)

packComponents = true;

else

packComponents = false;

// This panel consists of a input text field where the

// user enters the text to search for. The buttons allow

// the search to be started, aborted and clear the applet's

// output fields.

p = new Panel();

p.setLayout (new FlowLayout());

Label lab = new Label ("Search for: ");

lab.setFont (new Font ("Helvetica", Font.PLAIN, 12));

p.add (lab);

if (packComponents)

inputArea = new TextField ("",15);

else

inputArea = new TextField ("",20);

p.add (inputArea);

if (packComponents)

{

search = new Button ("search");

search.setFont (new Font ("Helvetica", Font.BOLD, 12));

}

else

{

search = new Button (" search ");

search.setFont (new Font ("Helvetica", Font.BOLD, 14));

}

p.add (search);

if (packComponents)

{

clear = new Button ("clear");

clear.setFont (new Font ("Helvetica", Font.BOLD, 12));

}

else

{

clear = new Button (" clear ");

clear.setFont (new Font ("Helvetica", Font.BOLD, 14));

}

p.add (clear);

if (packComponents)

{

abort = new Button ("stop");

abort.setFont (new Font ("Helvetica", Font.BOLD, 12));

}

else

{

abort = new Button (" stop ");

abort.setFont (new Font ("Helvetica", Font.BOLD, 14));

}

abort.disable();

p.add (abort);

if (packComponents)

lab = new Label ("Pages");

else

lab = new Label (" Max. Pages:");

lab.setFont (new Font ("Helvetica", Font.PLAIN, 12));

p.add (lab);

numPagesChoice = new Choice();

p.add (numPagesChoice);

p.setForeground (fgColour);

p.setBackground (bgColour);

add ("North",p);

// This panel lists the results. When an item from the list

// box is double clicked the URL is opened up.

p = new Panel();

p.setLayout (new GridLayout(0,1));

resultsArea = new List (10,false);

p.add (resultsArea);

p.setForeground (fgColour);

p.setBackground (bgColour);

add ("Center",p);

p = new Panel();

Label labVersion = new Label (versionNumber);

labVersion.setFont (new Font ("Helvetica", Font.PLAIN, 12));

p.add (labVersion);

CheckboxGroup caseSense = new CheckboxGroup();

// This textfield shows the status of the search to provide

// some feedback to the user. The page count is displayed.

if (packComponents)

statusArea = new TextField ("",14);

else

statusArea = new TextField ("",20);

statusArea.setEditable (false);

p.add (statusArea);

if (packComponents)

caseInsensitive = new Checkbox ("in-sensitive");

else

caseInsensitive = new Checkbox ("case in-sensitive");

p.add (caseInsensitive);

caseInsensitive.setCheckboxGroup (caseSense);

if (packComponents)

caseSensitive = new Checkbox ("sensitive" );

else

caseSensitive = new Checkbox ("case sensitive" );

p.add (caseSensitive);

caseSensitive.setCheckboxGroup (caseSense);

caseSense.setCurrent (caseInsensitive);

if (packComponents)

matchWholeWord = new Checkbox ("whole word");

else

matchWholeWord = new Checkbox ("match whole word");

p.add (matchWholeWord);

p.setForeground (fgColour);

p.setBackground (bgColour);

add ("South",p);

disableButtons ();// Disable buttons until text entered

// Create vector to hold pages that have been found

// and pages that contain the search text

pageNames = new Vector();

pageMatch = new Vector();

// Now that we know what the maxSearch parameter is fill

// in sensible page numbers

for (int i=maxSearch / 5; i<= maxSearch; i += maxSearch / 5)

{

numPagesChoice.addItem (Integer.toString(i));

}

numPagesChoice.setFont (new Font ("Helvetica", Font.PLAIN, 12));

// Set the default number of pages to be searched

numPagesChoice.select (0);

maxSearch = maxSearch / 5;

// Set the background + foreground applet colours

// setBackground(bgColour);

// setForeground(fgColour);

// Always set text input field to white for readability

inputArea.setBackground (Color.white);

}

// Function enableButtons

// Purpose - enable use of buttons in GUI

public void enableButtons ()

{

search.enable();

clear.enable();

}

// Function disableButtons

// Purpose - disable use of buttons in GUI

final void disableButtons ()

{

search.disable();

clear.disable();

}

// Function getParameters

// Purpose - read applet parameters

final void getParameters ()

{

hostName = getParameter ("hostname");

IPAddress = getParameter ("IPAddress");

String num = getParameter ("maxSearch");

String arg = getParameter ("debug");

server = getParameter ("server");

indexName = getParameter ("indexName");

String colour = getParameter("bgColour");

if (colour == null)

{

// I wish this could be locali[sz]ed so that I could

// write lightGrey !!

bgColour = Color.lightGray;

}

else

{

try

{

bgColour = new Color(Integer.parseInt(colour, 16));

}

catch (NumberFormatException e)

{

bgColour=Color.lightGray;

}

}

colour = getParameter("fgColour");

if (colour == null)

{

fgColour = Color.black;

}

else

{

try

{

fgColour = new Color(Integer.parseInt(colour, 16));

}

catch (NumberFormatException e)

{

fgColour=Color.black;

}

}

// Check for missing parameters

if (hostName == null && server == null)

{

if (statusArea != null)// init() calls this before the GUI is built

statusArea.setText ("Error - no host/server");

System.out.println (" Error: No hostname specified");

hostName = "none";

}

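try

{

maxSearch = (num == null) ? MAX_NUMBER_PAGES : Integer.parseInt(num);

}

catch (NumberFormatException e)

{

maxSearch = MAX_NUMBER_PAGES;// Fall back to the default on a bad value

}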

debugMode = (arg == null) ? false : true;

if (debugMode)

{

System.out.println ("hostname is " + hostName);

System.out.println ("IPAddress is " + IPAddress);

System.out.println ("maxSearch is " + maxSearch);

System.out.println ("debugMode is " + debugMode);

System.out.println ("server is " + server);

System.out.println ("indexName is " + indexName);

System.out.println ("bgColour is " + bgColour);

System.out.println ("fgColour is " + fgColour);

}

}

// Display parameter information

public String[][] getParameterInfo()

{

String[][] info =

{

{"hostname","String","hostname of site"},

{"IPAddress","String","IP address of site"},

{"maxSearch","String","maximum number of pages to search"},

{"debug","String","debug mode"},

{"server","String","Home Page URL"},

{"indexName","String","Name of index page"},

{"bgColour","Color","Background colour of applet"},

{"fgColour","Color","Foreground colour of applet"}

};

return info;

}

// Display applet information

public String getAppletInfo()

{

return "Home Page Search Applet v1.5";

}

// Function keyDown

// Purpose - enable or disable buttons. When search text is entered

// the search and clear buttons are enabled. When no search text has

// been entered the buttons are disabled.

public boolean keyDown (Event e, int nKey)

{

boolean boolDone = true;

String text;

text = inputArea.getText();// Read the search text

int n = text.length(); // Count number of chars

if (nKey == BACKSPACE_CHARACTER)// catch backspace character

{

boolDone = false;

n--;

}

else

{

boolDone = false;

n++;

}

if (n > 0)

{

enableButtons ();

}

else

{

disableButtons ();

}

return (boolDone);

}

// Purpose - this function handles all the GUI events

public boolean action (Event e, Object o)

{

String text;// Search text entered by user

String searchText;// Lower case version of above

URL newURL = null;

// Check to see if the option menu has been selected

if (e.target instanceof Choice)

{

Choice c = (Choice) e.target;

try

{

maxSearch = Integer.parseInt(c.getSelectedItem(), 10);

}

catch (NumberFormatException ex)

{

maxSearch = MAX_NUMBER_PAGES;

}

if (debugMode)

System.out.println ("maxSearch is now " + maxSearch);

}

// Check to see if a checkbox has been pressed

if (e.target instanceof Checkbox)

{

if (caseSensitive.getState() == true)

matchCase = true;

else

matchCase = false;

if (matchWholeWord.getState() == true)

matchWord = true;

else

matchWord = false;

}

// A button has been pressed - determine which

if (e.target instanceof Button)

{

if (e.target == search)

{

// Search button pressed - read in

// search text entered

text = inputArea.getText();

// Make sure there's something to search for

if (text.length() == 0)

return (false);

// New search so clear the GUI out

if (resultsArea.countItems() > 0)

resultsArea.clear();

disableButtons ();

abort.enable();

statusArea.setText("");

// Clear out previous search data

pageNames.removeAllElements();

pageMatch.removeAllElements();

// We're off - start the search thread

cp = new SearchPages (this, hostName, text, maxSearch);

cp.start();

}

else if (e.target == abort)

{

// Abort button pressed - stop the thread

if (cp != null)

cp.stop();

cp = null;

// Enable buttons for another search

enableButtons();

abort.disable();

}

else

{

// Clear button pressed - clear all the fields

// and return

inputArea.setText("");

statusArea.setText("");

// Clear radio buttons

caseSensitive.setState(false);

caseInsensitive.setState(true);

matchWholeWord.setState(false);

// Clear option menu

numPagesChoice.select (0);

try

{

maxSearch = Integer.parseInt(numPagesChoice.getSelectedItem(), 10);

}

catch (NumberFormatException ex)

{

maxSearch = MAX_NUMBER_PAGES;

}

if (debugMode)

System.out.println ("maxSearch is now " + maxSearch);

if (resultsArea.countItems() > 0)

resultsArea.clear();

cp = null;

}

}

// Selection made from the list of matches

if (e.target instanceof List)

{

List list = (List) e.target;

int index = list.getSelectedIndex();

// Extract the page name from the list

if (index < pageMatch.size())

{

String URLSelected = (String)pageMatch.elementAt(index);

try

{

// If URL stored then use it

if (URLSelected.startsWith ("http:") ||

URLSelected.startsWith ("file:"))

newURL = new URL(URLSelected);

else if (server == null)

newURL = new URL("http://www." + hostName + ".demon.co.uk/" + URLSelected);

else

newURL = new URL (server + URLSelected);

}

catch(MalformedURLException excep)

{

System.out.println("action(): Bad URL: " + URLSelected);

return true;// Nothing valid to jump to

}

if (debugMode)

System.out.println (" Jumping to ... " + newURL.toString());

// Display the document

getAppletContext().showDocument(newURL,"_self");

}

}

return true;// We're done

}

// Purpose - checks to see if a page has already been

// visited by the search thread

boolean checkAlreadyFound (String page)

{

if (pageNames.size() == 0)

return (false);

// Check this is a new one

for (int i=1; i < pageNames.size() ;i++)

{

String pageName = (String) pageNames.elementAt(i);

if (pageName.equalsIgnoreCase (page))

return (true);

}

return (false);

}

// Purpose - adds a page visited by the search thread to

// the list of visited pages

// This prevents the same link from being followed if it

// is on multiple pages.

public void incrementPages (String page)

{

// Check if page already indexed

if (checkAlreadyFound (page))

return;

pageNames.addElement (page);

// Provide feedback to the user

statusArea.setText ("Searching page: " + pageNames.size());

}

// Purpose - returns the number of pages that the search

// thread has visited

public int getTotalPages ()

{

return pageNames.size() - 1;

}

// Purpose - convert special characters in the page title

// to Unicode characters so they are displayed properly

final protected String translateSpecialChars (String title)

{

int start;

int i;

// HTML representation of selected extended chars

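String rawString[] = {"&aacute;","&acirc;","&aelig;",

"&agrave;","&auml;","&ccedil;",

"&eacute;","&ecirc;","&egrave;",

"&euml;","&icirc;","&iuml;",

"&ocirc;","&ouml;","&szlig;",

"&uuml;","&yuml;","&copy;",

"&pound;","&reg;","&lt;",

"&gt;","&amp;","&quot;",

"&atilde;","&aring;","&igrave;",

"&iacute;","&eth;","&ntilde;",

"&ograve;","&oacute;","&otilde;",

"&divide;","&oslash;","&ugrave;",

"&uacute;","&ucirc;","&yacute;",

"&thorn;","&times;","&nbsp;",

"&sect;","&cent;","&deg;"};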

// Unicode representation of above

char translatedChar[] = {'\u00e1','\u00e2','\u00e6',

'\u00e0','\u00e4','\u00e7',

'\u00e9','\u00ea','\u00e8',

'\u00eb','\u00ee','\u00ef',

'\u00f4','\u00f6','\u00df',

'\u00fc','\u00ff','\u00a9',

'\u00a3','\u00ae','\u003c',

'\u003e','\u0026','\u0022',

'\u00e3','\u00e5','\u00ec',

'\u00ed','\u00f0','\u00f1',

'\u00f2','\u00f3','\u00f5',

'\u00f7','\u00f8','\u00f9',

'\u00fa','\u00fb','\u00fd',

'\u00fe','\u00d7','\u00a0',

'\u00a7','\u00a2','\u00b0'};

StringBuffer translated = new StringBuffer ("");

String titleString = title;

// Check the title for each of the above HTML special chars

for (int loop=0; loop < NUMBER_SPECIAL_CHARS; loop++)

{

if (translated.length() > 0)

{

titleString = translated.toString();

translated = new StringBuffer ("");

}

start = titleString.indexOf (rawString[loop]);

if (start != -1)

{

// HTML special character found so replace it

// with the Unicode equivalent for display

for (i=0; i < start; i++)

translated.insert (i,titleString.charAt(i));

translated.append (translatedChar[loop]);

for (i=start+rawString[loop].length(); i < titleString.length(); i++)

translated.append (titleString.charAt(i));

}

}

return (translated.length() == 0) ? titleString : translated.toString();

}

// Purpose - adds a page to the list of matches in the results

// ListBox. The page title and matching text are displayed.

// The page name is also stored so that the URL can be jumped

// to.

public void addToList (String Page, String line, String title)

{

String translatedTitle = title;

String translatedLine = line;

if (title.indexOf("&") != -1 &&

title.indexOf(";") != -1)

{

// check for HTML special characters

// e.g &quot; &ccedil; etc.

translatedTitle = translateSpecialChars (title);

}

if (line.indexOf("&") != -1 &&

line.indexOf(";") != -1)

{

// check for HTML special characters

// e.g &quot; &ccedil; etc.

translatedLine = translateSpecialChars (line);

}

resultsArea.addItem ("Title:\" " + translatedTitle + "\" Text: " + translatedLine);

pageMatch.addElement(Page);

}

}
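
As far as the listing shows, translateSpecialChars replaces only the first occurrence of each entity name in a title. A minimal stand-alone sketch of the same idea that replaces every occurrence, assuming a hypothetical class `EntitySketch` and only a small subset of the entity table above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone sketch, not the applet's own method: replaces every
// occurrence of a few HTML entity names with the matching Unicode character.
final class EntitySketch {

    // Insertion order matters, so a LinkedHashMap is used.
    private static final Map<String, Character> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("&eacute;", '\u00e9');
        ENTITIES.put("&egrave;", '\u00e8');
        ENTITIES.put("&ccedil;", '\u00e7');
        ENTITIES.put("&lt;", '<');
        ENTITIES.put("&gt;", '>');
        ENTITIES.put("&quot;", '"');
        // "&amp;" goes last so a decoded '&' is never re-read as the start
        // of another entity name.
        ENTITIES.put("&amp;", '&');
    }

    static String decode(String text) {
        String result = text;
        for (Map.Entry<String, Character> e : ENTITIES.entrySet()) {
            // String.replace substitutes all occurrences, not just the first.
            result = result.replace(e.getKey(), String.valueOf(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(decode("Caf&eacute; &amp; Cr&egrave;me"));
    }
}
```

Entity names are matched case-sensitively here, as they are in the applet's table.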

//=========================================================================

//Class SearchPages

//=========================================================================

// This thread performs the search. The search starts with the index.html or

// index.htm page and then follows all local links

// (e.g. <A HREF="fred.html">link to fred</A> or

// <A HREF="http://www.<hostname>.demon.co.uk/fred.html">link to fred</A>.

// Note external links are ignored.

class SearchPages extends Thread

{

// Search state transitions

// First find top level pages (from the index page)

// Search the above pages first

// Search all other pages

final byte FIND_TOP_LEVEL_PAGES = 0;

final byte SEARCH_TOP_LEVEL_PAGES = 1;

final byte SEARCH_OTHER_PAGES = 2;

String hostName;// Host name of site e.g babbage

HomePageSearch app;// Parent applet

String textToFind;// String to find

int maxPages;// Maximum number of pages to visit

int hitsFound = 0;// No of occurrences of search string

static final byte URLCOUNT = 2;

boolean pageOpened = false;// Flag to indicate if index page

// opened OK

boolean proxyDetected = false; // Flag to indicate if a proxy server

// or firewall has been detected

int topLevelSearch; // Search the index page links first

Vector topLevelPages;// Page names in the index page

Vector nextLevelPages; // Lower level pages

// Constructor

SearchPages (HomePageSearch applet, String hn, String text, int maxSearch)

{

app = applet;

hostName = hn;

textToFind = text;

maxPages = maxSearch;

}

public void run()

{

// State 1: search the index page, remembering all links on

// the index page

topLevelSearch = FIND_TOP_LEVEL_PAGES;

topLevelPages = new Vector();

nextLevelPages = new Vector();

// Check to see if a proxy is being used. If so then we use

// IP address rather than hostnames

proxyDetected = detectProxyServer();

startSearch();

app.enableButtons();

app.abort.disable();

if (hitsFound == 0 && pageOpened == true)

app.statusArea.setText ("No matches found");

else if (hitsFound == 1)

app.statusArea.setText (hitsFound + " match found");

else

app.statusArea.setText (hitsFound + " matches found");

}

// Function: detectProxyServer

// Purpose: attempt to see if a proxy server or firewall is blocking

// a connection back to the originating server. If so then the

// variable proxyDetected is set to true and all future connections

// to the server will use the IP Address (if passed as a parameter)

final boolean detectProxyServer ()

{

DataInputStream dis = null;

String url = "";

// Allow for non-Demon Home Page

if (app.server == null)

{

if (app.indexName == null)

url = "http://www." + hostName + ".demon.co.uk/index.html";

else

url = "http://www." + hostName + ".demon.co.uk/" + app.indexName;

}

else

{

if (app.indexName == null)

url = app.server + "index.html";

else

url = app.server + app.indexName;

}

// Attempt to connect to this URL

try

{

URL doc = new URL (url);

dis = new DataInputStream (doc.openStream());

}

catch (Exception e)

{

// Unable to connect. This may be an incorrect applet

// parameter. Lets assume though it's a proxy server

// that's stopping us using the hostname.

return true;

}

return false;

}

final void startSearch()

{

DataInputStream dis = null;

String [] url = {"",""};

String currentPageName="";// HTML page currently being searched

// Allow for non-Demon Home Page

if (app.server == null)

{

if (app.indexName == null)

{

url[0] = "http://www." + hostName + ".demon.co.uk/index.html";

url[1] = "http://www." + hostName + ".demon.co.uk/index.htm";

}

else

{

url[0] = "http://www." + hostName + ".demon.co.uk/" + app.indexName;

url[1] = "";

}

}

else

{

if (app.indexName == null)

{

url[0] = app.server + "index.html";

url[1] = app.server + "index.htm";

}

else

{

// Allow for an index page other than

// "index.html"

url[0] = app.server + app.indexName;

url[1] = "";

}

}

// If a proxy server/firewall has been detected then use the

// IP address (if supplied) of the originating server rather

// than the hostname.

if (proxyDetected && app.IPAddress != null)

{

if (app.indexName == null)

{

url[0] = "http://" + app.IPAddress + "/index.html";

url[1] = "http://" + app.IPAddress + "/index.htm";

}

else

{

url[0] = "http://"+ app.IPAddress + "/" + app.indexName;

url[1] = "";

}

}

for (int i=0; i < URLCOUNT; i++)

{

try

{

currentPageName = url[i];

URL doc = new URL (url[i]);

dis = new DataInputStream (doc.openStream());

}

catch (Exception e)

{

System.out.println ("StartSearch(): Exception: " + e + " Page= " + url[i]);

continue;// Try next page

}

if (dis != null)// Check page opened OK

{

pageOpened = true;

i = URLCOUNT; // Exit the loop

}

}

if (pageOpened == false)

{

app.statusArea.setText ("Cannot connect to server");

System.out.println ("StartSearch(): No pages to search");

return;// Nothing to do

}

else

{

// Search the first page. Any links on the index page

// are saved and searched next.

searchPage (dis,currentPageName);

}

// State 2: search links found on the index page

topLevelSearch = SEARCH_TOP_LEVEL_PAGES;

for (int i=0; i < topLevelPages.size(); i++)

{

checkLink ((String)topLevelPages.elementAt(i));

// Check that the maximum number of pages to be

// searched has not been reached

if (app.getTotalPages () >= maxPages)

return;

}

// State 3: spider all other pages

topLevelSearch = SEARCH_OTHER_PAGES;

for (int i=0; i < nextLevelPages.size(); i++)

{

checkLink ((String)nextLevelPages.elementAt(i));

// Check that the maximum number of pages to be

// searched has not been reached

if (app.getTotalPages () >= maxPages)

return;

}

}

// Purpose - read all lines on a page - extracting local links

// and checking for the presence of the search string

final void searchPage (DataInputStream dis, String url)

{

try

{

String input;// Raw line read in

String upperCaseInput; // Uppercase version of

// above

String link;// HTML link found

String temp;

String title = "";// Page title

// Read a line at a time

while ((input = dis.readLine()) != null)

{

// Convert to upper case (makes comparisons

// easier)

upperCaseInput = input.toUpperCase();

// check for document title

temp = parseForTitle (input, upperCaseInput, dis);

// If a title has been found then remember it

// so that it can be displayed in the list box

if (temp != null && temp.length() > 0)

title = temp;

// check for match after title has been found

// (Don't bother searching the title though)

if (title.length() > 0 && temp == null)

checkMatch (input, url, title);

// check to see if this line contains

// a link

link = parseForLink (upperCaseInput, input);

if (link != null)

{

// Check if the maximum number

// of pages to search has been

// reached

if (app.getTotalPages () >= maxPages)

return;

if (topLevelSearch == FIND_TOP_LEVEL_PAGES)

topLevelPages.addElement (link);

else if (topLevelSearch == SEARCH_TOP_LEVEL_PAGES)

nextLevelPages.addElement (link);

else

checkLink (link);

}

}

}

catch (IOException e)

{

System.out.println ("searchPage(): Exception: " + e + " on Page: " + url);

}

}

// Purpose - scan a line of text looking for the title of the page

// e.g <TITLE> My Page </TITLE>

// Titles may be multi-line so this routine reads from the document

// until the </TITLE> tag has been read or 25 characters read (max

// meaningful length of a title) (same as Alta Vista!)

final String parseForTitle (String rawInput, String input, DataInputStream dis)

{

int i,j,k,l;// Loop counters

int titleLength = 0;// Keep track of title length

// as only first 25 characters

// are displayed

int start = 0;// Start of title text

String temp;

StringBuffer title = new StringBuffer ("");

boolean foundTag = false;

try

{

// Search for <TITLE> tag

// Can the TITLE tag have spaces? e.g < TITLE > (assume not!)

i = input.indexOf ("<TITLE");

if (i != -1)

{

// Allow for ><HTML><HEAD><TITLE>Title</TITLE></HEAD>

j = input.indexOf (">",i);

if (j != -1)

{

while (titleLength <= 25 && foundTag == false)

{

start = j + 1;

for (k=start; k < rawInput.length(); k++)

{

if (foundTag == false && rawInput.charAt(k) != '<')

{

titleLength++;

title.append (rawInput.charAt(k));

}

else

foundTag = true;

}

// Continue reading from doc

// if </TITLE> not found

if (foundTag == false)

{

rawInput = dis.readLine();

if (rawInput == null)// End of document reached

break;

j = -1;

}

}

// Remove leading and trailing spaces

temp = title.toString();

return (temp.trim());

}

}

}

catch (IOException e)

{

System.out.println ("parseForTitle(): Exception: " + e);

}

return (null);// No title found

}

// Purpose - scan a line of text looking for links to other

// pages. The following types of links are currently supported

// 1. Normal links, e.g <A HREF="page.html">Text</A>

// 2. Frames, e.g <FRAME scrolling=yes SRC="contents.html">

final String parseForLink (String upperCaseInput, String input)

{

int i,j,k,l;

String temp = null;

String link = null;

// Look for links to other pages

// 1. Normal links, e.g <A HREF="page.html">Text</A>

i = upperCaseInput.indexOf ("HREF");

if (i != -1)

{

// Locate position of quote marks

j = upperCaseInput.indexOf ("\"",i);

k = upperCaseInput.indexOf ("\"",j+1);

// Locate position of </a>

l = upperCaseInput.indexOf ("</A>",i);

// If double quotes were not found then try using

// single quote marks

if (j == -1 || k == -1 || (j > l && k == -1))

{

j = upperCaseInput.indexOf ("\'",i);

k = upperCaseInput.indexOf ("\'",j+1);

}

// Proceed only if both quote marks were found

if (j != -1 && k != -1)

{

// Extract the link name

temp = input.substring (j+1,k);

// Remove leading and trailing spaces

link = temp.trim ();

return (link);

}

}

// 2. Frames, e.g <FRAME scrolling=yes SRC="contents.html">

i = upperCaseInput.indexOf ("FRAME");

if (i != -1)

{

// Locate position of SRC tag

l = upperCaseInput.indexOf ("SRC",i);

if (l != -1)

{

// Locate position of quote marks

j = upperCaseInput.indexOf ("\"",l);

k = upperCaseInput.indexOf ("\"",j+1);

// If double quotes were not found then try

// single quote marks

if (j == -1)

{

// Search from the SRC attribute (l), not from the start of the FRAME tag

j = upperCaseInput.indexOf ("\'",l);

k = upperCaseInput.indexOf ("\'",j+1);

}

// Proceed only if both quote marks were found

if (j != -1 && k != -1)

{

// Extract the link name

temp = input.substring (j+1,k);

// Remove leading and trailing spaces

link = temp.trim ();

return (link);

}

}

}

return (null);

}

// Purpose - scan a line of text to see if the search string is

// present. If so then add the line to the list of matches.

final void checkMatch (String input, String url, String title)

{

// remove HTML tags before search

String searchLine = removeHTMLTags (input);

// If the line contains some non-HTML text

// then search it

if (searchLine.length() > 0)

{

// Check if case-sensitive search

if (app.matchCase)

{

// Check if attempting to match whole word

if (app.matchWord)

{

if (searchLine.indexOf (" " + textToFind + " ") != -1 ||

(searchLine.indexOf (textToFind) != -1 && searchLine.length() == textToFind.length()) ||

(searchLine.indexOf (" " + textToFind) != -1 && textToFind.charAt(textToFind.length()-1) == searchLine.charAt(searchLine.length()-1)))

{

// Found it! Display the match

app.addToList (url, searchLine, title);

hitsFound++;

}

}

else if (searchLine.indexOf (textToFind) != -1)

{

// Found it! Display the match

app.addToList (url, searchLine, title);

hitsFound++;

}

}

else

{

String lower1 = searchLine.toLowerCase();

String lower2 = textToFind.toLowerCase();

// Check if attempting to match whole word

if (app.matchWord)

{

if (lower1.indexOf (" " + lower2 + " ") != -1 ||

(lower1.indexOf (lower2) != -1 && lower1.length() == lower2.length()) ||

(lower1.indexOf (" " + lower2) != -1 && lower2.charAt(lower2.length()-1) == lower1.charAt(lower1.length()-1)))

{

// Found it! Display the match

app.addToList (url, searchLine, title);

hitsFound++;

}

}

else if (lower1.indexOf (lower2) != -1)

{

// Found it! Display the match

app.addToList (url, searchLine, title);

hitsFound++;

}

}

}

}

// Purpose - remove HTML tags from a line (e.g <BR>). The

// algorithm is a bit simplistic in that it cannot handle

// HTML tags split over more than one line.

final String removeHTMLTags (String inputLine)

{

StringBuffer outputLine = new StringBuffer ("");

boolean foundTag = false;

for (int i=0; i < inputLine.length(); i++)

{

if (inputLine.charAt (i) == '<')

foundTag = true;

else if (inputLine.charAt(i) == '>')

foundTag = false;

else if (foundTag == false)

outputLine.append (inputLine.charAt(i));

}

return (outputLine.toString());

}

// Purpose - checks validity of a link. If the link is valid

// it's added to the list of visited links and then followed

final void checkLink (String link)

{

URL doc;// URL of link

DataInputStream dis = null;

int i;

boolean qualifiedLink = false;

// Skip the link if it's just an offset in this document

if (link.startsWith("#"))

return;

// Strip #offset tag off

if ((i = link.indexOf ("#")) != -1)

{

String substr =link.substring (0,i);

link = substr;

}

// Check that this link hasn't already been followed

if (app.checkAlreadyFound (link))

return;

// Ignore non HTML links and start page

if ((link.startsWith ("mailto:")) ||

(link.startsWith ("wais:")) ||

(link.startsWith ("gopher:")) ||

(link.startsWith ("newsrc:")) ||

(link.startsWith ("ftp:")) ||

(link.startsWith ("nntp:")) ||

(link.startsWith ("telnet:")) ||

(link.startsWith ("news:")) ||

(link.equalsIgnoreCase (app.indexName)) ||

(link.equalsIgnoreCase ("index.html")) ||

(link.equalsIgnoreCase ("index.htm")))

return;

// Check that it is not an outside link (e.g www.mycom.com)

if (link.indexOf ("http:") != -1)

{

String pageName;

if (app.server == null)

pageName = "http://www."+ hostName + ".demon.co.uk";

else

pageName = app.server;

// Allow for local host being displayed as an

// IP address rather than host name

if (proxyDetected && app.IPAddress != null)

pageName = "http://" + app.IPAddress;

// This is a "fully qualified link"

// e.g "http://www.babbage.demon.co.uk/link.html"

qualifiedLink = true;

// If the link doesn't contain the local host name

// or IP address then it's an external link - so

// ignore it

if (link.indexOf (pageName) == -1)

return;

}

// Check that it's a HTML page

if (link.indexOf (".htm") == -1 &&

link.indexOf (".HTM") == -1 &&

link.indexOf (".TXT") == -1 &&

link.indexOf (".txt") == -1 &&

link.indexOf (".phtml") == -1 &&

link.indexOf (".PHTML") == -1)

return;

// Valid link - add it to the array of visited links

app.incrementPages (link);

// Follow link and read its contents

try

{

if (app.server == null)

doc = new URL ("http://www."+ hostName + ".demon.co.uk/" + link);

else

{

if (link.startsWith ("/"))

{

// Remove the "/" from the link as the

// server name has a terminating "/"

String temp = link.substring (1, link.length());

link = temp;

}

doc = new URL (app.server + link);

}

// Link may be absolute

// (e.g "http://www.babbage.demon.co.uk/fred.html")

if (qualifiedLink)

doc = new URL (link);

// If a proxy server/firewall has been detected then use the

// IP address (if supplied) of the originating server rather

// than the hostname.

if (proxyDetected && app.IPAddress != null)

doc = new URL ("http://" + app.IPAddress + "/" + link);

if (app.debugMode)

System.out.println ("Found link " + link);

dis = new DataInputStream (doc.openStream());

// Start searching this new link

searchPage (dis, link);

}

catch (IOException e)

{

System.out.println ("checkLink(): Exception: " + e + " Page: " + link);

}

}

}
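
A side note on two limits of the code above: removeHTMLTags() forgets that it is inside a tag when the tag is split over more than one line, and the whole-word test in checkMatch() only looks for spaces around the word. Below is a minimal sketch of one way to handle both. It is not part of Richard Everitt's applet; the class name TagStripper and its methods are made up for illustration, and it assumes a Java version with java.util.regex (1.5 or later) and a search word that starts and ends with a letter or digit.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original applet.
class TagStripper {

    // True while we are inside a tag that was opened on an earlier line.
    private boolean insideTag = false;

    // Returns one line with everything between '<' and '>' removed.
    // The insideTag flag survives between calls, so a tag that starts
    // on one line and ends on a later line is still skipped.
    String strip(String line) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '<') {
                insideTag = true;
            } else if (c == '>') {
                insideTag = false;
            } else if (!insideTag) {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Whole-word test using regex word boundaries (\b).
    // Pattern.quote keeps characters such as '.' or '+' literal.
    static boolean containsWord(String line, String word, boolean matchCase) {
        int flags = matchCase ? 0 : Pattern.CASE_INSENSITIVE;
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b", flags);
        return p.matcher(line).find();
    }
}
```

To use it, create one TagStripper per page and pass each line through strip() in order, then call containsWord() on the result. With the word boundary test, "Java" is found in "Java, applets" but not in "JavaScript".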

I hope this will be helpful for you.

Thanks

Bakrudeen

bakrudeen_indts at 2007-6-29 2:27:03