<!-- Source: http://www.brettb.com/BuildingAJavaScriptSearchEngine.asp -->
<html>
<head>
<title>Creating a JavaScript Search Engine for your Website</title>
<link REL="stylesheet" HREF="BrettbDotCom.css" TYPE="text/css">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<meta name="description" content="Tutorial explaining how to create a client-side JavaScript search engine for your website. Only
a basic knowledge of HTML and a little JavaScript is required">
<meta name="keywords" content="javascript, search, engine, facility, page">
</head>
<body>
<div align="center"><center>
<table border="0" cellpadding="8" cellspacing="0" width="802"
style="border-left: 1px solid rgb(0,0,0); border-right: 1px solid rgb(0,0,0)">
<tr>
<td width="551" valign="top" align="left">
<h1>Creating a JavaScript Search Engine for your Website</h1>
<p><em>A guide to using The Website Utility to create a JavaScript Search Engine for your
website.</em></p>
<h2>Why Websites need Search Facilities</h2>
<p>Once a website grows beyond half a dozen pages, it can be difficult to create a site
navigation scheme that lets users quickly find what they're looking for. One way to improve
site navigation is to add a search facility to the website. A search facility makes
information easier to find and gives visitors an additional way of navigating the website.
Search facilities are generally well used, and the search page will frequently appear
within the top ten most requested pages on a website. </p>
<h2>Search Engines Allow Visitors to Search your Content</h2>
<p>One of the easiest ways to add a facility for searching the pages in your website is to
link to search results for your website from one of the major search engines. Google and
other major search engines allow you to do this. However, using this method it can be
difficult to integrate the search results with the design of your website. It also carries
the obvious risk of a website visitor leaving your website <em>and not returning</em>!
Even worse, your website visitors may see an advert for a <em>competitor</em> on the
search results page, and so go and <em>do their business elsewhere</em>! </p>
<h2>Building Your Own Search Engine</h2>
<p>There are a number of software solutions that allow you to put your own search engine
on your website. These include server-side search solutions such as Microsoft's <a
href="IndexServerCompanion.asp">Index Server</a> or <a href="http://www.htdig.org/"
target="_blank">ht://Dig</a>. Although they allow sophisticated search facilities to be
created, they generally require a high level of technical knowledge to install and
configure. To create a search page with these solutions, programming knowledge of
server-side scripting languages such as Active Server Pages (ASP) or PHP is also usually
required, or you will need to employ somebody to create the code for you. To complicate
matters, not many web hosting companies support these search solutions, and those that do
often charge additional hosting fees.</p>
<p>An alternative is to use a purely client-side solution using a browser scripting
language such as JavaScript. This has no additional software requirements on the server,
and will work regardless of whether the website is hosted on Windows, Unix or Linux
servers. Indeed, with a bit of work it is also possible to make a JavaScript search
facility work on disk-based HTML content such as a website on a CD-ROM or DVD.</p>
<h2>The Website Utility Builds JavaScript Search Engines</h2>
<p><a href="http://www.winnershtriangle.com/w/Products.TheWebsiteUtility.asp">The Website
Utility</a> is able to create a client-side JavaScript search facility for a website. The
walkthrough below shows the steps involved:</p>
<h3>Configuring The Website Utility to Produce JavaScript Search Engines</h3>
<p>The Website Utility is configured using a small Windows application. There is a <em>Create
ASP/JavaScript Search Facility</em> checkbox in the Report Settings part of the window
that needs to be ticked in order for the JavaScript search facility to be created:</p>
<p align="left"><img src="images/The-Website-Utility-Screenshot_Smaller.png" width="414"
height="367"
alt="A screenshot of The Website Utility's graphical user interface, showing the options used to create a JavaScript Search Engine for a website"></p>
<p align="left">Note that if your website uses query strings then it is a good idea to
tick the checkbox called <em>Use URL Query Strings</em>. This ensures that pages whose URLs
differ only in their query strings are treated as separate pages in the search results. For
example, <a href="http://www.mywebsite.com/news.php?ID=12">www.mywebsite.com/news.php?ID=12</a>
links to a different news article from <a
href="http://www.mywebsite.com/news.php?ID=21">www.mywebsite.com/news.php?ID=21</a>, so
The Website Utility will index them separately.</p>
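<p align="left">As a rough illustration of why this setting matters, the sketch below
builds an index key for a page with and without its query string. The <b>pageKey</b>
function and its <b>useQueryStrings</b> parameter are hypothetical names invented for this
example; they are not part of The Website Utility, whose internals are not described here.</p>

```javascript
// Hypothetical helper: builds the key under which a crawled page is indexed.
// useQueryStrings stands in for the "Use URL Query Strings" checkbox.
function pageKey(url, useQueryStrings) {
  const parsed = new URL(url);
  const base = parsed.origin + parsed.pathname;
  return useQueryStrings ? base + parsed.search : base;
}

const articleA = "http://www.mywebsite.com/news.php?ID=12";
const articleB = "http://www.mywebsite.com/news.php?ID=21";

// With the option on, the two articles keep distinct keys.
const distinctWithOption = pageKey(articleA, true) !== pageKey(articleB, true);

// With the option off, both collapse to news.php and look like one page.
const distinctWithoutOption = pageKey(articleA, false) !== pageKey(articleB, false);
```

<p align="left">With the option off, both news articles would share a single key, so only
one of them could appear in the search results.</p>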
<h3 align="left">Running The Website Utility</h3>
<p align="left">Clicking on the <em>Run</em> button will start The Website Utility's web
robot. This web robot starts at a user-specified page in the website and automatically
crawls all of the pages in that website. The Website Utility extracts all of the words from
these pages, and finds the most relevant pages in the website for each word. Common
English words (e.g. <i>got</i>, <i>like</i>, <i>then</i>) are removed, as are words of one
or two characters. Word rankings depend on many factors, including their distribution
through the entire website and their distribution in the content of a specific page.</p>
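<p align="left">The word extraction step described above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the stop word list is a
small made-up sample, and the function names <b>extractWords</b> and <b>countWords</b> are
hypothetical rather than taken from The Website Utility.</p>

```javascript
// Illustrative stop word list; the tool's actual list is not given in this article.
const STOP_WORDS = new Set(["got", "like", "then", "the", "and"]);

// Split page text into lowercase words, dropping stop words and
// words of one or two characters.
function extractWords(text) {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  return words.filter((word) => word.length > 2 && !STOP_WORDS.has(word));
}

// Count how often each remaining word occurs on a page.
// A Map avoids clashes with built-in object property names.
function countWords(text) {
  const counts = new Map();
  for (const word of extractWords(text)) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}
```

<p align="left">Per-page counts like these are one plausible input to a ranking scheme;
the actual ranking factors used by The Website Utility are only outlined above.</p>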
<h3 align="left">Incorporating the JavaScript Search Facility into any Website</h3>
<p>The Website Utility creates two JavaScript files that can be used on the website's
search results page:
<ul>
<li>A Search Data JavaScript File contains the rankings for each word and the most relevant
pages for that word.</li>
<li>A Search Code JavaScript File contains the code required to parse the user's search
query and find the most relevant pages for that query.</li>
</ul>
<p>Pages are sorted in search results according to their ranking for the particular word
or words being searched for. The ranking scale goes from 0 to 99. Rank is higher for pages
that most closely match the search term. In general, searching for words that are common
on the site will produce search results with a lower rank than very specific words that
occur on only one or two pages.</p>
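<p>To make the division of labour between the two files more concrete, here is a minimal
sketch of a word index and a lookup function. The data shape, the page names and the
<b>findPages</b> function are all assumptions made for this example; the real format of
TWUSearchData.js is not shown in this article. When a page matches several query words,
the sketch simply keeps its highest rank, which is also an assumption.</p>

```javascript
// Hypothetical index: each word maps to pages with a rank from 0 to 99.
const exampleIndex = {
  search: [
    { url: "page1.htm", rank: 61 },
    { url: "page2.htm", rank: 48 },
  ],
  engine: [{ url: "page1.htm", rank: 55 }],
  fishtank: [{ url: "page3.htm", rank: 92 }],
};

// Look up every word in the query and return pages sorted by rank,
// highest first, limited to maxResults entries.
function findPages(index, query, maxResults) {
  const best = new Map(); // url -> highest rank seen for any query word
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  for (const word of words) {
    const known = Object.prototype.hasOwnProperty.call(index, word);
    const entries = known ? index[word] : [];
    for (const { url, rank } of entries) {
      best.set(url, Math.max(rank, best.get(url) || 0));
    }
  }
  return [...best.entries()]
    .map(([url, rank]) => ({ url, rank }))
    .sort((a, b) => b.rank - a.rank)
    .slice(0, maxResults);
}
```

<p>Calling <b>findPages(exampleIndex, "search engine", 50)</b> would list page1.htm before
page2.htm, because its best rank for the query words is higher.</p>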
<p>The search facility also requires a search form and a search results page. The search
form can either be put on a separate search page on the site, or the search form could be
added to all of the pages in a website (e.g. in the top right hand corner of the website's
navigation). The HTML code for a typical search form is shown below. The search form needs
a text box called <b>TWUQuery</b>. The form should use the <b>GET</b> method to submit to
the search results page.</p>
<p><font color="#000080"><small>&lt;html&gt;<br>
&lt;head&gt;<br>
&lt;title&gt;JavaScript Search for http://www.brettb.com/&lt;/title&gt;<br>
&lt;/head&gt;<br>
&lt;body&gt;<br>
&lt;h1&gt;Search http://www.brettb.com/&lt;/h1&gt;<br>
&lt;form name="frmSearch" method="GET"
action="searchresults.htm"&gt;<br>
<br>
Search for: &lt;input type="text" name="TWUQuery"
maxlength="50"&gt;<br>
&lt;input type="submit" name="submitbutton"
value="Submit"&gt;<br>
&lt;/form&gt;<br>
<br>
&lt;/body&gt;<br>
&lt;/html&gt;</small></font></p>
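<p>Because the form uses the GET method, the search terms arrive at the results page in
the URL's query string. The sketch below shows one way a results page could read the
<b>TWUQuery</b> value. The function name <b>readSearchQuery</b> is hypothetical, and the
generated Search Code file may well parse the query differently.</p>

```javascript
// Extract the TWUQuery value from a results page query string,
// for example "?TWUQuery=fish+tank&submitbutton=Submit".
function readSearchQuery(queryString) {
  const params = new URLSearchParams(queryString);
  return (params.get("TWUQuery") || "").trim();
}

// In a browser this would typically be called as:
//   readSearchQuery(window.location.search)
```

<p>URLSearchParams decodes plus signs and percent escapes, so the function returns the
text as the visitor typed it.</p>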
<p>The search results page needs to include references to the two JavaScript files created
by The Website Utility (TWUSearchData.js and TWUSearchCode.js):</p>
<p><small><font color="#000080">&lt;html&gt;<br>
&lt;head&gt;<br>
&lt;title&gt;Search Results&lt;/title&gt;<br>
&lt;script language="JavaScript" src="</font><font color="#FF0000"><strong>TWUSearchData.js</strong></font><font
color="#000080">"&gt;&lt;/script&gt;<br>
&lt;script language="JavaScript" src="</font><font color="#FF0000"><strong>TWUSearchCode.js</strong></font><font
color="#000080">"&gt;&lt;/script&gt;<br>
&lt;/head&gt;<br>
<br>
&lt;body&gt;<br>
<br>
&lt;script language="JavaScript"&gt;<br>
var TWU_MaximumSearchResults = 50;<br>
var TWU_DisplayPageTitles = true;<br>
var TWU_DebugMode = false;<br>
&lt;/script&gt;<br>
<br>
<br>
&lt;h1&gt;Search Results for &lt;script
language="JavaScript"&gt;document.write(TWU_OriginalSearchQuery);&lt;/script&gt;&lt;/h1&gt;<br>
<br>
&lt;script
language="JavaScript"&gt;TWU_DisplaySearchResults(TWU_SearchQuery);&lt;/script&gt;<br>
<br>
&lt;/body&gt;<br>
&lt;/html&gt;</font></small></p>
<p>This page can of course be customised to fit in with the existing design of your
website. If you want to display the search terms the user was searching for, then use this
JavaScript code: <font color="#000080">&lt;script
language="JavaScript"&gt;document.write(TWU_OriginalSearchQuery);&lt;/script&gt;</font>.
To display the search results, place this JavaScript code where you want the search
results to appear: <font color="#000080">&lt;script
language="JavaScript"&gt;TWU_DisplaySearchResults(TWU_SearchQuery);&lt;/script&gt;</font>.</p>
<p>The search results page defines three JavaScript variables that can be used to change
the output:
<ul>
<li><b>TWU_MaximumSearchResults</b>: Controls the maximum number of pages that will be listed
in the search results. This stops users being confused by large numbers of pages
in the search results. </li>
<li><b>TWU_DisplayPageTitles</b>: If set to <i>true</i> then the pages displayed in the
search results will show their HTML titles as clickable links. If set to <i>false</i> then
the URL is displayed instead. URLs are also shown if a page does not have a title. If the
website does not contain accurate page titles you might have to turn this feature off. </li>
<li><b>TWU_DebugMode</b>: If set to <i>true</i> then debugging information is displayed (you
should not need to use this). </li>
</ul>
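<p>The effect of the first two variables can be pictured with a small sketch. The
<b>resultLabels</b> function and the shape of the result objects are hypothetical; they
only mirror the behaviour described in the list above and are not code from
TWUSearchCode.js.</p>

```javascript
// Hypothetical result shape: { url, title }.
// Returns the link text for each result that would be displayed.
function resultLabels(results, maximumSearchResults, displayPageTitles) {
  return results
    .slice(0, maximumSearchResults)
    .map((page) => (displayPageTitles && page.title ? page.title : page.url));
}

const sampleResults = [
  { url: "page1.htm", title: "First Page" },
  { url: "page2.htm", title: "" }, // no title, so the URL is shown
  { url: "page3.htm", title: "Third Page" },
];
```

<p>With a maximum of two results and page titles switched on, the sample data would be
listed as "First Page" followed by "page2.htm".</p>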
<p>If you have a basic knowledge of JavaScript, it is also possible to change the display
of the search results. This will involve editing the Search Code JavaScript file. Here is
an example website search facility that has been customised:
<ul>
<li>A JavaScript search facility created by The Website Utility then customised: <a
href="http://www.winnershtriangle.com/w/TheWebsiteUtility/Reports/BrettbDotCom/searchform.htm"
target="_blank">Search Brettb.com</a>.</li>
</ul>
<h3>Performance Issues</h3>
<p>A client-side JavaScript search engine is obviously going to have a performance
overhead on the client web browser. The size of the TWUSearchData.js JavaScript include
file will depend on the number of pages in the website indexed, and also the amount of
content on each page in the website. It is also dependent upon the nature of the website
itself - websites with pages about similar subjects will tend to require a smaller file
than a website with pages about different subjects.</p>
<p>For this reason, The Website Utility is also able to generate a <a
href="http://www.winnershtriangle.com/w/Products.TheWebsiteUtility.ASPSearchEngine.asp"
target="_blank">search engine that makes use of server-side ASP</a>. This search facility
requires no client-side JavaScript, so it can be used to build search facilities for websites
that are not able to use client-side scripting (e.g. for client accessibility
requirements).</p>
<h2><a name="Downloads"></a>Download the Evaluation Version</h2>
<p>The evaluation version of The Website Utility will allow you to determine whether the
JavaScript (and ASP) Search Engines it creates are suitable for use on your own websites:
<ul>
<li><strong><a
title="Download The Website Utility Evaluation Version"
href="redirector.asp?URL=downloads/TheWebsiteUtilityTrial.zip">Download The Website
Utility Evaluation Version</a></strong> (3 MB ZIP file).</li>
</ul>
<h4>Purchase The Website Utility</h4>
<ul>
<li>The full version of The Website Utility is available for online
purchase - visit <a
href="http://www.winnershtriangle.com/w/Products.TheWebsiteUtility.asp" target="_blank">The
Website Utility's website for more information</a>.</li>
</ul>
</td>
</tr>
</table>
</center></div>
<div align="center"><center>
<table border="0" cellpadding="2" cellspacing="0" width="802"
style="border: 1px solid rgb(0,0,0)">
<tr>
<td class="TDFooter"> <a href="toc.asp" title="Site Map">Site Map</a></td>
<td class="TDFooter"><p align="right">All content is © 1995 - 2006 Brett Burridge</td>
</tr>
</table>
</center></div>
</body>
</html>