Once a website grows beyond a couple of dozen pages, it can be difficult to create a
site navigation scheme that allows users to quickly find exactly what they're looking
for. One way to improve site navigation is to add a search facility to the website.
Unfortunately, building a search facility can be a time-consuming exercise. Although
ASP.NET supports the searching of files using the Windows Indexing Service, writing code
to query Indexing Service can be quite complex. Furthermore, not all web hosting companies
support the use of Indexing Service, so this may not be an option for your website.
This example shows how to build a website search engine for ASP.NET. The code samples
are in C#, but could be easily adapted for the VB.NET programming language.
Building Your Own Search Engine
While it is possible to build a file-based search facility using C#, the problem with
this approach is that a significant amount of effort would be required to build the file
content indexing routine. A database would also be required to store the list of words
within the website. Furthermore, if the file system is indexed rather than the actual
website then it would be possible for undesirable content (e.g. include files, global.asax
files, restricted access documents) to be indexed and appear in search results.
Building a word index for a website by using a web crawler is an obvious solution to
these problems. The web crawler sees the same website content as an end user, so there is
no problem with undesired content appearing in search results. Web crawlers can also be
prevented from indexing certain parts of websites by making use of robots.txt files and
the robots meta tag. Furthermore, a web crawler is not dependent on the underlying
technology used on a website, so it can crawl websites regardless of whether they use PHP,
ASP, ASP.NET or a combination of all three.
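To make the robots.txt point concrete, the following is a minimal sketch of how a crawler might decide whether a path is excluded. It is deliberately simplified and is not the logic used by The Website Utility: it only honours `Disallow` lines in groups that apply to `User-agent: *`, treats each value as a plain path prefix, and ignores `Allow` rules and wildcards. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class RobotsSketch
{
    // Collects the Disallow path prefixes from groups that include "User-agent: *".
    public static List<string> ParseDisallowedPrefixes(string robotsTxt)
    {
        List<string> prefixes = new List<string>();
        bool inWildcardGroup = false;
        bool lastWasAgent = false;

        foreach (string rawLine in robotsTxt.Split('\n'))
        {
            string line = rawLine;

            // Strip trailing comments, then surrounding whitespace (including '\r').
            int hash = line.IndexOf('#');
            if (hash >= 0) line = line.Substring(0, hash);
            line = line.Trim();

            int colon = line.IndexOf(':');
            if (colon < 0) continue; // blank or malformed line

            string field = line.Substring(0, colon).Trim().ToLowerInvariant();
            string value = line.Substring(colon + 1).Trim();

            if (field == "user-agent")
            {
                // Consecutive User-agent lines belong to the same group.
                bool isWildcard = (value == "*");
                inWildcardGroup = lastWasAgent ? (inWildcardGroup || isWildcard) : isWildcard;
                lastWasAgent = true;
            }
            else
            {
                lastWasAgent = false;
                // An empty Disallow value means "nothing is disallowed".
                if (field == "disallow" && inWildcardGroup && value.Length > 0)
                {
                    prefixes.Add(value);
                }
            }
        }
        return prefixes;
    }

    // Returns false if the path starts with any disallowed prefix.
    public static bool IsAllowed(string path, List<string> disallowedPrefixes)
    {
        foreach (string prefix in disallowedPrefixes)
        {
            if (path.StartsWith(prefix, StringComparison.Ordinal)) return false;
        }
        return true;
    }
}
```

With a robots.txt that disallows `/admin/` and `/includes/` for all user agents, `IsAllowed("/admin/login.aspx", prefixes)` returns false, so pages under those folders would never reach the word index.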
Building a web crawler is not a trivial exercise, so this code sample relies on our web
crawling product - The Website Utility. This
product crawls any website and automatically builds the .NET class necessary to allow the
website to be searched for text strings. Note that version 2.0 of the Microsoft .NET
Framework or above is required.
The .NET search engine created by The Website Utility is contained within the partial
class TWUSearch of the namespace com.WinnershTriangle.TheWebsiteUtility.
The partial class is contained in two files: TWUSearchCode.cs and TWUSearchData.cs.
Both of these files should be copied to the ASP.NET web application's App_Code
folder - the TWUSearch class is then accessible to other code files in the web
application.
The TWUSearch partial class has a number of methods and properties, which are
described below:
Methods
- SetQuery(string query) (returns void): Sets the text of the search query.
- GetSearchResults() (returns DataSet): Retrieves search results.
- GetErrorMessage() (returns string): Retrieves a description of the error.
Properties
- MaximumSearchResults (int): Gets/sets the maximum number of search results returned
(default is 50).
- NumberOfMatchingPages (int): Gets the number of matching documents.
- ReturnPageTitles (bool): Set to false to turn off the return of page titles in the
DataSet (titles are returned by default).
- ReturnPageDescriptions (bool): Set to false to turn off the return of page
descriptions in the DataSet (descriptions are returned by default).
- HasErrors (bool): Returns true if an error occurred (use the GetErrorMessage()
method to retrieve the error message).
- DebugMessage (string): Returns debugging messages (for troubleshooting only).
The C# partial class file TWUSearchData.cs contains the data structures needed
for the search class. If you re-crawl a website to update the search facility, this is the
only file that will have changed, so updating the search facility may be achieved by
overwriting the website's previous copy of this file.
Using the ASP.NET Search Object from C#
The source code below shows how to instantiate the .NET website search class and
retrieve a DataSet of search results matching the search query. In this example, the query
is set from the Text property of a textbox called TWUQuery, and the
search results are databound to the GridView1 GridView control.
The results are sorted in descending rank by making use of the DataView's Sort
property.
    //This handler needs "using System;" and "using System.Data;" at the top of the
    //code-behind file

    /// <summary>
    /// Show the search results after the search button is invoked
    /// </summary>
    /// <param name="sender"></param>
    /// <param name="e"></param>
    protected void submitbutton_Click(object sender, EventArgs e)
    {
        //Initialise the search class
        com.WinnershTriangle.TheWebsiteUtility.TWUSearch SearchObject =
            new com.WinnershTriangle.TheWebsiteUtility.TWUSearch();

        //Set search query from the TextBox control
        SearchObject.SetQuery(TWUQuery.Text);

        //Initialise a DataSet for the search results
        DataSet SearchData = new DataSet();

        //Optionally change the maximum number of search results (default is 50)
        SearchObject.MaximumSearchResults = 25;

        //Optionally turn off the return of page titles by setting this to false
        //(default is to return titles)
        SearchObject.ReturnPageTitles = true;

        //Optionally turn off the return of page descriptions by setting this to false
        //(default is to return descriptions)
        SearchObject.ReturnPageDescriptions = true;

        //Retrieve the search results
        SearchData = SearchObject.GetSearchResults();

        //Note that if the search facility encounters an error you can call
        //the GetErrorMessage() method to retrieve a description of the error.
        string SearchError = SearchObject.GetErrorMessage();

        //Check to see if any matching pages were found
        if (SearchObject.NumberOfMatchingPages == 0)
        {
            //Did an error occur?
            if (SearchObject.HasErrors == false)
            {
                //User probably searched for a term that does not exist
                LabelSearchResults.Text = "No matching pages were found for this query. Please try another search.";
                GridView1.Visible = false;
            }
        }

        //Did an error occur?
        if (SearchObject.HasErrors)
        {
            LabelSearchResults.Text = "This search failed due to: " + SearchError + ". Please try another search.";
            GridView1.Visible = false;
        }

        //No errors were encountered and there were matching pages in the search
        //results, so display the search results GridView
        if (SearchObject.HasErrors == false && SearchObject.NumberOfMatchingPages > 0)
        {
            //Create a DataView from the search results data
            DataView SearchDataView = new DataView(SearchData.Tables[0]);

            //Sort the search results by rank
            SearchDataView.Sort = "PageRank DESC";
            GridView1.DataSource = SearchDataView;
            GridView1.Visible = true;

            //Show the number of search results
            LabelSearchResults.Text = SearchObject.NumberOfMatchingPages.ToString() + " matching page(s) were found.";

            //Bind the search results data to the GridView
            GridView1.DataBind();
        }
    }
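The rank sort can be tried in isolation, without the generated search class, using a hand-built table. The sketch below is only an illustration: the `PageRank` column name comes from the listing above, while the `PageURL` column, the sample rows and the integer column type are assumptions made for the example.

```csharp
using System;
using System.Data;

public static class RankSortSketch
{
    // Builds a stand-in for the results table.
    // Only "PageRank" appears in the article; "PageURL" is a hypothetical column.
    public static DataTable BuildSampleResults()
    {
        DataTable table = new DataTable("Results");
        table.Columns.Add("PageURL", typeof(string));
        table.Columns.Add("PageRank", typeof(int));
        table.Rows.Add("/about.aspx", 12);
        table.Rows.Add("/products.aspx", 87);
        table.Rows.Add("/contact.aspx", 45);
        return table;
    }

    // Wraps the table in a DataView ordered from highest to lowest rank.
    public static DataView SortByRank(DataTable results)
    {
        DataView view = new DataView(results);
        view.Sort = "PageRank DESC";
        return view;
    }
}
```

Iterating over `SortByRank(BuildSampleResults())` visits /products.aspx, then /contact.aspx, then /about.aspx. The ordering is numeric here because the sample column is declared as `int`. If a rank column were stored as text, the same Sort expression would compare strings and place "9" above "87", so it is worth checking the column type in the DataSet you receive.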
How it Works
The Website Utility extracts all of the words from the website and finds the most
relevant pages in the website for each word. Common English words (e.g. got, like,
then) are removed, as are words of one or two characters in length. Word rankings
depend on many factors, including their distribution through the entire website and their
distribution in the content of a specific page.
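The word filtering step described above can be sketched as follows. This is an illustration of the idea only, not the product's actual routine: the stop-word list holds just the three examples mentioned in the text, the tokeniser is a simple ASCII letter-and-digit pattern, and the class and method names are hypothetical. It uses `HashSet<T>`, so it assumes .NET 3.5 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordFilterSketch
{
    // Hypothetical stop-word list containing only the examples given in the article.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(new string[] { "got", "like", "then" });

    // Returns the distinct indexable words in first-seen order:
    // lower-cased, at least three characters long, and not a stop word.
    public static List<string> ExtractIndexableWords(string text)
    {
        HashSet<string> seen = new HashSet<string>();
        List<string> result = new List<string>();

        foreach (Match match in Regex.Matches(text.ToLowerInvariant(), "[a-z0-9]+"))
        {
            string word = match.Value;
            if (word.Length < 3) continue;          // drop one- and two-character words
            if (StopWords.Contains(word)) continue; // drop common words
            if (seen.Add(word)) result.Add(word);   // keep each word once
        }
        return result;
    }
}
```

For the input "I got an ASP.NET search engine, then a search box" this returns asp, net, search, engine and box. A real indexer would also record which page each word came from and how often it occurs, since the ranking depends on that distribution.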
Pages are sorted in search results according to their ranking for the particular word
or words being searched for. The ranking scale goes from 0 to 99. Rank is higher for pages
that most closely match the search term. In general, searching for words that are common
on the site will produce search results with a lower rank than very specific words that
occur on only one or two pages.
Important Note: For very large websites or more sophisticated searching, you may
need to consider a specialised server-based search solution, such as using ASP.NET to
search Microsoft's Indexing Service. The Indexing
Service Companion can be used to allow Index Server to search remote websites (and
also to search more than one website simultaneously).