Winnersh Triangle Web Solutions Limited

Timesaving tools for software developers

Building an ASP.NET website search engine


Once a website grows beyond a couple of dozen pages, it can be difficult to create a site navigation scheme that lets users quickly find exactly what they're looking for. One way to improve site navigation is to add a search facility to the website.

Unfortunately, building a search facility for your website can be a time-consuming exercise. Although ASP.NET supports the searching of files using the Windows Indexing Service, writing code to query Indexing Service can be quite complex. Furthermore, not all web hosting companies support the use of Indexing Service, so this may not be an option for your website.

This example shows how to build a website search engine for ASP.NET. The code samples are in C#, but could be easily adapted for the VB.NET programming language.

Building Your Own Search Engine

While it is possible to build a file-based search facility using C#, this approach requires a significant amount of effort to build the file content indexing routine. A database would also be required to store the list of words within the website. Furthermore, if the file system is indexed rather than the actual website, then undesirable content (e.g. include files, global.asax files, restricted access documents) could be indexed and appear in search results.

Building a word index for a website by using a web crawler is an obvious solution to these problems. The web crawler sees the same website content as an end user, so there is no problem with undesired content appearing in search results. Web crawlers can also be prevented from indexing certain parts of websites by making use of robots.txt files and the robots meta tag. Furthermore, a web crawler is not dependent on the underlying technology used on a website, so can crawl websites regardless of whether they use PHP, ASP, ASP.NET or a combination of all three.
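To illustrate how a crawler can honour robots.txt exclusions, the following is a minimal sketch and not the code used by The Website Utility. The class name RobotsSketch and its methods are hypothetical. It is deliberately simplified: it only reads the Disallow rules in the "User-agent: *" group, treats each rule as a plain path prefix, and ignores Allow rules and wildcards.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of The Website Utility.
public static class RobotsSketch
{
    // Collects the Disallow path prefixes listed under "User-agent: *".
    public static List<string> ParseDisallowed(string robotsTxt)
    {
        List<string> disallowed = new List<string>();
        bool inWildcardGroup = false;

        foreach (string rawLine in robotsTxt.Split('\n'))
        {
            // Strip trailing comments and surrounding whitespace
            string line = rawLine;
            int hash = line.IndexOf('#');
            if (hash >= 0) line = line.Substring(0, hash);
            line = line.Trim();

            // Skip blank lines and anything that is not "field: value"
            int colon = line.IndexOf(':');
            if (colon < 0) continue;

            string field = line.Substring(0, colon).Trim().ToLowerInvariant();
            string value = line.Substring(colon + 1).Trim();

            if (field == "user-agent")
            {
                // Simplification: only the wildcard group is honoured
                inWildcardGroup = (value == "*");
            }
            else if (field == "disallow" && inWildcardGroup && value.Length > 0)
            {
                disallowed.Add(value);
            }
        }

        return disallowed;
    }

    // A path is allowed unless it starts with one of the disallowed prefixes.
    public static bool IsAllowed(string path, List<string> disallowed)
    {
        foreach (string prefix in disallowed)
        {
            if (path.StartsWith(prefix, StringComparison.Ordinal)) return false;
        }
        return true;
    }
}
```

A crawler built along these lines would call ParseDisallowed once per site and then check IsAllowed for each URL path before requesting it.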

Building a web crawler is not a trivial exercise, so this code sample relies on our web crawling product - The Website Utility. This product crawls any website and automatically builds the .NET class necessary to allow the website to be searched for text strings. Note that version 2.0 of the Microsoft .NET Framework or above is required.

The .NET search engine created by The Website Utility is contained within the partial class TWUSearch of the namespace com.WinnershTriangle.TheWebsiteUtility. The partial class is contained in two files: TWUSearchCode.cs and TWUSearchData.cs. Both of these files should be copied to the ASP.NET web application's App_Code folder - the TWUSearch class is then accessible to other code files in the web application.

The TWUSearch partial class has a number of methods and properties, which are described below:

Methods

  • SetQuery(string query) (returns void): Sets the search query.
  • GetSearchResults() (returns DataSet): Retrieves search results.
  • GetErrorMessage() (returns string): Retrieves a description of the error.

Properties

  • MaximumSearchResults (int): Gets/sets the maximum number of search results to return (default is 50).
  • NumberOfMatchingPages (int): Returns the number of pages that matched the search query.
  • ReturnPageTitles (bool): Optionally turns off the return of page titles in the DataSet.
  • ReturnPageDescriptions (bool): Optionally turns off the return of page descriptions in the DataSet.
  • HasErrors (bool): Returns true if an error occurred (use the GetErrorMessage() method to retrieve the error message).
  • DebugMessage (string): Returns debugging messages (for troubleshooting only).

The C# partial class file TWUSearchData.cs contains the data structures needed for the search class. If you re-crawl a website to update the search facility, this is the only file that will have changed, so updating the search facility may be achieved by overwriting the website's previous copy of this file.

Using the ASP.NET Search Object from C#

The source code below shows how to instantiate the .NET website search class and retrieve a DataSet of search results matching the search query. In this example, the query is set from the Text property of a textbox called TWUQuery, and the search results are databound to the GridView1 GridView control.

The results are sorted in descending rank by making use of the DataView's Sort property.

/// <summary>
/// Show the search results after the search button is invoked
/// </summary>
/// <param name="sender"></param>
/// <param name="e"></param>
protected void submitbutton_Click(object sender, EventArgs e)
{
  //Initialise the search class
  com.WinnershTriangle.TheWebsiteUtility.TWUSearch SearchObject = new com.WinnershTriangle.TheWebsiteUtility.TWUSearch();

  //Set search query from the TextBox control
  SearchObject.SetQuery(TWUQuery.Text);

  //Initialise a DataSet for the search results
  DataSet SearchData = new DataSet();

  //Optionally change the maximum number of search results (default is 50)
  SearchObject.MaximumSearchResults = 25;

  //Set to false to turn off the return of page titles (default is true, which returns titles)
  SearchObject.ReturnPageTitles = true;

  //Set to false to turn off the return of page descriptions (default is true, which returns descriptions)
  SearchObject.ReturnPageDescriptions = true;

  //Retrieve the search results
  SearchData = SearchObject.GetSearchResults();

  //Note that if the search facility encounters an error you can call
  //the GetErrorMessage() method to retrieve a description of the error.

  string SearchError = SearchObject.GetErrorMessage();

  //Check to see if any matching pages were found
  if (SearchObject.NumberOfMatchingPages == 0)
  {

    //Did an error occur?
    if (SearchObject.HasErrors == false)
    {

      //User probably searched for a term that does not exist
      LabelSearchResults.Text = "No matching pages were found for this query. Please try another search.";
      GridView1.Visible = false;
    }

  }

  //Did an error occur?
  if (SearchObject.HasErrors)
  {

    LabelSearchResults.Text = "This search failed due to: " + SearchError + ". Please try another search.";
    GridView1.Visible = false;
  }

  //No errors were encountered and there were matching pages in the search
  //results, so display the search results GridView

  if (SearchObject.HasErrors == false && SearchObject.NumberOfMatchingPages > 0)
  {

    //Create a DataView from the search results data
    DataView SearchDataView = new DataView(SearchData.Tables[0]);

    //Sort the search results by rank
    SearchDataView.Sort = "PageRank DESC";

    GridView1.DataSource = SearchDataView;
    GridView1.Visible = true;

    //Show the number of search results
    LabelSearchResults.Text = SearchObject.NumberOfMatchingPages.ToString() + " matching page(s) were found.";

    //Bind the search results data to the GridView
    GridView1.DataBind();

  }

}
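The sorting step in the listing above can be tried in isolation, without the search class. The sketch below builds a stand-in results table and sorts it through a DataView. Only the PageRank column name comes from the listing. The PageUrl column, the sample rows, the SortSketch class and the assumption that PageRank is stored as an integer are hypothetical and may not match the DataSet that GetSearchResults() actually returns.

```csharp
using System;
using System.Data;

// Hypothetical stand-in for illustration; the real DataSet comes from GetSearchResults().
public static class SortSketch
{
    // Builds a small sample table. PageUrl is an assumed column name.
    public static DataTable BuildSampleResults()
    {
        DataTable table = new DataTable("Results");
        table.Columns.Add("PageUrl", typeof(string));
        table.Columns.Add("PageRank", typeof(int));

        table.Rows.Add("/a.aspx", 12);
        table.Rows.Add("/b.aspx", 87);
        table.Rows.Add("/c.aspx", 45);
        return table;
    }

    // Wraps the table in a DataView ordered from highest to lowest rank.
    public static DataView SortByRank(DataTable results)
    {
        DataView view = new DataView(results);
        view.Sort = "PageRank DESC";
        return view;
    }
}
```

Because Sort is a property that takes a column name followed by ASC or DESC, changing the sort order only requires changing that string. The underlying DataTable is left untouched.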

How it Works

The Website Utility extracts all of the words from the website and finds the most relevant pages in the website for each word. Common English words (e.g. got, like, then) are removed, as are words of one or two characters in length. Word rankings depend on many factors, including their distribution through the entire website and their distribution in the content of a specific page.
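The word filtering described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the product's actual implementation: the WordFilterSketch class is hypothetical, the stop-word list is a tiny sample (only got, like and then come from the text above), and splitting on runs of letters and digits is an assumed tokenisation rule.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class WordFilterSketch
{
    // Sample stop-word list; the real list used by the product is not known here.
    private static readonly string[] StopWords = { "got", "like", "then", "the", "and" };

    // Returns the lower-cased words that would be kept for indexing.
    public static List<string> ExtractIndexableWords(string text)
    {
        List<string> words = new List<string>();

        foreach (Match match in Regex.Matches(text.ToLowerInvariant(), "[a-z0-9]+"))
        {
            string word = match.Value;

            // Drop words of one or two characters
            if (word.Length <= 2) continue;

            // Drop common English words
            if (Array.IndexOf(StopWords, word) >= 0) continue;

            words.Add(word);
        }

        return words;
    }
}
```

With this sample list, a phrase such as "We got the search engine" would be reduced to the words search and engine.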

Pages are sorted in search results according to their ranking for the particular word or words being searched for. The ranking scale goes from 0 to 99. Rank is higher for pages that most closely match the search term. In general, searching for words that are common on the site will produce search results with a lower rank than very specific words that occur on only one or two pages.
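The actual ranking formula used by The Website Utility is not described here, so the sketch below is purely an assumption. It shows one simple way a 0 to 99 rank could reflect the two effects mentioned above: pages where the word makes up more of the content score higher, and words that appear on many pages of the site score lower. The RankSketch class, its parameters and the formula itself are all hypothetical.

```csharp
using System;

// Hypothetical scoring sketch; NOT the product's real ranking formula.
public static class RankSketch
{
    // Returns a rank between 0 and 99 for one word on one page.
    public static int Rank(int occurrencesInPage, int wordsInPage,
                           int pagesContainingWord, int totalPages)
    {
        if (occurrencesInPage <= 0 || wordsInPage <= 0 ||
            pagesContainingWord <= 0 || totalPages <= 0)
        {
            return 0;
        }

        // Share of the page's words that are this word (capped at 1.0)
        double frequency = Math.Min(1.0, (double)occurrencesInPage / wordsInPage);

        // 1.0 when the word is on a single page, shrinking as it spreads across the site
        double rarity = Math.Max(0.0, 1.0 - (double)(pagesContainingWord - 1) / totalPages);

        // Square root softens the frequency so that typical pages do not all score near zero
        double score = Math.Sqrt(frequency) * rarity;

        return (int)Math.Round(score * 99.0);
    }
}
```

Under this assumed formula, a word found on only one page of a 50-page site ranks higher than the same word found on 40 of those pages, which mirrors the behaviour described above.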

Important Note: For very large websites or more sophisticated searching, you may need to consider using a specialised server-based search solution, such as using ASP.NET to search Microsoft's Indexing Service. The Indexing Service Companion can be used to allow Indexing Service to search remote websites (and also to search more than one website simultaneously).


© Copyright 2002 - 2009 Winnersh Triangle Web Solutions Limited. Registered company number: 4493816.       Sales Policy | Site Map  