Creating a Site Search Engine - Part II
Posted On July 26, 2007 by Priyadarshan Roy filed under Programming
Page.vb

The role of the Page object is to store data related to each page of the site. Refer to figure 1 and table 1, and see code 1.
Table 1
The Page class defines the following properties:
| Path | Stores the path of the file. |
| Title | Stores text in the HTML title tag. |
| Keywords | Stores text in the HTML meta keywords tags. |
| Description | Stores text in the HTML meta description tags. |
| Contents | Stores text in the HTML page. |
| Matchcount | Stores the number of matches found in the HTML page. |
Code 1
'***************************************************
Public Property Size() As Decimal
'***************************************************
Public Property Path() As String
'***********************************************
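As an illustration of the shape table 1 describes, here is a minimal sketch of such a data holder. It is not the original listing: the property names come from table 1 and code 1, but every type other than `Size` and `Path` is an assumption, and the auto-implemented property syntax assumes Visual Basic 2010 or later.

```vbnet
Imports System

' Minimal sketch of a page data holder; not the article's original listing.
Public Class Page
    Public Property Path As String = String.Empty         ' path of the file
    Public Property Title As String = String.Empty        ' text in the HTML title tag
    Public Property Keywords As String = String.Empty     ' text in the meta keywords tag
    Public Property Description As String = String.Empty  ' text in the meta description tag
    Public Property Contents As String = String.Empty     ' text in the HTML page
    Public Property Matchcount As Integer                 ' assumed to be an Integer count
    Public Property Size As Decimal                       ' declared in code 1; unit not stated
End Class
```

String properties start out empty rather than `Nothing`, so later checks for a missing title or description can test for an empty string.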
The Page class has two private methods and two public methods. The two public methods are CheckFileInfo and Search.
CheckFileInfo Method
CheckFileInfo is a public method that checks whether the title, description and contents exist. If the title is empty, it assigns the default value "No Title". Similarly, if the description is empty, it assigns either the contents of the file or the default value "There is no description available for this page". See code 2.
Code 2
'*************************************************
'If the page contains no title then Page Description
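The fallback rules described for CheckFileInfo can be sketched roughly as follows. The class name `PageInfoSketch` is hypothetical, and reusing the whole contents string as the description is an assumption; the original may well shorten it.

```vbnet
Imports System

' Hypothetical stand-in for the Page class, reduced to the fields CheckFileInfo touches.
Public Class PageInfoSketch
    Public Property Title As String = String.Empty
    Public Property Description As String = String.Empty
    Public Property Contents As String = String.Empty

    ' Fills in default values for an empty title or description.
    Public Sub CheckFileInfo()
        If String.IsNullOrEmpty(Title) Then
            Title = "No Title"
        End If

        If String.IsNullOrEmpty(Description) Then
            If String.IsNullOrEmpty(Contents) Then
                Description = "There is no description available for this page"
            Else
                ' Assumption: the contents are reused as-is, without truncation.
                Description = Contents
            End If
        End If
    End Sub
End Class
```

Calling `CheckFileInfo()` on a page with no title, no description and no contents would leave it with both default strings.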
Search Method
The Search method is a public method that calls the SearchPhrase or SearchWords method, depending on the search criteria. SearchPhrase searches for a phrase, while SearchWords searches for all or any of the words. Both methods call the SearchPattern method, which uses regular expressions to search the files. See code 3.
Code 3
'*******************************************
'If the user has chosen to search by phrase
'******************************************************
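One way the dispatch between these methods might look is sketched below. The enum `SearchCriteriaSketch`, the class name and the method signatures are hypothetical, since the original listing is not available. Treating an "all words" search as failed when any single word is missing is also an assumption.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical criteria names; the article does not list the actual values.
Public Enum SearchCriteriaSketch
    Phrase
    AllWords
    AnyWords
End Enum

Public Class PageSearchSketch
    Public Property Contents As String = String.Empty
    Public Property Matchcount As Integer

    ' Dispatches to the phrase search or the word search, depending on the criteria.
    Public Sub Search(ByVal searchText As String, ByVal criteria As SearchCriteriaSketch)
        Matchcount = 0
        If criteria = SearchCriteriaSketch.Phrase Then
            SearchPhrase(searchText)
        Else
            SearchWords(searchText, criteria = SearchCriteriaSketch.AllWords)
        End If
    End Sub

    Private Sub SearchPhrase(ByVal phrase As String)
        Matchcount = SearchPattern(phrase.Trim())
    End Sub

    Private Sub SearchWords(ByVal wordList As String, ByVal requireAll As Boolean)
        Dim words() As String = wordList.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
        Dim total As Integer = 0
        For Each word As String In words
            Dim found As Integer = SearchPattern(word)
            If requireAll AndAlso found = 0 Then
                Matchcount = 0   ' one missing word fails an "all words" search
                Return
            End If
            total += found
        Next
        Matchcount = total
    End Sub

    ' Counts case-insensitive whole-word occurrences of the literal term.
    Private Function SearchPattern(ByVal term As String) As Integer
        If String.IsNullOrEmpty(term) Then Return 0
        Dim pattern As String = "\b" & Regex.Escape(term) & "\b"
        Return Regex.Matches(Contents, pattern, RegexOptions.IgnoreCase).Count
    End Function
End Class
```

`Regex.Escape` is used so that characters such as `.` or `+` typed by the user are matched literally instead of being read as regular expression syntax.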
The escaped character \b is a special case. In a regular expression, \b denotes a word boundary (the position between a \w and a \W character), except within a [] character class, where \b refers to the backspace character. In a replacement pattern, \b always denotes a backspace. We might need to drop the word boundary when using an encoding other than UTF-8. Check out code 4.
Code 4
'****************************************************
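The two meanings of \b can be seen in a short sketch. The module and function names here are hypothetical; the `wholeWord` flag simply illustrates what dropping the word boundary does to the match count.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Public Module WordBoundarySketch
    ' Counts matches of a literal term, with or without \b word boundaries.
    Public Function CountMatches(ByVal contents As String, ByVal term As String, ByVal wholeWord As Boolean) As Integer
        Dim pattern As String = Regex.Escape(term)
        If wholeWord Then
            pattern = "\b" & pattern & "\b"
        End If
        Return Regex.Matches(contents, pattern, RegexOptions.IgnoreCase).Count
    End Function

    ' Inside [] the same escape means the backspace character (U+0008), not a boundary.
    Public Function ContainsBackspace(ByVal contents As String) As Boolean
        Return Regex.IsMatch(contents, "[\b]")
    End Function
End Module
```

With the boundary in place, searching "cat concatenate cat." for "cat" counts only the two standalone words. Without it, the "cat" inside "concatenate" is counted as well.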
UserSearch.vb

Table 2 lists the properties of UserSearch.vb. Refer to code 5.
Table 2
| SearchCriteria | The user's choice of search criteria is stored in and retrieved from this property. |
| SearchWords | The search words entered by the user are stored in and retrieved from this property. |
| TotalFilesSearched | The total number of files searched is read from this property. |
| TotalFilesFound | The total number of files found is read from this property. |
Code 5
'**********************************************************
Public ReadOnly Property TotalFilesFound() As Integer
Public ReadOnly Property PageDataset() As DataSet
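To make the split between read/write and read-only properties concrete, here is a minimal sketch. The class name `UserSearchSketch`, the `String` type of `SearchCriteria` and `SearchWords`, the dataset name, and the `RecordFile` helper are all assumptions for illustration. Only the `TotalFilesFound` and `PageDataset` declarations are taken from code 5.

```vbnet
Imports System
Imports System.Data

Public Class UserSearchSketch
    Private _totalFilesSearched As Integer
    Private _totalFilesFound As Integer
    Private ReadOnly _pageDataset As New DataSet("Pages")

    ' Read/write: set from the search form, read by the search code.
    Public Property SearchCriteria As String = String.Empty
    Public Property SearchWords As String = String.Empty

    Public ReadOnly Property TotalFilesSearched() As Integer
        Get
            Return _totalFilesSearched
        End Get
    End Property

    Public ReadOnly Property TotalFilesFound() As Integer
        Get
            Return _totalFilesFound
        End Get
    End Property

    Public ReadOnly Property PageDataset() As DataSet
        Get
            Return _pageDataset
        End Get
    End Property

    ' Hypothetical helper: records the outcome for one searched file.
    Public Sub RecordFile(ByVal matched As Boolean)
        _totalFilesSearched += 1
        If matched Then _totalFilesFound += 1
    End Sub
End Class
```

Keeping the counters read-only means that only the search code inside the class can change them, while the page that displays the results can only read them.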
More on SSE in the next part.
Language: ASP.NET
Platform: Windows
