Creating a Site Search Engine - Part II

Page.vb



The role of the Page object is to store data related to each page of the site. Refer to Figure 1 and Table 1, and see Code 1.

Table 1

 

The Page class defines the following properties:

Property      Description
--------      -----------
Path          Stores the path of the file.
Title         Stores the text in the HTML title tag.
Keywords      Stores the text in the HTML meta keywords tag.
Description   Stores the text in the HTML meta description tag.
Contents      Stores the text of the HTML page.
MatchCount    Stores the number of matches found in the HTML page.

 

Code 1

    '***********************************************
    '
    ' Size Property
    '
    ' Assign and retrieve size of the file
    '
    '***********************************************
    Public Property Size() As Decimal
      Get
        Return m_size
      End Get
      Set(ByVal Value As Decimal)
        m_size = Value
      End Set
    End Property


    '***********************************************
    '
    ' Path Property
    '
    ' Assign and retrieve path of the file
    '
    '***********************************************
    Public Property Path() As String
      Get
        Return m_path
      End Get
      Set(ByVal Value As String)
        m_path = Value
      End Set
    End Property

    '***********************************************
    '
    ' Title Property
    '
    ' Assign and retrieve title of the file
    '
    '***********************************************
    Public Property Title() As String
      Get
        Return m_title
      End Get
      Set(ByVal Value As String)
        m_title = Value
      End Set
    End Property

    '***********************************************
    '
    ' Keywords Property
    '
    ' Assign and retrieve Keywords
    ' (meta tags) of the file
    '
    '***********************************************
    Public Property Keywords() As String
      Get
        Return m_keywords
      End Get
      Set(ByVal Value As String)
        m_keywords = Value
      End Set
    End Property




    '***********************************************
    '
    ' Description Property
    '
    ' Assign and retrieve description
    ' (meta tags) of the file
    '
    '***********************************************
    Public Property Description() As String
      Get
        Return m_description
      End Get
      Set(ByVal Value As String)
        m_description = Value
      End Set
    End Property

    '***********************************************
    '
    ' Contents Property
    '
    ' Assign and retrieve contents of the file
    '
    '***********************************************
    Public Property Contents() As String
      Get
        Return m_contents
      End Get
      Set(ByVal Value As String)
        m_contents = Value
      End Set
    End Property


    '***********************************************
    '
    ' MatchCount Property
    '
    ' Assign and retrieve MatchCount of the file
    '
    '***********************************************
    Public Property MatchCount() As Integer
      Get
        Return m_matchcount
      End Get
      Set(ByVal Value As Integer)
        m_matchcount = Value
      End Set
    End Property
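
The properties in Code 1 read and write private member variables that the listing does not show. The following is a minimal sketch of how such backing fields might be declared. The class name PageSketch, the initial values, and the sample path are assumptions made for illustration; only the field names come from Code 1.

```vbnet
Imports System

' Hypothetical stand-in for the Page class; only three properties are shown.
Public Class PageSketch
    ' Backing fields (names taken from Code 1, initial values assumed)
    Private m_path As String = ""
    Private m_title As String = ""
    Private m_matchcount As Integer = 0

    Public Property Path() As String
        Get
            Return m_path
        End Get
        Set(ByVal Value As String)
            m_path = Value
        End Set
    End Property

    Public Property Title() As String
        Get
            Return m_title
        End Get
        Set(ByVal Value As String)
            m_title = Value
        End Set
    End Property

    Public Property MatchCount() As Integer
        Get
            Return m_matchcount
        End Get
        Set(ByVal Value As Integer)
            m_matchcount = Value
        End Set
    End Property
End Class

Module PageSketchDemo
    Sub Main()
        Dim pg As New PageSketch()
        pg.Path = "docs/index.htm"   ' sample value, not from the article
        pg.Title = "Home"
        Console.WriteLine(pg.Title & " (" & pg.Path & ")")
        Console.WriteLine(pg.MatchCount)
    End Sub
End Module
```

Initializing the string fields to empty strings is one way to avoid null checks later; the original class may well leave them as Nothing.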


The Page class has two public methods, CheckFileInfo and Search, and three private helper methods, SearchPhrase, SearchWords and SearchPattern.

 

CheckFileInfo Method

This is a public method that checks whether the title, description and contents exist. If the title is empty, it assigns the default value "No Title". Similarly, if the description is empty, it assigns either the first 200 characters of the file contents or, when the contents are empty as well, the default value "There is no description available for this page". See Code 2.

 

Code 2

   '*************************************************
   '
   ' CheckFileInfo method
   '
   ' Subroutine that checks whether the file contains
   ' a title and a description
   '
   '*************************************************
   Public Sub CheckFileInfo()
     'If the page has no title, set the title to a default message.
     'OrElse short-circuits, so Trim() is never called on Nothing.
     If IsNothing(m_title) OrElse m_title.Trim().Equals("") Then
       m_title = "No Title"
     End If

     'If the page has no description, fall back to the first
     '200 characters of the contents or to a default message
     If IsNothing(m_description) OrElse _
        m_description.Trim().Equals("") Then
       If IsNothing(m_contents) OrElse _
          m_contents.Trim().Equals("") Then
         m_description = _
          "There is no description available for this page"
       Else
         If m_contents.Length > 200 Then
           m_description = m_contents.Substring(0, 200)
         Else
           m_description = m_contents
         End If
       End If
     End If
   End Sub
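
The fallback rules of CheckFileInfo can also be sketched as a standalone function, which makes them easy to try out in a console program. BuildDescription and the module name are hypothetical and not part of the original Page class. The sketch uses OrElse so that Trim() is skipped when a string is Nothing.

```vbnet
Imports System

Module DescriptionFallbackDemo

    ' Hypothetical helper that mirrors the description rules of CheckFileInfo.
    Function BuildDescription(ByVal description As String, _
                              ByVal contents As String) As String
        ' OrElse short-circuits, so Trim() is never called on Nothing
        If description Is Nothing OrElse description.Trim().Length = 0 Then
            If contents Is Nothing OrElse contents.Trim().Length = 0 Then
                Return "There is no description available for this page"
            ElseIf contents.Length > 200 Then
                ' Use the first 200 characters of the page text
                Return contents.Substring(0, 200)
            Else
                Return contents
            End If
        End If
        Return description
    End Function

    Sub Main()
        ' No description and no contents: the default message is used
        Console.WriteLine(BuildDescription(Nothing, Nothing))
        ' Empty description, short contents: the contents are used as is
        Console.WriteLine(BuildDescription("", "Short page text"))
        ' Long contents are cut to 200 characters, so this prints 200
        Console.WriteLine(BuildDescription(Nothing, New String("x"c, 250)).Length)
    End Sub
End Module
```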


Search Method

Search is a public method that calls either SearchPhrase or SearchWords, depending on the search criteria. SearchPhrase searches for an exact phrase, while SearchWords searches for all of the words or any of the words. Both methods call SearchPattern, which uses regular expressions to search the page contents. See Code 3.

 

Code 3

    '*******************************************
    '
    ' Search method
    '
    ' Subroutine that searches the file
    '
    '*******************************************
    Public Sub Search(ByVal strSearchWords As String, _
          ByVal SrchCriteria As SearchCriteria)
      'If the user has chosen to search by phrase
      If SrchCriteria = SearchCriteria.Phrase Then
        SearchPhrase(strSearchWords)
      'Else the search is either by all or any words
      Else
        SearchWords(strSearchWords, SrchCriteria)
      End If
    End Sub

    '******************************************************
    '
    ' SearchPhrase method
    '
    ' Subroutine that searches the file for a phrase
    '
    '******************************************************
    Private Sub SearchPhrase(ByVal strSearchWords As String)
      Dim mtches As MatchCollection
      mtches = SearchPattern(strSearchWords)
      'Check to see if the phrase has been found
      If mtches.Count > 0 Then
        'Get the number of times the phrase is matched
        m_matchcount = mtches.Count
      End If
    End Sub

   
    '**************************************************
    '
    ' SearchWords method
    '
    ' Subroutine that searches the file for all or
    ' any of the search words
    '
    '**************************************************
    Private Sub SearchWords(ByVal strSearchWords As String, _
       ByVal SrchCriteria As SearchCriteria)
      Dim intSearchLoopCounter As Integer
      'Array to hold the words to be searched for
      Dim sarySearchWord As String()
      Dim mtches As MatchCollection
      'Split the search text into words and place them in an array
      sarySearchWord = Split(Trim(strSearchWords), " ")
      'Loop through the array and search for each word
      For intSearchLoopCounter = 0 To UBound(sarySearchWord)
        'Search the contents for the current word
        mtches = SearchPattern(sarySearchWord(intSearchLoopCounter))

        If SrchCriteria = SearchCriteria.AnyWords Then
          'Any words: add up the matches of every word
          m_matchcount = m_matchcount + mtches.Count
        ElseIf SrchCriteria = SearchCriteria.AllWords Then
          'All words: check that the current word has been found
          If mtches.Count > 0 Then
            'Keep the smallest match count seen so far
            If m_matchcount = 0 Or (m_matchcount > 0 _
                And m_matchcount > mtches.Count) Then
              m_matchcount = mtches.Count
            End If
          Else
            'One of the words has not been found, so reset
            'the match count to zero and stop searching
            m_matchcount = 0
            Exit Sub
          End If
        End If
      Next
    End Sub
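
The difference between the two word-based criteria is easiest to see on a small sample. The following sketch reimplements the counting rules as a standalone function. CountMatches, the module name and the sample text are hypothetical, and the shape of the SearchCriteria enumeration is an assumption, since only its member names appear in this part. Unlike the listing above, the sketch skips empty words and escapes each word before searching.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WordCountDemo

    ' Assumed shape of the enumeration; only the member names are known.
    Enum SearchCriteria
        Phrase
        AnyWords
        AllWords
    End Enum

    ' Hypothetical helper: AnyWords adds up the matches of every word,
    ' any other criteria is treated as AllWords and keeps the smallest
    ' count, or returns 0 as soon as one word is missing.
    Function CountMatches(ByVal contents As String, _
                          ByVal searchWords As String, _
                          ByVal criteria As SearchCriteria) As Integer
        Dim total As Integer = 0
        For Each word As String In searchWords.Trim().Split(" "c)
            ' Skip the empty entries produced by repeated spaces
            If word.Length = 0 Then Continue For
            Dim found As Integer = Regex.Matches(contents, _
                "\b" & Regex.Escape(word) & "\b", _
                RegexOptions.IgnoreCase).Count
            If criteria = SearchCriteria.AnyWords Then
                total += found
            ElseIf found = 0 Then
                Return 0
            ElseIf total = 0 OrElse found < total Then
                total = found
            End If
        Next
        Return total
    End Function

    Sub Main()
        Dim sample As String = "red fish blue fish red sky"
        ' "red" and "fish" each occur twice, so this prints 4
        Console.WriteLine(CountMatches(sample, "red fish", SearchCriteria.AnyWords))
        ' "red" occurs twice and "sky" once, so this prints 1
        Console.WriteLine(CountMatches(sample, "red sky", SearchCriteria.AllWords))
        ' "green" is missing, so this prints 0
        Console.WriteLine(CountMatches(sample, "red green", SearchCriteria.AllWords))
    End Sub
End Module
```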


The escape sequence \b is a special case. In a regular expression, \b denotes a word boundary (a position between a \w and a \W character), except within a [] character class, where \b refers to the backspace character. In a replacement pattern, \b always denotes a backspace. We might need to drop the word boundary when the site uses an encoding other than UTF-8. See Code 4.

 

Code 4

    '****************************************************
    '
    ' SearchPattern method
    '
    ' Function that searches the page contents for a
    ' word or phrase and returns the matches
    ' (requires Imports System.Text.RegularExpressions)
    '
    '****************************************************
    Private Function SearchPattern( _
       ByVal strSearchWord As String) As MatchCollection
      Dim strPattern As String
      'Use word boundaries only when the site encoding is UTF-8
      If Searchs.Site.Encoding.Equals("utf-8") Then
        strPattern = "\b{0}\b"
      Else
        strPattern = "{0}"
      End If
      'Escape the search text so that characters such as + or (
      'are matched literally, then search the contents with the
      'shared Regex.Matches method, ignoring case
      Return Regex.Matches(m_contents, String.Format(strPattern, _
       Regex.Escape(strSearchWord)), RegexOptions.IgnoreCase)
    End Function
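
The effect of the word boundary can be checked with a few lines of console code. The sketch below is a standalone illustration with an invented sample sentence and is not part of Page.vb. It also shows Regex.Escape, which may be useful when a search word contains regular expression metacharacters.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WordBoundaryDemo
    Sub Main()
        Dim sample As String = "The cat sat near the category list"

        ' With \b on both sides only the standalone word matches: prints 1
        Console.WriteLine(Regex.Matches(sample, "\bcat\b", _
            RegexOptions.IgnoreCase).Count)

        ' Without \b the "cat" inside "category" matches as well: prints 2
        Console.WriteLine(Regex.Matches(sample, "cat", _
            RegexOptions.IgnoreCase).Count)

        ' Inside a character class \b means backspace (character code 8):
        ' prints True
        Console.WriteLine(Regex.IsMatch("a" & Convert.ToChar(8) & "b", "[\b]"))

        ' Regex.Escape turns metacharacters into literals: prints c\+\+
        Console.WriteLine(Regex.Escape("c++"))
    End Sub
End Module
```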

UserSearch.vb



Table 2 lists the properties of the UserSearch class. Refer to Code 5.

 

Table 2

 

Property             Description
--------             -----------
SearchCriteria       Stores and retrieves the user's choice of search criteria.
SearchWords          Stores and retrieves the search words entered by the user.
TotalFilesSearched   Read-only; returns the total number of files searched.
TotalFilesFound      Read-only; returns the total number of files in which matches were found.

 

Code 5

    '**********************************************************
    '
    ' SearchCriteria Property
    '
    ' Assign and retrieve SearchCriteria of the site
    '
    '**********************************************************
    Public Property SearchCriteria() As Searchs.SearchCriteria
      Get
        Return m_searchCriteria
      End Get
      Set(ByVal Value As Searchs.SearchCriteria)
        m_searchCriteria = Value
      End Set
    End Property

   
    '**********************************************************
    '
    ' SearchWords Property
    '
    ' Assign and retrieve SearchWords of the site
    '
    '**********************************************************
    Public Property SearchWords() As String
      Get
        Return m_searchWords
      End Get
      Set(ByVal Value As String)
        m_searchWords = Value
      End Set
    End Property



   
    '**********************************************************
    '
    ' TotalFilesSearched Property
    '
    ' Retrieve TotalFilesSearched of the site
    '
    '**********************************************************
    Public ReadOnly Property TotalFilesSearched() As Integer
      Get
        Return m_totalFilesSearched
      End Get
    End Property

   
    '**********************************************************
    '
    ' TotalFilesFound Property
    '
    ' Retrieve TotalFilesFound of the site
    '
    '**********************************************************
    Public ReadOnly Property TotalFilesFound() As Integer
      Get
        Return m_totalFilesFound
      End Get
    End Property

   
    '**********************************************************
    '
    ' PageDataset Shared Property
    '
    ' Retrieve data of the entire site
    '
    '**********************************************************
    Public ReadOnly Property PageDataset() As DataSet
      Get
        Return m_dstPages
      End Get
    End Property

More on SSE in the next part.

Language: ASP.NET 
Platform: Windows




Added on July 26, 2007
