Website search engine


Installation


  1. Open the compressed archive you downloaded. Inside you will find several files.

         index.pl     - indexing script
         spider.pl    - script for indexing via HTTP
         search.pl    - searching script
         stat.pl      - query statistics analysis
         config.pl    - file with all configurable parameters
         template.htm - template file
         searchbox    - sample search box
         readme.txt and readme.rus
  2. Put the index.pl, search.pl, config.pl, template.htm and stat.pl files in your cgi-bin directory.

  3. Create directories "db" for index files and "log" for query logs.

  4. Set the permissions of all files and directories to world-readable and world-executable (755 for the script files and 777 for the "db" and "log" directories).

  5. The file searchbox contains a sample search form. Edit it and paste it anywhere in your HTML files.
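
      Steps 2 to 4 above can be sketched as a small shell function. This is only an illustration: the function name install_search and its two arguments (the directory where the archive was unpacked and the target cgi-bin directory) are assumptions, and so is the choice to create "db" and "log" next to the scripts.

```shell
#!/bin/sh
# install_search SRC CGI_BIN
# SRC is where the archive was unpacked, CGI_BIN is the target directory.
# Both names are assumptions; "db" and "log" are assumed to sit next to the scripts.
install_search() {
    src="$1"
    cgi_bin="$2"
    mkdir -p "$cgi_bin" || return 1

    # Step 2: copy the scripts, the config file and the template.
    for f in index.pl search.pl config.pl template.htm stat.pl; do
        if [ -f "$src/$f" ]; then
            cp "$src/$f" "$cgi_bin/$f" || return 1
            chmod 755 "$cgi_bin/$f"      # step 4: 755 for the files
        else
            echo "warning: $src/$f not found" >&2
        fi
    done

    # Step 3: directories for index files and query logs.
    mkdir -p "$cgi_bin/db" "$cgi_bin/log" || return 1

    # Step 4: 777 for the two data directories.
    chmod 777 "$cgi_bin/db" "$cgi_bin/log"
}
```

      It could be called as install_search ./unpacked /path/to/cgi-bin, where both paths are placeholders. If your provider requires different permissions, change the two chmod values accordingly.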

      Before you start, check once again the main sources of errors in CGI scripts.

  1. The first line of every script must begin with the path to Perl on your server. Usually it is #!/usr/bin/perl. On a Windows system you should write something like #!C:\PERL\5.00502\bin\MSWin32-x86-object\perl.exe, though a simple #!perl should also work.

  2. Unix systems use a different text file format than Windows: the end-of-line characters differ. Therefore, convert your scripts to Unix format before uploading (many text editors, such as UltraEdit, can do this) or use ASCII mode in your FTP client when uploading.

  3. Check the permission settings for all scripts once again (almost every FTP client can set them, even if you have no shell access). Please note that your provider may require script permissions different from those listed above.
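
      If you do have shell access, the three checks above can be automated. The sketch below is one possible way to do it; the function names check_script and to_unix are made up for this example, and the permission test only reflects what the current user may do, not what the web server user may do.

```shell
#!/bin/sh
# check_script FILE: report the common problems described above.
# Returns 0 if none were found. (Function names are illustrative.)
check_script() {
    f="$1"
    status=0
    cr=$(printf '\r')

    # 1. The first line should be a "#!" line pointing at perl.
    case "$(head -n 1 "$f")" in
        '#!'*perl*) ;;
        *) echo "$f: first line is not a Perl #! line"; status=1 ;;
    esac

    # 2. Windows (CR LF) line endings.
    if grep -q "$cr" "$f"; then
        echo "$f: contains Windows line endings"; status=1
    fi

    # 3. Readable and executable (checked for the current user only).
    if [ ! -r "$f" ] || [ ! -x "$f" ]; then
        echo "$f: not readable and executable, try chmod 755"; status=1
    fi
    return "$status"
}

# to_unix FILE: strip carriage returns in place, keeping the file's permissions.
to_unix() {
    tmp="$1.tmp.$$"
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}
```

      For example, check_script index.pl prints one line per problem found, and to_unix index.pl fixes the line endings if the file was saved on Windows.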


      To start indexing, run the script "index.pl". You can run it from a Unix shell, if your provider allows it, or as an ordinary CGI script (just enter http://www.server.com/cgi-bin/index.pl in your browser). During indexing the script creates several files with information about your site (0_hash, 0_hashwords, 0_sitewords, 0_finfo, 0_word_ind) and stores them in the "db" directory.
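
      After a run you may want to confirm that all five database files were actually written. A minimal sketch, assuming a helper named missing_index_files (not part of the package) that takes the path to the "db" directory:

```shell
#!/bin/sh
# missing_index_files [DB_DIR]: print the name of every expected index file
# that is absent from DB_DIR (default "db"). Returns 0 if nothing is missing.
missing_index_files() {
    db_dir="${1:-db}"
    status=0
    for name in 0_hash 0_hashwords 0_sitewords 0_finfo 0_word_ind; do
        if [ ! -f "$db_dir/$name" ]; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}
```

      From a shell this could be used right after indexing, for example: cd to your cgi-bin directory, run perl index.pl, then run missing_index_files db. No output means the index is complete.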

      Another way to index your site is over HTTP. Run "spider.pl" and it will crawl through your files and parse out all the links (spider.pl requires the LWP module). This is useful for indexing dynamic sites (such as web boards). However, the script is extremely simple and is not suitable for indexing the web at large. Another restriction: you cannot stop the indexing process and resume it later from the same point. You need to index the whole site in one run.
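
      Since spider.pl depends on LWP, it can save time to check for the module before starting a crawl. The sketch below uses Perl's standard -M switch; the function name have_perl_module is an assumption made for this example.

```shell
#!/bin/sh
# have_perl_module NAME: succeed if perl is installed and can load the module.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if have_perl_module LWP; then
    echo "LWP found"
else
    echo "LWP not found (or perl is missing): spider.pl will not run"
fi
```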

      Indexing requires a lot of system resources, and your web hosting provider may be very unhappy if you run it too often. It is probably better to index a local copy of your site and then copy the created database files to the server (please use "BIN" mode for the transfer). The amount of RAM required for indexing depends on the size of the site. You will not have problems with 10-20 MB, but if you plan to index 500 MB of text, I would recommend at least 512 MB of RAM.

      Please note that most web servers do not allow a script to run for too long. After 30-60 seconds the web server will kill your script if it has not finished indexing by then. Therefore, you will not be able to index more than several megabytes when running "index.pl" as a CGI script. To index large sites, run the script from a Unix shell or index a local copy of your site.
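
      To get a rough idea of which route applies to your site, you can total the size of its pages first. The following is a sketch under stated assumptions: the helper names site_bytes and advise are invented here, only .htm and .html files are counted (which file types index.pl actually reads is not covered on this page), and the 5 MB default is merely a stand-in for "several megabytes".

```shell
#!/bin/sh
# site_bytes DIR: total size in bytes of the .htm/.html files under DIR.
# Counting only these two extensions is an assumption.
site_bytes() {
    find "$1" -type f \( -name '*.htm' -o -name '*.html' \) -exec cat {} + \
        | wc -c | tr -d ' '
}

# advise DIR [LIMIT_BYTES]: print "cgi" or "shell".
# The 5 MB default limit is illustrative, not a measured value.
advise() {
    limit="${2:-5242880}"
    total=$(site_bytes "$1")
    if [ "$total" -le "$limit" ]; then
        echo "cgi"      # small enough to try running index.pl as a CGI script
    else
        echo "shell"    # run from a shell, or index a local copy
    fi
}
```

      When the answer is "shell", a long run can be started in the background with something like nohup perl index.pl > index.log 2>&1 &, where index.log is just a hypothetical file name for the output.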


Home: http://www.alooks.ru/ Sergej Tarasov, © 2010.