Blocking Bad Bots & Search Crawlers (Firstserv Guide)

Search engines use automated tools called spiders (or bots) to crawl websites and index content. These bots visit your site, read pages, and help your content appear in search results.


Why Block Bots?

While most bots behave responsibly, some:

  • ❌ Ignore your robots.txt rules
  • ❌ Crawl too aggressively (causing high server load)
  • ❌ Waste bandwidth and resources
  • ❌ Potentially slow down or disrupt your website

👉 In these cases, it can be useful to block them.


Recommended Approach (CMS Plugins)

If you are using a CMS such as WordPress, Joomla, or Drupal, it’s usually easier to block bots using plugins/extensions.


✅ Popular Options

  • WordPress: Bot-blocking or security plugins
  • Joomla / Drupal: Similar security or bot-filtering extensions

How These Work

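Many of these tools set a trap for badly behaved bots:

  • Create hidden links on your site that human visitors never see
  • Tell bots not to follow them (for example with a robots.txt rule)
  • Automatically block any bot that follows the links anyway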

✅ This is the easiest and safest method for most users


Manual Method (Advanced)

You can also block bots manually using your website’s .htaccess file.


Step 1: Identify Problem Bots

To block a bot, you first need one of the following:

  • ✅ Its IP address
  • ✅ Its User-Agent string

How to Get This Information

  1. Log in to cPanel
  2. Go to Metrics → Raw Access Logs
  3. Download and extract your log file
  4. Open it in a text editor or spreadsheet tool

Example Log Entry

Plain Text

180.76.5.14 - - [22/Jul/2013:20:07:48 +0100] "GET / HTTP/1.0" 200 1234 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0)"

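  • IP Address: 180.76.5.14 (the first field)
  • User-Agent: Mozilla/5.0 (compatible; Baiduspider/2.0) (the last quoted field), which identifies the bot as Baiduspider/2.0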

Method 1: Block by IP Address


Step 1: Open or Create .htaccess

  • File location: /public_html/.htaccess
  • Use cPanel → File Manager
  • Enable hidden files (dotfiles) if needed

Step 2: Add Blocking Rules

Apache Config

# Evaluate Allow rules first, then Deny; a Deny match wins
order allow,deny
allow from all
# Block this single address
deny from 180.76.5.14

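Blocking an IP Range

Some bots crawl from many addresses on the same network. You can block a whole range using CIDR notation: a /24 range covers all 256 addresses that share the first three numbers (for example 180.76.5.0 to 180.76.5.255).

Example:

Apache Config

order allow,deny
allow from all
# Block 180.76.5.0 to 180.76.5.255 and 180.76.6.0 to 180.76.6.255
deny from 180.76.5.0/24
deny from 180.76.6.0/24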
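
The order, allow and deny directives are Apache 2.2 syntax; on Apache 2.4 they are only available through the mod_access_compat module. This guide does not state which version your server runs, so the following is a sketch that assumes it could be either: it uses <IfModule> to select the newer Require syntax when mod_authz_core (Apache 2.4) is loaded and falls back to the older directives otherwise. The addresses are the example values used in this guide, not a recommended block list.

```apache
# Apache 2.4 and later: mod_authz_core provides the Require directive
<IfModule mod_authz_core.c>
    <RequireAll>
        # Allow everyone...
        Require all granted
        # ...except this single address and these two /24 ranges
        Require not ip 1.2.3.4
        Require not ip 180.76.5.0/24
        Require not ip 180.76.6.0/24
    </RequireAll>
</IfModule>

# Apache 2.2: fall back to the older access directives
<IfModule !mod_authz_core.c>
    Order Allow,Deny
    Allow from all
    Deny from 1.2.3.4
    Deny from 180.76.5.0/24
    Deny from 180.76.6.0/24
</IfModule>
```

It is generally best to keep to one style within a file, since mixing Require with Order/Allow/Deny in the same scope is discouraged and can give confusing results.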
 
 

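Block Access to a Specific Folder

The <Directory> directive is not allowed inside .htaccess files. To restrict a single folder, create a separate .htaccess file inside that folder (for example /public_html/protected-folder/.htaccess) and add the rules there:

Apache Config

order allow,deny
allow from all
deny from 1.2.3.4
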

Method 2: Block by User-Agent

If the bot identifies itself, you can block it by name.


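Example: Block Baidu Spider

Apache Config

# Set the variable "banned" when the User-Agent contains "baiduspider" (any case)
BrowserMatchNoCase baiduspider banned
# Reject every request that has "banned" set
Deny from env=banned
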

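How It Works

  • BrowserMatchNoCase sets the banned variable for any request whose User-Agent contains baiduspider, ignoring case
  • Deny from env=banned rejects every request that has the variable set

✅ Works even if the bot's IP addresses change, as long as it keeps identifying itself with the same User-Agent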
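
The same idea extends to several bots at once, because the first argument of BrowserMatchNoCase is a regular expression. The sketch below assumes the server runs Apache 2.4 with mod_authz_core (not confirmed by this guide), and the names examplebot and anotherbot are hypothetical placeholders rather than real crawlers.

```apache
# Set "banned" when the User-Agent contains any of these names (case-insensitive).
# "examplebot" and "anotherbot" are hypothetical placeholders, not real crawlers.
BrowserMatchNoCase "baiduspider|examplebot|anotherbot" banned

# Apache 2.4 syntax (assumption: the server provides mod_authz_core)
<RequireAll>
    # Allow everyone...
    Require all granted
    # ...except requests that have the "banned" variable set
    Require not env banned
</RequireAll>
```

On an Apache 2.2 server, keep the BrowserMatchNoCase line and use Deny from env=banned in place of the <RequireAll> block.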


Important Tips

  • ✅ Always back up your .htaccess before editing
  • ✅ Only block known problematic bots
  • ⚠️ Blocking legitimate search engines (e.g. Googlebot) can harm SEO
  • ⚠️ Incorrect rules can break your site: a syntax error in .htaccess usually causes a 500 Internal Server Error on every page

Alternative: robots.txt

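For well-behaved bots, a robots.txt file in your website's root folder (for example /public_html/robots.txt) is usually enough. The example below asks every bot (User-agent: *) to stay out of the /private/ folder.

Example:

Plain Text

User-agent: *
Disallow: /private/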

👉 Note: Rogue bots may ignore this file


Summary

  • Bots crawl your site for search engines
  • Some bots may need to be blocked
  • Best methods:
    • ✅ Use CMS plugins (recommended)
    • ✅ Block via .htaccess (advanced)
  • Block by:
    • IP address
    • User-Agent

Need Help?

If you're unsure whether a bot should be blocked or need help editing your .htaccess file safely, the Firstserv support team is always happy to assist.

 
 

 
