Blocking Bad Bots & Search Crawlers (Firstserv Guide)
Search engines use automated tools called spiders (or bots) to crawl websites and index content. These bots visit your site, read pages, and help your content appear in search results.
Why Block Bots?
While most bots behave responsibly, some:
- ❌ Ignore your robots.txt rules
- ❌ Crawl too aggressively (causing high server load)
- ❌ Waste bandwidth and resources
- ❌ Potentially slow down or disrupt your website
👉 In these cases, it can be useful to block them.
Recommended Approach (CMS Plugins)
If you are using a CMS such as WordPress, Joomla, or Drupal, it’s usually easier to block bots using plugins/extensions.
✅ Popular Options
- WordPress: Bot-blocking or security plugins
- Joomla / Drupal: Similar security or bot-filtering extensions
How These Work
Many of these tools work as a trap for misbehaving bots. They:
- Create hidden links on your site that human visitors never see
- Tell bots not to follow those links
- Automatically block any bot that follows them anyway
✅ This is the easiest and safest method for most users
Manual Method (Advanced)
You can also block bots manually using your website’s .htaccess file.
Step 1: Identify Problem Bots
To block a bot, you first need one of the following:
- ✅ Its IP address
- ✅ Its User-Agent string
How to Get This Information
- Log in to cPanel
- Go to:
Metrics → Raw Access Logs
- Download and extract your log file
- Open it in a text editor or spreadsheet tool
Example Log Entry
- IP Address: 180.76.5.14
- User-Agent: Baiduspider/2.0
Method 1: Block by IP Address
Step 1: Open or Create .htaccess
- File location:
/public_html/.htaccess
- Use cPanel → File Manager
- Enable hidden files (dotfiles) if needed
Step 2: Add Blocking Rules
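Blocking a Single IP
A minimal sketch of a single-address block. It assumes the server runs Apache 2.4 or later (where Require replaces the older Order/Deny directives) and uses the address from the example log entry; substitute the address found in your own logs.

```apache
# Allow everyone except the listed address.
# Assumes Apache 2.4+; 180.76.5.14 is the example IP from the log entry above.
<RequireAll>
    Require all granted
    Require not ip 180.76.5.14
</RequireAll>
```

To block more addresses, add further "Require not ip" lines inside the same RequireAll block rather than creating a second block, because separate top-level blocks are treated as alternatives and can cancel each other out.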
Blocking an IP Range
Some bots crawl from several IP addresses in the same range, so blocking one address at a time is not enough.
Example:
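The rule below is a sketch rather than a definitive configuration. It assumes Apache 2.4 or later, and the range shown is illustrative only: it is the /24 block around the example address 180.76.5.14, not a confirmed list of addresses used by any particular bot.

```apache
# Block every address from 180.76.5.0 to 180.76.5.255 (CIDR notation).
# Assumes Apache 2.4+; the range is illustrative, derived from the example IP.
<RequireAll>
    Require all granted
    Require not ip 180.76.5.0/24
</RequireAll>
```

If the file already contains a RequireAll block, add the "Require not ip" line to it instead of creating a second block.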
Block Access to a Specific Folder
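One possible way to restrict a single folder from the main /public_html/.htaccess file is sketched below. It assumes mod_rewrite is enabled on the server, and the folder name "private" is hypothetical; replace it with the folder you want to protect.

```apache
# Deny one address access to a single folder only.
# Assumes mod_rewrite is enabled; "private" is a hypothetical folder name.
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Match the example IP exactly (dots are escaped in the regex).
    RewriteCond %{REMOTE_ADDR} ^180\.76\.5\.14$
    # [F] answers with 403 Forbidden for /private and everything below it.
    RewriteRule ^private(/|$) - [F]
</IfModule>
```

If your CMS already has rewrite rules in this file, place this block above them so it is evaluated first. Another option is to put a separate .htaccess file containing the IP rules inside the folder itself.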
Method 2: Block by User-Agent
If the bot identifies itself, you can block it by name.
Example: Block Baidu Spider
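A minimal sketch of a User-Agent block, assuming mod_rewrite is enabled on the server. The match string baiduspider comes from the example log entry; any other bot name can be used in its place.

```apache
# Return 403 Forbidden to any request whose User-Agent contains "baiduspider".
# Assumes mod_rewrite is enabled; [NC] makes the match case-insensitive.
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} baiduspider [NC]
    RewriteRule ^ - [F]
</IfModule>
```

If your CMS already has rewrite rules in this file, place this block above them so it is evaluated first.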
How It Works
- Matches any request whose User-Agent string contains baiduspider
- Blocks every matching request
✅ Works even if the bot's IP address changes
Important Tips
- ✅ Always back up your .htaccess before editing
- ✅ Only block known problematic bots
- ⚠️ Blocking legitimate search engines (e.g. Googlebot) can harm SEO
- ⚠️ Incorrect rules can break your site
Alternative: robots.txt
For well-behaved bots, you can use a robots.txt file in your site's root folder to ask them not to crawl some or all of your site.
Example:
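A minimal robots.txt sketch, assuming the goal is to ask the Baidu spider from the earlier example to stay away from the whole site. The location /public_html/robots.txt is an assumption based on a typical cPanel layout for the main domain.

```text
# Ask Baiduspider not to crawl any page on the site.
User-agent: Baiduspider
Disallow: /
```

To limit only part of the site, replace / with a folder path such as /private/ (a hypothetical folder name).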
👉 Note: Rogue bots may ignore this file
Summary
- Bots crawl your site for search engines
- Some bots may need to be blocked
- Best methods:
  - ✅ Use CMS plugins (recommended)
  - ✅ Block via .htaccess (advanced)
- Block by:
  - IP address
  - User-Agent
Need Help?
If you're unsure whether a bot should be blocked or need help editing your .htaccess file safely, the Firstserv support team is always happy to assist.
