How to block unwanted traffic easily but effectively and without WAF or CloudFlare.


I have been working for several years on how to block unwanted traffic, i.e. bad bots etc., in situations where PHP is not available because pages are served directly from LSCache. The same applies if a Web Application Firewall or Cloudflare is not available or cannot be used. In such cases only .htaccess remains, although dropping requests via .htaccess is not entirely without cost, because even a blocked request creates load. The widespread blocking of IP addresses is therefore just as unsuitable for effective protection as the blocking of user agents: the latter is error-prone, and blocking by IP address is about as effective as trying to filter water with a sieve.

If the classic blocking criteria are of so little use, how can .htaccess still be used effectively?

If, like me, you regularly analyze the access_log over years, you quickly notice certain patterns. One pattern is that a natural, and therefore desired, request can be recognized because a natural user uses a reasonably up-to-date browser. How current the browser is can be read from the browser version, which is part of the user agent string. If this version information is missing, or if the version corresponds to a browser that has been outdated for several years, that is not just an indication but a strong distinguishing feature. Bad bots that imitate a typical user via the classic user agent string use browser versions that are more than 5 years old in 99.99% of cases. The operators of these bad bots seem to pay no attention to this, even though I have been observing the anomaly for around 10 years. For me, at least, this is a criterion I use to block unwanted traffic, which does not mean that I don't define exceptions.
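As an illustration of this criterion, a minimal version-only sketch could look like the following; the complete rule I actually use, shown at the end of this post, additionally requires the Sec-Fetch-* headers to be missing. The two-digit threshold assumes that all current major browsers have long since passed version 100:

Code:
# Minimal sketch: block UAs that claim an outdated major version
# (at most two digits, i.e. < 100) of a common browser
RewriteCond %{HTTP_USER_AGENT} (Chrome|Firefox|Edg|Opera|Brave)/[0-9]{1,2}\. [NC]
RewriteRule .* - [F]

Used on its own, such a rule would also hit the rare legitimate user of an old browser, which is exactly why I combine it with the Sec-Fetch-* check.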

Another feature is a set of request headers which, with one exception, are sent by every reasonably up-to-date browser: the Sec-Fetch-* headers, which all common browsers use. The exception is the Safari browser on Apple devices. Safari apparently does not send these headers, reportedly in opposition to Google, which creates a gap and forces an exception; otherwise, all Apple users with a Safari browser would be blocked.
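A hedged sketch of how the Safari exception can be expressed: genuine Safari has to be detected carefully, because the UA string of Chrome and other Chromium-based browsers also contains the token "Safari", while genuine Safari additionally carries a "Version/x.y" token. Known good bots such as Googlebot do not send Sec-Fetch-* headers either and have to be exempted as well:

Code:
# Sketch: block requests without a Sec-Fetch-Mode header, except
# genuine Safari and known good bots, which don't send it.
# Genuine Safari carries "Version/x.y ... Safari/"; Chromium-based
# browsers contain "Safari" but never the "Version/" token.
RewriteCond %{HTTP_USER_AGENT} !(googlebot|bingbot) [NC]
RewriteCond %{HTTP_USER_AGENT} !Version/[0-9.]+.*Safari/ [NC]
RewriteCond %{HTTP:Sec-Fetch-Mode} !.+
RewriteRule .* - [F]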

Another indication, and one that makes things comparatively easy, is a missing user agent string: its absence is an immediate indicator that the request cannot come from a natural user.

Last but not least, the request method can be used as a filter criterion. In regular operation, a classic CMS such as WordPress only requires GET and POST requests. HEAD and OPTIONS are definitely not among the HTTP methods that a classic CMS requires or uses. The main focus here is HEAD, since HEAD returns only the response headers but no HTML body. No natural user's browser uses HEAD as a request method!
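The same idea can also be expressed as an allow-list instead of blocking individual methods; a minimal sketch, assuming the site really only needs GET and POST:

Code:
# Sketch: allow-list variant - reject every request method
# except GET and POST (this also covers HEAD, OPTIONS, etc.)
RewriteCond %{REQUEST_METHOD} !^(GET|POST)$
RewriteRule .* - [F]

If the site answers cross-origin API requests, OPTIONS would have to be added to the allow-list, because browsers use it for CORS preflight.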

Summary:
With the filter criteria described above, it is possible to block more than 95% of unwanted traffic effectively. However, it should be pointed out again that dropping requests via .htaccess cannot protect against a DDoS attack and itself creates additional load. In addition, in individual cases it may be necessary to define exceptions to prevent legitimate requests from being blocked.
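As an example of such an exception: some legitimate uptime monitors use HEAD requests, so a variant of the HEAD rule below could exempt them first. The UA token UptimeRobot is only an illustration; replace it with whatever monitoring service you actually use:

Code:
# Sketch: block HEAD requests, but exempt a known monitoring service
# ("UptimeRobot" is only an illustrative UA token)
RewriteCond %{HTTP_USER_AGENT} !UptimeRobot [NC]
RewriteCond %{REQUEST_METHOD} ^HEAD$
RewriteRule .* - [F]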

You are welcome to ask questions or make suggestions for improvement.

Code:
# Block by missing Sec-Fetch-* headers combined with an outdated browser
# major version (< 100, i.e. at most two digits). All conditions are ANDed:
# the request is only blocked if the UA claims an outdated browser AND
# none of the Sec-Fetch-* headers is present.
# Add additional user agents to the first condition to exclude them from blocking.
# If the LiteSpeed Cache plugin for WordPress is used, add litespeed_runner
# to exclude its crawler UA from blocking.
RewriteCond %{HTTP_USER_AGENT} !(googlebot|bingbot) [NC]
RewriteCond %{HTTP_USER_AGENT} (Chrome|Firefox|Edg|Opera|Brave)/[0-9]{1,2}\. [NC]
RewriteCond %{HTTP:Sec-Fetch-Dest} !.+
RewriteCond %{HTTP:Sec-Fetch-Mode} !.+
RewriteCond %{HTTP:Sec-Fetch-Site} !.+
RewriteCond %{HTTP:Sec-Fetch-User} !.+
RewriteRule .* - [F,L]
Code:
# Block requests from a faked Googlebot
# The IP ranges used by Googlebot depend on the region, so this rule has
# to be adjusted if Googlebot's IP addresses don't start with 66.249
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{REMOTE_ADDR} !^66\.249\.
RewriteRule .* - [F,L]
Code:
# Block requests with an empty User-Agent string
# (an empty UA appears as "-" in the access_log, hence the optional hyphen)
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F,L]
Code:
# Block HEAD requests
RewriteCond %{REQUEST_METHOD} ^HEAD$
RewriteRule .* - [F,L]
 