Cloudflare returning HTTP 403 Forbidden
Why Cloudflare was blocking me from my own site.
TL;DR: Cloudflare by default blocks all requests without a User Agent string. Python’s urllib module by default does not supply a User Agent.
Recently I mistakenly changed a Cloudflare setting in such a way that all requests from Cloudflare to the origin began to fail. Due to local caching it looked like my change did not cause any problems, so I logged off and did not visit the site again. A few weeks later I realized the site was not working and had to fix it.
Unfortunately this isn’t the first time my site has stopped working without me knowing. I originally set this site up on GitHub Pages, and for no reason in particular it was set up from a private repo. I didn’t know that this requires an upgraded GitHub plan, and the reason it worked for me was that coincidentally I had a free GitHub Pro plan through the GitHub Student Developer Pack. When I graduated from university I eventually lost this plan, and my site would give a GitHub Pages error. I do not know how long it was like this.
I knew I needed an automatic way of detecting when the site stopped working. The AWS Free Tier gives 1 million free Lambda executions per month, so I figured it would be easy and free to create a Lambda function that notifies me on Discord when my site isn’t responsive. I would set up a CloudWatch Events rule to invoke this function on a regular basis. The Lambda source code looks something like this:
It is quite simple: it checks whether any of the URLs returns an error response code (4xx or 5xx), which causes urllib to raise an exception. If any exceptions are caught, a message is posted in a Discord server I am in. I use urllib instead of Requests because I wanted to make it as simple as possible to deploy.
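The check described above can be sketched roughly as follows. This is an illustration rather than the original function: the URL list, the webhook URL, and every function name are placeholders, and it assumes a standard Discord webhook that accepts a JSON body with a `content` field.

```python
import json
import urllib.error
import urllib.request

# Hypothetical values for illustration; not taken from the original function.
URLS = ["https://example.com/"]
WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"


def check_urls(urls, opener=urllib.request.urlopen):
    """Return a list of (url, error text) pairs for every URL that failed."""
    failures = []
    for url in urls:
        try:
            # urlopen raises HTTPError for 4xx/5xx responses and URLError
            # for network-level problems such as DNS failures.
            with opener(url, timeout=10):
                pass
        except Exception as exc:
            failures.append((url, str(exc)))
    return failures


def build_discord_payload(failures):
    """Build the JSON body for a Discord webhook message."""
    lines = [f"{url}: {error}" for url, error in failures]
    message = "Site check failed:\n" + "\n".join(lines)
    return json.dumps({"content": message}).encode("utf-8")


def notify_discord(failures, webhook_url=WEBHOOK_URL):
    """POST the failure summary to the (placeholder) webhook URL."""
    request = urllib.request.Request(
        webhook_url,
        data=build_discord_payload(failures),
        # The User-Agent value here is arbitrary.
        headers={"Content-Type": "application/json", "User-Agent": "site-check/1.0"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10):
        pass


def lambda_handler(event, context):
    """Entry point Lambda would invoke on each scheduled run."""
    failures = check_urls(URLS)
    if failures:
        notify_discord(failures)
    return {"failures": len(failures)}
```

Keeping `check_urls` separate from the notification step makes it possible to exercise the check locally with a stand-in opener, without touching the network or the webhook.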
When hitting my site with curl I would get successful responses, but when trying my script I would get 403 Forbidden errors. I was able to replicate this on other sites as well, such as httpstat.us. The content of the response was always the same: error code: 1010. According to the Cloudflare docs this is caused by a failed Browser Integrity Check (BIC). You can confirm this by checking the Firewall logs in your Cloudflare account.
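To see that response body from Python, the exception that urllib raises can be read like a normal response. The sketch below is one way to do that; the helper names are made up for illustration, and it assumes the body has the plain `error code: N` form quoted above.

```python
import re
import urllib.error
import urllib.request


def cloudflare_error_code(body):
    """Extract N from a body like 'error code: 1010'; return None if absent."""
    match = re.search(r"error code:\s*(\d+)", body)
    return int(match.group(1)) if match else None


def describe_failure(url):
    """Fetch url and return (HTTP status, Cloudflare error code or None)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status, None
    except urllib.error.HTTPError as exc:
        # HTTPError doubles as a response object, so its body can be read.
        body = exc.read().decode("utf-8", errors="replace")
        return exc.code, cloudflare_error_code(body)
```

Calling `describe_failure` on an affected URL should return the 403 status together with the numeric code from the body, which can then be matched against the Firewall logs.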
Apparently this feature is enabled by default. I found that surprising because I set my website’s Cloudflare Security mode to Essentially Off [1]; my simple static website does not benefit much from these protections, and they can be quite intrusive for people in certain countries or with non-standard browsing habits, such as using Tor.
There are two easy fixes to this problem.
Fix 1: Set a User-Agent header on your requests
The reason curl works fine but the code above does not is that curl sets a User Agent by default and urllib does not. It is easy to do this in Python, simply change this:
From my experience you can set the User Agent to literally anything, as it’s the lack of User Agent that usually causes the BIC failure. In fact, this means if you use Requests instead of urllib you shouldn’t run into this problem, as Requests sets a User Agent string by default.
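A minimal sketch of Fix 1, assuming a hypothetical User-Agent value and helper names: instead of passing a bare URL string to `urlopen`, wrap it in a `urllib.request.Request` that carries an explicit `User-Agent` header. One caveat: recent Python versions appear to send a default `Python-urllib/<version>` identifier rather than no header at all, so the practical effect of the change is to replace that default with a value of your choosing.

```python
import urllib.request

# Hypothetical value; per the text above, the exact string should not matter.
USER_AGENT = "my-site-monitor/1.0"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url):
    """Open url with the explicit User-Agent and return the HTTP status."""
    with urllib.request.urlopen(build_request(url), timeout=10) as response:
        return response.status
```

`urlopen` accepts either a URL string or a `Request` object, so swapping one for the other is the only change the calling code needs.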
Fix 2: Disable Browser Integrity Check (BIC)
On your Cloudflare Firewall page, create a new firewall rule. For the request pattern, if your Lambda functions use a specific IP address, you can select “IP Source Address equals …”. I am using a VPC-less Lambda, so there is no easy way to know the IP address in advance; instead I used the pattern “URI contains /”, which is always true.
[1] If you are wondering how to change your site’s security level to Essentially Off because you do not want your visitors to hit a CAPTCHA wall, I have found that the way to change the setting is not intuitive. On your Cloudflare “Overview” page, there is a Quick Action labeled “Under Attack Mode.” Turning that on and then off again allows you to change your Security Mode to whatever you’d like.