Cloudflare returning HTTP 403 Forbidden

Why Cloudflare was blocking me from my own site.

TL;DR: Cloudflare's Browser Integrity Check blocks requests whose User Agent string is missing or looks like a non-browser client. The default User Agent that Python's urllib sends (Python-urllib/3.x) falls into that category.

This website is generated with Hugo on Vercel, and I use Cloudflare as a free DNS and CDN. I use Cloudflare instead of other free DNS options like Namecheap or Gandi.net because in the future I have plans to use Cloudflare’s analytics features to give myself server-sided analytics without needing to install any tracking JavaScript libraries on my pages.

Recently I mistakenly changed a Cloudflare setting in such a way that all requests from Cloudflare to the origin began to fail. Due to local caching it looked like my change had not caused any problems, so I logged off and did not visit the site again. A few weeks later I realized the site was not working and had to fix it.

Unfortunately this isn’t the first time my site has stopped working without me knowing. I originally set this site up on GitHub Pages, and for no reason in particular it was set up from a private repo. I didn’t know that this requires an upgraded GitHub plan, and the reason it worked for me was that coincidentally I had a free GitHub Pro plan through the GitHub Student Developer Pack. When I graduated from university I eventually lost this plan, and my site would give a GitHub Pages error. I do not know how long it was like this.

I knew I needed an automatic way of detecting when the site stopped working. The AWS Free Tier gives 1 million free Lambda executions per month, so I figured it would be easy and free to create a Lambda function that notifies me on Discord when my site isn't responsive, with a CloudWatch Events rule to invoke the function on a regular schedule. The Lambda source code looks something like this:

import os
from urllib.request import Request, urlopen

TIMEOUT = int(os.environ.get('TIMEOUT', 10))

URLS = [
    'http://example.com',
]

def check_status():
    # Quick check to see if the network is okay.
    # If not then I don't want to get a notification.
    try:
        with urlopen('https://google.com', timeout=TIMEOUT) as response:
            pass
    except Exception as e:
        raise Exception(f"Didn't get a successful response from Google, is the network OK? {e}")

    errors = {}

    for url in URLS:
        req = Request(url, headers={'User-Agent': 'aws-lambda'})
        try:
            with urlopen(req, timeout=TIMEOUT) as response:
                pass
        except Exception as e:
            errors[url] = e

    if errors:
        discord_notify(errors)

def handler(event, ctx):
    check_status()

The logic is simple: urllib raises an exception for any non-200 response (or timeout), and if any exceptions are caught, a message is posted in a Discord server I am in. I use urllib instead of Requests because I wanted to make the function as simple as possible to deploy, with no third-party dependencies.
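The discord_notify helper isn't shown above. A minimal sketch of it, assuming a Discord incoming webhook whose URL comes from a hypothetical DISCORD_WEBHOOK_URL environment variable, might look like this:

```python
import json
import os
import urllib.request

# Hypothetical: the webhook URL is assumed to be supplied via an env var.
WEBHOOK_URL = os.environ.get('DISCORD_WEBHOOK_URL', '')

def build_payload(errors):
    # Discord webhooks accept a JSON body with a "content" field.
    lines = [f'{url}: {err}' for url, err in errors.items()]
    message = 'Site check failed:\n' + '\n'.join(lines)
    return json.dumps({'content': message}).encode('utf-8')

def discord_notify(errors):
    req = urllib.request.Request(
        WEBHOOK_URL,
        data=build_payload(errors),
        headers={'Content-Type': 'application/json'},
    )
    # Discord returns 204 No Content on success.
    with urllib.request.urlopen(req, timeout=10):
        pass
```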

When hitting my site with curl I would get successful responses, but with my script I would get 403 Forbidden errors. I was able to replicate this on other sites as well, such as httpstat.us. The content of the response was always the same: error code: 1010. According to the Cloudflare docs this is caused by a failed Browser Integrity Check (BIC). I confirmed this by checking the Firewall event logs in my Cloudflare account.

Firewall event log showing BIC failure

Apparently this feature is enabled by default. I found that surprising because I had set my website's Cloudflare security level to Essentially Off¹; my simple static website does not benefit much from these protections, and they can be quite intrusive to visitors in certain countries or with non-standard browsing habits, such as using Tor.

There are two easy fixes to this problem.

Fix 1: Set a User-Agent header on your requests

The reason curl works fine while my script did not comes down to the User Agent: curl sends a browser-style identifier (curl/x.y.z) by default, while urllib's default (Python-urllib/3.x) fails the Browser Integrity Check. Overriding it in Python is easy; simply change this:

from urllib.request import urlopen

with urlopen('http://example.com') as response:
    pass

To:

from urllib.request import urlopen, Request

request = Request('http://example.com', headers={'User-Agent': 'foobar'})

with urlopen(request) as response:
    pass

In my experience you can set the User Agent to literally anything, as it is the non-browser default (or the lack of a User Agent) that usually causes the BIC failure. This also means that if you use Requests instead of urllib you shouldn't run into this problem, as Requests sets its own User Agent string by default.
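You can inspect the headers urllib attaches by default by building a fresh opener; the User-agent entry is what trips the check:

```python
import urllib.request

# A fresh opener carries urllib's default headers, including a
# User-agent of the form "Python-urllib/<version>".
opener = urllib.request.build_opener()
print(opener.addheaders)  # e.g. [('User-agent', 'Python-urllib/3.11')]
```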

Fix 2: Disable Browser Integrity Check (BIC)

In your Cloudflare’s Firewall page, create a new firewall rule. For the request pattern, if your Lambda functions use a specific IP address then you can select “IP Source Address equals …”. For me I am using a VPC-less Lambda so there is no easy way to know the IP address in advance, so I used the pattern “URI contains /” which is always true.

Creating a new firewall rule to disable BIC
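As a sketch, the resulting rule looks roughly like this; the exact labels depend on your dashboard version (this assumes the Firewall Rules interface, where the action is Bypass with Browser Integrity Check as the bypassed feature):

```
Expression: (http.request.uri.path contains "/")
Action:     Bypass
Feature:    Browser Integrity Check
```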


  1. If you are wondering how to change your site's security level to Essentially Off because you do not want your visitors to hit a CAPTCHA wall, I found the way to change the setting unintuitive. On your Cloudflare "Overview" page there is a Quick Action labeled "Under Attack Mode." Turning that on and then off again allows you to change your security level to whatever you'd like. ↩︎