Patching a directory traversal attack vulnerability

Date: 2023-07-15

Consider the following component in my personal website responsible for serving static web assets from an OSS bucket to users.

Function assets

subPath may contain zero or more path components. The bucket donaldsebleung-assets is mounted under /mnt/donaldsebleung-assets/ in the container filesystem within the function assets, which appends the request path subPath to the mount point in order to fetch the associated object from the bucket and return its contents to the user who initiated the request.

For reference, the original source code of the assets function is shown below.

import os, re

def handler(environ, start_response):
    path_info = environ['PATH_INFO']
    local_path = os.path.join('/mnt/donaldsebleung-assets/', path_info[1:])
    local_file_exists = os.path.isfile(local_path)
    if not local_file_exists:
        status = '404 Not Found'
        response_headers = [('Content-Type', 'text/plain')]
        start_response(status, response_headers)
        return [status]
    with open(local_path, 'rb') as local_file:
        contents = local_file.read()
        status = '200 OK'
        content_type = 'application/octet-stream'
        is_css = re.compile(r'\.css$').search(local_path)
        if is_css:
            content_type = 'text/css'
        is_png = re.compile(r'\.png$').search(local_path)
        if is_png:
            content_type = 'image/png'
        is_jpeg = re.compile(r'\.jpe?g$').search(local_path)
        if is_jpeg:
            content_type = 'image/jpeg'
        response_headers = [('Content-Type', content_type)]
        start_response(status, response_headers)
        return [contents]

The intended behavior of the function is that it should only use the request path subPath to fetch the associated object from the bucket donaldsebleung-assets and return its contents to the requesting user - the function should never return content from other parts of the container filesystem, e.g. the function source code located at /code/index.py which should be kept hidden at all costs.

Now imagine you are a malicious actor tasked to fetch the source code located at /code/index.py and disclose it to the public. How would you trick the function to return its source code, if at all possible? Hint: it’s in the title ;-)

The directory traversal attack

Notice in line 5 of the source code that the local filesystem path is constructed by appending the request path to the bucket mount point /mnt/donaldsebleung-assets/ using os.path.join without any input validation or sanitation:

local_path = os.path.join('/mnt/donaldsebleung-assets/', path_info[1:])

For most cases, this line of code functions as expected. For example, suppose the request sent to the API gateway is GET /assets/css/bootstrap.min.css. Then the modified request received by the function assets is GET /css/bootstrap.min.css, so path_info becomes /css/bootstrap.min.css. Thus:

    os.path.join('/mnt/donaldsebleung-assets/', path_info[1:])
==> os.path.join('/mnt/donaldsebleung-assets/', 'css/bootstrap.min.css')
==> '/mnt/donaldsebleung-assets/css/bootstrap.min.css'

So the object with key css/bootstrap.min.css (a copy of Bootstrap CSS) is fetched from the bucket donaldsebleung-assets and returned to the requesting user as expected.

However, consider the following request sent to the API gateway:

GET /assets//code/index.py

The modified request sent to our function is then:

GET //code/index.py

So path_info is //code/index.py and we have:

    os.path.join('/mnt/donaldsebleung-assets/', path_info[1:])
==> os.path.join('/mnt/donaldsebleung-assets/', '/code/index.py')
==> '/code/index.py'

This is because according to the os.path.join specification, if any intermediate path starts with a forward slash / on Unix/Linux, all preceding paths are discarded and the filesystem traversal starts from the root directory again. This means our source code has just been leaked - no wonder the vulnerable code is now available in this blog post for everyone to see (-:

Now, if you test this in one of the major browsers such as Chrome or Firefox, you’ll realize that filesystem traversal attempts like this are automatically collapsed, so for example entering a link in the search bar like https://www.donaldsebleung.com/assets//code/index.py and pressing Enter gets treated as https://www.donaldsebleung.com/assets/code/index.py and you get a 404 response as expected. However, you can’t expect a malicious actor to play by the rules ;-)

In fact, it’s possible to craft the exact request as shown above using openssl s_client:

openssl s_client -connect www.donaldsebleung.com:443

And entering the following request once the SSL connection is established:

GET /assets//code/index.py HTTP/1.1
Host: www.donaldsebleung.com

The patch

import os, re

def handler(environ, start_response):
    path_info = environ['PATH_INFO']
    dt_attack = re.compile(r'\.\.|\/\/').search(path_info)
    if dt_attack or path_info == '/homepage.md':
        status = '404 Not Found'
        response_headers = [('Content-Type', 'text/plain')]
        start_response(status, response_headers)
        return [status]
    local_path = os.path.join('/mnt/donaldsebleung-assets/', path_info[1:])
    local_file_exists = os.path.isfile(local_path)
    if not local_file_exists:
        status = '404 Not Found'
        response_headers = [('Content-Type', 'text/plain')]
        start_response(status, response_headers)
        return [status]
    with open(local_path, 'rb') as local_file:
        contents = local_file.read()
        status = '200 OK'
        content_type = 'application/octet-stream'
        is_css = re.compile(r'\.css$').search(local_path)
        if is_css:
            content_type = 'text/css'
        is_png = re.compile(r'\.png$').search(local_path)
        if is_png:
            content_type = 'image/png'
        is_jpeg = re.compile(r'\.jpe?g$').search(local_path)
        if is_jpeg:
            content_type = 'image/jpeg'
        response_headers = [('Content-Type', content_type)]
        start_response(status, response_headers)
        return [contents]

Notice we added 6 lines of code in lines 5-10 to detect attempts at directory traversal attacks and respond to such requests with a 404 Not Found, since we don’t want to inform the attacker that we detected and actively intercepted the attack. The path_info == '/homepage.md' condition is because the homepage for my personal website is also stored in the same donaldsebleung-assets bucket (it doesn’t make sense to create a new bucket just to store the homepage specifically) but shouldn’t be returned to the user as-is through our assets function - instead, it should be converted to a pretty HTML homepage through another function homepage which is then displayed to the user.

Architecture with Alibaba Cloud Function Compute

In fact, the original issue of my assets function unexpectedly returning the raw homepage.md file with the request path /assets/homepage.md was what lead me to discovering this filesystem traversal attack vulnerability which has since been patched - who said I deliberately made my code vulnerable just for the sake of authoring this blog post? ;-)

The moral

Never trust user input!

Subscribe: RSS Atom [Valid RSS] [Valid Atom 1.0]

Return to homepage