Data validation
All code examples are available here.
Failures or omissions in data validation can lead to data corruption or a security vulnerability. Data validation checks that data are valid, sensible, reasonable, and secure before they are processed. – OWASP
TLDR
Validate all your inputs: check their boundaries and their expected types!
Vulnerable code
Here is an example of a Python web service that doesn’t check the type of its parameters:
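import pymongo
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/category/', methods=['POST'])
def get_articles_by_category():
    # Vulnerable: 'category' is used as-is, whatever its JSON type.
    category = request.get_json()['category']
    if category == 'drafts':
        return jsonify([])
    articles = pymongo.MongoClient().test.articles.find({'category': category}, {'_id': False})
    return jsonify(list(articles))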
This endpoint forbids access to the drafts category but forgets to validate the JSON request.
Vulnerability explanation
When we write code, we often limit our thinking to the happy path: what are the expected inputs and the expected outputs? We test them with unit tests and move on to the next piece of code.
But even the simplest piece of code can crash when we play with its inputs:
from decimal import Decimal

def decimal_division(number_1, number_2):
    return Decimal(number_1) / Decimal(number_2)
It takes two numbers, converts them to Decimal and returns their quotient.
It works great with ints and floats:
>>> decimal_division(1, 0.5)
Decimal('2')
There is the well-known edge-case:
>>> decimal_division(1, 0)
...
raise error(explanation)
decimal.DivisionByZero: x / 0
That is easy to check:
from decimal import Decimal

def decimal_division(number_1, number_2):
    if number_2 == 0:
        raise ValueError(number_2)
    return Decimal(number_1) / Decimal(number_2)
What about non-numbers?
>>> decimal_division(1, None)
...
raise TypeError("Cannot convert %r to Decimal" % value)
TypeError: Cannot convert None to Decimal
We should check the input types:
from decimal import Decimal

def decimal_division(number_1, number_2):
    if not isinstance(number_1, (int, float)):
        raise ValueError(number_1)
    if number_2 == 0 or not isinstance(number_2, (int, float)):
        raise ValueError(number_2)
    return Decimal(number_1) / Decimal(number_2)
But that’s still not sufficient, because we can also send values at the boundaries of what the types allow:
>>> decimal_division(float('inf'), float('-inf'))
...
raise error(explanation)
decimal.InvalidOperation: (+-)INF/(+-)INF
We should also limit the values we accept:
from decimal import Decimal

BLACKLIST = [Decimal('inf'), Decimal('-inf')]

def decimal_division(number_1, number_2):
    if not isinstance(number_1, (int, float)) or number_1 in BLACKLIST:
        raise ValueError(number_1)
    if number_2 == 0 or not isinstance(number_2, (int, float)) or number_2 in BLACKLIST:
        raise ValueError(number_2)
    return Decimal(number_1) / Decimal(number_2)
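The blacklist above still lets float('nan') through, because NaN never compares equal to anything, and it accepts True and False, since bool is a subclass of int. Here is a minimal sketch of a stricter variant, assuming that every non-finite value and every boolean should be rejected. The helper name _check_number is made up for this example.

```python
import math
from decimal import Decimal


def _check_number(value):
    """Hypothetical helper: accept only finite ints and floats."""
    # bool is a subclass of int, so it has to be rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(value)
    # math.isfinite is False for inf, -inf and nan; ints are always finite.
    if isinstance(value, float) and not math.isfinite(value):
        raise ValueError(value)
    return value


def decimal_division(number_1, number_2):
    _check_number(number_1)
    if _check_number(number_2) == 0:
        raise ValueError(number_2)
    return Decimal(number_1) / Decimal(number_2)
```

Checking for what is allowed (finite numbers) rather than listing what is forbidden tends to age better: a new special value does not need a new blacklist entry.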
Not vulnerable code
Several Python libraries can help you validate your inputs:
- Django provides validation through Forms and Models; be sure to use them.
- Cerberus is a clean, lightweight library that is agnostic about where the input comes from: give it two dicts, a schema and a document, and it tells you whether the document is valid and lists the errors if it is not.
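To illustrate what such validation amounts to for the category endpoint, here is a minimal hand-written sketch using only the standard library. The function name validate_category, the MAX_CATEGORY_LENGTH limit and the allowed-character rule are assumptions made for this example, not part of the sample application; a validation library would express the same rules as a schema.

```python
import re

MAX_CATEGORY_LENGTH = 50  # assumed limit, pick one that fits your data
CATEGORY_PATTERN = re.compile(r"[a-z0-9_-]+")  # assumed naming rule


def validate_category(payload):
    """Return the category from a decoded JSON body, or raise ValueError."""
    # The body itself must be a JSON object.
    if not isinstance(payload, dict):
        raise ValueError("body must be a JSON object")
    category = payload.get("category")
    # Type check: rejects objects such as {"$gte": ""}, lists, numbers, null.
    if not isinstance(category, str):
        raise ValueError("category must be a string")
    # Boundary check on the length.
    if not 0 < len(category) <= MAX_CATEGORY_LENGTH:
        raise ValueError("category has an invalid length")
    # Content check against the assumed naming rule.
    if not CATEGORY_PATTERN.fullmatch(category):
        raise ValueError("category contains invalid characters")
    return category
```

In the endpoint, a call such as validate_category(request.get_json()) would run before the drafts comparison and the database query, with a ValueError turned into a 400 response.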
Example of attack
We have provided a sample Python web application written in Flask that accepts a post category in a JSON body in a vulnerable way. You will also find a script that exploits this vulnerability to retrieve the expected token.
Run the web application
In order to run the web application, you’ll need a MongoDB database running.
First create a virtualenv:
python -m virtualenv -p $(which python3) venv
source venv/bin/activate
Then install the dependencies:
pip install -r requirements.txt
Inject some articles by running:
python inject_articles.py
You can then run the web application this way:
python app.py
You can now get the articles in the python category with:
$> curl -X POST -H "Content-Type: application/json" -d '{"category": "python"}' "http://localhost:5000/category/"
[{"category": "python", "title": "Running js in python"}]
Or in the security category with:
$> curl -X POST -H "Content-Type: application/json" -d '{"category": "security"}' "http://localhost:5000/category/"
[{"category": "security", "title": "How to safely store password"}]
But you can’t get the articles in the drafts category:
$> curl -X POST -H "Content-Type: application/json" -d '{"category": "drafts"}' "http://localhost:5000/category/"
[]
Hack the application
The category endpoint accepts data without validating it first, so we can send a JSON object instead of a string and perform a MongoDB injection with this payload:
{"$gte": ""}
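With this payload, the equality check against drafts never matches, because a JSON object is not equal to a string, and MongoDB interprets $gte as a query operator. The endpoint therefore returns every article whose category is a string greater than or equal to the empty string, including the ones in the drafts category: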
$> curl -X POST -H "Content-Type: application/json" -d '{"category": {"$gte": ""}}' "http://localhost:5000/category/"
[{"category": "python", "title": "Running js in python"},
{"category": "security", "title": "How to safely store password"},
{"category": "drafts", "title": "My secret draft"}]
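The same request can be scripted. The following is a minimal sketch using only the standard library, not the exploit script shipped with the examples. The names build_request and fetch_articles are made up here, and the URL assumes the application runs locally on port 5000 as above.

```python
import json
import urllib.request

URL = "http://localhost:5000/category/"  # assumes the app runs locally


def build_request(category):
    """Build the POST request; category may be any JSON-serialisable value."""
    body = json.dumps({"category": category}).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_articles(category):
    """Send the request and decode the JSON list returned by the endpoint."""
    with urllib.request.urlopen(build_request(category)) as response:
        return json.loads(response.read().decode("utf-8"))


# A dict never compares equal to the string 'drafts', so the endpoint's
# check is skipped and the operator reaches the MongoDB query untouched.
INJECTION = {"$gte": ""}

# With the application running:
#     print(fetch_articles(INJECTION))
```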
Use docker
We also provide a docker-compose file that you can use if you don’t have, or don’t want to run, a local MongoDB process.
First build the images with:
docker-compose build
Then launch it with:
docker-compose up
The service is now accessible at http://localhost:5000 and you can use the same curl commands as above.
References:
- http://www.ibm.com/developerworks/library/l-sp2/index.html