Evaluation of Rulesets
Creating rulesets for web-application firewalls is a tedious task. Rulesets based on a negative security model tend to become very large, as they contain a rule for each attack pattern. Positive rulesets, on the other hand, are harder to write, as they require a lot of experience with the application they are written for.
An important question beyond that is: how good are these rulesets?
The article proposes a test-environment for evaluating rulesets written for the ModSecurity module.
Problem Statement
The ModSecurity module is a filter plugin for the well-known Apache web server. It extends Apache with the ability to define a complex set of rules for filtering malicious requests made to the server. With the tremendous power of ModSecurity, a big leap towards better protection of web servers has been made. However, the power of ModSecurity depends on the ruleset deployed with its rule engine. The ruleset defines which requests are blocked by the engine and which pass through. So the security module gives you a lot of power, but it needs to be set up properly to work well.
Evaluation Criteria
The criteria described in this article fall into two categories. The first is the detection factor, which describes how many attacks your ruleset has detected and how many of its alerts were false positives. The second is the performance factor: you want security, but you do not want to scare off your visitors with poor performance. Which of the two deserves more priority depends on many factors outside the scope of this text. So how are these factors defined? Let's take a closer look at these indicators before thinking about how to measure them.
Detection Factor
If you write a ruleset, you obviously want it to detect every attack against your web application or web server. You therefore need a reliable way to measure your ruleset's detection performance. In this article I will base this on two distinct properties:
- detection rate
- false-positive rate
Unfortunately, you will not know the number of attacks against your server in advance. If you did, the detection rate, i.e. the number of detected attacks divided by the total number of attacks, would give you a good impression of your ruleset. Although this may seem pointless at first glance, it still allows you to compare different rulesets by their detection rate when evaluating them on a common test scenario in which the attacks are known. The other important quality to take care of is the false-positive rate. False positives are legitimate requests by your visitors or customers that are marked as attacks and, depending on your rules, possibly blocked by your ruleset. There are several studies showing that visitors are quickly annoyed by web sites that do not respond or load quickly enough, or that produce errors. Such visitors are often on the verge of never coming back to your site, and blocked legitimate requests are one of the things that contribute to this. More severely, false positives can stop production environments (e.g. blocked requests in a B2B ordering application) and result in expensive outages.
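To make the comparison on a common test scenario concrete, here is a minimal Python sketch. It assumes a prepared scenario in which every request is labeled as attack or legitimate in advance, and that you record for each request whether the ruleset under test raised an alert. The function name, the parallel-list input format and the sample data are hypothetical illustrations, not part of ModSecurity or of the test environment proposed in this article.

```python
def detection_rate(is_attack: list[bool], alerted: list[bool]) -> float:
    """Detected attacks divided by all attacks contained in the test scenario.

    is_attack[i] -- ground truth for request i (known only in a prepared scenario)
    alerted[i]   -- True if the ruleset under test marked or blocked request i
    """
    if len(is_attack) != len(alerted):
        raise ValueError("both lists must describe the same requests")
    total_attacks = sum(is_attack)  # True counts as 1
    if total_attacks == 0:
        raise ValueError("the test scenario contains no attacks")
    detected = sum(1 for attack, alert in zip(is_attack, alerted) if attack and alert)
    return detected / total_attacks


# Hypothetical scenario: six requests, four of them attacks,
# replayed against two different rulesets.
scenario = [True, True, False, True, False, True]
ruleset_a = [True, True, False, False, True, True]
ruleset_b = [True, False, False, False, False, True]

print(detection_rate(scenario, ruleset_a))  # 0.75
print(detection_rate(scenario, ruleset_b))  # 0.5
```

Because both rulesets see exactly the same labeled requests, their detection rates are directly comparable, even though neither number says anything about the unknown attack volume on a live server.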
The false-positive rate is the number of wrongly blocked or marked requests divided by the total number of alerts your ruleset triggered. It is thus the fraction of false alerts, and you want it to be as small as possible, ideally 0.0.
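The definition above can be sketched in Python as follows, again assuming a test scenario with known ground truth per request. The function name, input format and sample data are hypothetical. Note that this ratio (false alerts divided by all alerts) follows the definition used in this article; other texts define the false-positive rate as false alerts divided by all legitimate requests, so it is worth stating which variant you use when comparing results.

```python
def false_positive_rate(is_attack: list[bool], alerted: list[bool]) -> float:
    """False alerts divided by all alerts the ruleset triggered.

    is_attack[i] -- ground truth for request i (known only in a prepared scenario)
    alerted[i]   -- True if the ruleset under test marked or blocked request i
    """
    if len(is_attack) != len(alerted):
        raise ValueError("both lists must describe the same requests")
    # Ground-truth labels of exactly those requests that raised an alert.
    alert_labels = [attack for attack, alert in zip(is_attack, alerted) if alert]
    if not alert_labels:
        return 0.0  # no alerts at all, hence no false alerts
    false_alerts = alert_labels.count(False)  # alerts on legitimate requests
    return false_alerts / len(alert_labels)


# Hypothetical scenario: six requests, four of them attacks.
scenario = [True, True, False, True, False, True]
ruleset_a = [True, True, False, False, True, True]   # one alert on a legal request
ruleset_b = [True, False, False, False, False, True]  # alerts only on attacks

print(false_positive_rate(scenario, ruleset_a))  # 0.25
print(false_positive_rate(scenario, ruleset_b))  # 0.0
```

Returning 0.0 when no alert was raised is a design choice of this sketch: a ruleset that never fires has no false alerts, but its detection rate will expose it, which is why both properties should be looked at together.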
Performance Evaluation
Besides the protective qualities of a ruleset, its performance is another important issue. The number of rules and their individual complexity may influence the additional time needed for processing a request through the ruleset. In this article I will refer to this additional time as the ruleset delay. To calculate it, we measure the time the server needs to receive a request, process it (possibly sending a proxy request to a backend server) and send a response back to the client; in the following, this is referred to as the processing time. The processing time is measured twice: first with the ruleset disabled, which yields a time t1, and then with the ruleset enabled, which yields t2. The ruleset delay is then given by the difference t2 - t1. For more precision, this should be repeated for a whole test scenario, in which case the average ruleset delay is calculated.
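A minimal Python sketch of the averaging step, assuming t1 and t2 have already been measured for every request of the test scenario (one possible source is Apache's `%D` log format field, which records the time taken to serve a request in microseconds). The function name and the sample timings are hypothetical.

```python
from statistics import mean


def average_ruleset_delay(t1: list[float], t2: list[float]) -> float:
    """Average of t2 - t1 over all requests of a test scenario.

    t1[i] -- processing time of request i with the ruleset disabled
    t2[i] -- processing time of the same request with the ruleset enabled
    """
    if len(t1) != len(t2) or not t1:
        raise ValueError("need one t1 and one t2 measurement per request")
    # Per-request ruleset delay, averaged over the whole scenario.
    return mean(b - a for a, b in zip(t1, t2))


# Hypothetical processing times in milliseconds for three requests.
without_rules = [12.0, 15.0, 11.0]
with_rules = [14.0, 18.0, 12.0]

print(average_ruleset_delay(without_rules, with_rules))  # 2.0
```

Since the mean is linear, this equals mean(t2) - mean(t1); pairing the measurements per request simply keeps the individual delays available, for example to spot single requests that trigger expensive rules.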