Monday, January 11, 2016

An Overview of the Upcoming libModSecurity

Notice: This blog post was originally posted on SpiderLabs Blog.

libModSecurity is a major rewrite of ModSecurity. It preserves the rich syntax and feature set of ModSecurity while delivering improved performance, stability, and easier integration on different platforms.


libModSecurity - Motivations

While ModSecurity version 2.9.0 is available on different platforms (IIS, NGINX, etc.), it really favors an Apache deployment. ModSecurity standalone is part of the ModSecurity project and is basically a wrapper that packs requests from different formats into an Apache format, so that ModSecurity can later process them in the same fashion as it does on an Apache web server. That was certainly the fastest way to have ModSecurity running on different platforms, but it came at the cost of performance and a large number of dependencies. It leads, for instance, to situations where NGINX users have to install Apache dependencies in order to have the NGINX ModSecurity module working; see metabug ModSecurity/#661 for further information. Because of those limitations, growing the project through integration with scripting languages or adoption in other platforms such as IDSs becomes very difficult.

The addition of new operators or general features is also limited because ModSecurity 2.9 (SecLanguage) uses the Apache configuration parser to load its rules. In other words, the language format and conditional syntax could not be extended because they depended on third-party software whose maintainers may not have our features in mind when they make changes. Even worse, bugs that heavily affect us are sometimes not on Apache's priority list (https://bz.apache.org/bugzilla/show_bug.cgi?id=55910).

libModSecurity - Our goals

In order to circumvent those limitations, a year ago we decided to develop a new version of ModSecurity: a version whose primary goal was not to introduce new features, but to make expansion and integration easy.

Due to the limitations of “ModSecurity standalone” and the ModSecurity v2.9 architecture, we decided to implement something from scratch. By providing a new architecture while simultaneously supporting all the features of 2.9, we were able to remove many of the limitations that have burdened the project for the past few years.

In this blog post I plan to go over some of the details on why it was important to change the architecture and how some of those changes will lead to positive outcomes. Enjoy...

More about ModSecurity standalone architecture

As explained previously, ModSecurity 2.x depends heavily on third-party projects, including Apache. Because it uses Apache internals directly, it also uses libapr (the Apache Portable Runtime). To highlight this point, note that almost all memory allocation inside ModSecurity goes through libapr memory pools. This is, of course, not a problem, unless you want to remove APR.

To get a sense of how much ModSecurity version 2.9 depends on third-party modules, have a look at Figure 1.

Figure 1. ModSecurity dependencies. (a) nginx extension. (b) IIS extension. Notice that both utilize the ModSecurity Standalone module.

On the right of Figure 1 are several dependencies that are bound to the operators or request body parsers (e.g. libxml). Some of those are mandatory in ModSecurity version 2.9.x, and the idea is that they will become optional in ModSecurity version 3: the availability of a given feature will depend on the presence of its dependency. But that is the subject for another blog post :)

On the left side of Figure 1 are the dependencies that are mandatory for the ModSecurity v2.9.x core execution; without them, the core features of ModSecurity v2.9 cannot function. Removing APR without removing the other Apache dependencies was not possible, as there are internal calls to the Apache API that demand data in APR structures (APR memory pools). Additionally, removing just the Apache dependency did not make sense either, as it would need to be replaced by something Apache-like in any event, inheriting all those limitations. So the natural step was to move to a new architecture free of both Apache and APR.

libModSecurity

The first thing that comes to mind when discussing “refactoring” of the ModSecurity core is the possibility of separating what is the “core” from the code required to interact with a given web server, or as we call it, a “connector”. Looking at the issues on GitHub today, it is difficult to tell which bug reports belong to which part of the code.

This monolithic design also puts pressure on us, its maintainers, when developing, testing, and packaging releases, because whenever we release the current version, 2.x, everything needs to be released together (all platforms, even if there is no benefit to a given platform). Splitting the core from the “connectors” seems to be the right choice. Splitting the project gives us numerous advantages, not only from the project planning perspective, but also because the library can be easily ported and manipulated in this more modular architecture.

By splitting the project between “connectors” and “core”, the core naturally becomes a library, and the connectors become consumers of the core library. This way, the ModSecurity core becomes completely independent of the underlying web server.
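To make the split a bit more concrete, here is a minimal sketch in Python. It is not libModSecurity code: the class names (Core, NginxLikeConnector), the inspect() and handle() methods, and the dictionary layout of the request are all hypothetical, chosen only to show a core that knows nothing about the web server and a connector that adapts one server's request format to it.

```python
# Hypothetical sketch: none of these names come from libModSecurity.

class Core:
    """Server-independent engine: it only sees neutral request data."""

    def __init__(self, blocked_paths):
        self.blocked_paths = set(blocked_paths)

    def inspect(self, method, uri, headers):
        # Return True when the request should be blocked.
        return uri in self.blocked_paths


class NginxLikeConnector:
    """Translates one server's request format into calls to the core."""

    def __init__(self, core):
        self.core = core

    def handle(self, raw_request):
        # All server-specific knowledge (here, a dict layout) lives in the connector.
        return self.core.inspect(
            raw_request["request_method"],
            raw_request["request_uri"],
            raw_request.get("headers_in", {}),
        )


core = Core(["/admin"])
connector = NginxLikeConnector(core)
print(connector.handle({"request_method": "GET", "request_uri": "/admin"}))
```

A connector for another server would only need its own handle() method; the Core class would stay untouched.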

Another goal of the project was to reuse as much code as possible. This was done for two main reasons: (1) the code was tested and proven to work, and (2) we didn’t need to implement everything from scratch. So we first implemented a very minimalistic LALR parser (SecLanguage) to read the rules and transform them into C++ objects in memory. That limited language was slowly expanded as we added more operators, transformations and variables.

During the development of the core, other important utilities were also created. Two of those utilities deserve special attention, namely, the regression and unit test utilities. Both have a very important role in the development of ModSecurity version 3 and probably they will be even more important for the maturity of the project. In the future there will be a specific blog post to cover the regression and unit tests inside ModSecurity version 3.

One of the challenges of ModSecurity version 3 was to rewrite the SecRules but return exactly the same results as ModSecurity v2.9.x, thereby avoiding a compatibility break. This meant that we also had to reproduce the corner cases and unexpected behaviors of v2.9.x. This has been achieved, so far, through the regression tests, where the requests are described in JSON and the expected results are established from analysis of ModSecurity 2.9.

Going beyond the tests, we needed to see ModSecurity version 3 working in practice as part of a web server. We chose NGINX as the first web server to fully integrate ModSecurity version 3. For that we started a different GitHub project, called ModSecurity-nginx (https://github.com/SpiderLabs/ModSecurity-nginx).
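As a rough illustration of the idea, the sketch below describes one request and its expected outcome in JSON and compares them against an engine. The JSON field names, the fake_engine() stand-in, and the run_case() helper are assumptions made for this example; they are not the actual format or tooling used by the ModSecurity regression tests.

```python
import json

# Hypothetical test-case layout; the field names are illustrative only.
TEST_CASE = json.loads("""
{
  "title": "block requests to /admin",
  "request": {"method": "GET", "uri": "/admin", "headers": {"Host": "localhost"}},
  "expected": {"http_code": 403}
}
""")


def fake_engine(request):
    # Stand-in for the engine under test; a real run would call the library.
    return {"http_code": 403 if request["uri"] == "/admin" else 200}


def run_case(case, engine):
    """Return (passed, actual), comparing each expected field to the engine output."""
    actual = engine(case["request"])
    passed = all(actual.get(key) == value for key, value in case["expected"].items())
    return passed, actual


passed, actual = run_case(TEST_CASE, fake_engine)
print(TEST_CASE["title"], "->", "ok" if passed else "FAILED: %r" % actual)
```

Because the expected values are plain data, they can be recorded once from the old version and replayed against the new one.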

Figure 2 contains the dependencies of “ModSecurity NGINX connector”.


Figure 2. Dependencies of ModSecurity NGINX connector for libmodsecurity.

The ModSecurity NGINX connector, together with libModSecurity, was presented at NGINX.conf 2015. At the conference we discussed various aspects of the ModSecurity connector and obtained valuable feedback from the community to shape its development.

Current status?


Currently we have the major features of libModSecurity implemented and ready to use. Although we don’t have support for collections yet, it is possible to load the OWASP core rule set version 3.0.0-dev.

So far, we are 374 commits ahead of master, supporting almost all ModSecurity 2.x features. Figure 3 contains a punch card with the commits to the project so far.


 Figure 3. Commits punch card.


Testability

When starting version 3, one of our main concerns was project quality, which is why most of the features come with regression tests and/or unit tests. These tests can be executed on the developer’s machine, but they are also executed on our BuildBots.

ModSecurity version 2.9.x already has its own regression and unit test utilities; however, they are very slow. A test run with these utilities takes around 1 hour, depends on Perl scripts, is Unix only, and depends on the web server.

With version 3 we forked the tests used in 2.9 into a separate GitHub project, and we added it inside ModSecurity by utilizing a Git subtree. Having the test cases migrated into a separate project gives us the flexibility to run different tests with different versions of ModSecurity.

We also created another subset of tests that is specific to the connector. This was done so we can segregate potential problems between the core and any connectors. In the specific case of NGINX, the regression tests were built as an extension of the NGINX server regression tests.

All these tests are executed for each commit that is performed on GitHub. This gives us the capability to promptly identify and fix any problem during the development of ModSecurity v3. It also gives us the ability to mimic problems without needing an entire web server configuration.

 Figure 4. Buildbot in the shape that we want to see, all green.

Performance

Performance is something that we are constantly measuring during the development of version 3. Instead of using the DebugLogs with performance time stamps, we created SystemTap scripts that allow for real-time instrumentation. SystemTap also allows the creation of flame charts, as demonstrated in Figure 5.

Figure 5. ModSecurity version 3 execution time using OWASP CRS version 3.0.0-dev.

Figure 5 illustrates the mean times of each rule, grouped into different phases. Notice that the entire execution of the OWASP CRS took 483 microseconds. That subject deserves a blog post of its own, together with the possibility of rule optimization.
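For readers who want to do a similar aggregation on their own measurements, here is a small sketch of computing the mean time per rule, grouped by phase. The sample tuples, the rule IDs and the timings are made up for the example; they are not the data behind Figure 5, and the function name is only illustrative.

```python
from collections import defaultdict

# Hypothetical samples: (phase, rule_id, elapsed_microseconds).
samples = [
    (1, 910000, 4.0),
    (1, 910000, 6.0),
    (2, 920100, 10.0),
    (2, 920100, 14.0),
    (2, 920200, 3.0),
]


def mean_times_by_phase(samples):
    """Return {phase: {rule_id: mean elapsed time}}."""
    buckets = defaultdict(lambda: defaultdict(list))
    for phase, rule_id, elapsed in samples:
        buckets[phase][rule_id].append(elapsed)
    return {
        phase: {rule_id: sum(times) / len(times) for rule_id, times in rules.items()}
        for phase, rules in buckets.items()
    }


means = mean_times_by_phase(samples)
for phase in sorted(means):
    print("phase", phase, means[phase])
```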

What is next?

As mentioned, libModSecurity still isn’t feature complete when compared to ModSecurity version 2.9.x. Your help is more than welcome to support that initiative. Currently, we are missing Connectors, Operators, Transformations, and Variables. How do you want to help?

Connectors

As mentioned before, we currently only have a connector for NGINX. In other words, Apache users will not have any benefit from ModSecurity v3, at least, not yet.

The development of the connectors for IIS and Apache will start as soon as we have a version of ModSecurity v3 released, unless someone from the community starts to develop it :)

Operators, Transformations and Variables

A question that we frequently receive is how to start coding for ModSecurity. Well, this is a good opportunity. The missing features in ModSecurity version 3 are not very difficult to implement; plus, many of them already contain a description of what needs to be done. Some even have the file already, and just need to be filled in. For a complete list of missing features with descriptions, check here:

Although all the operators needed by the ModSecurity SLR commercial rules are already supported inside ModSecurity version 3, for now we encourage only advanced users to use it. ModSecurity version 3 has not been released yet and is therefore not considered stable. Once it is released it will be 100% compatible with our SLR commercial rules.

Testing!

At this stage of the development, testing is very important. We count on the community to provide feedback and report any issues with ModSecurity version 3. We are aiming to have a release candidate soon, ideally with only a small number of issues.


ModSecurity Python Bindings: Parsing ModSecurity rules from Python

Notice: This blog post was originally posted on SpiderLabs Blog.


One of the good things about the next generation of ModSecurity, libModSecurity (AKA ModSecurity version 3), is the fact that it is portable to almost any platform. This extensibility makes the use of bindings for other languages, beyond C/C++, very useful. For those who are unfamiliar with the concept of a ‘binding’, the term describes an interface between two different languages. In the case of ModSecurity, the binding provides an interface that allows you to use libModSecurity functionality inside your scripting language of choice, with negligible loss of performance.

The libModSecurity Python bindings may serve many purposes; for instance, they could be leveraged to rapidly (and easily) create a custom web interface for ModSecurity. However, we are not limited to simply displaying information: the new bindings give you nearly all the capabilities that you would have interacting with libModSecurity from native C, allowing you to, for example, display ModSecurity rules in any fashion that you desire, with almost zero effort. A simple usage is demonstrated in Figure 1.


Figure 1. ModSecurity being used inside a Python shell.

Bindings for languages other than Python will also be created. In fact, the utility that was used to create the Python bindings, SWIG, can also generate interfaces to other languages, such as Ruby. You are welcome to volunteer to bring ModSecurity capabilities to your favorite language.

In this blog post I will dive into the creation of a simple console utility to load and list a set of rules in an elegant way. For this I will start with the compilation of the ModSecurity-Python-bindings.

Notice: The ModSecurity Python bindings depend on libModSecurity, which is publicly available on the ModSecurity GitHub but is not yet considered a stable release.

Installing ModSecurity Python Bindings

Before starting the installation of the ModSecurity Python bindings, make sure you have libModSecurity installed and operational on your machine. For further information about libModSecurity and how to install it, please see the README at ModSecurity’s GitHub repository, here: https://github.com/SpiderLabs/ModSecurity/tree/libmodsecurity . Of course, you will also want to make sure you have Python installed, along with all the developer utilities you may need on your Linux system.

Once libModSecurity is installed, it is time to proceed with the installation of the ModSecurity Python bindings. Simply download the code from our GitHub repository, and proceed with the compilation, as demonstrated below:
  $ git clone http://www.github.com/SpiderLabs/ModSecurity-Python-bindings
  $ cd ModSecurity-Python-bindings
  $ make
  $ sudo make install


Once this process is finished, it is a good idea to test your new installation. You can do this by running the provided test script with the following command:
  $ ./test/t.py

If everything is alright, you should not get any error messages from Python.
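If you prefer a quick check from Python itself, the sketch below only verifies that a module named modsecurity (the name imported in the scripts that follow) can be found on the import path. It assumes Python 3.4 or newer for importlib.util.find_spec(); the bindings_available() helper is a name made up for this example and is not part of the bindings.

```python
import importlib.util


def bindings_available(module_name="modsecurity"):
    """Return True if the named top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if bindings_available():
    print("modsecurity bindings found")
else:
    print("modsecurity bindings not found; check the installation step")
```

Note that this only tells you the module is discoverable; it does not replace running the test script.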


The hello world script

With the ModSecurity Python bindings installed and tested, it is time to create your first ‘hello world’ application using the ModSecurity library. Let’s start with something very simple, like checking the ModSecurity version that we are playing with.

Inside libModSecurity we expose the whoAmI() method, which describes which version of ModSecurity you are bound to. Further information about this method can be found here: https://github.com/SpiderLabs/ModSecurity/blob/libmodsecurity/src/modsecurity.cc#L67-L78

Let’s use the whoAmI() method to print the libModSecurity version. We demonstrate proper usage in Script 1 below.

#!/usr/bin/python

from modsecurity import *

# Create the main ModSecurity object and ask it to identify itself.
modsec = ModSecurity()
s = "Hello World, I am: " + str(modsec.whoAmI())
print(s)
Script 1. Hello world using libModSecurity.

As output, you should see something similar to Figure 2.

Figure 2. The output of the hello world script.

Loading the rules

The next step is to add a few more pieces to our script allowing it to actually load a given set of rules. Make sure you have a workable set of ModSecurity rules before you start to code. A good set for this example might be the OWASP CRS 3.0 ruleset, available here: https://github.com/SpiderLabs/owasp-modsecurity-crs/tree/v3.0.0-dev

Notice: Make sure you download the version 3.0.0-dev. Our parser is not 100% compatible with ModSecurity v2.9.x yet. It may not work with other versions of CRS.

Using our first example, start by commenting out the “Hello World...” string and the associated print statement. These two pieces won’t be necessary for this step. Your script should look similar to the example provided in Script 2.

#!/usr/bin/python

from modsecurity import *

modsec = ModSecurity()
#s = "Hello World, I am: " + str(modsec.whoAmI())
#print(s)
Script 2. Hello World script with the unnecessary lines commented out.

The rules in ModSecurity are loaded through a Rules object. While the Rules object may be merged with other objects of the same type, in this script let’s keep it simple. For this example we just need to load a set of rules from a file and print them to the console. Again, a very simple utilization for libModSecurity.

We can load the rules using the loadFromUri() method, which takes one argument, the path to a rules file: “loadFromUri('/path/to/the/rules/file.txt')”. The method loads the set of rules into memory and returns “-1” if there are any problems while loading them. If this occurs, you can call the getParserError() method to get more information about the error. Script 3 demonstrates the call.

#!/usr/bin/python

from modsecurity import *

modsec = ModSecurity()
#s = "Hello World, I am: " + str(modsec.whoAmI())
#print(s)
rules = Rules()
r = rules.loadFromUri("/path/to/your/v3.0.0-dev/rules/REQUEST-10-IP-REPUTATION.conf")
Script 3. Loading the rules from a given file.
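If you would rather deal with exceptions than with a “-1” return value, a thin wrapper is one option. The sketch below relies only on the loadFromUri() and getParserError() methods described above; the RulesLoadError exception, the load_rules_or_raise() helper and the FakeRules stand-in are hypothetical names introduced for this example so that it runs without the bindings installed.

```python
class RulesLoadError(Exception):
    """Raised when a rules file cannot be parsed."""


def load_rules_or_raise(rules, path):
    """Load `path` into `rules`, raising with the parser message on failure.

    `rules` is any object offering loadFromUri() and getParserError().
    """
    if rules.loadFromUri(path) == -1:
        raise RulesLoadError("%s: %s" % (path, rules.getParserError()))
    return rules


class FakeRules:
    # Stand-in used only so the sketch runs without the bindings installed.
    def loadFromUri(self, path):
        return 0 if path.endswith(".conf") else -1

    def getParserError(self):
        return "simulated parser error"


load_rules_or_raise(FakeRules(), "example.conf")
try:
    load_rules_or_raise(FakeRules(), "broken.txt")
except RulesLoadError as err:
    print("could not load rules:", err)
```

With the real bindings you would pass a Rules() object instead of FakeRules().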

The Rules object provides the getRulesForPhase() method, which is called as follows: “rules.getRulesForPhase(phase_number)”. Calling this method will return a vector of Rule objects. The Rule object contains all the properties associated with a given rule. To list all the rules from the target file that are assigned to a given phase, you may simply iterate over the value returned by getRulesForPhase(). This is demonstrated in Script 4.


#!/usr/bin/python

import sys

from modsecurity import *

modsec = ModSecurity()
#s = "Hello World, I am: " + str(modsec.whoAmI())
#print(s)
rules = Rules()

# loadFromUri() returns -1 when the rules cannot be loaded.
r = rules.loadFromUri("/path/to/your/v3.0.0-dev/rules/REQUEST-10-IP-REPUTATION.conf")
if r == -1:
    print(rules.getParserError())
    sys.exit(1)

# Walk every phase and list the rules attached to it.
for i in range(modsec.NUMBER_OF_PHASES):
    print("-- Phase " + str(i))
    for x in rules.getRulesForPhase(i):
        # Skip entries whose rule id is 0.
        if x.rule_id == 0:
            continue

        print("    Rule Id: " + str(x.rule_id))
        print("       From: " + str(x.m_fileName) + " at " + str(x.m_lineNumber))
Script 4. Printing all rule IDs from a ModSecurity configuration file.

The output of Script 4 should appear similar to the output illustrated below in Figure 3.

Figure 3. The output of the script, which prints the rule IDs in the console.

Of course, this script can be extended to print this information in any manner desired. A slightly more refined example is illustrated in Figure 4. Remember that, as this is Python, it is easy to display this information as part of a GUI or web application.

Figure 4. ModSecurity rules information extracted using the libModSecurity parser, inside a Python script.
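As one possible way to get a tidier listing, the sketch below renders the same three attributes used in Script 4 (rule_id, m_fileName and m_lineNumber) as an aligned plain-text table. It is not the code behind Figure 4: the format_rule_table() helper, the FakeRule stand-in and the demo data are assumptions made so the example runs without the bindings.

```python
from collections import namedtuple

# Stand-in for the Rule objects returned by getRulesForPhase(); it only has
# the three attributes used in Script 4.
FakeRule = namedtuple("FakeRule", "rule_id m_fileName m_lineNumber")


def format_rule_table(rules_by_phase):
    """Render {phase: [rule, ...]} as an aligned plain-text table."""
    lines = ["%-5s  %-8s  %s" % ("Phase", "Rule Id", "Location")]
    for phase in sorted(rules_by_phase):
        for rule in rules_by_phase[phase]:
            if rule.rule_id == 0:  # same filter as Script 4
                continue
            location = "%s:%s" % (rule.m_fileName, rule.m_lineNumber)
            lines.append("%-5d  %-8d  %s" % (phase, rule.rule_id, location))
    return "\n".join(lines)


demo = {
    1: [FakeRule(0, "example.conf", 1), FakeRule(910000, "example.conf", 12)],
    2: [FakeRule(910100, "example.conf", 30)],
}
print(format_rule_table(demo))
```

With the real bindings, the dictionary could be built by calling rules.getRulesForPhase(i) for each phase, as Script 4 does.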


Conclusions

The new ModSecurity Python bindings should make it easy and fast to utilize libModSecurity as demonstrated in the examples above. These Python bindings are just one of a host of new features we will be presenting in upcoming blog posts, so be on the lookout.

The fast prototyping that a scripting language provides, combined with the performance of a C++ core, opens a wide range of possibilities, such as building a rule editor in Python and/or a Django web application to navigate the rules. In fact, why not implement those as open source projects?

Notice that those are just simple examples. It is just as easy to use these bindings to actually process a transaction, but this is left as an exercise for the reader. Enjoy :)