How To Fly Under The Radar Of Mass Internet Scanners

TL;DR

With these custom IDS rules for Suricata it's possible to evade a lot of the bots scanning the internet by checking connection details, such as whether the TLS SNI field is set correctly or whether an HTTP request was sent to an IP address instead of a domain. That makes it much easier to filter the usual internet noise out of your regular IDS events, and it keeps your server from being listed on resources like Censys or Shodan. The latter allows bulk export of information about the servers it has scanned on the internet, which attracts even more bot activity.

Intro

It has become apparent that one of the main issues of running an IDS is the amount of background internet noise generated every minute of the day. That in turn makes it hard to figure out whether our infrastructure is being targeted specifically for compromise or whether it's just another scanner looking for easy targets.

There are multiple free and paid services on the internet where anyone can look up information about a particular IP address and see what kinds of services, and which versions, are running on the system. That is only half of the issue. The other half is that the internet is full of compromised infrastructure that is part of botnets looking for new targets to exploit.

Censys gladly showing information they have about our server.

In this writeup we are going to look at what we can do about this situation. With software like Suricata we can analyze every connection to our server, be it at the TCP, HTTP or SMTP layer, compare its content to a predefined set of IDS signatures, and block the connection mid-flight if it matches anything we'd like to block.

That leaves us with a question of what to block.

The Bad

While there is surely a lot to block in what comes from the internet on a regular basis, our goal here is to drop scanners in particular. Everyone knows a famous scanner like Nmap. It is great for many things, has a lot of scanning modes and even its own plugin system. This time around, however, we are not focusing on its many TCP port scanning techniques but on HTTP requests in general, which are mostly the same among all scanners.

For example, this is what a request looks like when it's sent by another great scanning tool called masscan.

GET / HTTP/1.0
User-Agent: masscan/1.0 (https://github.com/robertdavidgraham/masscan)
Accept: */*

It's easy to identify it as suspicious: it lacks a Host header in the HTTP request and, well, it clearly is not hiding the fact that it's masscan. But we also know that anyone can modify masscan, which is open source, to include a Host header and change the User-Agent field to something else. All of a sudden signature rules won't match anything, and the operator of the scanner will know we are hosting a web server.

That's why we are going to focus on the only thing we can use to identify scanning attempts: all these scanner operators don't know what we are actually hosting. Specifically, they don't know the domain name of the server they are connecting to, so the Host header in the HTTP request is most likely going to be the plain IP address of the server. There are only a few practical ways of knowing the domain beforehand. The easiest is when the server's IP address has a fully qualified domain name set as its PTR record, which of course is rarely the case unless email is involved. If the server has a Let's Encrypt certificate, then the domain is public knowledge, see crt.sh. This is a common thing these days, but it still requires quite a bit of infrastructure on the ~~attackers~~ scanner operator's part to handle a list of all those millions of domains, which is way more work than generating a list of IP addresses. The last technique is checking the Common Name field of the TLS certificate returned by the target server, but we will get to that.

So the first part we identified is an IP address in the Host header. The second part comes with TLS, which is widely used these days. Whenever anyone connects to a server using TLS, they first send a Client Hello handshake packet, which may contain the domain name the client wants to securely connect to. Scanners most likely have no idea which domains are hosted on the server, so they omit this field altogether. The server will still return its default TLS certificate (most likely it only has one), and the client can finish the TLS handshake and start communicating with the server securely. What we are going to do is refuse connections from those who haven't set this field. This will still work for anyone legitimately visiting our web page with a browser: they know the domain and enter it, and the browser does the handshake and checks that the certificate really belongs to the site you are trying to visit. Rest assured this will break a lot of custom code using TLS, and most likely protocols like SMTPS or IMAPS. There are a lot of implementations out there that use TLS just to encrypt traffic and don't bother setting extra TLS fields. This time around we focus on protecting web servers from scanners, which is easy to do by setting the signatures to trigger only on the HTTPS port, 443.
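To see the difference from the client side, here is a minimal sketch using Python's standard ssl module. It produces a Client Hello in memory, without any network connection, once with and once without a server name. The function name client_hello_bytes and the domain example.org are made up for illustration, and certificate checks are disabled only because no server is involved.

```python
import ssl
from typing import Optional


def client_hello_bytes(server_hostname: Optional[str]) -> bytes:
    """Return the raw bytes a TLS client would send first (its Client Hello)."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # No real server here, so skip certificate and hostname verification.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    # Memory BIOs stand in for the socket: outgoing collects what would be sent.
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = context.wrap_bio(incoming, outgoing, server_hostname=server_hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its Client Hello and waits for a reply.
        pass
    return outgoing.read()


with_sni = client_hello_bytes("example.org")
without_sni = client_hello_bytes(None)

# The server name travels in clear text inside the Client Hello.
print(b"example.org" in with_sni)
print(b"example.org" in without_sni)
```

A client that never passes a hostname, like the second call, is exactly the kind of connection the signatures below are meant to drop.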

Now onto actual signatures.

The Rules of the Rulers

Before writing our own rules, let's see what Suricata comes packaged with. There are about 47 rules for various anomalies in HTTP requests, which the IDS engine detects while processing HTTP headers. For now we are only going to enable these two and set them to drop when the Host header is either missing or not needed, like when using the CONNECT method.

SURICATA HTTP missing Host header
SURICATA HTTP Host header ambiguous

A good place to start when writing your own IDS signatures is the excellent Suricata documentation on the topic. First we are going to block things connecting to clear-text HTTP ports (we need to redirect those clients to TLS somehow, right?), and that is almost easy enough.

Header of our rule looks like this:

# Drop HTTP protocol level connections not coming from our own servers ( i.e. from the internet ),
# from any TCP port, going to our HTTP servers on the http port.
drop http !$HOME_NET any -> $HTTP_SERVERS 80

And here is the breakdown of our IDS Rule:

# Work on established TCP connections, incoming to our server:
flow: established,to_server;
# Check that Host header contains a dot, somewhere between 1 and 3 bytes deep.
content: "."; http_host;offset: 1;depth: 3;
# Then check if there is another, 4 bytes deep from last match.
content: "."; http_host;within: 4;
# If we got this far, check for another dot.
content: "."; http_host;within: 4;
# Here comes the heavy part: use a regex to check whether Host really looks like an IP address.
# It's expensive to do, but it's only done once we are fairly sure it already looks like an IP address.
# Also note that it's done on the http.host buffer, which is the normalized version, meaning a trailing
# :80 port has already been removed by the IDS engine, if it was there to begin with.
pcre: "/^(?:\d{1,3}\.){3}\d{1,3}$/W";
# These are not needed, but nice to have: they flag the whole connection, so other rules in the IDS engine
# can decide to do something else, knowing that the connection they are dealing with has 'Host as IP address'.
# If we are blocking the connection from here on, however, these flowbits won't be useful.
flowbits: isnotset, IDPS_IP_HOST;
flowbits: set, IDPS_IP_HOST;

Note that '#' comments are not valid inside an IDS signature, which makes annotated rules like these hard to write in rule files. So don't copy these examples; get our signatures directly from the source here.
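For experimenting with the annotated fragments anyway, a small helper can strip the comment lines and join the remaining options into a single-line signature. This is a hypothetical sketch, not the tooling behind the published signatures: the function name assemble_rule, the msg text and the sid value are placeholders chosen for illustration.

```python
def assemble_rule(header: str, annotated_body: str, msg: str, sid: int) -> str:
    """Join a rule header and an annotated option list into a one-line rule."""
    options = []
    for line in annotated_body.splitlines():
        line = line.strip()
        # Skip blank lines and the explanatory comments used in this post.
        if not line or line.startswith(("#", "//")):
            continue
        options.append(line)
    # msg, sid and rev are standard rule options; the values are placeholders.
    return f'{header.strip()} (msg:"{msg}"; {" ".join(options)} sid:{sid}; rev:1;)'


body = """
# Work on established TCP connections, incoming to our server:
flow: established,to_server;
# Check that Host header contains a dot, somewhere between 1 and 3 bytes deep.
content: "."; http_host;offset: 1;depth: 3;
"""

print(assemble_rule(
    "drop http !$HOME_NET any -> $HTTP_SERVERS 80",
    body,
    "Host header is an IP address",
    1000001,
))
```

The output is one line with the options in their original order, which matters because keywords like 'within' are relative to the previous match.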

So far so good: any incoming HTTP request to the server with an IP address as Host will be stopped dead in its tracks before the server receives it, so the scanner won't get a reply at all and its request will time out.

Now on to more advanced stuff: figuring out how to stop a TLS handshake. The best help here is Wireshark, which breaks a protocol down into fields with their respective names and lengths. It is an excellent tool and is crucial when writing good IDS signatures.
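The logic of that rule can be sketched outside the IDS as well. The Python below mirrors the two stages, a cheap check for dots at plausible positions followed by the regex, and emulates the port stripping that Suricata does itself. The function names are made up for this example, and like the rule it only handles IPv4 and only checks the shape of the value, so something like 999.999.999.999 also counts.

```python
import re

# Same expression as in the rule; re.ASCII keeps \d limited to 0-9.
IPV4_SHAPE = re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$", re.ASCII)


def normalize_host(host_header: str) -> str:
    """Lowercase the value and drop a trailing :port, roughly like http.host."""
    host = host_header.strip().lower()
    name, sep, port = host.rpartition(":")
    if sep and port.isdigit():
        return name
    return host


def has_three_close_dots(host: str) -> bool:
    """Cheap pre-filter, mirroring the three 'content: "."' matches."""
    first = host.find(".", 1, 4)                 # offset 1, depth 3
    if first == -1:
        return False
    second = host.find(".", first + 1, first + 5)    # within 4
    if second == -1:
        return False
    return host.find(".", second + 1, second + 5) != -1  # within 4


def host_looks_like_ip(host_header: str) -> bool:
    host = normalize_host(host_header)
    # Only run the regex when the cheap checks already passed.
    return has_three_close_dots(host) and IPV4_SHAPE.match(host) is not None


for value in ("203.0.113.7", "203.0.113.7:80", "example.org", "www.example.org:8080"):
    print(value, host_looks_like_ip(value))
```

The pre-filter never rejects a value the regex would accept, since every dot in an IPv4-shaped string is at most three digits away from the previous one.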

Content of TLS packet with an explanation for every field as seen in Wireshark. Blue are fields checked by rule, green are fields checked for extension type.

The rule header we define as follows:

# Drop TCP level connection coming not from our server ( internet ), from any TCP port
# incoming to our HTTP servers, ports 443 and 4443.
drop tcp !$HOME_NET any -> $HTTP_SERVERS [443,4443]

And then on to the rule body:

# Work on established TCP sessions, incoming to our server.
flow: established,to_server;
# Match a TLS record of type 0x16 ( handshake ) with version 1.0, 0x0301.
content: "|16 03 01|";depth: 3;
# Skip 2 bytes ( that's the record length ) and match 0x01, indicating this is a TLS Client Hello message.
content: "|01|";distance: 2;within: 1;
# Read 1 byte ( the Session ID length ) and skip that many bytes,
# but do that at a 37 byte offset ( skipping over message length, version field and 32 byte random ).
byte_jump: 1, 37, relative, big;
# Next read 2 bytes ( the cipher suites length ) and skip over the cipher suites:
byte_jump: 2, 0, relative, big;
# Next read 1 byte ( the compression methods length ) and skip over the compression methods:
byte_jump: 1, 0, relative, big;
# Read two bytes ( the extensions length ) and save them in the ext_len variable.
byte_extract: 2, 0, ext_len, relative, big;
# And finally we repeatedly check the extension type with an inverted match for 0x0000,
# the type of the TLS SNI extension, and then jump over the extension to the next one.
# 'isdataat' checks if there is still content in the packet to work with.
content: !"|00 00|";distance: 0;within: 2;isdataat: 1, relative;
byte_jump: 2, 2, relative, big;
content: !"|00 00|";distance: 0;within: 2;isdataat: 1, relative;
byte_jump: 2, 2, relative, big;
content: !"|00 00|";distance: 0;within: 2;isdataat: 1, relative;
byte_jump: 2, 2, relative, big;
content: !"|00 00|";distance: 0;within: 2;isdataat: 1, relative;
byte_jump: 2, 2, relative, big;
content: !"|00 00|";distance: 0;within: 2;isdataat: 1, relative;
byte_jump: 2, 2, relative, big;
# If this rule triggers, don't generate an event for another 10 seconds if it comes from the same IP.
# That is to prevent us from being flooded with events if such attempts repeat.
threshold: type limit, track by_src, seconds 10, count 1;

If IDS rule syntax had any flow control, this would have been a somewhat easier task. Suricata does have Lua support, which could greatly help us in a task like this, but that is a blog post for another time.

The end result is an IDS signature that matches a TLS "Client Hello" packet without the SNI extension set, at least not within the first 5 extensions of the packet. In a similar way we also create a rule to block connections where the content of SNI is an IP address. Here, however, we don't need to traverse the packet "manually", since Suricata has already decoded the fields for us.
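To illustrate what flow control buys us, here is the same traversal as a Python sketch with a loop over all extensions instead of five unrolled checks. It is not a Suricata script. It assumes the whole Client Hello arrives in a single TLS record, and both function names, as well as the synthetic test packet, are made up for this example.

```python
import struct
from typing import Optional

SNI_EXTENSION_TYPE = 0x0000  # server_name


def client_hello_has_sni(record: bytes) -> bool:
    """Walk a Client Hello the way the rule does and look for extension 0x0000."""
    # Record header: type, 2 byte version, 2 byte length. Then handshake type.
    if len(record) < 6 or record[0] != 0x16 or record[5] != 0x01:
        raise ValueError("not a TLS Client Hello record")
    # Skip record header (5), handshake type (1) and the same 37 bytes as the rule.
    pos = 5 + 1 + 37
    pos += 1 + record[pos]                       # Session ID
    (suites_len,) = struct.unpack_from("!H", record, pos)
    pos += 2 + suites_len                        # cipher suites
    pos += 1 + record[pos]                       # compression methods
    if pos + 2 > len(record):
        return False                             # no extensions at all
    (ext_len,) = struct.unpack_from("!H", record, pos)
    pos += 2
    end = min(pos + ext_len, len(record))
    while pos + 4 <= end:                        # the loop the rule has to unroll
        ext_type, size = struct.unpack_from("!HH", record, pos)
        if ext_type == SNI_EXTENSION_TYPE:
            return True
        pos += 4 + size
    return False


def build_client_hello(server_name: Optional[str]) -> bytes:
    """Build a minimal synthetic Client Hello, for demonstration only."""
    groups = struct.pack("!HH", 2, 0x001D)       # supported_groups: x25519
    extensions = struct.pack("!HH", 0x000A, len(groups)) + groups
    if server_name is not None:
        host = server_name.encode("ascii")
        entry = struct.pack("!BH", 0, len(host)) + host   # name type 0: host_name
        sni = struct.pack("!H", len(entry)) + entry
        extensions += struct.pack("!HH", SNI_EXTENSION_TYPE, len(sni)) + sni
    body = (
        b"\x03\x03" + bytes(32)                  # version and random
        + b"\x00"                                # empty Session ID
        + struct.pack("!H", 2) + b"\x13\x01"     # one cipher suite
        + b"\x01\x00"                            # null compression
        + struct.pack("!H", len(extensions)) + extensions
    )
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


print(client_hello_has_sni(build_client_hello("example.org")))
print(client_hello_has_sni(build_client_hello(None)))
```

Unlike the signature, the loop is not limited to the first five extensions, which is the main thing the rule syntax cannot express.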

flow: established,to_server; tls_sni;
content: ".";offset: 1;depth: 3;
content: ".";within: 4;
content: ".";within: 4;
# check if SNI looks like an IP address
pcre: "/^(?:\d{1,3}\.){3}\d{1,3}$/";
flowbits: isnotset, IDPS_IP_SNI;
flowbits: set, IDPS_IP_SNI;
threshold: type limit, track by_src, seconds 10, count 1;

We can't use this decoded 'tls_sni' buffer in our previous rule, because apparently Suricata doesn't run rules that match on buffers in an undefined state. No SNI set, no rule matching on the 'tls_sni' buffer.

So how do we fare with this rule set on action drop?

The Results

If you are reading this, then your browser set the SNI extension correctly and you visited this blog by its domain name, which means the connection wasn't dropped as a suspected scanner.

Our new IDS rule triggered multiple times in just a couple of hours

While there are various networks in there, the Censys one stands out by its name; so nice of them to name their network's IP addresses. Only a couple of days after applying these signatures we are effectively running under the radar of these mass scanners. Shodan has yet to rescan our server, because at the time of writing their database still contains old data.

Censys - "Best Attack Surface Management for the Cloud", thank you, but we prefer to stay under the surface.

It's too early to say by how much these signatures reduce bot activity on the network, but once some time has passed we are going to publish that information.

Of course, ~~attackers~~ scanner operators can start setting random strings in TLS handshakes, check which Common Name the server returns in response, and then switch to that. But as long as they don't do that, we can have a break from a lot of useless IDS logs. In the future there will be things like encrypted SNI, to which we will have to adapt our intrusion detection techniques. [::]