Thursday, July 14, 2016

CTF for better IDS! Part one.

Having now had some experience of using osquery in production, coupled with Splunk, to build a centralised IDS, I began to turn my thoughts to what exactly I needed to be detecting.

The problem often seen with IDS implementations (if not the products themselves) is that, no matter where they sit in a given stack, they tend to become "Everything is OK!" alarms: they generate nothing but varying levels of noise and give you a warm, fuzzy, false sense of security based only on the successful execution of your existing controls. I've had a great many conversations with people at various positions in IT who think that IDS consists of logging blocked connections at firewalls, failed auth attempts and bad HTTP requests, and making them all into a pretty graph...


Don't get me wrong, those data are important for profiling your attacker and detecting the reconnaissance stage of the kill chain, but in isolation they don't provide much value apart from a nice way of seeing your investment in shiny firewalls pay off; blocked connections, locked accounts and failed HTTP requests don't steal your data.

With that in mind, let's look at an exercise where successfully attacking a host is the objective: a CTF. I've been playing with boot2root and other attackable image varieties for a few years now, and the next steps seemed pretty logical...

1. Pick a nice CTF image.
2. Root it.
3. Install OSQuery with a reasonably broad config.
4. Get OSQuery events into Splunk.
5. Clean up any artifacts of your rooting...
6. Re-hack using the same steps as in 2.
7. Analyse the lovely data.
8. Build better alerting and correlation based off the observed patterns during the attack.

I'm up to step 6 now. Given the variety of host OSes on which these challenges are built, getting OSQuery to play nicely wasn't easy; you may find yourself chasing dependencies for a while. Anyway, I have a couple of CTF images, one a boot2root and the other more of a vulnerability busybox, both now running queries against the users, processes and sockets tables in OSQuery and feeding their results into Splunk. I'm also pushing httpd requests and errors, in order to get a side-by-side picture of cause and effect between attacking the web application and any OS artifacts that generates.
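For anyone wanting a starting point, here is a rough sketch of what a schedule covering those three tables could look like, generated from Python so it is easy to tweak per image. The query names, column selections and 60 second intervals are my assumptions for illustration, not the exact config running on the targets; process_open_sockets stands in for "sockets" here.

```python
import json

# Hypothetical schedule: names, columns and intervals are illustrative only.
SCHEDULE = {
    "users": {
        "query": "SELECT uid, gid, username, shell, directory FROM users;",
        "interval": 60,
    },
    "processes": {
        "query": "SELECT pid, name, path, cmdline, uid, parent FROM processes;",
        "interval": 60,
    },
    "sockets": {
        "query": (
            "SELECT pid, local_address, local_port, "
            "remote_address, remote_port FROM process_open_sockets;"
        ),
        "interval": 60,
    },
}


def build_config(schedule):
    """Wrap a schedule in the top-level layout osquery expects in its config file."""
    return {
        # Report the hostname as hostIdentifier in every logged result.
        "options": {"host_identifier": "hostname"},
        "schedule": schedule,
    }


if __name__ == "__main__":
    # Redirect this output to wherever the osquery config lives on the image.
    print(json.dumps(build_config(SCHEDULE), indent=2))
```

Generating the file rather than hand-editing it makes it a bit less painful to keep several awkward CTF images on the same set of queries.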

I also have a rough timeline of various fields from queries in Splunk, set out like this:

index=* sourcetype=_json name!=info name!=rpm_packages "hostIdentifier"=* | eval hash=md5(_time._raw) | eval "action_name_host" = action+" - "+name+" - "+hostIdentifier | table "action_name_host", "calendarTime", "columns.address","columns.remote_address","columns.local_address","columns.remote_port","columns.local_port","columns.user","columns.md5","columns.target_path","columns.path","columns.mode","","columns.action" "","columns.tty", "", "event_id","unixTime",hash | stats values(*) as * by hash | sort - "unixTime" | fields - unixTime,hash

Which is giving a reasonable spread of events on the targets in a nice timeline format.
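If you don't have Splunk to hand, the same shaping can be roughed out in Python against the raw osquery results log. This is only a sketch of the idea behind the search above (drop the noisy queries, build an "action - name - host" label, collapse exact duplicates, sort newest first), not a replacement for it. The sample events are made up for illustration, and deduplicating on the raw line is my simplification of the md5(_time._raw) trick.

```python
import json

# Query names excluded from the timeline, mirroring name!=info name!=rpm_packages.
SKIP = {"info", "rpm_packages"}


def timeline(lines):
    """Turn osquery JSON result lines into timeline rows, newest first."""
    seen = set()
    rows = []
    for line in lines:
        if line in seen:  # collapse exact duplicate events
            continue
        seen.add(line)
        event = json.loads(line)
        if event.get("name") in SKIP:
            continue
        label = " - ".join(
            [event.get("action", ""), event.get("name", ""), event.get("hostIdentifier", "")]
        )
        rows.append(
            {
                "label": label,
                "unixTime": int(event.get("unixTime", 0)),
                "columns": event.get("columns", {}),
            }
        )
    return sorted(rows, key=lambda row: row["unixTime"], reverse=True)


# Hypothetical sample events, shaped like osquery event-format results.
SAMPLE = [
    '{"name": "sockets", "hostIdentifier": "ctf1", "unixTime": 1468500010, '
    '"action": "added", "columns": {"remote_address": "10.0.0.5", "remote_port": "4444"}}',
    '{"name": "users", "hostIdentifier": "ctf1", "unixTime": 1468500020, '
    '"action": "added", "columns": {"username": "backdoor"}}',
    '{"name": "info", "hostIdentifier": "ctf1", "unixTime": 1468500030, '
    '"action": "added", "columns": {}}',
    '{"name": "sockets", "hostIdentifier": "ctf1", "unixTime": 1468500010, '
    '"action": "added", "columns": {"remote_address": "10.0.0.5", "remote_port": "4444"}}',
]

if __name__ == "__main__":
    for row in timeline(SAMPLE):
        print(row["unixTime"], row["label"], row["columns"])
```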

Now to get hacking...