There is a great debate in the security world right now: have SIEM and logging products run their course? Will Hadoop ride to the rescue? Can machines "learn" about security and reliably spot threats that no other approach can find?
Gartner calls this phenomenon Big Data Security Analytics (BDSA) and makes a point of defining BDSA solutions as a three-layer pyramid. At the bottom is the "data lake," which is what most people equate with Hadoop. The middle layer is context: the addition of relevant business, location, and other non-traditional security information that increases the precision of the top layer, applications and analytics (such as Machine Learning). It is at this top layer that the real value of BDSA is realized, in terms of finding new threats and remediating them before they do damage.
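To make the three layers concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the event fields, the `USER_CONTEXT` table, the `enrich` and `score` helpers, and the weights are all hypothetical, and a real deployment would read from a data lake rather than an in-memory list. It only shows how a context layer can change what the analytics layer surfaces.

```python
from collections import Counter

# Layer 1 ("data lake"): raw events, held in memory here purely for illustration.
RAW_EVENTS = [
    {"user": "alice", "src_ip": "10.0.0.5", "action": "login_failed"},
    {"user": "alice", "src_ip": "10.0.0.5", "action": "login_failed"},
    {"user": "bob", "src_ip": "203.0.113.9", "action": "login_failed"},
    {"user": "bob", "src_ip": "203.0.113.9", "action": "login_failed"},
]

# Layer 2 (context): hypothetical business and location data keyed by user.
USER_CONTEXT = {
    "alice": {"role": "intern", "office_ips": {"10.0.0.5"}},
    "bob": {"role": "domain_admin", "office_ips": {"10.0.0.7"}},
}


def enrich(event, context):
    """Return a copy of the event with role and location context attached."""
    info = context.get(event["user"], {})
    enriched = dict(event)
    enriched["role"] = info.get("role", "unknown")
    enriched["from_office"] = event["src_ip"] in info.get("office_ips", set())
    return enriched


def score(events):
    """Layer 3 (analytics): a toy rule that weights failed logins by context."""
    scores = Counter()
    for event in events:
        if event["action"] != "login_failed":
            continue
        weight = 1
        if event["role"] == "domain_admin":
            weight += 2  # privileged accounts matter more (assumed weight)
        if not event["from_office"]:
            weight += 2  # unfamiliar source address (assumed weight)
        scores[event["user"]] += weight
    return scores


enriched_events = [enrich(e, USER_CONTEXT) for e in RAW_EVENTS]
scores = score(enriched_events)
print(scores.most_common())
```

Without the context layer, both users look identical: two failed logins each. With it, the admin account logging in from an unfamiliar address rises to the top, which is the kind of precision gain the middle layer is meant to provide.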
At PetaSecure, the top of the BDSA pyramid is exactly where we are focused. We build next-generation threat management applications that leverage Big Data frameworks like Hadoop. We cover three key areas: precision threat detection, false positive suppression, and reducing attack investigations from hours to seconds. For us, the data lake resides in the Hadoop Distributed File System (HDFS).
Sounds simple, but it isn't. Even if an enterprise succeeds in loading petabytes of security data into HDFS, that data will just sit there unless it is curated and summarized for use by the analytics and applications. And keep in mind, Hadoop is not a single module or program, but a menagerie of constantly changing open source components. So just figuring out which components are required (Flume, Sqoop, HBase, Hive, Pig, and so on), integrating real-time "supplements" like Storm and Kafka, and fitting them all together into a security solution is a non-trivial task. Our favorite quote is: "We've gone from islands of data outside of Hadoop to islands of data inside Hadoop."
Oh, but aren't there now "commercial" distributions of Hadoop? Yes, but they are designed for general-purpose usage, everything from drug research to movie recommendations, and they aren't tuned or organized for the extreme velocity, variety, and volume requirements of real-time security data aggregation, threat detection, and remediation.
Apropos of all this, we recently caught up with an ex-ArcSight colleague who now does strategic security consulting for enterprises while teaching for the SANS Institute (ArcSight is the leading SIEM product, now owned by HP).
Since he is at the leading edge of enterprise security trends, we asked how he sees Hadoop fitting in. The first thing he said was "Have you guys seen OpenSOC? It's just been released on GitHub and it has the potential to provide enterprises the security-specific Hadoop platform they need to supplement the SIEM and logging tools they now use for threat management."
Since we are in the Cisco Entrepreneurs in Residence (Cisco EIR) program and work closely with the Cisco Managed Threat Defense team that has partnered with Hortonworks to produce OpenSOC, we loved the validation. OpenSOC is a collaborative, open source development project dedicated to providing an extensible and scalable advanced security analytics tool. In working with Cisco, our vision is to add value to the OpenSOC ecosystem by building our technology and applications alongside the solutions that Cisco and others in the community will be providing.
Hadoop does have a key role in next-generation enterprise security solutions. But it is only the starting point. The good news is that once the Big Data security framework is in place via solutions like OpenSOC, a torrent of new threat detection and management applications will be unleashed from companies like Cisco, PetaSecure, and others.
The elephant dances, everyone wins.