Technical topics covered: Indexing, Capacity Planning, Querying, Integration | Languages used: Python | Size of Deployment: 50-node cluster with 50 TB of data
Storing, searching, and analyzing cyber security data such as network logs, file metadata, and security tool output in an agile and expandable manner presents unique challenges. To keep pace with the threats an organization faces, including Advanced Persistent Threats, hacktivists, cyber criminals, and broad-based threats, cyber security organizations must gather metadata from many sources whose output formats and verbosity vary widely. This raises a number of cyber security and big data challenges: data size, normalization, query centralization, indexing, sensitivity to real-time performance, and the handling of temporal datasets. To address these challenges and give cyber security analysts a flexible and usable tool, we used a distributed FOSS NoSQL implementation with our own customized database rotation support to provide high throughput and fast queries and map-reduce jobs, and to store a wide variety of datasets. The custom implementation lets us support realistic query patterns across hundreds of terabytes of data stored independently by collection nodes distributed around the world. Additionally, the subsystem can be configured, operated, and maintained by associate engineers with limited database administration knowledge rather than by highly skilled database administrators.
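
The abstract does not describe how the customized database rotation works, so the following is only a minimal sketch of one common approach to temporal data: one database per calendar day, with queries fanned out to the days they cover and old days dropped under a retention window. The daily granularity, the naming scheme, and the helper names `db_name`, `dbs_for_range`, and `expired_dbs` are all assumptions made for illustration, not the implementation presented in the talk.

```python
from datetime import date, timedelta
from typing import List


def db_name(prefix: str, day: date) -> str:
    """Hypothetical naming scheme: one database per calendar day."""
    return f"{prefix}_{day:%Y%m%d}"


def dbs_for_range(prefix: str, start: date, end: date) -> List[str]:
    """Names of the daily databases a query over [start, end] must touch."""
    days = (end - start).days
    return [db_name(prefix, start + timedelta(days=i)) for i in range(days + 1)]


def expired_dbs(existing: List[str], prefix: str, today: date, keep_days: int) -> List[str]:
    """Databases older than the retention window (candidates for dropping)."""
    cutoff = db_name(prefix, today - timedelta(days=keep_days))
    # YYYYMMDD suffixes sort lexicographically in date order,
    # so a plain string comparison is enough here.
    return sorted(n for n in existing if n.startswith(prefix + "_") and n < cutoff)
```

Under this assumed scheme, a query limited to a time range only opens the databases for the days it covers, and expiring old data becomes a whole-database drop rather than a large delete.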