Securing YugabyteDB: The SIEM/SOAR Quest
At Yugabyte, our mission is to build the most secure DBaaS* available. So we began researching how best to secure the infrastructure supporting our fully managed version of YugabyteDB, YugabyteDB Aeon (formerly YugabyteDB Managed). We started evaluating SIEM/SOAR (Security Information and Event Management / Security Orchestration, Automation, and Response) solutions and quickly concluded that external, third-party solutions would not meet our needs.
Understanding Our Requirements
A SIEM/SOAR solution may be one of the most critical security infrastructure tools for a company of any size. Event monitoring, access audits, log analysis, and file integrity monitoring are the fundamental security mechanisms that cloud infrastructure is built upon.
So we began by outlining our essential requirements (on paper, believe it or not). We had two primary objectives for the SIEM/SOAR infrastructure:
- To deploy a security layer over our critical infrastructure
- To align with industry best practices and comply with top certification frameworks (ISO, SOC, etc.)
We wanted to cover our most critical pieces of infrastructure: the SaaS infrastructure behind the fully managed deployment of YugabyteDB, the build and test pipeline, and others.
We wrote down an exhaustive list of requirements, plus a few nice-to-haves. Our top objectives were:
- Intrusion detection system
- File integrity monitoring
- Malware detection
- Log retention and search
Our non-functional requirements included:
- Long-term log retention
- Easy integration with notification and alerting systems
- Ability to support custom log sources
Cost Estimation
For any large solution, purchase and implementation costs can be substantial, and if they balloon, they can make the solution prohibitive. For our use cases, we factored the following costs into our estimate:
- Subscription costs (for a SaaS vendor) or the cost of a license plus hosting (for a self-hosted vendor)
- Storage costs
- Data transfer costs
The first two (subscription and storage) would be the most expensive. Data transfer costs, while significant on their own, would be relatively small in the context of a project this size.
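The cost components above can be tallied with a short sketch. This is a minimal illustration under stated assumptions, not our actual pricing model: the `CostInputs` fields and the `monthly_cost` function are hypothetical names, and every number is a placeholder to be replaced with real vendor and cloud pricing. Storage cost is computed at steady state, once the retention window is full, so `stored_gb` is the total retained log volume.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    """Hypothetical monthly cost inputs; every value is a placeholder."""
    subscription: float = 0.0           # SaaS vendor: monthly subscription
    license_fee: float = 0.0            # self-hosted vendor: monthly license cost
    hosting: float = 0.0                # self-hosted vendor: monthly hosting/compute cost
    stored_gb: float = 0.0              # total retained log volume, e.g. from a storage estimate
    storage_price_per_gb: float = 0.0   # monthly storage price per GB
    transfer_gb: float = 0.0            # monthly data transferred (egress, cross-region, etc.)
    transfer_price_per_gb: float = 0.0  # price per transferred GB


def monthly_cost(c: CostInputs) -> dict[str, float]:
    """Split the monthly estimate into platform, storage, and transfer costs."""
    # SaaS vendors fill in `subscription`; self-hosted vendors fill in license + hosting.
    platform = c.subscription + c.license_fee + c.hosting
    # Steady state: assumes the retention window is already full.
    storage = c.stored_gb * c.storage_price_per_gb
    transfer = c.transfer_gb * c.transfer_price_per_gb
    return {
        "platform": platform,
        "storage": storage,
        "transfer": transfer,
        "total": platform + storage + transfer,
    }


if __name__ == "__main__":
    # Placeholder numbers for a SaaS-style vendor; not real quotes.
    inputs = CostInputs(subscription=1000.0, stored_gb=1000.0, storage_price_per_gb=0.25,
                        transfer_gb=200.0, transfer_price_per_gb=0.125)
    for bucket, amount in monthly_cost(inputs).items():
        print(f"{bucket:>8}: {amount:,.2f}")
```

Keeping the result split into buckets makes it easy to see which component dominates once real quotes are plugged in.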
Storage Estimation
To calculate pricing, we need a rough storage estimate, which is heavily influenced by the type of solution used. A storage estimate also helps determine which storage type we can use, which in turn could affect the underlying cloud provider (AWS, GCP, or other) and pricing. Let’s create a quick formula for easy reference later.
We have two kinds of data sources:
- Cloud-native audit logs (AWS CloudTrail, GCP Cloud Logging, etc.)
- SIEM agents installed on machines
For each log source $i$, we need the average log size per day, $S_i$ (for example, $S_{gcp}$ for GCP).
For cloud-native sources, we need the number of cloud accounts/projects, $A_i$.
For agent sources, we need the number of agents, $N$.
Finally, we need the log retention period in days, $D$ (assumed to be the same for all sources).
The formula for storage calculation is:
$$\text{Storage} = \left( S_{aws} \cdot A_{aws} + S_{gcp} \cdot A_{gcp} + S_{azure} \cdot A_{azure} + S_{agent} \cdot N \right) \cdot D$$
NOTE: This formula assumes that every account of a given cloud provider produces the same average daily log volume, which is good enough for a rough storage estimate. For more precise calculations, each $S_i \cdot A_i$ term splits into a per-account sum, $S_{account1} + S_{account2} + \dots$
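To make the formula concrete, here is a minimal Python sketch of the same calculation. The names (`CloudSource`, `estimate_storage_gb`, `estimate_storage_gb_per_account`) are hypothetical, log sizes are assumed to be in GB per day, and the example inputs are illustrative, not measurements from our environment. Azure, or any other provider, is just another `CloudSource` entry. The second function is the more precise per-account variant described in the note.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass
class CloudSource:
    """One cloud-native audit log source (e.g. AWS CloudTrail)."""
    name: str
    daily_gb_per_account: float  # S_i: average log size per account per day, in GB (assumed unit)
    accounts: int                # A_i: number of cloud accounts/projects


def estimate_storage_gb(cloud_sources: Iterable[CloudSource],
                        agent_daily_gb: float,
                        agent_count: int,
                        retention_days: int) -> float:
    """Rough estimate: (sum of S_i * A_i over cloud sources + S_agent * N) * D."""
    daily_cloud = sum(s.daily_gb_per_account * s.accounts for s in cloud_sources)
    daily_agents = agent_daily_gb * agent_count  # S_agent * N
    return (daily_cloud + daily_agents) * retention_days


def estimate_storage_gb_per_account(per_account_daily_gb: Iterable[float],
                                    agent_daily_gb: float,
                                    agent_count: int,
                                    retention_days: int) -> float:
    """Precise variant: sum each account's own daily size instead of S_i * A_i."""
    daily_cloud = sum(per_account_daily_gb)
    return (daily_cloud + agent_daily_gb * agent_count) * retention_days


if __name__ == "__main__":
    # Illustrative inputs only, not real measurements.
    sources = [CloudSource("aws", 2.0, 10), CloudSource("gcp", 1.5, 4)]
    total = estimate_storage_gb(sources, agent_daily_gb=0.25, agent_count=40, retention_days=365)
    print(f"{total:,.0f} GB")  # 13,140 GB
```

With these placeholder inputs, the cloud sources contribute 26 GB/day and the agents 10 GB/day, so one year of retention comes to 36 × 365 = 13,140 GB.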
Challenges in Finding the Right SIEM Tool
The next step involved locating this ideal tool. However, it proved difficult to find a single solution for all our security needs. No one product was tailor-made to our requirements. Our challenges included:
- Most SIEM tools are not cloud-native or lack deep integration with public clouds
- There’s a significant gap in Kubernetes support among SIEM solutions
- Adapting and customizing to the evolving software landscape is an uphill battle
After evaluating many solutions, we finalized our SIEM/SOAR tool of choice. I’ll detail the tool and its implementation in the next blog post.
*NOTE: A few abbreviations
- SIEM – Security Information and Event Management
- SOAR – Security Orchestration, Automation, and Response
- DBaaS – Database as a Service