About us
We’re an SF Bay Area Cyber AI startup. We’ve raised funding from top-tier investors. Our vision is simple: enable all security teams to perform security operations with the efficiency and effectiveness needed to prevent breaches. We’re a small team of researchers and engineers with a deep focus on cyber and AI. Our product automates triage for any security alert, leveraging deep research, big data, and dozens of AI agents.
Join us and boost your career with hands-on AI experience.
The Role
As a Staff Software Engineer at Radiant Security, you’ll own the full lifecycle of customer security telemetry — from ingestion to storage in our data lake.
When customers face active incidents, our ingestion pipeline is mission-critical. Reliability and operational excellence here are product requirements, not just engineering ideals.
You’ll drive the scalability and reliability of our ingestion infrastructure, define the architecture of our data lake, and establish the DevOps practices that allow a lean team to evolve safely over time.
What you'll do
- Own and scale our ingestion platform end-to-end
Design and operate high-throughput ingestion pipelines with zero-downtime deployment patterns (dual-write, backfills, safe rollback), ensuring resilience under real-world failure modes (backpressure under load spikes, delivery guarantees, DLQs, replay mechanisms) and enforcing strict tenant isolation (per-tenant rate limiting, noisy neighbor prevention, storage partitioning across pipeline and lake layers)
- Define and evolve our data lake architecture
Own storage layout, partitioning, and schema design, ensuring efficient high-throughput writes and reliable downstream consumption while managing lifecycle (compaction, retention, cold storage, cost optimization)
- Build and operationalize platform foundations
Develop deployment pipelines for stateful services, per-tenant quota systems, synthetic load testing, and monitoring that the broader engineering team depends on
- Establish reliability standards and operate in production
Define and enforce SLOs (latency, durability, availability), including alerting and incident response, while continuously improving observability and operational excellence
- Drive technical leadership and platform strategy
Partner with product and engineering leadership to translate strategic goals into clear requirements and execution plans, while mentoring engineers, setting technical direction, and raising the bar on design, reliability, and operational excellence across the team
Things we're looking for
- Strong backend and data systems experience
Python, Golang, or Node.js, with proven experience building and operating high-throughput ingestion systems in production
- Cloud, streaming, and data platform expertise
Experience with AWS, GCP, or Azure (S3, GCS, Data Lake), streaming systems (Kafka, Kinesis — including delivery semantics and consumer group management), and large-scale data lake design (partitioning, formats, lifecycle)
- Production-grade infrastructure and reliability practices
Experience with zero-downtime migrations (dual-write, backfills, safe cutovers), Infrastructure as Code (Terraform, Pulumi), CI/CD (canary + rollback), and operating and monitoring data platforms in production (Prometheus, Grafana, Datadog), including SLO definition and incident response
- Strong distributed systems and storage fundamentals
Fault tolerance, backpressure handling, graceful degradation, partition tolerance, plus experience with databases, object storage, and performance tuning for high-throughput workloads
- Modern infrastructure stack experience
Containerization and orchestration (Docker, Kubernetes) for deploying and scaling stateful services
The process
Application Review > People Screening > Hiring Manager Interview > Technical Interviews > Executive Interview