Team & Tech
Cloud: Azure (primary), expanding to GCP/AWS
Platform: Databricks, Spark (batch + streaming), Airflow, Apache Superset, Kafka
Data & governance: Delta Lake, Unity Catalog, Delta Sharing
Infra & delivery: Terraform, Docker/Kubernetes, CI/CD (GitHub Actions/Azure DevOps)
Interfaces: REST/gRPC; schemas with Avro/Protobuf
Processing alternatives: Apache Flink/Apache Beam where appropriate; custom processors/services in Go for specialized low-latency needs
App stack: React + TypeScript (front end); Go (preferred) and Java (back end)
Focus: Real-time streaming, lakehouse analytics, reliability, and cost efficiency
Experimentation & metrics: MLflow for experiment tracking and AI quality/performance metrics
Tooling integration: MCP (Model Context Protocol) to expose/consume data tools for agents
What you’ll do
Design, build, and operate low-latency streaming pipelines (Kafka, Spark Structured
Streaming) and robust batch ETL/ELT on Databricks Lakehouse.
Establish reliable orchestration and dependency management (Airflow), with strong SLAs
and on-call readiness for business-critical data flows.
Model, optimize, and document curated datasets and interfaces that serve analytics, product
features, and AI workloads.
Implement data quality checks, observability, and backfills; drive root-cause analysis and
incident prevention.
Partner with application teams (Go/Java), analytics, and ML/AI to ship data products into
production.
Build and maintain datasets and services that power RAG pipelines and agentic AI workflows
(tool-use/function calling).
When Spark/Databricks isn’t optimal, design and operate custom processors/services in Go
to meet strict latency or specialized transformation requirements.
Instrument prompt/response and token-usage telemetry to support LLMOps evaluation and
cost optimization; provide datasets for labeling and for building golden sets.
Improve performance and reduce cost (storage/compute), review code, and raise engineering
standards.
Security & Compliance
Design data solutions aligned to enterprise security, privacy, and compliance requirements
(e.g., SOC 2, ISO 27001, GDPR/CCPA as applicable), partnering with Security/Legal.
Implement RBAC/ABAC and least-privilege access; manage service principals, secrets, and
key rotation; enforce encryption in transit and at rest.
Govern sensitive data: classification, PII handling, masking/tokenization, retention/archival,
lineage, and audit logging across pipelines and storage.
Build observability for data security and quality; support incident response, access reviews,
and audit readiness.
Embed controls in CI/CD (policy checks, dependency vulnerability scanning) and ensure
infra-as-code adheres to guardrails.
Partner with security engineering on penetration tests, threat modeling, and red-team
exercises; remediate findings and document controls.
Contribute to compliance audits (e.g., SOC 2/ISO 27001) with evidence collection and
continuous control monitoring; support DPIAs/PIAs where required.