About the Role
We are hiring a Senior AI Engineer who builds production-grade AI products end-to-end. You will design and ship AI agents, Retrieval-Augmented Generation (RAG) systems, and fine-tuned small language models, while also owning the full-stack delivery from React/Vue/Angular frontends through Python/Node backends to AWS, GCP and Azure deployments.
Equally important, you are an AI-augmented engineer. You use Claude Code, Cursor, Codex, and other AI coding assistants as a daily multiplier, and you know how to use them well: managing context, controlling token spend, writing CLAUDE.md / AGENTS.md files, using subagents and MCP servers, and applying evaluation-driven workflows so that AI-generated code is shipped responsibly.
What You Will Do
- Design, build and deploy AI agents using LangChain, LangGraph, LlamaIndex, CrewAI or equivalent frameworks — including multi-agent orchestration, tool use, memory, and planning loops.
- Architect RAG pipelines end-to-end: ingestion, chunking, embedding selection, vector stores (Pinecone / Weaviate / Qdrant / pgvector), hybrid search, re-ranking, query rewriting, and evaluation.
- Fine-tune small and open-source language models (Llama, Mistral, Phi, Gemma, Qwen) using LoRA, QLoRA, PEFT, instruction tuning and DPO — and decide when fine-tuning is the right answer versus prompting or RAG.
- Build full-stack AI applications: React/Next.js frontends with streaming UIs (Vercel AI SDK / SSE / WebSockets), FastAPI or Node backends, and well-designed APIs.
- Own deployment, scaling and observability on AWS (Bedrock, SageMaker, Lambda, ECS/EKS) and GCP (Vertex AI, Cloud Run, GKE), with Docker, Kubernetes, Terraform and CI/CD.
- Implement LLM observability and evals using LangSmith, Langfuse, RAGAS, or DeepEval, and treat evaluation as a first-class engineering artifact, not an afterthought.
- Apply AI coding assistants (Claude Code, Cursor, Codex, Windsurf, Copilot) as daily tools, with strong discipline around context management, token efficiency, subagents, hooks, slash commands, and MCP servers.
- Address non-functional requirements: latency budgets, cost/token economics, prompt injection defense, PII handling, OWASP LLM Top 10, rate limiting, semantic caching, and graceful degradation.
- Collaborate with product, design and business stakeholders to translate ambiguous problems into shippable AI solutions, and mentor mid-level engineers on AI engineering practices.
Must-Have Skills
AI / GenAI Engineering
- 4+ years of software engineering and at least 2 years of hands-on production work with LLMs (OpenAI, Anthropic Claude, Gemini, or open-source).
- Strong RAG experience: chunking strategies, embedding models, vector databases, hybrid search, re-ranking, evaluation, and avoiding common failure modes.
- Production experience building AI agents with LangChain and LangGraph (or LlamaIndex, CrewAI, AutoGen, Pydantic AI). Comfortable with tool/function calling, structured outputs, agent memory and multi-agent patterns.
- Experience fine-tuning small/open-source models (LoRA, QLoRA, PEFT) and using Hugging Face Transformers, Datasets, Accelerate, and the Hub.
- Strong prompt engineering: system prompt design, few-shot, chain-of-thought, prompt caching, structured output schemas, and evaluation of prompts as code.
AI-Augmented Development
- Daily, production-grade use of Claude Code, Cursor, or Codex. Understands CLAUDE.md / AGENTS.md, project memory files, slash commands, subagents, hooks, MCP servers, and plan-vs-execute workflows.
- Deliberate token and context management: knows when to use Haiku vs Sonnet vs Opus (and equivalents on other providers), uses prompt caching, batches work, prunes context aggressively.
- Disciplined review of AI-generated code, with tests and evals — never ships unread output.
Full-Stack Engineering
- Backend: Python (FastAPI / Flask) and/or Node.js (TypeScript). Solid grasp of async patterns, streaming responses (SSE / WebSockets), and API design.
- Frontend: React, Next.js, TypeScript, Tailwind CSS. Comfortable building streaming chat UIs and agentic interfaces.
- Databases: PostgreSQL, Redis, at least one vector DB. Familiar with schema design, indexing, and query optimization.
Non-Functional Engineering
- Latency: streaming, parallel tool calls, model routing, semantic caching, request batching.
- Cost: token accounting, model tiering (cheap-first), prompt caching, context compression, batch APIs.
- Security: prompt injection defense, output filtering, PII redaction, OWASP LLM Top 10, secrets management, least-privilege IAM.
- Reliability: retries, fallbacks across providers, rate limiting, queue-based decoupling, structured error handling.
Good to Have
- Anthropic Claude Code certification or Anthropic skill-based credentials.
- NVIDIA Generative AI / LLM certifications, DeepLearning.AI Specializations (LangChain, RAG, Agentic AI), or Hugging Face certifications.
- Experience with MCP (Model Context Protocol) — building or consuming MCP servers.
- Experience with GraphRAG, knowledge graphs (Neo4j), or hybrid symbolic/neural systems.
- On-device or edge inference (Ollama, llama.cpp, ONNX, TensorRT).
- Production deployments on AWS (Bedrock, SageMaker, Lambda, ECS/EKS, S3, IAM) and/or GCP (Vertex AI, Cloud Run, GKE).
- Docker, Kubernetes, CI/CD (GitHub Actions or GitLab CI), and IaC (Terraform or Pulumi).
- LLM observability and tracing: LangSmith, Langfuse, Weights & Biases, Arize, or equivalent. Evaluation harnesses in CI.