Senior AI Engineer (Full-Stack)

We are hiring a Senior AI Engineer who builds production-grade AI products end-to-end. You will design and ship AI agents, Retrieval-Augmented Generation (RAG) systems, and fine-tuned small language models, while also owning the full-stack delivery from React/Vue/Angular frontends through Python/Node backends to AWS, GCP and Azure deployments.

Equally important: you are an engineer who has fully adopted AI tooling. You use Claude Code, Cursor, Codex, and other AI coding assistants as a daily multiplier, and you know how to use them well: managing context, controlling token spend, writing CLAUDE.md / AGENTS.md files, using subagents and MCP servers, and applying evaluation-driven workflows so that AI-generated code ships responsibly.

What You Will Do

Design, build and deploy AI agents using LangChain, LangGraph, LlamaIndex, CrewAI or equivalent frameworks — including multi-agent orchestration, tool use, memory, and planning loops.
Architect RAG pipelines end-to-end: ingestion, chunking, embedding selection, vector stores (Pinecone / Weaviate / Qdrant / pgvector), hybrid search, re-ranking, query rewriting, and evaluation.
Fine-tune small and open-source language models (Llama, Mistral, Phi, Gemma, Qwen) using LoRA, QLoRA, PEFT, instruction tuning and DPO — and decide when fine-tuning is the right answer versus prompting or RAG.
Build full-stack AI applications: React/Next.js frontends with streaming UIs (Vercel AI SDK / SSE / WebSockets), FastAPI or Node backends, and well-designed APIs.
Own deployment, scaling and observability on AWS (Bedrock, SageMaker, Lambda, ECS/EKS) and GCP (Vertex AI, Cloud Run, GKE), with Docker, Kubernetes, Terraform and CI/CD.
Implement LLM observability and evals using LangSmith, Langfuse, RAGAS, DeepEval — and treat evaluation as a first-class engineering artifact, not an afterthought.
Use AI coding assistants (Claude Code, Cursor, Codex, Windsurf, Copilot) as daily tools, with strong discipline around context management, token efficiency, subagents, hooks, slash commands, and MCP servers.
Address non-functional requirements: latency budgets, cost/token economics, prompt injection defense, PII handling, OWASP LLM Top 10, rate limiting, semantic caching, and graceful degradation.

Must-Have Skills

AI / GenAI Engineering

4+ years of software engineering and at least 2 years of hands-on production work with LLMs (OpenAI, Anthropic Claude, Gemini, or open-source).
Strong RAG experience: chunking strategies, embedding models, vector databases, hybrid search, re-ranking, evaluation, and avoiding common failure modes.
Production experience building AI agents with LangChain and LangGraph (or LlamaIndex, CrewAI, AutoGen, Pydantic AI). Comfortable with tool/function calling, structured outputs, agent memory and multi-agent patterns.
Experience fine-tuning small/open-source models (LoRA, QLoRA, PEFT) and using Hugging Face Transformers, Datasets, Accelerate, and the Hub.
Strong prompt engineering: system prompt design, few-shot, chain-of-thought, prompt caching, structured output schemas, and evaluation of prompts as code.

AI-Augmented Development

Daily, production-grade use of Claude Code, Cursor, or Codex. Understands CLAUDE.md / AGENTS.md, project memory files, slash commands, subagents, hooks, MCP servers, and plan-vs-execute workflows.
Deliberate token and context management: knows when to use Haiku vs Sonnet vs Opus (and equivalents on other providers), uses prompt caching, batches work, prunes context aggressively.
Disciplined review of AI-generated code, with tests and evals — never ships unread output.

Full-Stack Engineering

Backend: Python (FastAPI / Flask) and/or Node.js (TypeScript). Solid grasp of async patterns and streaming responses (SSE / WebSockets / API).
Frontend: React, Next.js, TypeScript, Tailwind CSS. Comfortable building streaming chat UIs and agentic interfaces.
Databases: PostgreSQL, Redis, at least one vector DB. Familiar with schema design, indexing, and query optimization.

Non-Functional Engineering

Latency: streaming, parallel tool calls, model routing, semantic caching, request batching.
Cost: token accounting, model tiering (cheap-first), prompt caching, context compression, batch APIs.
Security: prompt injection defense, output filtering, PII redaction, OWASP LLM Top 10, secrets management, least-privilege IAM.

Good to Have

Anthropic Claude Code certification or Anthropic skill-based credentials.
NVIDIA Generative AI / LLM certifications, DeepLearning.AI Specializations (LangChain, RAG, Agentic AI), or Hugging Face certifications.
Experience with MCP (Model Context Protocol) — building or consuming MCP servers.
Experience with GraphRAG, knowledge graphs (Neo4j), or hybrid symbolic/neural systems.
On-device or edge inference (Ollama, llama.cpp, ONNX, TensorRT).
Production deployments on AWS (Bedrock, SageMaker, Lambda, ECS/EKS, S3, IAM) and/or GCP (Vertex AI, Cloud Run, GKE).
Docker, Kubernetes, CI/CD (GitHub Actions or GitLab CI), and IaC (Terraform or Pulumi).
LLM observability and tracing: LangSmith, Langfuse, Weights & Biases, Arize, or equivalent. Evaluation harnesses in CI.

Senior AI Engineer (Full-Stack)