As a Senior–Staff–Principal MLOps Engineer at TRU Rec AI, you will own the end-to-end ML systems lifecycle at TRU: training, evaluation, deployment, monitoring, and continuous improvement of production models. This is not a support role. You will design the ML platform itself: setting standards, defining architecture, and making trade-offs that directly affect product performance, cost, and customer trust. You will work tightly with Computer Vision, Backend, and Product teams to ensure that experimentation translates into stable, observable, and scalable production systems.

Key Responsibilities

- Design deployment patterns for LLM/VLM inference (GPU scheduling, concurrency control, caching, batching, streaming responses where relevant).
- Implement RAG and multimodal retrieval pipelines (vector DBs, embedding lifecycle, evaluation, refresh policies).
- Build evaluation harnesses for LLM/VLM systems (golden sets, regression tests, hallucination/consistency checks, safety filters).
- Support fine‑tuning / adapters (LoRA/QLoRA), model registry, and rollback‑safe release processes.
- Own cost/performance governance for foundation models (latency SLOs, token/image cost budgets, GPU utilization).

ML Platform & Deployment

- Design and operate CI/CD pipelines for ML training, validation, and deployment (GitHub Actions, GitLab CI, Jenkins).
- Build and maintain scalable, GPU‑backed model serving infrastructure (Triton, TorchServe, FastAPI/gRPC, or managed services).
- Own containerization and Kubernetes deployment patterns for inference and batch workloads.

Data & Streaming Systems

- Design and operate real‑time, event‑driven pipelines using Kafka / Redpanda / Pulsar / Kinesis / Pub/Sub / NATS.
- Implement and tune stream processing frameworks (Flink, Beam, Spark Structured Streaming) for low‑latency analytics.
- Enforce data contracts, schemas, and lineage across training and inference paths.
- Implement experiment tracking, model versioning, and lineage (MLflow, W&B, DVC, custom).
- Build automated retraining and continuous evaluation pipelines.
- Own observability across infrastructure and model behavior (latency, throughput, drift, quality metrics).
- Design alerting and dashboards (Prometheus, Grafana, Datadog).

Performance & Reliability

- Collaborate with CV engineers to optimize GPU inference (batching, concurrency, memory, profiling).
- Drive cost/performance trade‑offs across cloud, hybrid, and on‑prem deployments.
- Ensure secure, reproducible, and auditable ML releases.

Leadership

- Act as a technical authority for MLOps architecture.
- Mentor junior engineers and influence engineering best practices across teams.

Required Skills & Experience

- 5+ years in MLOps / ML Platform / DevOps for ML, with deep Python experience.
- Strong production experience with Docker and Kubernetes.
- Proven operation of real‑time or near‑real‑time streaming systems.
- CI/CD, IaC (Terraform, Helm), and experience on AWS, GCP, or Azure.
- Strong systems thinking and cross‑team communication skills.

Nice to Have

- CUDA, TensorRT, mixed precision, quantization, or model optimization for edge.
- vLLM / TensorRT-LLM / Triton for LLMs, KV‑cache management.
- Multimodal pipelines (image/video + text), prompt/version management, LLM eval tooling (custom or open-source).
- Experience deploying foundation models in on‑prem / constrained / privacy‑sensitive environments.
- Video analytics or large‑scale computer vision systems.
- Data governance and lakehouse technologies (Delta, Iceberg, Hudi).
Senior MLOps Engineer
AT
City of Melbourne
Published 4 days ago