TO-695 Senior AI Operations Engineer

Diverse Agile Solutions
2 days ago
Full-time
On-site
Washington, District of Columbia, United States
$120,000 - $140,000 USD yearly

Diverse Agile Solutions (DAS)

Location: Washington, DC (Hybrid, or as required by the customer)
Clearance: Ability to obtain and maintain a Public Trust or applicable Federal clearance
Citizenship: U.S. Citizenship Required
Employment Type: Full-Time, W2
Performance Period: Through the end of the year, with the possibility of extension

About Diverse Agile Solutions

Diverse Agile Solutions (DAS) is a certified Minority Business Enterprise (MBE) delivering innovative IT solutions to Federal, State, and Commercial customers. Our expertise spans Cloud Engineering, DevSecOps, Artificial Intelligence, Cybersecurity, Enterprise Modernization, and IT Staff Augmentation. We help organizations modernize mission-critical systems while leveraging Agile methodologies and emerging technologies.

We are seeking a Senior AI Operations (AIOps) Engineer to lead the deployment, automation, monitoring, governance, and operational excellence of enterprise Artificial Intelligence and Machine Learning platforms supporting mission-critical federal systems.

This position is ideal for someone who combines DevOps, MLOps, Cloud Engineering, Site Reliability Engineering (SRE), and AI platform operations experience to build and run scalable, secure production environments.

Position Overview

The Senior AI Operations Engineer will design, implement, automate, and support enterprise AI infrastructure and operational workflows. This individual will be responsible for deploying and maintaining production AI services, optimizing model performance, managing infrastructure automation, implementing monitoring solutions, and ensuring compliance with federal security requirements.

The engineer will work closely with Data Scientists, Machine Learning Engineers, DevSecOps teams, Cloud Architects, Cybersecurity Engineers, and software developers to operationalize AI solutions across secure cloud environments.

Responsibilities

  • Deploy, operate, and support enterprise AI/ML production environments
  • Design scalable MLOps pipelines for continuous model deployment
  • Automate AI infrastructure using Infrastructure as Code (IaC)
  • Build CI/CD pipelines supporting machine learning workflows
  • Implement automated model validation and deployment strategies
  • Monitor model health, drift, performance, and availability
  • Optimize GPU and compute resource utilization
  • Configure logging, observability, and operational dashboards
  • Manage the AI model lifecycle from development through production
  • Support containerized AI workloads using Kubernetes
  • Build automated rollback and disaster recovery capabilities
  • Secure AI infrastructure following Zero Trust principles
  • Implement AI governance and model version management
  • Integrate AI platforms with enterprise applications
  • Maintain operational documentation and runbooks
  • Participate in incident response and root cause analysis
  • Collaborate with DevSecOps teams to automate security controls
  • Optimize cloud costs for AI workloads
  • Ensure compliance with NIST, FedRAMP, and federal security standards

Required Qualifications

  • Bachelor's degree in Computer Science, Engineering, Information Systems, or related field
  • 8+ years of IT engineering experience
  • 5+ years supporting cloud infrastructure
  • 4+ years supporting AI/ML production environments
  • Experience deploying enterprise AI solutions
  • Strong knowledge of MLOps methodologies
  • Experience with CI/CD automation
  • Experience managing production Kubernetes clusters
  • Experience supporting containerized workloads
  • Experience with infrastructure automation
  • Strong Linux administration experience
  • Experience with scripting and automation
  • Excellent troubleshooting and analytical skills
  • Experience working in Agile environments
  • Strong communication and documentation skills

Required Technical Skills

Cloud Platforms

  • AWS
  • Azure
  • Google Cloud Platform (GCP)

AI & Machine Learning

  • MLOps
  • Model deployment
  • Model monitoring
  • Model versioning
  • Model registry
  • Feature stores
  • Prompt management
  • Generative AI operations
  • AI inference optimization

DevOps & Automation

  • GitLab CI/CD
  • GitHub Actions
  • Jenkins
  • Terraform
  • Ansible
  • Helm
  • Docker
  • Kubernetes
  • OpenShift

Programming

  • Python
  • Bash
  • PowerShell
  • SQL
  • REST APIs

AI Frameworks

  • TensorFlow
  • PyTorch
  • Hugging Face Transformers
  • LangChain
  • MLflow
  • Kubeflow

Monitoring & Observability

  • Prometheus
  • Grafana
  • ELK Stack
  • Splunk
  • Datadog
  • CloudWatch
  • Azure Monitor

Data Technologies

  • PostgreSQL
  • MongoDB
  • Redis
  • Kafka
  • Snowflake
  • Vector Databases

Security

  • IAM
  • Secrets Management
  • Encryption
  • NIST 800-53
  • FedRAMP
  • Zero Trust Architecture

Preferred Qualifications

  • Experience supporting Federal Government customers
  • Experience operating AI workloads in AWS GovCloud
  • Experience with Azure AI Foundry
  • Experience with Azure OpenAI
  • Experience with Amazon Bedrock
  • Experience with Vertex AI
  • Experience implementing Responsible AI governance
  • Experience supporting Retrieval Augmented Generation (RAG) systems
  • Experience deploying LLM applications
  • Experience with GPU clusters
  • Experience with NVIDIA AI Enterprise
  • Experience with ServiceNow integrations

Preferred Certifications

One or more of the following:

  • AWS Certified DevOps Engineer
  • AWS Certified Machine Learning Engineer
  • Microsoft Azure AI Engineer Associate
  • Microsoft Azure Administrator
  • Certified Kubernetes Administrator (CKA)
  • HashiCorp Terraform Associate
  • Certified Kubernetes Security Specialist (CKS)
  • Google Professional Machine Learning Engineer
  • CompTIA Security+
  • CISSP

What You'll Do

  • Operationalize enterprise AI platforms
  • Improve reliability of production AI services
  • Build automated AI deployment pipelines
  • Reduce operational overhead through automation
  • Improve model performance and reliability
  • Enhance observability of AI systems
  • Implement secure AI operations
  • Enable scalable AI infrastructure across multiple cloud environments

Why Join Diverse Agile Solutions?

At DAS, you'll work alongside highly skilled cloud architects, DevSecOps engineers, cybersecurity professionals, and AI specialists supporting mission-critical federal initiatives. We embrace innovation, continuous learning, Agile delivery, and cutting-edge technologies that make a real-world impact.

What We Offer

  • Competitive salary
  • Comprehensive benefits package
  • 401(k)
  • Paid Time Off (PTO)
  • Paid Federal Holidays
  • Professional development and certification reimbursement
  • Career advancement opportunities
  • Collaborative, innovation-driven culture

Equal Opportunity Employer

Diverse Agile Solutions is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive workplace for all employees regardless of race, color, religion, sex, national origin, age, disability, veteran status, or any other protected characteristic.
