Senior DevOps/MLOps Engineer – Digital Health job at Irembo

Background information about the job or company

Irembo is a technology company that designs and develops digital products to ease the accessibility of services in users’ everyday lives worldwide, starting with Rwanda. Our pioneer products, IremboGov and IremboPay, have enabled Rwandan citizens and foreigners to access and pay for over 150 public services online through our one-stop-shop e-governance and payment platforms. To date, we have facilitated over 30 million transactions through our platforms and have ambitious goals to scale our technology worldwide to enable more governments and institutions to serve their citizens better. More information is available on irembo.com.

Irembo is transforming the healthcare landscape by launching a national-scale telemedicine platform, building on our success in service management (IremboGov) and payment solutions (IremboPay). This project directly leverages the seven-year legacy of Babyl Rwanda, which pioneered telemedicine and delivered over 3.5 million consultations.

We are seeking a highly specialized DevOps/MLOps Engineer to design, implement, and manage the critical, resilient on-premises infrastructure for the Irembo TeleClinic platform. Your primary mission is to ensure high availability, security, and performance for a hybrid workload: supporting cutting-edge AI workloads for improved diagnosis and personalized care while handling massive user traffic across multiple channels (web, mobile, and legacy USSD/voice).

You will build the robust infrastructure needed to securely run high-impact digital health services within a national data environment. You will sit at the intersection of Telecommunications, High-Performance Computing, and Healthcare Compliance.

Responsibilities or duties

Core Infrastructure Management and Operations:

  • Infrastructure Management (On-Premises): Co-design, set up, maintain, and upgrade on-premises infrastructure (compute, storage, network) to reliably support Digital Health traffic and ensure high availability for clinical services.
  • CI/CD Automation: Design, implement, and maintain robust, automated CI/CD pipelines for microservices (backend) and mobile/web applications, ensuring rapid, safe, and reliable feature deployment.
  • Edge/API Layer Optimization: Engineer the API layer for environments with unstable networks (3G/4G). This includes implementing high-efficiency binary protocols (e.g., gRPC/Protobuf) and aggressive edge caching strategies to minimize bandwidth consumption for citizens, moving beyond standard REST/JSON architectures.
  • Immutable Audit and Compliance Logging: Establish a centralized, tamper-proof logging architecture that correlates all infrastructure events with AI decisions, ensuring full traceability for medical audits and regulatory compliance.

Machine Learning Operations:

  • Provision, configure, and manage dedicated on-premises GPU clusters optimized for low-latency AI model serving, real-time triage, and advanced diagnostic engines.
  • Collaborate with the data team to design and secure efficient data pipelines that feed high-quality clinical data to the training and inference environments.
  • Manage MLOps tools (e.g., MLflow, Kubeflow, KServe) or comparable alternatives to streamline the lifecycle of AI models, including tracking, versioning, testing, and serving models in high-volume production environments.
  • Continually optimize AI serving infrastructure for cost, latency, and throughput, essential for improving diagnosis and personalizing care at a national scale.

Security and Compliance:

  • Implement stringent security controls and compliance checks, adhering to national health data regulations and international best practices for data protection and security audits.
  • Establish and regularly test comprehensive disaster recovery and backup strategies for all patient data and core service components to ensure business continuity for critical healthcare services.

Qualifications or requirements

Required Skills & Experience

  • 4+ years of experience in a dedicated DevOps, SRE, or Platform Engineering role.
  • Low-Latency Protocol Mastery (gRPC/Protobuf): Deep experience in the design, setup, maintenance, and troubleshooting of gRPC and Protobuf for mobile and web-application communication. A solid understanding of HTTP/2 multiplexing and demultiplexing, and of strategies for minimizing bandwidth usage on unstable 3G/4G networks, is required.
  • Experience with telecom protocols and technologies (SMPP, USSD gateways, and SIP) for delivering services via USSD and Voice/IVR channels.
  • MLOps Production Deployment: Deep experience with MLOps toolchains (e.g., MLflow, Kubeflow, KServe) or comparable alternatives for successfully deploying and serving machine learning models in high-volume production environments.
  • Mandatory experience with on-premises infrastructure management.
  • Deep AI Observability: Proven ability to design and operate full-stack monitoring solutions from scratch, moving beyond simple “uptime checks” to complex SLO/SLI (Service Level Objective/Service Level Indicator) definitions for AI workloads.
  • RAG System Scaling: Proficiency in scaling vector databases and building robust data ingestion pipelines for Retrieval-Augmented Generation (RAG) systems.
  • Deep Linux System Mastery: Proven ability in Linux kernel tuning, networking stack optimization, and storage performance management.
  • Container and Orchestration Proficiency: Strong, mandatory knowledge of Kubernetes and Docker.

Preferred Skills

  • Experience with setting up and optimizing GPU clusters for inference workloads.
  • Security-First Monitoring: Experience implementing privacy-preserving telemetry, ensuring that logs and traces never accidentally capture PII or PHI.
  • Certifications: Relevant certifications (e.g., CKS, NCA-AIIO, HCISPP) are a strong plus.

Additional details

This is a unique opportunity to apply cutting-edge DevOps and MLOps practices to a project with a profound social impact. You will not only manage the infrastructure but also be a critical force in expanding access and convenience for efficient, high-quality digital health services for every Rwandan. You will directly build the resilient foundation for running cutting-edge AI services securely within a national data environment.

Please note that the salary for this position is commensurate with experience and qualifications and will be discussed during the interview process.


Vacancy title:
Senior DevOps/MLOps Engineer – Digital Health

[Type: FULL_TIME, Industry: Information Technology, Category: Computer & IT, Science & Engineering, Healthcare]

Jobs at:
Irembo

Deadline of this Job:
Friday, January 9 2026

Duty Station:
Kigali

Summary
Date Posted: Saturday, January 3 2026
Base Salary: Not Disclosed

Work Hours: 8

Experience in Months: 48

Level of Education: bachelor degree

Job application procedure

Application Deadline

January 9, 2026

Application Link: Click Here to Apply Now

 
