GPU Compute & MLIR Engineer

- Weekday AI
- India

2 hours ago 2026/10/30

Remote

Other Business Support Services

Job description

This role is for one of Weekday’s clients.

Salary range: Rs 2000000 - Rs 10000000 (i.e., INR 20 - 100 LPA)
Min Experience: 3+ years
Location: India
Job Type: full-time

We are looking for a highly skilled GPU Compute, MLIR Compiler, and Kernel Optimization Engineer with deep expertise in GPU compute, MLIR-based code generation, and end-to-end performance optimization for AI workloads. In this role, you will design, optimize, and deploy high-performance GPU compute kernels, build and extend MLIR compiler backends, and collaborate closely with ML, runtime, and hardware teams to push the limits of performance on modern GPU architectures.

Key Responsibilities

- Develop and optimize GPU compute kernels targeting OpenCL and Vulkan compute backends for high-throughput AI/ML workloads.
- Design, build, and extend MLIR dialects across multiple abstraction levels, including frontend dialects, graph-level IR, tensor IR (e.g., Linalg, Tensor, TOSA), and runtime/low-level dialects, to enable efficient end-to-end model compilation.
- Implement and maintain MLIR-based compiler passes and transformations, including tiling, fusion, bufferization, vectorization, and lowering pipelines targeting OpenCL and Vulkan GPU backends.
- Conduct profiling and bottleneck analysis of compiled kernels using GPU counters and vendor-specific profilers, and drive performance improvements through compiler-level optimizations.
- Build and maintain GPU runtime infrastructure for both OpenCL and Vulkan, including memory management, pipeline setup, command buffer orchestration, and resource scheduling.
- Develop and extend code generation pipelines, enabling automatic lowering from tensor IR through MLIR to efficient OpenCL and Vulkan GPU kernels.
- Implement performance-critical schedules (tiling, loop fusion, parallelism, and caching strategies) within MLIR-based backends targeting OpenCL and Vulkan runtimes.
- Collaborate with framework teams to optimize end-to-end model lowering for computer vision and LLM workloads using MLIR compilation stacks.
- Design and implement robust compiler and runtime components using modern C/C++, leveraging advanced programming paradigms for high-performance systems.

Required Qualifications

- Strong hands-on experience with the MLIR framework, including authoring and extending custom dialects, writing compiler passes, and building end-to-end lowering pipelines.
- Deep expertise across MLIR abstraction levels:
  - Frontend dialects: ingestion and representation of ML models (e.g., TOSA, StableHLO, ONNX-MLIR)
  - Graph-level IR: high-level operation fusion, shape inference, and graph transformations
  - Tensor IR level: structured operation representation using Linalg, Tensor, and Vector dialects; tiling and fusion strategies
  - Runtime/low-level dialects: Bufferization, MemRef, SCF, GPU, and LLVM dialects for final code generation
- Strong hands-on experience in OpenCL programming, including kernel development, memory model, work-group/work-item optimization, and OpenCL runtime management.
- Solid understanding of Vulkan compute programming, including descriptor management, compute pipelines, synchronization primitives, and Vulkan runtime internals.
- Strong understanding of GPU architecture, memory hierarchies, and asynchronous compute.
- Proficiency in C/C++ for system-level development.
- Experience with kernel profiling and bottleneck analysis on GPU platforms.
- Strong background in machine learning fundamentals, covering both Computer Vision (CV) and Large Language Model (LLM) workloads.

Must-have skills: GPU computing, MLIR, C/C++
Good-to-have skills: Vulkan, Kernel, OpenCL

This job post has been translated by AI and may contain minor differences or errors.
