feat: Megatron LoRA GRPO w/ Weight Merging (#1889)
Conversation
📝 Walkthrough

This PR updates Megatron-LM and Megatron-Bridge submodule references to new repositories and commits, updates dependencies in setup.py files to support newer versions, introduces PEFT/LoRA configuration blocks for GRPO, adds a state dict remapping utility method to MegatronPolicyWorker, and introduces new functional test scripts for Megatron-LoRA GRPO training experiments.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
nemo_rl/models/policy/workers/megatron_policy_worker.py (1)
1565-1588: ⚠️ Potential issue | 🟠 Major: Manual state_dict swap will KeyError on LoRA-only keys.

When `use_peft=True`, `self.model.state_dict()` contains LoRA adapter keys (e.g., `lora_A`, `lora_B`) that don't exist in `self.reference_state_dict`. Line 1569 (`self.reference_state_dict[k]`) and line 1588 (`model_state_dict[k]`) will raise `KeyError` for these keys.

The commented-out `load_state_dict` with `strict=True` (lines 1565, 1585) had the same problem, which is presumably why it was replaced. But the manual approach needs to handle missing keys too.

Proposed fix:
```diff
 # Swap reference model state_dict to self.model
 for k, v in self.model.state_dict().items():
     if isinstance(v, torch.Tensor):
-        v.copy_(self.reference_state_dict[k])
+        if k in self.reference_state_dict:
+            v.copy_(self.reference_state_dict[k])

-for k, v in self.model.state_dict().items():
-    if isinstance(v, torch.Tensor):
-        v.copy_(model_state_dict[k])
+for k, v in self.model.state_dict().items():
+    if isinstance(v, torch.Tensor) and k in model_state_dict:
+        v.copy_(model_state_dict[k])
```
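The guarded swap can also be sketched as a standalone helper. This is a plain-dict illustration (not the worker's actual API) of the key idea: copy matching entries and skip keys that exist only on one side, instead of raising `KeyError`:

```python
def swap_state_dict_(model_state, source_state):
    """Copy matching entries from source_state into model_state in place.

    Keys that exist only in model_state (for example LoRA adapter keys such
    as '...lora_A.weight' when the reference model was built without PEFT)
    are skipped instead of raising KeyError. Plain dict values stand in for
    the in-place tensor.copy_() the real worker performs.
    """
    skipped = []
    for key in model_state:
        if key in source_state:
            model_state[key] = source_state[key]
        else:
            skipped.append(key)
    return skipped
```

Returning the skipped keys lets callers log exactly which adapter weights were left untouched during the swap.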
🤖 Fix all issues with AI agents
In `@.gitmodules`:
- Around lines 3-4: Update the submodule declaration that currently sets `url = https://github.com/yaoyu-33/Megatron-LM.git` (with `branch = main`) so it points to the official upstream `https://github.com/NVIDIA/Megatron-LM.git`. If the fork is intentionally required, replace the URL only after adding a short justification in the repo docs (e.g., SECURITY.md or README) explaining why the fork is needed and noting any maintained diffs, and include a maintainer sign-off in the justification so reviewers can accept the deviation.
In `@3rdparty/Megatron-LM-workspace/Megatron-LM`:
- Line 1: The submodule 3rdparty/Megatron-LM-workspace/Megatron-LM points at a
non-existent commit (11dcbaca317133cc5c77c8bc4f54ed71d3b5d656); update the
submodule to a valid commit/branch on the upstream Megatron-LM remote by
entering the submodule (cd 3rdparty/Megatron-LM-workspace/Megatron-LM), running
git fetch origin, checking out a known-good commit or branch (e.g., origin/main
or a specific existing SHA), then git add the submodule change in the
superproject, commit the update, and push the branch so the PR references a
valid submodule commit.
In `@examples/configs/grpo_math_1B_megatron_lora.yaml`:
- Line 114: Replace the YAML value that sets lora_dtype so it yields a true null
rather than the string "None": change the mapping key/value where lora_dtype is
defined (currently `lora_dtype: None`) to use YAML null (`lora_dtype: null` or
`lora_dtype: ~`) so that downstream code constructing LoRA (e.g.,
LoRA(lora_dtype=...)) receives a null/None value instead of the string "None".
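As a defensive complement to fixing the YAML itself, a loader-side normalizer can be sketched (a hypothetical helper, not part of the repo) that maps the literal string "None" to a true null before it reaches the LoRA constructor:

```python
def normalize_yaml_none(value):
    """Treat the common YAML mistake `lora_dtype: None` (which parses as
    the string "None") as Python None; pass every other value through.

    The extra "null"/"~" strings are defensive, covering quoted nulls.
    """
    if isinstance(value, str) and value.strip() in ("None", "none", "null", "~"):
        return None
    return value
```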
In `@examples/configs/grpo_math_1B_megatron.yaml`:
- Around line 100-111: The base Megatron config currently enables LoRA by
default and sets lora_dtype to the literal string "None"; change peft.enabled to
false so downstream non‑LoRA runs (e.g., grpo_megatron.sh) don't inadvertently
enable LoRA, and have LoRA-specific configs or the grpo_megatron_lora.sh
override set peft.enabled=true when needed; also replace lora_dtype: None with a
YAML null (e.g., lora_dtype: null or lora_dtype: ~) so it parses as null rather
than the string "None".
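Applied together, the suggested base-config changes would look roughly like this (a sketch of the `peft` block; field names other than `enabled` and `lora_dtype` are assumptions about the config layout):

```yaml
peft:
  enabled: false    # base config stays non-LoRA; LoRA recipes override this to true
  lora_dtype: null  # YAML null, not the string "None"
```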
In `@nemo_rl/models/policy/workers/megatron_policy_worker.py`:
- Around line 925-933: The current check uses "if ref_megatron_cfg is not None"
which is always true because ref_megatron_cfg is always created; change the
guard to verify PEFT is enabled (e.g. if self.use_peft and ref_megatron_cfg is
not None) before creating and registering the PEFT pre-wrap hook via
_create_peft_pre_wrap_hook(ref_megatron_cfg, ref_state), calling
ref_megatron_cfg.model.register_pre_wrap_hook(pre_peft_hook), composing
composed_peft_hook, and extending ref_pre_wrap_hooks so LoRA wrapping only
applies when self.use_peft is true.
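The suggested guard can be illustrated with a small stand-in (hypothetical names; the real worker registers the hook via `ref_megatron_cfg.model.register_pre_wrap_hook`):

```python
def maybe_register_peft_hook(use_peft, ref_cfg, create_hook, registered_hooks):
    """Register a PEFT pre-wrap hook only when PEFT is actually enabled.

    Without the use_peft check, the hook would be registered on every run,
    because ref_cfg is always constructed even for non-LoRA training.
    """
    if not (use_peft and ref_cfg is not None):
        return None
    hook = create_hook(ref_cfg)
    registered_hooks.append(hook)
    return hook
```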
- Around line 946-960: When self.use_peft is true the current
should_load_checkpoint only checks ref_megatron_cfg.checkpoint.load and ignores
ref_megatron_cfg.checkpoint.pretrained_checkpoint; update the PEFT branch in
megatron_policy_worker.py so should_load_checkpoint mirrors the non-PEFT logic
by checking both ref_megatron_cfg.checkpoint.load and
ref_megatron_cfg.checkpoint.pretrained_checkpoint with checkpoint_exists, and
preserve the existing ref_megatron_cfg.checkpoint.finetune toggling behavior
(still set finetune=False when loading a checkpoint) so the reference model
loads pretrained weights in PEFT scenarios.
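The mirrored check can be sketched like this (a simplification with hypothetical parameter names; the real code reads these fields off `ref_megatron_cfg.checkpoint`):

```python
import os

def should_load_checkpoint(load_dir, pretrained_checkpoint):
    """Mirror the non-PEFT logic: load when either a resumable training
    checkpoint or a pretrained checkpoint directory exists on disk."""
    def checkpoint_exists(path):
        return path is not None and os.path.isdir(path)
    return checkpoint_exists(load_dir) or checkpoint_exists(pretrained_checkpoint)
```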
🧹 Nitpick comments (2)
nemo_rl/models/policy/workers/megatron_policy_worker.py (2)
1571-1573: Commented-out code without explanation. Per coding guidelines, commented-out code should include a comment describing why it is retained, or be removed before merging. Lines 1571-1573 and 1565 have commented-out `load_state_dict` calls with no rationale.
904-943: Duplicate LoRA construction: extract a helper. The LoRA instantiation block (lines 906-919) is nearly identical to lines 308-320 in `setup_megatron_model`. Consider extracting a shared helper to avoid copy-paste divergence.
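One way to de-duplicate would be a small factory shared by both call sites. This is a sketch with assumed field names (`dim`, `alpha`, `dropout`, and the target-module defaults mirror common LoRA configs, not necessarily the worker's exact ones):

```python
def build_lora_kwargs(peft_cfg):
    """Collect LoRA constructor arguments in one place so the two call
    sites (setup_megatron_model and the reference-model path) cannot
    drift apart."""
    return {
        "target_modules": peft_cfg.get("target_modules", ["linear_qkv", "linear_proj"]),
        "dim": peft_cfg.get("dim", 8),
        "alpha": peft_cfg.get("alpha", 16),
        "dropout": peft_cfg.get("dropout", 0.0),
        "lora_dtype": peft_cfg.get("lora_dtype"),
    }
```

Each call site would then do `LoRA(**build_lora_kwargs(cfg))`, so a new field only has to be added once.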
❌ Submodule Fast-Forward Check Failed (based on commit 4c0e216, PR #1889).
✅ Properly updated: Megatron-Bridge (PR branch is ahead of main, fast-forward).
❌ Needs attention: Megatron-LM (commits have diverged from a common ancestor).
Please ensure all submodule commits are fast-forwards of the main branch before merging.
Review thread on examples/configs/recipes/llm/grpo-nanov3-30BA3B-1n8g-megatron-lora.yaml (outdated, resolved).
terrykong left a comment:
another round of review
also, DCO and lint need to be resolved before final merge
Fixed DCO and ran the linter. Some files I didn't touch for this PR also had small lint fixes.
terrykong left a comment:
@yaoyu-33 @ananthsub can you take a pass? some of the megatron changes could use your expertise
@terrykong Does anything else need to be done here?
@vadam5 this PR is all good. I just merged another PR to create another CI level to help speed up the evaluation of PRs; I'll help resolve this and kick off those tests.
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: Virginia Wu <vadams@nvidia.com>
Signed-off-by: Virginia Wu <78445382+vadam5@users.noreply.github.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: root <root@pool0-00689.cm.cluster>
Co-authored-by: Terry Kong <terryk@nvidia.com>
GRPO LoRA for Megatron Core has landed (#1889), so remove the "coming soon" note and reword the LoRA news bullets for consistency. Signed-off-by: Terry Kong <terryk@nvidia.com>
What does this PR do?
Supports sync, async, and non-colocated LoRA GRPO via the Megatron path, with weight merging for rollouts: the PR merges LoRA adapter weights into the base model weights before exporting them to vLLM.
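The merge itself follows the standard LoRA identity, W_merged = W + (alpha/r)·(B·A). A pure-Python sketch on nested lists (real code would of course operate on torch tensors before export):

```python
def merge_lora(w, a, b, alpha, r):
    """Fold LoRA adapters into a base weight matrix.

    w: (out x in) base weights, a: (r x in), b: (out x r).
    Returns W + (alpha / r) * (B @ A) without modifying w.
    """
    scale = alpha / r
    out_dim, in_dim = len(w), len(w[0])
    merged = [row[:] for row in w]  # copy so the base weights are untouched
    for i in range(out_dim):
        for j in range(in_dim):
            delta = sum(b[i][k] * a[k][j] for k in range(r))
            merged[i][j] += scale * delta
    return merged
```

After merging, the exported weights look like an ordinary dense checkpoint, which is what lets vLLM serve rollouts without any LoRA-aware handling.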
Issues
closes #1372
closes #1371
closes #833
Usage
Before your PR is "Ready for review"
Pre checks:
Additional Information
https://wandb.ai/nvidia/nemo-rl?nw=s1m0n39d4le
Summary by CodeRabbit
New Features
Tests
Chores