
feat: Megatron LoRA GRPO w/ Weight Merging#1889

Merged
terrykong merged 2 commits into main from vadams/sync-lora-grpo-megatron
Mar 2, 2026

Conversation

@vadam5
Contributor

@vadam5 vadam5 commented Feb 5, 2026

What does this PR do?

Supports sync, async, and non-colocated LoRA GRPO via the Megatron path, with weight merging for rollouts: the LoRA adapter weights are merged into the base model weights before exporting to vLLM.
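The merge itself is the standard LoRA fold-in, W' = W + (α/r)·B·A. A minimal pure-Python sketch with toy shapes (illustrative only, not the Megatron implementation):

```python
def matmul(X, Y):
    # Naive matrix multiply for small illustrative matrices.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, the standard LoRA merge.

    W: base weight (d_out x d_in), B: (d_out x r), A: (r x d_in).
    """
    scale = alpha / r
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# 2x2 base weight, rank-1 adapter
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # d_out x r
A = [[0.5, 0.5]]     # r x d_in
print(merge_lora(W, A, B, alpha=2, r=1))  # [[2.0, 1.0], [2.0, 3.0]]
```

After this fold-in, the merged weights can be shipped to the rollout engine as ordinary dense weights, so the generation side needs no LoRA awareness.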

Issues

closes #1372
closes #1371
closes #833

Usage

uv run examples/run_grpo.py \
    --config examples/configs/recipes/llm/grpo-qwen3-8b-base-1n8g-megatron-lora.yaml \
    grpo.max_num_steps=20 \
    grpo.num_prompts_per_step=8 \
    policy.train_global_batch_size=128 \
    policy.generation.colocated.enabled=false \
    policy.generation.colocated.resources.gpus_per_node=4 \
    policy.generation.colocated.resources.num_nodes=1 \
    policy.generation.vllm_cfg.tensor_parallel_size=4 \
    cluster.gpus_per_node=8 \
    policy.megatron_cfg.tensor_model_parallel_size=4 \
    policy.generation.vllm_cfg.async_engine=true \
    grpo.async_grpo.enabled=true \
    loss_fn.use_importance_sampling_correction=true \
    logger.log_dir=results/grpo-async-qwen3-8b-base-1n8g-megatron-lora/logs \
    logger.wandb_enabled=True \
    logger.wandb.project=lora-rl \
    logger.wandb.name=grpo-async-qwen3-8b-base-1n8g-megatron-lora \
    logger.monitor_gpus=True \
    logger.tensorboard_enabled=False \
    checkpointing.enabled=True \
    checkpointing.checkpoint_dir=results/grpo-async-qwen3-8b-base-1n8g-megatron-lora

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

https://wandb.ai/nvidia/nemo-rl?nw=s1m0n39d4le


Summary by CodeRabbit

  • New Features

    • Added GRPO training support with LoRA fine-tuning for Megatron models, including single-node and distributed configurations.
    • Introduced example configurations for Qwen3 8B and other models using the new PEFT/LoRA framework.
  • Tests

    • Added functional test coverage for GRPO with LoRA in synchronous, asynchronous, and non-colocated deployment scenarios.
  • Chores

    • Updated dependencies including accelerate, transformers, transformer-engine, and new packages (fastapi, flash-linear-attention).
    • Updated Megatron-LM and Megatron-Bridge submodule references.

@vadam5 vadam5 changed the title feat: Megatron Sync LoRA GRPO w/ Weight Merging feat: Megatron LoRA GRPO w/ Weight Merging Feb 10, 2026
@vadam5 vadam5 marked this pull request as ready for review February 10, 2026 02:51
@vadam5 vadam5 requested review from a team as code owners February 10, 2026 02:51
@coderabbitai
Contributor

coderabbitai bot commented Feb 10, 2026

📝 Walkthrough

This PR updates the Megatron-LM and Megatron-Bridge submodule references to new repositories and commits, updates dependencies in the setup.py files to support newer versions, introduces PEFT/LoRA configuration blocks for GRPO, adds a state dict remapping utility method to MegatronPolicyWorker, and introduces new functional test scripts for Megatron-LoRA GRPO training experiments.

Changes

Cohort: Submodule References
Files: .gitmodules, 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge, 3rdparty/Megatron-LM-workspace/Megatron-LM
Summary: Updated the Megatron-LM repository URL from terrykong to yaoyu-33 and the branch from yuya/nemo-rl-use-dev to main; updated the Megatron-Bridge commit pointer to a3fc5d5; updated the Megatron-LM commit pointer to 11dcbaca.

Cohort: Dependency Updates
Files: 3rdparty/Megatron-Bridge-workspace/setup.py, 3rdparty/Megatron-LM-workspace/setup.py
Summary: Added "accelerate"; pinned transformers to "==4.57.1"; updated transformer-engine bounds from ">=2.9.0a0,<2.10.0" to ">=2.10.0a0,<2.12.0"; added "datasets"; relaxed numpy and nvidia-modelopt version constraints; added "fastapi~=0.50" and "flash-linear-attention~=0.3.2"; updated flashinfer-python to "~=0.5.0".

Cohort: PEFT/LoRA Configuration
Files: examples/configs/grpo_math_1B_megatron.yaml, examples/configs/grpo_math_1B_megatron_lora.yaml, examples/configs/recipes/llm/grpo-qwen3-8b-base-1n8g-megatron-lora.yaml
Summary: Introduced a PEFT configuration block under megatron_cfg with an enabled flag, target/exclude modules, LoRA dimensions, alpha, dropout, initialization methods, and experimental flags; created a comprehensive GRPO Megatron-LoRA config for the 1B math model and a Qwen3 8B recipe with detailed hyperparameters for loss, checkpointing, optimization, and distributed training.

Cohort: Policy Worker Implementation
Files: nemo_rl/models/policy/workers/megatron_policy_worker.py
Summary: Added a _remap_reference_state_dict() utility method for mapping LoRA-unwrapped state dict names; propagated ProcessGroupCollection usage in reference model loading paths.

Cohort: Functional Test Scripts
Files: tests/functional/grpo_megatron_lora.sh, tests/functional/grpo_megatron_lora_async.sh, tests/functional/grpo_megatron_lora_non_colocated.sh
Summary: Added three new GPU functional test scripts orchestrating Megatron-based GRPO experiments with LoRA, covering synchronous, asynchronous, and non-colocated distributed training configurations; each script sets up directories, executes training via uv run, extracts metrics, and validates reward thresholds.

Cohort: Test Infrastructure Updates
Files: tests/L1_Functional_Tests_GPU.sh, tests/test_suites/llm/grpo-qwen3-8b-base-1n8g-megatron-lora.sh, tests/test_suites/nightly.txt
Summary: Added three new functional test invocations to the L1 GPU tests; created a new Megatron LoRA GRPO test suite script; registered two nightly test entries under the GRPO and SFT sections.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

CI:L1, CI, documentation

Suggested reviewers

  • yaoyu-33
  • terrykong
  • yuki-97
🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 warning)

Test Results For Major Changes — ⚠️ Warning
Explanation: The PR introduces major GRPO+LoRA+Megatron features with new test scripts but lacks documented test results, convergence metrics, or performance benchmarks in the PR description.
Resolution: Include actual test results showing convergence metrics, training reward values, and weight-merging validation across the sync, async, and non-colocated scenarios.

✅ Passed checks (1 passed)

Description Check — ✅ Passed
Explanation: Check skipped because CodeRabbit's high-level summary is enabled.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nemo_rl/models/policy/workers/megatron_policy_worker.py (1)

1565-1588: ⚠️ Potential issue | 🟠 Major

Manual state_dict swap will KeyError on LoRA-only keys.

When use_peft=True, self.model.state_dict() contains LoRA adapter keys (e.g., lora_A, lora_B) that don't exist in self.reference_state_dict. Line 1569 (self.reference_state_dict[k]) and line 1588 (model_state_dict[k]) will raise KeyError for these keys.

The commented-out load_state_dict with strict=True (lines 1565, 1585) had the same problem, which is presumably why it was replaced. But the manual approach needs to handle missing keys too.

Proposed fix
                 # Swap reference model state_dict to self.model
                 for k, v in self.model.state_dict().items():
                     if isinstance(v, torch.Tensor):
-                        v.copy_(self.reference_state_dict[k])
+                        if k in self.reference_state_dict:
+                            v.copy_(self.reference_state_dict[k])
-                for k, v in self.model.state_dict().items():
-                    if isinstance(v, torch.Tensor):
-                        v.copy_(model_state_dict[k])
+                for k, v in self.model.state_dict().items():
+                    if isinstance(v, torch.Tensor) and k in model_state_dict:
+                        v.copy_(model_state_dict[k])
🤖 Fix all issues with AI agents
In @.gitmodules:
- Around line 3-4: Update the submodule declaration that currently sets "url =
https://github.com/yaoyu-33/Megatron-LM.git" (with "branch = main") so it points
to the official upstream "https://github.com/NVIDIA/Megatron-LM.git"; if the
fork is intentionally required instead, replace the URL only after adding a
short justification in the repo docs (e.g., SECURITY.md or README) explaining
why the fork is needed and noting any maintained diffs, and include a maintainer
sign-off in the justification so reviewers can accept the deviation.

In `@3rdparty/Megatron-LM-workspace/Megatron-LM`:
- Line 1: The submodule 3rdparty/Megatron-LM-workspace/Megatron-LM points at a
non-existent commit (11dcbaca317133cc5c77c8bc4f54ed71d3b5d656); update the
submodule to a valid commit/branch on the upstream Megatron-LM remote by
entering the submodule (cd 3rdparty/Megatron-LM-workspace/Megatron-LM), running
git fetch origin, checking out a known-good commit or branch (e.g., origin/main
or a specific existing SHA), then git add the submodule change in the
superproject, commit the update, and push the branch so the PR references a
valid submodule commit.

In `@examples/configs/grpo_math_1B_megatron_lora.yaml`:
- Line 114: Replace the YAML value that sets lora_dtype so it yields a true null
rather than the string "None": change the mapping key/value where lora_dtype is
defined (currently `lora_dtype: None`) to use YAML null (`lora_dtype: null` or
`lora_dtype: ~`) so that downstream code constructing LoRA (e.g.,
LoRA(lora_dtype=...)) receives a null/None value instead of the string "None".

In `@examples/configs/grpo_math_1B_megatron.yaml`:
- Around line 100-111: The base Megatron config currently enables LoRA by
default and sets lora_dtype to the literal string "None"; change peft.enabled to
false so downstream non‑LoRA runs (e.g., grpo_megatron.sh) don't inadvertently
enable LoRA, and have LoRA-specific configs or the grpo_megatron_lora.sh
override set peft.enabled=true when needed; also replace lora_dtype: None with a
YAML null (e.g., lora_dtype: null or lora_dtype: ~) so it parses as null rather
than the string "None".
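Taken together, the two YAML fixes above amount to a peft block along these lines (a sketch only — field names follow the walkthrough summary, and the exact schema may differ from the shipped configs):

```yaml
policy:
  megatron_cfg:
    peft:
      enabled: false       # LoRA off by default in the base config;
                           # LoRA-specific configs override this to true
      dim: 16              # LoRA rank (illustrative value)
      alpha: 32            # illustrative value
      dropout: 0.0
      lora_dtype: null     # YAML null (or ~), NOT the string "None"
```

`lora_dtype: None` parses as the string "None", which downstream code would pass to LoRA(lora_dtype=...) as a truthy value; `null` parses as an actual null.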

In `@nemo_rl/models/policy/workers/megatron_policy_worker.py`:
- Around line 925-933: The current check uses "if ref_megatron_cfg is not None"
which is always true because ref_megatron_cfg is always created; change the
guard to verify PEFT is enabled (e.g. if self.use_peft and ref_megatron_cfg is
not None) before creating and registering the PEFT pre-wrap hook via
_create_peft_pre_wrap_hook(ref_megatron_cfg, ref_state), calling
ref_megatron_cfg.model.register_pre_wrap_hook(pre_peft_hook), composing
composed_peft_hook, and extending ref_pre_wrap_hooks so LoRA wrapping only
applies when self.use_peft is true.
- Around line 946-960: When self.use_peft is true the current
should_load_checkpoint only checks ref_megatron_cfg.checkpoint.load and ignores
ref_megatron_cfg.checkpoint.pretrained_checkpoint; update the PEFT branch in
megatron_policy_worker.py so should_load_checkpoint mirrors the non-PEFT logic
by checking both ref_megatron_cfg.checkpoint.load and
ref_megatron_cfg.checkpoint.pretrained_checkpoint with checkpoint_exists, and
preserve the existing ref_megatron_cfg.checkpoint.finetune toggling behavior
(still set finetune=False when loading a checkpoint) so the reference model
loads pretrained weights in PEFT scenarios.
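The corrected guard described above can be sketched in plain Python (the names and the injected checkpoint_exists callable are illustrative, not the actual config API):

```python
def should_load_checkpoint(load_path, pretrained_checkpoint, checkpoint_exists):
    """Mirror of the non-PEFT guard: load the reference model if either an
    explicit load path or a pretrained checkpoint resolves to an existing
    checkpoint. Names here are illustrative stand-ins for the config fields.
    """
    return bool(
        (load_path and checkpoint_exists(load_path))
        or (pretrained_checkpoint and checkpoint_exists(pretrained_checkpoint))
    )

# Stand-in for a real filesystem check.
exists = {"/ckpt/base"}.__contains__
print(should_load_checkpoint(None, "/ckpt/base", exists))   # True
print(should_load_checkpoint("/missing", None, exists))     # False
```

When the guard is true, the caller would still set checkpoint.finetune = False before loading, preserving the existing toggling behavior the review comment calls out.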
🧹 Nitpick comments (2)
nemo_rl/models/policy/workers/megatron_policy_worker.py (2)

1571-1573: Commented-out code without explanation.

Per coding guidelines, commented-out code should include a comment describing why it is retained, or be removed before merging. Lines 1571-1573 and 1565 have commented-out load_state_dict calls with no rationale.


904-943: Duplicate LoRA construction — extract a helper.

The LoRA instantiation block (lines 906-919) is nearly identical to lines 308-320 in setup_megatron_model. Consider extracting a shared helper to avoid copy-paste divergence.

@vadam5 vadam5 force-pushed the vadams/sync-lora-grpo-megatron branch from 7c9e021 to 4c0e216 Compare February 11, 2026 00:50
@github-actions

❌ Submodule Fast-Forward Check Failed

Check based on commit: 4c0e216 (PR #1889 from vadams/sync-lora-grpo-megatron)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Megatron-LM: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/yaoyu-33/Megatron-LM/commits/b73ae5cdab9d409fcface2b2f3c375710abe6911/
CURRENT (PR #1889 from vadams/sync-lora-grpo-megatron): https://github.com/yaoyu-33/Megatron-LM/commits/11dcbaca317133cc5c77c8bc4f54ed71d3b5d656/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

@github-actions

❌ Submodule Fast-Forward Check Failed

Check based on commit: 2b335f7 (PR #1889 from vadams/sync-lora-grpo-megatron)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Megatron-LM: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/yaoyu-33/Megatron-LM/commits/b73ae5cdab9d409fcface2b2f3c375710abe6911/
CURRENT (PR #1889 from vadams/sync-lora-grpo-megatron): https://github.com/yaoyu-33/Megatron-LM/commits/11dcbaca317133cc5c77c8bc4f54ed71d3b5d656/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

@vadam5 vadam5 force-pushed the vadams/sync-lora-grpo-megatron branch from 2a926f7 to 8168855 Compare February 17, 2026 20:13
Collaborator

@terrykong terrykong left a comment


great work @vadam5 !

could you also include lp error and gen kl error in the plots?

@cuichenx @yaoyu-33 to review the megatron worker part

Collaborator

@terrykong terrykong left a comment


another round of review

@terrykong
Collaborator

also, DCO and lint need to be resolved before final merge

@vadam5 vadam5 force-pushed the vadams/sync-lora-grpo-megatron branch from 3bb86a6 to c799cdf Compare February 20, 2026 00:34
@vadam5 vadam5 requested a review from a team as a code owner February 20, 2026 00:52
@vadam5 vadam5 force-pushed the vadams/sync-lora-grpo-megatron branch from 9c7e4e2 to d5fa658 Compare February 20, 2026 00:55
@vadam5
Contributor Author

vadam5 commented Feb 20, 2026

Fixed DCO and ran the linter. Some files I didn't touch for this PR also had small lint fixes.

@vadam5 vadam5 requested review from cuichenx and terrykong February 20, 2026 01:07
Collaborator

@terrykong terrykong left a comment


@yaoyu-33 @ananthsub can you take a pass? some of the megatron changes could use your expertise

@terrykong
Collaborator

hey @vadam5 . @yaoyu-33 raised some good points offline. give me some time to review some of the api changes, to confirm with these changes. i'll circle back today or tomorrow on this

terrykong
terrykong previously approved these changes Feb 27, 2026
@terrykong terrykong enabled auto-merge (squash) February 27, 2026 08:04
@terrykong terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Feb 27, 2026
@vadam5
Contributor Author

vadam5 commented Feb 27, 2026

@terrykong Does anything else need to be done here?

@terrykong
Collaborator

terrykong commented Feb 27, 2026

@vadam5 this PR is all good. i just merged in another PR to create another CI level to help speed up the evaluation of PRs, i'll help resolve this and kick off those tests

@terrykong terrykong force-pushed the vadams/sync-lora-grpo-megatron branch from abab66f to 34726e4 Compare February 27, 2026 22:20
@terrykong terrykong force-pushed the vadams/sync-lora-grpo-megatron branch from 34726e4 to 0ce335f Compare February 27, 2026 22:23
@terrykong terrykong added CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 27, 2026
terrykong
terrykong previously approved these changes Feb 27, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong force-pushed the vadams/sync-lora-grpo-megatron branch from 0ce335f to 83b2d9a Compare March 2, 2026 18:21
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong added CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) and removed CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) labels Mar 2, 2026
@terrykong terrykong merged commit a426896 into main Mar 2, 2026
43 of 46 checks passed
@terrykong terrykong deleted the vadams/sync-lora-grpo-megatron branch March 2, 2026 19:27
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: Virginia Wu <vadams@nvidia.com>
Signed-off-by: Virginia Wu <78445382+vadam5@users.noreply.github.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: root <root@pool0-00689.cm.cluster>
Co-authored-by: Terry Kong <terryk@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
seonjinn pushed a commit that referenced this pull request Mar 9, 2026
terrykong added a commit that referenced this pull request Mar 30, 2026
GRPO LoRA for Megatron Core has landed (#1889), so remove the "coming
soon" note and reword the LoRA news bullets for consistency.

Signed-off-by: Terry Kong <terryk@nvidia.com>

Labels

CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version)


Development

Successfully merging this pull request may close these issues.

  • LoRA Mcore GRPO
  • Support LoRA in GRPO
  • Support for PEFT in GPRO

4 participants