Pull requests: NVIDIA/TransformerEngine
Pull requests list
[JAX] Fix: Use jitted kernels for generating THD (and BSHD) segment pos
#2823
opened Apr 1, 2026 by
KshitijLakhani
•
Draft
5 of 13 tasks
Remove integration test for Lightning-Thunder
testing
Improvements to tests or testing infrastructure
#2822
opened Apr 1, 2026 by
timmoon10
8 of 14 tasks
Fix fused router for large top-K and expert counts
#2821
opened Apr 1, 2026 by
harryzhou2000
7 of 13 tasks
Refactor Amax Kernel ldmatrix loads, TMA/compute barriers, swizzle_idx
#2820
opened Apr 1, 2026 by
cael-ling
6 of 13 tasks
Pass input_output_alias to TritonAutotunedKernelCall
#2814
opened Mar 31, 2026 by
tdophung
5 of 13 tasks
[PyTorch] [torch.compile] Split linear forward into forward and setup context.
#2811
opened Mar 30, 2026 by
pggPL
8 of 13 tasks
Streamline group Hadamard ComputeKernel loads
#2810
opened Mar 29, 2026 by
cael-ling
5 of 13 tasks
Single __syncthreads per stage in GroupHadamardAmaxTmaKernel
#2809
opened Mar 29, 2026 by
cael-ling
8 of 13 tasks
Precomputed swizzle_idx into group Hadamard ComputeKernel
#2808
opened Mar 29, 2026 by
cael-ling
8 of 13 tasks
[PyTorch][Flash Attn] Add fallback import for FA3
#2806
opened Mar 26, 2026 by
eattia-nvidia
7 of 13 tasks
[PyT] Fix FSDP2 memory leaks for FP8 weight workspaces and transpose caches
#2805
opened Mar 26, 2026 by
pstjohn
3 tasks done
[PyT][Test] Add xfailing FSDP2 memory leak detection tests
#2803
opened Mar 25, 2026 by
pstjohn
4 tasks done
[PyTorch] [CI] Capture subprocess stderr in distributed tests for better CI error re…
#2802
opened Mar 25, 2026 by
sudhakarsingh27
13 tasks
[JAX] Warmup FFIs with "initialize" stage
#2800
opened Mar 25, 2026 by
jberchtold-nvidia
1 of 13 tasks
[FSDP2/Megatron-FSDP/DCP] If model parameters are DTensors, optimizer states should also be DTensors.
#2795
opened Mar 24, 2026 by
cspades
1 of 13 tasks
Avoid CPU offload wait_event for validation
#2793
opened Mar 23, 2026 by
vasunvidia
13 tasks