Enable CUDA mixed precision in NeuralNet#15

Open
avidaldo wants to merge 1 commit into fachu000:flexible_training from avidaldo:improvements__cuda
Conversation

@avidaldo
Summary

  • Enable TF32 matrix multiplication on CUDA (torch.set_float32_matmul_precision('high')).
  • Run forward passes with CUDA autocast in bfloat16 for training and prediction.
  • Keep loss computation in fp32 to avoid bf16 instability for MAE/MAPE style losses.
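
A minimal sketch of the three points above, assuming a model with tensor outputs and an MAE-style loss (function and variable names are hypothetical, not the PR's actual code):

```python
import torch

# Allow TF32 on Tensor Cores for fp32 matmuls (affects CUDA; a no-op elsewhere).
torch.set_float32_matmul_precision('high')

def forward_step(model, x, y, device_type):
    # Run the forward pass under autocast in bfloat16, but only on CUDA.
    with torch.autocast(device_type=device_type,
                        dtype=torch.bfloat16,
                        enabled=device_type == 'cuda'):
        out = model(x)
    # Compute the loss in fp32: MAE/MAPE-style losses can be unstable in bf16.
    loss = torch.nn.functional.l1_loss(out.float(), y.float())
    return loss
```

Note that bf16 autocast needs no gradient scaler, unlike fp16, since bf16 shares fp32's exponent range.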

Why

  • Improve throughput and Tensor Core utilization on CUDA while preserving training stability.
  • Keep non-CUDA behavior unchanged (enabled=self.device_type == 'cuda').
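
The `enabled` flag is what keeps non-CUDA runs bit-identical: with autocast disabled, operations keep their original fp32 dtype. A small illustration (the `device_type` value is an example):

```python
import torch

device_type = 'cpu'  # example: a non-CUDA device
x = torch.randn(2, 2)
# enabled evaluates to False off-CUDA, so the context is a no-op.
with torch.autocast(device_type=device_type,
                    dtype=torch.bfloat16,
                    enabled=device_type == 'cuda'):
    y = x @ x
# The matmul ran in fp32, unchanged from the pre-PR behavior.
assert y.dtype == torch.float32
```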

Scope

  • include/neural_net/neural_net.py only

Validation

  • Local tests could not be run in this environment because pytest is not installed (`No module named pytest`).

Copilot AI review requested due to automatic review settings March 14, 2026 16:49
Copilot AI left a comment

Pull request overview

This PR introduces CUDA-focused mixed-precision behavior in NeuralNet to improve throughput by enabling TF32 matmul precision and running forward passes under CUDA autocast (bf16), while attempting to keep loss computation in fp32 for stability.

Changes:

  • Enable TF32 matmul precision on CUDA via torch.set_float32_matmul_precision('high').
  • Wrap training/eval forward passes in torch.autocast(..., dtype=torch.bfloat16) when running on CUDA.
  • Cast model outputs to fp32 before loss computation (currently only for tensor outputs).
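
The review's caveat "currently only for tensor outputs" could be sketched as a small helper like the following (name hypothetical, not from the PR):

```python
import torch

def to_fp32(output):
    # Cast tensor outputs back to fp32 before the loss; other output
    # types (tuples, dicts of tensors) currently pass through unchanged.
    if isinstance(output, torch.Tensor):
        return output.float()
    return output
```

A structured output (e.g. a tuple of bf16 tensors) would therefore still reach the loss in bf16 under this scheme.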
