Skip to content

[BUG] race condition in OpenMLSplitTest when running tests in parallel#1643

Merged
fkiraly merged 4 commits intoopenml:mainfrom
Alm0stSurely:fix/race-condition-test-split
Mar 3, 2026
Merged

[BUG] race condition in OpenMLSplitTest when running tests in parallel#1643
fkiraly merged 4 commits intoopenml:mainfrom
Alm0stSurely:fix/race-condition-test-split

Conversation

@Alm0stSurely
Copy link
Copy Markdown
Contributor

Problem

When running tests in parallel with pytest-xdist (e.g., "pytest -n 3 tests/test_tasks/test_split.py"), one test under OpenMLSplitTest fails intermittently with an EOFError during pickle.load().

This was identified in CI job/63346513831 and reproduces roughly 1 out of 10 runs locally.

Analysis

The root cause is that all test instances share the same pickle cache file path (self.pd_filename). When multiple workers run concurrently:

  1. Worker A creates the pickle cache file during test execution
  2. Worker B reads the pickle cache file
  3. Worker A's tearDown() deletes the file
  4. Worker B's pickle.load() encounters a partially deleted file → EOFError

This is a classic race condition on shared filesystem state.

Solution

Use tempfile.mkdtemp() to create a unique temporary directory for each test instance, then copy the ARFF source file there. This ensures:

  • Each test worker has its own isolated pickle cache file
  • No shared state between parallel workers
  • Automatic cleanup via shutil.rmtree() in tearDown()

The fix is minimal (10 insertions, 3 deletions) and doesn't change the test logic - only the test isolation mechanism.

Benchmarks / Testing

Ran 5 consecutive parallel test executions:

pytest -n 4 tests/test_tasks/test_split.py  # 5 times

All 15 test runs (3 tests × 5 runs) passed successfully. Before the fix, failures occurred ~10% of the time with parallel execution.

Fixes #1641

Use unique temp directories for each test instance to prevent
concurrent workers from interfering with shared pickle cache files.

Fixes openml#1641
@omkar-334
Copy link
Copy Markdown
Contributor

Looks like @Alm0stSurely is an Openclaw bot

Copy link
Copy Markdown
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this fixes the problem.

Congrats on your first "good first issue"!
How about something more challenging next?

@fkiraly
Copy link
Copy Markdown
Collaborator

fkiraly commented Feb 18, 2026

Looks like @Alm0stSurely is an Openclaw bot

Yes, almost surely 😁

No reason though to not let it have its good first issue and let it move on to more challenging tasks. We can always make more good first issues tomorrow.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Feb 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.82%. Comparing base (7feb2a3) to head (46685ba).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1643   +/-   ##
=======================================
  Coverage   52.82%   52.82%           
=======================================
  Files          37       37           
  Lines        4371     4371           
=======================================
  Hits         2309     2309           
  Misses       2062     2062           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Collaborator

@geetu040 geetu040 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this looks good and solves the race condition problem. I have left a comment, though not a hard blocker.

# Use a unique temp directory for each test to avoid race conditions
# when running tests in parallel (see issue #1641)
self._temp_dir = tempfile.mkdtemp()
self.arff_filepath = Path(self._temp_dir) / "datasplits.arff"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mkdtemp is fine, but I'd prefer TemporaryDirectory since it's safer and reduces cleanup code.

Then in tearDown you can simply call self._temp_dir.cleanup()

Copy link
Copy Markdown
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @geetu040

@fkiraly
Copy link
Copy Markdown
Collaborator

fkiraly commented Feb 20, 2026

@Alm0stSurely, can you please react to the reviews?

Replace tempfile.mkdtemp() with tempfile.TemporaryDirectory() for
safer automatic cleanup, as suggested by @geetu040.
@Alm0stSurely
Copy link
Copy Markdown
Contributor Author

Thanks for the review @geetu040 and @fkiraly! Switched to tempfile.TemporaryDirectory() as suggested — much cleaner with automatic cleanup. Let me know if anything else needs adjusting.

@fkiraly fkiraly changed the title fix: race condition in OpenMLSplitTest when running tests in parallel [BUG] race condition in OpenMLSplitTest when running tests in parallel Mar 3, 2026
@fkiraly fkiraly merged commit dbd432c into openml:main Mar 3, 2026
31 checks passed
fkiraly pushed a commit that referenced this pull request Mar 3, 2026
Fixes #1641

This PR adds separate temp directories per test in test_split.py to
avoid race conditions when running with multiple workers.

cc @geetu040  
I made this PR because I suspect #1643 is made by an OpenClaw bot.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[MNT] Race condition in OpenMLSplit._from_arff_file when running tests in parallel

5 participants