Skip to content

analyze graph: fix incorrect cross-package module resolution in ruff analyze graph (#24182)#24183

Open
markjm wants to merge 1 commit intoastral-sh:mainfrom
markjm:mmolinaro/analyze
Open

analyze graph: fix incorrect cross-package module resolution in ruff analyze graph (#24182)#24183
markjm wants to merge 1 commit intoastral-sh:mainfrom
markjm:mmolinaro/analyze

Conversation

@markjm
Copy link
Copy Markdown
Contributor

@markjm markjm commented Mar 25, 2026

Problem

The previous implementation added all detected package roots to the module search paths (src_roots). In a monorepo with many packages, this caused import foo in one package to incorrectly resolve to an unrelated foo package elsewhere in the repository.

Solution

Two key changes:

  1. src_roots collection: Instead of adding all package roots, collect src paths from discovered configs using resolver.settings(). This uses ruff's hierarchical config discovery - each file's settings come from its closest pyproject.toml, and only explicitly configured src paths are used for resolution.

  2. module_path computation: Fix the path calculation in lib.rs to use package.parent() as the src_root when computing the module path. This ensures relative imports resolve correctly (the package directory must be included in the module path, not used as the strip prefix).

Performance

The reduced search space dramatically improves performance on a large
monorepo:

  • Before: ~27 seconds
  • After: ~3.3 seconds (~8x faster)

Notes

  • One test unexpectedly relied on finding code from other roots. i looked at the example from uv and modified the test to more representatively look like a uv workspace
  • The other tests added are ones which would fail today on main to represent the issue

See #24182

@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot bot commented Mar 25, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@markjm markjm changed the title Fix incorrect cross-package module resolution in ruff analyze graph (#24182) analyze graph: fix incorrect cross-package module resolution in ruff analyze graph (#24182) Mar 25, 2026
Copy link
Copy Markdown
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. This overall makes sense. I left two comments: I'm not sure if what I propose works but ty's module resolver has evolved since we wrote ruff analyze and it's plausible that some of the workarounds implemented in analyze are no longer necessary.

The main caveat of this change is that analyze now behaves differently from lint, but I think that's fine.

);
// Collect `src` paths from all discovered configs (root + hierarchically discovered).
// This is more efficient than iterating over files since we only process unique configs.
for settings in resolver.settings() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is still somewhat prone to packages resolving across project boundaries. Ultimately, Ruff's hierarchical configuration here is directly add odds with requiring a global src_roots.

It makes me wonder if ty's desperate module resolution works well enough in most cases and, if not, could we make it so that it works out of the box?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this but i think it doesnt exactly work for the uv workspace case. desperate module resolution simply walks up to the root, but we also need to be aware of subprojects with pyproject.toml files setting src-s to a sibling directory.

Please feel free to give it a shot and let me know if im missing something, but removing this and relying on ty desperate resolution "later" demonstrated failures in the uv workspace testcase which is already here (that I am modifying)

Comment on lines +38 to +55
// Compute the module path relative to the src_root (package's parent).
// For a file at `src_root/ruff/foo/bar.py`, the module path is `["ruff", "foo", "bar"]`.
// We use package.parent() because `package` points to the package directory itself,
// and we need to include the package name in the module path for relative imports to work.
let module_path = package.and_then(|package| {
let src_root = package.parent()?;
path.as_std_path()
.strip_prefix(src_root.as_std_path())
.ok()?
.iter()
.map(|component| {
Path::new(component)
.file_stem()
.and_then(|s| s.to_str())
.map(String::from)
})
.collect::<Option<Vec<String>>>()
});
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me wonder if we could use path_to_module here instead

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this we can do and i think I made the change!

@MichaReiser MichaReiser added the analyze Related to Ruff analyze functionality label Mar 26, 2026
@markjm
Copy link
Copy Markdown
Contributor Author

markjm commented Mar 26, 2026

Thanks for the review, I think I have addressed your feedback @MichaReiser in a way where tests are unaffected and the changes still look right in my jumbo repo. But i will let you resolve your comments when comfortable because I'm new to the tool and am not 100% sure I've done things right

@markjm markjm force-pushed the mmolinaro/analyze branch from 2f6da0b to af4ec41 Compare March 26, 2026 16:29
markjm added a commit to markjm/ruff that referenced this pull request Mar 27, 2026
As discussed in astral-sh#24183 (specifically astral-sh#24183 (comment))

In this change, we pull out the heuristics for finding search paths (currently in in `options.rs`) into a seperate function `discover_src_layout_roots` so that we can also use it in `absolute_desperate_search_paths` and `relative_desperate_search_paths`.

Now, when desperate resolution encounters a `pyproject.toml` or `ty.toml`, it will also checks for these well-known src-layout subdirectories "on the way up" as we are searching up the ancestors.

Note: naming is hard so may also consider some function rename to indicate that these are kind of just heuristics 🤷
@markjm markjm force-pushed the mmolinaro/analyze branch from af4ec41 to 7ba919b Compare March 27, 2026 23:40
@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot bot commented Mar 27, 2026

Typing conformance results

No changes detected ✅

Current numbers
The percentage of diagnostics emitted that were expected errors held steady at 86.59%. The percentage of expected errors that received a diagnostic held steady at 80.96%. The number of fully passing files held steady at 68/132.

@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot bot commented Mar 27, 2026

Memory usage report

Memory usage unchanged ✅

@markjm markjm force-pushed the mmolinaro/analyze branch from 7ba919b to 1f8c5a2 Compare March 27, 2026 23:43
@astral-sh-bot
Copy link
Copy Markdown

astral-sh-bot bot commented Mar 27, 2026

ecosystem-analyzer results

No diagnostic changes detected ✅

Full report with detailed diff (timing results)

@markjm markjm force-pushed the mmolinaro/analyze branch from 1f8c5a2 to 687f745 Compare March 27, 2026 23:47
@carljm carljm removed their request for review March 28, 2026 00:05
Comment on lines -55 to -69
let package_roots = resolver
.package_roots(
&paths
.iter()
.flatten()
.map(ResolvedFile::path)
.collect::<Vec<_>>(),
)
.into_iter()
.map(|(path, package)| {
(
path.to_path_buf(),
package.map(PackageRoot::path).map(Path::to_path_buf),
)
})
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just spitballing ideas here to find a way to implement this performance optimization without breaking with Ruff's understanding of packages.

Instead of ignoring all package roots, could we filter out roots that we know overlap with what ty does in its module resolver? In your project, the issue seems to be mainly about adding a src root to every directory containing an __init__.py. Could we skip all PackageRoot::Nested? Or does some other heuristic work?

Copy link
Copy Markdown
Contributor Author

@markjm markjm Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking back @MichaReiser -

I did investigate that and the results also have some incorrect cross-package module resolution.

I think the crux of the issue here is that ruff only cares about pyproject.toml files which specifically set some settings in tool.ruff.

pub fn ruff_enabled<P: AsRef<Path>>(path: P) -> Result<bool> {

if let Some(pyproject) = settings_toml(ancestor)? {

One additional way we can solve this issue to to make ruff consider pyproject.toml files under a ruff-enabled root as being a ruff-related config file. This would allow this whole thing to be simplified because we would naturally slot into how ruff already handles hierarchical project resolution.

Now

`pyproject.toml` with `tool.ruff`
`src`
      `__init__.py`
       `a.py`
`packages`
      `other`
          `pyproject.toml` no `tool.ruff` <- not considered when ruff does its hierarchical config finding

Potential Change

`pyproject.toml` with `tool.ruff`
`src`
      `__init__.py`
       `a.py`
`packages`
      `other`
          `pyproject.toml` no `tool.ruff` <- would essentially be considered the same as if `tool.ruff` was set with no overridden values

There is some complexity in the code to make sure such pyproject.toml files inherit the right configs, which may not look so nice (example here, see resolver.rs note the test snapshot differences, but now actual output tests change https://github.com/astral-sh/ruff/compare/main...markjm:ruff:mmolinaro/analyze-pyproject-inheritance?expand=1)

Its also possible to hack in this heuristic just to analyze_graph. This resolves the issue for analyze in a slightly hacked-in way while leaving everything else along https://github.com/astral-sh/ruff/compare/main...markjm:ruff:mmolinaro/just-graph?expand=1

Note: please use those examples as directions for approach, not complete/tested PRs. thats why i left them as just branch compares for now


However, it seems like isort ruff rules kind of fall prey to this same issue of blank pyproject.tomls messing with its resolution. Specifically, the setting https://docs.astral.sh/ruff/settings/#lint_isort_detect-same-package seems to be there to address exactly this type of issue. Looking back at the example workspace in the test I changed, if i were to see how isort handles it:

isort categorization for albatross/__init__.py:

Import Category Reason Note
albatross FirstParty SamePackage Only works because of the same-package heuristic

I believe its not a problem with isort because that setting defaults to true :)

Additionally, if we look specifically at the example in the test which was changed
https://github.com/astral-sh/uv/blob/aa629c4a54c31d6132ab1655b90dd7542c17d120/scripts/workspaces/albatross-virtual-workspace/packages/albatross/src/albatross/__init__.py

ruff check packages/albatross/src/albatross/__init__.py --show-settings | grep linter.src -A3
linter.src = [
	"/tmp/uv",
	"/tmp/uv/src",
]

we see that ruff indeed does not naturally consider packages/albatross/ or packages/albatross/src to be a linter.src entry

Edit: editted a few times for more info or clarity

Copy link
Copy Markdown
Contributor Author

@markjm markjm Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put succinctly:

I believe that that the original approach which caused that test to need to change its settings was the "most correct" (and also the fastest :) ) because it most closely matches how ruff resolves settings to a particular file today. The examples with isort and the uv workspace example above, I think, are evidence towards that direction.
The need to include the test change was not because we suddenly stopped including configured or inferred src_roots based on the configs available, but only because analyze graph over-adds src roots so much so now that those directories just happened to be included in spite of a ruff their missing ruff configurations.

Copy link
Copy Markdown
Member

@MichaReiser MichaReiser Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The time I can invest into this PR is very limited at the moment. That's why I'd prefer a local, ideally minimal, change.

I did investigate that and the results also have some incorrect cross-package module resolution.

I don't think this can be fully avoided. It's inherent to having a single search path and mimics Python's behavior.

I changed the venv test and added a packages/albatross/src/albatross/sub/__init__.py. As suspected, package_roots now returns an entry for the sub directory, which means src_roots will have as many entries as there are directories with an __init__.py, leading to horrible performance.

             13 │+[crates/ruff/src/commands/analyze_graph.rs:55:25] resolver.package_roots(&paths.iter().flatten().map(ResolvedFile::path).collect::<Vec<_>>(),) = {
             14 │+    "/private/var/folders/7s/n00zmc2d0qgchs1ns43xq47h0000gn/T/.tmpK2kE6t/packages/albatross": None,
             15 │+    "/private/var/folders/7s/n00zmc2d0qgchs1ns43xq47h0000gn/T/.tmpK2kE6t/packages/albatross/src/albatross": Some(
             16 │+        Root {
             17 │+            path: "/private/var/folders/7s/n00zmc2d0qgchs1ns43xq47h0000gn/T/.tmpK2kE6t/packages/albatross/src/albatross",
             18 │+        },
             19 │+    ),
             20 │+    "/private/var/folders/7s/n00zmc2d0qgchs1ns43xq47h0000gn/T/.tmpK2kE6t/packages/albatross/src/albatross/sub": Some(
             21 │+        Root {
             22 │+            path: "/private/var/folders/7s/n00zmc2d0qgchs1ns43xq47h0000gn/T/.tmpK2kE6t/packages/albatross/src/albatross",
             23 │+        },
             24 │+    ),
             25 │+}

I think what we should do is:

  • Always add PackageRoot::Nested because (or decide we don't wnat to support this case, but we should add a test for it, unless there's a pre-existing test)
  • For PackageRoot::Root: Only add the outermost directories using deduplicate_nested_paths

Unless @charliermarsh has some ideas. He originally wrote all this :)

Copy link
Copy Markdown
Member

@MichaReiser MichaReiser Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, my comment about the sub package is incorrect. The IndexMap already handles it. @markjm can you share an example of the paths that the auto discovery adds in your repository that shouldn't be added (there should be a very large number of them, given the performance issues you're running into).

One solution here could be to not add the discovered package roots if a project explicitly sets src_roots.

Copy link
Copy Markdown
Contributor Author

@markjm markjm Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cant share the specific code, but i pulled these numbers

Approach Src Roots Package Discovery Runtime Correctness
Hierarchical (mmolinaro/analyze) 2 Explicit config: ["libs", "."] (just where our code happens to ve configured) ~4.5s ✅ No false dependencies
Main branch (original) 174 Auto-discover all packages (10,539 found → deduplicated to 174) ~27s ❌ False cross-service dependencies
keep nested + dedup roots 174 8,427 Root → deduplicated to 1 outermost + nested ~13s ❌ False cross-service dependencies, and as slow as main branch

Also pulled some stats on # of resolutions:

  • Hierarchical (2 roots): 691,261 total imports, 434,637 resolved (62.9%), 256,624 failed (37.1%)
  • Main (174 roots): 691,575 total imports, 436,721 resolved (63.1%), 254,854 failed (36.9%)

The performance difference comes from checking 2 roots vs 174 roots for every import resolution attempt (~691K attempts). The major performance hit comes from all the failed resolutions which are needing to look through all those src roots.


First, I changed the PR back to close-to the original approach (pre-messing with ty desperate resolution) in eb29f54.

Since the "include nested" approach mentioned above acts the same as main, I updated to an approach which just adds all package_roots but relies on deduplicate_nested_paths to remove most of them (though it removes the nested packages discussed above, it does mean the test you were concerned about now passes :) ). This change also ensures we dont touch anything outside of ruff analyze graph (scoped as requested above!).

I am happy to either (1) the eb29f54 approach, (2) limit to only when src_roots are explicitely set (though i dont know if there is any current signal if a setting is explicit vs defaulted), or (3) this approach shown (perhaps close to "decide we don't wnat to support this case"). Whatever you would prefer, I can make happen.


P.S I made this small demo which just shows the type of resolution issues which can happen with current approach (I know we both agree this can happen, but i figured its helpful to have an example)
markjm/demos-by-commit@347feb2#diff-50c86b7ed8ac2cf95bd48334961bf0530cdc77b5a56f852c5c61b89d735fd711

Edit: made a bunch of edits to this message as i explored and experimented

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you share an example of the paths that the auto discovery adds in your repository that shouldn't be added (there should be a very large number of them, given the performance issues you're running into).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I thought the demo does show this
markjm/demos-by-commit@347feb2

I made another with the exact src we have configured here, with some naming changes, the file structure is just ripped from the repo
markjm/demos-by-commit@49e27fe

You will notice in this case that the conflict is finding a package named openai (an existing 3p pip package) because all/most other imports in the repo are absolute (from services.llm_actions.src.llm_actions.openai import call_openai), making these conflicts much more likely with 3p package names

The major performance hit comes from all the failed resolutions which are needing to look through all those src roots.

I was sharing this ⬆️ , though in response to ⬇️

(there should be a very large number of them, given the performance issues you're running into)

because there actually are not a very large number of them. Instead there are a relatively small number of them, as they represent cases where the resolution with so many extra roots find some arbitrary name-matching root elsewhere, while the vast majority of cases dont and just have a much slower exhaustive "failed resolution" case where each resolution doesn't find anything but does exhaustively search all the many auto-discovered roots.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I somehow missed the demo. I don't think I've the time now to play and look into the demo myself. I can help drive this forward if we have a clear analysis which extra paths are added and if there's any pattern (all nested roots, something else?).

@MichaReiser MichaReiser assigned charliermarsh and unassigned ntBre Apr 8, 2026
@MichaReiser MichaReiser removed the request for review from ntBre April 8, 2026 06:57
@MichaReiser MichaReiser requested review from charliermarsh and removed request for AlexWaygood, BurntSushi, Gankra, dcreager, ibraheemdev and sharkdp April 8, 2026 06:57
@markjm markjm force-pushed the mmolinaro/analyze branch from 687f745 to 62c6e8a Compare April 13, 2026 19:55
@markjm
Copy link
Copy Markdown
Contributor Author

markjm commented Apr 13, 2026

((reverted the branch back to the state it was in before exploring the ty resolver changes)) in eb29f54

@markjm markjm force-pushed the mmolinaro/analyze branch 3 times, most recently from 62c6e8a to a44bcd7 Compare April 14, 2026 02:26
@markjm markjm force-pushed the mmolinaro/analyze branch from a44bcd7 to e0bc18b Compare April 14, 2026 02:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

analyze Related to Ruff analyze functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants