refactor(agents): replace mixin-based retry/rate-limiting with middle…#2281
Conversation
…ware pipeline Replace WithRetryPolicy and WithRateLimiter mixins with a composable CompletionMiddleware pipeline on BaseGenerator. This decouples cross-cutting concerns (retry, rate-limiting) from the generator inheritance hierarchy, making them stackable and independently testable.
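The composable pipeline the description refers to can be sketched roughly as follows. The names here (`CompletionMiddleware`, `build_chain`, `NextFn`) are hypothetical stand-ins for the project's actual classes; only the `functools.reduce`-based composition pattern is the point.

```python
import asyncio
from functools import reduce
from typing import Awaitable, Callable

# Hypothetical stand-ins for the project's types; the real
# CompletionMiddleware API on BaseGenerator may differ.
NextFn = Callable[[str], Awaitable[str]]


class CompletionMiddleware:
    async def __call__(self, prompt: str, next_fn: NextFn) -> str:
        return await next_fn(prompt)


class TagMiddleware(CompletionMiddleware):
    """Wraps the downstream result so the execution order is observable."""

    def __init__(self, tag: str) -> None:
        self.tag = tag

    async def __call__(self, prompt: str, next_fn: NextFn) -> str:
        return f"{self.tag}({await next_fn(prompt)})"


def build_chain(middlewares: list[CompletionMiddleware], core: NextFn) -> NextFn:
    def wrap(next_fn: NextFn, mw: CompletionMiddleware) -> NextFn:
        async def call(prompt: str) -> str:
            return await mw(prompt, next_fn)

        return call

    # reversed() so the first middleware in the list ends up outermost.
    return reduce(wrap, reversed(middlewares), core)


async def core(prompt: str) -> str:
    return f"completion:{prompt}"


chain = build_chain([TagMiddleware("retry"), TagMiddleware("ratelimit")], core)
result = asyncio.run(chain("hi"))
print(result)  # retry(ratelimit(completion:hi))
```

Because the chain is just a list, each concern can be stacked or removed independently and tested in isolation.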
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the agent generation system by transitioning from an inheritance-based mixin pattern for handling concerns like retries and rate-limiting to a more flexible and robust middleware pipeline. This change promotes better separation of concerns, allowing these functionalities to be composed and tested independently, ultimately leading to a more maintainable and extensible codebase for AI agent interactions.
Code Review
This pull request is an excellent refactoring that replaces the mixin-based approach for retries and rate-limiting with a more flexible and composable middleware pipeline. The new architecture decouples cross-cutting concerns from the generator's inheritance hierarchy, which greatly improves modularity and testability. The implementation of the CompletionMiddleware pipeline is clean and robust, using functools.reduce for an elegant composition of middleware. The changes are consistently applied across the codebase, including the tests, which have been updated to reflect the new design. This is a high-quality contribution that significantly improves the agent generator's architecture.
…d docs Update requires-python to 3.12, fix stale docstrings and types in base generator, align tests with new middleware API (remove RetryPolicy, with_retries, with_rate_limiter), and document the middleware pipeline in the README.
…o.sleep mock Patch asyncio.sleep instead of overriding _before_sleep to verify exponential backoff and max_delay capping without production hooks.
/gemini review
Code Review
This pull request is a well-executed refactoring that replaces the mixin-based approach for retries and rate-limiting with a more flexible and composable middleware pipeline. This change significantly improves the architecture by decoupling cross-cutting concerns from the generator's inheritance hierarchy, which enhances modularity and testability. The implementation is clean, and the related tests have been thoroughly updated to reflect the new design. I have one minor suggestion to improve the robustness of a test helper function.
Narrow _retry_mw helper to LiteLLMRetryMiddleware and assert middleware round-trips correctly in test_chat_workflow_serialization.
/gemini review
Code Review
This pull request updates the minimum Python version to 3.12 and introduces a middleware pipeline for generators to handle cross-cutting concerns like retries and rate limiting. The LiteLLMGenerator now includes a retry middleware by default, and custom middleware can be created by subclassing CompletionMiddleware. The code also removes the WithRateLimiter and WithRetryPolicy mixins. Review comments suggest communicating the breaking change of the Python version update to users, improving the efficiency of accessing RateLimiterMiddleware in tests by directly accessing it instead of iterating through the middleware list, adding a parameter to control middleware inclusion in the _make_generator function, and adding checks to ensure middleware exists before accessing its properties in helper functions.
```python
rl_mw = next(
    mw for mw in generator.middleware if isinstance(mw, RateLimiterMiddleware)
)
```

```python
rl_mw_copy = next(
    mw
    for mw in generator_with_params.middleware
    if isinstance(mw, RateLimiterMiddleware)
)
```
```python
def _make_generator(**retry_kwargs) -> MockGenerator:
    mw = (
        _RetriableOnlyMiddleware(**retry_kwargs)
        if retry_kwargs
        else _RetriableOnlyMiddleware()
    )
    return MockGenerator(middleware=[mw])
```
```python
def _retry_mw(gen: BaseGenerator) -> LiteLLMRetryMiddleware:
    return next(mw for mw in gen.middleware if isinstance(mw, LiteLLMRetryMiddleware))


def _rl_mw(gen: BaseGenerator) -> RateLimiterMiddleware:
    return next(mw for mw in gen.middleware if isinstance(mw, RateLimiterMiddleware))
```
mattbit
left a comment
The architectural change is solid and definitely an improvement; however, usability is impacted, as we are making it harder to configure retries and rate limiting on the fly.

E.g. say I have a Generator instance and I want to update the number of attempts. The current API is simple and predictable:

```python
generator.with_retries(3)
```

While with the proposed middleware:
```python
# do we already have a retry middleware somewhere?
found = False
for i, mw in enumerate(generator.middleware):
    if isinstance(mw, RetryMiddleware):
        found = True
        # if we found it, let's clone the generator and then replace the middleware in place
        new_generator = generator.model_copy()
        new_generator.middleware[i] = mw.model_copy(update={"max_attempts": 5})

# otherwise, we'll add a new one
if not found:
    # where do I need to put the retry, at the beginning or at the end?
    # well, it needs to be at the beginning, because otherwise it would break
    # how the rate limiter middleware works
    new_generator = generator.model_copy()
    new_generator.middleware = [RetryMiddleware(max_attempts=5)] + generator.middleware
```

Overall, my feeling is that the middleware is flexible, but the RateLimiter and Retry mechanisms are in practice very specific; for example, they need to be added in a precise order. It's easy to forget and add Retry after RateLimit.
I propose a compromise:

- Keep the middleware chain as the internal architecture. This is solid and very effective for advanced extension (hooks, etc.)
- Modify `_build_chain` to add these two fundamental mechanisms as some kind of built-in middleware. Rough example:
```python
def _build_chain(self, core: NextFn) -> NextFn:
    all_mw: list[CompletionMiddleware] = []
    if self._retry_mw:
        all_mw.append(self._retry_mw)
    if self._rate_limiter_mw:
        all_mw.append(self._rate_limiter_mw)
    all_mw.extend(self.middleware)
    return reduce(_wrap, reversed(all_mw), core)
```

(could probably be written better).
We can also keep exposing the internal configs via properties:

```python
@property
def rate_limiter(self):
    return self._rate_limiter_mw.rate_limiter if self._rate_limiter_mw else None
```

And we can also implement the with_retries and with_rate_limiter.
In this way, we would have:

- the convenient methods and easy access to a generator configuration (e.g. `generator.retry_policy`)
- strong guarantees that these middlewares are executed in the right order
- composable architecture with the middleware pattern
- extensibility with the exposed middlewares
Let me know what you think or if you have better ideas.
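The `with_retries` convenience proposed above might look roughly like the following. This is a hypothetical sketch using dataclasses, where `dataclasses.replace` stands in for pydantic's `model_copy(update=...)`; all names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


# Hypothetical sketch of the proposed convenience API; the real generator
# is a pydantic model, but replace() mirrors model_copy(update=...).
@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int = 3
    base_delay: float = 1.0
    max_delay: float | None = None


@dataclass(frozen=True)
class Generator:
    retry_policy: RetryPolicy | None = None

    def with_retries(self, max_attempts: int) -> Generator:
        # Return a modified copy; the original generator stays untouched.
        policy = self.retry_policy or RetryPolicy()
        return replace(self, retry_policy=replace(policy, max_attempts=max_attempts))


gen = Generator()
gen2 = gen.with_retries(5)
print(gen2.retry_policy.max_attempts)  # 5
print(gen.retry_policy)  # None
```

A dedicated `retry_policy` field like this keeps the one-liner ergonomics while the middleware chain stays the internal execution mechanism.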
```python
max_attempts: int = Field(default=3)
base_delay: float = Field(default=1.0)
max_delay: float | None = Field(default=None)
```
We should retain the RetryPolicy object (which is the configuration) and encapsulate it in the middleware, as done for the RateLimiter.
```python
model: str = Field(
    description="The model identifier to use (e.g. 'gemini/gemini-2.0-flash')"
)
middleware: list[CompletionMiddleware] = Field(
```
…nce API Extract RetryPolicy config model and give BaseGenerator dedicated retry_policy and rate_limiter fields with guaranteed middleware ordering. Adds with_retries() and with_rate_limiter() convenience methods. Renames middleware -> middlewares for codebase consistency.
Description
Related Issue
Type of Change
Checklist
- I have read the CODE_OF_CONDUCT.md document.
- I have read the CONTRIBUTING.md guide.
- I have updated pdm.lock running pdm update-lock (only applicable when pyproject.toml has been modified)