refactor(agents): replace mixin-based retry/rate-limiting with middle…#2281

Merged
mattbit merged 9 commits into main from feat/generator-middleware-pipeline
Mar 11, 2026

Conversation

@Hartorn
Member

@Hartorn Hartorn commented Feb 27, 2026

…ware pipeline

Replace WithRetryPolicy and WithRateLimiter mixins with a composable CompletionMiddleware pipeline on BaseGenerator. This decouples cross-cutting concerns (retry, rate-limiting) from the generator inheritance hierarchy, making them stackable and independently testable.

@Hartorn Hartorn self-assigned this Feb 27, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the agent generation system by transitioning from an inheritance-based mixin pattern for handling concerns like retries and rate-limiting to a more flexible and robust middleware pipeline. This change promotes better separation of concerns, allowing these functionalities to be composed and tested independently, ultimately leading to a more maintainable and extensible codebase for AI agent interactions.

Highlights

  • Refactored Retry and Rate-Limiting: The previous mixin-based approach for retry policies and rate-limiting (WithRetryPolicy, WithRateLimiter) has been replaced with a composable CompletionMiddleware pipeline. This decouples these cross-cutting concerns from the generator inheritance hierarchy.
  • Introduced CompletionMiddleware Pipeline: A new middleware.py file defines a CompletionMiddleware base class, RetryMiddleware, and RateLimiterMiddleware. The BaseGenerator now includes a middleware attribute and a _build_chain method to dynamically apply these middlewares.
  • Improved Modularity and Testability: The new middleware architecture makes retry and rate-limiting policies stackable, independently testable, and easier to manage, enhancing the overall modularity of the generator system.
  • Centralized Type Definitions: A new _types.py file was added to house shared type definitions like Response and GenerationParams, resolving circular dependencies and improving code organization.

Changelog
  • libs/giskard-agents/src/giskard/agents/generators/__init__.py
    • Updated imports to reflect the new middleware structure, removing references to old mixins and adding new middleware classes.
    • Modified the __all__ export list to include the new middleware components and remove the deprecated mixins.
  • libs/giskard-agents/src/giskard/agents/generators/_types.py
    • Added a new file to define Response and GenerationParams models, breaking circular dependencies between base and middleware modules.
  • libs/giskard-agents/src/giskard/agents/generators/base.py
    • Removed the in-file definitions of Response and GenerationParams, now imported from _types.py.
    • Added a middleware field to BaseGenerator to hold a list of CompletionMiddleware instances.
    • Implemented a _build_chain method to construct the middleware pipeline, applying middleware around the core _complete method.
    • Updated the complete method to use the new middleware chain for processing requests.
    • Modified batch_complete to call the public complete method, ensuring middleware is applied to each batch item.
    • Updated type hints and docstrings for chat, template, and with_params methods to reflect ChatWorkflow and Self returns.
  • libs/giskard-agents/src/giskard/agents/generators/litellm_generator.py
    • Removed inheritance from WithRateLimiter and WithRetryPolicy mixins.
    • Introduced LiteLLMRetryMiddleware as a new CompletionMiddleware subclass, specifically for LiteLLM's retry logic.
    • Updated LiteLLMGenerator to include LiteLLMRetryMiddleware in its default middleware list.
    • Refactored the _complete method to remove the _throttle context manager and directly call acompletion, as rate limiting is now handled by middleware.
  • libs/giskard-agents/src/giskard/agents/generators/middleware.py
    • Added a new file to define the CompletionMiddleware abstract base class.
    • Implemented RetryMiddleware using tenacity for exponential back-off, allowing custom retry conditions.
    • Implemented RateLimiterMiddleware to throttle completions using a BaseRateLimiter.
  • libs/giskard-agents/src/giskard/agents/generators/rate_limiting.py
    • Removed the file, as rate-limiting functionality has been migrated to middleware.py.
  • libs/giskard-agents/src/giskard/agents/generators/retries.py
    • Removed the file, as retry functionality has been migrated to middleware.py.
  • libs/giskard-agents/tests/test_generator.py
    • Updated imports to include RateLimiterMiddleware.
    • Modified test_litellm_generator_gets_rate_limiter to instantiate LiteLLMGenerator with RateLimiterMiddleware in its middleware list.
    • Adjusted assertions in test_generator_with_params_and_rate_limiter to verify rate limiter presence via the new middleware structure.
  • libs/giskard-agents/tests/test_generator_retry.py
    • Updated imports to remove old retry mixins and include RetryMiddleware.
    • Refactored MockGenerator to no longer inherit from WithRetryPolicy and to accept middleware.
    • Introduced _RetriableOnlyMiddleware to simulate specific retry conditions.
    • Modified test cases (test_raises_exception_after_retries_exhausted, test_raises_exception_if_not_retriable, test_retries_with_result, test_retries_works_with_batch_complete, test_retries_with_max_delay, test_retries_exponential_backoff) to use the new middleware-based retry mechanism and removed assertions related to _sleep_times.
  • libs/giskard-agents/tests/test_serialization.py
    • Updated imports to include new middleware classes (LiteLLMRetryMiddleware, RateLimiterMiddleware, RetryMiddleware).
    • Added helper functions _retry_mw and _rl_mw to easily access middleware instances for testing.
    • Modified test_generator_serialization to instantiate Generator with the new middleware list containing LiteLLMRetryMiddleware and RateLimiterMiddleware.
    • Adjusted assertions in serialization tests to correctly verify middleware properties instead of old mixin attributes.
    • Removed redundant comments about verification steps.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is an excellent refactoring that replaces the mixin-based approach for retries and rate-limiting with a more flexible and composable middleware pipeline. The new architecture decouples cross-cutting concerns from the generator's inheritance hierarchy, which greatly improves modularity and testability. The implementation of the CompletionMiddleware pipeline is clean and robust, using functools.reduce for an elegant composition of middleware. The changes are consistently applied across the codebase, including the tests, which have been updated to reflect the new design. This is a high-quality contribution that significantly improves the agent generator's architecture.
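
As a rough illustration of the functools.reduce composition mentioned in this review, the sketch below folds a list of middleware callables around a core completion function. All names here (`wrap`, `build_chain`, `tag`, `core`) are hypothetical, and middlewares are modelled as plain async functions rather than the PR's classes.

```python
import asyncio
from functools import reduce
from typing import Awaitable, Callable

NextFn = Callable[[str], Awaitable[str]]
Middleware = Callable[[str, NextFn], Awaitable[str]]


def wrap(next_fn: NextFn, mw: Middleware) -> NextFn:
    """Return a new NextFn that runs `mw` around `next_fn`."""

    async def wrapped(prompt: str) -> str:
        return await mw(prompt, next_fn)

    return wrapped


def build_chain(middlewares: list[Middleware], core: NextFn) -> NextFn:
    # Fold from the innermost middleware outwards, so the first list
    # entry ends up outermost and therefore runs first.
    return reduce(wrap, reversed(middlewares), core)


def tag(name: str) -> Middleware:
    """Toy middleware that brackets the downstream result with its name."""

    async def mw(prompt: str, next_fn: NextFn) -> str:
        return f"{name}({await next_fn(prompt)})"

    return mw


async def core(prompt: str) -> str:
    return prompt.upper()


chain = build_chain([tag("outer"), tag("inner")], core)
result = asyncio.run(chain("hi"))  # "outer(inner(HI))"
```

Reversing the list before folding is what makes list position match execution order: the first element wraps everything after it.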

Hartorn added 2 commits February 27, 2026 10:29
…d docs

Update requires-python to 3.12, fix stale docstrings and types in
base generator, align tests with new middleware API (remove RetryPolicy,
with_retries, with_rate_limiter), and document the middleware pipeline
in the README.
…o.sleep mock

Patch asyncio.sleep instead of overriding _before_sleep to verify
exponential backoff and max_delay capping without production hooks.
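
The testing approach described in this commit message, patching asyncio.sleep rather than adding a production hook, might look roughly like the sketch below. `retry_with_backoff` and `recorded_delays` are hypothetical helpers standing in for the code under test and the test body; only the idea of mocking asyncio.sleep comes from the commit message.

```python
import asyncio
from unittest.mock import AsyncMock, patch


async def retry_with_backoff(fn, max_attempts=3, base_delay=1.0, max_delay=None):
    """Hypothetical retry loop standing in for the code under test."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await fn()
        except RuntimeError:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            await asyncio.sleep(delay if max_delay is None else min(delay, max_delay))


def recorded_delays(max_delay=None):
    """Run a flaky call with asyncio.sleep patched; return the requested delays."""
    attempts = 0

    async def flaky():
        nonlocal attempts
        attempts += 1
        if attempts < 4:
            raise RuntimeError("transient")
        return "ok"

    # The mock records each requested delay and returns immediately.
    with patch("asyncio.sleep", new_callable=AsyncMock) as fake_sleep:
        asyncio.run(retry_with_backoff(flaky, max_attempts=4, max_delay=max_delay))
    return [call.args[0] for call in fake_sleep.call_args_list]
```

With three failures before success, `recorded_delays()` returns `[1.0, 2.0, 4.0]`, and `recorded_delays(max_delay=2.5)` returns `[1.0, 2.0, 2.5]`, so both the doubling and the cap can be asserted without waiting.
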
@Hartorn
Member Author

Hartorn commented Feb 27, 2026

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request is a well-executed refactoring that replaces the mixin-based approach for retries and rate-limiting with a more flexible and composable middleware pipeline. This change significantly improves the architecture by decoupling cross-cutting concerns from the generator's inheritance hierarchy, which enhances modularity and testability. The implementation is clean, and the related tests have been thoroughly updated to reflect the new design. I have one minor suggestion to improve the robustness of a test helper function.

Narrow _retry_mw helper to LiteLLMRetryMiddleware and assert middleware
round-trips correctly in test_chat_workflow_serialization.
@Hartorn
Member Author

Hartorn commented Feb 27, 2026

/gemini review

@Hartorn Hartorn marked this pull request as ready for review February 27, 2026 14:24

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the minimum Python version to 3.12 and introduces a middleware pipeline for generators to handle cross-cutting concerns like retries and rate limiting. The LiteLLMGenerator now includes a retry middleware by default, and custom middleware can be created by subclassing CompletionMiddleware. The code also removes the WithRateLimiter and WithRetryPolicy mixins. Review comments suggest communicating the breaking change of the Python version update to users, improving the efficiency of accessing RateLimiterMiddleware in tests by directly accessing it instead of iterating through the middleware list, adding a parameter to control middleware inclusion in the _make_generator function, and adding checks to ensure middleware exists before accessing its properties in helper functions.
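
To make the "subclass CompletionMiddleware" point concrete, here is a minimal sketch of a custom middleware. It is an assumption-laden illustration: `SketchMiddleware` is a hypothetical stand-in for the real base class, and the method name `call` and its `(prompt, next_fn)` signature are guesses, since the actual interface is not shown in this thread.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Awaitable, Callable

NextFn = Callable[[str], Awaitable[str]]


class SketchMiddleware(ABC):
    """Hypothetical stand-in for CompletionMiddleware; the real API may differ."""

    @abstractmethod
    async def call(self, prompt: str, next_fn: NextFn) -> str: ...


class CountingMiddleware(SketchMiddleware):
    """Counts completions and failures without touching the generator itself."""

    def __init__(self) -> None:
        self.calls = 0
        self.failures = 0

    async def call(self, prompt: str, next_fn: NextFn) -> str:
        self.calls += 1
        try:
            return await next_fn(prompt)
        except Exception:
            self.failures += 1
            raise  # observe the error, but let outer middleware handle it


async def echo(prompt: str) -> str:
    """Toy core completion that rejects empty prompts."""
    if not prompt:
        raise ValueError("empty prompt")
    return f"echo: {prompt}"
```

A middleware like this re-raises after recording the failure, so a retry middleware placed outside it would still see the exception.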

Comment on lines +151 to +153
rl_mw = next(
    mw for mw in generator.middleware if isinstance(mw, RateLimiterMiddleware)
)

Medium: Instead of iterating through all middleware to find the RateLimiterMiddleware, it would be more efficient to directly access it if you know its position in the list or store a reference to it during initialization.

Comment on lines +161 to +165
rl_mw_copy = next(
    mw
    for mw in generator_with_params.middleware
    if isinstance(mw, RateLimiterMiddleware)
)

Medium: Similar to the previous comment, directly accessing the RateLimiterMiddleware would be more efficient than iterating through the list.

Comment on lines +33 to +39
def _make_generator(**retry_kwargs) -> MockGenerator:
    mw = (
        _RetriableOnlyMiddleware(**retry_kwargs)
        if retry_kwargs
        else _RetriableOnlyMiddleware()
    )
    return MockGenerator(middleware=[mw])

Medium: The _make_generator function creates a MockGenerator with retry middleware. Consider adding a parameter to control whether the middleware is included, providing more flexibility in test setup.

Comment on lines +20 to +21
def _retry_mw(gen: BaseGenerator) -> LiteLLMRetryMiddleware:
    return next(mw for mw in gen.middleware if isinstance(mw, LiteLLMRetryMiddleware))

Medium: These helper functions are used to extract specific middleware from the generator's middleware list. Consider adding a check to ensure that the middleware exists before attempting to access its properties, preventing potential StopIteration errors.

Comment on lines +24 to +25
def _rl_mw(gen: BaseGenerator) -> RateLimiterMiddleware:
    return next(mw for mw in gen.middleware if isinstance(mw, RateLimiterMiddleware))

Medium: These helper functions are used to extract specific middleware from the generator's middleware list. Consider adding a check to ensure that the middleware exists before attempting to access its properties, preventing potential StopIteration errors.
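
One way to address the StopIteration concern raised here is to give `next()` a default and fail with an explicit message. The helper below is a hypothetical generic version of `_retry_mw` and `_rl_mw`; the name `find_middleware` and the choice of `LookupError` are assumptions, not part of the PR.

```python
from typing import TypeVar

T = TypeVar("T")


def find_middleware(middlewares: list[object], kind: type[T]) -> T:
    """Return the first middleware of type `kind`, or fail with a clear message."""
    # next() with a default never raises StopIteration.
    found = next((mw for mw in middlewares if isinstance(mw, kind)), None)
    if found is None:
        raise LookupError(f"no {kind.__name__} in middleware list")
    return found
```

In a test helper, a descriptive error like this points directly at the missing middleware instead of surfacing a bare StopIteration from inside a generator expression.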

Member

@mattbit mattbit left a comment

The architectural change is solid and definitely an improvement; however, usability is impacted, as we are making it harder to configure retries and rate limiting on the fly.

E.g., say I have a Generator instance and I want to update the number of attempts. The current API is simple and predictable:

generator.with_retries(3)

While with the proposed middleware:

# do we already have a retry middleware somewhere?
found = False
for i, mw in enumerate(generator.middleware):
    if isinstance(mw, RetryMiddleware):
        found = True
        # if we found it let's clone the generator and then replace the middleware in place
        new_generator = generator.model_copy()
        new_generator.middleware[i] = mw.model_copy(update={"max_attempts":5})

# otherwise, we'll add a new one
if not found:
    # where do I need to put the retry, at the beginning or at the end?
    # well it needs to be at the beginning, because otherwise it would break how the rate limiter middleware works
    new_generator = generator.model_copy()
    new_generator.middleware = [RetryMiddleware(max_attempts=5)] + generator.middleware

Overall, my feeling is that the middleware is flexible, but the RateLimiter and Retry mechanisms are in practice very specific: for example, they need to be added in a precise order. It's easy to forget and add Retry after RateLimit.
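
The ordering concern can be illustrated with a toy chain. In the sketch below, `Throttle` and `Retry` are simplified, hypothetical stand-ins for the rate-limiter and retry middlewares (no real throttling or back-off), and the helper counts how many attempts actually pass through the throttle for each ordering.

```python
import asyncio
from functools import reduce


class Throttle:
    """Toy rate-limiter middleware: only counts how often it is entered."""

    def __init__(self):
        self.entered = 0

    async def call(self, prompt, next_fn):
        self.entered += 1
        return await next_fn(prompt)


class Retry:
    """Toy retry middleware without back-off."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts

    async def call(self, prompt, next_fn):
        for attempt in range(1, self.max_attempts + 1):
            try:
                return await next_fn(prompt)
            except RuntimeError:
                if attempt == self.max_attempts:
                    raise


def build_chain(middlewares, core):
    def wrap(next_fn, mw):
        async def wrapped(prompt):
            return await mw.call(prompt, next_fn)
        return wrapped
    # First list entry becomes the outermost wrapper.
    return reduce(wrap, reversed(middlewares), core)


def throttle_entries(retry_first: bool) -> int:
    """Run a call that fails twice; return how many attempts were throttled."""
    failures_left = 2

    async def core(prompt):
        nonlocal failures_left
        if failures_left:
            failures_left -= 1
            raise RuntimeError("transient")
        return "ok"

    throttle, retry = Throttle(), Retry()
    order = [retry, throttle] if retry_first else [throttle, retry]
    assert asyncio.run(build_chain(order, core)("hi")) == "ok"
    return throttle.entered
```

With retry first, all three attempts pass through the throttle (`throttle_entries(True)` is 3). With the order flipped, the throttle is entered once while three requests still reach the core (`throttle_entries(False)` is 1), so retried requests bypass rate limiting.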

I propose a compromise:

  • Keep the middleware chain as the internal architecture. This is solid and very effective for advanced extension (hooks, etc.)
  • Modify the _build_chain to add these two fundamental mechanisms. Some kind of built-in middleware, rough example:
def _build_chain(self, core: NextFn) -> NextFn:
    all_mw: list[CompletionMiddleware] = []
    if self._retry_mw:
        all_mw.append(self._retry_mw)
    if self._rate_limiter_mw:
        all_mw.append(self._rate_limiter_mw)
    all_mw.extend(self.middleware)
    return reduce(_wrap, reversed(all_mw), core)

(could probably be written better).
We can also keep exposing the internal configs via properties:

@property
def rate_limiter(self):
    return self._rate_limiter_mw.rate_limiter if self._rate_limiter_mw else None

And we can also implement the with_retries and with_rate_limiter.

In this way, we would have:

  • the convenient methods and easy access to a generator configuration (e.g. generator.retry_policy)
  • strong guarantees that these middlewares are executed in the right order
  • composable architecture with the middleware pattern
  • extensibility with the exposed middlewares

Let me know what you think or if you have better ideas.

Comment on lines +48 to +50
max_attempts: int = Field(default=3)
base_delay: float = Field(default=1.0)
max_delay: float | None = Field(default=None)
Member

We should retain the RetryPolicy object (which is the configuration) and encapsulate it in the middleware, as is done for the RateLimiter.

model: str = Field(
    description="The model identifier to use (e.g. 'gemini/gemini-2.0-flash')"
)
middleware: list[CompletionMiddleware] = Field(
Member

middlewares (plural)

mattbit and others added 3 commits March 3, 2026 18:59
…nce API

Extract RetryPolicy config model and give BaseGenerator dedicated
retry_policy and rate_limiter fields with guaranteed middleware ordering.
Adds with_retries() and with_rate_limiter() convenience methods.
Renames middleware -> middlewares for codebase consistency.
@Hartorn Hartorn requested a review from mattbit March 9, 2026 08:31
Member

@mattbit mattbit left a comment

Thanks!

@mattbit mattbit merged commit 088b203 into main Mar 11, 2026
23 checks passed
@mattbit mattbit deleted the feat/generator-middleware-pipeline branch March 11, 2026 08:47

2 participants