refactor(agents): replace mixin-based retry/rate-limiting with middle… by Hartorn · Pull Request #2281 · Giskard-AI/giskard-oss

Hartorn · 2026-02-27T09:16:00Z

…ware pipeline

Replace WithRetryPolicy and WithRateLimiter mixins with a composable CompletionMiddleware pipeline on BaseGenerator. This decouples cross-cutting concerns (retry, rate-limiting) from the generator inheritance hierarchy, making them stackable and independently testable.

Description

Related Issue

Type of Change

📚 Examples / docs / tutorials / dependencies update
🔧 Bug fix (non-breaking change which fixes an issue)
🥂 Improvement (non-breaking change which improves an existing feature)
🚀 New feature (non-breaking change which adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to change)
🔐 Security fix

Checklist

I've read the CODE_OF_CONDUCT.md document.
I've read the CONTRIBUTING.md guide.
I've written tests for all new methods and classes that I created.
I've written the docstring in Google format for all the methods and classes that I used.
I've updated the pdm.lock running pdm update-lock (only applicable when pyproject.toml has been modified)

gemini-code-assist · 2026-02-27T09:16:25Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the agent generation system by transitioning from an inheritance-based mixin pattern for handling concerns like retries and rate-limiting to a more flexible and robust middleware pipeline. This change promotes better separation of concerns, allowing these functionalities to be composed and tested independently, ultimately leading to a more maintainable and extensible codebase for AI agent interactions.

Highlights

Refactored Retry and Rate-Limiting: The previous mixin-based approach for retry policies and rate-limiting (WithRetryPolicy, WithRateLimiter) has been replaced with a composable CompletionMiddleware pipeline. This decouples these cross-cutting concerns from the generator inheritance hierarchy.
Introduced CompletionMiddleware Pipeline: A new middleware.py file defines a CompletionMiddleware base class, RetryMiddleware, and RateLimiterMiddleware. The BaseGenerator now includes a middleware attribute and a _build_chain method to dynamically apply these middlewares.
Improved Modularity and Testability: The new middleware architecture makes retry and rate-limiting policies stackable, independently testable, and easier to manage, enhancing the overall modularity of the generator system.
Centralized Type Definitions: A new _types.py file was added to house shared type definitions like Response and GenerationParams, resolving circular dependencies and improving code organization.
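
For readers unfamiliar with the pattern, the following is a minimal sketch of how a middleware pipeline of this kind can be composed. It is not the code from this PR: the string-based `NextFn` signature, `TagMiddleware`, and `build_chain` are hypothetical simplifications (the real generator works with messages, `GenerationParams`, and `Response`).

```python
import asyncio
from functools import reduce
from typing import Awaitable, Callable

# Hypothetical simplified handler type: takes a prompt, returns a completion.
NextFn = Callable[[str], Awaitable[str]]


class CompletionMiddleware:
    """Base class: receives the request and the next handler in the chain."""

    async def call(self, prompt: str, next_fn: NextFn) -> str:
        return await next_fn(prompt)


class TagMiddleware(CompletionMiddleware):
    """Toy middleware that wraps the result so the nesting order is visible."""

    def __init__(self, tag: str) -> None:
        self.tag = tag

    async def call(self, prompt: str, next_fn: NextFn) -> str:
        result = await next_fn(prompt)
        return f"{self.tag}({result})"


def _wrap(next_fn: NextFn, mw: CompletionMiddleware) -> NextFn:
    async def wrapped(prompt: str) -> str:
        return await mw.call(prompt, next_fn)

    return wrapped


def build_chain(middlewares: list[CompletionMiddleware], core: NextFn) -> NextFn:
    # Fold from the innermost middleware outwards, so that the first
    # middleware in the list ends up as the outermost wrapper.
    return reduce(_wrap, reversed(middlewares), core)


async def core(prompt: str) -> str:
    # Stand-in for the provider call.
    return prompt.upper()


chain = build_chain([TagMiddleware("outer"), TagMiddleware("inner")], core)
result = asyncio.run(chain("hi"))
```

Because each middleware only sees `next_fn`, it can be unit-tested against a fake core without instantiating a generator.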

Changelog

libs/giskard-agents/src/giskard/agents/generators/__init__.py
- Updated imports to reflect the new middleware structure, removing references to old mixins and adding new middleware classes.
- Modified the __all__ export list to include the new middleware components and remove the deprecated mixins.
libs/giskard-agents/src/giskard/agents/generators/_types.py
- Added a new file to define Response and GenerationParams models, breaking circular dependencies between base and middleware modules.
libs/giskard-agents/src/giskard/agents/generators/base.py
- Removed the in-file definitions of Response and GenerationParams, now imported from _types.py.
- Added a middleware field to BaseGenerator to hold a list of CompletionMiddleware instances.
- Implemented a _build_chain method to construct the middleware pipeline, applying middleware around the core _complete method.
- Updated the complete method to use the new middleware chain for processing requests.
- Modified batch_complete to call the public complete method, ensuring middleware is applied to each batch item.
- Updated type hints and docstrings for chat, template, and with_params methods to reflect ChatWorkflow and Self returns.
libs/giskard-agents/src/giskard/agents/generators/litellm_generator.py
- Removed inheritance from WithRateLimiter and WithRetryPolicy mixins.
- Introduced LiteLLMRetryMiddleware as a new CompletionMiddleware subclass, specifically for LiteLLM's retry logic.
- Updated LiteLLMGenerator to include LiteLLMRetryMiddleware in its default middleware list.
- Refactored the _complete method to remove the _throttle context manager and directly call acompletion, as rate limiting is now handled by middleware.
libs/giskard-agents/src/giskard/agents/generators/middleware.py
- Added a new file to define the CompletionMiddleware abstract base class.
- Implemented RetryMiddleware using tenacity for exponential back-off, allowing custom retry conditions.
- Implemented RateLimiterMiddleware to throttle completions using a BaseRateLimiter.
libs/giskard-agents/src/giskard/agents/generators/rate_limiting.py
- Removed the file, as rate-limiting functionality has been migrated to middleware.py.
libs/giskard-agents/src/giskard/agents/generators/retries.py
- Removed the file, as retry functionality has been migrated to middleware.py.
libs/giskard-agents/tests/test_generator.py
- Updated imports to include RateLimiterMiddleware.
- Modified test_litellm_generator_gets_rate_limiter to instantiate LiteLLMGenerator with RateLimiterMiddleware in its middleware list.
- Adjusted assertions in test_generator_with_params_and_rate_limiter to verify rate limiter presence via the new middleware structure.
libs/giskard-agents/tests/test_generator_retry.py
- Updated imports to remove old retry mixins and include RetryMiddleware.
- Refactored MockGenerator to no longer inherit from WithRetryPolicy and to accept middleware.
- Introduced _RetriableOnlyMiddleware to simulate specific retry conditions.
- Modified test cases (test_raises_exception_after_retries_exhausted, test_raises_exception_if_not_retriable, test_retries_with_result, test_retries_works_with_batch_complete, test_retries_with_max_delay, test_retries_exponential_backoff) to use the new middleware-based retry mechanism and removed assertions related to _sleep_times.
libs/giskard-agents/tests/test_serialization.py
- Updated imports to include new middleware classes (LiteLLMRetryMiddleware, RateLimiterMiddleware, RetryMiddleware).
- Added helper functions _retry_mw and _rl_mw to easily access middleware instances for testing.
- Modified test_generator_serialization to instantiate Generator with the new middleware list containing LiteLLMRetryMiddleware and RateLimiterMiddleware.
- Adjusted assertions in serialization tests to correctly verify middleware properties instead of old mixin attributes.
- Removed redundant comments about verification steps.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request is an excellent refactoring that replaces the mixin-based approach for retries and rate-limiting with a more flexible and composable middleware pipeline. The new architecture decouples cross-cutting concerns from the generator's inheritance hierarchy, which greatly improves modularity and testability. The implementation of the CompletionMiddleware pipeline is clean and robust, using functools.reduce for an elegant composition of middleware. The changes are consistently applied across the codebase, including the tests, which have been updated to reflect the new design. This is a high-quality contribution that significantly improves the agent generator's architecture.

libs/giskard-agents/tests/test_serialization.py

…d docs Update requires-python to 3.12, fix stale docstrings and types in base generator, align tests with new middleware API (remove RetryPolicy, with_retries, with_rate_limiter), and document the middleware pipeline in the README.

…o.sleep mock Patch asyncio.sleep instead of overriding _before_sleep to verify exponential backoff and max_delay capping without production hooks.
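
As an illustration of the testing approach this commit message describes, here is a rough sketch of checking back-off delays by patching `asyncio.sleep`. The helper `retry_with_backoff`, the doubling schedule, and `recorded_delays` are assumptions made for the example, not the repository's actual test code.

```python
import asyncio
from unittest.mock import AsyncMock, patch


async def retry_with_backoff(fn, max_attempts=3, base_delay=1.0, max_delay=None):
    """Hypothetical retry helper: the delay doubles after each failed attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await fn()
        except RuntimeError:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            if max_delay is not None:
                delay = min(delay, max_delay)
            await asyncio.sleep(delay)


async def always_fails():
    raise RuntimeError("boom")


def recorded_delays(**kwargs) -> list[float]:
    # Swap asyncio.sleep for an AsyncMock: nothing actually waits, and every
    # awaited delay is recorded without adding hooks to production code.
    fake_sleep = AsyncMock()
    with patch("asyncio.sleep", new=fake_sleep):
        try:
            asyncio.run(retry_with_backoff(always_fails, **kwargs))
        except RuntimeError:
            pass
    return [call.args[0] for call in fake_sleep.await_args_list]
```

With four attempts the helper sleeps three times, so the recorded list has one entry per failed attempt except the last.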

Hartorn · 2026-02-27T09:37:19Z

/gemini review

gemini-code-assist

Code Review

This pull request is a well-executed refactoring that replaces the mixin-based approach for retries and rate-limiting with a more flexible and composable middleware pipeline. This change significantly improves the architecture by decoupling cross-cutting concerns from the generator's inheritance hierarchy, which enhances modularity and testability. The implementation is clean, and the related tests have been thoroughly updated to reflect the new design. I have one minor suggestion to improve the robustness of a test helper function.

libs/giskard-agents/tests/test_serialization.py

Narrow _retry_mw helper to LiteLLMRetryMiddleware and assert middleware round-trips correctly in test_chat_workflow_serialization.

Hartorn · 2026-02-27T14:24:31Z

/gemini review

gemini-code-assist

Code Review

This pull request updates the minimum Python version to 3.12 and introduces a middleware pipeline for generators to handle cross-cutting concerns like retries and rate limiting. The LiteLLMGenerator now includes a retry middleware by default, and custom middleware can be created by subclassing CompletionMiddleware. The code also removes the WithRateLimiter and WithRetryPolicy mixins. Review comments suggest communicating the breaking change of the Python version update to users, improving the efficiency of accessing RateLimiterMiddleware in tests by directly accessing it instead of iterating through the middleware list, adding a parameter to control middleware inclusion in the _make_generator function, and adding checks to ensure middleware exists before accessing its properties in helper functions.

libs/giskard-agents/README.md

gemini-code-assist · 2026-02-27T14:29:00Z

libs/giskard-agents/tests/test_generator.py

+    rl_mw = next(
+        mw for mw in generator.middleware if isinstance(mw, RateLimiterMiddleware)
+    )


Medium: Instead of iterating through all middleware to find the RateLimiterMiddleware, it would be more efficient to directly access it if you know its position in the list or store a reference to it during initialization.

gemini-code-assist · 2026-02-27T14:29:00Z

libs/giskard-agents/tests/test_generator.py

+    rl_mw_copy = next(
+        mw
+        for mw in generator_with_params.middleware
+        if isinstance(mw, RateLimiterMiddleware)
+    )


Medium: Similar to the previous comment, directly accessing the RateLimiterMiddleware would be more efficient than iterating through the list.

gemini-code-assist · 2026-02-27T14:29:00Z

libs/giskard-agents/tests/test_generator_retry.py

+def _make_generator(**retry_kwargs) -> MockGenerator:
+    mw = (
+        _RetriableOnlyMiddleware(**retry_kwargs)
+        if retry_kwargs
+        else _RetriableOnlyMiddleware()
+    )
+    return MockGenerator(middleware=[mw])


Medium: The _make_generator function creates a MockGenerator with retry middleware. Consider adding a parameter to control whether the middleware is included, providing more flexibility in test setup.

gemini-code-assist · 2026-02-27T14:29:00Z

libs/giskard-agents/tests/test_serialization.py

+def _retry_mw(gen: BaseGenerator) -> LiteLLMRetryMiddleware:
+    return next(mw for mw in gen.middleware if isinstance(mw, LiteLLMRetryMiddleware))


Medium: These helper functions are used to extract specific middleware from the generator's middleware list. Consider adding a check to ensure that the middleware exists before attempting to access its properties, preventing potential StopIteration errors.

gemini-code-assist · 2026-02-27T14:29:00Z

libs/giskard-agents/tests/test_serialization.py

+def _rl_mw(gen: BaseGenerator) -> RateLimiterMiddleware:
+    return next(mw for mw in gen.middleware if isinstance(mw, RateLimiterMiddleware))


Medium: These helper functions are used to extract specific middleware from the generator's middleware list. Consider adding a check to ensure that the middleware exists before attempting to access its properties, preventing potential StopIteration errors.
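
One way to act on this suggestion, sketched under assumptions: pass a default to `next()` and raise an explicit error when nothing matches. The generic helper `find_middleware` and the two empty classes are hypothetical stand-ins, not part of the PR.

```python
from typing import TypeVar

T = TypeVar("T")


class RetryMiddleware:
    """Empty stand-in for the real middleware class."""


class RateLimiterMiddleware:
    """Empty stand-in for the real middleware class."""


def find_middleware(middlewares: list[object], kind: type[T]) -> T:
    # next() with a default returns None instead of raising StopIteration.
    found = next((mw for mw in middlewares if isinstance(mw, kind)), None)
    if found is None:
        raise LookupError(f"no {kind.__name__} in the middleware list")
    return found
```

In a test helper, a bare `StopIteration` is arguably acceptable too, since it still fails the test; the explicit error mainly gives a clearer message.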

mattbit

The architectural change is solid and definitely an improvement. However, usability is impacted, because we are making it harder to configure retries and rate limiting on the fly.

E.g. say I have a Generator instance and I want to update the number of attempts. The current API is simple and predictable:

generator.with_retries(3)

While with the proposed middleware:

# do we already have a retry middleware somewhere?
found = False
for i, mw in enumerate(generator.middleware):
    if isinstance(mw, RetryMiddleware):
        found = True
        # if we found it let's clone the generator and then replace the middleware in place
        new_generator = generator.model_copy()
        new_generator.middleware[i] = mw.model_copy(update={"max_attempts":5})

# otherwise, we'll add a new one
if not found:
    # where do I need to put the retry, at the beginning or at the end?
    # well it needs to be at the beginning, because otherwise it would break how the rate limiter middleware works
    new_generator = generator.model_copy()
    new_generator.middleware = [RetryMiddleware(max_attempts=5)] + generator.middleware

Overall, my feeling is that the middleware is flexible, but the RateLimiter and Retry mechanisms are in practice very specific: for example, they need to be added in a precise order. It's easy to forget and add Retry after RateLimit.
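
To make the ordering concern concrete, here is a small sketch (hypothetical function-based wrappers, not the PR's classes) that counts how often a rate-limiter slot is acquired in each arrangement. With retry as the outer layer, every attempt passes through the limiter; with retry inside, the limiter is consulted only once for all attempts.

```python
import asyncio
from typing import Awaitable, Callable

Call = Callable[[], Awaitable[str]]


def with_retry(next_fn: Call, max_attempts: int) -> Call:
    async def wrapped() -> str:
        for attempt in range(1, max_attempts + 1):
            try:
                return await next_fn()
            except RuntimeError:
                if attempt == max_attempts:
                    raise
        raise ValueError("max_attempts must be at least 1")

    return wrapped


def with_rate_limit(next_fn: Call, acquisitions: list[int]) -> Call:
    async def wrapped() -> str:
        acquisitions.append(1)  # stand-in for acquiring a rate-limiter slot
        return await next_fn()

    return wrapped


def flaky(failures: int) -> Call:
    """Core call that fails `failures` times, then succeeds."""
    state = {"left": failures}

    async def core() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("transient")
        return "ok"

    return core


# Retry outermost: each of the three attempts acquires a slot.
retry_outside: list[int] = []
asyncio.run(with_retry(with_rate_limit(flaky(2), retry_outside), 3)())

# Retry innermost: one acquisition covers all three attempts.
retry_inside: list[int] = []
asyncio.run(with_rate_limit(with_retry(flaky(2), 3), retry_inside)())
```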

I propose a compromise:

Keep the middleware chain as the internal architecture. This is solid and very effective for advanced extension (hooks, etc.)
Modify the _build_chain to add these two fundamental mechanisms. Some kind of built-in middleware, rough example:

def _build_chain(self, core: NextFn) -> NextFn:
    all_mw: list[CompletionMiddleware] = []
    if self._retry_mw:
        all_mw.append(self._retry_mw)
    if self._rate_limiter_mw:
        all_mw.append(self._rate_limiter_mw)
    all_mw.extend(self.middleware)
    return reduce(_wrap, reversed(all_mw), core)

(could probably be written better).
We can also keep exposing the internal configs via properties:

@property
def rate_limiter(self):
    return self._rate_limiter_mw.rate_limiter if self._rate_limiter_mw else None

And we can also implement the with_retries and with_rate_limiter.

In this way, we would have:

the convenient methods and easy access to a generator configuration (e.g. generator.retry_policy)
strong guarantees that these middlewares are executed in the right order
composable architecture with the middleware pattern
extensibility with the exposed middlewares

Let me know what you think or if you have better ideas.
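
The compromise outlined above could look roughly like the sketch below. It uses frozen dataclasses and `dataclasses.replace` in place of the Pydantic models and `model_copy` used in the codebase; `Generator`, `ordered_chain`, and the string stand-ins for the rate limiter and user middlewares are hypothetical simplifications.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int = 3
    base_delay: float = 1.0
    max_delay: Optional[float] = None


@dataclass(frozen=True)
class Generator:
    retry_policy: Optional[RetryPolicy] = None
    rate_limiter: Optional[str] = None   # stand-in for a rate limiter object
    middlewares: tuple[str, ...] = ()    # stand-in for user-supplied middleware

    def with_retries(self, max_attempts: int) -> "Generator":
        # Returns a copy; the original generator is left untouched.
        policy = replace(self.retry_policy or RetryPolicy(), max_attempts=max_attempts)
        return replace(self, retry_policy=policy)

    def with_rate_limiter(self, rate_limiter: str) -> "Generator":
        return replace(self, rate_limiter=rate_limiter)

    def ordered_chain(self) -> list[str]:
        # Built-ins always come first and in a fixed order (retry, then rate
        # limiting), regardless of the order in which they were configured.
        chain: list[str] = []
        if self.retry_policy is not None:
            chain.append(f"retry(max_attempts={self.retry_policy.max_attempts})")
        if self.rate_limiter is not None:
            chain.append(f"rate_limit({self.rate_limiter})")
        chain.extend(self.middlewares)
        return chain


base = Generator(middlewares=("logging",))
tuned = base.with_rate_limiter("10 rpm").with_retries(5)
```

Here the caller never has to search the middleware list or reason about insertion position, which is the usability point raised in the comment.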

mattbit · 2026-03-03T17:24:15Z

libs/giskard-agents/src/giskard/agents/generators/middleware.py

+    max_attempts: int = Field(default=3)
+    base_delay: float = Field(default=1.0)
+    max_delay: float | None = Field(default=None)


We should retain the RetryPolicy object (which is the configuration) and encapsulate it in the middleware, as is done for the RateLimiter.
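
A sketch of what that separation might look like, with the configuration held in a `RetryPolicy` object and the middleware only executing it. The field names mirror the diff above; the `delay_for` method, the doubling schedule, the hand-rolled loop (in place of tenacity), and retrying on `ConnectionError` are assumptions for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

NextFn = Callable[[str], Awaitable[str]]


@dataclass(frozen=True)
class RetryPolicy:
    """Pure configuration, serializable independently of the middleware."""

    max_attempts: int = 3
    base_delay: float = 1.0
    max_delay: Optional[float] = None

    def delay_for(self, attempt: int) -> float:
        # Assumed schedule: base_delay doubles per failed attempt,
        # capped at max_delay when one is set.
        delay = self.base_delay * 2 ** (attempt - 1)
        return delay if self.max_delay is None else min(delay, self.max_delay)


@dataclass
class RetryMiddleware:
    """Executes a RetryPolicy around the next handler in the chain."""

    retry_policy: RetryPolicy

    async def call(self, prompt: str, next_fn: NextFn) -> str:
        policy = self.retry_policy
        for attempt in range(1, policy.max_attempts + 1):
            try:
                return await next_fn(prompt)
            except ConnectionError:
                if attempt == policy.max_attempts:
                    raise
                await asyncio.sleep(policy.delay_for(attempt))
        raise ValueError("max_attempts must be at least 1")
```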

mattbit · 2026-03-03T17:59:16Z

libs/giskard-agents/src/giskard/agents/generators/litellm_generator.py

    model: str = Field(
        description="The model identifier to use (e.g. 'gemini/gemini-2.0-flash')"
    )
+    middleware: list[CompletionMiddleware] = Field(


middlewares (plural)

…nce API Extract RetryPolicy config model and give BaseGenerator dedicated retry_policy and rate_limiter fields with guaranteed middleware ordering. Adds with_retries() and with_rate_limiter() convenience methods. Renames middleware -> middlewares for codebase consistency.

mattbit

Thanks!

Hartorn self-assigned this Feb 27, 2026

github-actions bot added the Scope: Agents label Feb 27, 2026

gemini-code-assist bot reviewed Feb 27, 2026

Hartorn added 2 commits February 27, 2026 10:29

test(agents): restore sleep-time assertions in retry tests via asynci…

0e25fe8

…o.sleep mock Patch asyncio.sleep instead of overriding _before_sleep to verify exponential backoff and max_delay capping without production hooks.

gemini-code-assist bot reviewed Feb 27, 2026

test(agents): tighten serialization test assertions per review feedback

166601d

Narrow _retry_mw helper to LiteLLMRetryMiddleware and assert middleware round-trips correctly in test_chat_workflow_serialization.

Hartorn marked this pull request as ready for review February 27, 2026 14:24

gemini-code-assist bot reviewed Feb 27, 2026

Merge branch 'main' into feat/generator-middleware-pipeline

198bdb9

mattbit requested changes Mar 3, 2026

mattbit and others added 3 commits March 3, 2026 18:59

Merge branch 'main' into feat/generator-middleware-pipeline

3c5c709

Merge branch 'main' into feat/generator-middleware-pipeline

5e69aed

Hartorn requested a review from mattbit March 9, 2026 08:31

Merge branch 'main' into feat/generator-middleware-pipeline

afd2864

mattbit approved these changes Mar 11, 2026

mattbit merged commit 088b203 into main Mar 11, 2026
23 checks passed

mattbit deleted the feat/generator-middleware-pipeline branch March 11, 2026 08:47
