Rate Limiting Guide

FerricLink provides comprehensive rate limiting functionality similar to LangChain's rate_limiters.py, with Rust-specific optimizations and additional features.

Overview

Rate limiting is essential for managing API calls to language models and other services that have usage restrictions. FerricLink's rate limiting system uses a token bucket algorithm that allows for both steady-state rate limiting and burst capacity.
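
To make the token bucket mechanics concrete, here is a minimal, self-contained sketch. This is illustrative only, not FerricLink's actual implementation: tokens refill continuously at a fixed rate up to a capacity (the burst limit), and each request consumes one token.

```rust
use std::time::Instant;

/// Minimal token bucket: refills at `rate` tokens/sec, capped at `capacity`.
struct TokenBucket {
    rate: f64,
    capacity: f64,
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    fn new(rate: f64, capacity: f64) -> Self {
        // Start with a full bucket so an initial burst is allowed.
        Self { rate, capacity, tokens: capacity, last_refill: Instant::now() }
    }

    /// Try to consume one token; returns whether the request is allowed.
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        // Refill proportionally to elapsed time, capped at capacity.
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.rate).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // 1 token/sec, burst capacity of 2: two immediate requests pass,
    // the third is denied until tokens accumulate again.
    let mut bucket = TokenBucket::new(1.0, 2.0);
    println!("{} {} {}", bucket.try_acquire(), bucket.try_acquire(), bucket.try_acquire());
}
```

The three constructor arguments used throughout this guide map onto the same idea: a steady-state rate, a polling interval, and a maximum bucket size.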

Basic Usage

InMemoryRateLimiter

The InMemoryRateLimiter is the core rate limiting implementation, based on a token bucket algorithm:

```rust
use ferriclink_core::{InMemoryRateLimiter, BaseRateLimiter};

// Create a rate limiter: 1 request per second, max burst of 2
let rate_limiter = InMemoryRateLimiter::new(1.0, 0.1, 2.0);

// Acquire a token (blocking)
let acquired = rate_limiter.acquire(true)?;
if acquired {
    // Make your API call
    println!("Request allowed");
} else {
    println!("Request denied");
}

// Acquire a token (non-blocking)
let acquired = rate_limiter.acquire(false)?;
```

Async Usage

```rust
use ferriclink_core::{InMemoryRateLimiter, BaseRateLimiter};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let rate_limiter = InMemoryRateLimiter::new(2.0, 0.1, 5.0);

    // Async acquire (blocking)
    let acquired = rate_limiter.aacquire(true).await?;

    // Async acquire (non-blocking)
    let acquired = rate_limiter.aacquire(false).await?;

    Ok(())
}
```

Advanced Rate Limiting

AdvancedRateLimiter with Retry Logic

The AdvancedRateLimiter adds retry logic and exponential backoff:

```rust
use ferriclink_core::{AdvancedRateLimiter, RateLimiterConfig};
use std::time::Duration;

let config = RateLimiterConfig {
    use_exponential_backoff: true,
    max_backoff_duration: Duration::from_secs(10),
    initial_backoff_duration: Duration::from_millis(100),
    max_retries: 5,
    log_events: true,
};

let rate_limiter = AdvancedRateLimiter::new(1.0, 0.1, 2.0, config);

// This will automatically retry with exponential backoff
let acquired = rate_limiter.aacquire(true).await?;
```

Configuration Options

```rust
use ferriclink_core::RateLimiterConfig;
use std::time::Duration;

let config = RateLimiterConfig {
    // Enable exponential backoff on failures
    use_exponential_backoff: true,

    // Maximum backoff duration
    max_backoff_duration: Duration::from_secs(60),

    // Initial backoff duration
    initial_backoff_duration: Duration::from_millis(100),

    // Maximum number of retries
    max_retries: 10,

    // Enable logging of rate limiting events
    log_events: true,
};
```

Rate Limiting Strategies

Conservative Rate Limiting

For production environments with strict rate limits:

```rust
let conservative = InMemoryRateLimiter::new(
    1.0, // 1 request per second
    0.1, // Check every 100ms
    2.0, // Max burst of 2 requests
);
```

Aggressive Rate Limiting

For testing or development with higher limits:

```rust
let aggressive = InMemoryRateLimiter::new(
    10.0, // 10 requests per second
    0.01, // Check every 10ms
    5.0,  // Max burst of 5 requests
);
```

Adaptive Rate Limiting

For dynamic environments with retry logic:

```rust
let config = RateLimiterConfig {
    use_exponential_backoff: true,
    max_backoff_duration: Duration::from_secs(5),
    initial_backoff_duration: Duration::from_millis(50),
    max_retries: 10,
    log_events: true,
};

let adaptive = AdvancedRateLimiter::new(2.0, 0.05, 3.0, config);
```

LangChain Compatibility

FerricLink's rate limiters are designed to be compatible with LangChain's interface:

```rust
use ferriclink_core::{InMemoryRateLimiter, BaseRateLimiter};
use std::time::Duration;

// LangChain-compatible usage
let rate_limiter = InMemoryRateLimiter::new(0.1, 0.1, 1.0); // 1 request every 10 seconds

// Simulate LangChain model calls
for i in 1..=3 {
    println!("Call {}: Acquiring rate limit token...", i);

    let acquire_start = std::time::Instant::now();
    let acquired = rate_limiter.aacquire(true).await?;
    let acquire_duration = acquire_start.elapsed();

    if acquired {
        println!("Call {}: Token acquired (waited {:?})", i, acquire_duration);

        // Simulate the actual model call
        println!("Call {}: Making model request...", i);
        tokio::time::sleep(Duration::from_millis(100)).await;
        println!("Call {}: Model request completed", i);
    } else {
        println!("Call {}: Failed to acquire token", i);
    }
}
```

Serialization and Configuration

Saving Rate Limiter Configuration

```rust
use ferriclink_core::{InMemoryRateLimiter, InMemoryRateLimiterConfig};

let rate_limiter = InMemoryRateLimiter::new(2.0, 0.1, 5.0);

// Convert to serializable config
let config = rate_limiter.to_config();

// Serialize to JSON
let json = serde_json::to_string_pretty(&config)?;
println!("Rate limiter config: {}", json);

// Save to file
std::fs::write("rate_limiter_config.json", json)?;
```

Loading Rate Limiter Configuration

```rust
use ferriclink_core::{InMemoryRateLimiter, InMemoryRateLimiterConfig};

// Load from file
let json = std::fs::read_to_string("rate_limiter_config.json")?;
let config: InMemoryRateLimiterConfig = serde_json::from_str(&json)?;

// Create rate limiter from config
let rate_limiter = InMemoryRateLimiter::from_config(config);
```

Monitoring and Debugging

Check Available Tokens

```rust
let rate_limiter = InMemoryRateLimiter::new(1.0, 0.1, 2.0);

// Check current token count
let tokens = rate_limiter.available_tokens().await;
println!("Available tokens: {:.2}", tokens);

// Check rate limiter properties
println!("Requests per second: {}", rate_limiter.requests_per_second());
println!("Max bucket size: {}", rate_limiter.max_bucket_size());
println!("Check interval: {}s", rate_limiter.check_every_n_seconds());
```

Performance Monitoring

```rust
use std::time::Instant;

let rate_limiter = InMemoryRateLimiter::new(100.0, 0.001, 10.0);
let mut successful = 0;
let mut failed = 0;

let start = Instant::now();

for i in 1..=100 {
    let acquired = rate_limiter.aacquire(false).await?;
    if acquired {
        successful += 1;
    } else {
        failed += 1;
    }

    if i % 10 == 0 {
        println!("Progress: {}/100 ({} successful, {} failed)", i, successful, failed);
    }
}

let duration = start.elapsed();
let requests_per_second = 100.0 / duration.as_secs_f64();

println!("Results:");
println!("  - Successful: {}", successful);
println!("  - Failed: {}", failed);
println!("  - Total time: {:?}", duration);
println!("  - Effective rate: {:.2} requests/second", requests_per_second);
```

Error Handling

Rate Limit Errors

```rust
use ferriclink_core::{FerricLinkError, ErrorCode};
use std::time::Duration;

match rate_limiter.aacquire(false).await {
    Ok(true) => {
        // Request allowed
        make_api_call().await?;
    }
    Ok(false) => {
        // Request denied, try again later
        println!("Rate limited, retrying later...");
        tokio::time::sleep(Duration::from_millis(100)).await;
    }
    Err(e) => {
        // Handle error
        if let Some(code) = e.error_code() {
            match code {
                ErrorCode::ModelRateLimit => {
                    println!("Model rate limit exceeded");
                }
                _ => {
                    println!("Other error: {}", e);
                }
            }
        }
    }
}
```

Retry with Exponential Backoff

```rust
use ferriclink_core::{AdvancedRateLimiter, RateLimiterConfig};
use std::time::Duration;

let config = RateLimiterConfig {
    use_exponential_backoff: true,
    max_backoff_duration: Duration::from_secs(60),
    initial_backoff_duration: Duration::from_millis(100),
    max_retries: 5,
    log_events: true,
};

let rate_limiter = AdvancedRateLimiter::new(1.0, 0.1, 2.0, config);

// This will automatically handle retries with exponential backoff
let acquired = rate_limiter.aacquire(true).await?;
```

Best Practices

1. Choose Appropriate Rates

  • Conservative: Use lower rates for production APIs with strict limits
  • Aggressive: Use higher rates for testing or APIs with generous limits
  • Adaptive: Use retry logic for unreliable or variable-rate APIs

2. Monitor Performance

  • Track successful vs failed requests
  • Monitor effective request rates
  • Adjust parameters based on actual usage patterns

3. Handle Errors Gracefully

  • Implement proper error handling for rate limit failures
  • Use exponential backoff for retries
  • Log rate limiting events for debugging
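
The backoff arithmetic behind these recommendations can be sketched as follows. Note that `backoff_delay` is a hypothetical helper for illustration, not part of FerricLink's API: the delay doubles on each attempt from an initial value and is capped at a maximum, matching the `initial_backoff_duration` / `max_backoff_duration` config fields above.

```rust
use std::time::Duration;

/// Exponential backoff: initial * 2^attempt, capped at `max`.
/// (Hypothetical helper, shown only to illustrate the arithmetic.)
fn backoff_delay(initial: Duration, max: Duration, attempt: u32) -> Duration {
    initial
        .checked_mul(2u32.saturating_pow(attempt)) // doubles each attempt
        .unwrap_or(max)                            // overflow also hits the cap
        .min(max)                                  // never exceed the maximum
}

fn main() {
    // With initial = 100ms and max = 10s, delays grow
    // 100ms, 200ms, 400ms, ... until they hit the 10s cap.
    for attempt in 0..8 {
        let delay = backoff_delay(
            Duration::from_millis(100),
            Duration::from_secs(10),
            attempt,
        );
        println!("attempt {}: {:?}", attempt, delay);
    }
}
```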

4. Use Configuration Files

  • Save rate limiter configurations to files
  • Load configurations at runtime
  • Allow easy adjustment without code changes

5. Test Thoroughly

  • Test with different rate limiting scenarios
  • Verify behavior under load
  • Ensure proper token accumulation and consumption