# Caching Guide

FerricLink provides comprehensive caching functionality similar to LangChain's `caches.py`, with Rust-specific optimizations and additional features.

## Overview
Caching is essential for optimizing LLM applications by:
- Reducing API costs by avoiding duplicate requests
- Improving response times for repeated queries
- Providing resilience against API failures
- Enabling offline development and testing
## Basic Usage

### `InMemoryCache`

The `InMemoryCache` is the core caching implementation, storing cached values in memory:

```rust
use ferriclink_core::{InMemoryCache, BaseCache};

// Create a cache with no size limit
let cache = InMemoryCache::new();

// Create a cache with size limit (LRU eviction)
let cache = InMemoryCache::with_max_size(Some(100));
```
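When a size limit is set, the least recently used entry is evicted once the cache is full. Below is a minimal sketch of that behavior, using the `lookup`/`update` operations introduced in the next subsection (the prompts are hypothetical, and `expensive_llm_call` stands in for a real model call, as elsewhere in this guide):

```rust
use ferriclink_core::{InMemoryCache, BaseCache};

// Cache limited to two entries; inserting a third should evict the
// least recently used one.
let cache = InMemoryCache::with_max_size(Some(2));
let llm_string = "gpt-4o-mini:temperature=0.7";

for prompt in ["prompt a", "prompt b", "prompt c"] {
    let generations = expensive_llm_call(prompt).await;
    cache.update(prompt, llm_string, generations)?;
}

// "prompt a" was least recently used, so it should have been evicted;
// "prompt c" was inserted last, so it should still be present.
assert!(cache.lookup("prompt a", llm_string)?.is_none());
assert!(cache.lookup("prompt c", llm_string)?.is_some());
```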
### Basic Operations

```rust
use ferriclink_core::{InMemoryCache, BaseCache};

let cache = InMemoryCache::new();
let prompt = "What's the weather?";
let llm_string = "gpt-4o-mini:temperature=0.7";

// Look up cached result
if let Some(cached_result) = cache.lookup(prompt, llm_string)? {
    println!("Cache hit: {}", cached_result[0].text);
} else {
    // Make expensive LLM call
    let generations = expensive_llm_call(prompt).await;
    cache.update(prompt, llm_string, generations)?;
}
```
### Async Operations
```rust
use ferriclink_core::{InMemoryCache, BaseCache};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cache = InMemoryCache::new();
    let prompt = "What's the weather?";
    let llm_string = "gpt-4o-mini:temperature=0.7";

    // Async lookup
    if let Some(cached_result) = cache.alookup(prompt, llm_string).await? {
        println!("Cache hit: {}", cached_result[0].text);
    } else {
        // Make expensive LLM call and cache the result
        let generations = expensive_llm_call(prompt).await;
        cache.aupdate(prompt, llm_string, generations).await?;
    }

    Ok(())
}
```
## Advanced Caching

### TTL Cache

The `TtlCache` adds time-to-live functionality:

```rust
use ferriclink_core::{TtlCache, BaseCache};
use std::time::Duration;

// Create a TTL cache with 1 hour expiration
let cache = TtlCache::new(Duration::from_secs(3600), None);

// Entries automatically expire after 1 hour
cache.aupdate(prompt, llm_string, generations).await?;
```
### Cache Statistics
Monitor cache performance with built-in statistics:
```rust
use ferriclink_core::{InMemoryCache, BaseCache};

let cache = InMemoryCache::new();
// ... use cache ...

let stats = cache.stats().await;
println!("Hit Rate: {:.1}%", stats.hit_rate());
println!("Total Requests: {}", stats.total_requests());
println!("Cache Size: {}", stats.current_size);
```
## Integration with Language Models

### Basic Integration Pattern
```rust
use ferriclink_core::{InMemoryCache, BaseCache, errors::Result, language_models::Generation};

struct CachedLLM {
    cache: InMemoryCache,
    // ... other fields
}

impl CachedLLM {
    async fn generate_with_cache(
        &self,
        prompt: &str,
        llm_string: &str,
    ) -> Result<Vec<Generation>> {
        // Check cache first
        if let Some(cached) = self.cache.alookup(prompt, llm_string).await? {
            return Ok(cached);
        }

        // Make LLM call
        let generations = self.make_llm_call(prompt).await?;

        // Cache the result
        self.cache.aupdate(prompt, llm_string, generations.clone()).await?;

        Ok(generations)
    }
}
```
### Advanced Integration with Error Handling
```rust
use ferriclink_core::{InMemoryCache, BaseCache, errors::Result, language_models::Generation};

struct RobustCachedLLM {
    cache: InMemoryCache,
    fallback_cache: Option<InMemoryCache>,
}

impl RobustCachedLLM {
    async fn generate_with_fallback(
        &self,
        prompt: &str,
        llm_string: &str,
    ) -> Result<Vec<Generation>> {
        // Try the primary cache
        if let Some(cached) = self.cache.alookup(prompt, llm_string).await? {
            return Ok(cached);
        }

        // Try the fallback cache if available
        if let Some(fallback) = &self.fallback_cache {
            if let Some(cached) = fallback.alookup(prompt, llm_string).await? {
                // Promote the entry to the primary cache
                self.cache.aupdate(prompt, llm_string, cached.clone()).await?;
                return Ok(cached);
            }
        }

        // Make the LLM call
        let generations = self.make_llm_call(prompt).await?;

        // Update both caches (fallback errors are ignored)
        self.cache.aupdate(prompt, llm_string, generations.clone()).await?;
        if let Some(fallback) = &self.fallback_cache {
            let _ = fallback.aupdate(prompt, llm_string, generations.clone()).await;
        }

        Ok(generations)
    }
}
```
## Performance Optimization

### Cache Key Strategy
The cache uses the `(prompt, llm_string)` pair as the key. To keep keys consistent, include every parameter that affects the output in `llm_string`:

```rust
// Good: include all relevant parameters in llm_string
let llm_string = format!(
    "gpt-4o-mini:temperature={}:max_tokens={}",
    temperature, max_tokens
);

// Bad: with parameters missing, different configurations share the same key,
// and inconsistent key formatting across call sites causes needless misses
let llm_string = "gpt-4o-mini";
```
### Memory Management

```rust
// Monitor cache size
let stats = cache.stats().await;
if stats.current_size > 1000 {
    println!("Cache is getting large: {} entries", stats.current_size);
}

// Clear cache when needed
cache.clear().await?;
```
### Batch Operations

```rust
// Process multiple prompts efficiently
async fn process_prompts_batch(
    cache: &InMemoryCache,
    prompts: Vec<&str>,
    llm_string: &str,
) -> Result<Vec<Vec<Generation>>> {
    let mut results = Vec::new();

    for prompt in prompts {
        if let Some(cached) = cache.alookup(prompt, llm_string).await? {
            results.push(cached);
        } else {
            let generations = expensive_llm_call(prompt).await;
            cache.aupdate(prompt, llm_string, generations.clone()).await?;
            results.push(generations);
        }
    }

    Ok(results)
}
```
## Best Practices

### 1. Choose Appropriate Cache Size

```rust
// For development: small cache
let dev_cache = InMemoryCache::with_max_size(Some(100));

// For production: larger cache
let prod_cache = InMemoryCache::with_max_size(Some(10000));
```
### 2. Use TTL for Time-Sensitive Data

```rust
// Short TTL for real-time data (5 minutes)
let weather_cache = TtlCache::new(Duration::from_secs(300), None);

// Longer TTL for stable data (1 hour)
let knowledge_cache = TtlCache::new(Duration::from_secs(3600), None);
```
### 3. Monitor Cache Performance

```rust
async fn monitor_cache_performance(cache: &InMemoryCache) {
    let stats = cache.stats().await;

    if stats.hit_rate() < 50.0 {
        println!("Warning: Low cache hit rate: {:.1}%", stats.hit_rate());
    }

    if stats.current_size > 10000 {
        println!("Warning: Cache size is large: {}", stats.current_size);
    }
}
```
### 4. Handle Cache Failures Gracefully

```rust
async fn safe_cache_lookup(
    cache: &InMemoryCache,
    prompt: &str,
    llm_string: &str,
) -> Result<Option<Vec<Generation>>> {
    match cache.alookup(prompt, llm_string).await {
        Ok(result) => Ok(result),
        Err(e) => {
            eprintln!("Cache lookup failed: {}", e);
            Ok(None) // Continue without cache
        }
    }
}
```
## Comparison with LangChain

| Feature | LangChain Python | FerricLink Rust |
|---|---|---|
| Base Interface | `BaseCache` | `BaseCache` |
| In-Memory Cache | `InMemoryCache` | `InMemoryCache` |
| Size Limits | ✅ | ✅ (with LRU) |
| TTL Support | ❌ | ✅ (`TtlCache`) |
| Statistics | ❌ | ✅ (`CacheStats`) |
| Thread Safety | ✅ | ✅ (`Arc<RwLock>`) |
| Async Support | ✅ | ✅ |
| Memory Efficiency | Medium | High |
| Performance | Medium | High |
## Troubleshooting

### Common Issues
- **Low Hit Rate**: Check that cache keys are built consistently (see the sketch below)
- **Memory Usage**: Monitor cache size and use appropriate limits
- **Stale Data**: Use a TTL cache for time-sensitive data
- **Thread Safety**: Ensure proper async/await usage
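A common cause of a low hit rate is an `llm_string` that is formatted differently at different call sites. One way to avoid this is a small helper that builds the key deterministically; the sketch below is illustrative only (the helper name and parameter handling are hypothetical, not part of FerricLink's API):

```rust
/// Hypothetical helper: build a canonical `llm_string` so identical
/// configurations always produce identical cache keys.
fn build_llm_string(model: &str, params: &[(&str, String)]) -> String {
    // Sort parameters by name so argument order at the call site doesn't matter.
    let mut params: Vec<_> = params.to_vec();
    params.sort_by(|a, b| a.0.cmp(b.0));

    let mut key = model.to_string();
    for (name, value) in params {
        key.push_str(&format!(":{}={}", name, value));
    }
    key
}

// "gpt-4o-mini:max_tokens=256:temperature=0.7", regardless of argument order
let llm_string = build_llm_string(
    "gpt-4o-mini",
    &[("temperature", "0.7".into()), ("max_tokens", "256".into())],
);
```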
### Debug Cache Behavior

```rust
async fn debug_cache(cache: &InMemoryCache, prompt: &str, llm_string: &str) {
    println!("Cache key: '{}' + '{}'", prompt, llm_string);

    let stats_before = cache.stats().await;
    let result = cache.alookup(prompt, llm_string).await.unwrap();
    let stats_after = cache.stats().await;

    println!("Before: hits={}, misses={}", stats_before.hits, stats_before.misses);
    println!("After: hits={}, misses={}", stats_after.hits, stats_after.misses);
    println!("Result: {:?}", result.is_some());
}
```
## Examples
See the cache usage example for a complete working demonstration of all caching features.