Recallr seamlessly integrates with Google Gemini by acting as a forward proxy. Configure your Gemini client to use our proxy URL and we’ll inject relevant context from user memory into each request.
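The proxy URL is simply the Recallr forward endpoint with the upstream provider's full base URL appended; a minimal sketch of the composition (the helper name is illustrative, not part of any SDK):

```python
# The forward-proxy base URL is the Recallr endpoint followed by the
# upstream provider's full base URL.
RECALLR_FORWARD_PREFIX = 'https://api.recallrai.com/api/v1/forward/'

def forward_base_url(upstream_base_url: str) -> str:
    """Compose the base_url to hand to the Gemini client (illustrative helper)."""
    return RECALLR_FORWARD_PREFIX + upstream_base_url

print(forward_base_url('https://generativelanguage.googleapis.com'))
# → https://api.recallrai.com/api/v1/forward/https://generativelanguage.googleapis.com
```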
Quick Start
from google import genai
from google.genai import types

client = genai.Client(
    api_key='YOUR_GEMINI_API_KEY',
    http_options=types.HttpOptions(
        client_args={
            'base_url': 'https://api.recallrai.com/api/v1/forward/https://generativelanguage.googleapis.com',
            'headers': {
                'X-Recallr-API-Key': 'rai-...',
                'X-Recallr-Project-Id': 'your-project-id',
                'X-Recallr-User-Id': 'alice-123',
                'X-Recallr-Allow-New-User-Creation': 'true',
                'X-Recallr-Session-Timeout-Seconds': '600',
            }
        }
    )
)

# Use the client normally - memory is automatically injected
response = client.models.generate_content(
    model='gemini-2.5-pro',
    contents='My name is Alice and I love Python programming.',
    config=types.GenerateContentConfig(
        system_instruction='You are a helpful assistant.',
        thinking_config=types.ThinkingConfig(thinking_budget=0),  # Optional
    ),
)
print(response.text)
Supported APIs
Generate Content: standard, non-streaming text generation
Generate Content Stream: real-time streaming responses for interactive experiences
Required Headers
These headers must be included via the client_args configuration:
X-Recallr-API-Key
Your Recallr API key. Get it from the dashboard.
X-Recallr-Project-Id
Your Recallr Project ID. Get it from the dashboard.
X-Recallr-User-Id
Unique identifier for the user. Used to maintain separate memory graphs per user. Can also be passed as the user field in the request body for OpenAI compatibility.
Session Management
X-Recallr-Allow-New-User-Creation
Automatically create a new user if the specified User-ID doesn’t exist. Set to true to avoid errors for new users.
X-Recallr-Session-Timeout-Seconds
Inactivity period (in seconds) before creating a new session. Minimum value is 600 (10 minutes). Messages within a session are always passed directly to the LLM. Only memories from previous sessions are retrieved and injected as context.
X-Recallr-Session-Id
Continue a specific past session by providing its ID. Get session IDs from response headers.
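Session continuation can be wired up per request; a sketch that assembles the relevant headers (the request-header name X-Recallr-Session-Id mirrors the response header and is an assumption here; the helper is illustrative):

```python
def session_continuation_headers(session_id: str, user_id: str) -> dict:
    # Session IDs come from the X-Recallr-Session-Id response header of a
    # previous request; echoing the ID back continues that session.
    return {
        'X-Recallr-User-Id': user_id,
        'X-Recallr-Session-Id': session_id,  # assumed request-header name
    }

headers = session_continuation_headers('sess-abc', 'alice-123')
```

With the google-genai SDK, per-request headers like these can be supplied via config=types.GenerateContentConfig(http_options=types.HttpOptions(headers=...)).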
Recall Configuration
X-Recallr-Recall-Strategy
Controls the recall method used for retrieving memories; affects latency and accuracy. Accepted values: low_latency, balanced, deep.
low_latency
Best for voice agents and real-time applications: fastest response time, and retrieves more memories to compensate for the reduced accuracy. Use when sub-second latency is critical.
Minimum number of memories to retrieve from the knowledge graph.
Maximum number of memories to retrieve from the knowledge graph.
X-Recallr-Memories-Threshold
Similarity threshold for retrieving individual memories (0.0 to 1.0). Lower values retrieve more memories.
X-Recallr-Summaries-Threshold
Similarity threshold for retrieving session summaries (0.0 to 1.0). Lower values retrieve more summaries.
X-Recallr-Last-N-User-Messages
Include only the last N messages from past sessions when building context.
X-Recallr-Last-N-Summaries
Include only the last N session summaries when building context.
User’s timezone for temporal context (e.g., “America/New_York”). Helps with time-based memories.
X-Recallr-Include-System-Prompt
Whether to include Recallr AI’s system prompt (~3k tokens) in the context. This prompt includes instructions for how to use the injected memories. Set to false if you already have those instructions in your system prompt.
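Since every HTTP header value must be a string and the thresholds are bounded to [0.0, 1.0], it can help to assemble the optional recall headers in one place; a hedged sketch (the helper and its defaults are illustrative, the header names are the ones documented above):

```python
from typing import Optional

def recall_headers(strategy: str = 'balanced',
                   memories_threshold: Optional[float] = None,
                   summaries_threshold: Optional[float] = None,
                   last_n_user_messages: Optional[int] = None,
                   last_n_summaries: Optional[int] = None,
                   include_system_prompt: bool = True) -> dict:
    # HTTP header values must be strings, so everything is str()-ed.
    headers = {'X-Recallr-Recall-Strategy': strategy}
    for name, value in (('X-Recallr-Memories-Threshold', memories_threshold),
                        ('X-Recallr-Summaries-Threshold', summaries_threshold)):
        if value is not None:
            if not 0.0 <= value <= 1.0:
                raise ValueError(f'{name} must be between 0.0 and 1.0')
            headers[name] = str(value)
    if last_n_user_messages is not None:
        headers['X-Recallr-Last-N-User-Messages'] = str(last_n_user_messages)
    if last_n_summaries is not None:
        headers['X-Recallr-Last-N-Summaries'] = str(last_n_summaries)
    if not include_system_prompt:
        headers['X-Recallr-Include-System-Prompt'] = 'false'
    return headers

recall_headers('low_latency', memories_threshold=0.4)
```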
Response Headers
Recallr returns these headers in the response for debugging and session tracking:
X-Recallr-Session-Id
The internal session ID used by Recallr. Use this to continue the same session in future requests.
X-Recallr-User-Id
Unique identifier for the user. Matches the X-Recallr-User-Id sent in the request.
X-Recallr-Request-Id
Unique identifier for this request. Use for debugging and tracing.
Time taken to process the request on Recallr’s side (in milliseconds).
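For logging, the debugging headers can be pulled off any response's header mapping; a minimal sketch (the helper is illustrative, and the lookup is case-insensitive because HTTP header casing is not guaranteed):

```python
def recallr_debug_info(headers: dict) -> dict:
    # Normalize keys, since proxies and clients may change header casing.
    lowered = {k.lower(): v for k, v in headers.items()}
    return {
        'session_id': lowered.get('x-recallr-session-id'),
        'user_id': lowered.get('x-recallr-user-id'),
        'request_id': lowered.get('x-recallr-request-id'),
    }

info = recallr_debug_info({'X-Recallr-Session-Id': 'sess-1',
                           'X-Recallr-Request-Id': 'req-9'})
# info['session_id'] == 'sess-1'; missing headers come back as None
```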
Examples
Generate Content - Non-Streaming
Response headers are read from the raw HTTP response, which recent google-genai releases expose as sdk_http_response.

from google import genai
from google.genai import types

client = genai.Client(
    api_key='YOUR_GEMINI_API_KEY',
    http_options=types.HttpOptions(
        client_args={
            'base_url': 'https://api.recallrai.com/api/v1/forward/https://generativelanguage.googleapis.com',
            'headers': {
                'X-Recallr-API-Key': 'rai-...',
                'X-Recallr-Project-Id': 'project-id',
                'X-Recallr-User-Id': 'alice-123',
                'X-Recallr-Allow-New-User-Creation': 'true',
                'X-Recallr-Session-Timeout-Seconds': '600',
            }
        }
    )
)

response = client.models.generate_content(
    model='gemini-2.5-pro',
    contents='My name is Alice and I love Python.',
    config=types.GenerateContentConfig(
        system_instruction='You are a helpful assistant.',
        # Per-request Recallr headers can be passed here:
        http_options=types.HttpOptions(
            headers={'X-Recallr-Recall-Strategy': 'low_latency'}  # Optional
        ),
    ),
)

# Access response headers for session tracking and debugging
session_id = response.sdk_http_response.headers.get('X-Recallr-Session-Id')
request_id = response.sdk_http_response.headers.get('X-Recallr-Request-Id')
print(response.text)
Generate Content - Streaming

from google import genai
from google.genai import types

client = genai.Client(
    api_key='YOUR_GEMINI_API_KEY',
    http_options=types.HttpOptions(
        client_args={
            'base_url': 'https://api.recallrai.com/api/v1/forward/https://generativelanguage.googleapis.com',
            'headers': {
                'X-Recallr-API-Key': 'rai-...',
                'X-Recallr-Project-Id': 'project-id',
                'X-Recallr-User-Id': 'alice-123',
                'X-Recallr-Allow-New-User-Creation': 'true',
                'X-Recallr-Session-Timeout-Seconds': '600',  # Optional
            }
        }
    )
)

# Stream chunks as they arrive
for chunk in client.models.generate_content_stream(
    model='gemini-2.5-pro',
    contents='Tell me a joke about programming',
):
    print(chunk.text, end='', flush=True)
How It Works
Need Help? Contact our support team for assistance with Gemini integration