Multi-turn conversation caching - Amazon ElastiCache
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Multi-turn conversation caching

For applications with multi-turn conversations, the same user message can mean different things depending on context. For example, "Tell me more" in a conversation about Valkey means something different from "Tell me more" in a conversation about Python.

The challenge

Single-prompt caching works well for stateless queries. In multi-turn conversations, the cache key must capture the full conversation context, not just the last message:

# "Tell me more" means nothing without context
# Conversation A: "What is Valkey?" -> "Tell me more"  (about Valkey)
# Conversation B: "What is Python?" -> "Tell me more"  (about Python)

Strategy: context-aware cache keys

Instead of embedding only the last user message, embed a summary of the full conversation context. This way, similar follow-up questions in similar conversation flows can reuse cached answers.

def build_context_string(messages: list) -> str:
    """Build a cacheable context string from conversation messages."""
    # Use the last 3 turns (6 messages: user + assistant pairs)
    recent = messages[-6:]
    parts = []
    for msg in recent:
        role = msg["role"]
        content = msg["content"][:200]  # Truncate long messages
        parts.append(f"{role}: {content}")
    return " | ".join(parts)
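As a quick sanity check, the helper yields a pipe-delimited summary of the recent turns. A condensed copy of the function is repeated here so the snippet runs on its own:

```python
def build_context_string(messages: list) -> str:
    """Build a cacheable context string from conversation messages."""
    recent = messages[-6:]  # last 3 user/assistant turns
    return " | ".join(f"{m['role']}: {m['content'][:200]}" for m in recent)

messages = [
    {"role": "user", "content": "What is Valkey?"},
    {"role": "assistant", "content": "Valkey is an open source key-value datastore."},
    {"role": "user", "content": "Tell me more"},
]
print(build_context_string(messages))
# user: What is Valkey? | assistant: Valkey is an open source key-value datastore. | user: Tell me more
```

Because the context string folds the earlier turns into the text that gets embedded, "Tell me more" after the Valkey question embeds differently from "Tell me more" after a Python question.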

Per-user cache isolation with TAG filters

Use TAG fields to isolate cached conversations by user, session, or other dimensions. This prevents one user's cached conversations from being returned for another user:

# Create index with a TAG field for per-user isolation
valkey_client.execute_command(
    "FT.CREATE", "conv_cache_idx",
    "SCHEMA",
    "context_summary", "TEXT",
    "response", "TEXT",
    "user_id", "TAG",
    "turn_count", "NUMERIC",
    "embedding", "VECTOR", "HNSW", "6",
    "TYPE", "FLOAT32",
    "DIM", "1024",
    "DISTANCE_METRIC", "COSINE",
)
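The index only defines the schema; cache writes are ordinary hashes whose fields match it. A minimal write-path sketch, assuming a valkey-py/redis-py style client is passed in and the embedding has already been computed as a list of floats (`store_conversation_response` and `pack_vector` are illustrative names, not part of any ElastiCache API):

```python
import struct
import uuid

def pack_vector(values: list) -> bytes:
    """Pack floats into the little-endian FLOAT32 blob the index expects."""
    return struct.pack(f"<{len(values)}f", *values)

def store_conversation_response(client, user_id: str, context: str,
                                embedding: list, response: str,
                                ttl_seconds: int = 3600) -> str:
    """Write one cache entry as a hash matching the conv_cache_idx schema."""
    key = f"conv_cache:{uuid.uuid4().hex}"
    client.hset(key, mapping={
        "context_summary": context,
        "response": response,
        "user_id": user_id,
        "turn_count": str(context.count(" | ") + 1),
        "embedding": pack_vector(embedding),
    })
    client.expire(key, ttl_seconds)  # let stale conversation entries age out
    return key
```

The TTL is a design choice: conversation context goes stale faster than FAQ-style answers, so a shorter expiry than a single-prompt cache is usually appropriate.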

Search with hybrid filtering (TAG + KNN):

def lookup_conversation_cache(messages: list, user_id: str, threshold: float = 0.12):
    """Search the cache for similar conversation contexts, scoped to a user.

    Note: FT.SEARCH with COSINE distance returns a distance score where
    0 = identical and 2 = opposite. A lower score means higher similarity.
    The threshold here is a maximum distance: only return results closer
    than this value.
    """
    context = build_context_string(messages)
    query_vec = get_embedding(context)

    # Hybrid search: filter by user_id TAG + KNN on the context embedding
    results = valkey_client.execute_command(
        "FT.SEARCH", "conv_cache_idx",
        f"@user_id:{{{user_id}}}=>[KNN 1 @embedding $query_vec]",
        "PARAMS", "2", "query_vec", query_vec,
        "DIALECT", "2",
    )

    if results[0] > 0:
        fields = results[2]
        field_dict = {fields[j]: fields[j + 1] for j in range(0, len(fields), 2)}
        distance = float(field_dict.get("__embedding_score", "999"))
        if distance < threshold:  # Lower distance = more similar
            return {"hit": True,
                    "response": field_dict.get("response", ""),
                    "distance": distance}
    return {"hit": False}
Note

The @user_id:{user_123} TAG filter ensures that User A's cached conversations don't leak to User B. The hybrid query (TAG + KNN) runs as a single operation: it pre-filters by user, then finds the nearest conversation context among that user's entries.
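Tying lookup and write together follows the standard cache-aside pattern. A sketch with the lookup, generation, and store steps injected as callables (`answer_with_cache` is a hypothetical name; in practice you would wire in lookup_conversation_cache, your model client, and a store function):

```python
def answer_with_cache(messages: list, user_id: str, lookup, generate, store):
    """Cache-aside flow: return a cached response on a hit;
    otherwise generate, store for future hits, and return fresh."""
    result = lookup(messages, user_id)
    if result["hit"]:
        return result["response"], True   # served from cache

    response = generate(messages)          # e.g. the model invocation
    store(messages, user_id, response)     # write so similar follow-ups hit
    return response, False
```

Injecting the three steps keeps the flow testable without a live cluster and makes it easy to swap the isolation strategy later.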

Cache isolation strategies

Strategy          TAG filter                  Best for
Per-user          @user_id:{user_123}         Personalized assistants
Per-session       @session_id:{sess_abc}      Short-lived chats
Global (shared)   No filter (*)               FAQ bots, common queries
Per-model         @model:{gpt-4}              Multi-model deployments
Per-product       @product_id:{prod_456}      E-commerce assistants
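Each row in the table maps to a different filter expression in front of the KNN clause, and filters compose: juxtaposed TAG filters are ANDed. A small helper to build these hybrid query strings (`hybrid_query` is an illustrative name; note that TAG values containing special characters such as hyphens need escaping in real queries, which this sketch omits):

```python
def hybrid_query(tag_filters: dict, k: int = 1) -> str:
    """Build an FT.SEARCH hybrid query: TAG pre-filter(s) + KNN clause.

    An empty dict yields the global (shared) form '(*)=>[KNN ...]';
    multiple filters are ANDed by juxtaposition.
    """
    prefilter = " ".join(f"@{field}:{{{value}}}"
                         for field, value in tag_filters.items()) or "*"
    return f"({prefilter})=>[KNN {k} @embedding $query_vec]"

hybrid_query({})                                    # global (shared) cache
hybrid_query({"user_id": "user_123"})               # per-user
hybrid_query({"user_id": "user_123",
              "session_id": "sess_abc"})            # per-user AND per-session
```

Combining dimensions this way (for example, per-user plus per-model) costs nothing extra at query time, since the TAG pre-filter and KNN still execute as one search.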