
Knowledge

Pluggable knowledge retrieval with LRU caching and thread-safe operations.

Register custom retrieval functions, query them by name, and benefit from automatic caching via an OrderedDict-based LRU cache.

v2.0 Improvements

KnowledgeRetriever now uses an OrderedDict LRU cache (max 1024 entries), threading.Lock for thread safety, and structured logging.


Overview

The KnowledgeRetriever class provides a unified interface for knowledge retrieval across multiple sources:

| Feature | Description |
| --- | --- |
| Pluggable sources | Register any callable as a retrieval function |
| LRU cache | Automatic caching with bounded OrderedDict (max 1024) |
| Direct storage | Add key-value knowledge entries directly |
| Thread safety | All operations protected by threading.Lock |

Quick Start

Python
from agenticaiframework import KnowledgeRetriever

retriever = KnowledgeRetriever()

# Register a retrieval source
def search_docs(query: str) -> str:
    # Your retrieval logic (vector DB, API, file search, etc.)
    return f"Result for: {query}"

retriever.register_source("docs", search_docs)

# Query the source
result = retriever.retrieve("docs", "How do I configure agents?")

# Results are cached automatically, so the second call is served from the cache
cached = retriever.retrieve("docs", "How do I configure agents?")

Registering Sources

A source is any callable that accepts a query string and returns a result:

Python
# Simple function (wiki_api is a placeholder for your own client object)
def search_wiki(query: str) -> str:
    return wiki_api.search(query)

retriever.register_source("wiki", search_wiki)

# Lambda
retriever.register_source("echo", lambda q: f"Echo: {q}")

# Bound instance method
class VectorDB:
    def search(self, query: str) -> list[dict]:
        # self.index is a placeholder for your own vector index client
        return self.index.query(query, top_k=5)

db = VectorDB()
retriever.register_source("vectors", db.search)
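
Because a source only needs to accept a query string, a function with extra parameters can be adapted with functools.partial before it is registered. The sketch below is illustrative: search_index, faq_index, and the "faq" source name are hypothetical and not part of the framework.

```python
from functools import partial

# Hypothetical search function with extra parameters (not part of the framework)
def search_index(query: str, index: dict[str, str], limit: int) -> list[str]:
    """Return up to `limit` values whose key contains the query (case-insensitive)."""
    matches = [text for key, text in index.items() if query.lower() in key.lower()]
    return matches[:limit]

# Illustrative in-memory data
faq_index = {
    "configure agents": "Set options in the agent config.",
    "configure tools": "Tools are declared per agent.",
    "reset memory": "Call the memory reset routine.",
}

# Bind the extra arguments so that only the query string remains
search_faq = partial(search_index, index=faq_index, limit=2)

print(search_faq("configure"))

# The resulting callable could then be registered like any other source:
# retriever.register_source("faq", search_faq)
```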

Direct Knowledge Storage

Add static knowledge entries directly without a retrieval function:

Python
retriever.add_knowledge("company_name", "Acme Corp")
retriever.add_knowledge("max_tokens", "4096")

LRU Cache

The cache uses OrderedDict with a maximum of 1024 entries. When the limit is reached, the least recently used entry is evicted.

Python
# Check cache contents
cache = retriever.cache

# Clear the cache (e.g. after updating source data)
retriever.clear_cache()

Cache Key

The cache key is (source_name, query), so the same query sent to different sources produces separate cache entries.
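
To make the eviction rule concrete, here is a minimal sketch of an OrderedDict-based LRU cache keyed by (source_name, query) tuples. It is a standalone illustration, not the framework's code: the helper names cache_get and cache_put are hypothetical, the limit is lowered to 3 for demonstration, and treating a read as a "use" is an assumption.

```python
from collections import OrderedDict

MAX_ENTRIES = 3  # small limit for demonstration; the documented limit is 1024

cache: OrderedDict = OrderedDict()

def cache_get(key):
    """Return the cached value and mark it most recently used, or None on a miss."""
    if key not in cache:
        return None
    cache.move_to_end(key)
    return cache[key]

def cache_put(key, value):
    """Store a value, evicting the least recently used entry when over the limit."""
    cache[key] = value
    cache.move_to_end(key)
    if len(cache) > MAX_ENTRIES:
        cache.popitem(last=False)  # the first item is the least recently used

cache_put(("docs", "q1"), "a")
cache_put(("wiki", "q1"), "b")   # same query, different source: separate entry
cache_put(("docs", "q2"), "c")
cache_get(("docs", "q1"))        # touch the oldest entry so it becomes most recent
cache_put(("docs", "q3"), "d")   # exceeds the limit, so ("wiki", "q1") is evicted

print(list(cache))
```

In this sketch the entry for ("wiki", "q1") is the one evicted, because ("docs", "q1") was read after it and therefore counts as more recently used.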


Bypassing the Cache

Pass use_cache=False to skip the cache for a specific query:

Python
# Always fetch fresh data
fresh = retriever.retrieve("docs", "latest updates", use_cache=False)

API Reference

KnowledgeRetriever

Methods

| Method | Returns | Description |
| --- | --- | --- |
| register_source(name, retrieval_fn) | None | Register a named retrieval function |
| add_knowledge(key, content) | None | Add a static knowledge entry |
| retrieve(source, query, use_cache=True) | Any | Query a source with optional caching |
| clear_cache() | None | Clear the LRU cache |

Properties

| Property | Type | Description |
| --- | --- | --- |
| cache | OrderedDict | Current cache contents |

Internal

| Attribute | Type | Description |
| --- | --- | --- |
| _sources | dict | Registered retrieval functions |
| _knowledge | dict | Direct knowledge entries |
| _cache | OrderedDict | LRU cache (max 1024) |
| _lock | threading.Lock | Thread synchronisation |
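
As a rough mental model of how these attributes could fit together, the sketch below defines a stand-in class named RetrieverSketch. It is not the framework's implementation. Several behaviours are assumptions made for illustration: use_cache=False skips both the cache lookup and the cache write, an unknown source raises KeyError, and the retrieval function is called outside the lock.

```python
import threading
from collections import OrderedDict
from typing import Any, Callable

class RetrieverSketch:
    """Illustrative stand-in; not the framework's KnowledgeRetriever."""

    def __init__(self, max_cache: int = 1024) -> None:
        self._sources: dict[str, Callable[[str], Any]] = {}
        self._knowledge: dict[str, Any] = {}
        self._cache: OrderedDict = OrderedDict()
        self._lock = threading.Lock()
        self._max_cache = max_cache

    def register_source(self, name: str, retrieval_fn: Callable[[str], Any]) -> None:
        with self._lock:
            self._sources[name] = retrieval_fn  # re-registering a name overwrites it

    def add_knowledge(self, key: str, content: Any) -> None:
        with self._lock:
            self._knowledge[key] = content

    def retrieve(self, source: str, query: str, use_cache: bool = True) -> Any:
        key = (source, query)
        with self._lock:
            if use_cache and key in self._cache:
                self._cache.move_to_end(key)  # mark as most recently used
                return self._cache[key]
            retrieval_fn = self._sources[source]  # KeyError if unknown (assumption)
        # Assumption: call the source outside the lock so a slow source
        # does not hold up other threads in this sketch
        result = retrieval_fn(query)
        if use_cache:
            with self._lock:
                self._cache[key] = result
                self._cache.move_to_end(key)
                if len(self._cache) > self._max_cache:
                    self._cache.popitem(last=False)  # evict least recently used
        return result

    def clear_cache(self) -> None:
        with self._lock:
            self._cache.clear()

# Usage: count how often the underlying source is actually called
calls = []

def counting_source(query: str) -> str:
    calls.append(query)
    return f"Result for: {query}"

sketch = RetrieverSketch(max_cache=2)
sketch.register_source("docs", counting_source)
sketch.retrieve("docs", "a")                    # miss: calls the source
sketch.retrieve("docs", "a")                    # hit: served from the cache
sketch.retrieve("docs", "a", use_cache=False)   # bypass: calls the source again
print(len(calls))
```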

Integration with RAG

Combine KnowledgeRetriever with an LLM for retrieval-augmented generation:

Python
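from agenticaiframework import KnowledgeRetriever

# Placeholder retrieval function; replace with your own search logic
def doc_search_fn(query: str) -> str:
    return f"Documentation excerpt for: {query}"

user_question = "How do I configure agents?"

retriever = KnowledgeRetriever()
retriever.register_source("docs", doc_search_fn)

# Retrieve context
context = retriever.retrieve("docs", user_question)

# Build prompt with context
prompt = f"Context: {context}\n\nQuestion: {user_question}\nAnswer:"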
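
When a source returns a list of records rather than a single string (as the vector search example does with list[dict]), the results usually need to be flattened into text before they go into the prompt. The sketch below shows one way to do that. The "text" and "score" keys, the helper names, and the 500-character limit are hypothetical choices, not something the framework prescribes.

```python
def format_context(results: list[dict], max_chars: int = 500) -> str:
    """Join result texts into a numbered context block, truncated to max_chars."""
    lines = [f"[{i}] {item['text']}" for i, item in enumerate(results, start=1)]
    return "\n".join(lines)[:max_chars]

def build_prompt(question: str, results: list[dict]) -> str:
    """Combine formatted context and the user question into a single prompt."""
    context = format_context(results)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Illustrative results, shaped like the output of a vector search source
results = [
    {"text": "Agents are configured via a config object.", "score": 0.92},
    {"text": "Each agent can declare its own tools.", "score": 0.81},
]

prompt = build_prompt("How do I configure agents?", results)
print(prompt)
```

Numbering the snippets makes it easier to ask the model to cite which piece of context it relied on.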

Best Practices

Do

  • Register sources during application startup.
  • Use clear_cache() when underlying data changes.
  • Use use_cache=False for time-sensitive queries.
  • Keep retrieval functions fast, because they block the calling thread.

Don't

  • Register sources with the same name (the second overwrites the first).
  • Store large binary blobs via add_knowledge(); use object storage instead.
  • Forget to call clear_cache() after data updates.
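
Since a retrieval function blocks the calling thread, one possible safeguard is to wrap a slow source so that callers wait only a bounded time. The following is a sketch using the standard library's concurrent.futures. The with_timeout helper, the fallback value, and the timings are hypothetical and not part of the framework.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Any, Callable

# Shared worker pool for guarded sources (the size is an arbitrary choice)
_executor = ThreadPoolExecutor(max_workers=4)

def with_timeout(fn: Callable[[str], Any], seconds: float,
                 fallback: Any = None) -> Callable[[str], Any]:
    """Wrap a source so callers wait at most `seconds` for a result."""
    def wrapper(query: str) -> Any:
        future = _executor.submit(fn, query)
        try:
            return future.result(timeout=seconds)
        except FutureTimeout:
            # The worker thread keeps running; only the caller stops waiting
            return fallback
    return wrapper

# Hypothetical slow source used to demonstrate the timeout
def slow_source(query: str) -> str:
    time.sleep(0.5)
    return f"Result for: {query}"

guarded = with_timeout(slow_source, seconds=0.05, fallback="(timed out)")
print(guarded("latest updates"))

# The wrapped callable could be registered like any other source:
# retriever.register_source("slow_docs", guarded)
```

If a fallback value is returned through a cached query, it may be stored like any other result, so pairing such a wrapper with use_cache=False or clear_cache() is worth considering.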

  • Memory: persistent agent memory
  • Agents: agent lifecycle
  • Tools: tool definitions for agents
  • Hub: component registry