🚀 Advanced Usage Example¶
This guide demonstrates a full end-to-end workflow using EmbeddingFramework with advanced features.
1️⃣ Install the package¶
pip install embeddingframework
2️⃣ Import required modules¶
from embeddingframework.adapters.openai_embedding_adapter import OpenAIEmbeddingAdapter
from embeddingframework.adapters.vector_dbs import MilvusAdapter
from embeddingframework.processors.file_processor import FileProcessor
3️⃣ Initialize components¶
embedding_provider = OpenAIEmbeddingAdapter(api_key="YOUR_OPENAI_API_KEY")
vector_db = MilvusAdapter(host="localhost", port="19530")
processor = FileProcessor()
4️⃣ Process files¶
text = processor.process_file("sample.pdf")
5️⃣ Generate embeddings¶
embeddings = embedding_provider.embed_texts([text])
6️⃣ Store embeddings¶
vector_db.add_texts([text], embeddings)
7️⃣ Query the database¶
results = vector_db.query("search term", top_k=5)
print(results)
📊 Example Output¶
Input File: sample.pdf
(contains "Artificial Intelligence is transforming the world.")
Extracted Text:
Artificial Intelligence is transforming the world.
Generated Embedding (example):
[0.2345, -0.1234, 0.9876, ...]
Query:
results = vector_db.query("AI transformation", top_k=2)
Output:
[
{"text": "Artificial Intelligence is transforming the world.", "score": 0.98},
{"text": "AI is changing industries rapidly.", "score": 0.92}
]
🧩 Additional Examples¶
Example 1: Batch Processing Multiple Files¶
files = ["doc1.pdf", "doc2.txt"]
texts = [processor.process_file(f) for f in files]
embeddings = embedding_provider.embed_texts(texts)
vector_db.add_texts(texts, embeddings)
Example 2: Using Metadata in Queries¶
results = vector_db.query("AI", top_k=3, filter={"source": "doc1.pdf"})
Example 3: Async Embedding Generation¶
import asyncio
async def main():
embeddings = await embedding_provider.aembed_texts(["Async example"])
print(embeddings)
asyncio.run(main())
✅ Summary¶
You have successfully: - Processed a file - Generated embeddings - Stored them in a distributed vector database - Queried the database for semantic search - Used batch processing, metadata filtering, and async embedding generation