📈 Evaluation & Scoring
Comprehensive Scoring Strategies
Multiple metrics for accurate evaluation
🎯 Scoring Strategies
Exact Match
from llm_evaluation_framework.evaluation.scoring_strategies import ExactMatchScorer
# `prediction` is the model output; `reference` is the expected answer.
scorer = ExactMatchScorer()
score = scorer.score(prediction, reference)
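To make the idea concrete, here is a minimal standalone sketch of what an exact-match score computes. It is not the `ExactMatchScorer` implementation: the function name `exact_match` and the `normalize` flag are assumptions made for this example, and the library may handle case and whitespace differently.

```python
def exact_match(prediction: str, reference: str, normalize: bool = True) -> float:
    """Return 1.0 if the two strings match, else 0.0 (illustrative helper)."""
    if normalize:
        # Collapse runs of whitespace and ignore case before comparing.
        prediction = " ".join(prediction.split()).lower()
        reference = " ".join(reference.split()).lower()
    return 1.0 if prediction == reference else 0.0


print(exact_match("Paris", " paris "))
print(exact_match("Paris", "Paris, France"))
```

Exact match is strict by design: a correct answer phrased differently from the reference scores zero, which is why it is usually paired with a softer metric.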
Semantic Similarity
from llm_evaluation_framework.evaluation.scoring_strategies import SemanticSimilarityScorer
# `prediction` is the model output; `reference` is the expected answer.
scorer = SemanticSimilarityScorer()
score = scorer.score(prediction, reference)
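The sketch below illustrates the general shape of a similarity score: turn each text into a vector, then take the cosine of the angle between them. As a loudly stated assumption, it uses plain token counts as the vectors so that it runs with the standard library alone. Semantic scorers typically use embeddings from a language model instead, and nothing here reflects how `SemanticSimilarityScorer` is actually implemented. The name `cosine_similarity` is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(prediction: str, reference: str) -> float:
    """Cosine similarity between token-count vectors of two strings."""
    # Assumption: bag-of-words counts stand in for real embeddings.
    a = Counter(prediction.lower().split())
    b = Counter(reference.lower().split())
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty string has no direction to compare.
    return dot / (norm_a * norm_b)


print(cosine_similarity("the cat", "the dog"))
```

With token counts, "car" and "automobile" score zero against each other. Swapping in embedding vectors keeps the same cosine computation while letting related words score as similar.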
BLEU Score
from llm_evaluation_framework.evaluation.scoring_strategies import BLEUScorer
# `prediction` is the model output; `reference` is the expected answer.
scorer = BLEUScorer()
score = scorer.score(prediction, reference)
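For readers unfamiliar with BLEU, the following is a simplified, single-reference sketch of the metric: clipped n-gram precision combined by a geometric mean, multiplied by a brevity penalty. It is an illustration under stated assumptions, not the `BLEUScorer` implementation. It splits on whitespace, applies no smoothing, and defaults to n-grams up to length 2 (standard BLEU uses 4) so that short strings do not score zero. The names `ngrams` and `simple_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(prediction: str, reference: str, max_n: int = 2) -> float:
    """Simplified single-reference BLEU with uniform weights up to max_n."""
    pred, ref = prediction.split(), reference.split()
    if not pred:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        pred_counts, ref_counts = ngrams(pred, n), ngrams(ref, n)
        total = sum(pred_counts.values())
        # Clip each n-gram count by how often it appears in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in pred_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # No smoothing: any zero precision gives a zero score.
        log_precisions.append(math.log(matched / total))
    # The brevity penalty discourages predictions shorter than the reference.
    bp = 1.0 if len(pred) >= len(ref) else math.exp(1 - len(ref) / len(pred))
    return bp * math.exp(sum(log_precisions) / max_n)


print(simple_bleu("the cat", "the cat sat"))
```

In the usage line, every n-gram of the prediction appears in the reference, so the score is determined entirely by the brevity penalty for the missing third word.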
📊 Metrics
| Metric | Use Case | Range |
|---|---|---|
| Accuracy | Overall share of correct predictions | 0-1 |
| Precision | When false positives are costly | 0-1 |
| Recall | When false negatives are costly | 0-1 |
| F1 Score | Balancing precision and recall | 0-1 |
| Cost | Economic efficiency | $0+ |
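The first four metrics in the table can all be derived from counts of true positives, false positives, and false negatives. The sketch below shows those standard definitions for binary labels. The function name `classification_metrics` and its `positive` parameter are assumptions made for this example, not part of the framework's API, and cost is left out because it depends on provider pricing.

```python
def classification_metrics(predictions, references, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(predictions, references))
    tp = sum(1 for p, r in pairs if p == positive and r == positive)
    fp = sum(1 for p, r in pairs if p == positive and r != positive)
    fn = sum(1 for p, r in pairs if p != positive and r == positive)
    correct = sum(1 for p, r in pairs if p == r)

    # Guard each ratio against an empty denominator.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


print(classification_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```

Precision falls as false positives rise and recall falls as false negatives rise. F1 is their harmonic mean, so it stays low unless both are high.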