# Contributing to LLM Evaluation Framework

## Ways to Contribute

## Quick Start for Contributors

### Lightning Setup (5 minutes)
```bash
# 1. Fork & clone
git clone https://github.com/YOUR_USERNAME/LLMEvaluationFramework.git
cd LLMEvaluationFramework

# 2. Set up the environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -e ".[dev,test,docs]"

# 4. Set up pre-commit hooks
pre-commit install

# 5. Verify the setup
pytest tests/test_quick_setup.py -v
python -c "from llm_evaluation_framework import ModelRegistry; print('Ready to contribute!')"
```
### Development Environment Details
#### Required Tools

- **Python 3.8+** (recommended: 3.11)
- **Git** for version control
- **IDE**: VS Code (recommended) or PyCharm
- **Docker** (optional, for testing environments)

#### Project Dependencies

```text
# Core development tools
pytest           # Testing framework
pytest-cov       # Coverage reporting
black            # Code formatting
flake8           # Code linting
mypy             # Type checking
pre-commit       # Git hooks
isort            # Import sorting

# Documentation tools
mkdocs           # Documentation generator
mkdocs-material  # Documentation theme

# Optional tools
docker           # Containerization
tox              # Testing across Python versions
```

#### IDE Configuration

**VS Code Settings** (`.vscode/settings.json`):
```json
{
  "python.defaultInterpreterPath": "./venv/bin/python",
  "python.formatting.provider": "black",
  "python.linting.enabled": true,
  "python.linting.flake8Enabled": true,
  "python.linting.mypyEnabled": true,
  "python.testing.pytestEnabled": true,
  "python.testing.pytestArgs": [
    "tests",
    "--cov=llm_evaluation_framework"
  ],
  "editor.formatOnSave": true,
  "editor.codeActionsOnSave": {
    "source.organizeImports": true
  }
}
```
## Development Workflow

### Step-by-Step Contribution Process
#### 1. Planning & Discovery

```bash
# Check existing issues and discussions:
# https://github.com/isathish/LLMEvaluationFramework/issues
# https://github.com/isathish/LLMEvaluationFramework/discussions

# Create an issue if needed (use the templates):
# - Bug Report
# - Feature Request
# - Documentation Improvement
```
#### 2. Repository Setup

```bash
# Fork the repository on GitHub, then add the upstream remote
git remote add upstream https://github.com/isathish/LLMEvaluationFramework.git

# Keep your main branch updated
git fetch upstream
git checkout main
git merge upstream/main
```
#### 3. Branch Creation

```bash
# Create a feature branch with a descriptive name, e.g.:
git checkout -b feature/add-semantic-scoring
git checkout -b fix/async-timeout-issue
git checkout -b docs/improve-api-examples
```

#### 4. Development Cycle

Work in small, reviewable increments: write or update tests alongside the implementation, run the quality tools (Black, isort, flake8, mypy, pytest), and commit using the message standards below.

#### 5. Submission

Push your branch to your fork and open a pull request against `main`, following the Pull Request Guidelines.
### Commit Message Standards
We follow **Conventional Commits** for clear, semantic commit messages.

#### Format

Each message takes the form `<type>(<scope>): <description>`.

#### Types

| Type | Description | Example |
|------|-------------|---------|
| **feat** | New feature | `feat(scoring): add F1 score strategy` |
| **fix** | Bug fix | `fix(engine): resolve timeout in async calls` |
| **docs** | Documentation | `docs(api): update ModelRegistry examples` |
| **style** | Code style changes | `style: format code with black` |
| **refactor** | Code refactoring | `refactor(persistence): simplify storage interface` |
| **test** | Test additions/fixes | `test(scoring): add edge case tests for accuracy` |
| **perf** | Performance improvements | `perf(engine): optimize batch processing` |
| **chore** | Maintenance tasks | `chore: update dependencies` |

#### Examples
```text
feat(registry): add model validation with capability checking
fix(async): resolve race condition in concurrent evaluations
docs(examples): add comprehensive async usage patterns
test(integration): add end-to-end workflow testing
refactor(scoring): extract common scoring utilities
perf(engine): reduce memory usage in large batch processing
```
## Testing Guidelines

### Testing Philosophy
**Quality is non-negotiable.** Every line of code must be tested for reliability:

- **Minimum Coverage**: 85% (target: 90%+)
- **Test Types**: Unit, Integration, End-to-End, Performance
- **Test First**: Write tests before or alongside the implementation
- **Documentation**: Tests serve as living documentation
### Testing Commands
```bash
# Run all tests
pytest

# Run with a coverage report
pytest --cov=llm_evaluation_framework --cov-report=html

# Enforce the 85% minimum coverage locally
pytest --cov=llm_evaluation_framework --cov-fail-under=85

# Run specific test categories
pytest tests/unit/           # Unit tests only
pytest tests/integration/    # Integration tests only
pytest -m slow               # Slow tests only
pytest -m "not slow"         # Exclude slow tests

# Run tests matching specific patterns
pytest -k "test_model_registry"
pytest tests/test_specific_file.py::test_specific_function

# Performance testing
pytest tests/benchmarks/ --benchmark-only

# Parallel testing (faster; requires pytest-xdist)
pytest -n auto
```
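The `-m slow` / `-m "not slow"` selections above rely on tests being tagged with markers. A minimal sketch of tagging a test (the test name and body here are hypothetical, not part of the framework's suite):

```python
import pytest


@pytest.mark.slow  # select with `pytest -m slow`, exclude with `pytest -m "not slow"`
def test_large_batch_evaluation():
    # Placeholder for an expensive end-to-end check
    assert sum(range(1000)) == 499500
```

Custom markers should be registered (e.g. via `markers = ["slow: marks tests as slow"]` under `[tool.pytest.ini_options]`) so that pytest's `--strict-markers` mode accepts them.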
### Writing Quality Tests

#### Test Structure Template
"""
Example test following best practices
"""
import pytest
from unittest.mock import Mock, patch
from llm_evaluation_framework import ModelRegistry
class TestModelRegistry:
"""Comprehensive tests for ModelRegistry"""
def setup_method(self):
"""Setup before each test"""
self.registry = ModelRegistry()
self.valid_config = {
"provider": "openai",
"api_cost_input": 0.001,
"api_cost_output": 0.002,
"capabilities": ["reasoning", "creativity"]
}
def test_register_model_success(self):
"""Test successful model registration with valid config"""
# Arrange
model_name = "test-gpt-3.5"
# Act
result = self.registry.register_model(model_name, self.valid_config)
# Assert
assert result is True
assert model_name in self.registry._models
assert self.registry.get_model(model_name) == self.valid_config
@pytest.mark.parametrize("invalid_config,expected_error", [
({"provider": "unknown"}, "Invalid provider"),
({"provider": "openai"}, "Missing required fields"),
({"provider": "openai", "api_cost_input": -1}, "Invalid cost"),
])
def test_register_model_validation_errors(self, invalid_config, expected_error):
"""Test model registration validation with various invalid configs"""
with pytest.raises(ValueError, match=expected_error):
self.registry.register_model("test-model", invalid_config)
@patch('llm_evaluation_framework.model_registry.validate_api_key')
def test_register_model_with_api_validation(self, mock_validate):
"""Test model registration with API key validation"""
# Arrange
mock_validate.return_value = True
config = self.valid_config.copy()
config["api_key"] = "test-key"
# Act
result = self.registry.register_model("test-model", config)
# Assert
assert result is True
mock_validate.assert_called_once_with("test-key", "openai")
# Integration test example
class TestRegistryEngineIntegration:
"""Integration tests between registry and engine"""
@pytest.fixture
def configured_system(self):
"""Fixture providing configured registry and engine"""
registry = ModelRegistry()
engine = ModelInferenceEngine(registry)
registry.register_model("test-model", {
"provider": "openai",
"capabilities": ["reasoning"]
})
return registry, engine
def test_end_to_end_evaluation(self, configured_system):
"""Test complete evaluation workflow"""
registry, engine = configured_system
test_cases = [
{"prompt": "2+2=?", "expected": "4"},
{"prompt": "Capital of France?", "expected": "Paris"}
]
# This would use mocked API calls in real tests
with patch('llm_evaluation_framework.engine.call_model_api') as mock_api:
mock_api.return_value = "4"
results = engine.evaluate_model("test-model", test_cases)
assert results is not None
assert "aggregate_metrics" in results
assert results["aggregate_metrics"]["test_count"] == 2
## Code Quality Standards

### Quality Requirements
#### Code Style

- **PEP 8 Compliance**: Python Enhancement Proposal 8
- **Black Formatting**: Line length 88 characters
- **Import Organization**: isort for consistent imports
- **Type Hints**: 100% type annotation coverage

#### Code Quality

- **Linting**: flake8 with no warnings
- **Type Checking**: mypy strict mode compliance
- **Documentation**: Google-style docstrings for all public APIs
- **Performance**: No significant performance regressions

#### Testing Requirements

- **Coverage**: Minimum 85% test coverage
- **Test Quality**: Comprehensive edge case testing
- **Integration**: Component interaction testing
- **Performance**: Benchmark critical paths
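To make the docstring and type-hint requirements concrete, here is a hypothetical helper (not part of the framework's API) written in the expected style, with full annotations and a Google-style docstring:

```python
from typing import Dict, List


def aggregate_accuracy(results: List[Dict[str, bool]]) -> float:
    """Compute the fraction of evaluation results marked as passed.

    Args:
        results: Evaluation results, each containing a boolean ``"passed"`` key.

    Returns:
        The pass rate in the range [0.0, 1.0]; 0.0 for an empty list.

    Raises:
        KeyError: If a result dict is missing the ``"passed"`` key.
    """
    if not results:
        return 0.0
    return sum(1 for r in results if r["passed"]) / len(results)
```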
### Quality Tools Configuration
#### pyproject.toml

```toml
[tool.black]
line-length = 88
target-version = ['py38']
include = '\.pyi?$'

[tool.isort]
profile = "black"
multi_line_output = 3
line_length = 88

[tool.mypy]
python_version = "3.8"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
disallow_incomplete_defs = true
check_untyped_defs = true

[tool.pytest.ini_options]
minversion = "6.0"
addopts = "-ra -q --cov=llm_evaluation_framework"
testpaths = ["tests"]
```

#### Pre-commit Hooks (`.pre-commit-config.yaml`)

Installed via `pre-commit install`, the hooks run the formatting and linting tools automatically before each commit.
## Pull Request Guidelines

### Before Submitting

#### Pre-submission Checklist

- [ ] **Tests Pass**: All tests pass locally
- [ ] **Coverage**: New code has appropriate test coverage (85%+)
- [ ] **Type Checking**: mypy passes without errors
- [ ] **Linting**: flake8 passes without warnings
- [ ] **Formatting**: Code formatted with Black
- [ ] **Documentation**: Public APIs documented with examples
- [ ] **Performance**: No significant performance regression
- [ ] **Conventional Commits**: Commit messages follow the standard

#### PR Requirements

- [ ] **Clear Title**: Descriptive, follows the conventional commit format
- [ ] **Issue Reference**: Links to related issue(s)
- [ ] **Description**: What problem does this solve?
- [ ] **Solution**: How does this solve the problem?
- [ ] **Testing**: How was this tested?
- [ ] **Breaking Changes**: Any breaking changes clearly noted
- [ ] **Documentation**: Documentation updates included
### PR Template
```markdown
## Description

Brief description of what this PR does and why.

Fixes #(issue_number)

## Type of Change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Documentation update
- [ ] Refactoring (no functional changes)
- [ ] Performance improvement
- [ ] Test improvements

## Testing

Describe the tests you ran to verify your changes:

- [ ] Unit tests
- [ ] Integration tests
- [ ] Manual testing
- [ ] Performance testing

## Checklist

- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes

## Screenshots (if applicable)

Add screenshots to help explain your changes.

## Related Issues

List any related issues or PRs.
```
## Recognition & Community

### Contributor Recognition

#### Contribution Levels

| Level | Contributions | Recognition |
|-------|---------------|-------------|
| **First-time** | First contribution | Welcome message, contributor badge |
| **Regular** | 5+ contributions | Listed in CONTRIBUTORS.md |
| **Active** | 15+ contributions | Social media shoutout |
| **Champion** | 50+ contributions | Special contributor badge |
| **Maintainer** | Core team member | Commit access, decision-making |

#### Ways We Recognize Contributors

- **CONTRIBUTORS.md**: All contributors listed
- **Release Notes**: Major contributors highlighted
- **Social Media**: Public appreciation posts
- **Discord**: Special contributor roles (coming soon)
- **Newsletter**: Contributor spotlights (coming soon)
### Becoming a Maintainer

#### Maintainer Responsibilities

- **Code Review**: Review and approve pull requests
- **Issue Triage**: Label and prioritize issues
- **Release Management**: Help with releases and versioning
- **Community Support**: Help answer questions and guide contributors
- **Technical Decisions**: Participate in architecture discussions

#### Path to Maintainership

1. **Consistent Contributions**: Regular, high-quality contributions
2. **Community Engagement**: Help other contributors and users
3. **Technical Excellence**: Demonstrate a deep understanding of the codebase
4. **Leadership**: Take initiative on important features or improvements
5. **Invitation**: Current maintainers invite qualified contributors
## Community Guidelines

### Code of Conduct

We are committed to providing a **welcoming, inclusive, and harassment-free** experience for everyone. Our community standards:

#### Expected Behavior

- **Be respectful** and inclusive in all interactions
- **Be collaborative** and help others learn and grow
- **Be constructive** in feedback and criticism
- **Be patient** with newcomers and different perspectives
- **Be professional** in all communications

#### Unacceptable Behavior

- Harassment, discrimination, or offensive language
- Personal attacks or trolling
- Spam or off-topic content
- Publishing private information without consent
- Disruptive or destructive behavior

#### Enforcement

Violations will be addressed promptly and may result in:

- A warning and education
- Temporary suspension
- A permanent ban from the community

Report issues to: [conduct@llmevalframework.org](mailto:conduct@llmevalframework.org)
### Getting Help

#### Support Channels

| Channel | Purpose | Response Time |
|---------|---------|---------------|
| **[Documentation](https://isathish.github.io/LLMEvaluationFramework/)** | Self-service help | Immediate |
| **[GitHub Issues](https://github.com/isathish/LLMEvaluationFramework/issues)** | Bug reports, feature requests | 24-48 hours |
| **[GitHub Discussions](https://github.com/isathish/LLMEvaluationFramework/discussions)** | Q&A, ideas, general discussion | Community-driven |
| **Direct Email** | Security issues, conduct violations | 1-3 business days |

#### Before Asking for Help

1. **Search existing issues** and discussions
2. **Check the documentation** for relevant guides
3. **Try the examples** to understand usage patterns
4. **Provide context** when asking questions (code, error messages, environment)
## Project Roadmap & Planning

### Current Priorities

#### Q4 2025 Goals

- **Performance**: 50% faster evaluation processing
- **Extensions**: Plugin system for custom components
- **Analytics**: Advanced reporting and visualization
- **Integration**: Support for more LLM providers

#### How to Get Involved

- **Review roadmap issues** labeled with `roadmap`
- **Join planning discussions** in GitHub Discussions
- **Propose new features** that align with project goals
- **Collaborate** with other contributors on major features

#### Feature Request Process

1. **Search existing requests** to avoid duplicates
2. **Use the feature request template** for new proposals
3. **Provide detailed use cases** and requirements
4. **Participate in discussion** and refinement
5. **Plan the implementation** with maintainers
## Ready to Make Your First Contribution?

**Every contribution, no matter how small, makes a meaningful impact!**

---

### Thank You to Our Amazing Contributors!

**Your contribution could be featured here next! Join our community of builders making LLM evaluation better for everyone.**

*Made with ❤️ by the LLM Evaluation Framework community*