# Contributing to LLM Evaluation Framework

## Ways to Contribute

## Quick Start for Contributors

### Lightning Setup (5 minutes)
```bash
# 1. Fork & clone
git clone https://github.com/YOUR_USERNAME/LLMEvaluationFramework.git
cd LLMEvaluationFramework

# 2. Set up the environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -e ".[dev,test,docs]"

# 4. Set up pre-commit hooks
pre-commit install

# 5. Verify the setup
pytest tests/test_quick_setup.py -v
python -c "from llm_evaluation_framework import ModelRegistry; print('Ready to contribute!')"
```
### Development Environment Details
#### Required Tools

- **Python 3.8+** (recommended: 3.11)
- **Git** for version control
- **IDE**: VS Code (recommended) or PyCharm
- **Docker** (optional, for testing environments)

#### Project Dependencies

```text
# Core development tools
pytest           # Testing framework
pytest-cov       # Coverage reporting
black            # Code formatting
flake8           # Code linting
mypy             # Type checking
pre-commit       # Git hooks
isort            # Import sorting

# Documentation tools
mkdocs           # Documentation generator
mkdocs-material  # Documentation theme

# Optional tools
docker           # Containerization
tox              # Testing across Python versions
```

#### IDE Configuration

**VS Code Settings** (`.vscode/settings.json`):
```json
{
  "python.defaultInterpreterPath": "./venv/bin/python",
  "python.formatting.provider": "black",
  "python.linting.enabled": true,
  "python.linting.flake8Enabled": true,
  "python.linting.mypyEnabled": true,
  "python.testing.pytestEnabled": true,
  "python.testing.pytestArgs": [
    "tests",
    "--cov=llm_evaluation_framework"
  ],
  "editor.formatOnSave": true,
  "editor.codeActionsOnSave": {
    "source.organizeImports": true
  }
}
```
## Development Workflow

### Step-by-Step Contribution Process
#### 1. Planning & Discovery

```bash
# Check existing issues and discussions:
# https://github.com/isathish/LLMEvaluationFramework/issues
# https://github.com/isathish/LLMEvaluationFramework/discussions

# Create an issue if needed (use the templates):
# - Bug Report
# - Feature Request
# - Documentation Improvement
```
#### 2. Repository Setup

```bash
# Fork the repository on GitHub, then add the upstream remote
git remote add upstream https://github.com/isathish/LLMEvaluationFramework.git

# Keep your main branch updated
git fetch upstream
git checkout main
git merge upstream/main
```
#### 3. Branch Creation

```bash
# Create a feature branch with a descriptive name, e.g.:
git checkout -b feature/add-semantic-scoring
git checkout -b fix/async-timeout-issue
git checkout -b docs/improve-api-examples
```

#### 4. Development Cycle

Work in small, reviewable increments: write or update tests alongside the implementation, run the quality tools (Black, isort, flake8, mypy, pytest), and commit using the message standards below.

#### 5. Submission

Push your branch to your fork and open a pull request against `main`, following the Pull Request Guidelines.
### Commit Message Standards
We follow **Conventional Commits** for clear, semantic commit messages.

#### Format

Each message takes the form `<type>(<scope>): <description>`.

#### Types

| Type | Description | Example |
|------|-------------|---------|
| **feat** | New feature | `feat(scoring): add F1 score strategy` |
| **fix** | Bug fix | `fix(engine): resolve timeout in async calls` |
| **docs** | Documentation | `docs(api): update ModelRegistry examples` |
| **style** | Code style changes | `style: format code with black` |
| **refactor** | Code refactoring | `refactor(persistence): simplify storage interface` |
| **test** | Test additions/fixes | `test(scoring): add edge case tests for accuracy` |
| **perf** | Performance improvements | `perf(engine): optimize batch processing` |
| **chore** | Maintenance tasks | `chore: update dependencies` |

#### Examples
```text
feat(registry): add model validation with capability checking
fix(async): resolve race condition in concurrent evaluations
docs(examples): add comprehensive async usage patterns
test(integration): add end-to-end workflow testing
refactor(scoring): extract common scoring utilities
perf(engine): reduce memory usage in large batch processing
```
## Testing Guidelines

### Testing Philosophy
**Quality is non-negotiable.** Every line of code must be tested for reliability:

- **Minimum Coverage**: 85% (target: 90%+)
- **Test Types**: Unit, Integration, End-to-End, Performance
- **Test First**: Write tests before or alongside the implementation
- **Documentation**: Tests serve as living documentation
### Testing Commands
```bash
# Run all tests
pytest

# Run with a coverage report
pytest --cov=llm_evaluation_framework --cov-report=html

# Enforce the 85% minimum coverage locally
pytest --cov=llm_evaluation_framework --cov-fail-under=85

# Run specific test categories
pytest tests/unit/           # Unit tests only
pytest tests/integration/    # Integration tests only
pytest -m slow               # Slow tests only
pytest -m "not slow"         # Exclude slow tests

# Run tests matching specific patterns
pytest -k "test_model_registry"
pytest tests/test_specific_file.py::test_specific_function

# Performance testing
pytest tests/benchmarks/ --benchmark-only

# Parallel testing (faster; requires pytest-xdist)
pytest -n auto
```
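The `-m slow` / `-m "not slow"` selections above rely on tests being tagged with markers. A minimal sketch of tagging a test (the test name and body here are hypothetical, not part of the framework's suite):

```python
import pytest


@pytest.mark.slow  # select with `pytest -m slow`, exclude with `pytest -m "not slow"`
def test_large_batch_evaluation():
    # Placeholder for an expensive end-to-end check
    assert sum(range(1000)) == 499500
```

Custom markers should be registered (e.g. via `markers = ["slow: marks tests as slow"]` under `[tool.pytest.ini_options]`) so that pytest's `--strict-markers` mode accepts them.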
### Writing Quality Tests

#### Test Structure Template
"""
Example test following best practices
"""
import pytest
from unittest.mock import Mock, patch
from llm_evaluation_framework import ModelRegistry
class TestModelRegistry:
"""Comprehensive tests for ModelRegistry"""
def setup_method(self):
"""Setup before each test"""
self.registry = ModelRegistry()
self.valid_config = {
"provider": "openai",
"api_cost_input": 0.001,
"api_cost_output": 0.002,
"capabilities": ["reasoning", "creativity"]
}
def test_register_model_success(self):
"""Test successful model registration with valid config"""
# Arrange
model_name = "test-gpt-3.5"
# Act
result = self.registry.register_model(model_name, self.valid_config)
# Assert
assert result is True
assert model_name in self.registry._models
assert self.registry.get_model(model_name) == self.valid_config
@pytest.mark.parametrize("invalid_config,expected_error", [
({"provider": "unknown"}, "Invalid provider"),
({"provider": "openai"}, "Missing required fields"),
({"provider": "openai", "api_cost_input": -1}, "Invalid cost"),
])
def test_register_model_validation_errors(self, invalid_config, expected_error):
"""Test model registration validation with various invalid configs"""
with pytest.raises(ValueError, match=expected_error):
self.registry.register_model("test-model", invalid_config)
@patch('llm_evaluation_framework.model_registry.validate_api_key')
def test_register_model_with_api_validation(self, mock_validate):
"""Test model registration with API key validation"""
# Arrange
mock_validate.return_value = True
config = self.valid_config.copy()
config["api_key"] = "test-key"
# Act
result = self.registry.register_model("test-model", config)
# Assert
assert result is True
mock_validate.assert_called_once_with("test-key", "openai")
# Integration test example
class TestRegistryEngineIntegration:
"""Integration tests between registry and engine"""
@pytest.fixture
def configured_system(self):
"""Fixture providing configured registry and engine"""
registry = ModelRegistry()
engine = ModelInferenceEngine(registry)
registry.register_model("test-model", {
"provider": "openai",
"capabilities": ["reasoning"]
})
return registry, engine
def test_end_to_end_evaluation(self, configured_system):
"""Test complete evaluation workflow"""
registry, engine = configured_system
test_cases = [
{"prompt": "2+2=?", "expected": "4"},
{"prompt": "Capital of France?", "expected": "Paris"}
]
# This would use mocked API calls in real tests
with patch('llm_evaluation_framework.engine.call_model_api') as mock_api:
mock_api.return_value = "4"
results = engine.evaluate_model("test-model", test_cases)
assert results is not None
assert "aggregate_metrics" in results
assert results["aggregate_metrics"]["test_count"] == 2
## Code Quality Standards

### Quality Requirements
#### Code Style

- **PEP 8 Compliance**: Python Enhancement Proposal 8
- **Black Formatting**: Line length 88 characters
- **Import Organization**: isort for consistent imports
- **Type Hints**: 100% type annotation coverage

#### Code Quality

- **Linting**: flake8 with no warnings
- **Type Checking**: mypy strict mode compliance
- **Documentation**: Google-style docstrings for all public APIs
- **Performance**: No significant performance regressions

#### Testing Requirements

- **Coverage**: Minimum 85% test coverage
- **Test Quality**: Comprehensive edge case testing
- **Integration**: Component interaction testing
- **Performance**: Benchmark critical paths
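To make the docstring and type-hint requirements concrete, here is a hypothetical helper (not part of the framework's API) written in the expected style, with full annotations and a Google-style docstring:

```python
from typing import Dict, List


def aggregate_accuracy(results: List[Dict[str, bool]]) -> float:
    """Compute the fraction of evaluation results marked as passed.

    Args:
        results: Evaluation results, each containing a boolean ``"passed"`` key.

    Returns:
        The pass rate in the range [0.0, 1.0]; 0.0 for an empty list.

    Raises:
        KeyError: If a result dict is missing the ``"passed"`` key.
    """
    if not results:
        return 0.0
    return sum(1 for r in results if r["passed"]) / len(results)
```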
### Quality Tools Configuration
#### pyproject.toml

```toml
[tool.black]
line-length = 88
target-version = ['py38']
include = '\.pyi?$'

[tool.isort]
profile = "black"
multi_line_output = 3
line_length = 88

[tool.mypy]
python_version = "3.8"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
disallow_incomplete_defs = true
check_untyped_defs = true

[tool.pytest.ini_options]
minversion = "6.0"
addopts = "-ra -q --cov=llm_evaluation_framework"
testpaths = ["tests"]
```

#### Pre-commit Hooks (`.pre-commit-config.yaml`)

Installed via `pre-commit install`, the hooks run the formatting and linting tools automatically before each commit.
## Pull Request Guidelines

### Before Submitting

#### Pre-submission Checklist

- [ ] **Tests Pass**: All tests pass locally
- [ ] **Coverage**: New code has appropriate test coverage (85%+)
- [ ] **Type Checking**: mypy passes without errors
- [ ] **Linting**: flake8 passes without warnings
- [ ] **Formatting**: Code formatted with Black
- [ ] **Documentation**: Public APIs documented with examples
- [ ] **Performance**: No significant performance regression
- [ ] **Conventional Commits**: Commit messages follow the standard

#### PR Requirements

- [ ] **Clear Title**: Descriptive, follows the conventional commit format
- [ ] **Issue Reference**: Links to related issue(s)
- [ ] **Description**: What problem does this solve?
- [ ] **Solution**: How does this solve the problem?
- [ ] **Testing**: How was this tested?
- [ ] **Breaking Changes**: Any breaking changes clearly noted
- [ ] **Documentation**: Documentation updates included
### PR Template
```markdown
## Description

Brief description of what this PR does and why.

Fixes #(issue_number)

## Type of Change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Documentation update
- [ ] Refactoring (no functional changes)
- [ ] Performance improvement
- [ ] Test improvements

## Testing

Describe the tests you ran to verify your changes:

- [ ] Unit tests
- [ ] Integration tests
- [ ] Manual testing
- [ ] Performance testing

## Checklist

- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes

## Screenshots (if applicable)

Add screenshots to help explain your changes.

## Related Issues

List any related issues or PRs.
```
## Recognition & Community

### Contributor Recognition

#### Contribution Levels

| Level | Contributions | Recognition |
|-------|---------------|-------------|
| **First-time** | First contribution | Welcome message, contributor badge |
| **Regular** | 5+ contributions | Listed in CONTRIBUTORS.md |
| **Active** | 15+ contributions | Social media shoutout |
| **Champion** | 50+ contributions | Special contributor badge |
| **Maintainer** | Core team member | Commit access, decision-making |

#### Ways We Recognize Contributors

- **CONTRIBUTORS.md**: All contributors listed
- **Release Notes**: Major contributors highlighted
- **Social Media**: Public appreciation posts
- **Discord**: Special contributor roles (coming soon)
- **Newsletter**: Contributor spotlights (coming soon)
### Becoming a Maintainer

#### Maintainer Responsibilities

- **Code Review**: Review and approve pull requests
- **Issue Triage**: Label and prioritize issues
- **Release Management**: Help with releases and versioning
- **Community Support**: Help answer questions and guide contributors
- **Technical Decisions**: Participate in architecture discussions

#### Path to Maintainership

1. **Consistent Contributions**: Regular, high-quality contributions
2. **Community Engagement**: Help other contributors and users
3. **Technical Excellence**: Demonstrate a deep understanding of the codebase
4. **Leadership**: Take initiative on important features or improvements
5. **Invitation**: Current maintainers invite qualified contributors
## Community Guidelines

### Code of Conduct

We are committed to providing a **welcoming, inclusive, and harassment-free** experience for everyone. Our community standards:

#### Expected Behavior

- **Be respectful** and inclusive in all interactions
- **Be collaborative** and help others learn and grow
- **Be constructive** in feedback and criticism
- **Be patient** with newcomers and different perspectives
- **Be professional** in all communications

#### Unacceptable Behavior

- Harassment, discrimination, or offensive language
- Personal attacks or trolling
- Spam or off-topic content
- Publishing private information without consent
- Disruptive or destructive behavior

#### Enforcement

Violations will be addressed promptly and may result in:

- A warning and education
- Temporary suspension
- A permanent ban from the community

Report issues to: [conduct@llmevalframework.org](mailto:conduct@llmevalframework.org)
### Getting Help

#### Support Channels

| Channel | Purpose | Response Time |
|---------|---------|---------------|
| **[Documentation](https://isathish.github.io/LLMEvaluationFramework/)** | Self-service help | Immediate |
| **[GitHub Issues](https://github.com/isathish/LLMEvaluationFramework/issues)** | Bug reports, feature requests | 24-48 hours |
| **[GitHub Discussions](https://github.com/isathish/LLMEvaluationFramework/discussions)** | Q&A, ideas, general discussion | Community-driven |
| **Direct Email** | Security issues, conduct violations | 1-3 business days |

#### Before Asking for Help

1. **Search existing issues** and discussions
2. **Check the documentation** for relevant guides
3. **Try the examples** to understand usage patterns
4. **Provide context** when asking questions (code, error messages, environment)
## Project Roadmap & Planning

### Current Priorities

#### Q4 2025 Goals

- **Performance**: 50% faster evaluation processing
- **Extensions**: Plugin system for custom components
- **Analytics**: Advanced reporting and visualization
- **Integration**: Support for more LLM providers

#### How to Get Involved

- **Review roadmap issues** labeled with `roadmap`
- **Join planning discussions** in GitHub Discussions
- **Propose new features** that align with project goals
- **Collaborate** with other contributors on major features

#### Feature Request Process

1. **Search existing requests** to avoid duplicates
2. **Use the feature request template** for new proposals
3. **Provide detailed use cases** and requirements
4. **Participate in discussion** and refinement
5. **Plan the implementation** with maintainers
## Ready to Make Your First Contribution?

**Every contribution, no matter how small, makes a meaningful impact!**

---

### Thank You to Our Amazing Contributors!

**Your contribution could be featured here next! Join our community of builders making LLM evaluation better for everyone.**

*Made with ❤️ by the LLM Evaluation Framework community*