
๐Ÿค Contributing to LLM Evaluation Framework

![Contributing](https://img.shields.io/badge/Contributing-Welcome%20%26%20Appreciated-22c55e?style=for-the-badge&logo=heart&logoColor=white)

**Join our mission to build the most reliable LLM evaluation framework**

*Your contributions help thousands of developers build better AI systems*

[![Code of Conduct](https://img.shields.io/badge/Code%20of%20Conduct-Contributor%20Covenant-ef4444?style=for-the-badge)](CODE_OF_CONDUCT.md) [![License](https://img.shields.io/badge/License-MIT-6366f1?style=for-the-badge)](LICENSE) [![Contributors](https://img.shields.io/badge/Contributors-View%20All-f59e0b?style=for-the-badge)](https://github.com/isathish/LLMEvaluationFramework/graphs/contributors)

## 🌟 Ways to Contribute

๐Ÿ› Bug Reports

5-15 minutes

Help us identify and fix issues

Report Bug

๐Ÿ“š Documentation

15-30 minutes

Improve guides, fix typos, add examples

Improve Docs

โœจ Features

1-8 hours

Build new capabilities and improvements

Request Feature

๐Ÿ”ง Code

30+ minutes

Fix bugs, implement features, optimize

Start Coding

## 🚀 Quick Start for Contributors

### ⚡ Lightning Setup (5 minutes)

# 1๏ธโƒฃ Fork & Clone
git clone https://github.com/YOUR_USERNAME/LLMEvaluationFramework.git
cd LLMEvaluationFramework

# 2๏ธโƒฃ Setup Environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3๏ธโƒฃ Install Dependencies
pip install -e ".[dev,test,docs]"

# 4๏ธโƒฃ Setup Pre-commit Hooks
pre-commit install

# 5๏ธโƒฃ Verify Setup
pytest tests/test_quick_setup.py -v
python -c "from llm_evaluation_framework import ModelRegistry; print('โœ… Ready to contribute!')"
**๐ŸŽ‰ You're ready to make your first contribution!**

## 🔧 Development Environment Details

#### **Required Tools**

- **Python 3.8+** (recommended: 3.11)
- **Git** for version control
- **IDE**: VS Code (recommended) or PyCharm
- **Docker** (optional, for testing environments)

#### **Project Dependencies**

```text
# Core development tools
pytest          # Testing framework
pytest-cov      # Coverage reporting
black           # Code formatting
flake8          # Code linting
mypy            # Type checking
pre-commit      # Git hooks
isort           # Import sorting

# Documentation tools
mkdocs          # Documentation generator
mkdocs-material # Documentation theme

# Optional tools
docker          # Containerization
tox             # Testing across Python versions
```

#### **IDE Configuration**

**VS Code Settings** (`.vscode/settings.json`):

```json
{
    "python.defaultInterpreterPath": "./venv/bin/python",
    "python.formatting.provider": "black",
    "python.linting.enabled": true,
    "python.linting.flake8Enabled": true,
    "python.linting.mypyEnabled": true,
    "python.testing.pytestEnabled": true,
    "python.testing.pytestArgs": [
        "tests",
        "--cov=llm_evaluation_framework"
    ],
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
        "source.organizeImports": true
    }
}
```

## 🌿 Development Workflow

### 📋 Step-by-Step Contribution Process

#### **1. Planning & Discovery**

Check existing issues and discussions first:

- https://github.com/isathish/LLMEvaluationFramework/issues
- https://github.com/isathish/LLMEvaluationFramework/discussions

If needed, create an issue using one of the templates: Bug Report, Feature Request, or Documentation Improvement.
#### **2. Repository Setup**

```bash
# Fork the repository on GitHub, then add the upstream remote
git remote add upstream https://github.com/isathish/LLMEvaluationFramework.git

# Keep your main branch updated
git fetch upstream
git checkout main
git merge upstream/main
```
#### **3. Branch Creation**

```bash
# Create a feature branch with a descriptive name, e.g.:
git checkout -b feature/add-semantic-scoring
git checkout -b fix/async-timeout-issue
git checkout -b docs/improve-api-examples
```
#### **4. Development Cycle**

Make your changes, write comprehensive tests, and update the documentation. Then:

```bash
# Check code quality
black .
isort .
flake8
mypy .

# Run tests
pytest --cov=llm_evaluation_framework

# Commit with conventional format
git add .
git commit -m "feat(scoring): add semantic similarity scoring strategy"
```
#### **5. Submission**

```bash
# Push to your fork
git push origin feature/add-semantic-scoring
```

Then create a Pull Request on GitHub, using the PR template and filling it out completely.

๐Ÿท๏ธ Commit Message Standards

We follow **Conventional Commits** for clear, semantic commit messages.

#### **Format**

```text
<type>[optional scope]: <description>

[optional body]

[optional footer(s)]
```
#### **Types**

| Type | Description | Example |
|------|-------------|---------|
| **feat** | New feature | `feat(scoring): add F1 score strategy` |
| **fix** | Bug fix | `fix(engine): resolve timeout in async calls` |
| **docs** | Documentation | `docs(api): update ModelRegistry examples` |
| **style** | Code style changes | `style: format code with black` |
| **refactor** | Code refactoring | `refactor(persistence): simplify storage interface` |
| **test** | Test additions/fixes | `test(scoring): add edge case tests for accuracy` |
| **perf** | Performance improvements | `perf(engine): optimize batch processing` |
| **chore** | Maintenance tasks | `chore: update dependencies` |

#### **Examples**
```text
feat(registry): add model validation with capability checking
fix(async): resolve race condition in concurrent evaluations
docs(examples): add comprehensive async usage patterns
test(integration): add end-to-end workflow testing
refactor(scoring): extract common scoring utilities
perf(engine): reduce memory usage in large batch processing
```
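If you want to sanity-check a commit subject before pushing, the format above can be validated with a few lines of Python. This is a minimal illustrative sketch, not a tool shipped with the framework; the `is_conventional` helper and its regex are hypothetical:

```python
import re

# Hypothetical helper (not part of the framework): checks that a commit
# subject line follows the Conventional Commits format described above.
COMMIT_RE = re.compile(
    r"^(feat|fix|docs|style|refactor|test|perf|chore)"  # type
    r"(\([a-z0-9-]+\))?"                                # optional scope
    r": .+"                                             # description
)

def is_conventional(subject: str) -> bool:
    """Return True if the subject line matches the expected format."""
    return bool(COMMIT_RE.match(subject))

print(is_conventional("feat(scoring): add F1 score strategy"))  # True
print(is_conventional("added some stuff"))                      # False
```

A check like this could also be wired into a local `commit-msg` hook if you find yourself forgetting the format.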

## 🧪 Testing Guidelines

### 🎯 Testing Philosophy

**Quality is non-negotiable.** Every line of code must be tested for reliability:

- **Minimum Coverage**: 85% (target: 90%+)
- **Test Types**: Unit, Integration, End-to-End, Performance
- **Test First**: Write tests before or alongside implementation
- **Documentation**: Tests serve as living documentation

### 🔬 Testing Commands

```bash
# Run all tests
pytest

# Run with coverage report
pytest --cov=llm_evaluation_framework --cov-report=html

# Run specific test categories
pytest tests/unit/           # Unit tests only
pytest tests/integration/    # Integration tests only
pytest -m slow               # Slow tests only
pytest -m "not slow"         # Exclude slow tests

# Run tests matching specific patterns
pytest -k "test_model_registry"
pytest tests/test_specific_file.py::test_specific_function

# Performance testing
pytest tests/benchmarks/ --benchmark-only

# Parallel testing (faster; requires pytest-xdist)
pytest -n auto
```

### ✅ Writing Quality Tests

#### **Test Structure Template**

```python
"""
Example tests following best practices
"""
import pytest
from unittest.mock import patch

from llm_evaluation_framework import ModelInferenceEngine, ModelRegistry


class TestModelRegistry:
    """Comprehensive tests for ModelRegistry"""

    def setup_method(self):
        """Set up a fresh registry before each test"""
        self.registry = ModelRegistry()
        self.valid_config = {
            "provider": "openai",
            "api_cost_input": 0.001,
            "api_cost_output": 0.002,
            "capabilities": ["reasoning", "creativity"],
        }

    def test_register_model_success(self):
        """Test successful model registration with valid config"""
        # Arrange
        model_name = "test-gpt-3.5"

        # Act
        result = self.registry.register_model(model_name, self.valid_config)

        # Assert
        assert result is True
        assert model_name in self.registry._models
        assert self.registry.get_model(model_name) == self.valid_config

    @pytest.mark.parametrize("invalid_config,expected_error", [
        ({"provider": "unknown"}, "Invalid provider"),
        ({"provider": "openai"}, "Missing required fields"),
        ({"provider": "openai", "api_cost_input": -1}, "Invalid cost"),
    ])
    def test_register_model_validation_errors(self, invalid_config, expected_error):
        """Test model registration validation with various invalid configs"""
        with pytest.raises(ValueError, match=expected_error):
            self.registry.register_model("test-model", invalid_config)

    @patch('llm_evaluation_framework.model_registry.validate_api_key')
    def test_register_model_with_api_validation(self, mock_validate):
        """Test model registration with API key validation"""
        # Arrange
        mock_validate.return_value = True
        config = self.valid_config.copy()
        config["api_key"] = "test-key"

        # Act
        result = self.registry.register_model("test-model", config)

        # Assert
        assert result is True
        mock_validate.assert_called_once_with("test-key", "openai")


# Integration test example
class TestRegistryEngineIntegration:
    """Integration tests between registry and engine"""

    @pytest.fixture
    def configured_system(self):
        """Fixture providing a configured registry and engine"""
        registry = ModelRegistry()
        engine = ModelInferenceEngine(registry)

        registry.register_model("test-model", {
            "provider": "openai",
            "capabilities": ["reasoning"],
        })

        return registry, engine

    def test_end_to_end_evaluation(self, configured_system):
        """Test the complete evaluation workflow"""
        registry, engine = configured_system

        test_cases = [
            {"prompt": "2+2=?", "expected": "4"},
            {"prompt": "Capital of France?", "expected": "Paris"},
        ]

        # Real tests mock the API calls rather than hitting live providers
        with patch('llm_evaluation_framework.engine.call_model_api') as mock_api:
            mock_api.return_value = "4"

            results = engine.evaluate_model("test-model", test_cases)

            assert results is not None
            assert "aggregate_metrics" in results
            assert results["aggregate_metrics"]["test_count"] == 2
```

๐Ÿ“ Code Quality Standards

### 🎯 Quality Requirements

#### **Code Style**

- **PEP 8 Compliance**: Python Enhancement Proposal 8
- **Black Formatting**: Line length 88 characters
- **Import Organization**: isort for consistent imports
- **Type Hints**: 100% type annotation coverage

#### **Code Quality**

- **Linting**: flake8 with no warnings
- **Type Checking**: mypy strict mode compliance
- **Documentation**: Google-style docstrings for all public APIs
- **Performance**: No significant performance regressions

#### **Testing Requirements**

- **Coverage**: Minimum 85% test coverage
- **Test Quality**: Comprehensive edge case testing
- **Integration**: Component interaction testing
- **Performance**: Benchmark critical paths
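To make the docstring and type-hint requirements concrete, here is a sketch of what a fully annotated public function with a Google-style docstring looks like. The function itself (`aggregate_accuracy`) is hypothetical and not a real framework API; only the formatting conventions are the point:

```python
from typing import Dict, List


def aggregate_accuracy(results: List[Dict[str, bool]]) -> float:
    """Compute the fraction of passing test cases.

    Illustrative example only: demonstrates the fully type-annotated,
    Google-style docstring format expected for all public APIs.

    Args:
        results: One dict per test case, each with a boolean ``"passed"`` key.

    Returns:
        The pass rate as a float between 0.0 and 1.0.

    Raises:
        ValueError: If ``results`` is empty.
    """
    if not results:
        raise ValueError("results must not be empty")
    # True counts as 1 when summed, so this is passes / total
    return sum(r["passed"] for r in results) / len(results)
```

A function written this way passes `mypy --strict` and gives flake8 and documentation tooling everything they need.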

### 🔧 Quality Tools Configuration

#### **pyproject.toml**

```toml
[tool.black]
line-length = 88
target-version = ['py38']
include = '\.pyi?$'

[tool.isort]
profile = "black"
multi_line_output = 3
line_length = 88

[tool.mypy]
python_version = "3.8"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
disallow_incomplete_defs = true
check_untyped_defs = true

[tool.pytest.ini_options]
minversion = "6.0"
addopts = "-ra -q --cov=llm_evaluation_framework"
testpaths = ["tests"]
```
#### **Pre-commit Hooks** (`.pre-commit-config.yaml`)

```yaml
repos:
  - repo: https://github.com/psf/black
    rev: 23.1.0
    hooks:
      - id: black

  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort

  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.0.1
    hooks:
      - id: mypy
```

## 📋 Pull Request Guidelines

### 🎯 Before Submitting

#### **Pre-submission Checklist**

- [ ] **Tests Pass**: All tests pass locally
- [ ] **Coverage**: New code has appropriate test coverage (85%+)
- [ ] **Type Checking**: mypy passes without errors
- [ ] **Linting**: flake8 passes without warnings
- [ ] **Formatting**: Code formatted with Black
- [ ] **Documentation**: Public APIs documented with examples
- [ ] **Performance**: No significant performance regression
- [ ] **Conventional Commits**: Commit messages follow standards

#### **PR Requirements**

- [ ] **Clear Title**: Descriptive, follows conventional commit format
- [ ] **Issue Reference**: Links to related issue(s)
- [ ] **Description**: What problem does this solve?
- [ ] **Solution**: How does this solve the problem?
- [ ] **Testing**: How was this tested?
- [ ] **Breaking Changes**: Any breaking changes clearly noted
- [ ] **Documentation**: Documentation updates included
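The tool-driven items in the checklist above can be run in one pass. This is a hypothetical convenience script, not something the repository ships; it simply mirrors the commands already documented in the development cycle and assumes the `[dev]` extra is installed:

```python
"""Run the local quality checks from the pre-submission checklist."""
import subprocess
import sys

# Commands mirror the checklist above; all come from the "[dev]" extra.
CHECKS = [
    ["black", "--check", "."],
    ["isort", "--check-only", "."],
    ["flake8"],
    ["mypy", "."],
    ["pytest", "--cov=llm_evaluation_framework"],
]


def run_checks() -> int:
    """Run each check in order; return the number of failures."""
    failures = 0
    for cmd in CHECKS:
        print(f"$ {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # Exit non-zero if any check failed, so CI or git hooks can gate on it
    sys.exit(run_checks())
```

Note that `pre-commit run --all-files` covers much of the same ground; a script like this is only useful if you want the coverage run included as well.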

๐Ÿ“ PR Template

## ๐Ÿ“ Description

Brief description of what this PR does and why.

Fixes #(issue_number)

## ๐Ÿ”„ Type of Change

- [ ] ๐Ÿ› Bug fix (non-breaking change which fixes an issue)
- [ ] โœจ New feature (non-breaking change which adds functionality)
- [ ] ๐Ÿ’ฅ Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] ๐Ÿ“š Documentation update
- [ ] ๐Ÿ”ง Refactoring (no functional changes)
- [ ] โšก Performance improvement
- [ ] ๐Ÿงช Test improvements

## ๐Ÿงช Testing

Describe the tests you ran to verify your changes:

- [ ] Unit tests
- [ ] Integration tests
- [ ] Manual testing
- [ ] Performance testing

## ๐Ÿ“‹ Checklist

- [ ] My code follows the style guidelines of this project
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes

## ๐Ÿ“ธ Screenshots (if applicable)

Add screenshots to help explain your changes.

## ๐Ÿ”— Related Issues

List any related issues or PRs.

๐Ÿ† Recognition & Community

### 🌟 Contributor Recognition

#### **Contribution Levels**

| Level | Contributions | Recognition |
|-------|---------------|-------------|
| **🌱 First-time** | First contribution | Welcome message, contributor badge |
| **🤝 Regular** | 5+ contributions | Listed in CONTRIBUTORS.md |
| **⭐ Active** | 15+ contributions | Social media shoutout |
| **🎖️ Champion** | 50+ contributions | Special contributor badge |
| **👑 Maintainer** | Core team member | Commit access, decision-making |

#### **Ways We Recognize Contributors**

- **📝 CONTRIBUTORS.md**: All contributors listed
- **🎉 Release Notes**: Major contributors highlighted
- **🐦 Social Media**: Public appreciation posts
- **💬 Discord**: Special contributor roles (coming soon)
- **📧 Newsletter**: Contributor spotlights (coming soon)

### 🎯 Becoming a Maintainer

#### **Maintainer Responsibilities**

- **Code Review**: Review and approve pull requests
- **Issue Triage**: Label and prioritize issues
- **Release Management**: Help with releases and versioning
- **Community Support**: Help answer questions and guide contributors
- **Technical Decisions**: Participate in architecture discussions

#### **Path to Maintainership**

1. **Consistent Contributions**: Regular, high-quality contributions
2. **Community Engagement**: Help other contributors and users
3. **Technical Excellence**: Demonstrate deep understanding of the codebase
4. **Leadership**: Take initiative on important features or improvements
5. **Invitation**: Current maintainers invite qualified contributors

## 💬 Community Guidelines

๐Ÿค Code of Conduct

We are committed to providing a **welcoming, inclusive, and harassment-free** experience for everyone. Our community standards:

#### **Expected Behavior**

- ✅ **Be respectful** and inclusive in all interactions
- ✅ **Be collaborative** and help others learn and grow
- ✅ **Be constructive** in feedback and criticism
- ✅ **Be patient** with newcomers and different perspectives
- ✅ **Be professional** in all communications

#### **Unacceptable Behavior**

- ❌ Harassment, discrimination, or offensive language
- ❌ Personal attacks or trolling
- ❌ Spam or off-topic content
- ❌ Publishing private information without consent
- ❌ Disruptive or destructive behavior

#### **Enforcement**

Violations will be addressed promptly and may result in:

- Warning and education
- Temporary suspension
- Permanent ban from the community

Report issues to: [conduct@llmevalframework.org](mailto:conduct@llmevalframework.org)

### 📞 Getting Help

#### **Support Channels**

| Channel | Purpose | Response Time |
|---------|---------|---------------|
| **📖 [Documentation](https://isathish.github.io/LLMEvaluationFramework/)** | Self-service help | Immediate |
| **🐛 [GitHub Issues](https://github.com/isathish/LLMEvaluationFramework/issues)** | Bug reports, feature requests | 24-48 hours |
| **💬 [GitHub Discussions](https://github.com/isathish/LLMEvaluationFramework/discussions)** | Q&A, ideas, general discussion | Community-driven |
| **📧 Direct Email** | Security issues, conduct violations | 1-3 business days |

#### **Before Asking for Help**

1. **Search existing issues** and discussions
2. **Check the documentation** for relevant guides
3. **Try the examples** to understand usage patterns
4. **Provide context** when asking questions (code, error messages, environment)

## 📈 Project Roadmap & Planning

### 🎯 Current Priorities

#### **Q4 2025 Goals**

- **🚀 Performance**: 50% faster evaluation processing
- **🔧 Extensions**: Plugin system for custom components
- **📊 Analytics**: Advanced reporting and visualization
- **🌐 Integration**: Support for more LLM providers

#### **How to Get Involved**

- **🔍 Review roadmap issues** labeled with `roadmap`
- **💬 Join planning discussions** in GitHub Discussions
- **🎯 Propose new features** that align with project goals
- **🤝 Collaborate** with other contributors on major features

#### **Feature Request Process**

1. **Search existing requests** to avoid duplicates
2. **Use the feature request template** for new proposals
3. **Provide detailed use cases** and requirements
4. **Participate in discussion** and refinement
5. **Plan implementation** with maintainers

## 🎉 Ready to Make Your First Contribution?

**Every contribution, no matter how small, makes a meaningful impact!**
[![๐Ÿ› Report Your First Bug](https://img.shields.io/badge/๐Ÿ›_Report_Your_First_Bug-Get%20Started-ef4444?style=for-the-badge)](https://github.com/isathish/LLMEvaluationFramework/issues/new?template=bug_report.md) [![๐Ÿ“š Improve Documentation](https://img.shields.io/badge/๐Ÿ“š_Improve_Documentation-Easy%20Start-22c55e?style=for-the-badge)](https://github.com/isathish/LLMEvaluationFramework/issues?q=is%3Aissue+is%3Aopen+label%3Adocumentation) [![โœจ Request a Feature](https://img.shields.io/badge/โœจ_Request_a_Feature-Share%20Ideas-6366f1?style=for-the-badge)](https://github.com/isathish/LLMEvaluationFramework/issues/new?template=feature_request.md) [![๐Ÿค Start Coding](https://img.shields.io/badge/๐Ÿค_Start_Coding-Fork%20%26%20Contribute-f59e0b?style=for-the-badge)](https://github.com/isathish/LLMEvaluationFramework/fork)
---

### 💝 **Thank You to Our Amazing Contributors!**

---

**🌟 Your contribution could be featured here next! Join our community of builders making LLM evaluation better for everyone.**

*Made with ❤️ by the LLM Evaluation Framework community*