Monitoring
Real-time metrics collection, structured event logging, and garbage collection management.
Thread-safe monitoring with bounded collections, __slots__, and structured logging throughout.
v2.0 Improvements
MonitoringSystem now uses deque(maxlen) for bounded metric storage, threading.Lock for thread safety, __slots__ for memory efficiency, and built-in GC statistics / forced collection helpers.
Overview
The MonitoringSystem class provides a centralised monitoring hub for your agents, processes, and workflows:
| Feature | Description |
|---|---|
| Metrics | Record and retrieve named numeric metrics with bounded history |
| Events | Log structured events with type classification |
| Messages | Append free-form log messages |
| GC Management | Inspect garbage collector stats and force collection |
| Thread Safety | All public methods are protected by threading.Lock |
Quick Start
Metrics
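Metrics are stored per name in bounded deque collections, so the memory used by each metric stays bounded regardless of how long the system runs. Record values with record_metric(name, value) and read them back with get_metric(name) or get_metrics().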
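The bounded-history behaviour can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the module-level functions only mirror the documented method names, and `MAX_POINTS` is a hypothetical cap kept tiny so that eviction is visible.

```python
from collections import defaultdict, deque

MAX_POINTS = 3  # hypothetical cap, kept tiny so the eviction is visible

# One bounded deque per metric name; the oldest value is dropped when full.
metrics = defaultdict(lambda: deque(maxlen=MAX_POINTS))


def record_metric(name, value):
    """Append a value to the named metric's bounded history."""
    metrics[name].append(value)


def get_metric(name):
    """Return a list copy of the retained values (empty if the name is unknown)."""
    return list(metrics[name]) if name in metrics else []


for latency_ms in (120, 95, 110, 130, 101):
    record_metric("latency_ms", latency_ms)

print(get_metric("latency_ms"))  # [110, 130, 101]
```

Only the three most recent values survive, which is what keeps each metric's footprint fixed once its deque is full.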
Event Logging
Events capture structured information with a type label and arbitrary details. Log them with log_event(event_type, details) and read them back with get_events().
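As a rough illustration of what a structured event might look like, the sketch below stores each event as a plain dict. The field names (`type`, `details`, `timestamp`) and the example event types are assumptions for illustration; the actual record layout returned by get_events() may differ.

```python
import time

events = []  # unbounded here for brevity; a real store would likely cap this


def log_event(event_type, details):
    """Store one event as a plain dict (field names are assumptions)."""
    events.append({
        "type": event_type,
        "details": dict(details),  # copy so later caller mutations don't leak in
        "timestamp": time.time(),
    })


def get_events():
    """Return a list copy of every logged event, oldest first."""
    return list(events)


log_event("agent_started", {"agent_id": "a-1"})
log_event("task_failed", {"agent_id": "a-1", "error": "timeout"})

# The type label makes it cheap to filter for one class of occurrence.
failures = [e for e in get_events() if e["type"] == "task_failed"]
print(failures[0]["details"]["error"])  # timeout
```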
Garbage Collection Management
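The monitoring system exposes two helpers for Python's garbage collector: get_gc_stats() returns collector statistics, and force_gc() forces a collection and returns the number of objects collected.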
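Helpers like these are presumably thin wrappers over the standard `gc` module. The sketch below shows one plausible shape; the keys of the returned dict are assumptions, and only the `gc` calls themselves are standard-library API.

```python
import gc


def get_gc_stats():
    """Collect a snapshot of collector state (dict keys are assumptions)."""
    return {
        "enabled": gc.isenabled(),
        "counts": gc.get_count(),        # pending allocations per generation
        "thresholds": gc.get_threshold(),
        "generations": gc.get_stats(),   # per-generation collection counters
    }


def force_gc():
    """Run a full collection and return the number of unreachable objects found."""
    return gc.collect()


class Node:
    """Object that references itself, so reference counting alone cannot free it."""

    def __init__(self):
        self.ref = self


gc.collect()   # start from a clean slate
gc.disable()   # only so the demo is deterministic
try:
    for _ in range(100):
        Node()             # each instance becomes unreachable cyclic garbage
    found = force_gc()     # gc.collect() still works while automatic GC is off
finally:
    gc.enable()

print(found)
print(get_gc_stats()["counts"])
```

The self-referencing `Node` objects are the kind of garbage that only the cyclic collector can reclaim, which is why `found` is at least 100 here.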
When to use force_gc()
Use sparingly: only when you have evidence of memory pressure (e.g. after disposing of a large batch of agents). Python's automatic GC handles most cases.
Clearing Data
Call clear() to reset all collected metrics, events, and logs.
API Reference
MonitoringSystem
Uses __slots__ for memory efficiency. All methods are thread-safe.
Methods
| Method | Returns | Description |
|---|---|---|
| record_metric(name, value) | None | Record a numeric metric value |
| get_metric(name) | list | Retrieve all recorded values for a metric |
| get_metrics() | dict | Retrieve all metrics as {name: [values]} |
| log_event(event_type, details) | None | Log a structured event |
| get_events() | list[dict] | Retrieve all logged events |
| log_message(message) | None | Append a free-form log message |
| get_logs() | list[str] | Retrieve all log messages |
| get_gc_stats() | dict | Get garbage collector statistics |
| force_gc() | int | Force GC and return number of collected objects |
| clear() | None | Reset all metrics, events, and logs |
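To make the method table concrete, here is a minimal stand-in class that follows the documented method names and combines `__slots__`, `threading.Lock`, and `deque(maxlen=...)`. It is a sketch under stated assumptions, not the library's source: the `max_points` constructor parameter, the private attribute names, and the event dict layout are all hypothetical.

```python
import gc
import threading
from collections import deque


class MonitoringSystem:
    """Illustrative stand-in that mirrors the documented method names."""

    __slots__ = ("_lock", "_max_points", "_metrics", "_events", "_logs")

    def __init__(self, max_points=1000):  # max_points is an assumed parameter
        self._lock = threading.Lock()
        self._max_points = max_points
        self._metrics = {}                       # name -> bounded deque
        self._events = deque(maxlen=max_points)
        self._logs = deque(maxlen=max_points)

    def record_metric(self, name, value):
        with self._lock:
            history = self._metrics.get(name)
            if history is None:
                history = self._metrics[name] = deque(maxlen=self._max_points)
            history.append(value)

    def get_metric(self, name):
        with self._lock:
            return list(self._metrics.get(name, ()))

    def get_metrics(self):
        with self._lock:
            return {name: list(vals) for name, vals in self._metrics.items()}

    def log_event(self, event_type, details):
        with self._lock:
            self._events.append({"type": event_type, "details": details})

    def get_events(self):
        with self._lock:
            return list(self._events)

    def log_message(self, message):
        with self._lock:
            self._logs.append(message)

    def get_logs(self):
        with self._lock:
            return list(self._logs)

    def get_gc_stats(self):
        return {"counts": gc.get_count(), "generations": gc.get_stats()}

    def force_gc(self):
        return gc.collect()

    def clear(self):
        with self._lock:
            self._metrics.clear()
            self._events.clear()
            self._logs.clear()


monitor = MonitoringSystem(max_points=500)


def worker():
    for i in range(250):
        monitor.record_metric("tokens", i)


# Four threads record 1000 values in total; only the newest 500 are retained.
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(monitor.get_metric("tokens")))  # 500
monitor.clear()
print(monitor.get_metrics())  # {}
```

Every getter returns a copy made while the lock is held, so callers can iterate over the result without racing against concurrent writers.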
Integration with Tracing
MonitoringSystem works alongside the Tracing module. Use monitoring for aggregate metrics and tracing for per-request spans.
Best Practices
Do
- Use record_metric() for numeric KPIs (latency, token count, error rate).
- Use log_event() for discrete occurrences with structured context.
- Review get_gc_stats() periodically in long-running services.
- Call clear() between test runs to avoid data bleed.
Don't
- Store unbounded data in metric names: each metric's history is bounded, but creating millions of unique metric keys will still consume memory.
- Call force_gc() in hot paths: a forced collection pauses the interpreter while it runs.
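The metric-name warning can be illustrated with plain deques. In this sketch the `latency_ms` names and the cap of 100 are hypothetical, and the two dicts stand in for the monitoring system's internal store.

```python
from collections import defaultdict, deque

# Anti-pattern: embedding a request ID in the name creates one deque per request.
per_request = defaultdict(lambda: deque(maxlen=100))
for request_id in range(10_000):
    per_request[f"latency_ms.{request_id}"].append(12.5)

# Preferred: one stable name, so the bounded history can actually evict.
shared = defaultdict(lambda: deque(maxlen=100))
for request_id in range(10_000):
    shared["latency_ms"].append(12.5)

print(len(per_request))                        # 10000
print(len(shared), len(shared["latency_ms"]))  # 1 100
```

The per-value bound never triggers in the first case because every key holds a single value, so memory grows with the number of distinct names.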
Related Documentation
- Tracing: distributed tracing and spans
- Performance: tuning and benchmarking
- Infrastructure: deployment and scaling