diff --git a/README.md b/README.md
index 8736644..7039656 100644
--- a/README.md
+++ b/README.md
@@ -155,6 +155,7 @@ Cooldown behavior:
 - [`docs/reviewer-brief.md`](docs/reviewer-brief.md) gives the short problem, value, evidence, and boundary summary
 - [`docs/reviewer-path.md`](docs/reviewer-path.md) maps common review questions to the right demo and artifacts
 - [`docs/architecture.md`](docs/architecture.md) diagrams the local file-based detection workflow
+- [`docs/event-time-model.md`](docs/event-time-model.md) defines event, observed, window, and artifact time semantics
 - [`docs/sample-output.md`](docs/sample-output.md) summarizes the committed sample artifacts
 - [`docs/roadmap.md`](docs/roadmap.md) sketches the v0.7 / v1.0 consolidation direction
 - [`data/processed/summary.json`](data/processed/summary.json) captures the default run in machine-readable form
diff --git a/docs/README.md b/docs/README.md
index 8b5c969..4ea4175 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -13,6 +13,7 @@ This directory separates the current reviewer route from supporting design notes
 ## Supporting docs
 
 - [`sample-output.md`](sample-output.md): committed output counts and sample artifacts
+- [`event-time-model.md`](event-time-model.md): event, observed, window, and artifact time semantics
 - [`design-notes.md`](design-notes.md): original telemetry-window design boundaries
 - [`ai-assisted-detection-design.md`](ai-assisted-detection-design.md): bounded AI-assisted detection design
 - [`ai-assisted-detection-examples.md`](ai-assisted-detection-examples.md): example AI-assisted detection outputs and guardrail behavior
diff --git a/docs/event-time-model.md b/docs/event-time-model.md
new file mode 100644
index 0000000..60b8797
--- /dev/null
+++ b/docs/event-time-model.md
@@ -0,0 +1,38 @@
+# Event Time Model
+
+`telemetry-lab` keeps event, observation, window, and artifact times separate so local detection demos remain reproducible and auditable.
+
+This model is informed by the OpenTelemetry Logs Data Model distinction between `Timestamp` and `ObservedTimestamp`: `Timestamp` represents when the event occurred at the source, while `ObservedTimestamp` represents when a collection system observed it. OpenTelemetry recommends using `Timestamp` first when exporting to a format with only one timestamp, falling back to `ObservedTimestamp` only when `Timestamp` is missing.
+
+`telemetry-lab` is not an OpenTelemetry implementation. The terms below define how this repository names and interprets time in sample inputs, generated features, alerts, reports, and future demo artifacts.
+
+## Fields
+
+| Field | Meaning | Used for detection ordering? | Current repository mapping |
+| --- | --- | --- | --- |
+| `event_time` | Time the source event happened. | Yes | The default input column is named `timestamp`; configs may use `time.timestamp_col` to point at a source column such as `event_time`. |
+| `observed_time` | Time a collector, loader, or intermediary observed the event. | No, unless a demo explicitly documents fallback behavior. | Optional future input or artifact field. Current core demos do not require it. |
+| `window_start` / `window_end` | Deterministic analysis interval derived from `event_time`. | Yes | Feature rows, alert rows, and dedup artifacts use these boundaries. Windows are treated as `[window_start, window_end)`. |
+| `artifact_generated_at` | Time an output artifact was rendered or written. | No | Optional provenance metadata for reports, summaries, or reviewer packs. It must not be used as event evidence. |
+
+## Rules
+
+- Prefer `event_time` for ordering, windowing, cooldown reasoning, and evidence correlation.
+- Treat the current `timestamp` input column as `event_time` unless a config names another column through `time.timestamp_col`.
+- Preserve `observed_time` when a source provides it, but keep it separate from event ordering.
+- If a downstream export format only supports one timestamp, use `event_time` when present; fall back to `observed_time` only when the source event time is unavailable.
+- Derive `window_start` and `window_end` from event time, not from artifact generation time.
+- Use `artifact_generated_at` only for provenance and reproducibility checks.
+
+## Why It Matters
+
+Detection workflows become hard to review when ingestion time, source event time, analysis-window time, and report-generation time collapse into one ambiguous `timestamp`. This repository keeps those concepts explicit:
+
+- raw event records carry source event time
+- feature rows and alerts carry deterministic window boundaries
+- reports may carry artifact generation metadata
+- reviewer evidence can be checked without guessing which clock controlled the decision
+
+## References
+
+- [OpenTelemetry Logs Data Model](https://opentelemetry.io/docs/specs/otel/logs/data-model/)
diff --git a/tests/test_event_time_model_docs.py b/tests/test_event_time_model_docs.py
new file mode 100644
index 0000000..ce83d8a
--- /dev/null
+++ b/tests/test_event_time_model_docs.py
@@ -0,0 +1,33 @@
+from __future__ import annotations
+
+from pathlib import Path
+
+
+REPO_ROOT = Path(__file__).resolve().parents[1]
+
+
+def _read_repo_file(relative_path: str) -> str:
+    return (REPO_ROOT / relative_path).read_text(encoding="utf-8")
+
+
+def test_event_time_model_documents_time_field_boundaries() -> None:
+    doc = _read_repo_file("docs/event-time-model.md")
+    readme = _read_repo_file("README.md")
+    docs_index = _read_repo_file("docs/README.md")
+
+    for term in [
+        "event_time",
+        "observed_time",
+        "window_start",
+        "window_end",
+        "artifact_generated_at",
+    ]:
+        assert term in doc
+
+    assert "OpenTelemetry Logs Data Model" in doc
+    assert "Timestamp" in doc
+    assert "ObservedTimestamp" in doc
+    assert "[window_start, window_end)" in doc
+    assert "must not be used as event evidence" in doc
+    assert "(docs/event-time-model.md)" in readme
+    assert "(event-time-model.md)" in docs_index
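
Review note: the rules documented in `docs/event-time-model.md` could be exercised with a small helper like the one below. This is only a sketch under stated assumptions, not code from this patch or from `telemetry-lab`: the function names `export_timestamp` and `window_bounds`, the epoch-aligned windowing, and the 60-second width in the usage lines are all hypothetical, and timestamps are assumed to be timezone-aware UTC values.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: windows are aligned to the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def export_timestamp(
    event_time: Optional[datetime],
    observed_time: Optional[datetime],
) -> Optional[datetime]:
    """Pick the one timestamp for an export format that supports only one.

    Hypothetical helper: prefers event_time and falls back to
    observed_time only when event_time is missing.
    """
    return event_time if event_time is not None else observed_time


def window_bounds(event_time: datetime, width: timedelta) -> tuple[datetime, datetime]:
    """Return the half-open window [window_start, window_end) holding event_time.

    Hypothetical helper: boundaries depend only on event_time and width,
    never on the wall clock, so repeated runs give the same windows.
    """
    offset = (event_time - EPOCH) % width  # time elapsed since the window opened
    window_start = event_time - offset
    return window_start, window_start + width


# Usage with an assumed 60-second window width.
event = datetime(2024, 1, 1, 12, 0, 59, tzinfo=timezone.utc)
start, end = window_bounds(event, timedelta(seconds=60))
```

Because the interval is half-open, an event stamped exactly at `window_end` lands in the next window rather than in both, which keeps per-window counts from double-counting boundary events.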