Evaluation
The evaluation framework compares detected lines against ground truth using a line-distance metric and greedy matching.
Usage
# Evaluate a dataset with a specific backend
wire-eval --dataset hand_drawn --backend contour

# Evaluate a specific config
wire-eval \
    --dataset hand_drawn \
    --params '{"threshold": {"mode": "otsu"}, "dilate": {"kernel_size": 5}}'
Metrics
Line Distance
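The distance between a detected segment D(p1, p2) and a ground-truth segment G(g1, g2) is the average of the point-to-segment distances from D's two endpoints to G:
segment_dist(D, G) = (dist(p1, G) + dist(p2, G)) / 2
Here dist(p, G) is the shortest distance from point p to any point on segment G.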
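The distance described above could be computed along the following lines. This is a minimal sketch, not the framework's actual code: the helper name `point_to_segment`, the tuple-based point and segment representation, and the handling of zero-length segments are all assumptions made for illustration.

```python
import math


def point_to_segment(p, a, b):
    """Shortest Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Assumption: a degenerate segment is treated as a single point.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def segment_dist(d, g):
    """Average distance from the two endpoints of d to the segment g."""
    (p1, p2), (g1, g2) = d, g
    return 0.5 * (point_to_segment(p1, g1, g2) + point_to_segment(p2, g1, g2))
```

Because only D's endpoints are measured against G, this quantity is not symmetric: swapping the two arguments can give a different value.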
Classification
Each detection is classified by greedy matching against the ground-truth (GT) segments, using a distance threshold:
| Category | Condition |
|---|---|
| True Positive (TP) | segment_dist(D, G) ≤ threshold for some unmatched GT |
| Redundant | Multiple detections matching the same GT |
| False Positive (FP) | Detection with no matching GT within threshold |
| False Negative (FN) | Unmatched ground truth lines |
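One way to read the table above is sketched below. The source does not say in which order detections are visited or how ties between candidate GT segments are broken, so this sketch assumes input order and nearest-first selection. The function name `classify` and the `dist` callback are hypothetical and not part of the framework.

```python
def classify(detections, ground_truth, dist, threshold=20.0):
    """Count TP / Redundant / FP / FN with a simple greedy pass.

    `dist(d, g)` is any detection-to-GT distance function.
    """
    matched = set()  # indices of GT segments already claimed by a detection
    counts = {"TP": 0, "Redundant": 0, "FP": 0}

    for d in detections:  # assumption: detections are visited in input order
        scored = [(dist(d, g), i) for i, g in enumerate(ground_truth)]
        # GT candidates within the threshold, nearest first.
        near = sorted((s, i) for s, i in scored if s <= threshold)
        free = [i for _, i in near if i not in matched]

        if free:
            matched.add(free[0])      # claim the nearest unmatched GT
            counts["TP"] += 1
        elif near:
            counts["Redundant"] += 1  # only already-matched GT is in range
        else:
            counts["FP"] += 1         # nothing within the threshold

    counts["FN"] = len(ground_truth) - len(matched)
    return counts
```

For a quick check, one-dimensional stand-ins are enough: `classify([10, 12, 100, 300], [11, 105, 500], lambda d, g: abs(d - g))` yields two true positives, one redundant detection, one false positive, and one false negative.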
Aggregate Metrics
Recall = TP / (TP + FN)
Precision = TP / (TP + FP + Redundant)
F1 = 2 × (Recall × Precision) / (Recall + Precision)
The default distance threshold is 20 pixels.
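The aggregate formulas translate directly into code. The sketch below uses a hypothetical function name, `aggregate`, and returning 0.0 when a denominator is zero is an assumption, since the source does not describe that case.

```python
def aggregate(tp, fp, fn, redundant):
    """Return (recall, precision, f1) from the four classification counts."""
    gt_total = tp + fn                # every ground-truth line
    det_total = tp + fp + redundant   # every detection

    # Assumption: an empty denominator yields 0.0 rather than an error.
    recall = tp / gt_total if gt_total else 0.0
    precision = tp / det_total if det_total else 0.0

    pr_sum = recall + precision
    f1 = 2 * recall * precision / pr_sum if pr_sum else 0.0
    return recall, precision, f1
```

Note that redundant detections count against precision but not recall, so a detector that splits one line into several fragments is penalised only on the precision side.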
Output
The evaluation produces per-image results and an aggregate report:
Dataset: hand_drawn (140 images)
Backend: contour
Params: {threshold: otsu, dilate_ksize: 5, ...}
────────────────────────────────────────────
TP: 5,734 FN: 1,360 Recall: 0.808
FP: 939 Redundant: 304 Precision: 0.819
F1: 0.814
────────────────────────────────────────────
Reports can be exported as CSV and Markdown.