
Evaluation

The evaluation framework compares detected lines against ground truth using a line-distance metric and greedy matching.

Usage

wire-eval --dataset hand_drawn --backend contour

# Evaluate a specific parameter configuration
wire-eval \
  --dataset hand_drawn \
  --params '{"threshold": {"mode": "otsu"}, "dilate": {"kernel_size": 5}}'

Metrics

Line Distance

The distance between a detected segment D(p1,p2) and ground truth segment G(g1,g2) is the average of the point-to-segment distances from D's endpoints to G:

segment_dist(D, G) = (point_to_segment_dist(p1, g1, g2) + point_to_segment_dist(p2, g1, g2)) / 2
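
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the framework's actual code: it assumes points are (x, y) tuples in pixel coordinates and that a segment is a pair of such points. The function names follow the formula; the handling of a zero-length ground truth segment is an assumption.

```python
import math

def point_to_segment_dist(p, a, b):
    """Euclidean distance from point p to the segment with endpoints a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Assumption: a degenerate segment is treated as a single point.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def segment_dist(d, g):
    """Average distance from the endpoints of detection d to ground truth g."""
    (p1, p2), (g1, g2) = d, g
    return (point_to_segment_dist(p1, g1, g2)
            + point_to_segment_dist(p2, g1, g2)) / 2

detection = ((0, 3), (10, 5))
ground_truth = ((0, 0), (10, 0))
print(segment_dist(detection, ground_truth))  # 4.0
```

Because only D's endpoints are measured, the distance is not symmetric: a short detection lying on a long ground truth segment has distance 0, while swapping the two arguments generally gives a positive value.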

Classification

Detections and ground truth (GT) lines are classified using greedy matching with a distance threshold:

Category              Condition
True Positive (TP)    segment_dist(D, G) ≤ threshold for some unmatched GT
Redundant             Multiple detections matching the same GT
False Positive (FP)   Detection with no matching GT within threshold
False Negative (FN)   Unmatched ground truth lines
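
One way to read the categories above is the following greedy pass. It is a sketch under stated assumptions rather than the framework's implementation: detections are processed in input order, each one claims the closest still-unmatched GT within the threshold, and the function name `classify_detections` and the precomputed distance matrix are hypothetical.

```python
def classify_detections(dist, n_gt, threshold=20.0):
    """Label each detection as TP, Redundant, or FP and count FNs.

    dist[i][j] is segment_dist(detection i, ground truth j).
    n_gt is passed explicitly so an empty detection list still works.
    """
    matched = [False] * n_gt
    labels = []
    for row in dist:
        in_range = [j for j in range(n_gt) if row[j] <= threshold]
        free = [j for j in in_range if not matched[j]]
        if free:
            # Assumption: claim the closest unmatched GT within the threshold.
            best = min(free, key=lambda j: row[j])
            matched[best] = True
            labels.append("TP")
        elif in_range:
            # Every GT within range was already claimed by an earlier detection.
            labels.append("Redundant")
        else:
            labels.append("FP")
    false_negatives = matched.count(False)
    return labels, false_negatives

# Three detections, two ground truth lines (distances in pixels).
dist = [[2.0, 50.0], [5.0, 60.0], [90.0, 80.0]]
print(classify_detections(dist, n_gt=2))  # (['TP', 'Redundant', 'FP'], 1)
```

Greedy matching depends on processing order and is not guaranteed to find the assignment with the most true positives; it trades that for simplicity and speed.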

Aggregate Metrics

Recall    = TP / (TP + FN)
Precision = TP / (TP + FP + Redundant)
F1        = 2 × (Recall × Precision) / (Recall + Precision)
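
As a small worked example of these formulas, the sketch below computes all three metrics from raw counts. The function name `aggregate_metrics`, the toy counts, and the choice to return 0.0 when a denominator is zero are assumptions made for illustration.

```python
def aggregate_metrics(tp, fp, fn, redundant):
    """Return (recall, precision, f1) from classification counts."""
    gt_total = tp + fn                 # every ground truth line is TP or FN
    det_total = tp + fp + redundant    # every detection is TP, FP, or Redundant
    # Assumption: an empty denominator yields 0.0 instead of raising.
    recall = tp / gt_total if gt_total else 0.0
    precision = tp / det_total if det_total else 0.0
    denom = recall + precision
    f1 = 2 * recall * precision / denom if denom else 0.0
    return recall, precision, f1

# Toy counts, not taken from any real run.
recall, precision, f1 = aggregate_metrics(tp=8, fp=1, fn=2, redundant=1)
print(f"Recall: {recall:.3f}  Precision: {precision:.3f}  F1: {f1:.3f}")
# Recall: 0.800  Precision: 0.800  F1: 0.800
```

Note that redundant detections count against precision but not against recall, so a backend that fragments one wire into several segments is penalized only on the precision side.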

The default distance threshold is 20 pixels.

Output

The evaluation produces per-image results and an aggregate report:

Dataset: hand_drawn (140 images)
Backend: contour
Params:  {threshold: otsu, dilate_ksize: 5, ...}
────────────────────────────────────────────
TP:  5,734    FN:  1,360    Recall:  0.808
FP:    939    Redundant:  304    Precision:  0.819
                 F1:  0.814
────────────────────────────────────────────

Reports can be exported as CSV and markdown.