OCRacle Beta

A Benchmarking Tool for Vision LLMs

Introducing a benchmark for comparing the performance of state-of-the-art (SOTA) vision language models (VLMs) on historical document images, based on the GT4HistOCR dataset.

DISCLAIMER: This tool is currently in development and provided as a beta version. Results may be incomplete, inconsistent, or subject to change. Use at your own discretion, and do not rely on the tool for critical evaluations or decisions.


Evaluation Metrics

1. Accuracy

Percentage of characters that match exactly between the OCR output and the ground-truth text. Higher values indicate better performance.
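As a sketch, the match rate between an OCR output and the ground truth can be computed from an alignment of the two strings; the implementation below uses Python's standard-library SequenceMatcher and is illustrative, not OCRacle's actual code.

```python
from difflib import SequenceMatcher

def char_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Percentage of ground-truth characters with an exact match in the OCR output."""
    matcher = SequenceMatcher(None, ground_truth, ocr_output, autojunk=False)
    # Sum the lengths of all maximal matching blocks in the alignment.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(ground_truth)
```

A perfect transcription scores 100; any missed, spurious, or substituted character lowers the score.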
2. Character Error Rate (CER)

Ratio of character-level errors (insertions, deletions, substitutions) to the total number of characters in the ground truth. Lower values indicate better performance.
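A minimal sketch of how CER can be computed, assuming a standard Levenshtein edit distance over characters; the function names are illustrative, not the tool's actual code.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    # Dynamic-programming table, kept as one rolling row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]

def cer(ground_truth: str, ocr_output: str) -> float:
    """Character Error Rate: edit distance divided by ground-truth length."""
    return levenshtein(ground_truth, ocr_output) / len(ground_truth)
```

Note that CER can exceed 1.0 when the OCR output contains many spurious characters.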
3. Word Error Rate (WER)

Ratio of word-level errors (insertions, deletions, substitutions) to the total number of words in the ground truth. Lower values indicate better performance.
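WER is the same edit-distance idea applied to word tokens instead of characters. A minimal sketch, assuming whitespace tokenization (the tool's actual tokenization may differ):

```python
def wer(ground_truth: str, ocr_output: str) -> float:
    """Word Error Rate: word-level edit distance over the ground-truth word count."""
    ref, hyp = ground_truth.split(), ocr_output.split()
    # Levenshtein dynamic program over word tokens, one rolling row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deleted word
                curr[j - 1] + 1,         # inserted word
                prev[j - 1] + (r != h),  # substituted word (free on exact match)
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, transcribing "the quick brown fox" as "the quik brown fox" is one substitution out of four words, a WER of 0.25.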
4. Execution Time

Average wall-clock time taken by each model to process and transcribe an image, measured in seconds. Lower values indicate faster processing.
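Per-image timing can be sketched with a monotonic clock around the transcription call; `transcribe` here is a hypothetical callable standing in for whichever model is being benchmarked.

```python
import time

def timed_transcription(transcribe, image):
    """Run one transcription and return (text, elapsed_seconds).

    `transcribe` is any callable mapping an image to its transcription;
    time.perf_counter() is monotonic, so the difference is a valid duration.
    """
    start = time.perf_counter()
    text = transcribe(image)
    elapsed = time.perf_counter() - start
    return text, elapsed
```

Averaging `elapsed` over the whole test set gives the per-model execution-time score.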

Leaderboard


Results by Category
