Optical character recognition

Links

Tesseract OCR - Tesseract Open Source OCR Engine (main repo).
Tesseract.js - JavaScript library that gets words in almost any language out of images.
keras-ocr - Packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model.
Awesome Scanning - Curated list of awesome projects to simplify and improve paper scanning.
Awesome OCR
Scale Document - Secure platform for document processing.
Easy OCR - Ready-to-use OCR with 40+ languages supported including Chinese, Japanese, Korean and Thai. (HN)
OCRmyPDF - Adds an OCR text layer to scanned PDF files, allowing them to be searched. (Docs)
OCR with Keras, TensorFlow, and Deep Learning (2020)
What's so hard about PDF text extraction? (2020) (HN)
FilingDB - Database of extracted and structured text from European company filings. Optimised for quant investors.
InvoiceNet - Deep neural network to extract intelligent information from invoice documents.
PaddleOCR - Rich, leading, and practical OCR tools that help users train better models and apply them into practice. (Web) (HN)
TextRecognitionDataGenerator - Synthetic data generator for text recognition.
Paperless - Index and archive all of your scanned paper documents.
Tesseract Teaser
macOCR - Get any text on your screen into your clipboard. (HN)
Solving direct text extraction from PDFs (2021)
MMOCR - OpenMMLab Text Detection, Recognition and Understanding Toolbox.
In-Browser OCR (HN)
Project Naptha - Highlight, copy and translate text from any image in the browser. (HN)
Extract Table - API for extracting a table from an image. (Code)
Amazon Textract - Easily extract printed text, handwriting, and data from any document. (Code Samples)
CalamariOCR - Line based ATR Engine based on OCRopy.
Rust WebAssembly OCR experiments (2022)
Paperless-NGX - Supercharged version of paperless: scan, index and archive all your physical documents. (HN)
scan2drive - Go program (with a web interface) for scanning, converting and uploading physical documents to Google Drive.
docTR - Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch.
RapidOCR - Cross platform OCR Library based on PaddleOCR & OnnxRuntime.
meme_finder - Find locally-saved memes via their meme text using OCR. Written in Rust.
Veryfi OCR API - OCR API for Real-Time Data Extraction from Receipts & Invoices. (Node SDK)
ocrit - Command-line utility for performing OCR using Apple's Vision framework.
Tesseract WASM - WebAssembly build of the Tesseract OCR engine for use in the browser and Node.
tinyocr - Tiny command line OCR utility for recent versions of macOS.
Our search for the best OCR tool (2019) (HN)
Building an OCR solution for document analysis with AWS Textract and AWS StepFunctions
Donut - Document Understanding Transformer. (HN)
ocrpy - OCR, Archive, Index and Search: Implementation agnostic OCR framework.
Tesseract server (OCR over HTTP) - Small lightweight HTTP server that converts photos, images and scanned documents to text using optical character recognition by utilizing the power of Google Tesseract.
Technical Report on Web-based Visual Corpus Construction for Visual Document Understanding (2022) (Code)
Text Finder - Image-to-text (OCR) app that works offline. Powered by tesseract.js.
OCR at Edge on Cloudflare Constellation (2023) (HN)
