@Early-Modern-OCR

Early Modern OCR Project

Open source tools and training for OCR'ing 15th-18th Century printed documents with Tesseract.

Popular repositories

  1. TesseractTraining Public

    Training files produced for and by the Tesseract OCR engine for work on the Early Modern OCR Project (eMOP)

    36 7

  2. FrankenPlus Public

    Part of eMOP: Franken+ tool for creating font training for Tesseract OCR engine from page images.

    C# 24 7

  3. hOCR-De-Noising Public

    Code to remove "noise" from the hOCR output of Tesseract OCR.

    Python 14 6

  4. RETAS Public

    Part of eMOP: the Recursive Text Alignment Tool compares OCR text results to ground truth character by character and computes a score.

    Java 11 4

  5. page-evaluator Public

    Java code to examine the output of Tesseract OCR and generate scores for general page quality and correctability (see page-corrector repo).

    Java 8 1

  6. page-corrector Public

    Scala code to correct Tesseract OCR output and generate ALTO XML and text files. Uses dictionary files, rules and a google-3gram DB to make corrections.

    Scala 7

Repositories

Showing 10 of 19 repositories
  • ocular Public Forked from tberg12/ocular

    Ocular is a state-of-the-art historical OCR system.

    Java 4 GPL-3.0 48 0 0 Updated Aug 31, 2017
  • emop-controller Public

    eMOP Controller

    Python 0 1 0 0 Updated Oct 27, 2016
  • hOCR-De-Noising Public

    Code to remove "noise" from the hOCR output of Tesseract OCR.

    Python 14 Apache-2.0 6 2 0 Updated Oct 24, 2016
  • emop-dashboard Public
    Ruby 0 Apache-2.0 2 5 0 Updated Aug 18, 2016
  • early-modern-ocr.github.io Public

    GitHub organization page for the Early Modern OCR Project (eMOP)

    JavaScript 1 Apache-2.0 1 0 0 Updated Feb 3, 2016
  • ImprintDB Public

    A database of early modern printers and sellers culled from the eMOP source documents

    3 Apache-2.0 2 0 0 Updated Jan 20, 2016
  • TCP-ECCO-texts Public

    Document-level full text of TCP-transcribed ECCO documents (2188)

    3 CC0-1.0 3 0 0 Updated Jan 19, 2016
  • GameraTraining Public

    Baskerville typeface training for Gamera OCR engine

    Python 0 0 0 0 Updated Dec 22, 2015
  • juxta-ws-ruby Public
    Ruby 0 Apache-2.0 0 0 0 Updated Nov 19, 2015
  • juxta-service Public

    Needed for emop-dashboard.

    Java 0 0 0 0 Updated Nov 19, 2015
