
OCR/OCSR on handwritten ⏣/chemical structural formulas with YOLO & CRNN models.



xuguodong1999/COCR


COCR

COCR is designed to convert an image of a handwritten chemical structure into the graph of that molecule.

COCR (Optical Character Recognition for Chemical Structures) started as a demo for my undergraduate graduation thesis in June 2021. It brings OCSR (optical chemical structure recognition) capability to handwritten structures.
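COCR's output is described above as the graph of a molecule. The C++ sketch below is an illustration only. It shows one way such a graph could be held in memory: atoms are nodes carrying an element symbol, and bonds are undirected edges carrying a bond type. `MoleculeGraph`, `Atom`, `Bond` and `BondType` are hypothetical names, not COCR's actual classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical molecule graph (illustrative sketch, not COCR's data model).
enum class BondType { Single, Double, Triple, SolidWedge, HashWedge, Wavy };

struct Atom {
    std::string symbol;  // element symbol, e.g. "C", "O", "Cl"
};

struct Bond {
    std::size_t from;
    std::size_t to;
    BondType type;
};

class MoleculeGraph {
public:
    // Adds an atom node and returns its index.
    std::size_t addAtom(const std::string& symbol) {
        atoms_.push_back({symbol});
        adjacency_.emplace_back();
        return atoms_.size() - 1;
    }

    // Adds an undirected bond edge between two existing, distinct atoms.
    void addBond(std::size_t a, std::size_t b, BondType type) {
        assert(a < atoms_.size() && b < atoms_.size() && a != b);
        bonds_.push_back({a, b, type});
        adjacency_[a].push_back(b);  // record the neighbor on both ends
        adjacency_[b].push_back(a);
    }

    std::size_t atomCount() const { return atoms_.size(); }
    std::size_t bondCount() const { return bonds_.size(); }
    const Atom& atom(std::size_t i) const { return atoms_.at(i); }
    const std::vector<std::size_t>& neighbors(std::size_t i) const { return adjacency_.at(i); }

private:
    std::vector<Atom> atoms_;
    std::vector<Bond> bonds_;
    std::vector<std::vector<std::size_t>> adjacency_;
};
```

For example, the heavy-atom skeleton of ethanol is three atoms (C, C, O) joined by two single bonds. Hydrogens can be left implicit, as they usually are in drawn structures.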

Symbol String Ring Solid- Hash- Wavy- Single- Double- Triple-
Appearance (CH2)2COOEt ~~ / // ///
Status ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️
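Multi-character strings such as (CH2)2COOEt are read as text sequences. This README notes that COCR uses CRNN models, and CRNN outputs are commonly decoded with CTC (Connectionist Temporal Classification). The sketch below shows greedy (best-path) CTC decoding in C++. It assumes a hypothetical output layout: each time step holds one score per class, class 0 is the CTC blank, and `alphabet[k - 1]` is the character for class k. It is not taken from COCR's source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Greedy CTC decoding: take the best class per time step, merge consecutive
// repeats, then drop blanks (class 0). Layout of `probs` is an assumption.
std::string ctcGreedyDecode(const std::vector<std::vector<float>>& probs,
                            const std::string& alphabet) {
    std::string result;
    std::size_t previous = 0;  // start as blank so a leading symbol is emitted
    for (const auto& step : probs) {
        assert(step.size() == alphabet.size() + 1);
        const auto best = static_cast<std::size_t>(
            std::max_element(step.begin(), step.end()) - step.begin());
        // Emit only when the class is not blank and differs from the last step.
        if (best != 0 && best != previous) {
            result.push_back(alphabet[best - 1]);
        }
        previous = best;
    }
    return result;
}
```

The decoder can emit a doubled letter, such as the two O's of COOEt, only when a blank separates the two identical characters. Without the blank, the repeats would be merged into one.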

COCR is developed with the Qt framework. It processes images with YOLO and CRNN models, using OpenCV or ncnn as the inference backend.
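YOLO-style detectors usually emit many overlapping candidate boxes. These are typically filtered with non-maximum suppression (NMS) before further processing. A minimal per-class NMS sketch in C++ follows. The `Detection` struct and the threshold are assumptions for illustration, and COCR's own post-processing may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical detection: box corners in pixels, confidence and class id.
struct Detection {
    float x1, y1, x2, y2;
    float score;
    int classId;
};

// Intersection-over-union of two boxes; 0 when they do not overlap.
float iou(const Detection& a, const Detection& b) {
    const float ix = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float iy = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = ix * iy;
    const float uni = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy per-class NMS: keep the highest-scoring box, suppress same-class
// boxes overlapping it by more than iouThreshold, then repeat.
std::vector<Detection> nms(const std::vector<Detection>& dets, float iouThreshold) {
    std::vector<std::size_t> order(dets.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](std::size_t i, std::size_t j) { return dets[i].score > dets[j].score; });

    std::vector<bool> suppressed(dets.size(), false);
    std::vector<Detection> kept;
    for (std::size_t a = 0; a < order.size(); ++a) {
        const std::size_t i = order[a];
        if (suppressed[i]) continue;
        kept.push_back(dets[i]);
        for (std::size_t b = a + 1; b < order.size(); ++b) {
            const std::size_t j = order[b];
            if (!suppressed[j] && dets[j].classId == dets[i].classId &&
                iou(dets[i], dets[j]) > iouThreshold) {
                suppressed[j] = true;
            }
        }
    }
    return kept;  // sorted by descending score
}
```

Suppressing only within the same class means that two boxes of different classes occupying the same region both survive.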

Online Demo

Supported platforms:

  • Windows
  • Android
  • WebAssembly
  • Mac OS
  • Linux

News

  • [2021/06] COCR v1.1 released with support for strings and wavy bonds.
    (Demo video: COCR v1.1 on Ubuntu)
  • [2021/02] COCR v1.0 released for simple cases.
    (Figure: input image, detection result and rendered output of COCR v1.0)
  1. Supports single-element symbols: C, H, O, N, P, B, S, F, Cl, Br and I.
  2. Supports bond types: single, double, triple, hash wedge, solid wedge and circle.
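Handwritten skeletal formulas usually leave hydrogens implicit, so a recognizer that outputs a molecule graph typically needs to infer them from valence. The sketch below is a simplified C++ illustration. It assumes each listed element has a single default valence and that hash and solid wedges count as single bonds. The names are hypothetical. The circle notation (commonly used for aromatic rings) is deliberately left out, because it cannot be handled by per-bond counting. Toolkits such as Open Babel and RDKit use more complete valence models.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified default valences for the element symbols listed above.
// P and S also have higher valence states; this sketch uses only the lowest.
const std::map<std::string, int> kDefaultValence = {
    {"C", 4}, {"H", 1}, {"O", 2}, {"N", 3}, {"P", 3}, {"B", 3},
    {"S", 2}, {"F", 1}, {"Cl", 1}, {"Br", 1}, {"I", 1}};

// Hypothetical bond types; wedges are single bonds with stereo information.
enum class BondType { Single, Double, Triple, HashWedge, SolidWedge };

int bondOrder(BondType t) {
    switch (t) {
        case BondType::Double: return 2;
        case BondType::Triple: return 3;
        default: return 1;  // Single, HashWedge, SolidWedge
    }
}

// Implicit H count = default valence - sum of explicit bond orders, floored at 0.
int implicitHydrogens(const std::string& symbol, const std::vector<BondType>& bonds) {
    int used = 0;
    for (BondType b : bonds) used += bondOrder(b);
    const int h = kDefaultValence.at(symbol) - used;
    return h > 0 ? h : 0;
}
```

For example, the carbonyl carbon of acetaldehyde has one single bond and one double bond, leaving one implicit hydrogen.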

Build

System Requirements

  1. CMake (3.22 or later) and Ninja
  2. Git (for cloning submodules)
  3. Qt 6 (tested with 6.5.2)
  4. An up-to-date C++ compiler (tested with GCC 11.x, MSVC v17.x and Clang 16.x)
Qt Beginner's Guide
  • For Windows developers: have 7-Zip, wget and Git Bash ready, or simply use WSL:
# download MinGW and MSVC binaries
wget -r -np -nH -e robots=off http://mirrors.nju.edu.cn/qt/online/qtsdkrepository/windows_x86/desktop/qt6_652/
# extract all 7z files to a folder called "result"
find . -name '*.7z' -exec 7z x {} -aos -o./result \;
  • For Linux developers:
# download gcc_64 binary
wget -r -np -nH -e robots=off http://mirrors.nju.edu.cn/qt/online/qtsdkrepository/linux_x64/desktop/qt6_652/
# extract all 7z files to a folder called "result"
find . -name '*.7z' -exec 7z x {} -aos -o./result \;
  • For macOS developers:
# download clang_64 binary
wget -r -np -nH -e robots=off http://mirrors.nju.edu.cn/qt/online/qtsdkrepository/mac_x64/desktop/qt6_652/
# extract all 7z files to a folder called "result"
find . -name '*.7z' -exec 7z x {} -aos -o./result \;

After downloading and extracting, the folder structure looks like one of the following:

result/6.5.2
└── gcc_64
OR
result/6.5.2
└── msvc2019_64
OR
result/6.5.2
└── clang_64

Then copy gcc_64, msvc2019_64 or clang_64 to your preferred library path, and Qt is ready to use.


Get the Code

git clone https://github.com/xuguodong1999/COCR.git --branch main --single-branch --recursive

or, using SSH:

git clone git@github.com:xuguodong1999/COCR.git --branch main --single-branch --recursive

All third-party libraries except Qt, such as Boost, Open Babel, RDKit and OpenCV, are in the third_party directory.

Only a minimal set of their source code is kept, and custom changes are recorded as patch files under the third_party directory.

Nearly all build scripts for the third-party libraries have been rewritten to make cross-compilation, bug fixes and feature hacks easier.

Compile

Building COCR works the same way as building a typical Qt 6 project. The basic steps are:

mkdir build && cd build
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=ON -DCMAKE_PREFIX_PATH=path/to/Qt6/binary/dir
cmake --build . --parallel --target COCR

For example, on a Linux desktop:

cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=ON -DCMAKE_PREFIX_PATH=~/shared/Qt/6.5.2/gcc_64
More custom flags control SIMD, release info, etc.:
FLAG VALUE USAGE
XGD_USE_OPENMP ON/OFF enable OpenMP; default: auto-detect
XGD_USE_VK ON/OFF enable Vulkan; default: auto-detect
XGD_USE_CCACHE ON/OFF enable ccache; default: auto-detect
XGD_FLAG_MARCH_NATIVE ON/OFF add -march=native for the GCC compiler; default: ON
XGD_FLAG_WASM_SIMD128 ON/OFF add -msimd128 for WebAssembly; default: ON
XGD_BUILD_WITH_GRADLE ON/OFF disable the custom CMake directory layout (for the Android Studio CMake plugin); default: OFF
XGD_OPT_RC ON/OFF add release info to build products; default: OFF
XGD_NO_DEBUG_CONSOLE ON/OFF hide the debug console; default: OFF
XGD_OPT_ARCH_X86 ON/O