COCR converts an image of a handwritten chemical structure into a graph of that molecule.
COCR (Optical Character Recognition for Chemical Structures) began as a demo for my undergraduate graduation thesis in June 2021. It brings OCSR (optical chemical structure recognition) to handwritten input.
Symbol | String | Ring | Solid- | Hash- | Wavy- | Single- | Double- | Triple- |
---|---|---|---|---|---|---|---|---|
Appearance | (CH2)2COOEt | ⏣ | ▲ | △ | ~~ | / | // | /// |
Status | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
COCR is built on the Qt framework. It processes images with YOLO and CRNN models, using either OpenCV or ncnn as the backend.
Online Demo
- Try it in a Chromium-based browser such as Chrome or Edge: https://xuguodong1999.github.io/COCR.html
Supported platforms:
- Windows
- Android
- WebAssembly
- macOS
- Linux
- [2021/06] COCR v1.1 released with support for strings and wavy-bonds.
- [2021/02] COCR v1.0 released for simple cases.
Input | Detection | Render |
---|---|---|
- Supports single-element symbols: C, H, O, N, P, B, S, F, Cl, Br, I.
- Supports bond types: single, double, triple, hash wedge, solid wedge, circle.
- CMake (3.22 or later), Ninja
- Git (for cloning submodules)
- Qt 6 (tested with 6.5.2)
- A recent C++ compiler (tested with GCC 11.x, MSVC v17.x, Clang 16.x)
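The requirements above can be checked before configuring. The following is a minimal sketch, not part of COCR: `version_ge` and `check_tool` are hypothetical helper names, and the version comparison assumes a `sort` that supports `-V` (as GNU coreutils does).

```shell
#!/bin/sh
# Hypothetical helpers (not part of COCR) for checking build prerequisites.

# version_ge A B: succeed when version A >= version B.
# Assumes a `sort` that supports -V (version sort), e.g. GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_tool NAME: report whether NAME is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Report on each required tool; keep going even if one is missing.
for tool in cmake ninja git; do
    check_tool "$tool" || true
done

# Compare the installed CMake against the 3.22 minimum, if CMake is present.
if command -v cmake >/dev/null 2>&1; then
    cmake_ver=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_ge "$cmake_ver" 3.22; then
        echo "cmake $cmake_ver is new enough"
    else
        echo "cmake $cmake_ver is older than 3.22"
    fi
fi
```

The script only reports what it finds; it does not check the Qt installation or the compiler version.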
Qt Beginner's Guide
- For Windows developers: have 7-Zip, wget, and Git Bash ready, or simply use WSL:
# download mingw, msvc binary
wget -r -np -nH -e robots=off http://mirrors.nju.edu.cn/qt/online/qtsdkrepository/windows_x86/desktop/qt6_652/
# extract all 7z files to a folder called "result"
find . -name '*.7z' -exec 7z x {} -aos -o./result \;
- For Linux developers:
# download gcc_64 binary
wget -r -np -nH -e robots=off http://mirrors.nju.edu.cn/qt/online/qtsdkrepository/linux_x64/desktop/qt6_652/
# extract all 7z files to a folder called "result"
find . -name '*.7z' -exec 7z x {} -aos -o./result \;
- For macOS developers:
# download clang_64 binary
wget -r -np -nH -e robots=off http://mirrors.nju.edu.cn/qt/online/qtsdkrepository/mac_x64/desktop/qt6_652/
# extract all 7z files to a folder called "result"
find . -name '*.7z' -exec 7z x {} -aos -o./result \;
After downloading and extracting, the folder structure looks like one of the following:
result/6.5.2
└── gcc_64
OR
result/6.5.2
└── msvc2019_64
OR
result/6.5.2
└── clang_64
Then copy gcc_64, msvc2019_64, or clang_64 to your preferred library path, and the Qt setup is done.
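The kit folder names shown above follow the platform, so the path later passed as CMAKE_PREFIX_PATH can be derived from `uname -s`. This is a rough sketch under assumptions: `qt_kit_for_os` and `qt_prefix_path` are hypothetical helpers, the Windows branch assumes the MSVC kit rather than MinGW, and a WSL shell reports `Linux` even when the Windows binaries are the ones wanted.

```shell
#!/bin/sh
# Hypothetical helpers (not part of COCR) that map a platform to a Qt kit folder.

# qt_kit_for_os OS_NAME: print the kit folder for a `uname -s` style name.
qt_kit_for_os() {
    case "$1" in
        Linux*)               echo "gcc_64" ;;
        Darwin*)              echo "clang_64" ;;
        # Git Bash, MSYS2 and Cygwin shells on Windows; assumes the MSVC kit.
        MINGW*|MSYS*|CYGWIN*) echo "msvc2019_64" ;;
        *)                    return 1 ;;
    esac
}

# qt_prefix_path ROOT VERSION OS_NAME: print ROOT/VERSION/<kit folder>.
qt_prefix_path() {
    kit=$(qt_kit_for_os "$3") || return 1
    echo "$1/$2/$kit"
}

# Example: where the 6.5.2 kit would live if copied under ~/shared/Qt.
qt_prefix_path "$HOME/shared/Qt" 6.5.2 "$(uname -s)" || echo "unsupported platform"
```

On a Linux desktop this prints a path ending in `shared/Qt/6.5.2/gcc_64`, provided the kit was copied to that location.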
git clone https://github.com/xuguodong1999/COCR.git --branch main --single-branch --recursive
or, using SSH:
git clone git@github.com:xuguodong1999/COCR.git --branch main --single-branch --recursive
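If a clone was made without `--recursive`, the submodules stay empty. The sketch below is one way to spot that; `count_uninitialized` is a hypothetical helper, not part of COCR or git, and the sample input is made up. It relies on `git submodule status` prefixing uninitialized submodules with `-`.

```shell
#!/bin/sh
# count_uninitialized: hypothetical helper (not part of COCR or git).
# Reads `git submodule status` output on stdin and prints how many entries
# start with '-', the prefix git uses for submodules that are not initialized.
count_uninitialized() {
    grep -c '^-' || true
}

# Sample input in the shape `git submodule status` prints; the paths and
# hashes here are made up for illustration.
printf '%s\n' \
    '-1111111111111111111111111111111111111111 third_party/example_a' \
    ' 2222222222222222222222222222222222222222 third_party/example_b (v1.0)' \
    | count_uninitialized
```

Inside a real checkout, `git submodule status --recursive | count_uninitialized` gives the count, and `git submodule update --init --recursive` fetches whatever is missing.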
All third-party libraries except Qt live in the third_party directory, including Boost, Open Babel, RDKit, and OpenCV, among others.
Only a minimal set of source code is kept, and custom changes are recorded in patch files under the third_party directory.
Nearly all build scripts for third-party libraries have been rewritten to make cross-building, bug fixing, and feature hacking easier.
Building COCR works like building any other Qt 6 project. The basic steps are:
mkdir build && cd build
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=ON -DCMAKE_PREFIX_PATH=path/to/Qt6/binary/dir
cmake --build . --parallel --target COCR
For example, on a Linux desktop:
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=ON -DCMAKE_PREFIX_PATH=~/shared/Qt/6.5.2/gcc_64
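The configure line takes extra `-D` options in the same way, including the project's XGD_* flags from this README's flag table. As a small sketch, the hypothetical helper `cocr_configure_cmd` (not part of COCR) assembles such a command and prints it instead of running it, so the result can be reviewed first. It does no quoting, so it assumes paths without spaces.

```shell
#!/bin/sh
# cocr_configure_cmd QT_DIR [FLAG=VALUE ...]
# Hypothetical helper (not part of COCR): print a configure command that
# points CMAKE_PREFIX_PATH at QT_DIR and appends each FLAG=VALUE as -DFLAG=VALUE.
cocr_configure_cmd() {
    qt_dir=$1
    shift
    cmd="cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=ON -DCMAKE_PREFIX_PATH=$qt_dir"
    for flag in "$@"; do
        cmd="$cmd -D$flag"
    done
    echo "$cmd"
}

# Dry run: Vulkan off, ccache on, with the Qt kit under ~/shared/Qt.
cocr_configure_cmd "$HOME/shared/Qt/6.5.2/gcc_64" XGD_USE_VK=OFF XGD_USE_CCACHE=ON
```

Run the printed line from inside the build directory, then continue with the `cmake --build` step as usual.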
Additional custom flags control SIMD, release info, and other build options:
FLAG | VALUE | USAGE |
---|---|---|
XGD_USE_OPENMP | ON/OFF | enable OpenMP; auto-detected by default |
XGD_USE_VK | ON/OFF | enable Vulkan; auto-detected by default |
XGD_USE_CCACHE | ON/OFF | enable ccache; auto-detected by default |
XGD_FLAG_MARCH_NATIVE | ON/OFF | add -march=native for GCC; default ON |
XGD_FLAG_WASM_SIMD128 | ON/OFF | add -msimd128 for WebAssembly; default ON |
XGD_BUILD_WITH_GRADLE | ON/OFF | disable the custom CMake directory layout for the Android Studio CMake plugin; default OFF |
XGD_OPT_RC | ON/OFF | add release info to build products; default OFF |
XGD_NO_DEBUG_CONSOLE | ON/OFF | hide the debug console; default OFF |
XGD_OPT_ARCH_X86 | ON/O |