chained workflows #25

Open
bertsky opened this issue Mar 20, 2023 · 0 comments

bertsky (Owner) commented Mar 20, 2023

It would help if workflows could be chained at runtime, e.g. `ocrd-make -f pre3.mk -f seg1.mk -f ocr4.mk -f post.mk`, where each makefile consumes the last fileGrp of the previous one, so that each stage can be replaced by an alternative configuration independently of the others. This in turn would allow writing very concise (sub-)configurations without repetition.

As for implementation, `make` allows passing multiple makefiles and reads them sequentially in its first phase (where immediate variables, targets and prerequisites are expanded and the dependency graph is built from the combined result). Only in the second phase does it decide which targets need updating and run their recipes.
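The read order matters because simply expanded variables and rule prerequisites are fixed while `make` is still reading. The following is a minimal sketch of that behaviour, not part of the proposal itself: the file name `phases-demo.mk` and the variables `STAGE`, `SEEN_EARLY` and `SEEN_LATE` are made up for illustration, and a single file stands in for two makefiles read one after the other.

```makefile
# phases-demo.mk (hypothetical name); run with: make -f phases-demo.mk
# A single file stands in for two makefiles read one after the other.

# --- as if read from a first makefile ---
STAGE := first
# ':=' is expanded immediately, so this captures "first"
SEEN_EARLY := $(STAGE)
# '=' is deferred, so this is only expanded when the recipe runs
SEEN_LATE = $(STAGE)

# --- as if read from a second makefile ---
STAGE := second

.PHONY: show
show: ; @echo "immediate: $(SEEN_EARLY), deferred: $(SEEN_LATE)"
```

Running it should print `immediate: first, deferred: second`. Prerequisite lists are expanded immediately in the same way, which is what lets each stage pick up the value a variable has at the moment that stage is read.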

So we could, by convention (for chainable configurations), allow defining a simply expanded variable, say `OUTPUT`, for the stage's output fileGrp name, and allow using `INPUT` for the stage's dynamic input fileGrp name. Internally (i.e. in our Makefile, which always needs to be included), we then predefine `override INPUT := $(or $(OUTPUT),$(INPUT))` and `.DEFAULT_GOAL := $(OUTPUT)`. The `override` is needed because an `INPUT` passed on the command line would otherwise take precedence over the assignment. For the very first stage (entry point), we then just have to pass `INPUT`, either in a separate (stage zero) non-rule config file or with an additional command-line argument.

For example:

  • pre3.mk
BIN: $(INPUT)
BIN: TOOL = ocrd-doxa-binarize

DESK: BIN
DESK: TOOL = ocrd-cis-ocropy-deskew
DESK: PARAMS = "level-of-operation": "page"

CROP: DESK
CROP: TOOL = ocrd-anybaseocr-crop
CROP: PARAMS = "rulerAreaMax": 0

OUTPUT := CROP
  • seg1.mk
SEG: $(INPUT)
SEG: TOOL = ocrd-kraken-segment
SEG: PARAMS = "model": "blla.mlmodel"

RESEG: SEG
RESEG: TOOL = ocrd-cis-ocropy-resegment
RESEG: PARAMS = "method": "baseline"

OUTPUT := RESEG
  • ocr4.mk
OCR1: $(INPUT)
OCR2: $(INPUT)
OCR3: $(INPUT)
OCR1 OCR2 OCR3: OPTIONS = -P textequiv_level glyph

OCR1: TOOL = ocrd-tesserocr-recognize
OCR1: OPTIONS += -P model frak2021+deu

OCR2: TOOL = ocrd-calamari-recognize
OCR2: OPTIONS += -P checkpoint_dir qurator-gt4histocr-1.0

OCR3: TOOL = ocrd-kraken-recognize
OCR3: OPTIONS += -P model austriannewspapers.mlmodel

MULTI: OCR1 OCR2 OCR3
MULTI: TOOL = ocrd-cor-asv-ann-align
MULTI: PARAMS = "method": "combined"

OUTPUT := MULTI
  • post.mk
ALTO: $(INPUT)
ALTO: TOOL = ocrd-fileformat-transform
ALTO: OPTIONS = -P from-to "page alto" -P script-args "--no-check-border --dummy-word"

OUTPUT := ALTO
  • in preinstalled Makefile
override INPUT := $(or $(OUTPUT),$(INPUT))
.DEFAULT_GOAL := $(OUTPUT)
...
  • running
make -f pre3.mk -f seg1.mk -f ocr4.mk -f post.mk INPUT=ORIGINAL
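
To try the chaining mechanism without any OCR-D processors installed, here is a minimal sketch that simulates `make -f a.mk -f b.mk INPUT=ORIGINAL` in a single file. The file name `chain-demo.mk`, the stage names `A` and `B`, and the `echo` recipes are hypothetical. It also assumes that the shared Makefile (and with it the two proposed lines) is included at the end of every stage file, which the listings above do not show.

```makefile
# chain-demo.mk (hypothetical name): single-file simulation of
#   make -f a.mk -f b.mk INPUT=ORIGINAL
# Run with: make -f chain-demo.mk INPUT=ORIGINAL

# Phony, so that no files named ORIGINAL, A or B need to exist.
.PHONY: ORIGINAL A B
ORIGINAL: ; @echo "entry point: $@"

# --- stage 1, would live in a.mk ---
# $(INPUT) is expanded while reading, here to the command-line value.
A: $(INPUT) ; @echo "$@ reads $^"
OUTPUT := A
# The two proposed lines, as the shared Makefile would run them after stage 1
# (assumption: every stage file includes that Makefile at its end).
override INPUT := $(or $(OUTPUT),$(INPUT))
.DEFAULT_GOAL := $(OUTPUT)

# --- stage 2, would live in b.mk ---
# By now INPUT has been overridden to A, so B depends on A.
B: $(INPUT) ; @echo "$@ reads $^"
OUTPUT := B
override INPUT := $(or $(OUTPUT),$(INPUT))
.DEFAULT_GOAL := $(OUTPUT)
```

Invoked as shown in the header comment, this should run `ORIGINAL`, then `A`, then `B`: the default goal ends up as the last stage's `OUTPUT`, and each stage's prerequisite is whatever `INPUT` held when that stage was read.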

Since this only requires these two additional lines and does not break existing makefiles, this is actually more of a documentation issue. (The old makefiles should probably be removed, updated, or split into multi-stage configurations anyway.)

@mikegerber would that fit your need as well?
