Screen-OCR Documentation Need #15

Open
brighten6363 opened this issue Aug 30, 2023 · 6 comments
Comments

@brighten6363

brighten6363 commented Aug 30, 2023

Hello,
I've come across screen-ocr[winrt] and I'm going to try it.
With tesseract I can set config parameters such as lang=language, psm=6, etc. How do I set these config parameters with screen-ocr?
I want to control parameters such as psm and lang.

When I recognize a block area using ocr_reader.read_screen(area),
the text is not output in a continuous order.
Depending on the +/- sign of a number or the color of the letters, the order gets mixed up.

I want to solve this problem,
for example by using a config parameter as in tesseract.

Which kwargs does create_quality_reader(cls, **kwargs) accept?
I'd appreciate it if you could show me an example of how to use it.

I'm asking because I couldn't figure it out even after looking at the examples.
Thank you.

@wolfmanstout
Owner

You can pass any of the following keyword arguments to create_quality_reader:

backend: Union[str, _base.OcrBackend],
tesseract_data_path=None,
tesseract_command=None,
threshold_function="local_otsu",
threshold_block_size=41,
correction_block_size=31,
convert_grayscale=True,
shift_channels=True,
debug_image_callback=None,
language_tag=None,

backend: _base.OcrBackend,
margin: int = 0,
resize_factor: int = 1,
resize_method=None, # Pillow resize method
debug_image_callback: Optional[Callable[[str, Any], None]] = None,
confidence_threshold: float = 0.75,
radius: int = 200, # screenshot "radius"
search_radius: int = 125,
homophones: Optional[Mapping[str, Iterable[str]]] = None,
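
As a minimal sketch of how a few of the keyword arguments listed above might be passed, the snippet below collects them in a dict and forwards them to create_quality_reader. The values shown (`en-US`, a resize factor of 2, a threshold of 0.6) and the helper names `quality_reader_kwargs` and `make_reader` are illustrative assumptions, not recommendations from the library. `backend` is left out on the assumption that create_quality_reader chooses the backend itself.

```python
from typing import Any, Dict


def quality_reader_kwargs() -> Dict[str, Any]:
    """Return example keyword arguments (hypothetical values)."""
    return {
        "language_tag": "en-US",      # assumed tag format; adjust to your language
        "resize_factor": 2,           # upscale the screenshot before OCR
        "confidence_threshold": 0.6,  # listed default is 0.75
    }


def make_reader():
    """Create a reader with the example kwargs.

    The import is done lazily so this module can be loaded on machines
    where screen-ocr is not installed.
    """
    import screen_ocr

    return screen_ocr.Reader.create_quality_reader(**quality_reader_kwargs())
```

With a reader created this way, the call from the question, `ocr_reader.read_screen(area)`, should work unchanged.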

Please note that some of the arguments in the first set only apply to one backend or another. It is unclear from your question whether you plan to use winrt or tesseract -- you will need to choose one or the other, and the former only works on Windows. If you do want to use tesseract, currently there is no way to directly override the config that gets passed to pytesseract, although it wouldn't be too difficult to plumb this through if that's what you need. Here is where the call to pytesseract actually happens:

    results = pytesseract.image_to_data(
        image, config=tessdata_dir_config, output_type=pytesseract.Output.DATAFRAME
    )
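
If the config were plumbed through, one possible shape is a small helper that builds the config string handed to pytesseract. This is only a sketch: `build_tesseract_config` is a hypothetical name and is not part of screen-ocr. It relies on Tesseract's `--tessdata-dir` and `--psm` command-line options; the language would go through pytesseract's separate `lang` argument rather than the config string.

```python
from typing import Optional


def build_tesseract_config(
    tessdata_dir: Optional[str] = None, psm: Optional[int] = None
) -> str:
    """Build a Tesseract config string (hypothetical helper)."""
    parts = []
    if tessdata_dir is not None:
        # Quote the path so directories containing spaces survive.
        parts.append(f'--tessdata-dir "{tessdata_dir}"')
    if psm is not None:
        # Page segmentation mode, e.g. 6 = assume a single uniform block of text.
        parts.append(f"--psm {psm}")
    return " ".join(parts)
```

The resulting string would take the place of `tessdata_dir_config` in the call above.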

@brighten6363
Author

Thank you for your answer.
My questions are all about winrt.
In tesseract, whether an image is treated as a block, lines, or columns can be controlled by passing a psm value, but I wonder how to do the same in winrt.
Thank you.

@wolfmanstout
Owner

Got it. WinRT has very minimal configuration options: https://learn.microsoft.com/en-us/uwp/api/windows.media.ocr.ocrengine?view=winrt-22621

@brighten6363
Author

Thanks for your help.
Then I don't think it will be easy to extract text from an image block in order if I use the winrt library.

But I don't understand why the output is not in order. For example, for a table of rows and columns containing +/- numbers, winrt outputs some later columns ahead of other numbers. For the same image, tesseract with the block config parameter outputs the text in order.

@wolfmanstout
Owner

I'm assuming you are referring to the ordering in the ScreenContents.as_string function. This is just a convenience function that concatenates in the original order that the OCR library produced:

    def as_string(self) -> str:
        """Return the contents formatted as a string."""
        lines = []
        for line in self.result.lines:
            words = []
            for word in line.words:
                words.append(word.text)
            lines.append(" ".join(words) + "\n")
        return "".join(lines)

If you want to do your own order, you are welcome to check the bounds of the OcrLines and concatenate these however you want, using a separate function.
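
A rough sketch of such a separate function is below. It regroups words into rows by their vertical position and sorts each row left to right, which is one way to get table-like output in reading order. The `Word`, `Line`, and `Result` classes are stand-ins so the example runs on its own; it assumes each OCR word exposes `text`, `left`, `top`, `width`, and `height`, which should be checked against the actual screen-ocr classes. The function name and the `row_tolerance` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:  # stand-in for the library's word type (assumed fields)
    text: str
    left: float
    top: float
    width: float
    height: float


@dataclass
class Line:  # stand-in for the library's line type
    words: List[Word]


@dataclass
class Result:  # stand-in for the library's result type
    lines: List[Line]


def as_string_by_position(result: Result, row_tolerance: float = 0.5) -> str:
    """Join words row by row, top to bottom, then left to right."""
    words = [word for line in result.lines for word in line.words]
    # Sort by vertical center so words on the same visual row are adjacent.
    words.sort(key=lambda w: w.top + w.height / 2)

    rows: List[List[Word]] = []
    for word in words:
        center = word.top + word.height / 2
        if rows:
            last = rows[-1][-1]
            last_center = last.top + last.height / 2
            # Same row if the centers differ by less than a fraction of the height.
            if abs(center - last_center) <= row_tolerance * max(word.height, last.height):
                rows[-1].append(word)
                continue
        rows.append([word])

    return "".join(
        " ".join(w.text for w in sorted(row, key=lambda w: w.left)) + "\n"
        for row in rows
    )
```

For a table whose columns were returned out of order, this would put each label back in front of its number, provided the rows do not overlap vertically by more than the tolerance.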

@brighten6363
Author

Thanks. :-)
Very similar to what I want. I'll test it in many ways.


2 participants