Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flexible regex match error message in detect important massage #771

Merged
merged 5 commits into from
Mar 1, 2022

Conversation

unkcpz
Copy link
Member

@unkcpz unkcpz commented Jan 18, 2022

For new QE version (test on v6.8) 'some nodes have no k-points' is not raised as an error and stop the
calculation. The output is different but still contain the same content,
so instead of check content in line use regex match

@unkcpz unkcpz requested a review from sphuber January 18, 2022 17:34
@sphuber
Copy link
Contributor

sphuber commented Jan 18, 2022

One thing that is important to take into account is efficiency. The regexes will be executed a lot because each regex is matched against every line, so I can see this become very expensive. We should do some benchmarking to see the effect. And for a start, you may want to re.compile them, to speed it up already a bit.

@unkcpz
Copy link
Member Author

unkcpz commented Jan 18, 2022

@sphuber thanks for the head up. That's true, I will make a quick banchmark.

@unkcpz
Copy link
Member Author

unkcpz commented Jan 19, 2022

Regex slows down the parsing process of stdout raw (a little bit I would say?). Here are the benchmarks by using pytest-benchmark. The aiida.out for test contains 1802 lines and the following results are get from only checking two cases of massage_map errors in detect_important_message.

(1) using in

-------------------------------------------- benchmark: 1 tests --------------------------------------------
Name (time in ms)        Min     Max    Mean  StdDev  Median     IQR  Outliers       OPS  Rounds  Iterations
------------------------------------------------------------------------------------------------------------
test_bench            2.5649  2.8962  2.6353  0.0625  2.6057  0.1000      75;4  379.4594     331           1
------------------------------------------------------------------------------------------------------------

(2) regex without re.compile

-------------------------------------------- benchmark: 1 tests --------------------------------------------
Name (time in ms)        Min     Max    Mean  StdDev  Median     IQR  Outliers       OPS  Rounds  Iterations
------------------------------------------------------------------------------------------------------------
test_bench            4.1097  4.5742  4.2170  0.0864  4.2107  0.1277      62;3  237.1355     206           1
------------------------------------------------------------------------------------------------------------

(3) regex with re.compile

-------------------------------------------- benchmark: 1 tests --------------------------------------------
Name (time in ms)        Min     Max    Mean  StdDev  Median     IQR  Outliers       OPS  Rounds  Iterations
------------------------------------------------------------------------------------------------------------
test_bench            3.5193  3.8615  3.6254  0.0761  3.6118  0.1214      86;0  275.8345     242           1
------------------------------------------------------------------------------------------------------------

For the results above, I think maybe we can both use compiled re and string as markers? Only do regex check when the marker is a re.Pattarn, like:

    # Match any known error and warning messages
    for marker, message in message_map['error'].items():
        if isinstance(marker, re.Pattern):
            if marker.match(line):
                if message is None:
                    message = line
                logs.error.append(message)            
        else:  
            if marker in line:
                if message is None:
                    message = line
                logs.error.append(message)

The benchmark for this strategy:

-------------------------------------------- benchmark: 1 tests --------------------------------------------
Name (time in ms)        Min     Max    Mean  StdDev  Median     IQR  Outliers       OPS  Rounds  Iterations
------------------------------------------------------------------------------------------------------------
test_bench            3.1958  3.6412  3.2852  0.0752  3.2702  0.1210      73;3  304.3932     255           1
------------------------------------------------------------------------------------------------------------

@unkcpz
Copy link
Member Author

unkcpz commented Feb 25, 2022

Hi @sphuber, could you have a look at this? I think from the benchmark, the loss of efficiency is acceptable even all the checks move to REGEX.

@sphuber
Copy link
Contributor

sphuber commented Feb 26, 2022

I think it is fine to merge these changes. Would just ask you to try and trim down the test reference files. Especially for error testing, there is usually not a lot of the output file you exactly need, you can get rid of most of the text. Please try to get them as minimal as possible as to not bloat the repository too much

@unkcpz
Copy link
Member Author

unkcpz commented Feb 28, 2022

Thanks @sphuber!
I trimmed down the files and rebase the branch. Hope it is a lot more clear.

@sphuber
Copy link
Contributor

sphuber commented Feb 28, 2022

Any idea why the tests are now failing?

@unkcpz
Copy link
Member Author

unkcpz commented Feb 28, 2022

@sphuber Sorry that I didn't check the tests. I don't know why it is failing but I think it only fails for python3.6 (I remove the py3.6 from CI and others are passed). The failed status codes are exit_code(305, 'ERROR_OUTPUT_FILES', message='Both the stdout and XML output files could not be read or parsed.') and exit_code(350, 'ERROR_UNEXPECTED_PARSER_EXCEPTION', message='The parser raised an unexpected exception.')

Do we have a plan to deprecate py3.6? But always I can create a py3.6 environment to try to figure out why these fail.

EDIT: Test failed because for py<3.7 module 're' has no attribute 'Pattern'. I replace it with if hasattr(marker, "search"):

But what concerns me most is when the test fails the error message is not so obvious.

@unkcpz
Copy link
Member Author

unkcpz commented Feb 28, 2022

However the readthedocs test failed because of following error:

Traceback (most recent call last):
  File "/home/docs/.pyenv/versions/3.8.6/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/docs/.pyenv/versions/3.8.6/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/docs/checkouts/readthedocs.org/user_builds/aiida-quantumespresso/envs/771/lib/python3.8/site-packages/sphinx/__main__.py", line 13, in <module>
    from sphinx.cmd.build import main
  File "/home/docs/checkouts/readthedocs.org/user_builds/aiida-quantumespresso/envs/771/lib/python3.8/site-packages/sphinx/cmd/build.py", line 25, in <module>
    from sphinx.application import Sphinx
  File "/home/docs/checkouts/readthedocs.org/user_builds/aiida-quantumespresso/envs/771/lib/python3.8/site-packages/sphinx/application.py", line 32, in <module>
    from sphinx.config import Config
  File "/home/docs/checkouts/readthedocs.org/user_builds/aiida-quantumespresso/envs/771/lib/python3.8/site-packages/sphinx/config.py", line 27, in <module>
    from sphinx.util.tags import Tags
  File "/home/docs/checkouts/readthedocs.org/user_builds/aiida-quantumespresso/envs/771/lib/python3.8/site-packages/sphinx/util/tags.py", line 11, in <module>
    from jinja2 import nodes
  File "/home/docs/checkouts/readthedocs.org/user_builds/aiida-quantumespresso/envs/771/lib/python3.8/site-packages/jinja2/__init__.py", line 12, in <module>
    from .environment import Environment
  File "/home/docs/checkouts/readthedocs.org/user_builds/aiida-quantumespresso/envs/771/lib/python3.8/site-packages/jinja2/environment.py", line 25, in <module>
    from .defaults import BLOCK_END_STRING
  File "/home/docs/checkouts/readthedocs.org/user_builds/aiida-quantumespresso/envs/771/lib/python3.8/site-packages/jinja2/defaults.py", line 3, in <module>
    from .filters import FILTERS as DEFAULT_FILTERS  # noqa: F401
  File "/home/docs/checkouts/readthedocs.org/user_builds/aiida-quantumespresso/envs/771/lib/python3.8/site-packages/jinja2/filters.py", line 13, in <module>
    from markupsafe import soft_unicode
ImportError: cannot import name 'soft_unicode' from 'markupsafe' (/home/docs/checkouts/readthedocs.org/user_builds/aiida-quantumespresso/envs/771/lib/python3.8/site-packages/markupsafe/__init__.py)

Is it some dependency broken for docs related packages?

@unkcpz unkcpz force-pushed the fix/437/npool-success branch 2 times, most recently from 267e73d to 9838c2e Compare February 28, 2022 21:41
@sphuber
Copy link
Contributor

sphuber commented Mar 1, 2022

The problem with the docs is due to a recent release of markupsafe. See this issue and PR on aiida-core for problem analysis and temporary fix aiidateam/aiida-core#5367 aiidateam/aiida-core#5371 I can submit a quick PR to fix this.

But what concerns me most is when the test fails the error message is not so obvious.

I think the full stack trace is included and stored in the attributes. verdi process report should show it. But this doesn't really help from the logs of the CI indeed. Maybe we should improve the 350 exception message to include the title of the exception so it becomes The parser raised an unexpected exception: module 're' has no attribute 'Pattern'. Would be good to open an issue for this.

As for the timeline of deprecating Python 3.6, I think since aiida-core==2.0 will have dropped it, we can drop it also for the version of aiida-quantumespresso which will be compatible with that aiida-core, so for aiida-quantumespresso==4.0 most likely. This would drop both 3.6 and 3.7 then.

As for the temporary fix,it is ok to keep, but please add the following comment behind it # Replace with isinstance(marker, re.Pattern) once Python 3.6 is dropped

@sphuber
Copy link
Contributor

sphuber commented Mar 1, 2022

Fixed the docs in #790
So if you can just rebase and add the comment I suggested in my previous comment, we can merge this

For new QE version (test on v6.8) 'some nodes have no k-points' is not raised as an error and stop the
calculation. The output is different but still contain the same content,
so instead of check content in line use regex match
@unkcpz
Copy link
Member Author

unkcpz commented Mar 1, 2022

@sphuber Thanks a lot! I rebased the PR and add the comment as you suggest.

Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THanks @unkcpz

@sphuber sphuber merged commit 96356d3 into aiidateam:develop Mar 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants