Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add detailed trace mode showing saved object size and visual depth level #471

Merged
merged 15 commits into from
Jul 1, 2022

Conversation

leogama
Copy link
Contributor

@leogama leogama commented May 8, 2022

This is a detailed mode for the trace feature to help to better visualize the hierarchy of the objects being dumped and also to identify large child objects. If it is OK, I will write the documentation.

Usage:

import dill
dill.detect.trace(True, detail=True)
pickled = dill.dumps(obj)

dill/_dill.py Outdated
@@ -1176,7 +1211,7 @@ def _save_with_postproc(pickler, reduction, is_pickler_dill=None, obj=Getattr.NO
# License: https://github.com/cloudpipe/cloudpickle/blob/master/LICENSE
@register(CodeType)
def save_code(pickler, obj):
log.info("Co: %s" % obj)
pickler._log_trace("Co: %s", obj)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of limitations from long ago (#456), the pickler that is passed into the registered save functions is not necessarily a dill pickler and could have a missing _log_trace function.

I think that formatting the log trace as a tree is a good idea, but I think you should be using a Formatter instead of the pickler itself to do this:

https://stackoverflow.com/questions/35048921/format-log-messages-as-a-tree

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, now I get the whole is_dill() thing. It is related to other packages that also extend pickle. I'll take a look at Formatter then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other people can use the default pickler pickle.Pickler instead of dill.Pickler and they shouldn't see an error. This system is like this because in older versions of Python, the pickler class couldn't have its own dispatch. I think that it would be quick change to make the dispatch per class instead of global (cloudpickle did it), but it may be a breaking change and I didn't find time to verify that it isn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not so hard to verify whether the package is working after such a change, but for code that extends dill it could be tricky. There are both benefits and drawbacks of being able to extend pickle just by importing dill. Maybe it could be made an optional feature somehow.

Hey, look at this! My notes from learning how to use logging.Formatter (it's a lot of fun):

Basic calling sequence with StreamHandler and Formatter in Python 3.7: 
(All methods from the second line on are from logging.__init__.py)     
                                                                       
dill.logging.logger.log ->   (or info, warning, etc.)                  
  Logger.log ->                                                        
    Logger._log ->                                                     
      Logger.makeRecord ->                                             
        LogRecord.__init__                                             
      Logger.handle ->                                                 
        Logger.callHandlers ->                                         
          Handler.handle ->                                            
            StreamHandler.emit ->                                      
              Handler.format ->                                        
                Formatter.format ->                                    
                  LogRecord.getMessage                                 
                  Formatter.formatMessage ->                           
                    PercentStyle.format                                

@mmckerns
Copy link
Member

mmckerns commented May 9, 2022

So dill has a script that uses objgraph to get a visual understanding of some of the same information that trace provides. Just FYI in case you aren't aware of it...

@leogama
Copy link
Contributor Author

leogama commented May 9, 2022

So dill has a script that uses objgraph to get a visual understanding of some of the same information that trace provides. Just FYI in case you aren't aware of it...

I did saw it. It's a good tool to get an overview of large objects or entire module structures. My proposed addition is meant for detailed inspection of a single or few objects in the pickle, and the "tree style" is just a visual hint to help you to find it in the middle of a long log. An option for objgraph to visualize object sizes would be interesting, though.

@leogama
Copy link
Contributor Author

leogama commented May 9, 2022

Refactored the code using logging classes. Seems good.

@mmckerns
Copy link
Member

mmckerns commented May 9, 2022

I think that formatting the log trace as a tree is a good idea

I agree.

@leogama
Copy link
Contributor Author

leogama commented May 9, 2022

For me it's done. Feel free to edit the trace tree style in the formatter if you like.

@leogama
Copy link
Contributor Author

leogama commented May 9, 2022

Here is a sample output from the test_dictviews.py file:

> Dk: <dict_keys(['a'])>
>> Wr: <method 'keys' of 'dict' objects>
>>> F2: <function _getattr at 0x7fd372962820>
>>> # F2 (26 B)
>>> T1: <class 'dict'>
>>>> F2: <function _load_type at 0x7fd37295db80>
>>>> # F2 (17 B)
>>> # T1 (28 B)
>> # Wr (101 B)
> # Dk (156 B)
> Dv: <dict_values([1])>
>> Wr: <method 'values' of 'dict' objects>
>>> F2: <function _getattr at 0x7fd372962820>
>>> # F2 (26 B)
>>> T1: <class 'dict'>
>>>> F2: <function _load_type at 0x7fd37295db80>
>>>> # F2 (17 B)
>>> # T1 (28 B)
>> # Wr (105 B)
>> T1: <class 'enumerate'>
>> # T1 (18 B)
> # Dv (168 B)
> Di: <dict_items([('a', 1)])>
>> Wr: <method 'items' of 'dict' objects>
>>> F2: <function _getattr at 0x7fd372962820>
>>> # F2 (26 B)
>>> T1: <class 'dict'>
>>>> F2: <function _load_type at 0x7fd37295db80>
>>>> # F2 (17 B)
>>> # T1 (28 B)
>> # Wr (103 B)
>> D2: <dict object at 0x7fd37296ef00>
>> # D2 (9 B)
> # Di (116 B)

@anivegesana
Copy link
Contributor

Personally, I am more of a fan of blank space indentation or box drawing characters. If you think that is a good idea, I can look into later to see if it will look nicer that way without adding overhead.

https://en.m.wikipedia.org/wiki/Box_Drawing

@anivegesana
Copy link
Contributor

anivegesana commented May 10, 2022

@leogama Thoughts? leogama#1

┬ Dkvi: <dict_keys(['a'])>
├┬ Wr: <method 'keys' of 'dict' objects>
│├┬ F2: <function _getattr at 0x10af83b50>
││└ # F2 (26 B)
│├┬ T1: <class 'dict'>
││├┬ F2: <function _load_type at 0x10af82ef0>
│││└ # F2 (17 B)
││└ # T1 (28 B)
│└ # Wr (101 B)
├┬ D2: <dict object at 0x10ac88880>
│└ # D2 (9 B)
└ # Dkvi (114 B)

@leogama
Copy link
Contributor Author

leogama commented May 10, 2022

Sleek! I will just add a fallback clause for non-UTF-8 terminals just in case.

@anivegesana
Copy link
Contributor

Ah yes. Keep forgetting that Python 2 is still supported by dill. Glad you like it!

@leogama
Copy link
Contributor Author

leogama commented May 11, 2022

Ah yes. Keep forgetting that Python 2 is still supported by dill.

Actually, I had to work really hard to get ASCII output, which I eventually did with env -i LC_ALL=C python2. It seems that Python 2 also uses UTF-8 encoding for terminal output in most cases. But there must be some really dumb terminals out there.

And the Unicode strings were necessary just for the translate() method to work with Unicode characters. Otherwise it was working just fine.

@mmckerns Any final thoughts?

@mmckerns mmckerns added this to the dill-0.3.6 milestone May 29, 2022
@mmckerns mmckerns self-requested a review June 5, 2022 09:47
dill/logger.py Outdated Show resolved Hide resolved
Copy link
Member

@mmckerns mmckerns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comments. Otherwise, LGTM.

dill/logger.py Outdated Show resolved Hide resolved
@leogama leogama requested a review from mmckerns June 22, 2022 14:42
Copy link
Member

@mmckerns mmckerns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make the suggested change to improve code compatibility across this PR.

dill/_dill.py Outdated Show resolved Hide resolved
@leogama leogama requested a review from mmckerns June 24, 2022 02:06
@leogama
Copy link
Contributor Author

leogama commented Jun 24, 2022

@mmckerns and @anivegesana: what do you think of converting trace to a context manager?

class trace:
    """print a trace through the stack when pickling; useful for debugging"""
    def __init__(self, boolean=None):
        self.boolean = boolean
        if boolean is not None:
            logger.setLevel(logging.INFO if boolean else logging.WARNING)
    def __enter__(self):
        self.previous_level = logger.level
        logger.setLevel(logging.INFO)
        return logger
    def __exit__(self, *exc_info):
        logger.setLevel(self.previous_level)
        return False
    def __repr__(self):
        if self.boolean is None:
            return super().__repr__()
        return "dill trace is %s" % ("ON" if self.boolean else "OFF")

It can be used in the same way as before too, but using it as a context manager preserves the previous log level:

>>> from dill import detect, dumps
>>> from dill.logger import logger
>>> d = {'a': 1, 'b': {}}
>>> logger.setLevel(40)
>>> detect.trace(True)
dill trace is ON
>>> _ = dumps(d)
┬ D2: <dict object at 0x7f83c2176180>
├┬ D2: <dict object at 0x7f83c2188c40>
│└ # D2 [2 B]# D2 [16 B]
>>> detect.trace(False)
dill trace is OFF
>>> _ = dumps(d)  # no trace
>>> logger.level
30
>>> logger.setLevel(40)
>>> with detect.trace():
>>>     dumps(d)
┬ D2: <dict object at 0x7f83c2176180>
├┬ D2: <dict object at 0x7f83c2188c40>
│└ # D2 [2 B]# D2 [16 B]
>>> _ = dumps(d)  # no trace
>>> logger.level
40

Copy link
Member

@mmckerns mmckerns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Will you be adding the context manager to this PR? I'll approve or defer depending on your response.

@leogama
Copy link
Contributor Author

leogama commented Jun 25, 2022

LGTM. Will you be adding the context manager to this PR? I'll approve or defer depending on your response.

Yes, I prefer to finish this in one shot. Let me just update the documentation and add a test and we can close it.

@leogama leogama requested a review from mmckerns June 25, 2022 21:49
@leogama
Copy link
Contributor Author

leogama commented Jun 25, 2022

I've made the context manager implementation separated lately.

dill/logger.py Outdated Show resolved Hide resolved
Copy link
Member

@mmckerns mmckerns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the two comments on the context manager. Otherwise, LGTM.

@leogama leogama requested a review from mmckerns June 26, 2022 19:48
@mmckerns mmckerns merged commit 9000ab9 into uqfoundation:master Jul 1, 2022
@leogama leogama deleted the log-trace branch July 1, 2022 21:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants