
Sleep parameter for cpu_percent() calls #123

Closed
giampaolo opened this issue May 23, 2014 · 30 comments

Comments

@giampaolo

From jlo...@gmail.com on October 27, 2010 05:43:33

Reference: Issue #120 "Since cpu_percent is often paired with a 'time.sleep' 
call, I wonder if it would be simpler to have cpu_percent accept a "sleep" 
argument with default behavior of, say, 2s."

Opening this enh request as a reminder to look into this and decide if we want 
to add such a parameter to the percentage calculation functions. They require 
at least .1-.2 seconds to calculate a percentage for the interval, so it 
usually makes sense to use a sleep() call between requests. This would obviate 
the need for a separate call to time.sleep(). We could also make the default 0, 
i.e. do not sleep, which would avoid any performance issues with using this in 
the default case.
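
As a rough sketch of the two usage patterns being compared (the interval-style 
argument shown here is only the proposal at this point, not the API as it 
existed at the time):

import time
import psutil

# Current pattern: the caller manages the measurement window itself.
psutil.cpu_percent()            # first call primes the internal counters
time.sleep(2)
print(psutil.cpu_percent())     # utilization over the ~2 second window

# Proposed pattern: the function blocks for the interval on the caller's behalf.
print(psutil.cpu_percent(interval=2))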

Original issue: http://code.google.com/p/psutil/issues/detail?id=123

giampaolo self-assigned this May 23, 2014
@giampaolo

From jlo...@gmail.com on October 26, 2010 20:43:44

Labels: OpSys-All

@giampaolo

From g.rodola on October 27, 2010 12:54:56

I think it makes sense, but I'd be for postponing the decision until version 0.2.1.
Why do you think this would be needed also for memory stats, btw?

@giampaolo

From jlo...@gmail.com on October 27, 2010 13:06:38

Good question... I guess I should probably not write bug reports/feature 
requests when I'm tired ;)

Summary: Sleep parameter for cpu_percent() calls
Labels: Milestone-0.2.1

@giampaolo

From g.rodola on November 01, 2010 12:59:06

As per the discussion in https://code.google.com/p/psutil/source/detail?r=766, 
Yan and I tried to refactor the cpu_percent() code a bit, also including the 
interval parameter while we were at it.

def cpu_percent(interval=0.1):
    """Return a float representing the current system-wide CPU 
    utilization as a percentage.
    """
    t1 = cpu_times()
    t1_all = sum(t1)
    t1_busy = t1_all - t1.idle

    time.sleep(interval)

    t2 = cpu_times()
    t2_all = sum(t2)
    t2_busy = t2_all - t2.idle

    busy_delta = t2_busy - t1_busy
    difference = t2_all - t1_all
    try:
        busy_perc = (busy_delta / difference) * 100
        if busy_perc < 0.0:
            # a negative delta is treated the same as an interval too
            # small to measure: fall back on 0.0
            raise ZeroDivisionError
    except ZeroDivisionError:
        return 0.0
    else:
        return round(busy_perc, 1)

Notable changes are:

- (obviously) "interval" parameter defaulting to 0.1

- no longer rely on time.time() nor on NUM_CPUS: considering that cpu_times() 
already returns times for *all* CPUs, I fail to understand why we were using 
NUM_CPUS in the first place

- returned floats are rounded with precision == 1. I did this because numbers 
might otherwise be printed in scientific notation, resulting in stuff like 
"3.46474172093e-09", which is quite misleading

Cc: yanraber

@giampaolo

From jlo...@gmail.com on November 01, 2010 13:24:44

I think in general this idea makes sense, and it's nice to avoid needing 
time.time()... the only downside to doing it this way is that it now becomes a 
blocking call that waits for "interval", which is why we went with wall-clock 
time originally. 

> considering that cpu_times() already returns times for *all* CPUs I fail to 
> understand why we were using NUM_CPUS in the first place

If you're comparing against wall clock time, you have to know how many CPUs are 
being counted, or you can't make the calculation to determine CPU usage.
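
For reference, a minimal sketch of the wall-clock approach being described here; 
the function name is made up, and psutil.NUM_CPUS is assumed to be available as 
it was at the time:

import time
import psutil

def cpu_percent_wallclock(interval=0.1):
    """Sketch: compare busy CPU time against the elapsed wall-clock time
    multiplied by the number of CPUs."""
    t1 = psutil.cpu_times()
    t1_busy = sum(t1) - t1.idle
    start = time.time()

    time.sleep(interval)

    t2 = psutil.cpu_times()
    t2_busy = sum(t2) - t2.idle
    elapsed = time.time() - start

    # Busy time accumulates across all CPUs, so the wall-clock window has to
    # be scaled by the CPU count to express utilization as 0-100%.
    capacity = elapsed * psutil.NUM_CPUS
    if capacity <= 0:
        return 0.0
    return round(min(100.0, (t2_busy - t1_busy) / capacity * 100), 1)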

Do we need to use the ZeroDivisionError try/catch? I see why it's in there, 
but maybe for maintainability later it would be better if we used explicit 
checks for the error conditions that would result in a ZeroDivisionError. That 
way it's clearer and easier to debug if we run into further CPU util related 
bugs. Otherwise I'm OK with this change; I think it's likely to end up a little 
more accurate/reliable.

@giampaolo

From g.rodola on November 01, 2010 13:50:52

I think it's better to keep the ZeroDivisionError catch, although I find it 
kinda ugly as well. On Linux busy_delta = t2_busy - t1_busy might happen to be 
a negative number and I don't know why that happens.

@giampaolo

From jlo...@gmail.com on November 01, 2010 13:58:05

> On Linux busy_delta = t2_busy - t1_busy might happen to be a negative number 
> and I don't know why that happens.

Probably the same reason the code I wrote was coming up with negative values 
for the delta on Linux :D

@giampaolo

From jlo...@gmail.com on November 01, 2010 13:59:40

Before we decide to commit this, I would like to confirm that it's more 
accurate than (or at least as accurate as) the existing code we had. Can you 
compare the two? I am a little concerned that we'll end up with a lot of 0% 
utilization numbers when we shouldn't (same issue I noted earlier for my code).

@giampaolo

From g.rodola on November 01, 2010 14:22:35

Not sure how much more accurate this is compared to the previous version. 
I think it *should* be more accurate because we dropped time.time(). I also 
compared it with htop and it looks fine, but I don't know any better way to 
test it than this. 
Any hints?

> I am a little concerned that we'll end up with a lot of 0% utilization 
> numbers when we shouldn't be

I have the same concern as well, and most of them are caused when we catch 
ZeroDivisionError.
I also thought about whether it would be better to raise ValueError("interval 
too low") instead of falling back on 0.0, but I noticed that ZeroDivisionError 
also occurs when using a 0.1 interval, after some cycles.

@giampaolo

From jlo...@gmail.com on November 01, 2010 14:36:36

If it seems to match htop on Linux and BSD, and taskmgr on Windows, then I'd 
say that's a good indicator (I can test OS X once it's committed). Mainly I 
don't want to be reporting wildly different values than the system task manager 
or top/ps, since that's been our standard for other functions.

As far as returning 0.0 values unnecessarily, I'm not sure what we can do about 
it. It seems like there's just not enough accuracy to ensure that we don't end 
up with negative deltas, especially with smaller intervals. Does this only 
happen on Linux? I didn't see this type of behavior on OS X, maybe it's 
specific to Linux anyway.

My vote would still be for an explicit check to see whether "difference" or 
"busy_delta" is <= 0, and if so we can return 0% util, rather than the try/catch. It's not 
ideal, and I'd rather address why we're seeing negative deltas in the first 
place, but I don't know what to do about that. Maybe if/when we implement 
per-CPU stats we'll have better luck there.
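
Roughly what is being proposed, reusing the names from the snippet above (a 
sketch, not the committed code):

def calc_busy_percent(t1_all, t1_busy, t2_all, t2_busy):
    """Sketch of the explicit-check variant: no try/except needed."""
    busy_delta = t2_busy - t1_busy
    difference = t2_all - t1_all
    # A zero or negative time window, or a negative busy delta, is reported
    # as 0% utilization instead of letting a ZeroDivisionError decide.
    if difference <= 0 or busy_delta <= 0:
        return 0.0
    return round((busy_delta / difference) * 100, 1)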

@giampaolo

From g.rodola on November 02, 2010 08:31:36

Code committed in r768, including removal of the ZeroDivisionError try/except 
statement. Still have to modify Process.get_cpu_percent() in the same manner, though.

> Does this only happen on Linux? I didn't see this type of behavior on OS X, maybe 
> it's specific to Linux anyway.

I'm going to see whether this is Linux only.
I have a doubt though. Right now cpu_times() returns times for:

- user
- system
- nice
- idle
- irq
- softirq
- iowait

...and we assume that t1_all = sum(cpu_times()).
Are we sure about this?
Here, for example: http://ubuntuforums.org/showthread.php?t=148781 someone 
states that for calculating the percentage only the user, nice, system, and 
idle times are required and the other values should be ignored.

Just 2 quick cents btw... I'll investigate further later today.
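
A minimal sketch of the reduced sum being suggested (Linux field names; the 
helper name is made up):

import psutil

def linux_cpu_totals():
    """Sum only user, nice, system and idle, ignoring irq/softirq/iowait."""
    t = psutil.cpu_times()
    total = t.user + t.nice + t.system + t.idle
    busy = total - t.idle
    return total, busy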

@giampaolo

From jlo...@gmail.com on November 02, 2010 09:07:41

> ...and we assume that t1_all = sum(cpu_times()).
> Are we sure about this?

The original code sample you pointed to here: 
http://www.boduch.ca/2009/02/python-cpu-usage.html uses only the user, nice, 
system, and idle times. I think we should be doing the same, at least on Linux.

@giampaolo

From g.rodola on November 02, 2010 12:25:10

Tested on both Windows and FreeBSD (where we are also using irq time to fill 
the t_all variables) and everything is fine: busy_delta *never* happens to be 
negative, hence it seems the problem only affects Linux.

On Linux I tried to calculate t_all as (user + system + nice + idle), but that 
didn't help either.
For some reason, when the result is negative it's because t2_idle happens to be 
greater than t1_idle.

@giampaolo

From jlo...@gmail.com on November 02, 2010 12:41:08

> For some reason, when the result is negative it's because t2_idle happens to 
> be greater than t1_idle.

Do you mean t2_idle is less than t1_idle?

@giampaolo

From g.rodola on November 02, 2010 12:48:32

No, the opposite:

# pdb session started when busy_delta happens to be negative
(Pdb) t1.idle
9831.7000000000007
(Pdb) t2.idle
9831.7199999999993

@giampaolo

From jlo...@gmail.com on November 02, 2010 12:52:09

I don't understand, why would that be a problem? Idle time should increase over 
time, so t2.idle is supposed to be >= t1.idle, depending on CPU utilization 
during that time period.

@giampaolo

From g.rodola on November 02, 2010 13:01:17

You are right, sorry. Then the problem is maybe how we calculate busy time?

(Pdb) t1_busy
1556.1700000000019
(Pdb) t2_busy
1556.1700000000001

What I do not understand is how t2_busy can be < t1_busy.
Or maybe I stopped understanding everything a long time ago. =)

@giampaolo

From jlo...@gmail.com on November 02, 2010 13:24:21

> (Pdb) t1_busy
> 1556.1700000000019
> (Pdb) t2_busy
> 1556.1700000000001

Hmm... that looks suspiciously like a precision issue. The values are probably 
effectively the same, but we could be losing some precision in the conversion 
to a float or in the arithmetic we're using to get the CPU times. Trivial example:

>>> 1 + 1.1
2.1000000000000001

@giampaolo

From g.rodola on November 02, 2010 14:01:42

I think you're right.
This seems to fix the issue:


--- psutil/__init__.py  (revision 768)
+++ psutil/__init__.py  (working copy)
@@ -441,8 +441,16 @@
     t2 = cpu_times()
     t2_all = sum(t2)
     t2_busy = t2_all - t2.idle
+
+    # This should avoid precision issues (t2_busy < t1_busy).
+    # This is based on the assumption that anything beyond 2nd decimal
+    # digit is garbage.
+    # XXX - maybe it makes sense to do this for all numbers returned
+    # in cpu_times()
+    t1_busy = round(t1_busy, 2)
+    t2_busy = round(t2_busy, 2)

-    if t2_busy <= t1_busy:
+    if t2_busy == t1_busy:
         return 0.0

     busy_delta = t2_busy - t1_busy
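
Plugging the pdb values from earlier into the patched comparison shows the effect:

>>> t1_busy = 1556.1700000000019
>>> t2_busy = 1556.1700000000001
>>> t2_busy < t1_busy                       # the spurious negative delta
True
>>> round(t1_busy, 2) == round(t2_busy, 2)  # after the patch: reported as 0.0
True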

@giampaolo

From jlo...@gmail.com on November 02, 2010 14:44:18

Excellent... do you want to make any further changes, or do you think we're ok 
to commit this officially then?

@giampaolo

From g.rodola on November 02, 2010 15:02:24

I think we're ok with psutil.cpu_percent() but Process.get_cpu_percent() is not 
ok, now that I've just looked at it.
Aside from a similar code refactoring that still needs to be done, it seems it 
can return values > 100.0.
The code below shows the problem:

import psutil, time
p = psutil.Process(4103)
while 1:
    print p.get_cpu_percent()
    time.sleep(.001)


Also, I'm not sure whether it's a good idea to add an "interval" parameter as 
we did for the system cpu percentage.
If you imagine an application like taskmgr.exe, a GUI which [lists/deals with] 
all processes, a blocking cpu_percent() call is useless.
For that kind of application, a get_cpu_percent() function which calculates the 
time elapsed since the last call, like the one we have right now, is better.

Maybe we can refactor it as follows: 
- add an interval parameter defaulting to 0.1
- if interval > 0, work like the system-wide cpu_percent()
- if interval == 0, return immediately and calculate the result based on the 
time elapsed since the last call

What do you think?
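
A rough sketch of how the proposed dual-mode call could look from the caller's 
side (semantics as described above, still up for discussion; the PID is the one 
from the earlier example):

import psutil

p = psutil.Process(4103)

# interval > 0: block for the given window, like the system-wide function.
print(p.get_cpu_percent(interval=0.1))

# interval == 0: return immediately, measuring against the time elapsed
# since the previous call.
p.get_cpu_percent(interval=0)       # first call just primes the baseline
# ... do other work ...
print(p.get_cpu_percent(interval=0))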

@giampaolo

From jlo...@gmail.com on November 02, 2010 15:31:11

> I think we're ok with psutil.cpu_percent() but Process.get_cpu_percent() is 
> not ok, now that I've just looked at it. Aside from a similar code 
> refactoring that still needs to be done, it seems it can return values > 100.0.

I am not able to reproduce this... maybe this is another Linux-only issue? If 
so perhaps refactoring the code the same way will resolve it as well.

> - add an interval parameter defaulting to 0.1
> - if interval > 0, work like the system-wide cpu_percent()
> - if interval == 0, return immediately and calculate the result based on the 
> time elapsed since the last call

That's fine by me, though if you think the main use case is to prefer a 
non-blocking call, maybe we should default interval to 0? Either way works for me.

@giampaolo

From g.rodola on November 03, 2010 11:11:01

Code committed as r773.
I decided to set 0.1 as the interval default (hence, it's blocking).
I also removed the cpu-percent related code from the constructor because it 
slows things down quite a bit (25%), at least on Linux.
I'd still like an opinion about this though, hence I'm setting the 
WaitingForReview state for now until Jay gets back.

Status: WaitingForReview
Labels: -Milestone-0.2.1 Milestone-0.2.0

@giampaolo

From jlo...@gmail.com on November 05, 2010 19:43:18

Does the code work with 0 as the interval, by calculating the interval since 
the last time the function was called? If so then I'm fine with the interval 
being .1 by default. I think it would be nice to have a non-blocking version of 
the call, but other than that I'm fine with either 0 or .1 as the default.

@giampaolo

From g.rodola on November 06, 2010 04:44:57

You get the non-blocking version by passing 0.0 or None as the argument.
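
So, in terms of usage, something along these lines:

import psutil

# Blocking: sleep for the interval, then return utilization over that window.
print(psutil.cpu_percent(interval=0.1))

# Non-blocking: return immediately, comparing against the CPU times recorded
# at the previous call (the very first call just sets the baseline).
psutil.cpu_percent(interval=None)   # or 0.0
print(psutil.cpu_percent(interval=None))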

@giampaolo

From g.rodola on November 08, 2010 08:52:22

Modified psutil.cpu_percent() in r774 so that it can work in both blocking and 
non-blocking mode in the same manner.
Closing out as fixed.

Status: FixedInSVN

@giampaolo

From g.rodola on November 08, 2010 12:57:00

Issue 131 has been merged into this issue.

@giampaolo

From g.rodola on November 12, 2010 19:14:59

Status: Fixed

@giampaolo

From g.rodola on June 09, 2011 15:33:51

Labels: -OpSys-All

@giampaolo

From g.rodola on March 02, 2013 03:55:44

Updated csets after the SVN -> Mercurial migration: r768 == revision 
2171f705f0f4, r773 == revision 6366524f642b, r774 == revision 589ce4a800a3
