Site sluggishness #1548

Open
Martii opened this issue Dec 15, 2018 · 81 comments
Assignees
Labels
SOFT BLOCKING (Removable by Active Maintainer but usually blocking new features.), stability (Important to operations.), tracking upstream (Waiting, watching, wanting.)

Comments

Martii commented Dec 15, 2018

So I've been contacting all kinds of people in the last 24 hours, with no clear resolution on why the site is sometimes super fast and sometimes super slow.

  • I did manage to find https://downdetector.com/status/level3/map/ — Level3 is one of the hops in my internet connection and it's comparatively very latent.
  • Right now Europe is having better connect times than the US. Asia/Pacific is having higher latency than normal. Re: https://www.monitis.com/traceroute/
  • Our VPS provider says everything is okay from their testing points. Tried alternate networks as well with the same intermittent results.
  • We're not seeing any more DoS traffic than the usual amount we trap.
  • Traffic is nominal via our process manager (1% to 5% atm).
  • Nothing unusual detected in the packets... seems about normal. Quite a lot of UDP traffic from Google, but that's Google for you.

When I know anything more I'll let everyone know; however, there isn't much to be done, as everything has been triple-checked on our end (hence the dependency updates a couple of times in the last few days [that's not typical for my update cadence], the server restarts, and an unannounced backup since it's about that time).

Anyhow... just letting everyone know we're on top of what we can do. Apologies for what we can't do.

P.S. When it's sitting idle in the browser (spinner spinning) it's doing nothing in our network management tools, i.e. your request isn't always reaching us atm from the test points we tried, and our VPS provider (a real person) confirmed that occasionally it's taking an excessive time from their testing as well.

Refs:

Martii added the tracking upstream (Waiting, watching, wanting.) and stability (Important to operations.) labels Dec 15, 2018
Martii added a commit to Martii/OpenUserJS.org that referenced this issue Dec 15, 2018
* Link in the FAQ for this

Post OpenUserJS#944 OpenUserJS#970 OpenUserJS#389 ... missed somewhere around OpenUserJS#976 to OpenUserJS#1208 *(vaguely recall this was on the script homepage originally and moved to source code page)*. Needed for OpenUserJS#1548 to calm network traffic issues which appear to be global with Level3. Over 17,000 sites are down according to pingdom.
Martii added a commit that referenced this issue Dec 15, 2018
* Link in the FAQ for this

Post #944 #970 #389 ... missed somewhere around #976 to #1208 *(vaguely recall this was on the script homepage originally and moved to source code page)*. Needed for #1548 to calm network traffic issues which appear to be global with Level3. Over 17,000 sites are down according to pingdom.

Auto-merge
Martii commented Dec 16, 2018

Had a friend in Wisconsin (say cheese ;) try it and it's not loading for him either.

Here's my traceroute:

$ time traceroute openuserjs.org
traceroute to openuserjs.org (104.236.255.50), 30 hops max, 60 byte packets
 1  *(Intranet)*  0.614 ms  0.646 ms  0.672 ms
 2  *(ISP)*      10.865 ms  10.872 ms  16.676 ms
 3  *(ISP hop)*  20.595 ms  20.588 ms  20.592 ms
 4  *(ISP hop)*  17.291 ms  17.301 ms  17.815 ms
 5  *(ISP hop)*  16.543 ms  17.109 ms  17.116 ms
 6  *(ISP City hop)*.Level3.net 17.123 ms  16.745 ms  16.750 ms
 7  ae-2-3602.ear4.Newark1.Level3.net (4.69.211.181)  64.112 ms  59.858 ms  59.861 ms
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *

real    0m25.048s *(Time to site way too long for these requests)*
user    0m0.003s
sys     0m0.004s

Of the sites monitored by https://livemap.pingdom.com/, the outage count is up to ~18,000 now, but the Level3 map URL is more specific to why this issue was opened.

Martii commented Dec 16, 2018

I don't believe this is a MongoLabs or AWS connection issue, as dev is working at normal speed. Thinking the internet backbone is damaged via Level3 (owned by CenturyLink now... that probably answers why it's broken).

Anyhow... no new status updates. Spent a wonderful time with my ISP confirming that they deny anything is problematic with their service (and cross-referenced with a different provider out of state).


Temporarily downgraded node... not the issue. Also tested a rollback to about a85b989... no change.

Martii commented Dec 16, 2018

Some summary info our VPS provider had me run with some tests:

  1. Level3 is having packet loss on immediate and sustained testing. DSL and some cable ISPs will be affected by this. Haven't tested cell service other than attempted visits to the site, which didn't work most of the time from iOS or Android.
  2. AWS (RAW storage) has some loss after lengthy sustained testing. They probably just don't like me running a sustained test, although if enough people are visiting the site it's a more accurate result.
  3. MongoLabs (BSON storage) has some loss after lengthy sustained testing. Again, they probably just don't like me running a sustained test, although if enough people are visiting the site it's a more accurate result. (An app-level latency check is sketched at the end of this comment.)

I've triple checked the firewall. Functional and okay.

Still awaiting any further suggestions including some preferred results... until then it's the waiting game. Ughh.

Still sitting around 1% to 9% process usage (a few spikes when downloading certain items, but nominal otherwise). Removed the letsencrypt package and that seemed to improve it from 3% (or there are just more people trying). Google Public DNS is still hammering UDP (plus I'm trying it myself instead of my ISP's... no perceptible difference from native). I'd block it, but then anyone using it wouldn't be able to reach us. Heh.
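
Since the sustained tests above are at the network level, the same MongoLabs endpoint can also be timed at the app level from the VPS with the MongoDB Node driver. This is a throwaway sketch only, not our tooling; MONGO_URI is a placeholder for the real connection string and the sample count is arbitrary:

// ping-mongo.js -- rough app-level round-trip timing to the BSON storage host.
'use strict';

const { MongoClient } = require('mongodb');

async function main() {
  // Placeholder URI; the real MongoLabs connection string lives in the environment.
  const client = await MongoClient.connect(process.env.MONGO_URI);
  const db = client.db();

  for (let i = 0; i < 20; i++) {
    const start = Date.now();
    await db.command({ ping: 1 }); // cheap server round trip
    console.log('ping ' + i + ': ' + (Date.now() - start) + 'ms');
  }

  await client.close();
}

main().catch(function (aErr) {
  console.error(aErr);
  process.exit(1);
});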

Martii commented Dec 17, 2018

When it rains it pours... yet another issue bleh. (cascading is my guess)

Should rule out node itself by using nvm atm, e.g. gcc issues with a system-compiled node; now using precompiled node. gcc (Debian 4.9.2-10+deb8u2) 4.9.2 is different than back in March, and I do recall an extra notice within the last few months saying the distro was adding this for some compatibility check.

Passed all tests back then and even now... however, since this issue appeared out of "thin air" I'm led to believe we need a VPS migration to a newer distro. It's near the holidays and I'm super busy, but I'll try to squeeze something in after @sizzlemctwizzle responds. Will try the project in a VM first though, as I've had that prepped since March. Until then this is BLOCKING, which means sizzle has to unblock it when the migration occurs.

Found that the async dep is lagging on .parallel and .series. A test-point console dropped in here showed it definitely lagging there. Tried async@next, mentioned in caolan/async#1589, and ended up with the same lagging issue... so possibly not a dep issue.
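
For reference, the kind of test-point instrumentation mentioned here can be done by wrapping each task handed to async.parallel so the slow one identifies itself. This is a generic sketch with made-up task names and delays, not the project's actual code:

'use strict';

const async = require('async');

// Wrap a task so its elapsed time is logged when its callback fires.
function timed(aLabel, aTask) {
  return function (aCallback) {
    const start = Date.now();
    aTask(function (aErr, aResult) {
      console.log(aLabel + ': ' + (Date.now() - start) + 'ms');
      aCallback(aErr, aResult);
    });
  };
}

async.parallel([
  timed('taskA', function (aCallback) { // e.g. a MongoLabs query
    setTimeout(aCallback, 50);
  }),
  timed('taskB', function (aCallback) { // e.g. an AWS metadata fetch
    setTimeout(aCallback, 120);
  })
], function (aErr) {
  console.log('all tasks done', aErr || '');
});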

Martii added the BLOCKING (Concentrate on bug fixes primarily.) label Dec 17, 2018
brazenvoid commented Dec 17, 2018

Facing 2-3 minutes of load time on every page over a VPN-secured connection through Iceland from Pakistan. I'm on a 20 Mbps up/down fiber connection from a Tier 2 ISP.

Tracert:

  1   189 ms   189 ms   189 ms  10.8.8.1
  2   189 ms   192 ms   191 ms  185.159.158.252
  3   225 ms   228 ms   226 ms  be-2-ver.peer1tc2.ams.nl.is1net.net [31.15.113.5]
  4   224 ms   224 ms   256 ms  ix-xe-4-1-1-0.tcore1.av2-amsterdam.as6453.net [195.219.194.109]
  5   294 ms   295 ms   292 ms  if-ae-2-2.tcore2.av2-amsterdam.as6453.net [195.219.194.6]
  6   296 ms   295 ms   299 ms  if-ae-14-2.tcore2.l78-london.as6453.net [80.231.131.160]
  7   302 ms   300 ms     *     if-ae-4-2.tcore2.n0v-new-york.as6453.net [80.231.131.5]
  8   296 ms   295 ms   294 ms  if-ae-2-2.tcore1.n0v-new-york.as6453.net [216.6.90.21]
  9   300 ms   300 ms   299 ms  if-ae-7-2.tcore1.nto-new-york.as6453.net [63.243.128.25]
 10   298 ms   302 ms   547 ms  if-ae-9-2.tcore1.n75-new-york.as6453.net [63.243.128.122]
 11   605 ms     *      360 ms  66.110.96.22
 12     *        *        *     Request timed out.
 13     *        *        *     Request timed out.
 14   306 ms   322 ms   306 ms  104.236.255.50

Martii commented Dec 17, 2018

@brazenvoid
Appreciate that report. Cleaning up my test VM before I create a new VPS with the next distro version; want to make sure I'm not wasting money or too much time. So far in the VM it's not lagging, but that's only using the dev DBs/system. Local pro has to be set up and is quite lengthy to do right. The real, final test will have to be on production, unfortunately. The VM I created mostly mirrors what we have on production.

Thanks for the continued patience. This is going to be interesting to get done in between the holiday stuff I have planned, because I'll be AFK for quite a bit of it... so please continue to be patient. This might take a couple of weeks since I don't have full access. Sorry... but I'm trying. :) Until then just try to use the site as is... that's the best recommendation at the moment.

Martii commented Dec 17, 2018

Hmmm, the new distro's MongoDB repo only has MongoDB 4... that can present a problem/delay with express-brute-mongo.

Martii commented Dec 17, 2018

OOOH... not good... local pro is lagging with the next version of the distro in the VM. Guess I'll need to add that async test back in a local branch. It is about peak internet time, so that could account for a little of it too. Will do some more thorough testing.


  • express-brute-mongo (our fork) seems okay with MongoDB 4. Forced a detection and it wrote to the DB.

Martii commented Dec 17, 2018

Okay... pro was dragging AWS and MongoLabs down for local pro. Killed pro, and local pro is at top speed (my usual perusal of the site) atm. The lurking Level3 issue in my outbound network remains as a possible additional factor.


Restored pro to "online-expected".

Martii commented Dec 17, 2018

@sizzlemctwizzle

I'm still recommending the VPS upgrade to a new one, esp. to (hopefully) resolve this issue. Adding the extra security will take a bit more time on the VPS, but once it's all in place I can't move the DNS; that's something you'll need to do. Please stick with IPv4 for now; some of our deps don't currently do well with IPv6. Plus this will cost some extra during the setup-to-migration period with them. I'm building a list of what needs to be done. We really need the next level up with more vCores, which is more moolah per month.

I'm about beat from lack of sleep so I'll await your response(s) please.

Martii commented Dec 18, 2018

Hmmm... looks like I have the DNS access changed over to me... we'll know for sure in ~24 to 48 hours.

Martii commented Dec 18, 2018

So before the DNS propagated to me, it was fine using the direct IP. Granted, no one else had that IP. There are very few things left that I can think of.

The server was recreated in the same data center but with better CPU/mem/SSD stats. I can try to migrate it to another data center, but I'd have to do a backup first (plus another IP change and some other details too).

  • Still have that Level3 issue
  • Might still have that async issue. But at least gcc incompatibilities should be ruled out as well as server stats.

I'm also needing a break after 12 hours of this.

P.S. It's still lagging, but lagging quicker, heh. Note the pingdom URL is up to > 21,000 atm.


Misc notes:

  • If I hit https://openuserjs.org/about a few dozen times it's fine, so far, because it never contacts AWS/MongoLabs... It seems like async stalls when it does visit our storage. I'll have to ponder this some more.
  • Tried bluebird out on local pro on the off chance that it's incompatible with native Promises in node, since async uses bluebird as a devDependency... no difference. (A native-Promise comparison is sketched below this list.)
  • Killed pro again and ran local pro... full speed ahead again... started pro again and local pro started lagging. :\
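
To take the async dep out of the picture entirely, one of the test routes can be rewritten with util.promisify and native Promise.all and timed the same way; if the stalls persist, the dep is exonerated. A rough sketch with hypothetical stand-in tasks, not what's actually in the tree:

'use strict';

const util = require('util');

// Hypothetical callback-style tasks standing in for the real storage calls.
function fetchFromMongo(aCallback) { setTimeout(function () { aCallback(null, 'bson'); }, 80); }
function fetchFromAws(aCallback) { setTimeout(function () { aCallback(null, 'raw'); }, 120); }

const fetchFromMongoP = util.promisify(fetchFromMongo);
const fetchFromAwsP = util.promisify(fetchFromAws);

async function run() {
  console.time('Promise.all');
  await Promise.all([fetchFromMongoP(), fetchFromAwsP()]);
  console.timeEnd('Promise.all');
}

run();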

Martii commented Dec 18, 2018

Misc test note

Local pro test (for those new here, this means I'm not using our VPS provider's hosting, although actual pro(duction) is running atm) with this diff; the async dep at this code-block point has already contacted MongoLabs and succeeded in the callback, but somehow trips up async now. (This code-block point hasn't changed in quite some time):

diff --git a/controllers/user.js b/controllers/user.js
index 37ec725..3fa8f27 100644
--- a/controllers/user.js#b914392
+++ b/controllers/user.js
@@ -390,6 +390,9 @@ exports.userListPage = function (aReq, aRes, aNext) {
 
     async.parallel([
       function (aCallback) {
+          
+        console.timeEnd('userListPage()');
+          
         if (!!!options.isFlagged || !options.isAdmin) {  // NOTE: Watchpoint
           aCallback();
           return;
@@ -440,6 +443,9 @@ exports.userListPage = function (aReq, aRes, aNext) {
   tasks.push(execQueryTask(userListQuery, options, 'userList'));
 
   //---
+  
+  console.time('userListPage()');
+  
   async.parallel(tasks, asyncComplete);
 };
 
@@ -462,6 +468,9 @@ exports.view = function (aReq, aRes, aNext) {
 
       async.parallel([
         function (aCallback) {
+            
+          console.timeEnd('view() ' + username);
+            
           if (!options.isAdmin) {  // NOTE: Watchpoint
             aCallback();
             return;
@@ -525,6 +534,9 @@ exports.view = function (aReq, aRes, aNext) {
     tasks = tasks.concat(stats.getSummaryTasks(options));
 
     //---
+    
+    console.time('view() ' + username);
+    
     async.parallel(tasks, asyncComplete);
   });
 };

... produces this output on some random clicks of users and user list:

view() -JesperJod: 332.085ms
view() -hoverboard: 69.237ms
view() -_ArmandLevas: 63.867ms
view() -lavienrose: 64.703ms
view() 00000H: 67.742ms
view() 0097gvk: 71.954ms
view() 04MR17: 75.263ms
view() 01018575475: 69.633ms
view() 007: 70.182ms
view() -mg-: 67.090ms
userListPage(): 405.968ms
view() 1544cman2000gmail.com: 70.499ms
view() 1solutions: 32389.779ms
view() 160004000: 25381.381ms
view() 1solutions: 64.616ms

... Some are quick... some are realllllllllllllly slow.

Same test with some more:

userListPage(): 134.631ms
userListPage(): 139.506ms
userListPage(): 109.424ms
userListPage(): 148.130ms
userListPage(): 147.365ms
userListPage(): 132.894ms
userListPage(): 136.901ms
userListPage(): 125.249ms
userListPage(): 129.335ms
userListPage(): 127.833ms
userListPage(): 119.766ms
userListPage(): 135.394ms
userListPage(): 132.066ms
userListPage(): 234.387ms
userListPage(): 769.262ms
userListPage(): 121.794ms
userListPage(): 111.775ms
userListPage(): 122.818ms
userListPage(): 108.172ms
userListPage(): 388.980ms
userListPage(): 352.293ms
userListPage(): 149.513ms
userListPage(): 172.532ms
userListPage(): 150.295ms
userListPage(): 594.304ms
userListPage(): 221.400ms
userListPage(): 195.818ms
userListPage(): 174.826ms
userListPage(): 198.745ms
userListPage(): 204.019ms
userListPage(): 135.008ms
userListPage(): 105.093ms
userListPage(): 112.189ms
userListPage(): 93.537ms
userListPage(): 111.668ms
userListPage(): 235.212ms
userListPage(): 259.961ms
userListPage(): 304.103ms
userListPage(): 318.409ms
userListPage(): 489.174ms
userListPage(): 582.791ms
userListPage(): 139.654ms
userListPage(): 122.359ms
userListPage(): 771.824ms
userListPage(): 601.540ms
userListPage(): 199.029ms
view() 93Akkord: 19528.984ms
view() 99aintenough: 9438.322ms
view() 9tfall: 111.546ms
view() 9kopb: 69.801ms
view() AAK: 66.806ms
view() ADRENALINE1234: 71.058ms
view() AJMansfield: 24289.024ms

... the last 10 or so userListPage() entries are actually different pages... the ones above them are the same exact page over and over (page refreshes like a human could do).
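
One more way to narrow whether the stall is inside the Mongo query itself or in the bookkeeping after it: mongoose's debug hook logs every collection call as it is issued, which can then be lined up against the userListPage()/view() timers above. Sketch only, assuming the models go through mongoose:

'use strict';

const mongoose = require('mongoose');

// Log each driver call with a timestamp so it can be matched against
// the console.time output from the diff above.
mongoose.set('debug', function (aCollection, aMethod) {
  console.log(new Date().toISOString() + ' ' + aCollection + '.' + aMethod + '()');
});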

Martii commented Dec 19, 2018

Finally!!! Found evidence and confirmation of a network issue on production (from the VPS to MongoLabs):

2018-12-18 13:30:17.100 +00:00: Group rating NOT updated aErr := MongoNetworkError: connection 10 to *clipped*.mongolab.com:*portclipped* timed out aGroup := undefined
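
Given that timeout, the timeouts and pool size on the MongoLabs connection are worth a look; with a flaky Level3 path in between, conservative defaults can kill a socket that would otherwise recover. A hedged sketch of the sort of options involved (the option values and the CONNECT_STRING placeholder are illustrative for a 2018-era mongoose, not our committed config):

'use strict';

const mongoose = require('mongoose');

// Placeholder URI; the real MongoLabs connection string is in the environment.
mongoose.connect(process.env.CONNECT_STRING, {
  poolSize: 10,            // more sockets so one stalled connection hurts less
  connectTimeoutMS: 30000, // initial connection handshake
  socketTimeoutMS: 60000,  // idle/stalled socket before a MongoNetworkError fires
  useNewUrlParser: true
});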

Martii added a commit to Martii/OpenUserJS.org that referenced this issue Dec 19, 2018
* This didn't seem to help in a direct test on production but doing since it's the logical thing to do with our current process manager.

NOTE(S):
* We don't currently have clustering management in the project itself but may at some point in the future so this could eventually use some improvement. Trying to keep this simple at start for everyone.


Applies to OpenUserJS#1548
Martii added a commit to Martii/OpenUserJS.org that referenced this issue Dec 19, 2018
* This value is per thread otherwise it would be huge in each thread. Durr.

Applies to OpenUserJS#1548
Martii added a commit to Martii/OpenUserJS.org that referenced this issue Dec 19, 2018
* Since commit notes said it didn't seem to help let's try tripling the multiplier

Applies to OpenUserJS#1548
Martii added a commit that referenced this issue Dec 19, 2018
* This didn't seem to help in a direct test on production but doing since it's the logical thing to do with our current process manager.

NOTE(S):
* We don't currently have clustering management in the project itself but may at some point in the future so this could eventually use some improvement. Trying to keep this simple at start for everyone.
* Go back to the original multiplier
* This value is per thread otherwise it would be huge in each thread. Durr.
* Try triple multiplier
* Since commit notes said it didn't seem to help let's try tripling the multiplier

Applies to #1548
Martii commented Dec 19, 2018

Misc test note

I temporarily audited URL usage (on pro) this morning for about 4 minutes on a single thread, and most of the requests are to .meta.js and .user.js, which hit AWS and not MongoLabs.
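
For context, an audit like this can be done with a tiny Express middleware that tallies path suffixes over a short window. A throwaway sketch (the suffix buckets are assumptions based on the routes named above, not our actual audit code):

'use strict';

// Throwaway counter middleware; mount before the routes, read the log after a few minutes.
const counts = Object.create(null);

function auditUrls(aReq, aRes, aNext) {
  let bucket = 'other';

  if (/\.meta\.js$/.test(aReq.path)) {
    bucket = '.meta.js';
  } else if (/\.user\.js$/.test(aReq.path)) {
    bucket = '.user.js';
  }

  counts[bucket] = (counts[bucket] || 0) + 1;
  aNext();
}

// Dump the tallies once a minute; unref so it doesn't hold the process open.
setInterval(function () {
  console.log(JSON.stringify(counts));
}, 60 * 1000).unref();

module.exports = auditUrls;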

Martii added a commit to Martii/OpenUserJS.org that referenced this issue Dec 19, 2018
* Delete op retested
* Please read their CHANGELOG... however it's a bug fix that we're looking for and there is one. May apply to OpenUserJS#1548
Martii mentioned this issue Dec 19, 2018
Martii added a commit that referenced this issue Dec 19, 2018
* Delete op retested
* Please read their CHANGELOG... however it's a bug fix that we're looking for and there is one. May apply to #1548

Auto-merge
Joeviocoe commented:

The main site is still sluggish for me (several minutes to load)... but requests for .user.js (AWS, not Mongo?) are completely unresponsive: nothing but HTTP 429 (Too Many Requests) 95% of the time, and the other 5% return HTTP 444 (unknown).
Is the AWS instance rate-limited like this? Also, there's no "Retry-After" header in the response.
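
If the 429s are coming from our own throttling layer rather than AWS, a Retry-After header is easy to emit from whatever sends the 429, which at least tells well-behaved clients when to come back. A minimal Express-style sketch, assuming a plain middleware (not our actual rate-limit code; the limit check is hypothetical):

'use strict';

// Hypothetical throttle check; stands in for whatever decides a request is over the limit.
function isOverLimit(aReq) {
  return false; // real logic lives elsewhere
}

function throttle(aReq, aRes, aNext) {
  if (isOverLimit(aReq)) {
    aRes.set('Retry-After', '120'); // seconds until the client should retry
    aRes.status(429).send('Too Many Requests');
    return;
  }
  aNext();
}

module.exports = throttle;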

Martii added a commit that referenced this issue Jul 1, 2019
* Change a few classes around for UI coloring and display
* Shows exclamation on script homepage and script lists that there **may** be user initiated, system initiated, etc. notices on the Install button or Raw Source button e.g. the Source Code tab contents.
* Few mustache identifier name changes for symmetry

Applies to #1548 #432

Auto-merge
Martii commented Jul 1, 2019

Installing script ends up in a white page with url

This is still true for lockdown, but there is some advance warning now beyond the about page. The blue Install button becomes orange-ish if there are any "Source Code notices", including script source lockdown. All may still be found in the dropdown.

Martii added a commit to Martii/OpenUserJS.org that referenced this issue Jul 1, 2019
* Little less prominent for warnings vs. possible critical issue
* More in line with the docs that mostly say "blue install button"
* Add `updateURL` check for all modes and display if present
* Reorder the UI notices a bit.
* Some line length conformance


Post OpenUserJS#1632 and applies to OpenUserJS#1548 OpenUserJS#432
Martii added a commit that referenced this issue Jul 1, 2019
* Little less prominent for warnings vs. possible critical issue
* More in line with the docs that mostly say "blue install button"
* Add `updateURL` check for all modes and display if present
* Reorder the UI notices a bit.
* Some line length conformance


Post #1632 and applies to #1548 #432

Auto-merge
Martii added a commit to Martii/OpenUserJS.org that referenced this issue Jul 2, 2019
* Chromium 75.0.3770.90 started spewing this out and it's not in *mime-db* dep (yet?)... Relates to `/install/<username>/<scriptname>.meta.js`. Don't think it has an extension spec based off skimming doc
* Relaxing is temporary atm in lieu of more aggressive re

Post OpenUserJS#1633 OpenUserJS#1632 OpenUserJS#944 and applies to OpenUserJS#1548 OpenUserJS#432
Martii added a commit that referenced this issue Jul 2, 2019
* Chromium 75.0.3770.90 started spewing this out and it's not in *mime-db* dep (yet?)... Relates to `/install/<username>/<scriptname>.meta.js`. Don't think it has an extension spec based off skimming doc
* Relaxing is temporary atm in lieu of more aggressive re

Post #1633 #1632 #944 and applies to #1548 #432

Auto-merge
Martii added a commit to Martii/OpenUserJS.org that referenced this issue Dec 4, 2019
* Please read their CHANGELOGs
* Delete op retested
* Post OpenUserJS#1628 partial rollback now that v12.x is LTS and no longer the issue in OpenUserJS#1548
Martii added a commit that referenced this issue Dec 4, 2019
* Please read their CHANGELOGs
* Delete op retested
* Post #1628 partial rollback now that v12.x is LTS and no longer the issue in #1548

Auto-merge
Martii added the SOFT BLOCKING (Removable by Active Maintainer but usually blocking new features.) label and removed the BLOCKING (Concentrate on bug fixes primarily.) label Aug 18, 2020
Martii commented Aug 18, 2020

Created a new label for this since I was the one who put it on... it still means we should work on debugging, but an Active Maintainer can remove it.

Martii added a commit to Martii/OpenUserJS.org that referenced this issue Dec 28, 2021
* Add another default. This is a breaking change for third-party instances so version bump

Closes OpenUserJS#1745 and applies to OpenUserJS#1548

NOTE:
* Issue OpenUserJS#1745 has been hung for over a year. If needed may be reopened by appropriate personnel.
Martii added a commit that referenced this issue Dec 28, 2021
* Add another default. This is a breaking change for third-party instances so version bump

Closes #1745 and applies to #1548

NOTE:
* Issue #1745 has been hung for over a year. If needed may be reopened by appropriate personnel.

Auto-merge
Martii added a commit to Martii/OpenUserJS.org that referenced this issue Dec 2, 2022
* Libraries will be affected in lockdown atm... first usage.
* Not sure visible (graceful) messages are needed but we'll give it a whirl atm.

Post OpenUserJS#944 OpenUserJS#1548
Martii added a commit that referenced this issue Dec 2, 2022
* Libraries will be affected in lockdown atm... first usage.
* Not sure visible (graceful) messages are needed but we'll give it a whirl atm.

Post #944 #1548

Auto-merge
Martii added a commit to Martii/OpenUserJS.org that referenced this issue Dec 4, 2022
* Second usage
* Autoban comes much sooner for bad actors


Post OpenUserJS#944 OpenUserJS#1548
Martii added a commit that referenced this issue Dec 4, 2022
* Second usage
* Autoban comes much sooner for bad actors


Post #944 #1548

Auto-merge
Martii added a commit to Martii/OpenUserJS.org that referenced this issue Feb 26, 2023
Martii added a commit that referenced this issue Feb 26, 2023
* More can be done if needed

Post #944 #1548

Auto-merge