Riot-web is very greedy on RAM (and can make chrome crash) #3307

Closed
richvdh opened this issue Feb 23, 2017 · 67 comments · Fixed by matrix-org/matrix-js-sdk#395
Labels: A-Memory (Memory leaks, leak hunting tools) · P2 · S-Major (Severely degrades major functionality or product features, with no satisfactory workaround) · T-Defect

@richvdh
Member

richvdh commented Feb 23, 2017

(where crash = sad folder of segfaultyness)

@richvdh
Member Author

richvdh commented Feb 23, 2017

(Chrome 55.0 on linux)

@jfrederickson

If it's actually a segfault, that should probably be reported upstream to Chromium.

@richvdh
Member Author

richvdh commented Feb 23, 2017

Attempting to send a rageshake is particularly likely to cause a crash, ironically. strace says the Chrome process gets a SIGILL.

@richvdh
Member Author

richvdh commented Feb 23, 2017

Seems to be something to do with memory usage. #3311 may help.

@ara4n
Member

ara4n commented Feb 27, 2017

Mine crashes pretty much every time I unsleep my computer - it's like #2951 but worse. It's unclear whether it's the rageshaking or the new sync stuff that's leaking.

@ara4n ara4n added T-Defect S-Critical Prevents work, causes data loss and/or has no workaround P1 X-Release-Blocker labels Feb 27, 2017
@richvdh
Member Author

richvdh commented Mar 14, 2017

For the record, #3311 helped but didn't solve this.

@ara4n
Member

ara4n commented Mar 16, 2017

I briefly profiled it yesterday and saw 110K RoomMembers being allocated in the space of 20 minutes. As the Chrome dev tools have no API, it's hard to do a statistical analysis to see where these are coming from; kegan suggested we may be allocating new RoomMember objects for each and every event we receive.

@kegsay
Contributor

kegsay commented Mar 16, 2017

I'm operating under the assumption this is an OOM.

kegan suggested we may be allocating new RoomMember objects on each and every event we're receiving

I tested locally and see no evidence of this (for both sent and received messages).


Did a heap snapshot of my Riot. I have a 435MB heap, of which:

  • 254MB (58%) are retained by 209 Room objects. Of this 254MB:
    • 230MB (90%) are retained by 420 RoomState objects (all these objects are also retained by Room objects).
  • 41MB (9.4%) are retained by the SyncAccumulator.
  • 26MB (6%) are retained by User objects.
  • 114MB (26%) are retained by other junk.

The low-hanging fruit superficially looks to be Room objects, so I'm going to see exactly how this breaks down. STAY TUNED FOR MORE UPDATES 🎶
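(For comparison against a heap breakdown like the one above, a rough console sketch follows. It assumes a matrix-js-sdk client instance is reachable as `client`, which is an assumption for illustration and not part of the snapshot tooling used here.)

```js
// Rough, illustrative console sketch (not the snapshot tooling used above):
// given a matrix-js-sdk client instance reachable as `client` (an assumption
// for this sketch), list the rooms holding the most RoomMember objects so the
// heap breakdown can be compared against room membership sizes.
const roomMemberCounts = client.getRooms()
  .map((room) => ({
    name: room.name,
    roomId: room.roomId,
    members: room.currentState.getMembers().length,
  }))
  .sort((a, b) => b.members - a.members);

console.table(roomMemberCounts.slice(0, 10)); // top 10 rooms by member count
```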

@kegsay
Contributor

kegsay commented Mar 16, 2017

So the Room object distribution is, unsurprisingly, heavily skewed: there are a few huge rooms and most are small (all sizes in this post are retained sizes, NOT shallow sizes):

  • #go-nuts (49MB)
  • #rust (49MB)
  • Matrix HQ (44MB)
  • #ipfs (25MB)
  • Riot (17MB)
  • #rust-internals (10MB)
  • Riot Android (6MB)
  • and a long tail from 5MB down to 25KB

Note how this set of channels doesn't include high-traffic rooms like #matrix-dev/Core/Internal; memory use seems to be correlated with member count more than anything else. Looking at #go-nuts (49MB):

  • 30MB (61%) Room.currentState
  • 18MB (37%) Room.oldState
  • 1MB (2%) other

So making RoomState use less memory looks to be the easiest win here. The currentState broken down (30MB):

  • 15.33MB (51%) RoomState.members - 400KB of which are simply the user ID keys.
  • 2.4MB (8%) RoomState._sentinels
  • 2.2MB (7.3%) RoomState._displayNameToUserIds
  • 400KB (1.3%) RoomState.events
  • 393KB (1.3%) RoomState._userIdsToDisplayNames - NB: the values in this map are all 1-byte values (internalized undefineds)

Superficially, it appears that V8 does not just store pointers to the same repeated string (room IDs, user IDs), but instead stores the entire string again. This is why the maps appear to be so large. The members map has 10,163 entries, or 15,330KB/10163 = ~1.5KB each. Of this 1.5KB:

  • 1.3KB (87%) _events (from EventEmitter)

Reducing the number of maps we have may go a long way here, or perhaps we could somehow coerce V8 into storing pointers to strings instead (e.g. via String, the object wrapper around string literals)?
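As a hedged illustration of the interning idea (not the js-sdk's actual implementation; matrix-org/matrix-js-sdk#615, mentioned further down, is the real change), routing repeated values through a small cache makes every holder reference one shared string instance:

```js
// Minimal string-interning sketch (illustrative only): repeated values such
// as "join"/"leave", user IDs and room IDs are looked up in a cache so that
// each RoomMember/RoomState field holds a reference to one shared string
// rather than its own copy.
const internCache = new Map();

function intern(str) {
  if (typeof str !== "string") return str;
  const cached = internCache.get(str);
  if (cached !== undefined) return cached;
  internCache.set(str, str);
  return str;
}

// Hypothetical usage when ingesting a membership event:
// member.membership = intern(event.getContent().membership);
// member.userId = intern(event.getStateKey());
```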

@kegsay
Contributor

kegsay commented Mar 20, 2017

I had an epiphany on the way in today: what happens to the RoomState objects in the case of a limited sync? I wondered about this because Ara said:

Mine crashes pretty much every time I unsleep my computer

and I thought there was significance to the "unsleep" part. What the JS SDK does is preserve where it got up to (room state and all) and then create a new copy of the room state for the new data, so each dis-contiguous timeline corresponds to two RoomState objects: old and current, one for either end of the timeline.

This scales extremely poorly when a single RoomState object is between 5-20MB and you get ~50 limited responses on unsleeping your laptop... I have now "fixed" this in matrix-org/matrix-js-sdk#395 by not holding onto the old dis-contiguous timelines but instead dropping them on the floor. This has helped dramatically.
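Conceptually, the #395 change amounts to something like the sketch below (an illustration under assumptions, not the actual diff; the exact js-sdk entry points and signatures may differ):

```js
// Conceptual sketch of the fix in matrix-org/matrix-js-sdk#395 (not the real
// diff): when /sync reports a gappy ("limited") timeline for a room, reset the
// live timeline rather than starting a fresh timeline alongside the old one,
// so the old dis-contiguous timelines (and the RoomState pairs they retain)
// can be garbage-collected. The exact `resetLiveTimeline` signature here is
// an assumption for illustration.
function handleRoomTimeline(room, timelineData) {
  if (timelineData.limited) {
    // Drop the old timeline on the floor instead of keeping it around.
    room.resetLiveTimeline(timelineData.prev_batch);
  }
  // ...then append timelineData.events to the (possibly new) live timeline.
}
```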

@kegsay
Contributor

kegsay commented Mar 20, 2017

It's worth noting that the IndexedDB store may have exacerbated this. The store is synced every 5 minutes. If there is enough traffic in a room (strictly speaking, >20 messages within 5 minutes: hi2u freenode channels) then on reload of the page it will get a limited sync back for that room, doubling the amount of space it would've taken up had it done a fresh /sync. This may be why we've only noticed this now.

@ara4n
Member

ara4n commented Mar 27, 2017

Current status: @kegsay fixed the main memory leak here (due to old state hanging around); however, @richvdh and @erikjohnston are still seeing random crashes even immediately after boot. They also can't even take a heap snapshot immediately after boot, which suggests their heaps are mysteriously full of something or other. @kegsay hypothesises that this might be a bad extension, as nobody else has this problem apart from rich & erik, who are both running an adblocker of some kind. The next step is to get erik to disable his adblocker and see if everything gets fixed.

@richvdh
Member Author

richvdh commented Mar 29, 2017 via email

@ara4n
Member

ara4n commented Mar 29, 2017 via email

@kegsay
Contributor

kegsay commented Mar 29, 2017

Nothing exciting. He has a 410MB heap on startup (I have 380MB on startup and have no trouble snapshotting). No explosion of RoomState objects either. The only thing that looked odd was that some objects have references to chrome:// "things", which I hadn't seen before; hence my suspicion of extensions. Perhaps this is actually a dev tools add-on instead? I know that React dev tools do things like sending stuff over postMessage, so they are fairly intrusive.

@richvdh
Member Author

richvdh commented Mar 29, 2017

React dev tools are certainly a memory and CPU sink once they are active, but I rather thought they didn't do much until the React dev tools tab was opened.

I've got a few snapshots from Friday and today, which I can share if it is of interest, but I didn't see a smoking gun.

Whilst extensions don't seem to make the difference between being able to snapshot and not, perhaps they do make the difference between the tab crashing under normal use and not. I have everything disabled for now; will keep using the session for a while and see what occurs.

@ara4n
Member

ara4n commented Mar 30, 2017

@richvdh has it gone bang yet? might be worth running in safe mode as well as disabling all extensions?

@richvdh
Member Author

richvdh commented Mar 30, 2017

Sorry. Yes, disabling extensions didn't help - it still goes bang after a few hours.

I haven't tried just leaving Riot running, but I also can't take a heap snapshot in safe mode, nor in a chrome instance in a completely new profile, nor under chromium. I was able to take a heap snapshot on @kegsay's machine having logged in with my account, so this appears to be specific to my machine rather than my account.

@lampholder
Member

We think the recent webworker-for-indexeddb improvements have resolved this, so closing.

@richvdh
Member Author

richvdh commented Oct 1, 2017

that is incredibly weird then. i wonder if a tonne of events came in on other rooms? or device updates? is there anything in the logs at all for that timeframe?

There is very little in the logs at that time tbh.

@ghost

ghost commented Oct 3, 2017

I'm experiencing similar behaviour in Chrome Version 61.0.3163.100 (Official Build) (64-bit), on both /app and /develop. No sudden crashes though, just absurdly high memory usage which seems strongly correlated with the rooms I'm in and their size.

Not sure how to provide any useful data for analysis, but if you need me to do anything I'm happy to oblige.

@ara4n
Member

ara4n commented Nov 6, 2017

I've been seeing this getting much worse in the last few weeks; every time I unsleep Riot after leaving it overnight I get a flurry of unresponsive warnings. I'll take a look.

@ara4n ara4n assigned ara4n and unassigned dbkr Nov 6, 2017
@richvdh
Member Author

richvdh commented Nov 6, 2017

what's the logic behind thinking that an unresponsive warning is the same problem as an excess of VmData?

@ara4n
Member

ara4n commented Feb 20, 2018

So I did a bunch of memory profiling last week - the summary from https://matrix.to/#/!DdJkzRliezrwpNebLk:matrix.org/$1518745116614283uoTaQ:matrix.org was:

  • copy-on-write currentState from oldState (or vice versa)
  • copy-on-write sentinels (from members)
    • lazy-creating sentinels (almost as good as copy-on-write)
  • be more aggressive on interning (14MB of duplication of the strings "join" and "leave" alone!)
  • generate event.date on demand (and hope that doesn't slow rendering rooms down too much; perhaps we could then cache them)
  • ignoring state and members for users who have left (unless we paginate back far enough to view them). This could perhaps be done on the server to save bandwidth too, modulo having to resurrect them at the right point when you call /messages.
  • avoiding having multiple different per-user arrays in each room for things like receipts and displayname-disambiguation lookups, as these can get massive. We want to minimise the amount of data we track per user per room.

Filing this here for ease of reference and to track the bits which @dbkr has been attacking.
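As a rough illustration of the "lazy-creating sentinels" bullet above (a sketch only; the implementation @dbkr landed may differ, and the field names mirror those mentioned earlier in this thread), sentinel members can be cloned on first read instead of being copied for every member on every state update:

```js
// Illustrative sketch of lazily-created sentinel members (not the actual
// RoomState implementation): instead of cloning every RoomMember into
// _sentinels whenever state changes, clone a member only the first time a
// sentinel for that user is actually requested.
class LazySentinelRoomState {
  constructor() {
    this.members = {};    // userId -> RoomMember
    this._sentinels = {}; // userId -> frozen-in-time copy, created on demand
  }

  getSentinelMember(userId) {
    if (!this._sentinels[userId] && this.members[userId]) {
      const member = this.members[userId];
      // Shallow copy preserving the prototype, so the sentinel keeps the
      // RoomMember API but no longer tracks later state changes.
      this._sentinels[userId] = Object.assign(
        Object.create(Object.getPrototypeOf(member)),
        member,
      );
    }
    return this._sentinels[userId] || null;
  }
}
```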

@dbkr
Member

dbkr commented Feb 22, 2018

Data points on how much the above has improved matters:

My test account on Chrome on an ubuntu VM:
before: 870000KB live, 1041000KB total
after: 705000KB live, 801000KB total

(Edit: for comparison, v0.13.5 has 965000KB live and 1200000KB total... so actually higher than both)

Travis: "Memory usage is down to just under 2gb, so about 500-600mb savings" (https://matrix.to/#/!DdJkzRliezrwpNebLk:matrix.org/$15191517123083TkBgH:t2l.io)

Matthew: "dave: my heap is now 775MB (down from 1.2GB when this began :D)" (https://matrix.to/#/!DdJkzRliezrwpNebLk:matrix.org/$1519152400200168zKZpA:matrix.org)

So based on those rough numbers, a saving of somewhere around the 20-25% mark.

@dbkr
Member

dbkr commented Feb 22, 2018

I've also done the 'be more aggressive on interning' thing: matrix-org/matrix-js-sdk#615

@ara4n
Member

ara4n commented Apr 21, 2018

Since Dave's work in Feb we seem to be just about okay again, so dropping the priority.

@ara4n ara4n added P2 S-Major Severely degrades major functionality or product features, with no satisfactory workaround and removed S-Critical Prevents work, causes data loss and/or has no workaround P1 X-Release-Blocker labels Apr 21, 2018
@ara4n ara4n changed the title Riot-web makes chrome crash Riot-web is very greedy on RAM (and can make chrome crash) Apr 21, 2018
@richvdh
Member Author

richvdh commented Aug 8, 2018

I'm no longer seeing frequent crashes. (It still happens occasionally, but...)

I'm not really sure there's any value in a "Riot uses too much memory" bug - that's kind of a given - unless we want to address a specific cause or set of symptoms.

Is this open because the checklist at #3307 (comment) still has outstanding items? If so, it might be worth moving them to a new bug - I think they are going to get lost among the other noise here.

@ara4n
Member

ara4n commented Sep 30, 2018

Lazy loading should have effectively fixed this for good. I have split out the other two issues for posterity.

@ara4n ara4n closed this as completed Sep 30, 2018
@jryans jryans added the A-Memory Memory leaks, leak hunting tools label Feb 4, 2020