
Bug: Seeding binary data + migration causes issues (memory + duration) #21065

Closed
ntziolis opened this issue May 28, 2020 · 9 comments


@ntziolis

We have a report template table with a binary column that holds the templates. We included this binary data in our HasData seed. The total size of all templates is under 2 MB (so really not that much). We had no issues while using drop/create. Recently we moved to migrations, which is when we ran into the following issues:

  • Huuuuge memory consumption
    • On my personal machine it consumed 8 GB and started swapping to disk
  • Very long migration creation time
    • 10-20x longer than with the binary data seed commented out (5-10 min depending on the machine)
  • The issue does not go away and affects every subsequent migration creation, heavily impacting the long-term development experience

Steps to reproduce

  • Create a project with an EF Core context
  • Create an entity class with a byte[] column
  • Use HasData to seed the entity with data
  • Create the initial migration (this already takes a long time and a lot of memory)
  • Create a second migration right after (this is where creation time and memory usage explode)
  • The same goes for any migration created after the second
  • This means this is a

If needed, I can provide a repo, but I wanted to let you know right away.

Further technical details

EF Core version: 3.1.3
Database provider: Microsoft.EntityFrameworkCore.SqlServer
Target framework: 3.1
Operating system: Windows
IDE: VS 2019, but this should not matter, since the dotnet CLI with the EF tools was used to create the migrations

Workaround
If anybody runs into this before it's fixed, here is the workaround that worked for us:

  • We only seed this table during initial creation; no updates down the road are planned (yet)
  • We removed the binary data from our HasData calls
  • We created the initial migration without seeding the binary columns
  • We then manually edited the generated migration to include the binary data
  • This keeps performance and memory usage within the usual limits during migration creation
  • Further migrations are also unaffected by our manual edit
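The core of this workaround is that the binary rows are inserted exactly once, in a dedicated step, instead of living in the model's seed data where every migration has to diff them again. In EF Core that step is the hand-edited migration (typically a migrationBuilder.InsertData call). The sketch below only illustrates the "insert once, never touch again" idea outside EF Core, using Python's built-in sqlite3 as a stand-in; the table report_templates and its columns are hypothetical.

```python
import sqlite3


def seed_templates_once(conn: sqlite3.Connection, templates: dict[str, bytes]) -> int:
    """Insert the binary templates once; later calls are no-ops.

    Returns the number of rows inserted (0 if the table was already seeded).
    """
    # Hypothetical schema standing in for the report template table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS report_templates ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, content BLOB NOT NULL)"
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM report_templates").fetchone()
    if count:
        return 0  # seeded during initial creation; never updated afterwards
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            "INSERT INTO report_templates (name, content) VALUES (?, ?)",
            templates.items(),
        )
    return len(templates)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    print(seed_templates_once(db, {"invoice": b"\x00\x01", "letter": b"\xff"}))  # 2
    print(seed_templates_once(db, {"invoice": b"\x00\x01"}))  # 0
```

Because the check is "table empty?", the sketch matches the assumption stated above that the templates are not updated later; changing a template would need a separate, explicit data step.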
@ntziolis ntziolis changed the title Seeding binary data + migration causes issues (memory + duration) Bug: Seeding binary data + migration causes issues (memory + duration) May 28, 2020
@ajcvickers
Member

@ntziolis After discussion with the team, we think using seed data for this is probably not the way to go, since it's not designed for large amounts of binary data. Your workaround seems okay for now. I'm putting this on the backlog to consider making binary data more efficient here.

@ntziolis
Author

ntziolis commented Jun 2, 2020

Understood, and we're totally OK with using the workaround. In fact, we would be fine with EF Core not supporting binary data seeding at all.

Just want to call out the following again:

  • We are talking about 1.4 MB of data in total already causing this, which is a rather small amount by binary-data standards
  • This means anybody who does any binary data seeding will likely run into this issue
  • We had a fairly complex model when switching from drop/create to EF migrations, and we first suspected certain cyclic relationships. So it took us quite a while to pinpoint the cause as binary data seeding.

To spare others the same fate, I would suggest:

  • Update the documentation with a warning that seeding binary data, while generally possible, is not recommended.
  • (Even better) EF Core could emit a warning when generating a migration that seeds values into a binary column, stating that this is not recommended.
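To make the second suggestion concrete, here is a hypothetical sketch of such a check. It is not an existing EF Core feature: the function name, the row representation (plain dicts), and the 64 KB threshold are all assumptions chosen for illustration. It sums the seeded byte values per column and reports columns above the limit.

```python
from typing import Any, Iterable, Mapping

# Hypothetical threshold; EF Core has no such setting.
DEFAULT_LIMIT_BYTES = 64 * 1024


def binary_seed_warnings(
    entity: str,
    rows: Iterable[Mapping[str, Any]],
    limit: int = DEFAULT_LIMIT_BYTES,
) -> list[str]:
    """Return one warning per seeded binary column whose total size exceeds limit."""
    totals: dict[str, int] = {}
    for row in rows:
        for column, value in row.items():
            if isinstance(value, (bytes, bytearray)):
                totals[column] = totals.get(column, 0) + len(value)
    return [
        f"{entity}.{column}: {size} bytes of binary seed data; "
        "seeding binary columns is not recommended"
        for column, size in sorted(totals.items())
        if size > limit
    ]
```

Summing per column rather than per row catches the case described in this issue, where no single template is large but the seeded total still is.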

@roji
Member

roji commented Jun 2, 2020

Opened dotnet/EntityFramework.Docs#2416 to track updating the docs.

@andrejohansson

Coming from #23118: @roji recommended that I go with custom initialization logic, which per the documentation is fine, and this is what I'm trying to do now.

There is one case I'm unsure how to handle, though: circular references. With the HasData methods I could create anonymous objects and manually set the Id columns to resolve this. But how would I go about it in custom initialization logic, where I don't have the HasData overload that takes objects?

@roji
Member

roji commented Nov 3, 2020

@andrejohansson one common way is to just set the IDs yourself before saving your entities.

@andrejohansson

@roji won't that give me a constraint exception when saving the first entity, since the second one isn't saved yet (chicken and egg)? I'll try...

@ajcvickers
Member

@andrejohansson Typically you would use navigation properties to define relationships, as is normal with EF. If you want to use FK values explicitly, then you can, and as long as the FKs are mapped (as is normal), then EF will order the updates appropriately.
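The general idea behind "EF will order the updates appropriately" is a topological sort over foreign-key dependencies: a row is inserted only after the rows it references. The sketch below is a conceptual illustration at table level using Python's graphlib, not EF Core's actual update pipeline, which is more involved; the table names are hypothetical. It also shows why a true cycle is the chicken-and-egg case: no order satisfies every constraint immediately.

```python
from graphlib import CycleError, TopologicalSorter


def insert_order(fk_targets: dict[str, set[str]]) -> list[str]:
    """Return table names ordered so referenced tables come before referencing ones.

    fk_targets maps each table to the set of tables its foreign keys point to.
    Raises CycleError when the references form a loop, because then no
    insert order satisfies every constraint immediately.
    """
    return list(TopologicalSorter(fk_targets).static_order())


if __name__ == "__main__":
    print(insert_order({"Order": {"Customer"}, "OrderLine": {"Order", "Product"}}))
    try:
        insert_order({"Author": {"Book"}, "Book": {"Author"}})
    except CycleError as err:
        print("cannot order:", err.args[1])  # the nodes forming the cycle
```

When the sort fails like this, the options are the ones discussed in the next comment: defer or temporarily disable the constraint checks.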

@roji
Member

roji commented Nov 3, 2020

Depending on the specific database type (and how you start your transactions), it may be possible to defer the constraint checking until the transaction is committed. Or if the table(s) are only in use by the seeding logic at that point, you can temporarily turn off constraints before seeding and reinstate afterwards.
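Deferring constraint checks until commit, combined with setting the IDs yourself, resolves the circular case. A minimal sketch using SQLite through Python's sqlite3, chosen only because it ships with Python; the author/book tables are hypothetical. Support varies by database: PostgreSQL also offers DEFERRABLE constraints, while on SQL Server (the provider in this issue) the closer option is temporarily turning constraints off, as described above. Without DEFERRABLE INITIALLY DEFERRED, the first INSERT here would fail immediately, because author 1 points at a book that does not exist yet.

```python
import sqlite3


def seed_circular_pair(conn: sqlite3.Connection) -> None:
    """Insert two rows that reference each other, using explicit IDs."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE author (
            id INTEGER PRIMARY KEY,
            featured_book_id INTEGER NOT NULL
                REFERENCES book(id) DEFERRABLE INITIALLY DEFERRED
        );
        CREATE TABLE book (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL
                REFERENCES author(id) DEFERRABLE INITIALLY DEFERRED
        );
        """
    )
    with conn:  # single transaction; deferred FK checks run at COMMIT
        conn.execute("INSERT INTO author (id, featured_book_id) VALUES (1, 10)")
        conn.execute("INSERT INTO book (id, author_id) VALUES (10, 1)")


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    seed_circular_pair(db)
    print(db.execute("SELECT * FROM author").fetchall())
    print(db.execute("SELECT * FROM book").fetchall())
```

By commit time both rows exist, so both deferred checks pass; if either insert were missing, the commit would be rejected instead.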

Or you can do whatever worked for you with seeding: if you were adding a new column to the table (including the constraint), that should work without seeding as well.

@AndriySvyryd
Member

Duplicate of #19710

@AndriySvyryd AndriySvyryd marked this as a duplicate of #19710 Oct 10, 2023
@AndriySvyryd AndriySvyryd closed this as not planned Won't fix, can't repro, duplicate, stale Oct 10, 2023
@ajcvickers ajcvickers removed this from the Backlog milestone Oct 11, 2023

5 participants