Connection between dockerized application and azure sql hangs after ef core upgrade to 3.1 #20316

Closed
Arcanst opened this issue Mar 17, 2020 · 18 comments

@Arcanst

Arcanst commented Mar 17, 2020

Our solution consists of almost 40 projects, 6 of which are runnable applications. In the simplest scenario we only need two applications running: a Web API and a worker, which communicate through RabbitMQ using a command/command-handler pattern. We use netcoreapp3.1 (to which we recently migrated from 2.2) and EF Core 3.1 (to which we migrated afterwards). After the EF Core migration we ran into a strange issue: our worker app simply stops executing code at a random point during our custom seeding (and we can't really do anything without seeding, so that's as far as we got with commands). There is no exception and no timeout. The last log we see from the worker is always EFStatementsLogger.Log : EFInfo=[Executing DbCommand, and the SQL query shown in that log entry never appears in the Azure Data Studio profiler, so it never reaches the database. The query at which the application stops changes from one seeding run to the next, with no discernible pattern, and we've been trying to find the cause for 2 weeks now.

We narrowed the issue down to Linux + EF Core 3.1. Let me quickly walk you through the investigation (each test was repeated, and the results were consistent):

  • running the applications locally without Docker against SQL Express works - this rules out many things, but it's essentially a 'works on my machine' result, so we can't rely on it
  • running the applications locally without Docker against Azure SQL works - rules out Azure SQL
  • running the dockerized applications on Azure against dockerized MSSQL Express running as a container instance - doesn't work
  • running the applications on a Linux VM without Docker against Azure SQL - doesn't work - rules out Docker

We were able to narrow down the commits that could have caused the issue to six, two of which are pure migrations (we flattened a large number of migrations into one, twice during that task). About 90% of the changes in the remaining four commits concern EF Core Fluent API configuration that stopped working after the upgrade, indexes, includes, and custom projections (nothing big - mostly changes like replacing string.Equals(a, b, StringComparison.CurrentCultureIgnoreCase); with a.ToLower() == b.ToLower();).
I could provide you with the diff from these four commits if you'd like, but it's going to be a big one.

Our setup (which has been working for over a year now on EF Core 2.x):

  • every application is dockerized separately
  • all applications run as dockerized web apps on Azure

The issue is quite annoying and forced us to move back to Windows apps on Azure (where everything works just as it did before the EF Core upgrade). Since our solution is huge, building a small proof-of-concept repro isn't feasible. We went through all the breaking changes several times but didn't notice anything wrong. Could you point us in any direction here? Any hint, any place worth checking? We'll supply anything you need.

@arjunblg75

Team, could you please help troubleshoot this issue?

@ajcvickers
Member

@cheenamalhotra Could this be something with SqlClient? I think there have been some issues with Docker/Linux.

@cheenamalhotra
Member

@Arcanst

May I know which Linux Docker image you are using in your application? I can try to test SqlClient connectivity in that image.

@Arcanst
Author

Arcanst commented Mar 18, 2020

Sure!
Build: mcr.microsoft.com/dotnet/core/sdk:3.1-buster
Runtime: this is more complicated. I basically merged two .NET Core images into our base image; here is the Dockerfile for the base image used by our applications:

FROM debian:buster

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates \
        \
# .NET Core dependencies
        libc6 \
        libgcc1 \
        libgssapi-krb5-2 \
        libicu63 \
        libssl1.1 \
        libstdc++6 \
        zlib1g \
    && rm -rf /var/lib/apt/lists/*

# Configure web servers to bind to port 80 when present
ENV ASPNETCORE_URLS=http://+:80 \
    # Enable detection of running in a container
    DOTNET_RUNNING_IN_CONTAINER=true

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        curl \
    && rm -rf /var/lib/apt/lists/*

# Install .NET Core
RUN dotnet_version=3.1.1 \
    && curl -SL --output dotnet.tar.gz https://dotnetcli.azureedge.net/dotnet/Runtime/$dotnet_version/dotnet-runtime-$dotnet_version-linux-x64.tar.gz \
    && dotnet_sha512='991a89ac7b52d3bf6c00359ce94c5a3f7488cd3d9e4663ba0575e1a5d8214c5fcc459e2cb923c369c2cdb789a96f0b1dfb5c5aae1a04df6e7f1f365122072611' \
    && echo "$dotnet_sha512  dotnet.tar.gz" | sha512sum -c - \
    && mkdir -p /usr/share/dotnet \
    && tar -ozxf dotnet.tar.gz -C /usr/share/dotnet \
    && rm dotnet.tar.gz \
    && ln -s /usr/share/dotnet/dotnet /usr/bin/dotnet

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        curl \
    && rm -rf /var/lib/apt/lists/*

# Install ASP.NET Core
RUN aspnetcore_version=3.1.1 \
    && curl -SL --output aspnetcore.tar.gz https://dotnetcli.azureedge.net/dotnet/aspnetcore/Runtime/$aspnetcore_version/aspnetcore-runtime-$aspnetcore_version-linux-x64.tar.gz \
    && aspnetcore_sha512='cc27828cacbc783ef83cc1378078e14ac558aec30726b36c4f154fad0d08ff011e7e1dfc17bc851926ea3b0da9c7d71496af14ee13184bdf503856eca30a89ae' \
    && echo "$aspnetcore_sha512  aspnetcore.tar.gz" | sha512sum -c - \
    && tar -ozxf aspnetcore.tar.gz -C /usr/share/dotnet ./shared/Microsoft.AspNetCore.App \
    && rm aspnetcore.tar.gz

ENV SSH_PASSWD "root:Docker!"
RUN apt-get update \
	&& apt-get install -y --no-install-recommends dialog \
	&& apt-get update \
	&& apt-get install -y --no-install-recommends openssh-server \
	&& echo "$SSH_PASSWD" | chpasswd

COPY ["sshd_config", "/etc/ssh/"]
COPY ["init.sh", "/usr/local/bin"]
RUN chmod u+x /usr/local/bin/init.sh
EXPOSE 8000 2222

Up to the SSH installation, it's just a copy-paste of the 3.1.2-buster-slim image, which, as far as I can see, is no longer available on Docker Hub.
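For reference, the download checks in the Dockerfile above pipe a check line into GNU `sha512sum -c -`, which reads `<hash>  <file>` lines from stdin and exits non-zero if the file does not match. The following is a minimal sketch of that same pattern as a reusable shell function. The name `verify_sha512` is hypothetical, and the sketch assumes GNU coreutils (as shipped in debian:buster).

```shell
# Hypothetical helper mirroring the Dockerfile's download check.
# Usage: verify_sha512 <file> <expected-sha512-hex>
# Returns 0 if the file's SHA-512 digest matches, non-zero otherwise.
verify_sha512() {
    file=$1
    expected=$2
    # GNU check-list format: "<hash>", two spaces, "<filename>".
    # "-c -" reads the check list from stdin; "--status" suppresses output
    # so only the exit code reports the result.
    printf '%s  %s\n' "$expected" "$file" | sha512sum --status -c -
}
```

In a `RUN` step this would be used as, for example, `verify_sha512 dotnet.tar.gz "$dotnet_sha512" || exit 1`, which fails the image build on a mismatch in the same way the inline `echo ... | sha512sum -c -` does.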

@poullundjoergensen

poullundjoergensen commented Mar 18, 2020

Just want to add that we also reproduced the issue on an Azure virtual machine created from Ubuntu Server 18.04 LTS, set up with the following commands:

wget -q https://packages.microsoft.com/config/ubuntu/18.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb
sudo add-apt-repository universe
sudo apt-get update
sudo apt-get install apt-transport-https
sudo apt-get update
sudo apt-get install dotnet-sdk-3.1
sudo apt-get install aspnetcore-runtime-3.1
sudo apt-get install dotnet-runtime-3.1

We deployed a portable build of the app over SSH and reproduced the issue by running it with the dotnet command.

@cheenamalhotra
Member

Can you also share the version of the Azure SQL server you're connecting to (the output of SELECT @@VERSION)?

This doesn't seem right as we run CI tests in SqlClient from Ubuntu to Azure all the time!

@cheenamalhotra
Member

Can you also verify whether you're able to connect to Azure SQL using just the SqlClient driver from both the Docker container and the Ubuntu VM?

You can use this app for validation:
TestLinuxDocker.zip

@Arcanst
Author

Arcanst commented Mar 18, 2020

SELECT @@VERSION gives Microsoft SQL Azure (RTM) - 12.0.2000.8 Feb 14 2020 18:30:14 Copyright (C) 2019 Microsoft Corporation

I can't check that app right now - will do it tomorrow and come back with results.

Let me just clarify one thing: it's not that we have no connection. We are able to truncate the entire database from code and then recreate it using migrations (we do this programmatically). When seeding starts, the worker always creates some objects in the database before it hangs.

We even suspected the issue could be caused by an overly large transaction, because at first the entire seeding (which took about 2 minutes locally) ran as one huge database transaction. However, after changing the seeding to create a separate transaction for each object inserted into the database, the problem persisted.

I'm wondering whether it would be possible to somehow build our application against EF Core's source (it would be much easier if the issue were reproducible on a developer's machine) so we could add more logging and see exactly where it hangs (apparently it's not a blocking call, because our applications still send heartbeats to RabbitMQ and so on).

@cheenamalhotra
Copy link
Member

Thanks @Arcanst I think this would then fall back to @ajcvickers for EF side of investigations first.

@ajcvickers

As @Arcanst mentioned, this doesn't look like a connection problem but rather a problem in a particular flow of EF Core API calls. I think you can take over from here to reproduce the problem, and if it turns out to be in one of the SqlClient API flows, please let us know with a repro. :)

Best Regards,
Cheena

@poullundjoergensen

@ajcvickers We could provide a repro project that would be a stripped-down version of our application solution. It would still be quite big, and we would have to put significant effort into it, so before we do, we would like confirmation from you that it would actually be useful/necessary for identifying the root cause.

@ajcvickers
Member

@Arcanst

I'm thinking if it was possible to somehow use ef core's source for our application

EF Core can be built from source easily--see https://github.com/dotnet/efcore/blob/master/docs/getting-and-building-the-code.md

You can build NuGet packages locally with build -pack. However, note that the NuGet package versions don't change between builds, so you'll have to flush the NuGet package caches (available in the NuGet settings in VS) each time you rebuild the packages.
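Outside Visual Studio, `dotnet nuget locals all --clear` clears every NuGet cache at once. A narrower option is to delete only the rebuilt package IDs from the global packages folder, so the next restore has to extract the freshly built packages again. The sketch below assumes the standard global packages layout `<root>/<lowercased package id>/<version>/`, with the root taken from `NUGET_PACKAGES` when it is set and `~/.nuget/packages` otherwise. The function name `purge_nuget_packages` is hypothetical.

```shell
# Hypothetical helper: delete every cached version of the given package IDs
# from the NuGet global packages folder.
# Usage: purge_nuget_packages <package-id>...
purge_nuget_packages() {
    # Global packages folder: $NUGET_PACKAGES if set, else the default location.
    root=${NUGET_PACKAGES:-$HOME/.nuget/packages}
    for id in "$@"; do
        # Package folders in the global packages folder use lowercased IDs.
        dir="$root/$(printf '%s' "$id" | tr '[:upper:]' '[:lower:]')"
        if [ -d "$dir" ]; then
            rm -rf -- "$dir"
            echo "removed $dir"
        fi
    done
}
```

For example, `purge_nuget_packages Microsoft.EntityFrameworkCore Microsoft.EntityFrameworkCore.Relational Microsoft.EntityFrameworkCore.SqlServer` followed by `dotnet restore` would leave other cached packages untouched.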

confirmation from you that it actual would be useful/necessary for you to identify root cause

I don't have any ideas as to what is going on here, so I can't be certain that we will be able to identify the root cause even if we can reproduce the issue. That being said, it certainly seems unlikely that we will be able to root cause this without being able to reproduce it.

If you don't want to post the code publicly, then feel free to send it to avickers at microsoft.com.

@AndriySvyryd @roji Any ideas here?

@AndriySvyryd
Member

What version of Microsoft.Data.SqlClient are you using? If it's not 1.1.1 could you upgrade it and see whether that makes any difference?

@Arcanst
Author

Arcanst commented Mar 19, 2020

Microsoft.Data.SqlClient, Version=1.0.19269.1 - we'll try upgrading it and let you know as soon as possible, thanks

@poullundjoergensen

Upgrading Microsoft.Data.SqlClient to 1.1.1 solved the issue, so this can now be closed. Thanks!
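Because Microsoft.Data.SqlClient normally arrives transitively through EF Core's SQL Server provider, it can be worth confirming which version each project actually resolves after the upgrade. The sketch below reads the output of `dotnet list package --include-transitive` on stdin and fails if any resolved Microsoft.Data.SqlClient version is below a given minimum. It assumes each package row starts with `>` followed by the package ID and ends with the resolved version, and it needs GNU `sort -V`. The function name `check_sqlclient_version` is hypothetical.

```shell
# Hypothetical helper: check resolved Microsoft.Data.SqlClient versions
# from `dotnet list package --include-transitive` output read on stdin.
# Usage: dotnet list package --include-transitive | check_sqlclient_version 1.1.1
check_sqlclient_version() {
    min=$1
    status=0
    found=0
    # Package rows look like "> <PackageId> ... <Resolved>"; take the last column.
    for v in $(awk '$1 == ">" && $2 == "Microsoft.Data.SqlClient" { print $NF }'); do
        found=1
        # sort -V orders version strings; if $min is not the lowest, $v is older.
        lowest=$(printf '%s\n%s\n' "$min" "$v" | sort -V | head -n 1)
        if [ "$lowest" != "$min" ]; then
            echo "Microsoft.Data.SqlClient $v is older than $min" >&2
            status=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "Microsoft.Data.SqlClient not found in input" >&2
        return 1
    fi
    return $status
}
```

One way to lift the transitive version is a direct reference, e.g. `dotnet add package Microsoft.Data.SqlClient --version 1.1.1`, in the project(s) that reference EF Core's SQL Server provider; running the check afterwards shows whether every project picked it up.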

@ErikEJ
Contributor

ErikEJ commented Mar 20, 2020

Please reconsider taking Microsoft.Data.SqlClient 1.1.1 in a 3.1 patch release.

@ajcvickers ajcvickers reopened this Mar 20, 2020
@ajcvickers
Member

@ErikEJ We discussed it before, but this is certainly another data point.

@ajcvickers
Copy link
Member

@ErikEJ We're going to do this and take it for approval.

@bricelam I'll probably be able to create a PR for this next week, but feel free to do it if you get time.

@ajcvickers
Copy link
Member

Filed #20378 to track updating the dependency.

@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Oct 16, 2022

7 participants