[IDEA] - Add a health metrics if the node-socket is reachable at all #154
Comments
Good point. It's also unfortunate that the network synchronization is only updated on every new tip. While simple, this means the value is only refreshed while the connection is up. Perhaps a background thread that creates artificial ticks would be better here.
Would be possible to set |
What about implementing it in the Ogmios Docker images as a healthcheck.sh script? Currently neither curl nor jq is installed on the Docker image.
@redoracle -> implementing what exactly in the docker image 🤔 ? |
I meant implementing a healthcheck.sh script, as Docker images usually do, to verify that the container is running properly; otherwise the health check triggers a container restart. It would use this command:
curl -s http://127.0.0.1:1337/health | jq
Alternatively, I can create one and map it inside the container, but I need at least curl and jq preinstalled to make it work. Attached here is an example of a container with a health check and one without.
Seems like this can work nicely with just HEALTHCHECK --interval=10s --timeout=5s --retries=1 CMD \
[ connected == $(wget http://localhost:1337 | sed 's/.*"connectionStatus":"\([a-z]\+\)".*/\1/') ] |
Note: I've started re-working the Docker images recently to avoid having to maintain two build systems. The new images are based on the Nix build and make heavy use of caching:
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
# #
# ------------------------------- SETUP ------------------------------------- #
# #
FROM nixos/nix:2.3.11 as build
RUN echo "substituters = https://cache.nixos.org https://hydra.iohk.io" >> /etc/nix/nix.conf &&\
echo "trusted-public-keys = cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= hydra.iohk.io:f/Ea+s+dFdN+3Y/G+FDgSq+a5NEWhJGzdjvKNGv0/EQ=" >> /etc/nix/nix.conf
WORKDIR /app
RUN nix-shell -p git --command "git clone --depth 1 https://github.com/input-output-hk/cardano-configurations.git"
WORKDIR /app/ogmios
RUN nix-env -iA cachix -f https://cachix.org/api/v1/install && cachix use cardano-ogmios
COPY . .
RUN nix-build -A ogmios.components.exes.ogmios -o dist
RUN cp -r dist/* . && chmod +w dist/bin && chmod +x dist/bin/ogmios
# #
# --------------------------- BUILD (ogmios) --------------------------------- #
# #
FROM busybox as ogmios
ARG NETWORK=mainnet
LABEL name=ogmios
LABEL description="A JSON WebSocket bridge for cardano-node."
COPY --from=build /app/ogmios/bin/ogmios /bin/ogmios
COPY --from=build /app/cardano-configurations/network/${NETWORK} /config
EXPOSE 1337/tcp
STOPSIGNAL SIGINT
HEALTHCHECK --interval=10s --timeout=5s --retries=1 CMD \
[ connected == $(wget http://localhost:1337 | sed 's/.*"connectionStatus":"\([a-z]\+\)".*/\1/') ]
ENTRYPOINT ["/bin/ogmios"]
# #
# --------------------- RUN (cardano-node & ogmios) -------------------------- #
# #
FROM inputoutput/cardano-node:1.31.0 as cardano-node-ogmios
ARG NETWORK=mainnet
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
LABEL name=cardano-node-ogmios
LABEL description="A JSON WebSocket bridge for cardano-node w/ a cardano-node."
COPY --from=build /app/ogmios/bin/ogmios /bin/ogmios
COPY --from=build /app/cardano-configurations/network/${NETWORK} /config
RUN mkdir -p /ipc
WORKDIR /root
COPY scripts/cardano-node-ogmios.sh cardano-node-ogmios.sh
# Ogmios, cardano-node, ekg, prometheus
EXPOSE 1337/tcp 3000/tcp 12788/tcp 12798/tcp
STOPSIGNAL SIGINT
HEALTHCHECK --interval=10s --timeout=5s --retries=1 CMD \
[ connected == $(wget http://localhost:1337 | sed 's/.*"connectionStatus":"\([a-z]\+\)".*/\1/') ]
CMD ["bash", "cardano-node-ogmios.sh" ]
Still work-in-progress however as the
That's nice too, but wget is still missing as a preinstalled package, while sed is there.
Do you really need to expose all those ports if only used internally? BTW very good point migrating to nix, I like it very much. |
root@973ea926352e:/# wget http://localhost:1337 | sed 's/.*"connectionStatus":"\([a-z]\+\)".*/\1/'
index.html.20 [ <=> ] 7.63K --.-KB/s in 0s
2022-01-02 12:55:28 (1.01 GB/s) - 'index.html.20' saved [7811]
Not sure wget does the same as curl... or am I missing some other option? The following returns the value that tells us that Ogmios is in sync, right?
Even on the new images with Nix, that is, on top of BusyBox? I thought wget was available in BusyBox ... 🤔
Those aren't internal though, except maybe 3000/tcp. EKG and Prometheus are used for metrics, and Ogmios is used for local clients.
Ah! My mistake... We need to hit the health endpoint here! So |
OK, but wget keeps saving the file instead of printing it, so I need an additional step to retrieve, from the saved file, the particular metric that says the node is connected and in sync. Right?
what about this? |
For now I got it working with a healthcheck.sh mapped inside the container, as follows:
if ! command -v wget; result=$(wget -qO- http://localhost:1337/health | sed 's/.*"connectionStatus":"//g' | sed 's/connected"}/0/g') if [ $result != 0 ]; then exit 1; fi
I guess it wouldn't work with the Nix version though :)
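The approach discussed above (print the health document with wget -qO- and pull out connectionStatus with sed) can be sketched as a small POSIX shell script. This is only an illustration, not the script shipped with Ogmios: the function names and the HEALTH_URL variable are made up for the example, and it assumes the /health response contains a "connectionStatus" string field as quoted in this thread.

```shell
#!/bin/sh
# healthcheck.sh: an illustrative sketch, not part of the Ogmios images.

# Read a JSON health document on stdin and print the value of
# "connectionStatus". Prints nothing when the field is absent.
connection_status() {
  sed -n 's/.*"connectionStatus" *: *"\([a-z]*\)".*/\1/p'
}

# Succeed only when the health endpoint reports "connected".
# HEALTH_URL is a hypothetical override; the default is the URL used in this thread.
check_health() {
  url="${HEALTH_URL:-http://localhost:1337/health}"
  status="$(wget -qO- "$url" 2>/dev/null | connection_status)"
  [ "$status" = "connected" ]
}
```

Ending the script with `check_health || exit 1` gives Docker's HEALTHCHECK the 0/1 exit code it expects. Only wget and sed are required, which avoids the need for curl and jq.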
I figured that a nicer way to do all this would be to have a proper health-check command in Ogmios to begin with, so I implemented:
$ ogmios health-check --help
Handy command to check whether an Ogmios server is up-and-running, and correctly connected to a Network / cardano-node.
This can, for example, be wired to Docker's HEALTHCHECK feature easily.
Usage: ogmios health-check [--port TCP/PORT]
Performs a health check against a running server.
Available options:
-h,--help Show this help text
--port TCP/PORT Port to listen on. (default: 1337)
(see 62691fb)
It exits with 0 or 1, depending on whether it could perform a health check on a running server. With that, configuring the HEALTHCHECK in the Dockerfile is dead simple:
HEALTHCHECK --interval=10s --timeout=5s --retries=1 CMD /bin/ogmios health-check
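Outside of Docker, the same 0/1 exit code can drive a simple wait loop, for example in a start-up script that should block until Ogmios is connected. The helper below is a hypothetical sketch (wait_healthy is not an Ogmios command); it only assumes that the probe command exits with 0 on success, as described above.

```shell
# Run a probe command until it succeeds or the attempts run out.
# usage: wait_healthy ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_healthy() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # The probe's own output is discarded; only its exit code matters.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # No need to sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For instance, `wait_healthy 30 2 /bin/ogmios health-check --port 1337` would probe up to 30 times, two seconds apart, and return non-zero if the server never became healthy.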
That's very thoughtful and very nice!! Well done! Tnx |
Describe your idea, in simple words.
Running, for example, node 1.33.0 in P2P mode with
"DiffusionMode": "InitiatorOnly",
in the config no longer creates a local listening port, so we can't use cardanoPing/cncli to check whether the node is alive.
If such a node stops working or is shut down, there is currently no flag for that in the
Ogmios health check:
curl -s http://127.0.0.1:1337/health | jq
That's a sample output after the node was shut down.
So, using the health metrics, the only way currently to see whether the node is really OK is to compare the
lastKnownTip
with the theoretical one calculated from the genesis files, and to apply a threshold in case it falls too far behind. The error log shows a warning like:
"networkSynchronization": 1,
also stays at 1 (=100%).
Why is it a good idea?
It would be nice to have a flag that can show if the current connection to the node via the node socket is ok or not. We get error outputs in the logs, but not on the health check here.
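The tip comparison mentioned above can be sketched in shell arithmetic. This is a simplified illustration with hypothetical function names: it assumes a constant slot length since the system start from the genesis file (networks whose slot length changed between eras need a per-era offset), and it assumes the slot number has already been extracted from lastKnownTip.

```shell
# Theoretical current slot, assuming one constant slot length since system start.
# usage: expected_slot NOW_EPOCH_SECONDS SYSTEM_START_EPOCH_SECONDS SLOT_LENGTH_SECONDS
expected_slot() {
  echo $(( ($1 - $2) / $3 ))
}

# Succeed when the known tip is at most MAX_LAG_SLOTS behind the expected slot.
# usage: tip_is_fresh TIP_SLOT EXPECTED_SLOT MAX_LAG_SLOTS
tip_is_fresh() {
  lag=$(( $2 - $1 ))
  [ "$lag" -le "$3" ]
}
```

A monitoring script could call `tip_is_fresh "$tip_slot" "$(expected_slot "$(date +%s)" "$start" "$slot_length")" 300` and raise an alert on a non-zero exit code; the threshold of 300 slots is an arbitrary example value.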