Add instance statistics table #656

Merged
merged 1 commit into from
Jul 19, 2023


skoeva (Contributor) commented Jun 27, 2023

As a by-product of the Omaha update protocol, Nebraska receives statistics on version spread, but accumulating and presenting this data has several limitations:

  • The chart only shows the current OS version of each node. We do not have historic data on the nodes' previous software versions, and we would like to keep track of this data.
  • While historic data can be calculated from update events, doing so is computationally intensive and infeasible in practice. We currently cannot calculate or store this historic data efficiently, and we would like quick and easy access to it.
  • Back-filling the live Nebraska server with historic data available in CSV format currently requires a third-party tool. We cannot do this with our existing setup, and we would like to implement a native mechanism for it.

We implement the `instance_stats` table (based on instance fields from `instance_application`) to store data on current version spread in a format optimized for querying. To do this, we associate each instance with five data points, which serve as the main fields of this table:

  • Timestamp: Necessary for keeping track of when update checks are performed
  • Channel name: Necessary to classify each instance
  • Architecture: Necessary to classify each instance
  • Version: Necessary to classify each instance
  • Instance count: The number of instances grouped under each combination of the previous four fields

New entries will be added to the table by a background job that runs periodically at a specified interval.

skoeva requested a review from pothos June 27, 2023 13:56
pothos (Member) commented Jun 28, 2023

The CI also reports this error: `Error: pkg/api/instances.go:668:28: cannot use query (variable of type *goqu.SelectDataset) as type string in argument to api.db.Query`

Review comment on backend/Makefile (outdated, resolved)
skoeva self-assigned this Jul 18, 2023
skoeva removed their assignment Jul 19, 2023
skoeva force-pushed the skoeva/db_vs branch 3 times, most recently from 6e38eb4 to 06979d5 July 19, 2023 17:45
skoeva marked this pull request as ready for review July 19, 2023 18:06
skoeva marked this pull request as draft July 19, 2023 19:24
skoeva force-pushed the skoeva/db_vs branch 3 times, most recently from e174a09 to d7a1836 July 19, 2023 19:44
pothos marked this pull request as ready for review July 19, 2023 19:50
pothos merged commit 25d24f1 into main Jul 19, 2023
2 checks passed
pothos deleted the skoeva/db_vs branch July 19, 2023 19:56
skoeva added a commit that referenced this pull request Jul 31, 2023
Following #656, which adds the `instance_stats` table, we now implement a background task that commits a snapshot of live instance counts to this table every hour. The task uses a Go ticker, is started at service startup, and runs for the lifetime of the Nebraska server until the server is stopped.
skoeva added a commit that referenced this pull request Aug 2, 2023
skoeva mentioned this pull request Aug 9, 2023
skoeva added a commit that referenced this pull request Aug 11, 2023
Pull request #656 adds the necessary `instance_stats` database table, as well as functions that query instance counts (by channel, version, and architecture) from the `groups`, `instance`, and `instance_application` tables and store the results in the new table.

To make this data accessible outside Nebraska, we create the following:

- Prometheus metrics endpoint: A new HTTP endpoint (`instance-metrics/prometheus`) that serves only the latest snapshot from the `instance_stats` table. Because Prometheus is a time-series database that records its own timestamp for each scrape, this endpoint serves only the instance counts with the latest timestamp.
- JSON metrics data endpoint: A new HTTP endpoint that emits all `instance_stats` data in JSON format. Unlike the Prometheus endpoint, it queries all rows, not just the latest snapshot, and emits one JSON document per row.
skoeva added a commit that referenced this pull request Aug 14, 2023
skoeva added a commit that referenced this pull request Aug 14, 2023