Add Datadog Incident Management plugin support #46271

Merged
19 commits merged into master from bernard/datadog-plugin-impl on Sep 18, 2024

bernardjkim
Contributor

@bernardjkim bernardjkim commented Sep 5, 2024

This PR adds support for the self-hosted Access Request Datadog plugin.

I'll create a separate PR for a few todo items that are also required to support the plugin.
Todo:

changelog: Add support for Access Request Datadog plugin.

@zmb3
Collaborator

zmb3 commented Sep 5, 2024

I would prefer that we name this plugin something like DatadogIncidentManagement, as this is a separate product (similar to PagerDuty), not Datadog's traditional monitoring solution (for which there is also a Teleport integration).

@bernardjkim bernardjkim changed the title Add datadog plugin support Add Datadog Incident Management plugin support Sep 5, 2024
@public-teleport-github-review-bot

@bernardjkim - this PR will require admin approval to merge due to its size. Consider breaking it up into a series of smaller changes.

@github-actions github-actions bot added the size/xl, tctl (Teleport admin tool), and ui labels Sep 6, 2024
@tigrato
Contributor

tigrato commented Sep 9, 2024

Is it possible to split this PR into smaller PRs?

@bernardjkim
Contributor Author

Yeah, I'll try to split the PR up further, but I think a large chunk of it is just gRPC and test code.

Contributor

@marcoandredinis marcoandredinis left a comment

Most things look good.
I want to do a real test before giving final approval.

integrations/access/datadog/README.md (outdated, resolved)
integrations/access/datadog/README.md (outdated, resolved)
integrations/access/datadog/config.go (outdated, resolved)
},
}
var result IncidentsBody
_, err := d.client.NewRequest().
Contributor

There's this
https://github.com/DataDog/datadog-api-client-go

Should we use it instead?

Contributor Author

I was considering using the client library, but I wasn't sure if we wanted to add the additional dependency. It doesn't look like we're using the client libraries for the other plugins either.

The Datadog API usage is pretty limited, at least for now. Do you think it is worth adding the dependency?

Contributor

Historical context: for Azure/MS Teams we did not use the official API client because the SDK depends on the entire universe and it took 30+ minutes to build the plugin (I tried doing it but gave up).

For the other implementations, this was done by the martians, and they likely took the easiest/quickest path.

I don't have an opinion on whether we should use their Go client or not. I think the most dangerous parts of building our own client are the authentication flow and the failure/retry logic. If those are trivial and the Go client adds many deps, rolling our own client makes sense.

In this case, maybe drop a comment explaining why we decided to do it ourselves.

Contributor Author

I'll leave a TODO to migrate to the official Datadog package if the need arises.

integrations/access/datadog/cmd/teleport-datadog/main.go (outdated, resolved)
integrations/access/datadog/client.go (outdated, resolved)
integrations/access/datadog/client.go (outdated, resolved)
Address feedback
- Add PluginShutdownTimeout const
- Support API endpoint configuration
- Add additional godocs/comments
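
A rough sketch of what "Support API endpoint configuration" could look like, assuming a config struct with an optional endpoint that defaults to the US1 site `https://api.datadoghq.com` (other Datadog sites, such as `https://api.datadoghq.eu`, use different hosts). `datadogConfig`, its fields, and `checkAndSetDefaults` are hypothetical names, not the plugin's actual config.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// defaultAPIEndpoint is an assumed default: the US1 Datadog API site.
const defaultAPIEndpoint = "https://api.datadoghq.com"

// datadogConfig is a hypothetical config struct; the field names are
// illustrative and not taken from the plugin.
type datadogConfig struct {
	APIEndpoint string
	APIKey      string
	AppKey      string
}

// checkAndSetDefaults applies the default endpoint when none is configured,
// rejects endpoints that are not absolute http(s) URLs, and strips a trailing
// slash so request paths such as "/api/v2/incidents" can be appended directly.
func (c *datadogConfig) checkAndSetDefaults() error {
	if c.APIEndpoint == "" {
		c.APIEndpoint = defaultAPIEndpoint
	}
	u, err := url.Parse(c.APIEndpoint)
	if err != nil {
		return fmt.Errorf("invalid API endpoint %q: %w", c.APIEndpoint, err)
	}
	if (u.Scheme != "https" && u.Scheme != "http") || u.Host == "" {
		return fmt.Errorf("API endpoint must be an absolute http(s) URL, got %q", c.APIEndpoint)
	}
	c.APIEndpoint = strings.TrimSuffix(c.APIEndpoint, "/")
	if c.APIKey == "" || c.AppKey == "" {
		return errors.New("both an API key and an application key are required")
	}
	return nil
}
```

Allowing plain `http` is a deliberate choice in this sketch so that a local mock server can stand in for Datadog in tests.
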
Contributor

@marcoandredinis marcoandredinis left a comment

Did a small test and things look good 👍

@bernardjkim
Contributor Author

Hey @tigrato @hugoShaka, are there any other major concerns with this PR? I will be making a couple of follow-up PRs, so we can always catch minor issues then.

@hugoShaka
Contributor

Yeah, the Enterprise tests are not passing and I'm trying to figure out why. (They don't run in CI because the plugin is in Teleport OSS.)

@bernardjkim bernardjkim added this pull request to the merge queue Sep 18, 2024
Merged via the queue into master with commit 9a0f8d6 Sep 18, 2024
42 checks passed
@bernardjkim bernardjkim deleted the bernard/datadog-plugin-impl branch September 18, 2024 21:48
@public-teleport-github-review-bot

@bernardjkim See the table below for backport results.

Branch Result
branch/v16 Failed

bernardjkim added a commit that referenced this pull request Sep 18, 2024
* Implement datadog plugin

* Add unit tests

* Add fallback recipient config

* Rename to Datadog Incident Management

* Update tests

* Datadog Incident Management

* Update tctl resource plugin command

* Typos

* Lint

* Exclude api changes for now

* Set channel size

* Address feedback

- Add PluginShutdownTimeout const
- Support api endpoint configuration
- Add additional godocs/comments

* Comment about datadog client package

* Document Datadog API types

* Only post resolution message when the AR is resolved

* Fix lint

* Unused function

---------

Co-authored-by: hugoShaka <hugo.hervieux@goteleport.com>
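
The "Only post resolution message when the AR is resolved" change boils down to a guard on the Access Request state. A minimal sketch, using a hypothetical `requestState` enum in place of Teleport's own request state type and a made-up message format:

```go
package main

import "fmt"

// requestState is a hypothetical stand-in for an Access Request state; the
// plugin itself would use Teleport's own request state type.
type requestState int

const (
	statePending requestState = iota
	stateApproved
	stateDenied
)

// resolutionMessage returns the note to post on the incident and whether a
// note should be posted at all. Unresolved (pending) requests produce none.
func resolutionMessage(state requestState, reason string) (string, bool) {
	var verb string
	switch state {
	case stateApproved:
		verb = "approved"
	case stateDenied:
		verb = "denied"
	default:
		return "", false
	}
	if reason == "" {
		return fmt.Sprintf("Access request has been %s.", verb), true
	}
	return fmt.Sprintf("Access request has been %s. Reason: %s", verb, reason), true
}
```

Returning a boolean alongside the message keeps the "should we post at all" decision next to the formatting, so a pending request cannot produce a premature resolution note.
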
github-merge-queue bot pushed a commit that referenced this pull request Sep 19, 2024
mvbrock pushed a commit that referenced this pull request Sep 19, 2024
github-merge-queue bot pushed a commit that referenced this pull request Sep 20, 2024
* Displaying mode and controls to additional participants

* Moving SessionControlsInfoBroadcast over to kube/proxy

* Transitioning to consistent proxy-emitted mode+controls

* Moving message broadcast so new participant won't see it

* Possible unit test fix (can't seem to test locally)

* Fixed unit test

* Adding a line break before messaging the participant

* Linter errors

* Emitting audit event and controls message for additional parties, i.e. not the session initiator

* Revert "Emitting audit event and controls message for additional parties, i.e. not the session initiator"

This reverts commit b66ad27.

* Add User Tasks resource - protos (#46059)

* Add User Integration Tasks resource - protos

* add account id

* move state to task instead of instance

* rename from user integration task to user task

* add instance id

* Add notice to web UI that users aren't equal to MAU (#46686)

This adds a dismissible notice to the Users page for usage based billing
users that notifies them that the user count here isn't an accurate
reflection of MAU

* Clarify TLS requirements in the Jira guide (#46484)

Closes #45654

- Indicate that certificates for the Jira web server cannot be self
  signed.
- Remove references to Caddy and a `Certificate` resource, which were
  left over from an attempted change to this guide that was not fully
  completed.

* Remove TXT record validation of custom DNS zones in VNet (#46709)

* Remove TXT record validation from custom DNS zones

* Remove mentions of TXT records from docs

* Outline in the RFD why domain verification was dropped

* Update rfd/0163-vnet.md

Co-authored-by: Nic Klaassen <nic@goteleport.com>

---------

Co-authored-by: Nic Klaassen <nic@goteleport.com>

* docs: mention the --days flag when executing an audit log query (#45764)

* Update access-monitoring.mdx

Include the default date range in the CLI example. This range is otherwise unclear and is hidden in the tctl audit help menu.

* Update access-monitoring.mdx

* Update docs/pages/admin-guides/access-controls/access-monitoring.mdx

Co-authored-by: Zac Bergquist <zac.bergquist@goteleport.com>

---------

Co-authored-by: Zac Bergquist <zac.bergquist@goteleport.com>

* Remove access-graph path resolution, proxy `/enterprise` requests (#46541)

* Remove `access-graph` path from tsconfig

* Proxy /enterprise requests in dev

* update e ref (#46726)

* fix: tolerate mismatched key PEM headers (#46725)

* fix: tolerate mismatched key PEM headers

Issue #43381 introduced a regression where we now fail to parse PKCS8
encoded RSA private keys within an "RSA PRIVATE KEY" PEM block in
some cases.
This format is somewhat non-standard; usually PKCS8 data should be in a
"PRIVATE KEY" PEM block. However, certain versions of OpenSSL and
possibly even Teleport in specific cases have generated private keys in
this format.

This commit updates ParsePrivateKey and ParsePublicKey to be more
tolerant of PKCS8, PKCS1, or PKIX key data no matter which PEM header is
used.

Fixes #46710

changelog: fixed regression in private key parser to handle mismatched PEM headers

* fix typo in comment

Co-authored-by: Edoardo Spadolini <edoardo.spadolini@goteleport.com>

---------

Co-authored-by: Edoardo Spadolini <edoardo.spadolini@goteleport.com>

* Use dynamic base path for favicon images (#46719)

* Add Datadog Incident Management plugin support (#46271)

* Add AutoUpdate Client/Cache implementation (#46661)

* Add AutoUpdate Client/Cache implementation

* CR changes

* Add permission for proxy to access resources

* Rename all occurrences auto update to camelcase

* Remove auto update client wrapper

* Drop AutoUpdateServiceClient helper
Rename comments for consistency

* User Tasks: services and clients implementation (#46131)

This PR adds the implementation for the User Tasks:
- services (backend+cache)
- clients (API + tctl)
- light validation to set up the path for later PRs

* expanding testplan for host user creation (#46729)

* Fix operator docs reference generator bug (#46732)

In the reference page for one Kubernetes operator resource, some
Markdown links are malformed.

The issue is that some fields of custom resource definitions used by the
operator consist of arrays of anonymous objects with fields that are
also objects. When creating docs based on these fields, the operator
resource docs generator creates a malformed link reference.

This change modifies the generator to replace any spaces with hyphens
before outputting link references, causing the resulting internal links
to work correctly.

This change also does some light refactoring to remove an unnecessary
`switch` statement.

* [auto] Update AMI IDs for 16.4.0 (#46746)

Co-authored-by: GitHub <noreply@github.com>

* Remove deprecated HTTP RemoteCluster endpoints (#46756)

* Remove deprecated HTTP RemoteCluster endpoints

* Remove redundant test

* Add `tbot` helm chart to `version.mk` (#46763)

* Remove LockConfiguration.LockName (#46772)

Cleans up the deprecated config option now that
gravitational/teleport.e#5034 has been
merged.

* adding a reference to  to the host user guide (#46765)

* Replace more Logrus usage with Slog (#46757)

* Remove logrus from lib/auth/machineid

* Switch authclient.Config.Log and TunnelAuthDialerConfig.Log to Slog

* Add *slog.Logger to auth.Server

* Remove logrus usage in `lib/auth/access.go`

* Replace logrus with slog in lib/auth/accountrecovery.go

* Replace logrus with slog in `lib/auth/apiserver.go`

* Add missing logger to auth.Server

* Fix test

* Update AWS roles ARNs displayed on `tsh app login` for AWS console apps (#44983)

* feat(tsh): list aws console logins from server

* chore(services): remove unified resources change

This is being covered on another PR.

* test(tsh): solve TestAzure flakiness by waiting until app servers are ready

* fix(tsh): apps with logins were falling back into using aws arns

* refactor(client): use GetEnrichedResources

* chore(client): rename function

* refactor(tsh): use direct resource listing for apps and reuse cluster client

* chore(client): reset client changes

* refactor(tsh): reuse cluster client for fetching allowed logins

* chore(tsh): remove unused function param

* refactor(tsh): update getApp retry with login

* refactor(tsh): use a single function to grab profile and cluster client

* refactor(tsh): perform retry with login at caller site

* fix(tsh): close auth client

* test(tsh): fix test failing due to login misconfiguration

* test(tsh): fix lint errors

* test(tsh): remove unused imports

* bulk audit event export api (#46399)

* Reverting back to using the emitSessionJoin boolean

* Nits and removing a debug log

---------

Co-authored-by: Marco Dinis <marco.dinis@goteleport.com>
Co-authored-by: Michael <michael.myers@goteleport.com>
Co-authored-by: Paul Gottschling <paul.gottschling@goteleport.com>
Co-authored-by: Rafał Cieślak <rafal.cieslak@goteleport.com>
Co-authored-by: Nic Klaassen <nic@goteleport.com>
Co-authored-by: Dan Johns <117299936+djohns7@users.noreply.github.com>
Co-authored-by: Zac Bergquist <zac.bergquist@goteleport.com>
Co-authored-by: Ryan Clark <ryan.clark@goteleport.com>
Co-authored-by: Edoardo Spadolini <edoardo.spadolini@goteleport.com>
Co-authored-by: Bernard Kim <bernard@goteleport.com>
Co-authored-by: hugoShaka <hugo.hervieux@goteleport.com>
Co-authored-by: Vadym Popov <vadym.popov@goteleport.com>
Co-authored-by: Erik Tate <erik.tate@goteleport.com>
Co-authored-by: teleport-post-release-automation[bot] <128860004+teleport-post-release-automation[bot]@users.noreply.github.com>
Co-authored-by: GitHub <noreply@github.com>
Co-authored-by: Noah Stride <noah.stride@goteleport.com>
Co-authored-by: rosstimothy <39066650+rosstimothy@users.noreply.github.com>
Co-authored-by: Gabriel Corado <gabriel.oliveira@goteleport.com>
Co-authored-by: Forrest <30576607+fspmarshall@users.noreply.github.com>
6 participants