Releases: kubernetes-sigs/node-feature-discovery

v0.14.0

07 Sep 16:14

What's new

NodeFeature API

The NodeFeature API is now enabled by default. The new CRD-based API replaces the previous gRPC-based communication between nfd-master and nfd-worker, reducing network traffic and allowing changes in NodeFeatureRules to take effect immediately (independent of the sleep-interval of nfd-worker). The NodeFeature API can also be used to implement 3rd party extensions; see the customization guide for more details.

Garbage collection of stale NodeFeature objects was added in the form of the nfd-gc daemon.
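
The idea behind this kind of garbage collection can be sketched as a set difference. The Python sketch below is illustrative only and is not the nfd-gc implementation: it assumes that a NodeFeature object counts as stale when the node it refers to no longer exists, and the function name and data shapes are hypothetical.

```python
def find_stale_node_features(node_features, existing_nodes):
    """Return the names of NodeFeature objects whose node is gone.

    node_features: mapping of object name -> node name (hypothetical shape).
    existing_nodes: iterable of node names currently present in the cluster.
    """
    live = set(existing_nodes)
    return sorted(name for name, node in node_features.items() if node not in live)


# Hypothetical example data: node-b has been removed from the cluster.
objects = {"nf-node-a": "node-a", "nf-node-b": "node-b", "nf-node-c": "node-c"}
stale = find_stale_node_features(objects, ["node-a", "node-c"])
print(stale)
```

A real collector would list both object kinds from the API server and delete the stale ones; the comparison step is the part shown here.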

The gRPC API is now deprecated and will be removed in a future release. The related command-line flags are also deprecated (and don't have any effect when NodeFeature API is in use):

  • nfd-master: -ca-file, -cert-file, -key-file, -port, -verify-node-name
  • nfd-worker: -ca-file, -cert-file, -key-file, -server, -server-name-override

Metrics

NFD now provides Prometheus metrics for better observability. Also, the Helm and kustomize deployments support enabling metrics collection with the Prometheus operator. See the documentation for more information about the available metrics and deployment instructions.

Hooks disabled by default

The deprecation of nfd-worker hooks continues: they are now disabled by default in v0.14. Users of hooks are encouraged to switch to the NFD CRDs (NodeFeature and NodeFeatureRule) or feature files. Hooks can still be enabled with the sources.local.hooksEnabled configuration option.

Feature files

Expiry time: NFD now supports specifying an expiry time for the features specified in a feature file, providing better lifecycle management for the feature labels. See the documentation for more details.

Size limit: There is now a 64kB size limit for feature files.
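
As a rough illustration of how expiry times and the size limit could interact when reading a feature file, here is a minimal Python sketch. It is not NFD's parser. Several details are assumptions made for the example: the "# +expiry-time=" directive syntax, reading "64kB" as 65536 bytes, the rule that an expiry applies to the lines following it, and the default value "true" for a feature given without a value. Check the documentation for the actual format.

```python
from datetime import datetime, timezone

MAX_FEATURE_FILE_SIZE = 64 * 1024  # assumption: "64kB" read as 65536 bytes
EXPIRY_PREFIX = "# +expiry-time="  # assumption: directive syntax


def parse_feature_file(data: bytes, now: datetime):
    """Parse feature-file content into a dict of feature name -> value."""
    if len(data) > MAX_FEATURE_FILE_SIZE:
        raise ValueError("feature file exceeds the size limit")

    features = {}
    expiry = None  # None means the features do not expire
    for raw in data.decode("utf-8").splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(EXPIRY_PREFIX):
            # Assumed to apply to the features listed after this line.
            stamp = line[len(EXPIRY_PREFIX):].replace("Z", "+00:00")
            expiry = datetime.fromisoformat(stamp)
            continue
        if line.startswith("#"):
            continue  # ordinary comment
        if expiry is not None and now >= expiry:
            continue  # feature has expired, skip it
        name, sep, value = line.partition("=")
        features[name.strip()] = value.strip() if sep else "true"
    return features


# Hypothetical file content; vendor.example/ is a made-up label prefix.
content = (
    b"# a comment\n"
    b"vendor.example/a=1\n"
    b"# +expiry-time=2023-01-01T00:00:00Z\n"
    b"vendor.example/b\n"
)
print(parse_feature_file(content, datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

Passing the current time as a parameter keeps the function deterministic, which makes the expiry behavior easy to test.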

Miscellaneous

NodeFeatureRule API

Dynamic values for labels are now supported by using the @ notation; see the documentation for more details.
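
To give a feel for what resolving a dynamic value might look like, the following Python sketch expands a label value of the form "@<feature>.<element>" against a table of discovered features. The exact form of the notation, the function name, and the data layout are assumptions made for illustration; the release notes only state that an @ notation exists, so consult the documentation for the real syntax.

```python
def resolve_label_value(value, features):
    """Resolve a label value, expanding the assumed "@<feature>.<element>" form.

    features: mapping of feature name -> mapping of element name -> value
              (hypothetical shape used only for this sketch).
    """
    if not value.startswith("@"):
        return value  # static value, used as-is
    # The feature name may itself contain dots, so split on the last one.
    feature, _, element = value[1:].rpartition(".")
    try:
        return features[feature][element]
    except KeyError:
        raise ValueError(f"cannot resolve dynamic value {value!r}") from None


# Hypothetical discovered features.
discovered = {"kernel.version": {"major": "6", "minor": "1"}}
print(resolve_label_value("@kernel.version.major", discovered))
print(resolve_label_value("static", discovered))
```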

NFD-Master

  • support for leader election was added, enabling high-availability deployments with multiple replicas of nfd-master (with the NodeFeature API enabled)
  • dynamically configurable logging parameters via the config file
  • configurable resync period for the CRD controller
  • parallelized node updates, speeding up simultaneous updates of a large number of nodes (e.g. an update of NodeFeatureRules in a big cluster); can be controlled with the -nfd-api-parallelism flag
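
The parallelized node updates mentioned above follow a common bounded worker pool pattern, which can be sketched in Python as below. This is a generic illustration, not nfd-master's code: the function names are hypothetical and the default of 10 workers is an arbitrary value chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def update_nodes(node_names, update_one, parallelism=10):
    """Apply update_one to every node using at most `parallelism` workers.

    Returns a dict of node name -> exception for the nodes that failed.
    """
    failures = {}
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        futures = {name: pool.submit(update_one, name) for name in node_names}
        for name, future in futures.items():
            err = future.exception()  # blocks until this update finishes
            if err is not None:
                failures[name] = err
    return failures


updated = []


def fake_update(name):
    """Stand-in for a real node update; fails for one node on purpose."""
    if name == "node-3":
        raise RuntimeError("simulated update failure")
    updated.append(name)


nodes = [f"node-{i}" for i in range(1, 6)]
failed = update_nodes(nodes, fake_update, parallelism=2)
print(sorted(updated), sorted(failed))
```

Collecting failures instead of aborting on the first error lets the remaining nodes be updated and the failed ones retried later.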

CPU features

Detection of Intel TDX guests is now supported.

Logging

The project was migrated to structured logging, making log messages more consistent and easier to parse by machines, and enabling future improvements in logging.

Support policy

The project now officially documents its supported versions and deprecation policy; see the documentation for details.

List of PRs

  • test/e2e: use proper context (#1154)
  • deps: Update kubernetes to v1.27.1 (#1155)
  • generate: update k8s code-generator to v0.27.1 (#1156)
  • generate: update protoc to v22.3 (#1157)
  • generate: update controller-gen to v0.11.3 (#1158)
  • generate: update mockery to v2.25.1 (#1159)
  • nfd-master: support noPublish with -prune (#1161)
  • nfd-master: fix -prune (#1160)
  • nfd-master: don't create emtpy annotations (#1166)
  • nfd-master: fix a crash when processing NodeFeatureRules (#1173)
  • pkg/nfd-master/nfd-master.go: Fix typo (#1171)
  • nfd-master: reject malformed extended resource dynamic capacity assignment (#1169)
  • go.mod: update deps (#1178)
  • OWNERS: add ArangoGutierrez as an approver (#1180)
  • feat: add master resync period configurability (#1139)
  • nfd-topology-updater: fix wrong kubelet_internal_checkpoint path and compare basename to full path (#1167)
  • docs: add missing .md suffix to internal references (#1189)
  • nfd-master: log node name when processing NodeFeatureRules (#1191)
  • scripts/test-infra: provide PR info to codecov (#1194)
  • Match usage and example for prepare-release.sh (#1196)
  • apis/nfd: add unit tests for Feature type (#1190)
  • Update README to v0.13.1 (#1197)
  • scripts/test-infra: provide PR base SHA to codecov (#1199)
  • codecov: drop required minimum coverage ratio of a commit to 0% (#1200)
  • codecov: drop required minimum coverage ratio at patch level (#1201)
  • nfd-master: refactor api-controller object handling (#1198)
  • nfd-master: refactor filtering of labels, taints and ERs (#1202)
  • helm: fix mount for nfd-master config (#1204)
  • nfd-master: fix resync period config option (#1185)
  • deployment/helm: fix default for kubeletStateDir parameter (#1207)
  • deployment/kustomize: drop pod-resources mount for topology-updater (#1208)
  • test/e2e: refactor matching of node properties (#1184)
  • deployment/helm: avoid overlapping mount paths on topology-updater (#1212)
  • deployment/helm: user dedicated serviceaccount for topology-updater (#1213)
  • deployment/helm: improve handling of topologyUpdater.kubeletStateFiles (#1211)
  • topology-updater: use node IP in the default configz URI (#1218)
  • e2e: delete CRs only if found (#1221)
  • Add leader election for nfd-master (#1219)
  • Fixed typo in Header under deployment/kustomize.md (#1222)
  • nfd-master: use close for stop channel (#1227)
  • scripts/test-infra: bump golangci-lint to v1.52.2 (#1230)
  • nfd-master: add validation of label names and values (#1228)
  • Migrate to structured logging (#1223)
  • scripts/test-infra: add logcheck to verify script (#1235)
  • Update README to v0.13.2 (#1238)
  • github: update new-release issue template (#1239)
  • feat: support dynamic values for labels in the NodeFeatureRule (#1226)
  • feat: parallelize nodes update (#1133)
  • cpu: Discover TDX guests based on cpuid information (#1240)
  • deployment/kustomize: use a named port for nfd gRPC service (#1243)
  • Fix missing apostrophe for jq (#1245)
  • Fix a typo on nfd-master cmd (#1244)
  • Removal of the bases field as it is deprecated by kustomize (#1246)
  • Docs: Fix typo on customization-guide (#1247)
  • hooks: disable hooks by default from v0.14 (#1182)
  • Remove pkg's imported twice (#1248)
  • fix typo in helm chart (#1253)
  • Stop ticker in time to avoid memory leak (#1255)
  • nfd-master: check for nil references in nfdAPIUpdateAllNodes (#1258)
  • cpu: Take cgroupsv1 into account when reading misc.capacity (#1265)
  • go.mod: update kubernetes to v1.27.4 (#1268)
  • github: update assignees in new-release issue template (#1274)
  • Enable metrics via prometheus operator (#1242)
  • README: update to v0.13.3 (#1276)
  • docs: document version and deprecation policy (#1279)
  • docs: fix toc of topology-updater and topology-gc reference (#1278)
  • docs: remove useless TOCs (#1280)
  • Add optional labels to the podmonitor (#1282)
  • docs: describe supported Kubernetes versions (#1277)
  • docs: deprecation policy for Helm chart params (#1283)
  • Fix Topology Manager policy and scope not being updated after NRT creation (#1256)
  • generate: bump tools to their latest versions (#1284)
  • Improve metrics (#1288)
  • docs: align metrics documentation with latest changes on naming (#1289)
  • docs: unify formatting of NOTEs (#1292)
  • source/local: trim whitespace from input (#1293)
  • source/local: support comments in input (#1294)
  • nfd-master: use term node update instead of labeling (#1291)
  • docs: document -metrics flag in command line reference (#1296)
  • fix empty hugepages in some numa nodes caused no such file or directory errors (#1287)
  • scripts/test-infra: update logcheck tool to v0.6.0 (#1299)
  • scripts/test-infra: bump golangci-lint to v1.54.0 (#1300)
  • Update kubernetes to v1.28.0 (#1302)
  • docs: update github-pages gem to v228 (#1303)
  • topology-gc: fix Stop (#1306)
  • topology-gc: rename run() (#1309)
  • topology-gc: rename runGC to garbageCollect() (#1310)
  • nfd-topology-updater: add metrics support (#1295)
  • topology-gc: refactor unit tests (#1307)
  • topology-gc: move initial GC out of startNodeInformer() (#1308)
  • topology-gc: simplify listing of node objects (#1311)
  • metrics: additional metrics for nfd-master (#1290)
  • Garbage collection of NodeFeature objects (#1305)
  • topology-updater: make -version always runnable (#1297)
  • go.mod: update kubernetes to v1.28.1 (#1315)
  • Makefile: increase golangci-lint timeout to 10min (#1320)
  • docs: use ruby docker image for building docs (#1319)
  • README: update to v0.13.4 (#1324)
  • test: add node updater pool unit tests (#1252)
  • docs: nfd-updater: clarify accounting (#1321)
  • nfd-updater: events: enable timer-only flow (#1325)
  • docs...

v0.13.4

01 Sep 09:41
v0.13.4
082f3fe

Changelog

This release contains one bug fix to the nfd-topology-updater and makes it runnable in Kubernetes v1.28, in addition to updating dependencies.

List of PRs

  • fix empty hugepages in some numa nodes caused no such file or directory errors (#1298)
  • Bump kubernetes to v1.28.1 (#1318)

v0.13.3

21 Jul 10:12
v0.13.3
3a42822

This patch release contains a few bug fixes in addition to updating dependencies.

What's Changed

Full Changelog: v0.13.2...v0.13.3

v0.12.5

21 Jul 10:11
v0.12.5
27d79ef

This patch release updates dependencies.

What's Changed

Full Changelog: v0.12.4...v0.12.5

v0.13.2

01 Jun 11:49
v0.13.2
09bc42e

This patch release adds validation for feature label names and values, updates dependencies and contains fixes to the Helm chart.

List of PRs

  • helm: fix mount for nfd-master config (#1205)
  • deployment/kustomize: drop pod-resources mount for topology-updater (#1210)
  • deployment/helm: fix default for kubeletStateDir parameter (#1209)
  • deployment/helm: improve handling of topologyUpdater.kubeletStateFiles (#1217)
  • deployment/helm: avoid overlapping mount paths on topology-updater (#1214)
  • deployment/helm: user dedicated serviceaccount for topology-updater (#1215)
  • go.mod: bump kubernetes to v1.26.5 (#1224)
  • nfd-master: add validation of label names and values (#1233)

v0.12.4

01 Jun 11:45
v0.12.4
9371cea

This patch release contains bug fixes to nfd-master, adds validation for feature labels, updates dependencies and fixes an issue with the Helm chart.

List of PRs

  • nfd-master: support noPublish with -prune (#1164)
  • nfd-master: fix a crash when processing NodeFeatureRules (#1176)
  • deployment/helm: user dedicated serviceaccount for topology-updater (#1216)
  • go.mod: bump kubernetes to v1.26.5 (#1225)
  • nfd-master: add validation of label names and values (#1234)

v0.13.1

27 Apr 08:37

Changelog

This patch release contains bug fixes to nfd-master and nfd-topology-updater.

List of PRs

Full Changelog: v0.13.0...v0.13.1

v0.13.0

18 Apr 15:19
v0.13.0
9697ffe

Changelog

Default image based on distroless

The default container image is now based on distroless/base. This was formerly shipped as the "minimal" image, and the "v0.13.0-minimal" image tag is thus provided for backwards compatibility. A new "full" image variant (v0.13.0-full) that corresponds to the previous default image is made available.

The practical user impact of this change is that support for hooks is limited to statically linked ELF binaries. Bash or Perl scripts are not supported by the default image anymore, but the new "full" image variant can be used if support for these is needed.

Config file for nfd-master

NFD-Master now supports dynamic run-time configurability through a configuration file, deployed as a ConfigMap similar to the nfd-worker. Many of the command line flags are now available as dynamically changeable config file options. Visit the documentation for more details.

Allow custom label prefixes

The restrictions on allowed label prefixes (or label namespaces) for custom labels are mostly removed. All prefixes are allowed, except for kubernetes.io/ and its sub-namespaces (i.e. *.kubernetes.io/), with the NFD-specific feature.node.kubernetes.io/ and profile.node.kubernetes.io/ (and their sub-namespaces) still being allowed.

Those wanting a stricter policy on allowed label prefixes can use the new denyLabelNs config file option (or the corresponding -deny-label-ns command line flag) of nfd-master. To preserve the old behavior of rejecting all custom prefixes, denyLabelNs="*" can be used, with the extraLabelNs config option available for allowing specific custom prefixes.
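
The prefix rules described above can be summarized as a small decision function. The Python sketch below is one reading of those rules, not the nfd-master implementation. The function names are hypothetical, and some details are assumptions: that extraLabelNs takes precedence over denyLabelNs but not over the kubernetes.io/ restriction, that the wildcard forms are "*" and "*.<namespace>", and that only prefixed label names are handled.

```python
ALLOWED_KUBERNETES_NS = ("feature.node.kubernetes.io", "profile.node.kubernetes.io")


def _in_namespace(ns, base):
    """True if ns equals base or is a sub-namespace of it."""
    return ns == base or ns.endswith("." + base)


def _matches(ns, pattern):
    """Match a deny pattern: "*" matches all, "*.x" any sub-namespace of x."""
    if pattern == "*":
        return True
    if pattern.startswith("*."):
        return ns.endswith(pattern[1:])
    return ns == pattern


def is_label_allowed(label, deny_label_ns=(), extra_label_ns=()):
    """Decide whether a prefixed label name passes the prefix policy."""
    ns, sep, _name = label.rpartition("/")
    if not sep:
        raise ValueError("this sketch only handles prefixed label names")
    # NFD-specific namespaces stay allowed despite being under kubernetes.io.
    if any(_in_namespace(ns, allowed) for allowed in ALLOWED_KUBERNETES_NS):
        return True
    if _in_namespace(ns, "kubernetes.io"):
        return False
    if ns in extra_label_ns:
        return True  # assumption: explicit allow overrides the deny list
    return not any(_matches(ns, pattern) for pattern in deny_label_ns)


print(is_label_allowed("example.com/foo"))
print(is_label_allowed("example.com/foo", deny_label_ns=("*",)))
print(is_label_allowed("example.com/foo", deny_label_ns=("*",),
                       extra_label_ns=("example.com",)))
```

Here example.com is a placeholder prefix. The last call mirrors the "old behavior" setup from the text: deny everything, then allow specific custom prefixes.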

Extended resources

NFD now supports creating node extended resources from the NodeFeatureRule custom resources. See the documentation for details. With this, the -resource-labels command line flag is now marked as deprecated.

Topology Updater enhancements

A new Topology-Garbage-Collector daemon for deleting obsolete NodeResourceTopology objects was added. This daemon is enabled in default deployments.

Topology-Updater reacts faster to changes in the node, so that NodeResourceTopology objects track the current node resource status more accurately.

Topology-Updater gained the ability to report "pods fingerprint" as a single value representing the node resources status. See the new -pods-fingerprint command line flag.
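
A fingerprint of this kind can be thought of as an order-independent hash over the set of pods on the node. The Python sketch below shows the general technique only. The choice of SHA-256 and of (namespace, name) pairs as input are assumptions made for the example and are not taken from the Topology-Updater implementation.

```python
import hashlib


def pods_fingerprint(pods):
    """Compute one value summarizing a set of (namespace, name) pod identities."""
    digest = hashlib.sha256()
    # Sorting makes the result independent of the order pods are listed in.
    for namespace, name in sorted(set(pods)):
        # The newline terminator keeps adjacent entries from running together.
        digest.update(f"{namespace}/{name}\n".encode("utf-8"))
    return digest.hexdigest()


# Hypothetical pod lists: same set in a different order, then one pod added.
a = pods_fingerprint([("default", "web-1"), ("kube-system", "dns-1")])
b = pods_fingerprint([("kube-system", "dns-1"), ("default", "web-1")])
c = pods_fingerprint([("default", "web-1"), ("kube-system", "dns-1"), ("default", "web-2")])
print(a == b, a == c)
```

A consumer can compare two fingerprints to tell cheaply whether the set of pods behind a reported resource status has changed.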

Topology-Updater now supports the latest v1alpha2 version of the NodeResourceTopology API.

Miscellaneous

New CPU features:

  • X86_64
    • Intel Sierra Forest: AVXIFMA, AVXNECONVERT, AVXVNNIINT8, CMPCCXADD, WRMSRNS and MSRLIST
    • number of Intel TDX keys
    • amount of Intel SGX EPC (Enclave Page Cache) memory
    • AMD SEV, including number of ASIDs (Address Space Identifiers), and number of ES (Encrypted State) IDs
  • PPC64
    • IBM Nest Accelerator for GZIP
  • RDT: number of L3 CLOSID

Kernel: new kernel.enabledmodule feature that lists both loaded dynamic modules and modules built into the kernel.

Deprecations

The feature.node.kubernetes.io/cpu-rdt.* labels are now marked as deprecated and will be removed in a future release. The RDT features will remain available for NodeFeatureRule objects to consume when creating custom labels.

The -resource-labels command line flag is now deprecated and will be removed in a future release. NodeFeatureRule objects should be used instead for managing node extended resources.

List of PRs

  • docs: mention NodeFeature as an extension point (#1009)
  • docs: fix typo in CRD name (#1011)
  • Use single-dash format for nfd cmdline flags (#1013)
  • README: update to latest release v0.12.0 (#1014)
  • dockerfile: update grpc-health-probe to v0.4.14 (#1015)
  • Add common utility function for getting node name (#1018)
  • topology-updater: move code (#1019)
  • apis/nfd: make all fields in NodeFeatureSpec optional (#1017)
  • worker: move code (#1020)
  • Bump cpuid to v2.2.3 (#1023)
  • Docs: mention tainting in the intro section (#1021)
  • test/e2e: more comprehensive test for NodeFeature objects (#1016)
  • Add missing TopologyManagerPolicy (#1026)
  • Add NRT garbage collector (#1024)
  • e2e: append _test suffix to test files (#1029)
  • e2e: init docker image (#1028)
  • nfd-master: always start gRPC server (#1034)
  • docs: fix internal cross-page references by injecting .md (#1030)
  • docs: Fix link for Helm docs (#1040)
  • cpu: support for detecting nx-gzip coprocessor feature (#956)
  • README: update to release v0.12.1 (#1042)
  • helm: make master port configurable (#1044)
  • test: move out unit testing from Dockerfile (#1047)
  • deployment: disable service links in NFD master pod (#1045)
  • topology-updater: nrt-api Update to v1alpha2 (#1053)
  • Change nfd-worker to use Ticker instead of After. (#1050)
  • images: base the default image on distroless/base (#1027)
  • Add discovery duration logging (#1055)
  • OWNERS: Update Ethyling username to jjacobelli (#1056)
  • Advertise TopologyManger policy and scope as Attributes in NRT api v1alpha2 (#1054)
  • feat: add deny-label-ns flag which supports wildcard (#1051)
  • Fix some typos (#1058)
  • scripts/test-infra: bump golangci-lint to v1.51.1 (#1061)
  • GO Update version to 1.20 (#1059)
  • source/cpu: fix build flags of cpuid detection (#1063)
  • go.mod: bump cpuid to v2.2.4 (#1064)
  • docs: describe nfd-topology-gc in introduction.md (#1062)
  • test/e2e: rename ginkgo focus for tests (#1065)
  • topology-updater:compute pod set fingerprint (#1049)
  • test/e2e: cleanup NodeFeature objects before/after tests (#1074)
  • test/e2e: reduce worker wait-for-ready period to 2s (#1073)
  • docs: fix usage customization guide typos (#1066)
  • test: add code coverage reporting (#1069)
  • helm: fix topology-updater rbac (#1078)
  • deployment: fixes for mounting kubelet config (#1080)
  • Update worker-configuration-reference.md (#1076)
  • scripts/test-infra: bump golangci-lint to v1.51.2 (#1082)
  • test: implement e2e test of the deny-label-ns flag (#1070)
  • go.mod: update kubernetes to v1.26.2 (#1077)
  • pkg/utils: add UnmarshalJSON method to StringSetVal (#1087)
  • codegen: fix code-generation (#1083)
  • kustomize: trim prune overlay (#1090)
  • gitignore: ignore codecov coverage report (#1085)
  • topology-updater: reactive updates (#1031)
  • chore: add debug dump of nfd worker configuration (#1092)
  • feat: add enableTaints to helm chart (#1091)
  • cpu: expose AMD SEV support (#1097)
  • cpu: Expose the total number of keys for TDX (#1079)
  • go.mod: update kubernetes to v1.26.3 (#1106)
  • README: update to release v0.12.2 (#1112)
  • feat: add master config file (#1084)
  • test/e2e: fix node cleanup function (#1115)
  • source/cpu: deprecate cpu-rdt.* labels (#1114)
  • test/e2e: wait for CRD deletion to complete (#1116)
  • test/e2e: refactor nfd pod configuration (#1117)
  • nfd-master: disallow unprefixed and kubernetes taints (#1118)
  • nfd-master: fix node update (#1119)
  • Advertise RDT L3 num_closid (#1100)
  • Create extended resources with NodeFeatureRule (#1099)
  • Dockerfile: bump grpc-health-probe to v0.4.17 (#1121)
  • docs: add missing mentions of extended resources and taints (#1122)
  • nfd-master: increase controller resync period to 1 hour (#1123)
  • nfd-master: re-try on node update failures (#1127)
  • Makefile: set e2e test timeout to 1 hour (#1128)
  • feat: support builtin kernel mods (#1086)
  • nfd-master: deprecate the -resource-labels flag (#1126)
  • source/cpu: don't create cpu-security.tdx.total_keys label (#1130)
  • cpu: Expose SGX EPC resource (#1129)
  • e2e: add codecov uploader configuration (#1095)
  • OWNERS: add PiotrProkop as a reviewer (#1140)
  • Dockerfile: bump grpc-health-probe to v0.4.18 (#1145)
  • cpu: expose the total number of AMD SEV ASID and ES (#1149)
  • hack/prepare-release.sh: fix name of one e2e test file (#1151)

v0.12.3

18 Apr 06:23
v0.12.3
9b1893c

Changelog

This patch release contains bug fixes to nfd-master and improvements to the Helm chart.

List of PRs

  • helm: make master port configurable (#1135)
  • feat: add enableTaints to helm chart (#1136)
  • nfd-master: fix node update (#1137)
  • nfd-master: re-try on node update failures (#1138)
  • Dockerfile: bump grpc-health-probe to v0.4.18 (#1147)

v0.12.2

03 Apr 11:36
v0.12.2
221359a

What's Changed

This patch release updates dependencies and fixes some issues with the Helm chart.

List of PRs

  • docs: Fix link for Helm docs (#1041)
  • helm: fix topology-updater rbac (#1103)
  • go.mod: update kubernetes to v1.26.2 (#1107)
  • go.mod: update kubernetes to v1.26.3 (#1108)
  • source/cpu: fix build flags of cpuid detection (#1104)
  • deployment: fixes for mounting kubelet config (#1105)

Full Changelog: v0.12.1...v0.12.2