Add documentation
conroy-cheers committed Jun 25, 2024
1 parent 68bdfd6 commit 2562cc4
Showing 8 changed files with 105 additions and 5 deletions.
16 changes: 16 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 3 additions & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
@@ -8,9 +8,11 @@ repository = "https://github.com/conroy-cheers/ocsd"
readme = "README.md"

[features]
client = ["dep:devmem"]
## Enable `client` module for easy access to the OCSD buffer via `/dev/mem`
devmem = ["dep:devmem"]

[dependencies]
bitmask-enum = "2.2.4"
bytemuck = { version = "1.16.1", features = ["derive"] }
devmem = { version = "0.1.1", optional = true }
document-features = "0.2.8"
10 changes: 10 additions & 0 deletions README.md
@@ -5,3 +5,13 @@ on compatible HPE servers.

Credit to [ilo4_unlock](https://github.com/kendallgoto/ilo4_unlock) which made this
reverse-engineering effort possible.

On an ML350 Gen9, `cat /proc/iomem` yields:
```
791ff000-7b5fefff : ACPI Non-volatile Storage
7b5ff000-7b7fefff : ACPI Tables
7b7ff000-7b7fffff : System RAM
```

According to the `ocsd header` command, the OCSD buffer starts at `0x791f6000`,
which is inside the "ACPI Non-volatile Storage" region.
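
A claim like this can be checked by parsing the `/proc/iomem` lines and testing which range an address falls into. The sketch below is only illustrative: `IomemRegion`, `parse_iomem_line`, and `region_containing` are hypothetical names and are not part of this crate. One thing it surfaces: taking only the three lines quoted above, `0x791f6000` sits `0x9000` bytes below `0x791ff000`, so it does not land inside any of them. The excerpt may be incomplete, or one of the values may differ on the actual machine.

```rust
/// One parsed `/proc/iomem` entry: an inclusive address range and its label.
#[derive(Debug, PartialEq)]
struct IomemRegion {
    start: u64,
    end: u64, // inclusive, as printed by the kernel
    name: String,
}

/// Parses a line such as `791ff000-7b5fefff : ACPI Non-volatile Storage`.
/// Returns `None` if the line does not have that shape.
fn parse_iomem_line(line: &str) -> Option<IomemRegion> {
    let (range, name) = line.trim().split_once(" : ")?;
    let (start, end) = range.split_once('-')?;
    Some(IomemRegion {
        start: u64::from_str_radix(start, 16).ok()?,
        end: u64::from_str_radix(end, 16).ok()?,
        name: name.to_string(),
    })
}

/// Finds the first region whose inclusive range contains `addr`.
fn region_containing(regions: &[IomemRegion], addr: u64) -> Option<&IomemRegion> {
    regions.iter().find(|r| r.start <= addr && addr <= r.end)
}

fn main() {
    // The three lines quoted in the README excerpt.
    let listing = "791ff000-7b5fefff : ACPI Non-volatile Storage\n\
                   7b5ff000-7b7fefff : ACPI Tables\n\
                   7b7ff000-7b7fffff : System RAM";
    let regions: Vec<IomemRegion> = listing.lines().filter_map(parse_iomem_line).collect();

    let addr = 0x791f_6000u64;
    match region_containing(&regions, addr) {
        Some(r) => println!("{addr:#x} is inside {:?}", r.name),
        None => println!("{addr:#x} is not inside any listed region"),
    }
}
```

On a live system the same functions could be fed the contents of `/proc/iomem` directly; note that unprivileged users typically see zeroed addresses there.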
49 changes: 48 additions & 1 deletion src/lib.rs
@@ -1,5 +1,52 @@
#[cfg(feature = "client")]
#![deny(missing_docs)]

//! This crate provides utilities to enable reporting and monitoring of OCSD sensor values
//! from the host OS.
//!
//! ## Disclaimer
//! *Protocol documentation for OCSD is not publicly available.
//! All of this has been reverse-engineered from a single ML350 Gen9. It has not been
//! tested on any other hardware! I cannot guarantee that it fully complies with the
//! OCSD protocol, and results on your server may vary.*
//!
//! ## What's OCSD?
//! Some HPE servers from Gen8 onwards are equipped with a feature called
//! Option Card Sensor Data (OCSD), also referred to as "Sea of Sensors".
//!
//! OCSD allows temperature data to be sent from option cards (e.g. RAID controller
//! daughterboards, PCIe expansion cards) to iLO/BIOS for fan control and monitoring.
//!
//! ## OK, cool, so why do we need this crate?
//! Ordinarily, supported option cards will directly report temperatures via OCSD
//! without any involvement from the host OS, and the server will respond by controlling
//! the fans accordingly.
//!
//! In the case of unsupported option cards, the server may do one of the following:
//! - Assume that the card is running *very hot* and spin up the fans to deafening levels
//! at all times
//! - Ignore the card's existence entirely. In the case of passively cooled cards (e.g.
//! unsupported server GPUs), this leads to thermal throttling due to insufficient fan
//! speed at high load.
//!
//! Ideally, when installing an unsupported option card, we would just modify its firmware
//! to report temperatures directly to the OCSD buffer. Unfortunately, this would be
//! really difficult.
//!
//! As an alternative, this crate allows the host OS to take the reported temperatures
//! available from existing drivers, and forward them to the OCSD buffer so they can
//! be used by the iLO controller for reporting and fan control.
//!
//! It also allows for reading reported temperatures for supported devices directly out
//! of the OCSD buffer, although there are probably better ways of getting that data.

//! ## Feature flags
#![doc = document_features::document_features!(feature_label = r#"<span class="stab portability"><code>{feature}</code></span>"#)]

#[cfg(feature = "devmem")]
pub mod client;

/// Protocol interface for manipulating, decoding, and encoding
/// OCSD structures.
pub mod protocol;

#[cfg(test)]
3 changes: 3 additions & 0 deletions src/protocol/error.rs
@@ -0,0 +1,3 @@
//! OCSD protocol error types.

pub use super::temperature::TempOutOfRange;
4 changes: 3 additions & 1 deletion src/protocol/mod.rs
@@ -1,5 +1,7 @@
mod data;
pub mod error;
mod ocsd;
pub mod temperature;
mod temperature;

pub use ocsd::*;
pub use temperature::Celsius;
20 changes: 20 additions & 0 deletions src/protocol/ocsd.rs
@@ -18,16 +18,23 @@ use super::{
#[bitmask_enum::bitmask(u16)]
#[derive(Default)]
pub enum OcsdSensorStatus {
/// 0b0001: sensor not failed
NotFailed,
/// 0b0010: sensor is present
Present,
/// 0b0100: sensor is disabled
Disabled,
/// 0b1000: checksum is enabled
WithChecksum,
}
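
The doc comments above pin each status flag to one bit. As a rough, dependency-free illustration of how such flags combine and are tested, the sketch below uses plain `u16` constants instead of the `bitmask_enum` macro. The constant names and the `has_flag` helper are hypothetical; only the bit values come from the comments.

```rust
// Bit values taken from the doc comments; everything else is illustrative.
const NOT_FAILED: u16 = 0b0001;
const PRESENT: u16 = 0b0010;
const DISABLED: u16 = 0b0100;
const WITH_CHECKSUM: u16 = 0b1000;

/// Returns true if every bit in `flag` is set in `status`.
fn has_flag(status: u16, flag: u16) -> bool {
    status & flag == flag
}

fn main() {
    // A healthy, present sensor without a checksum.
    let status = NOT_FAILED | PRESENT;
    assert!(has_flag(status, NOT_FAILED));
    assert!(has_flag(status, PRESENT));
    assert!(!has_flag(status, DISABLED));
    assert!(!has_flag(status, WITH_CHECKSUM));
    println!("status bits: {status:#06b}");
}
```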

/// Type of OCSD sensor
#[derive(Default, Clone, Copy)]
pub enum OcsdSensorType {
#[default]
/// Reserved for decoding null sensors or sensors with an unimplemented type
Unknown = 0,
/// Only thermal sensors are supported currently
Thermal = 1,
}

@@ -40,12 +47,16 @@ impl From<u8> for OcsdSensorType {
}
}
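
The body of this `From<u8>` impl is not visible in the diff. Going by the doc comment on `Unknown` ("reserved for decoding null sensors or sensors with an unimplemented type"), one plausible shape is a match that falls back to `Unknown` for any unrecognised byte. The block below is a guess at that pattern with the enum redeclared locally, not the crate's actual code.

```rust
#[derive(Debug, Default, Clone, Copy, PartialEq)]
enum OcsdSensorType {
    #[default]
    Unknown = 0,
    Thermal = 1,
}

impl From<u8> for OcsdSensorType {
    fn from(raw: u8) -> Self {
        match raw {
            1 => Self::Thermal,
            // Assumed behaviour: zero and any unimplemented value decode as Unknown.
            _ => Self::Unknown,
        }
    }
}

fn main() {
    assert_eq!(OcsdSensorType::from(1), OcsdSensorType::Thermal);
    assert_eq!(OcsdSensorType::from(0), OcsdSensorType::Unknown);
    assert_eq!(OcsdSensorType::from(0xff), OcsdSensorType::Unknown);
}
```

A total `From` conversion like this never fails on unexpected memory contents, at the cost of not distinguishing "null" from "unrecognised".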

/// Location of OCSD sensor on the option card
#[allow(dead_code)]
#[derive(Default, Clone, Copy)]
pub enum OcsdSensorLocation {
#[default]
/// Reserved for decoding null sensors or sensors with an unimplemented location
Unknown = 0,
/// Internal to card ASIC (e.g. on-die GPU temperature sensor)
InternalToAsic = 1,
/// Somewhere else on the option card
OnboardOther = 5,
}

@@ -59,9 +70,12 @@ impl From<u32> for OcsdSensorLocation {
}
}

/// OCSD protocol version
#[derive(Clone, Copy)]
pub enum OcsdVersion {
/// Reserved for decoding invalid data or header with an unimplemented version
Unknown = 0,
/// OCSD version 2
Version2 = 2,
}

@@ -74,9 +88,12 @@ impl From<u8> for OcsdVersion {
}
}

/// OCSD device version
#[derive(Clone, Copy)]
pub enum DeviceVersion {
/// Reserved for decoding null devices or devices with an unimplemented version
Unknown = 0,
/// Device version 1
Version1 = 1,
}

@@ -89,6 +106,7 @@ impl From<u8> for DeviceVersion {
}
}

/// Used for structs which have a 1:1 representation in OCSD shared memory.
pub trait MemoryMapped {
/// Returns byte representation of the structure
/// as it should appear in OCSD memory.
@@ -156,7 +174,9 @@ impl MemoryMapped for OcsdHeader {
/// This implementation assumes fixed-size devices with
/// 3 sensor slots.
pub struct OcsdDevice {
/// Associates the OCSD device with a PCI device; also provides some extra information
pub header: OcsdDeviceHeader,
/// Each device can have up to 3 sensors. Unused sensors should be set to `Default::default()`.
pub sensors: [OcsdSensor; 3],
}

4 changes: 2 additions & 2 deletions src/protocol/temperature.rs
@@ -13,13 +13,13 @@ impl Display for TempOutOfRange {

impl Error for TempOutOfRange {}

/// Represents a signed integer temperature in degrees Celsius,
/// stored as a single-byte raw value.
#[derive(Default)]
pub struct Celsius {
value: i8,
}

/// Represents a signed integer temperature in degrees Celsius,
/// stored as a single-byte raw value.
impl Celsius {
const OFFSET: i8 = 0;
