Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new platform management API. Currently with support for chassis, PSUs, fans and watchdog #13

Merged
merged 17 commits into from
Dec 4, 2018
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
packages=[
'sonic_eeprom',
'sonic_led',
'sonic_platform_base',
'sonic_psu',
'sonic_sfp',
],
Expand Down
Empty file added sonic_platform_base/__init__.py
Empty file.
207 changes: 207 additions & 0 deletions sonic_platform_base/chassis_base.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,207 @@
#
# chassis_base.py
#
# Base class for implementing a platform-specific class with which
# to interact with a chassis device in SONiC.
#

import sys
from . import device_base


class ChassisBase(device_base.DeviceBase):
"""
Base class for interfacing with a platform chassis
"""

# Possible reboot causes
REBOOT_CAUSE_POWER_LOSS = "power_loss"
REBOOT_CAUSE_THERMAL_OVERLOAD = "thermal_overload"
jleveque marked this conversation as resolved.
Show resolved Hide resolved
REBOOT_CAUSE_INSUFFICIENT_FAN = "insufficient_fan"
REBOOT_CAUSE_WATCHDOG = "watchdog"
REBOOT_CAUSE_SOFTWARE = "software"
jleveque marked this conversation as resolved.
Show resolved Hide resolved

# List of ModuleBase-derived objects representing all modules
# available on the chassis (for use with modular chassis)
_module_list = []

# List of FanBase-derived objects representing all fans
# available on the chassis
_fan_list = []

# List of PsuBase-derived objects representing all power supply units
# available on the chassis
_psu_list = []

# Object derived from WatchdogBase for interacting with hardware watchdog
_watchdog = None

def get_base_mac(self):
"""
Retrieves the base MAC address for the chassis

Returns:
A string containing the MAC address in the format
'XX:XX:XX:XX:XX:XX'
"""
raise NotImplementedError

def get_reboot_cause(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reboot cause can be persistent on some product and have to be explicitly cleared.
Otherwise reading this information after a reboot will show all the reload causes that happened previous times.
Some platform might also have limitation on the number of reboot causes that can be stored.

I would suggest either a default parameter clear=False to this method or to add a new one called clear_reboot_cause.
Also I think this function should only be called once with clear=True and the result cached somewhere in a tmpfs. This means that future asks for reload cause information would go through the cache.

Copy link
Contributor Author

@jleveque jleveque Oct 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reboot cause can be persistent on some product and have to be explicitly cleared.
Otherwise reading this information after a reboot will show all the reload causes that happened previous times.

I would suggest either a default parameter clear=False to this method or to add a new one called clear_reboot_cause.
Also I think this function should only be called once with clear=True and the result cached somewhere in a tmpfs. This means that future asks for reload cause information would go through the cache.

Since the need to clear the reboot cause after reading is platform-specific functionality, I believe this logic should be left to the platform-specific implementation of the function. If the platform requires clearing the reboot cause, it should be done in the implementation and a file saved to tmpfs. The function can check for the presence of the file first before attempting to read/clear the register. I prefer this to adding a clear=True parameter to the base method, and I think adding a clear_reboot_cause() method is unnecessary, because there should be no need to explicitly clear the reboot cause.

Some platform might also have limitation on the number of reboot causes that can be stored.

If you're concerned that not all platforms will support all reboot causes, that is expected. The base class will define all possible reboot causes that SONiC understands; the platform-specific implementation will simply return the subset which is applicable to that platform. If this doesn't address your concern, please elaborate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reboot cause can be persistent on some product and have to be explicitly cleared.
Otherwise reading this information after a reboot will show all the reload causes that happened previous times.
I would suggest either a default parameter clear=False to this method or to add a new one called clear_reboot_cause.
Also I think this function should only be called once with clear=True and the result cached somewhere in a tmpfs. This means that future asks for reload cause information would go through the cache.

Since the need to clear the reboot cause after reading is platform-specific functionality, I believe this logic should be left to the platform-specific implementation of the function. If the platform requires clearing the reboot cause, it should be done in the implementation and a file saved to tmpfs. The function can check for the presence of the file first before attempting to read/clear the register. I prefer this to adding a clear=True parameter to the base method, and I think adding a clear_reboot_cause() method is unnecessary, because there should be no need to explicitly clear the reboot cause.

I would assume all the platforms to behave the same way with regard to the reboot causes. However this information would have to be crowd-sourced from other vendors as well for more relevant data.
I'm fine with letting the responsibility of clearing and caching the reboot cause to the plugin implementation.
Is there a place in SONiC where such cache files should go? Currently almost everything is backed by the flash including /tmp. The only place that is a tmpfs and would feel "suitable" seems to be /run. Is there some guidelines somewhere regarding this? I would be interested to know the reason why /tmp is not a tmpfs.

Some platform might also have limitation on the number of reboot causes that can be stored.

If you're concerned that not all platforms will support all reboot causes, that is expected. The base class will define all possible reboot causes that SONiC understands; the platform-specific implementation will simply return the subset which is applicable to that platform. If this doesn't address your concern, please elaborate.

My concern was a bit different. Your previous answer ended up answered my question but I'll elaborate a bit more.
Basically the concept of a reboot cause is to be stored on an external device until being read.
These devices usually have space restriction and can only save a given number of reboot cause after which they won't store anything until being cleared. For example if your device can store 2 reboot cause and you trigger 2 watchdogs and then 1 powerloss, you might not see the powerloss if you haven't cleared the reboot cause after each reboot.

"""
Retrieves the cause of the previous reboot

Returns:
A string containing the cause of the previous reboot. This string
must be one of the predefined strings in this class.
"""
raise NotImplementedError

##############################################
# Module methods
##############################################

def get_num_modules(self):
"""
Retrieves the number of modules available on this chassis

Returns:
An integer, the number of modules available on this chassis
"""
return len(self._module_list)

def get_all_modules(self):
"""
Retrieves all modules available on this chassis

Returns:
A list of objects derived from ModuleBase representing all
modules available on this chassis
"""
return self._module_list

def get_module(self, index):
"""
Retrieves module represented by (0-based) index <index>

Args:
index: An integer, the index (0-based) of the module to
retrieve

Returns:
An object dervied from ModuleBase representing the specified
module
"""
module = None

try:
module = self._module_list[index]
except IndexError:
sys.stderr.write("Module index {} out of range (0-{})\n".format(
index, len(self._module_list)-1))

return module

##############################################
# Fan methods
##############################################

def get_num_fans(self):
"""
Retrieves the number of fans available on this chassis

Returns:
An integer, the number of fan modules available on this chassis
"""
return len(self._fan_list)

def get_all_fans(self):
"""
Retrieves all fan modules available on this chassis

Returns:
A list of objects derived from FanBase representing all fan
modules available on this chassis
"""
return self._fan_list

def get_fan(self, index):
"""
Retrieves fan module represented by (0-based) index <index>

Args:
index: An integer, the index (0-based) of the fan module to
retrieve

Returns:
An object dervied from FanBase representing the specified fan
module
"""
fan = None

try:
fan = self._fan_list[index]
except IndexError:
sys.stderr.write("Fan index {} out of range (0-{})\n".format(
index, len(self._fan_list)-1))

return fan
jleveque marked this conversation as resolved.
Show resolved Hide resolved

##############################################
# PSU methods
##############################################

def get_num_psus(self):
"""
Retrieves the number of power supply units available on this chassis

Returns:
An integer, the number of power supply units available on this
chassis
"""
return len(self._psu_list)

def get_all_psus(self):
"""
Retrieves all power supply units available on this chassis

Returns:
A list of objects derived from PsuBase representing all power
supply units available on this chassis
"""
return self._psu_list

def get_psu(self, index):
"""
Retrieves power supply unit represented by (0-based) index <index>

Args:
index: An integer, the index (0-based) of the power supply unit to
retrieve

Returns:
An object dervied from PsuBase representing the specified power
supply unit
"""
psu = None

try:
psu = self._psu_list[index]
except IndexError:
sys.stderr.write("PSU index {} out of range (0-{})\n".format(
index, len(self._psu_list)-1))

return psu

##############################################
# Other methods
##############################################

def get_watchdog(self):
"""
Retreives hardware watchdog device on this chassis

Returns:
An object derived from WatchdogBase representing the hardware
watchdog device
"""
return _watchdog
66 changes: 66 additions & 0 deletions sonic_platform_base/device_base.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
#
# device_base.py
#
# Abstract base class for interfacing with a generic type of platform
# peripheral device in SONiC
#

class DeviceBase(object):
"""
Abstract base class for interfacing with a generic type of platform
peripheral device
"""

def get_presence(self):
"""
Retrieves the presence of the device

Returns:
bool: True if device is present, False if not
"""
raise NotImplementedError

def get_model(self):
"""
Retrieves the model number (or part number) of the device

Returns:
string: Model/part number of device
"""
raise NotImplementedError

def get_serial(self):
"""
Retrieves the serial number of the device

Returns:
string: Serial number of device
"""
raise NotImplementedError

def get_status(self):
"""
Retrieves the operational status of the device

Returns:
A boolean value, True if device is operating properly, False if not
"""
raise NotImplementedError

def get_change_event(self, timeout=0):
"""
Returns a dictionary containing all devices which have experienced a
change

Args:
timeout: Timeout in milliseconds (optional). If timeout == 0,
this method will block until a change is detected.

Returns:
(bool, dict):
- True if call successful, False if not;
- Dict where key is device ID and value is device event,
status='1' represents device inserted,
status='0' represents device removed. Ex. {'0': '1', '1': '0'}
"""
raise NotImplementedError
Copy link

@padmanarayana padmanarayana Nov 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to get confirmation that if a platform takes a Baseboard Management Controller (BMC) based approach (to handle thermals, events etc), these (and the other such APIs) can remain unimplemented..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention here is for daemons to receive notifications of changes in platform hardware state in order to update SONiC's view of device state. For instance, if a tranceiver, PSU or fan tray is added or removed. Can you provide an example of an event handled by the BMC which SONiC would not need to be made aware of?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that the events referred to is specifically to track the presence/absence i..e OIR of the modules (essentially this API seems to be a generalization of the get_transceiver_change_event()).

If the BMC implements the thermals, an addition/removal of FAN tray would be completely handled by the BMC itself - but atleast for generating a syslog this API needs to be implemented for SONiC..

89 changes: 89 additions & 0 deletions sonic_platform_base/fan_base.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
#
# fan_base.py
#
# Abstract base class for implementing a platform-specific class with which
# to interact with a fan module in SONiC
#

from . import device_base


class FanBase(device_base.DeviceBase):
"""
Abstract base class for interfacing with a fan module
"""

# Possible fan directions (relative to port-side of device)
FAN_DIRECTION_INTAKE = "intake"
FAN_DIRECTION_EXHAUST = "exhaust"

# Possible fan status LED colors
STATUS_LED_COLOR_GREEN = "green"
STATUS_LED_COLOR_RED = "red"
STATUS_LED_COLOR_OFF = "off"

def get_direction(self):
"""
Retrieves the direction of fan

Returns:
A string, either FAN_DIRECTION_INTAKE or FAN_DIRECTION_EXHAUST
depending on fan direction
"""
raise NotImplementedError

def get_speed(self):
"""
Retrieves the speed of fan as a percentage of full speed

Returns:
An integer, the percentage of full fan speed, in the range 0 (off)
to 100 (full speed)
"""
raise NotImplementedError

def get_target_speed(self):
"""
Retrieves the target (expected) speed of the fan

Returns:
An integer, the percentage of full fan speed, in the range 0 (off)
to 100 (full speed)
"""
raise NotImplementedError

def get_speed_tolerance(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this is only for the fan daemon logic usage?

Copy link
Contributor Author

@jleveque jleveque Oct 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the initial intention. We may find other uses in the future, however. Is there a concern?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No concerns. I'm was just curious if the fand was in the roadmap.

"""
Retrieves the speed tolerance of the fan

Returns:
An integer, the percentage of variance from target speed which is
considered tolerable
"""
raise NotImplementedError

def set_speed(self, speed):
"""
Sets the fan speed

Args:
speed: An integer, the percentage of full fan speed to set fan to,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see everywhere that the fan speed is an int from 0 to 100.
Some device dealing with fans can have finer grained control (like 0 to 255) which means loosing a bit of precision.
Was this taken into consideration and integer deemed better than floats?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you prefer a 0.0 ... 1.0 float scale? This should allow for finer-grained control if necessary, while maintaining a more universal scale.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind either way, integer percentages 0 to 100 sounds good to me.
I just wanted to bring this topic up as the sysfs interface defines a pwm value to be ranging from 0 to 255.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all platforms used sysfs, the 0-255 scale would be fine. However, we'd like a universal scale that will work with other interfaces (IPMI, etc.).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any strong preference. I just wanted to bring that topic to make sure it wasn't overlooked.
A percentage sounds perfect to me.

in the range 0 (off) to 100 (full speed)

Returns:
A boolean, True if speed is set successfully, False if not
"""
raise NotImplementedError

def set_status_led(self, color):
"""
Sets the state of the fan module status LED

Args:
color: A string representing the color with which to set the
fan module status LED

Returns:
bool: True if status LED state is set successfully, False if not
"""
raise NotImplementedError
Loading