
Commit d2bfbfc

vadimp-nvidia authored and gregkh committed
mlxsw: core: Add validation of transceiver temperature thresholds
[ Upstream commit 57726eb ]

Validate thresholds to avoid a single failure due to some transceiver unreliability. Ignore the last readouts if the warning temperature is above the alarm temperature, since this can cause an unexpected thermal shutdown. Keep the previous values and refresh the thresholds in the next iteration.

This is a rare scenario, but it was observed at a customer site.

Fixes: 6a79507 ("mlxsw: core: Extend thermal module with per QSFP module thermal zones")
Signed-off-by: Vadim Pasternak <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Parent: 60b8b4e
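The policy described in the commit message (reject an inconsistent threshold pair, keep the last accepted values, and try again on the next poll) can be sketched as a small user-space C helper. This is an illustration under stated assumptions, not driver code: `struct module_thresholds` and `thresholds_update()` are hypothetical names, and temperatures are assumed to be in millidegrees Celsius.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical container for the last accepted thresholds of one module,
 * in millidegrees Celsius. This is not an mlxsw structure. */
struct module_thresholds {
	int crit;  /* warning ("critical") threshold read from the module */
	int emerg; /* alarm ("emergency") threshold read from the module */
};

/* Hypothetical helper: apply one threshold readout.
 * A pair whose warning threshold is above its alarm threshold is treated
 * as an unreliable readout. The previous values are kept unchanged and the
 * caller simply retries on its next polling iteration.
 * Returns true if the readout was accepted. */
static bool thresholds_update(struct module_thresholds *th,
			      int crit_temp, int emerg_temp)
{
	assert(th != NULL);

	if (crit_temp > emerg_temp) {
		fprintf(stderr,
			"critical threshold %d is above emergency threshold %d, keeping %d/%d\n",
			crit_temp, emerg_temp, th->crit, th->emerg);
		return false;
	}

	th->crit = crit_temp;
	th->emerg = emerg_temp;
	return true;
}
```

For example, starting from `{ 70000, 75000 }`, a readout of `(80000, 75000)` is rejected and the structure still holds 70000/75000, while a later readout of `(72000, 78000)` is accepted. Returning "rejected" rather than an error code mirrors the patch, where the early exit returns 0 so the caller does not treat the bad readout as a failure.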

1 file changed: 7 additions, 4 deletions


drivers/net/ethernet/mellanox/mlxsw/core_thermal.c

```diff
@@ -176,6 +176,12 @@ mlxsw_thermal_module_trips_update(struct device *dev, struct mlxsw_core *core,
 	if (err)
 		return err;
 
+	if (crit_temp > emerg_temp) {
+		dev_warn(dev, "%s : Critical threshold %d is above emergency threshold %d\n",
+			 tz->tzdev->type, crit_temp, emerg_temp);
+		return 0;
+	}
+
 	/* According to the system thermal requirements, the thermal zones are
 	 * defined with four trip points. The critical and emergency
 	 * temperature thresholds, provided by QSFP module are set as "active"
@@ -190,11 +196,8 @@ mlxsw_thermal_module_trips_update(struct device *dev, struct mlxsw_core *core,
 	tz->trips[MLXSW_THERMAL_TEMP_TRIP_NORM].temp = crit_temp;
 	tz->trips[MLXSW_THERMAL_TEMP_TRIP_HIGH].temp = crit_temp;
 	tz->trips[MLXSW_THERMAL_TEMP_TRIP_HOT].temp = emerg_temp;
-	if (emerg_temp > crit_temp)
-		tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp +
+	tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp +
 					MLXSW_THERMAL_MODULE_TEMP_SHIFT;
-	else
-		tz->trips[MLXSW_THERMAL_TEMP_TRIP_CRIT].temp = emerg_temp;
 
 	return 0;
 }
```
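Because the new early return guarantees `crit_temp <= emerg_temp` by the time the trip points are written, the `if`/`else` around the critical trip could be dropped. One consequence that follows directly from the diff: when both thresholds are equal, the old code set the critical trip to `emerg_temp`, while the new code sets it to `emerg_temp + MLXSW_THERMAL_MODULE_TEMP_SHIFT`. The sketch below compares the two computations. `crit_trip_old()` and `crit_trip_new()` are hypothetical helpers, and the value 10000 for `MODULE_TEMP_SHIFT` is an assumed placeholder standing in for the driver's `MLXSW_THERMAL_MODULE_TEMP_SHIFT`.

```c
#include <assert.h>

/* Assumed placeholder value; the driver uses MLXSW_THERMAL_MODULE_TEMP_SHIFT. */
#define MODULE_TEMP_SHIFT 10000

/* Hypothetical helper: critical trip as computed before the fix.
 * The shift is added only when the emergency threshold is strictly
 * above the critical threshold. */
static int crit_trip_old(int crit_temp, int emerg_temp)
{
	if (emerg_temp > crit_temp)
		return emerg_temp + MODULE_TEMP_SHIFT;
	return emerg_temp;
}

/* Hypothetical helper: critical trip as computed after the fix.
 * Pairs with crit_temp > emerg_temp never reach this point because of
 * the early return, so the shift is always applied. */
static int crit_trip_new(int crit_temp, int emerg_temp)
{
	assert(crit_temp <= emerg_temp); /* guaranteed by the early return */
	return emerg_temp + MODULE_TEMP_SHIFT;
}
```

For a normal pair such as `(70000, 75000)` both helpers return 85000; for an equal pair `(75000, 75000)` the old helper returns 75000 and the new one returns 85000.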