S2600CO random reboot

Dear all, I have an S2600CO board in a server with two Xeon CPUs and 196GB of RAM.

This machine is used as a compute node in a cluster, alongside other machines that have almost the same configuration.

A few days ago it started rebooting for no apparent reason.

To identify the problem, I checked every DIMM slot and tested all of the memory modules, looking for a faulty one, but found no errors.

Then I checked the SEL (System Event Log):

   1 | 09/17/2017 | 16:38:10 | Event Logging Disabled #0x07 | Log area reset/cleared | Asserted
   2 | 09/17/2017 | 17:16:55 | Power Unit #0x01 | Failure detected | Asserted
   3 | 09/17/2017 | 17:16:56 | Power Unit #0x01 | Power off/down | Asserted
   4 | 09/17/2017 | 17:17:01 | Power Unit #0x01 | Power off/down | Deasserted
   5 | 09/17/2017 | 17:17:01 | Power Unit #0x01 | Failure detected | Deasserted
   6 | 09/17/2017 | 17:17:02 | Power Unit #0x01 | Power off/down | Asserted
   7 | 09/17/2017 | 17:17:07 | Power Unit #0x01 | Power off/down | Deasserted
   8 | 09/17/2017 | 17:17:13 | Fan #0x32 | Lower Non-critical going low | Deasserted
   9 | 09/17/2017 | 17:17:13 | Fan #0x32 | Lower Critical going low | Deasserted
   a | 09/17/2017 | 17:17:13 | Fan #0x32 | Lower Non-critical going low | Deasserted
   b | 09/17/2017 | 17:17:13 | Fan #0x32 | Lower Critical going low | Deasserted
   c | 09/17/2017 | 17:17:24 | Fan #0x32 | Lower Non-critical going low | Asserted
   d | 09/17/2017 | 17:17:24 | Fan #0x32 | Lower Critical going low | Asserted
   e | 09/17/2017 | 17:17:31 | System Event #0x83 | Timestamp Clock Sync | Asserted
   f | 09/17/2017 | 17:17:32 | System Event #0x83 | Timestamp Clock Sync | Asserted
  10 | 09/17/2017 | 17:17:55 | System Event #0x83 | OEM System boot event | Asserted
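
The SEL listing above appears to use the pipe-separated format printed by `ipmitool sel list` (an assumption; the post does not say which tool produced it). Under that assumption, a minimal Python sketch can parse the listing and report which sensor events end on an "Asserted" record with no later "Deasserted" one. Record IDs are hexadecimal (`a` through `f`, then `10`), so they are parsed in base 16 to keep the order right. The helper names `parse_sel` and `still_asserted` are hypothetical.

```python
from typing import Dict, List, Tuple

# SEL listing copied from the post. Fields: record id | date | time |
# sensor | event | direction. Assumed to be `ipmitool sel list` output.
SEL_TEXT = """\
   1 | 09/17/2017 | 16:38:10 | Event Logging Disabled #0x07 | Log area reset/cleared | Asserted
   2 | 09/17/2017 | 17:16:55 | Power Unit #0x01 | Failure detected | Asserted
   3 | 09/17/2017 | 17:16:56 | Power Unit #0x01 | Power off/down | Asserted
   4 | 09/17/2017 | 17:17:01 | Power Unit #0x01 | Power off/down | Deasserted
   5 | 09/17/2017 | 17:17:01 | Power Unit #0x01 | Failure detected | Deasserted
   6 | 09/17/2017 | 17:17:02 | Power Unit #0x01 | Power off/down | Asserted
   7 | 09/17/2017 | 17:17:07 | Power Unit #0x01 | Power off/down | Deasserted
   8 | 09/17/2017 | 17:17:13 | Fan #0x32 | Lower Non-critical going low | Deasserted
   9 | 09/17/2017 | 17:17:13 | Fan #0x32 | Lower Critical going low | Deasserted
   a | 09/17/2017 | 17:17:13 | Fan #0x32 | Lower Non-critical going low | Deasserted
   b | 09/17/2017 | 17:17:13 | Fan #0x32 | Lower Critical going low | Deasserted
   c | 09/17/2017 | 17:17:24 | Fan #0x32 | Lower Non-critical going low | Asserted
   d | 09/17/2017 | 17:17:24 | Fan #0x32 | Lower Critical going low | Asserted
   e | 09/17/2017 | 17:17:31 | System Event #0x83 | Timestamp Clock Sync | Asserted
   f | 09/17/2017 | 17:17:32 | System Event #0x83 | Timestamp Clock Sync | Asserted
  10 | 09/17/2017 | 17:17:55 | System Event #0x83 | OEM System boot event | Asserted
"""


def parse_sel(text: str) -> List[dict]:
    """Split each pipe-separated SEL line into a dict (hypothetical helper)."""
    records = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        rec_id, date, time, sensor, event, direction = fields
        records.append({
            "id": int(rec_id, 16),  # record ids are hexadecimal
            "timestamp": f"{date} {time}",
            "sensor": sensor,
            "event": event,
            "asserted": direction == "Asserted",
        })
    return records


def still_asserted(records: List[dict]) -> List[Tuple[str, str]]:
    """Return (sensor, event) pairs whose most recent record is 'Asserted'."""
    state: Dict[Tuple[str, str], bool] = {}
    for rec in sorted(records, key=lambda r: r["id"]):
        # A later record for the same sensor/event overwrites the earlier one.
        state[(rec["sensor"], rec["event"])] = rec["asserted"]
    return sorted(key for key, on in state.items() if on)


for sensor, event in still_asserted(parse_sel(SEL_TEXT)):
    print(f"{sensor}: {event}")
```

For this listing the sketch reports the two Fan #0x32 lower-threshold events as still asserted, along with the log-cleared, clock-sync and boot records, while both Power Unit #0x01 events end deasserted. Not every event left asserted indicates an ongoing fault; for example, the BMC describes Timestamp Clock Sync as one of two events the BIOS is expected to log on every power-on.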

And in the BMC web console:

  30 | 09/17/2017 | 17:39:32 | Pwr Unit Status | Power Unit | reports the power unit is powered off or being powered down | Asserted
  29 | 09/17/2017 | 17:37:19 | BIOS Evt Sensor | System Event | reports OEM System Boot Event | Asserted
  28 | 09/17/2017 | 17:36:56 | BIOS Evt Sensor | System Event | reports Timestamp Clock Sync. Event is one of two expected events from BIOS on every power on. | Asserted
  27 | 09/17/2017 | 17:36:56 | BIOS Evt Sensor | System Event | reports Timestamp Clock Sync. Event is one of two expected events from BIOS on every power on. | Asserted
  26 | 09/17/2017 | 17:36:49 | System Fan 3 | Fan | reports the sensor is in a low, critical, and going lower state | Asserted
  25 | 09/17/2017 | 17:36:49 | System Fan 3 | Fan | reports the sensor is in a low, but non-critical, and going lower state | Asserted
  24 | 09/17/2017 | 17:36:36 | System Fan 3 | Fan | reports the sensor is in a low, critical, and going lower state | Deasserted
  23 | 09/17/2017 | 17:36:36 | System Fan 3 | Fan | reports the sensor is in a low, but non-critical, and going lower state | Deasserted
  22 | 09/17/2017 | 17:36:34 | System Fan 3 | Fan | reports the sensor is in a low, critical, and going lower state | Deasserted
  21 | 09/17/2017 | 17:36:34 | System Fan 3 | Fan | reports the sensor is in a low, but non-critical, and going lower state | Deasserted
  20 | 09/17/2017 | 17:36:31 | Pwr Unit Status | Power Unit | reports the power unit is powered off or being powered down | Deasserted
  19 | 09/17/2017 | 17:36:26 | Pwr Unit Status | Power Unit | reports the power unit has suffered a failure | Deasserted
  18 | 09/17/2017 | 17:36:20 | Pwr Unit Status | Power Unit | reports the power unit is powered off or being powered down | Asserted
  17 | 09/17/2017 | 17:36:20 | Pwr Unit Status | Power Unit | reports the power unit has suffered a failure | Asserted

The "Power Unit: Failure detected" event is not the main cause, since I have replaced the power supply and the problem persists.

All the sensors, fans, etc. are OK and show no problems, but the fault LED keeps blinking amber and never changes.

No errors are reported in the BIOS, only in the SEL.

I have downloaded the debug logs, but I could not read them because they are password protected.

The board information is as follows:

Manufacturing Date : 2012-10-09 03:53
Manufacturer       : Intel Corporation
Product Name       : S2600CO
Serial Number      : QSCO22700376
Part/Model Number  : G29920-205
FRU File ID        : FRU Ver 1.00

Any help would be greatly appreciated.

Best regards,