Energy Storage Module Replacement on Exadata V2 and Exadata Expansion Rack (X2-2) Machine:
Energy Storage Module replacement on an Exadata machine is part of Exadata preventive maintenance, which should be performed proactively: consumable components are replaced based on their lifespan, before they fail.
The Energy Storage Modules (ESMs) in the PCI flash cards of the storage servers protect the DRAM cache in the event of a power failure. A failed ESM will adversely impact performance, but there will be no data loss or wrong results.
We replaced 40 ESMs on our V2 and X2 machines last week. A picture of an ESM is attached:
According to Oracle, ESMs need to be replaced every 3 years on V2 machines and every 4 years on X2 machines. The preventive maintenance schedule is as follows:
ESM replacement due at the end of year:
Model | 1 | 2 | 3 | 4 | 5 | 6 | 7
Exadata V2 | No | No | Yes | No | No | Yes | No
Exadata X2-2, X2-8, Expansion Rack | No | No | No | Yes | No | No | No
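The schedule above boils down to a fixed replacement interval per model. Below is a minimal sketch, assuming the table's intervals (every 3 years for V2, every 4 years for X2-2, X2-8 and the Expansion Rack); esm_due_years and the model keywords it accepts are hypothetical names for illustration, not an Oracle tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: esm_due_years MODEL HORIZON
# Prints the years (counted from installation, up to HORIZON) in which
# ESM replacement is due. Intervals are taken from the schedule table:
# 3 years for V2, 4 years for X2-2, X2-8 and the Expansion Rack.
esm_due_years() {
  local model=$1 horizon=$2 interval
  case "$model" in
    V2) interval=3 ;;
    X2-2|X2-8|ExpansionRack) interval=4 ;;   # "ExpansionRack" keyword is made up
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac
  local year due=()
  # Collect every multiple of the interval that falls within the horizon.
  for (( year = interval; year <= horizon; year += interval )); do
    due+=("$year")
  done
  echo "${due[*]}"
}
```

For example, `esm_due_years V2 7` prints `3 6` and `esm_due_years X2-8 7` prints `4`, matching the table.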
To monitor ESM status, there are a couple of options:
- Using ILOM: ILOM tracks the ESM lifespan on the F20 cards and notifies you when an ESM has to be replaced.
- Using the Sun Flash Accelerator F20 ESM Monitoring Utility: a script that must be installed on the storage servers.
To verify the ESM lifetime value, run the following command on each storage server:
for RISER in RISER1/PCIE1 RISER1/PCIE4 RISER2/PCIE2 RISER2/PCIE5; do
  ipmitool sunoem cli "show /SYS/MB/$RISER/F20CARD/UPTIME"
done | grep value -A4
If the "value" reported exceeds the "upper_noncritical_threshold" reported, schedule a replacement of the relevant ESM.
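The comparison in the previous sentence can be scripted so that each riser gets a clear verdict. This is a sketch under an assumption: that the ILOM output prints properties as `name = number unit`, for example `value = 19824.000 Hours` and `upper_noncritical_threshold = 26280.000 Hours`; please check the exact format on your own cell before relying on it. check_esm_uptime is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_esm_uptime reads the ILOM "show" output for one
# F20CARD/UPTIME sensor on stdin and prints OK, REPLACE or UNKNOWN.
# ASSUMED line format: "value = <number> <unit>" and
# "upper_noncritical_threshold = <number> <unit>" (verify on your cell).
check_esm_uptime() {
  awk '
    $1 == "value" { v = $3 }                       # current uptime reading
    $1 == "upper_noncritical_threshold" {
      t = $3
      if (t == "N/A" || v == "") print "UNKNOWN value=" v " threshold=" t
      else if (v + 0 > t + 0)    print "REPLACE value=" v " threshold=" t
      else                       print "OK value=" v " threshold=" t
      v = ""                                       # reset for the next sensor
    }
  '
}
```

On a storage server it could be driven by the same riser loop as above, labelling each result:
`for RISER in RISER1/PCIE1 RISER1/PCIE4 RISER2/PCIE2 RISER2/PCIE5; do printf '%s: ' "$RISER"; ipmitool sunoem cli "show /SYS/MB/$RISER/F20CARD/UPTIME" | check_esm_uptime; done`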
There are two methods to replace ESMs:
Rolling replacement – components are replaced by taking one server offline at a time while the overall system stays up.
Full system downtime – the complete system is shut down and the consumable components are replaced simultaneously.
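For the rolling method, the usual Oracle pattern for taking a single storage cell offline (as we understand it; confirm against the Oracle documentation for your image version) is to check that ASM can tolerate losing the cell's grid disks before inactivating them, and to wait until they are back ONLINE before moving to the next cell. The sketch below only parses two-column CellCLI output; all_griddisks_report is a hypothetical helper name, and the sample disk names in the test are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: all_griddisks_report VALUE reads two-column output such as
#   cellcli -e list griddisk attributes name,asmdeactivationoutcome
#   cellcli -e list griddisk attributes name,asmmodestatus
# on stdin and succeeds only if every grid disk's second column equals VALUE.
all_griddisks_report() {
  local want=$1
  awk -v want="$want" '
    NF == 0 { next }                             # ignore blank lines
    { n++ }                                      # count grid disks seen
    $2 != want { bad++; print "WAIT: " $1 " is " $2 }
    END { if (n == 0 || bad > 0) exit 1 }        # fail on empty input too
  '
}
```

Before powering off a cell: `cellcli -e list griddisk attributes name,asmdeactivationoutcome | all_griddisks_report Yes && cellcli -e alter griddisk all inactive`. After the cell is back up, run `cellcli -e alter griddisk all active` and repeat `cellcli -e list griddisk attributes name,asmmodestatus | all_griddisks_report ONLINE` until it succeeds, and only then move on to the next cell.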
Because replacing the ESMs on 10 storage servers requires a long maintenance window (and a lot of downtime if done all at once), we planned it for a weekend using the rolling replacement method. Replacing the ESMs on the V2 system took much longer than on the X2 because of how the ESMs are physically connected inside the server. On the X2 system, the activity took at most 30 minutes per server, including server power-off and power-on.
Picture: how the ESM is placed inside the server (V2).
Below is Oracle's estimated maintenance window for ESM replacement; actual times may vary from system to system:
Rack size | Full system downtime | Rolling method
Quarter rack | 2–2.5 hours | 4 hours
Half rack | 2.5–4 hours | 10 hours
Full rack | 5–8 hours | 20 hours
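The rolling-method estimates can be turned into a rough per-cell budget. This sketch assumes the standard storage-cell counts for V2/X2 racks (3 cells in a quarter rack, 7 in a half rack, 14 in a full rack); minutes_per_cell is a hypothetical helper, and the result is only simple division of Oracle's figures.

```shell
#!/usr/bin/env bash
# Rough per-cell budget derived from Oracle's rolling-method estimates.
# ASSUMPTION: standard V2/X2 storage-cell counts (quarter=3, half=7, full=14).
minutes_per_cell() {
  local rolling_hours=$1 cells=$2
  # Whole minutes; bash integer arithmetic truncates the remainder.
  echo $(( rolling_hours * 60 / cells ))
}

for spec in "Quarter 4 3" "Half 10 7" "Full 20 14"; do
  read -r rack hours cells <<< "$spec"
  echo "$rack rack: ~$(minutes_per_cell "$hours" "$cells") minutes per storage cell"
done
```

This works out to about 80 minutes per cell for a quarter rack and about 85 minutes for half and full racks, comfortably above the at most 30 minutes per X2 server we measured.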
After-replacement verification:
Once the ESMs have been replaced, make sure that all the flash disks are visible to the server.
To verify this, run the following command; every flash disk should be in the normal state:
CellCLI> list lun where disktype=flashdisk
1_0 1_0 normal
1_1 1_1 normal
1_2 1_2 normal
1_3 1_3 normal
2_0 2_0 normal
2_1 2_1 normal
2_2 2_2 normal
2_3 2_3 normal
4_0 4_0 normal
4_1 4_1 normal
4_2 4_2 normal
4_3 4_3 normal
5_0 5_0 normal
5_1 5_1 normal
5_2 5_2 normal
5_3 5_3 normal
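To make this check scriptable across all 10 cells, the listing can be piped into a small parser. A minimal sketch, assuming the layout shown above (one LUN per line with the status in the last column); check_flash_luns is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_flash_luns EXPECTED reads the output of
#   cellcli -e "list lun where disktype=flashdisk"
# on stdin and succeeds only if exactly EXPECTED LUNs are listed and every
# one reports "normal" in its last column.
check_flash_luns() {
  local expected=$1
  awk -v expected="$expected" '
    NF == 0 { next }                              # skip blank lines
    { total++ }
    $NF != "normal" { bad++; print "NOT NORMAL: " $0 }
    END {
      if (total + 0 != expected + 0) print "expected " expected " LUNs, found " total + 0
      if (bad > 0 || total + 0 != expected + 0) exit 1
    }
  '
}
```

Usage on a storage server: `cellcli -e "list lun where disktype=flashdisk" | check_flash_luns 16 && echo "all flash LUNs normal"`.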
Alternatively, check with the following command:
lsscsi | grep -i marvel
[root@ex01ecel02 sys]# lsscsi |grep -i marvel
[8:0:0:0]   disk ATA MARVELL SD88SA02 D20Y /dev/sdn
[8:0:1:0]   disk ATA MARVELL SD88SA02 D20Y /dev/sdo
[8:0:2:0]   disk ATA MARVELL SD88SA02 D20Y /dev/sdp
[8:0:3:0]   disk ATA MARVELL SD88SA02 D20Y /dev/sdq
[9:0:0:0]   disk ATA MARVELL SD88SA02 D20Y /dev/sdr
[9:0:1:0]   disk ATA MARVELL SD88SA02 D20Y /dev/sds
[9:0:2:0]   disk ATA MARVELL SD88SA02 D20Y /dev/sdt
[9:0:3:0]   disk ATA MARVELL SD88SA02 D20Y /dev/sdu
[10:0:0:0]  disk ATA MARVELL SD88SA02 D20Y /dev/sdv
[10:0:1:0]  disk ATA MARVELL SD88SA02 D20Y /dev/sdw
[10:0:2:0]  disk ATA MARVELL SD88SA02 D20Y /dev/sdx
[10:0:3:0]  disk ATA MARVELL SD88SA02 D20Y /dev/sdy
[11:0:0:0]  disk ATA MARVELL SD88SA02 D20Y /dev/sdz
[11:0:1:0]  disk ATA MARVELL SD88SA02 D20Y /dev/sdaa
[11:0:2:0]  disk ATA MARVELL SD88SA02 D20Y /dev/sdab
[11:0:3:0]  disk ATA MARVELL SD88SA02 D20Y /dev/sdac
The above command should show 16 flash disks available.
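The count can also be checked automatically rather than by eye. A minimal sketch; check_marvell_count is a hypothetical helper that simply counts lines containing "MARVELL" (case-insensitive), the same match the grep above relies on.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_marvell_count EXPECTED reads lsscsi output on stdin,
# counts the Marvell flash modules and succeeds only if the count matches.
check_marvell_count() {
  local expected=$1 found
  found=$(grep -ci 'marvell')    # grep -c prints 0 (not nothing) when no line matches
  echo "found $found Marvell flash disks (expected $expected)"
  [ "$found" -eq "$expected" ]
}
```

Usage on a storage server: `lsscsi | check_marvell_count 16`.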
I hope this gives you a clear understanding of ESM replacement on Exadata. :)
If you have further questions, feel free to email me at mail2saurav.gupt@gmail.com.
I will post about Battery Controller Replacement on Exadata very soon. :)
Regards,
Saurabh