Monday, November 26, 2012

3PAR V-Class arrays, Code 37 reset and data corruption

Last week one of the nodes in our 3PAR V800 reset with a Code 37, and a few seconds before the node reset, Oracle started to complain about corrupt blocks. While digging into this issue, we learned that there appears to be a known hardware problem on the V-Class arrays. It stems from the PCI-E interface chipset on the system board and the Fibre Channel cards.

We were told that we were the only customer to see data corruption with a Code 37 reset, but your mileage may vary. If you've had similar problems, I'd love to hear about it.

The following showeeprom output shows a bad board and a good board. First, the bad board in node 5 (revision A3):

Node: 5
--------
      Board revision: 0920-200009.A3
            Assembly: SAN 2012/03 Serial 3978
       System serial: 1405629
        BIOS version: 2.9.8
          OS version: 3.1.1.342
        Reset reason: PCI_RESET
           Last boot: 2012-11-17 20:51:43 EST
   Last cluster join: 2012-11-17 20:52:25 EST
          Last panic: 2012-03-23 08:56:46 EDT
  Last panic request: Never
   Error ignore code: 00
         SMI context: 00
       Last HBA mode: 2a100700
          BIOS state: 80 ff 24 27 28 29 2a 2c
           TPD state: 34 40 ff 2a 2c 2e 30 32
Code 27 (Temp/Voltage Failure) - Subcode 0x3 (1)        2012-11-17 20:47:34 EST
Code 31 (GPIO Failure) - Subcode 0x3 (1)                2012-11-17 20:43:45 EST
Code 37 (GEvent Triggered) - Subcode 0x80002001 (0)     2012-11-17 20:41:43 EST
Code 27 (Temp/Voltage Failure) - Subcode 0x3 (1)        2012-04-05 15:33:01 EDT
Code 27 (Temp/Voltage Failure) - Subcode 0x3 (1)        2012-03-26 17:59:21 EDT
Code 38 (Power Supply Failure) - Subcode 0x13 (0)       2012-03-26 17:06:41 EDT
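If you want to check your own nodes for these events, a small parser over saved showeeprom output can pull out the Code 37 entries and their timestamps, which you can then line up against your database alerts. This is a minimal Python sketch: the event-line format is inferred from the sample output above, not from any HP documentation, and the function names parse_events and code37_events are hypothetical.

```python
import re
from datetime import datetime

# Matches event lines as they appear in the sample showeeprom output, e.g.
# "Code 37 (GEvent Triggered) - Subcode 0x80002001 (0)     2012-11-17 20:41:43 EST"
# Assumption: this format is inferred from that one sample, not from a spec.
EVENT_RE = re.compile(
    r"^Code (?P<code>\d+) \((?P<desc>[^)]*)\) - Subcode (?P<subcode>0x[0-9a-fA-F]+) "
    r"\((?P<flag>\d+)\)\s+(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<tz>\w+)$"
)


def parse_events(text):
    """Return one dict per event-code line found in showeeprom output."""
    events = []
    for line in text.splitlines():
        m = EVENT_RE.match(line.strip())
        if m:
            ev = m.groupdict()
            ev["code"] = int(ev["code"])
            # Naive datetime; the EST/EDT abbreviation is kept as a separate string.
            ev["when"] = datetime.strptime(ev["when"], "%Y-%m-%d %H:%M:%S")
            events.append(ev)
    return events


def code37_events(text):
    """Return only the Code 37 (GEvent Triggered) entries."""
    return [ev for ev in parse_events(text) if ev["code"] == 37]
```

Run against the node 5 output, this would return the single Code 37 entry from 2012-11-17 20:41:43 EST with subcode 0x80002001. Timezone abbreviations are not converted, so compare timestamps carefully across the EST/EDT boundary.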

I'm told that boards with revision D2 contain the fixes for the issue. Here is the good board in node 0:

Node: 0
--------
      Board revision: 0920-200009.D2
            Assembly: SAN 2012/38 Serial 6349
       System serial: 1405629
        BIOS version: 2.9.8
          OS version: 3.1.1.342
        Reset reason: ALIVE_L
           Last boot: 2012-11-23 16:44:12 EST
   Last cluster join: 2012-11-23 16:44:47 EST
          Last panic: 2012-10-23 21:30:25 EDT
  Last panic request: Never
   Error ignore code: 00
         SMI context: 00
       Last HBA mode: 2a100700
          BIOS state: 80 ff 24 27 28 29 2a 2c
           TPD state: 34 40 ff 2a 2c 2e 30 32
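To spot which nodes still have older boards, you can pull the "Board revision" suffix out of showeeprom output for every node. This Python sketch assumes the "Node: N" / "Board revision: PART.REV" layout shown above; the helper names are hypothetical, and flagging every revision other than D2 as "not confirmed fixed" simply reflects what we were told, not an official HP compatibility list.

```python
import re

# Assumption: per what we were told, only D2 is known to carry the fix.
FIXED_REVISIONS = {"D2"}

NODE_RE = re.compile(r"^\s*Node:\s*(?P<node>\d+)\s*$")
# e.g. "      Board revision: 0920-200009.A3" -> part "0920-200009", rev "A3"
REV_RE = re.compile(r"^\s*Board revision:\s*(?P<part>[\w-]+)\.(?P<rev>\w+)\s*$")


def board_revisions(text):
    """Map node number -> (part number, revision suffix) from showeeprom output."""
    result = {}
    node = None
    for line in text.splitlines():
        m = NODE_RE.match(line)
        if m:
            node = int(m.group("node"))
            continue
        m = REV_RE.match(line)
        if m and node is not None:
            result[node] = (m.group("part"), m.group("rev"))
    return result


def unconfirmed_nodes(text):
    """Return sorted node numbers whose board revision is not in FIXED_REVISIONS."""
    return sorted(
        n for n, (_, rev) in board_revisions(text).items()
        if rev not in FIXED_REVISIONS
    )
```

For the two nodes above, board_revisions would report A3 for node 5 and D2 for node 0, so unconfirmed_nodes would return [5].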
