Read-Only Topic
Member
Hi all, here's a really weird one for you to think about. Sorry, it goes on a bit. We had a power cut at our site (great fun, as I was the only server hardware engineer on site and about 3000 servers went down simultaneously). Afterwards, one of the ES40s in a cluster could not see its CI cards, so it was unable to boot or rejoin the cluster. I was there till the early hours and never really found the problem.
I ordered a replacement set of CI cards (two cards that work together as one). When the new ones arrived I swapped them in, but they still weren't seen. When I tried them in a different pair of slots, the system did see them, so I ordered a PCI backplane. When that arrived I swapped it, put the original CI cards back in, and they still weren't seen. I tried the new cards, and they weren't seen either. I moved them to different slots and they were picked up straight away. So: two sets of CI cards, two PCI backplanes, and the same fault with every combination.
By then it had gone 1:00 am, and the software guy and I decided to accept defeat and just get the system working. We put the original cards back in with the original PCI backplane, but in the different slots. The poor software guy then had to spend until 5:00 reconfiguring the software to use the CI cards in their new location. So it's a workaround rather than a fix, and because this is a critical system I haven't been able to go back to it to fix the fault properly. Now, months later, I still don't know what the problem is. Any ideas?
Regards, Adam
Anorak
Hello,
To work out the cause of this problem, can you tell me which bus and slots the CI cards were in originally, and where they are now? Thanks
So long and thanks for all the fish
Member
Hi, I've been thinking about this since I did the course (last week) and thought I knew what it was. The PChips are on the system board in the ES40, so if one of the PChips was faulty and I had moved the cards to slots belonging to the other one, that would explain it. However, when I checked which slots I'd moved them from and to, they came from the 7th and 8th slots down and went to the 5th and 6th down. I've checked the book, and these slots are all on the same hose. So I'm stumped again (it doesn't take much)!
Regards, Adam
Anorak
Hi Adam,
It looks like you have managed to stump us with this one too. We agree that the working slots share the same PChip. There are two possibilities:
1: For some reason, the PChip cannot identify any device in the slot the CIPCI adapter uses to communicate with the PCI bus. SRM probes each slot at initialization time: after a bus reset it waits for an interrupt from a device, then walks down the bus slot by slot, configuring adapters as they are found.
2: SRM may have become corrupt during the power failure. SRM lives in flash memory, which is not write protected.
In any case, the fact that the adapter works in another slot rules out reason 2. I give in.
Regards, Paul
So long and thanks for all the fish
Member
OK, thanks anyway. By pure coincidence, I was working on this same machine again today after another power cut, believe it or not! It had a dead PSU, a dead CPU and corrupt SRM. I used the FSL to start it and updated the firmware, so that may have corrected the old fault too!? Still, I wasn't brave enough to move the CI cards. :-s
Regards, Adam
Member
I've had my fair share of ES40 problems involving CPUs and memory. After power-downs (accidental or intentional), the ES40 (and to a lesser extent the ES45) often has problems with its contacts. On one system fitted with 8 GB, only 2 GB came back online after a power-down, and one CPU was missing too. After I cleaned all the contacts (CPU boards, memory boards, DIMMs) with denatured alcohol (sorry P&S, not drinkable), everything came back online. I've had several cases with similar problems, all of which were solved by cleaning the contacts.
Logically, could the cause be oxidation plus the CI cards cooling down (contracting)?
Regards, Bert. "The reason for living is life itself. So live your life to the fullest and at the end you'll be able to say: My life meant something!!!"
Member
Thanks. Yes, I've very often fixed ES40s and other Alphas just by cleaning the contacts and reseating the MMBs, DIMMs and CPUs. In this case, though, swapping the boards for known good ones didn't help, and the cards did work in a different slot, so I don't think cleaning would help here. Thanks for the reply though.
Regards, Adam
