Hello Experts,
Our PE2 central instance (CI) server hung, and the server team
rebooted it, which took the CI down. There are 40 application servers
attached to this CI, and all 40 of them went down while the CI was
down. We would like to understand why; ideally that should not happen.
Can you help us analyse this?
The dev_disp file on one of the app servers shows the following errors:
Sun Feb 16 19:26:50 2014
*** ERROR => DpEnvCheck: Waiting for answer from msg server since 300 secs [dpxxdisp.c 8362]
*** ERROR => DpEnvCheck: Connection to msg server will be closed [dpxxdisp.c 8364]
***LOG Q0M=> DpMsDetach, ms_detach () [dpxxdisp.c 13051]
MBUF state OFF
MBUF component DOWN
Sun Feb 16 19:30:53 2014
***LOG Q0I=> NiIRead: P=10.197.4.52:3910; L=10.197.5.9:60032: recv (104: Connection reset by peer) [nixxi.cpp 5087]
*** ERROR => NiIRead: SiRecv failed for hdl 18/sock 11
(SI_ECONN_BROKEN/104; I4; ST; P=10.197.4.52:3910; L=10.197.5.9:60032) [nixxi.cpp 5087]
*** ERROR => MsINiRead: NiBufReceive failed (NIECONN_BROKEN) [msxxi.c 2829]
*** ERROR => MsIReadFromHdl: NiRead (rc=NIECONN_BROKEN) [msxxi.c 1867]
***LOG Q1K=> MsIAttachEx: StoC check failed, Kernel not compatible with system (rc=-100) [msxxi.c 820]
*** ERROR => Kernel incompatible to already connected instances (see dev_ms for details) [dpxxdisp.c 12599]
DpHalt: shutdown server >pe2app01_PE2_10 < (normal)
DpHalt: stop work processes
Sun Feb 16 19:22:22 2014
*** WARNING => DpEnvCheck: no answer from msg server since 34 secs, but dp_ms_keepalive_timeout(300 secs) not reached [dpxxdisp.c 8383]
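Note that the blocks in the excerpt above are not in chronological order (the 19:22:22 warning is the earliest entry). In case it helps with the analysis, here is a minimal Python sketch for putting the blocks in time order. It assumes every block starts with a timestamp line in exactly the format shown above and that English day and month names are in use; the helper names `parse_timestamp` and `group_by_timestamp` are made up for illustration and are not part of any SAP tool.

```python
from datetime import datetime

# Assumed format of the timestamp header lines, e.g. "Sun Feb 16 19:26:50 2014"
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_timestamp(line):
    """Return a datetime if the line is a timestamp header, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def group_by_timestamp(lines):
    """Group trace lines under their timestamp header and sort the blocks by time.

    Lines that appear before the first timestamp header are ignored.
    """
    blocks = []
    current = None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            # A new timestamp header starts a new block
            current = (ts, [])
            blocks.append(current)
        elif current is not None and line.strip():
            current[1].append(line.rstrip("\n"))
    # sorted() is stable, so blocks with equal timestamps keep their file order
    return sorted(blocks, key=lambda block: block[0])


if __name__ == "__main__":
    # "dev_disp" is the trace file name from this post; adjust the path as needed
    with open("dev_disp", encoding="utf-8", errors="replace") as trace:
        for ts, entries in group_by_timestamp(trace):
            print(ts)
            for entry in entries:
                print("   ", entry)
```

Sorted this way, the excerpt reads as: no answer from the message server (19:22:22), keepalive timeout reached and connection closed (19:26:50), then the connection reset and the failed re-attach followed by the shutdown (19:30:53).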
Did this happen because of the server reboot?
A quick response would be much appreciated.
Best Regards
Sachin Bhatt