Hello guys,
Our customer IREAL has reported an issue. The end user told IREAL that all clients disconnected from the server at about 01:30 am on 2018/05/19. After 4:00 am they restarted the server, and all communication was restored. This is what the customer said; we cannot verify it.
No IREAL engineers were on site when the issue happened, so there is no detailed information about what the end user did on the clients. IREAL provided the log files of the server and of one client, and wants us to give the likely cause of the issue based on those log files. The end user is pressing IREAL for an explanation.
Do you know why?
In the client log files, I found messages showing that the client disconnected from the server:
2018/05/19,01:23:44.037,3,W,,3072,TX,205,客户机->服务器连接放弃(client->server give up connection), TXCLIENT0C -> SERVER_W0S
2018/05/19,01:23:44.037,3,I,,3188,TX,205,WinSock 错误(error) = 10054 WSAECONNRESET
2018/05/19,01:23:44.037,3,I,,3240,TX,205,远程站切断连接(remote cut down connection)
2018/05/19,01:23:44.037,3,W,,3074,TX,205,服务器->客户机连接异常中止(server->client connection stop), TXCLIENT0S -> SERVER_W0C
2018/05/19,01:23:44.037,3,I,,3188,TX,205,WinSock 错误(error) = 10054 WSAECONNRESET
2018/05/19,01:23:44.037,3,I,,3240,TX,205,远程站切断连接(remote cut down connection)
In the server log files, I found these messages:
2018/05/19,01:09:38.215,4,I,,4510,GW,2,CW, 状态 6045 - 102 - 12705 - 12703, (1.MODBUS.HLHTQSDZ.HLHTQS)
2018/05/19,01:09:53.269,4,I,,4510,GW,2,CW, 状态 6045 - 102 - 12707 - 12705, (1.MODBUS.HLHTQSDZ.DZINFO)
2018/05/19,01:14:34.519,0,W,,300,GW,2,无来自 UI 管理器的监视器信息响应,已用时间= 60s ( >= 60s )(no response from UI manager)
2018/05/19,01:14:36.906,0,I,,1,GW,2,Create audit counters file on manager watchdog event, file name=C:ARC InformatiquePcVue 11.1BinLog FilesAudit.WatchDog.0001.txt
2018/05/19,01:14:39.293,4,I,,4510,GW,2,CW, 状态 6045 - 102 - 12730 - 12729, (1.MODBUS.HLHTQSDZ.HLHTQS)
2018/05/19,01:14:47.701,0,I,,1,GW,2,Create audit counters file on manager watchdog event, file name=C:ARC InformatiquePcVue 11.1BinLog FilesAudit.WatchDog.0002.txt
2018/05/19,01:14:54.347,4,I,,4510,GW,2,CW, 状态 6045 - 102 - 12731 - 12730, (1.MODBUS.HLHTQSDZ.DZINFO)
2018/05/19,01:14:57.841,0,I,,1,GW,2,Create audit counters file on manager watchdog event, file name=C:ARC InformatiquePcVue 11.1BinLog FilesAudit.WatchDog.0003.txt
2018/05/19,01:15:07.981,0,I,,1,GW,2,Create audit counters file on manager watchdog event, file name=C:ARC InformatiquePcVue 11.1BinLog FilesAudit.WatchDog.0004.txt
2018/05/19,01:15:18.121,0,I,,1,GW,2,Create audit counters file on manager watchdog event, file name=C:ARC InformatiquePcVue 11.1BinLog FilesAudit.WatchDog.0005.txt
2018/05/19,01:15:28.261,0,I,,1,GW,2,Create audit counters file on manager watchdog event, file name=C:ARC InformatiquePcVue 11.1BinLog FilesAudit.WatchDog.0006.txt
2018/05/19,01:15:48.541,0,I,,1,GW,2,Create audit counters file on manager watchdog event, file name=C:ARC InformatiquePcVue 11.1BinLog FilesAudit.WatchDog.0007.txt
2018/05/19,01:16:08.821,0,I,,1,GW,2,Create audit counters file on manager watchdog event, file name=C:ARC InformatiquePcVue 11.1BinLog FilesAudit.WatchDog.0008.txt
2018/05/19,01:16:29.101,0,I,,1,GW,2,Create audit counters file on manager watchdog event, file name=C:ARC InformatiquePcVue 11.1BinLog FilesAudit.WatchDog.0009.txt
2018/05/19,01:16:59.520,0,I,,1,GW,2,Create audit counters file on manager watchdog event, file name=C:ARC InformatiquePcVue 11.1BinLog FilesAudit.WatchDog.0010.txt
2018/05/19,01:17:29.940,0,I,,1,GW,2,Create audit counters file on manager watchdog event, file name=C:ARC InformatiquePcVue 11.1BinLog FilesAudit.WatchDog.0011.txt
2018/05/19,01:18:30.780,0,I,,1,GW,2,Create audit counters file on manager watchdog event, file name=C:ARC InformatiquePcVue 11.1BinLog FilesAudit.WatchDog.0012.txt
2018/05/19,01:19:31.635,0,I,,1,GW,2,Create audit counters file on manager watchdog event, file name=C:ARC InformatiquePcVue 11.1BinLog FilesAudit.WatchDog.0013.txt
2018/05/19,01:19:52.430,4,I,,4510,GW,2,CW, 状态 6045 - 102 - 12740 - 12739, (1.MODBUS.HLHTQSDZ.HLHTQS)
2018/05/19,01:20:07.484,4,I,,4510,GW,2,CW, 状态 6045 - 102 - 12741 - 12740, (1.MODBUS.HLHTQSDZ.DZINFO)
2018/05/19,01:23:31.405,5,W,,5232,GW,2,在模式 XOFF 上设备处理已超过 600 秒 (device processing exceeds 600 seconds on the mode of XOFF)
2018/05/19,01:23:33.979,0,F,,304,GW,2,无来自 UI 管理器的监视器信息响应,已用时间=600s ( >= 600s ),已请求退出(no response from UI manager,running time==600s, request to exit)
The server log files and the log files of one client are attached.
Hi Mark,
The issue looks 'clear':
I guess these messages explain your problem:
2018/05/19,01:14:34.519,0,W,,300,GW,2,无来自 UI 管理器的监视器信息响应,已用时间= 60s ( >= 60s )(no response from UI manager)
.....
.....
2018/05/19,01:23:33.979,0,F,,304,GW,2,无来自 UI 管理器的监视器信息响应,已用时间=600s ( >= 600s ),已请求退出(no response from UI manager,running time==600s, request to exit)
So, in a word, the server crashed! The watchdog got no response from the UI manager for 600 s and requested an exit.
Now you must find out why the UI was blocked. A typical case is SCADA Basic or VBA freezing because of an external resource such as a DLL, a DB connection, etc.
You can start your investigation from that ...
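To double-check that reading of the logs, here is a minimal Python sketch that parses three of the lines quoted in this thread and computes the gaps between them. The field layout (date, time, severity in the 4th field, message code in the 6th, free text last) is only inferred from the excerpts, not taken from PcVue documentation, and the message texts are shortened to their English glosses.

```python
from datetime import datetime

def parse_line(line):
    """Split one log line into (timestamp, severity, code, message).

    Assumes the comma-separated layout seen in the excerpts above.
    """
    parts = line.split(",", 8)
    stamp = datetime.strptime(parts[0] + " " + parts[1], "%Y/%m/%d %H:%M:%S.%f")
    return stamp, parts[3], int(parts[5]), parts[8]

# Lines taken from this thread, with the messages shortened.
SERVER_LOG = [
    "2018/05/19,01:14:34.519,0,W,,300,GW,2,no response from UI manager (60s)",
    "2018/05/19,01:23:33.979,0,F,,304,GW,2,no response from UI manager (600s), request to exit",
]
CLIENT_LOG = [
    "2018/05/19,01:23:44.037,3,I,,3188,TX,205,WinSock error = 10054 WSAECONNRESET",
]

server = [parse_line(line) for line in SERVER_LOG]
client = [parse_line(line) for line in CLIENT_LOG]

# First watchdog warning (code 300) and the fatal exit request (code 304).
first_warning = next(ts for ts, sev, code, msg in server if code == 300)
exit_request = next(ts for ts, sev, code, msg in server if code == 304)
# First connection reset seen by the client.
client_reset = next(ts for ts, sev, code, msg in client if "WSAECONNRESET" in msg)

warn_to_exit = (exit_request - first_warning).total_seconds()
exit_to_reset = (client_reset - exit_request).total_seconds()

print(f"first warning -> exit request: {warn_to_exit:.3f} s")
print(f"exit request  -> client reset: {exit_to_reset:.3f} s")
```

The exit request comes about 539 s after the first warning, which already reported 60 s without a response, so it matches the 600 s limit. The client then logs WSAECONNRESET about 10 s later, which suggests the client disconnection is a consequence of the server exit rather than a separate network fault.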
Nico
Nico,
Thank you.
Dominique told me the problem comes from a SCADA Basic function, but this customer has many functions and a lot of code, and the project has run very well for months. Do you have a good way to locate which one caused the issue?
As I told you, Mark, you must start your investigation by finding what 'external stuff' is used by the SCADA Basic.
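With a lot of code, one way to build a first list of candidates is to search the program files for anything that touches an external resource. Here is a rough Python sketch of that idea. The keywords, the folder and the sample text are all hypothetical placeholders: replace them with the instructions and paths your project really uses for DLL calls, database access and so on. It only flags lines to review by hand; it does not prove which one froze.

```python
from pathlib import Path

# Hypothetical search terms, NOT real SCADA Basic instruction names.
KEYWORDS = ("DLL", "SQL", "ODBC")

def find_external_calls(text, keywords=KEYWORDS):
    """Return (line_number, line) pairs whose text mentions any keyword."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        upper = line.upper()
        if any(word.upper() in upper for word in keywords):
            hits.append((number, line.strip()))
    return hits

def scan_folder(folder, pattern="*", keywords=KEYWORDS):
    """Scan every matching file under folder and return {path: hits}."""
    report = {}
    for path in Path(folder).rglob(pattern):
        if path.is_file():
            text = path.read_text(errors="ignore")
            hits = find_external_calls(text, keywords)
            if hits:
                report[str(path)] = hits
    return report

# Made-up sample text, only to show the output shape.
sample = "x = 1\nresult = CALL_MY_DLL(x)\n"
print(find_external_calls(sample))
```

Once you have the list, compare it with what was scheduled or triggered shortly before 01:14 in the server log, since that is when the UI manager stopped answering.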
Nico


