Hello team, I need some help here. We have a client whose project has 2 main servers + 42 workstations + 1 dev server + 1 reporting server (46 machines in total). The setup is as follows:
- The reporting server is connected only to server 1 and server 2.
- Server 1 and server 2 are connected to every workstation.
- The workstations do not connect to each other; they connect only to server 1, server 2 and the dev server.
- Each workstation can query real-time and historical data of the other workstations through the real-time and historical associations that server 1 and server 2 are part of. To achieve this, we modified station.dat individually on each machine, so each station has a slightly different file.
- The dev server is connected to all stations, but is only turned on when needed.
Server01 and Server02 are running Windows Server 2016, so we are confident they should not hit the TCP/IP connection limit, which is usually 20 on Windows 10. The problem is that on Server 1 I see constant connections and disconnections to all the workstations. Sample log entries from Server01 to one workstation are below; the same happens for every workstation. You can see that about 40 s after each connection the link is dropped, and 30 s later it reconnects. I don't see any watchdog timeout message on the server.
2020/04/16,16:39:38.909,3,W,,3074,ADMINISTRATOR,1,Server->client connection abort, SERVER010S -> WORKSTN330C
2020/04/16,16:39:38.909,3,I,,3188,ADMINISTRATOR,1,WinSock error = 10054 WSAECONNRESET
2020/04/16,16:39:38.909,3,I,,3240,ADMINISTRATOR,1,Connection reset by peer
2020/04/16,16:40:08.913,3,I,,3070,ADMINISTRATOR,1,Server->client connection OK, SERVER010S -> WORKSTN330C
2020/04/16,16:40:08.959,3,I,,9008,ADMINISTRATOR,1,Networking message version of station WORKSTN33 = 120001
2020/04/16,16:40:48.965,3,W,,3074,ADMINISTRATOR,1,Server->client connection abort, SERVER010S -> WORKSTN330C
2020/04/16,16:40:48.965,3,I,,3188,ADMINISTRATOR,1,WinSock error = 10054 WSAECONNRESET
2020/04/16,16:40:48.965,3,I,,3240,ADMINISTRATOR,1,Connection reset by peer
2020/04/16,16:41:18.953,3,I,,3070,ADMINISTRATOR,1,Server->client connection OK, SERVER010S -> WORKSTN330C
2020/04/16,16:41:19.000,3,I,,9008,ADMINISTRATOR,1,Networking message version of station WORKSTN33 = 120001
2020/04/16,16:41:59.005,3,W,,3074,ADMINISTRATOR,1,Server->client connection abort, SERVER010S -> WORKSTN330C
2020/04/16,16:41:59.005,3,I,,3188,ADMINISTRATOR,1,WinSock error = 10054 WSAECONNRESET
2020/04/16,16:41:59.005,3,I,,3240,ADMINISTRATOR,1,Connection reset by peer
2020/04/16,16:42:29.009,3,I,,3070,ADMINISTRATOR,1,Server->client connection OK, SERVER010S -> WORKSTN330C
2020/04/16,16:42:29.056,3,I,,9008,ADMINISTRATOR,1,Networking message version of station WORKSTN33 = 120001
2020/04/16,16:43:09.061,3,W,,3074,ADMINISTRATOR,1,Server->client connection abort, SERVER010S -> WORKSTN330C
2020/04/16,16:43:09.061,3,I,,3188,ADMINISTRATOR,1,WinSock error = 10054 WSAECONNRESET
2020/04/16,16:43:09.061,3,I,,3240,ADMINISTRATOR,1,Connection reset by peer
2020/04/16,16:43:39.049,3,I,,3070,ADMINISTRATOR,1,Server->client connection OK, SERVER010S -> WORKSTN330C
2020/04/16,16:43:39.096,3,I,,9008,ADMINISTRATOR,1,Networking message version of station WORKSTN33 = 120001
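The 40 s / 30 s rhythm in these entries can be measured rather than read by eye. Below is a minimal Python sketch, not a PcVue tool: the function names are hypothetical, and it assumes one comma-separated entry per line, with the date and time in the first two fields and the message text in the ninth, as in the excerpt above.

```python
from datetime import datetime

# A few entries copied from the log excerpt above.
SAMPLE = """\
2020/04/16,16:39:38.909,3,W,,3074,ADMINISTRATOR,1,Server->client connection abort, SERVER010S -> WORKSTN330C
2020/04/16,16:39:38.909,3,I,,3188,ADMINISTRATOR,1,WinSock error = 10054 WSAECONNRESET
2020/04/16,16:40:08.913,3,I,,3070,ADMINISTRATOR,1,Server->client connection OK, SERVER010S -> WORKSTN330C
2020/04/16,16:40:48.965,3,W,,3074,ADMINISTRATOR,1,Server->client connection abort, SERVER010S -> WORKSTN330C
2020/04/16,16:41:18.953,3,I,,3070,ADMINISTRATOR,1,Server->client connection OK, SERVER010S -> WORKSTN330C
"""

def parse_events(lines):
    """Return (timestamp, kind) pairs for 'connection OK' and 'connection abort' entries."""
    events = []
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 9:
            continue  # not a log entry in the assumed layout
        message = fields[8].strip()
        if message == "Server->client connection abort":
            kind = "abort"
        elif message == "Server->client connection OK":
            kind = "ok"
        else:
            continue  # WinSock details, version messages, etc.
        stamp = datetime.strptime(fields[0] + " " + fields[1], "%Y/%m/%d %H:%M:%S.%f")
        events.append((stamp, kind))
    return events

def cycle_durations(events):
    """Split consecutive event gaps into connected (ok -> abort) and disconnected (abort -> ok) seconds."""
    connected, disconnected = [], []
    for (t0, k0), (t1, k1) in zip(events, events[1:]):
        seconds = (t1 - t0).total_seconds()
        if k0 == "ok" and k1 == "abort":
            connected.append(seconds)
        elif k0 == "abort" and k1 == "ok":
            disconnected.append(seconds)
    return connected, disconnected

connected, disconnected = cycle_durations(parse_events(SAMPLE.splitlines()))
print("connected for (s):", connected)
print("disconnected for (s):", disconnected)
```

Running the same functions over the full log of each workstation would show whether every station follows the same fixed cycle, which points to a timer-driven cause rather than random network loss.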
I have referred to KB article KB1025KB and used the nmap tool to check for any issue with the destination server and port, but everything looks good. I also found something in the audit that does not look right: the number of pending messages is very high and NOT decreasing. I assume this is what causes the constant connection and disconnection, but does anybody know how to fix it?
Your help is much appreciated. I can provide more information on request. Also, is there anything else I should check? Thanks, Kantha
Did you check the port status on both sides, by installing nmap on the server side and then on the client side?
I don't think it is related, but try increasing the flow regulation parameters for multistation.
Here are the default values:
Multiply them by 10 and it should solve the issue regarding pending messages on the LAN!
Hi Ludo,
Thanks for your reply. I have manually checked that the port is open on both server 1 and server 2. We also ran the nmap test from both server 1 and server 2 to all workstations, so we did not install nmap on any workstation. But yes, I will try this tomorrow just in case.
As for the regulation parameters, thanks for the tip. I will also try this tomorrow and will update you if it solves the problem. One question: do I need to update the values in the Continuation (Variables) section as well?
Thanks,
Kantha
Yes, it is a good idea to multiply them by 10 too.
To me it is almost certain that the client firewall is blocking TCP port 1980, at least on WORKSTN33, judging by the traces you sent.
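A quick way to test port reachability from either side without installing nmap is a plain TCP connect. The sketch below is only an illustration in Python: the helper name is hypothetical, the default port 1980 is the one mentioned above, and it assumes the station name resolves as a host name on your network.

```python
import socket

def can_connect(host, port=1980, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False

# Example (assumes "WORKSTN33" resolves on your network):
# print(can_connect("WORKSTN33"))
```

A successful connect only shows that the port is reachable at that moment. The traces already show connections being established and then reset, so a passing check does not rule out something dropping the session later.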
Hmm, I am not really sure, Ludo.
If the firewall blocked the port, we would never see the Connection OK at all, would we?
Nico
I don't remember my previous tests well. Even though the traces are similar to those in KB1036, they are not the same, so I am afraid you are right.
We will wait for Kantha's update to better understand the issue. Kantha, please attach your Log Files folder (both sides, server and client).
Hello,
You can see the state of the connection with the netstat -ao command.
I also think it is a firewall problem; by default the reconnection period is 30 s and the timeout is 10 s.
The main problem, I guess, is actually the message "Server->client connection OK", which should be "Server->client connection pending".
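To watch only the PcVue connections in the netstat output, the listing can be filtered by port. This is a rough Python sketch under stated assumptions: the function names are hypothetical, port 1980 is taken from earlier in this thread, and the parser expects the usual five-column TCP rows (protocol, local address, foreign address, state, PID) that Windows prints for netstat -ao.

```python
import subprocess

def connections_on_port(netstat_text, port=1980):
    """Return (local, remote, state, pid) for TCP rows that use the given port on either end."""
    suffix = ":" + str(port)
    rows = []
    for line in netstat_text.splitlines():
        parts = line.split()
        # TCP rows have five columns; headers and UDP rows are skipped.
        if len(parts) != 5 or parts[0] != "TCP" or not parts[4].isdigit():
            continue
        _, local, remote, state, pid = parts
        if local.endswith(suffix) or remote.endswith(suffix):
            rows.append((local, remote, state, int(pid)))
    return rows

def run_netstat():
    """Run netstat -ao (Windows) and return its text output."""
    return subprocess.run(["netstat", "-ao"], capture_output=True, text=True).stdout

# Example on a station:
# for row in connections_on_port(run_netstat()):
#     print(row)
```

Running it every few seconds on the server and on one workstation should show whether the state of the connection changes in step with the abort and OK entries in the log.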
Thank you all for your answers.
Actually, the issue is most probably coming from a discrepancy between the station.dat on the servers and on the clients. A clean-up will be done on site and we will see...
Nico
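With 46 copies of station.dat to compare, a small script can help spot discrepancies before a manual clean-up. The following Python sketch is only an illustration: it assumes the files have been collected and read as text, it knows nothing about the actual station.dat syntax, and the function names are hypothetical.

```python
import difflib
import hashlib
from collections import defaultdict

def group_by_content(files):
    """files maps station name -> station.dat text. Returns digest -> sorted station names."""
    groups = defaultdict(list)
    for name, text in files.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        groups[digest].append(name)
    return {digest: sorted(names) for digest, names in groups.items()}

def diff_lines(ref_name, ref_text, other_name, other_text):
    """Return a unified diff (list of lines) between a reference file and another station's file."""
    return list(difflib.unified_diff(
        ref_text.splitlines(), other_text.splitlines(),
        fromfile=ref_name, tofile=other_name, lineterm=""))

# Example, assuming the texts were loaded into a dict beforehand:
# files = {"SERVER01": "...", "WORKSTN33": "..."}
# for line in diff_lines("SERVER01", files["SERVER01"], "WORKSTN33", files["WORKSTN33"]):
#     print(line)
```

Because each station's file is expected to differ slightly in this project, the grouping will mostly confirm which stations share identical files; the diff against a server's file is the part that shows whether a difference is intended or not.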
Hello!
I want to share a recent experience with new "smart" external firewalls. It happened twice, with two different customers using Fortinet and Checkpoint firewalls to separate the PcVue server and PcVue client networks: an option like "SPI - Stateful Packet Inspection" was enabled by default in those machines.
The result was intermittent connection/disconnection between PcVue stations, because this kind of firewall can "read" the traffic and block packets it considers suspect.
I don't know if this is the case here; if all PcVue stations are on the same IP network/class, maybe not. At least I have shared this info in the hope it helps: in case of doubt, and if a firewall is present, ask your customer's IT to double-check.
Hi Filippo,
Good advice, thank you.
In the end we fixed the issue by cleaning up the station.dat files on all stations (42 actually!)
Nico
Thank you everyone for the help, and sorry for the lack of updates from my end. As Nico said, the issue was due to a discrepancy in the station.dat files. After we redid them, the problem was solved. I appreciate all your feedback.
Thanks,
Kantha


