PcVue failed to store data when has this message

30 Posts
4 Users
0 Likes
146 Views
(@sq.zheng@pcvue.com.cn)
Posts: 0
New Member Guest
 

Hello all,

I got my customer complaint that on the site when HDS displayed "HDS can't treat all the events. Try to reduce the number of information recorded by your application.", then PcVue stopped to store the data to SQL. It happened sometimes and would last several minutes.

As the customer said, the project is for the disaster prevention system for the High speed train, so this kind of issue cannot be acceptable. And the problem should be solved today.:(

Project info:

PcVue 10.0 SP1 16016, 3500 IO around, with several hundreds registers recorded on change while 24 wind-speed registers recorded every seconds with svmgrRefreshData.dll. I have checked the HDS logs, it seems every day, HDS would have this error message for 1-3 times, and the customer said, every time HDS has the error, the 24 wind-speed registers could not be stored correctly.

Did somebody have this kind issue before? See the HDS log bellow:

2013/07/18,12:37:47.752,PYAPPSLAVE,Administrator,HDS,E,0,0,CSchTable::RecordAscii DB_E_INTEGRITYVIOLATION __ Call TreatComError ID = '20;130185958530810000;8;LogList01;1;[NULL];14;4096;8;开关量发送为1;8;PYLine.GW.gridalmConfirm;5;1;1;[NULL];2;192;1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];2;0;2;102;1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];256;2013-07-18T04:37:33.081Z;2;0;8;pygw;2;17'
2013/07/18,23:16:31.518,PYAPPSLAVE,Administrator,HDS,E,0,0,OPCCallback::OnDataChange __ HDS can't treat all the events. Try to reduce the number of information recorded by your application.
2013/07/18,23:16:32.065,PYAPPSLAVE,Administrator,HDS,E,0,0,OPCCallback::OnDataChange __ HDS is no more saturated.
2013/07/18,23:16:32.081,PYAPPSLAVE,Administrator,HDS,E,0,0,OPCCallback::OnDataChange __ HDS can't treat all the events. Try to reduce the number of information recorded by your application.
2013/07/18,23:16:32.502,PYAPPSLAVE,Administrator,HDS,E,0,0,OPCCallback::OnDataChange __ HDS is no more saturated.
2013/07/18,23:16:32.518,PYAPPSLAVE,Administrator,HDS,E,0,0,OPCCallback::OnDataChange __ HDS can't treat all the events. Try to reduce the number of information recorded by your application.
2013/07/18,23:16:33.049,PYAPPSLAVE,Administrator,HDS,E,0,0,OPCCallback::OnDataChange __ HDS is no more saturated.
2013/07/18,23:16:33.065,PYAPPSLAVE,Administrator,HDS,E,0,0,OPCCallback::OnDataChange __ HDS can't treat all the events. Try to reduce the number of information recorded by your application.
2013/07/18,23:16:33.596,PYAPPSLAVE,Administrator,HDS,E,0,0,OPCCallback::OnDataChange __ HDS is no more saturated.
2013/07/18,23:16:33.612,PYAPPSLAVE,Administrator,HDS,E,0,0,OPCCallback::OnDataChange __ HDS can't treat all the events. Try to reduce the number of information recorded by your application.
2013/07/18,23:16:34.034,PYAPPSLAVE,Administrator,HDS,E,0,0,OPCCallback::OnDataChange __ HDS is no more saturated.
2013/07/18,23:16:34.034,PYAPPSLAVE,Administrator,HDS,E,0,0,OPCCallback::OnDataChange __ HDS can't treat all the events. Try to reduce the number of information recorded by your application.
2013/07/18,23:24:36.596,PYAPPSLAVE,Administrator,HDS,E,0,0,OPCCallback::OnDataChange __ HDS is no more saturated.
2013/07/19,07:06:57.956,PYAPPSLAVE,Administrator,HDS,E,0,0,TreatComError __ ComError Source : 'Microsoft Cursor Engine' Description : '无法为更新定位行。一些值可能已在最后一次读取后已更改。' (80040E38)

Thanks a lot!

 
Posted : 19/07/2013 7:03 am
n.kunzer
(@n-kunzerarcinfo-com)
Posts: 1236
Member Moderator
 

You database is saturated.
The best way to troubleshoot is to set the SYSTEM.HDS.xxx variables in Trends (in proprietary Unit of course!).
Then you can follow-up after 1 day what's happening.

The main reason to saturate the HDS is when you have too many event/s.

You must check several things:
- How many trends?
- TRENDGROUP period and deadband?
- Registers Deadband
- For logs how many events/s

 
Posted : 19/07/2013 8:11 am
f.martin
(@f-martinarcinfo-com)
Posts: 148
Member Admin
 

OK, I'll take my breakfast later...

First we assume that the HDS is able to archive up to 200 or 300 records per seconds, depending on the computer characteristics, the projects and external actions. Over this limit, it pushes the datas into temporary files and re-record them when there is less events.
There is a limit for the number of event it can bufferize like that. By default it's 20.000, you can change it, it's the parameter "DefaultMaxBufferedEvents" in the HDS.ini file in the Bin directory. Usually we set it to 200.000. Don't change the other parameters.

But that's not sufficient, it just delay the problem.

As Nicolas said, you've to estimate the amount of records you have and to apply some rules.
for example, is it really necessary to record a value every second if there is a lot of noise on it (eg: 245.456789, then 245.456788, then 245.45679, ....) That's why we suggest to use deadband and period.

About ths error:

2013/07/18,12:37:47.752,PYAPPSLAVE,Administrator,HDS,E,0,0,CSchTable::RecordAscii DB_E_INTEGRITYVIOLATION __ Call TreatComError ID = '20;130185958530810000;8;LogList01;1;[NULL];14;4096;8;开关量发送为1;8;PYLine.GW.gridalmConfirm;5;1;1;[NULL];2;192;1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];2;0;2;102;1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];256;2013-07-18T04:37:33.081Z;2;0;8;pygw;2;17'

I think they decided to store all the attributes even the TextIndAttrxx. If they don't use them, remove them from the confirguration (and the database). Even if they don't consume diske because they are NULL, they are slowering the record process, as PcVue as to determine that it's a NULL value. In a general manner, check all the properties and ask them it's reallys mandatory to archive everything , What do they really need ?

Could you also give us the project architecture , Redundant servers with replication or redundant servers writing in the same database ? This error "RecordAscii DB_E_INTEGRITYVIOLATION" means that the record is already in the database or that somme field values are not correct in the request.

Lastly, what are the computer characterisitcs, the database size, the other tools connected to the database, and the SQL version installed ?
If there is a reporting tool that connects to the db and consume all the ressource, this is another problem. You can do high speed treatments with severeal clients connected and requesting with a SQL Express edition. There are some reason why it's free...

 
Posted : 19/07/2013 10:22 am
(@sq.zheng@pcvue.com.cn)
Posts: 0
New Member Guest
 

You database is saturated.
The best way to troubleshoot is to set the SYSTEM.HDS.xxx variables in Trends (in proprietary Unit of course!).
Then you can follow-up after 1 day what's happening.
【Already added them to the archive. We'll see.】
The main reason to saturate the HDS is when you have too many event/s.

You must check several things:
- How many trends?
【197 registers + 24 wind-speed registers.】
- TRENDGROUP period and deadband?
【500mm, now changed to 1s, but in their project he failed to change to 1 or 2 seconds, PcVue'd display 999mm. For the deadband for archiving, they dont like to use it as they are only too anxious to record data when it changes.】
- Registers Deadband
【No deadband as the PLC/OPC Server side has treated this issue and only provides XX.XX or XX.X to PcVue.】
- For logs how many events/s

【Not so many, because normally there should be alarms as there is no disaster.】

 
Posted : 19/07/2013 10:43 am
n.kunzer
(@n-kunzerarcinfo-com)
Posts: 1236
Member Moderator
 

@Florent: You are very brave to answer before your coffee :ohmy:

@Shu Quan
That's very strange they can saturate the HDS only with about 200 trends!
Let's wait the result of the system variables tracking and you will see after.

About the registers deadband you said the OPC Server provides XX.XX or XX.X
For XX.XX: Are they sure they want recording a datachange from 10.00 to 10.01? Why not but just ask them if they are sure about that.

 
Posted : 19/07/2013 10:54 am
f.martin
(@f-martinarcinfo-com)
Posts: 148
Member Admin
 

I agree with Nicolas, that's a bit strange to saturate with only 200 trends...

Maybe another system is locking/overloading the database several times a day. Then we need to know the other tools that are acessing the database, what for, and what is the SQL version and computer characteristics.

The increase of the DefaultMaxBufferedEvents parameter will be usefull in that situation (PcVue nedds to be restarted)

@Nicolas: before coffe, that's fine... but before my croissant it's harder !

 
Posted : 19/07/2013 11:05 am
(@sq.zheng@pcvue.com.cn)
Posts: 0
New Member Guest
 

OK, I'll take my breakfast later...

【Thanks Florent. I came back from my lunch...enjoy your meal!】

First we assume that the HDS is able to archive up to 200 or 300 records per seconds, depending on the computer characteristics, the projects and external actions. Over this limit, it pushes the datas into temporary files and re-record them when there is less events.
There is a limit for the number of event it can bufferize like that. By default it's 20.000, you can change it, it's the parameter "DefaultMaxBufferedEvents" in the HDS.ini file in the Bin directory. Usually we set it to 200.000. Don't change the other parameters.

But that's not sufficient, it just delay the problem.

【For the moment, we don't change the parameters. As Nico suggested, we'll check the amount of pendingrecords when HDS has this issue.】

As Nicolas said, you've to estimate the amount of records you have and to apply some rules.
for example, is it really necessary to record a value every second if there is a lot of noise on it (eg: 245.456789, then 245.456788, then 245.45679, ....) That's why we suggest to use deadband and period.

【In the project, they used B&R PLC and provides .x or .xx to PcVue via OPC, so they did not used the deadband for the register.】

About ths error:

2013/07/18,12:37:47.752,PYAPPSLAVE,Administrator,HDS,E,0,0,CSchTable::RecordAscii DB_E_INTEGRITYVIOLATION __ Call TreatComError ID = '20;130185958530810000;8;LogList01;1;[NULL];14;4096;8;开关量发送为1;8;PYLine.GW.gridalmConfirm;5;1;1;[NULL];2;192;1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];2;0;2;102;1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];256;2013-07-18T04:37:33.081Z;2;0;8;pygw;2;17'

I think they decided to store all the attributes even the TextIndAttrxx. If they don't use them, remove them from the confirguration (and the database). Even if they don't consume diske because they are NULL, they are slowering the record process, as PcVue as to determine that it's a NULL value. In a general manner, check all the properties and ask them it's reallys mandatory to archive everything , What do they really need ?

【I attached a HDS log, there are not many this kind of null, just on the PcVue startup as they told me. 】

Could you also give us the project architecture , Redundant servers with replication or redundant servers writing in the same database ? This error "RecordAscii DB_E_INTEGRITYVIOLATION" means that the record is already in the database or that somme field values are not correct in the request.

【There are only 1 server(both RT and Historical) + 4/5 clients and the Database server is on another computer. No replication on PcVue side, they use RAID。】

Lastly, what are the computer characterisitcs, the database size, the other tools connected to the database, and the SQL version installed ?

【They used Windows Server 2003, they buy the standard version of SQL, not use Express version as they need to keep data more than 10GB within 3-5 years or more.】

If there is a reporting tool that connects to the db and consume all the ressource, this is another problem. You can do high speed treatments with severeal clients connected and requesting with a SQL Express edition. There are some reason why it's free...

 
Posted : 19/07/2013 11:11 am
(@sq.zheng@pcvue.com.cn)
Posts: 0
New Member Guest
 

About the registers deadband you said the OPC Server provides XX.XX or XX.X
For XX.XX: Are they sure they want recording a datachange from 10.00 to 10.01? Why not but just ask them if they are sure about that.

【Yes, they are sure about that. No deadband needed for the registers and the register archiving.As they said, because they put XX.XX, that means they need to record them when they change.:)】

 
Posted : 19/07/2013 11:15 am
(@sq.zheng@pcvue.com.cn)
Posts: 0
New Member Guest
 

Maybe another system is locking/overloading the database several times a day. Then we need to know the other tools that are acessing the database, what for, and what is the SQL version and computer characteristics.

【But, as I understand, it's HDS says "HDS can't treat all the events. Try to reduce the number of information recorded by your application", not SQL says it cannot treat...】

 
Posted : 19/07/2013 11:18 am
f.martin
(@f-martinarcinfo-com)
Posts: 148
Member Admin
 

It's not because someone says something that he is the responsible of it...
When I say "I don't understand this italian document", is my fault because I don't understand italian, or the fault of the guy who didn't send it in english ?

There can be several reasons why the HDS displays this message:
- too many event per second => that's the configuration that has to be redefined
- SQL server not available because a huge treatment on it => there is a need to improve SQL performances
- Table locked by another treatment => some "bad" reports locks the table so that there is no change in it during the report extraction. But if it takes 1 hour....
- ...

Then we need to know what are the other treaments that are done on SQL Server.

And if I tell you to increase the parameter DefaultMaxBufferedEvents, it's because it's the first thing we do when we meet this problem, and just after we record the SYSTEM.HDS.XXX variables into a proprietary unit.

 
Posted : 19/07/2013 11:32 am
(@sq.zheng@pcvue.com.cn)
Posts: 0
New Member Guest
 

It's not because someone says something that he is the responsible of it...
When I say "I don't understand this italian document", is my fault because I don't understand italian, or the fault of the guy who didn't send it in english ?

【haha...good point! Maybe the guy does not you know English also...haha... B) 】

There can be several reasons why the HDS displays this message:
- too many event per second => that's the configuration that has to be redefined
- SQL server not available because a huge treatment on it => there is a need to improve SQL performances
- Table locked by another treatment => some "bad" reports locks the table so that there is no change in it during the report extraction. But if it takes 1 hour....
- ...

【I'll check these parts...】

Then we need to know what are the other treaments that are done on SQL Server.

And if I tell you to increase the parameter DefaultMaxBufferedEvents, it's because it's the first thing we do when we meet this problem, and just after we record the SYSTEM.HDS.XXX variables into a proprietary unit.

 
Posted : 19/07/2013 11:38 am
(@sq.zheng@pcvue.com.cn)
Posts: 0
New Member Guest
 

update the inf.

1. OS: Windows Server 2003 R2
2. Database: SQL 2008 R2 data center

For the report, there is only a small tool developed by themselves to query the alarms, but at the moment they don't use it often, and the HDS has the problem in the night or dawn.

 
Posted : 19/07/2013 12:12 pm
(@sq.zheng@pcvue.com.cn)
Posts: 0
New Member Guest
 

There is another interesting things, with the svmgrRefreshData.dll, we saw the windspeeds cannot be recorded in some seconds also. It seems that the two records have the same datetime, and then SQL just recorded one of them. See bellow:

2013/07/13,04:30:12.096,PYAPPSLAVE,Administrator,HDS,W,0,0,Receiving point with same datetime on trend PYLine.Station11.windSave.wind_Speed
2013/07/13,04:30:12.112,PYAPPSLAVE,Administrator,HDS,W,0,0,Receiving point with same datetime on trend PYLine.Station11.windSave.wind_Speed1
2013/07/13,04:30:12.127,PYAPPSLAVE,Administrator,HDS,W,0,0,Receiving point with same datetime on trend PYLine.Station11.windSave.wind_Speed2
2013/07/13,04:30:21.862,PYAPPSLAVE,Administrator,HDS,E,0,0,CSchTable::UpdateBatch __ Call TreatComError ID = 'HS2St11Wind1Data1' iTest = 40
2013/07/13,04:30:21.877,PYAPPSLAVE,Administrator,HDS,E,0,0,CSchTable::TraceInvalidRecords __ Unexpected Error ID = 'Error writing information. status : 4097 record : '20;130181346110000000;8;PYLine.Station11.windSave.wind_Speed;5;1.7;2;0;1;[NULL];1;[NULL];1;[NULL];8;风速;8;;8;;1;[NULL];2;1;1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];256;2013-07-12T20:30:11.000Z;2;1;2;2''
2013/07/13,04:30:21.893,PYAPPSLAVE,Administrator,HDS,E,0,0,CSchTable::TraceInvalidRecords __ Unexpected Error ID = 'Error writing information. status : 4097 record : '20;130181346110000000;8;PYLine.Station11.windSave.wind_Speed1;5;2.6;2;0;1;[NULL];1;[NULL];1;[NULL];8;1#风速仪风速;8;;8;;1;[NULL];2;1;1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];256;2013-07-12T20:30:11.000Z;2;1;2;2''
2013/07/13,04:30:21.909,PYAPPSLAVE,Administrator,HDS,E,0,0,CSchTable::TraceInvalidRecords __ Unexpected Error ID = 'Error writing information. status : 4097 record : '20;130181346110000000;8;PYLine.Station11.windSave.wind_Speed2;5;2.2;2;0;1;[NULL];1;[NULL];1;[NULL];8;2#风速仪风速;8;;8;;1;[NULL];2;1;1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];1;[NULL];256;2013-07-12T20:30:11.000Z;2;1;2;2''
2013/07/13,17:25:21.549,PYAPPSLAVE,Administrator,HDS,E,0,0,TreatComError __ ComError Source : 'Microsoft Cursor Engine' Description : '无法为更新定位行。一些值可能已在最后一次读取后已更改。' (80040E38)

2013/07/15,09:36:43.065,PYAPPSLAVE,Administrator,HDS,W,0,0,Receiving point with same datetime on trend PYLine.Station10.windSave.wind_Speed
2013/07/15,09:36:43.081,PYAPPSLAVE,Administrator,HDS,W,0,0,Receiving point with same datetime on trend PYLine.Station10.windSave.wind_Speed1
2013/07/15,09:36:43.096,PYAPPSLAVE,Administrator,HDS,W,0,0,Receiving point with same datetime on trend PYLine.Station10.windSave.wind_Speed2
2013/07/15,09:36:51.502,PYAPPSLAVE,Administrator,HDS,E,0,0,CSchTable::UpdateBatch __ Call TreatComError ID = 'HS2St10Wind1Data1' iTest = 40

2013/07/15,09:38:17.018,PYAPPSLAVE,Administrator,HDS,W,0,0,Receiving point with same datetime on trend PYLine.Station10.windSave.wind_Speed
2013/07/15,09:38:17.034,PYAPPSLAVE,Administrator,HDS,W,0,0,Receiving point with same datetime on trend PYLine.Station10.windSave.wind_Speed1
2013/07/15,09:38:17.049,PYAPPSLAVE,Administrator,HDS,W,0,0,Receiving point with same datetime on trend PYLine.Station10.windSave.wind_Speed2
2013/07/15,09:38:21.534,PYAPPSLAVE,Administrator,HDS,E,0,0,CSchTable::UpdateBatch __ Call TreatComError ID = 'HS2St10Wind1Data1' iTest = 40

I don't know it's the dll generated the two same records or it stored data faster than the data generation. And only for the 3 of 24 wind-speed.

 
Posted : 19/07/2013 12:55 pm
n.kunzer
(@n-kunzerarcinfo-com)
Posts: 1236
Member Moderator
 

Yes may be it is sending 2 times the same points with the same Timestamp.
What they can do to check is to write in Event viewer (using Manager Toolkit function) the Variable,Value,TS for each variable they write.
Of course they must do that only for a while for testing purpose then they must remove it.

A good trick should be to create an internal bit variable GENERAL.DLL.DEBUG
The Dll subscribes this variable and write the trace messages when the bit to 1 in PcVue and stop when the bit is reset.

 
Posted : 19/07/2013 1:43 pm
(@sq.zheng@pcvue.com.cn)
Posts: 0
New Member Guest
 

But, I am afraid that if we can only test it for a while, then we'll never meet this issue, because we don't know when the DLL will send two same points...

It maybe a bug of the dll.

 
Posted : 19/07/2013 1:55 pm