
TOPIC: redundant node crash after close runtime navigator

redundant node crash after close runtime navigator 1 month 1 week ago #12242

  • jds
Hello,

Following up on what I thought was a user error (cf. my help topic), this is what happens (reproducible):
The setup consists of 4 nodes, all running Debian 12 and Proview 6.1.1. The system has 2 redundant nodes, node1/node2, where node1 is the primary node. Those nodes run a simple PLC program at 20 ms (one pulse train, 2 analog values read in and written out, all values logged to the SQL table). The kernel on those nodes is the preempt kernel. The 3rd node is a sevserver (nodesql), the 4th an op station with one xttgraph (2 buttons). Finally, there is an eng. station on Ubuntu (22.04).
Node1 and the eng. station run on one ESXi server, node2 and the sevserver on a second ESXi server (both Siemens IPCs, ESXi 7.03). The op station is a small separate PC, and the connected I/O is an ET200 island. Everything is connected to the same 1 Gb switch.
All handling is done on the eng. station. On the first startup of the system (boot sequence: node1, node2, nodesql, op station), every node turns green in the supervision center. Redundant mode works perfectly.
When rebooting node1 it turns back to green; when rebooting node2 it stays yellow, stating "starting up server", but everything keeps working perfectly.

Now the problem: when opening and closing the runtime navigator, everything works fine. Opening the runtime navigator on node2 also works fine (the green/yellow state doesn't matter), but when closing the navigator, node2 crashes every time!
Second test, because this was happening when Proview was started through init.d: start node2 with rt_ini -i to see if errors occur. The runtime monitor can be opened and closed without crashing node2, but fewer services are started (even redcomlink gives problems). The node starts in passive mode and can be activated, but not deactivated anymore. It gets even worse: when quitting the node (Ctrl+C while running rt_ini), node1 crashes too?!?

As you can see, some very strange behavior is going on! In the next few days I'm going to check whether reinstalling the second ESXi server (node2/nodesql) helps; maybe the whole load is corrupt?!?

redundant node crash after close runtime navigator 1 month 1 week ago #12243

  • jds
Small update: I moved the second node to a new installation on a different ESXi server. Unfortunately, the result is the same :(
I will try a completely new project with the same setup to see if this makes a difference.

redundant node crash after close runtime navigator 1 month 1 week ago #12245

  • jds
Hello,

I started again from zero with a "simpler" setup, so without a redundant node: one process node (node1), one sevserver, one op station and one eng. station.
- Did the same test: started everything up (all auto start after reboot), all green in the supervision centre. Opened the runtime navigator, closed it again, and the process node exits.
- Started the runtime again manually (rt_ini -i), same test, but now the process node keeps working after closing the navigator.
- Stopped the runtime and started it again manually with the init.d script; everything keeps working.
- Disabled the init.d script and rebooted the server, then started the init.d script manually -> no problems
- Re-enabled the init.d script and rebooted -> same problem as at first: the node crashes after closing the navigator
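Since the crash only shows up when the runtime is started by the init.d script at boot, one thing worth comparing is the environment each start method hands to the runtime. The following is a minimal sketch, not a Proview tool: it assumes a Linux node, where `/proc/<pid>/environ` holds the NUL-separated start-up environment of a process. The function names and the sample values are hypothetical and only illustrate the comparison.

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse the NUL-separated KEY=VALUE format used by /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # skip the empty trailing entry or malformed items
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_process_environ(pid: int) -> dict[str, str]:
    """Read the start-up environment of a running process (Linux, needs permission)."""
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())


def diff_environ(boot: dict[str, str], manual: dict[str, str]) -> dict[str, tuple]:
    """Return {name: (boot_value, manual_value)} for every variable that differs."""
    names = sorted(set(boot) | set(manual))
    return {n: (boot.get(n), manual.get(n)) for n in names if boot.get(n) != manual.get(n)}


# Illustrative data only, not taken from the nodes in this thread.
boot_env = parse_environ(b"PATH=/usr/bin\0LANG=C\0")
manual_env = parse_environ(b"PATH=/usr/bin\0LANG=en_US.UTF-8\0TERM=xterm\0")
for name, (boot_value, manual_value) in diff_environ(boot_env, manual_env).items():
    print(f"{name}: boot={boot_value!r} manual={manual_value!r}")
```

In practice one would call `read_process_environ` on the PID of the same runtime process once after a boot-time start and once after a manual start, and then compare the two results.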

When disabling and re-enabling the init.d script, a message appeared complaining about my locale (I never got this message when enabling the init.d script before). Could this be the issue causing the crash?

locale_error.jpg


Looking closer while opening the runtime navigator on my eng. station, I noticed a similar message:

locale_error2.jpg



Before I reinstall everything again, I would first like to know whether this could have anything to do with this strange behavior.
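A quick way to confirm a locale complaint like the ones in the screenshots is to ask the C library whether the locale requested by the environment can actually be activated. This is a minimal sketch using only Python's standard `locale` module; the helper names are hypothetical, and it only reports on the environment it is run in, so it would need to be run under the same conditions as the failing start.

```python
import locale
import os

# Environment variables that commonly select the locale on Linux.
LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LC_MESSAGES", "LANG", "LANGUAGE")


def locale_settings(env) -> dict:
    """Return only the locale-related variables present in env."""
    return {name: env[name] for name in LOCALE_VARS if name in env}


def check_locale() -> tuple:
    """Try to activate the locale requested by the environment.

    Returns (True, active_locale) on success, or (False, error_text)
    when the requested locale is not available on this machine.
    """
    try:
        active = locale.setlocale(locale.LC_ALL, "")
    except locale.Error as exc:
        return False, str(exc)
    return True, active


print("requested:", locale_settings(os.environ))
print("result:", check_locale())
```

A `False` result means the locale named in the environment is not generated on that node, which is the usual source of such warnings; whether that is related to the crash is a separate question.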

redundant node crash after close runtime navigator 1 month 1 week ago #12247

  • AutoMate
Hi JDS,

I have a similar set-up in the Eastern Time Zone (UTC-5). I have not seen what you are experiencing, but it appears you may have identified a cause? As a test, maybe try setting your machine's time zones to UTC 0 to see if it changes the behavior.

Ron

redundant node crash after close runtime navigator 1 month 1 week ago #12251

  • jds
Hey Ron,
Unfortunately, after installing a new development station with the same locales as all the nodes (the old dev station was running Ubuntu, now everything runs Debian 12 with the same settings), all the locale errors disappeared, but the crash of the nodes remains.

The processes seem to disappear "magically" without error notifications (system messages from node1):

procesdissapeared.jpg


Vol 0.1.1.12 is the volume of node1. All the PLC processes are set to keep running, so halt is set to no action everywhere. I'm starting to think the sevserver has something to do with it, because all the problems started when I added it.
I will do some more tests. The only problem is that I have no clue how to debug or analyse the system, so it's a bit of a blind spot for me.
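As one possible starting point for debugging, the order in which processes vanish can be recorded by taking process snapshots before and after closing the navigator. This is a minimal sketch, assuming a Linux node where `/proc/<pid>/comm` holds the command name. The `rt_` prefix is an assumption based on the `rt_ini` and `rt_xtt` commands mentioned in this thread, and the function names and sample data are hypothetical.

```python
from pathlib import Path


def snapshot(proc_root: str = "/proc") -> dict[int, str]:
    """Map PID -> command name for every process currently visible (Linux)."""
    procs = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            procs[int(entry.name)] = (entry / "comm").read_text().strip()
        except OSError:
            continue  # process exited between listing and reading
    return procs


def disappeared(before: dict[int, str], after: dict[int, str], prefix: str = "rt_") -> dict[int, str]:
    """Return processes named prefix* that are in `before` but no longer in `after`."""
    return {
        pid: name
        for pid, name in before.items()
        if name.startswith(prefix) and pid not in after
    }


# Illustrative snapshots only, not data from node1.
before = {101: "rt_ini", 102: "rt_xtt", 200: "sshd"}
after = {101: "rt_ini", 200: "sshd"}
print(disappeared(before, after))
```

On a real node, calling `snapshot()` in a loop with a short sleep and printing a timestamp with each non-empty `disappeared(...)` result would show which process goes first, which can then be matched against the system messages.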

redundant node crash after close runtime navigator 1 month 6 days ago #12254

  • AutoMate
FYI, I simply remove the Halt objects from the PLC threads.

I was going to mention that an Ubuntu workstation with Debian 12 nodes might cause issues, but I'm not sure why it would. It's probably too late now since you already built a Debian 12 workstation, but out of curiosity, I researched and found these links:

What is the purpose of canberra-gtk-module? Answer: Ubuntu GUI and sound. That seems important...

askubuntu.com/questions/971560/what-is-t...-canberra-gtk-module

askubuntu.com/questions/342202/failed-to...ut-already-installed


I have a few questions.

1) When you say: "Did the same test: started everything up (all auto start after reboot), all green in the supervision centre. Open the runtime navigator, close it again and the process node exits"

On which machine are you opening and then closing the runtime navigator when the process node exits? OP station, controller, other?

2) When nodes disappear, are you getting crash dumps?
3) When you say enable, disable and start init.d script, which one? pwr ?
4) Why are you starting nodes with rt_ini -i instead of the normal command: rt_xtt -q Op & ? I will use rt_ini -i to debug a problem, but not normally.
5) When using the term sevserver, I suppose you are using the latest methods to build and configure this node? The history server software had a major overhaul with wholesale differences from the past.
6) Are you located in North America?
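Regarding question 2, whether a crash dump can be written at all depends on the core size limit and the kernel's core pattern. This is a minimal sketch using Python's standard `resource` module on Linux; the helper names are hypothetical. Note the limit it reports is that of the shell it is run from, and a process started by init at boot may have a different one (visible in `/proc/<pid>/limits` for the running process).

```python
import resource
from pathlib import Path


def describe_core_limit(soft: int) -> str:
    """Turn the soft RLIMIT_CORE value into a readable statement."""
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    if soft == 0:
        return "disabled (no core files will be written)"
    return f"limited to {soft} bytes"


def core_pattern(path: str = "/proc/sys/kernel/core_pattern"):
    """Return the kernel's core file pattern, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
print("core size limit:", describe_core_limit(soft))
print("core_pattern:", core_pattern())
```

If the limit is reported as disabled, the absence of crash dumps does not say anything about whether a process actually crashed.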


/Ron
Last Edit: 1 month 6 days ago by AutoMate. Reason: More to say...