Hi all,
I have been having problems in a PLC program (described here: PLC assertion crash).
The calls were generated in mb_recv_data() in rt_io_m_mb_tcp_slave.c, while processing the timeouts. Luckily, that Modbus code in Proview is familiar to me.
Investigating a bit more with gdb, I found something that I think is a reentrancy/concurrency problem. The static declarations of the time and t variables in the time_FloatToD() function are not thread-safe: that memory is shared between threads, which can call the function at the same time.
I think it only shows up now because I have some problems with network communications, so sometimes all threads in the PLC are delayed at exactly the same moment. I currently have 13 threads.
Using gdb I saw that the pointer t, which normally points to dt, sometimes changes while I am stepping through the code:
Breakpoint 1, mb_recv_data (local=0xa8803558, rp=0xa88006d8, sp=0xb2be8388) at ../rt_io_m_mb_tcp_slave.c:216
216 ../rt_io_m_mb_tcp_slave.c: No such file or directory.
< ....... >
(gdb) step
[Cambiando a Thread 0xaeb0ab40 (LWP 3844)] -----> [Changing to Thread 0xaeb0ab40 (LWP 3844)]
Breakpoint 1, mb_recv_data (local=0xa8707a90, rp=0xa87006d8, sp=0xb2bde658) at ../rt_io_m_mb_tcp_slave.c:216
216 in ../rt_io_m_mb_tcp_slave.c
< ........ >
(gdb) p max_timeout
$56 = 2.14199996
(gdb) step
220 in ../rt_io_m_mb_tcp_slave.c
(gdb) step
time_FloatToD (dt=0xaeb09dec, f=2.14199996) at ../../co_time.c:928
928 ../../co_time.c: No such file or directory.
(gdb) p *dt
$57 = {tv_sec = 595337119464038556, tv_nsec = -5859009220405757068}
< ... stepping and repeated prints .... >
(gdb) p *t
$61 = {tv_sec = 0, tv_nsec = 999995887}
(gdb) p *dt
$62 = {tv_sec = 2, tv_nsec = -5859009220405757068}
(gdb) p t
$63 = (pwr_tDeltaTime *) 0xb3aeae98
(gdb) p dt
$64 = (pwr_tDeltaTime *) 0xaeb09dec
(gdb) p f
$66 = 2.14199996
(gdb) up
#1 0x0839845c in mb_recv_data (local=0xa8707a90, rp=0xa87006d8, sp=0xb2bde658) at ../rt_io_m_mb_tcp_slave.c:220
220 ../rt_io_m_mb_tcp_slave.c: No such file or directory.
(gdb) p max_dt
$67 = {tv_sec = 2, tv_nsec = -5859009220405757068}
After that, the assert in the call to time_Adiff() at line 222 of rt_io_m_mb_tcp_slave.c fails because of the value of tv_nsec.
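To show why a value like the one printed above would trip a sanity check, here is a small sketch. What the assert in time_Adiff() actually tests has not been checked here; the sketch only assumes it is some range check on the nanosecond part. delta_t and delta_nsec_in_range() are made-up names, not Proview code.

```c
#include <stdint.h>

/* Hypothetical stand-in for pwr_tDeltaTime (64-bit fields, as the gdb
   output suggests). */
typedef struct {
  int64_t tv_sec;
  int64_t tv_nsec;
} delta_t;

/* Assumed check, not the real Proview assert: returns 1 when the
   nanosecond part lies strictly inside (-1e9, 1e9), 0 otherwise. */
int delta_nsec_in_range(const delta_t *dt)
{
  return dt->tv_nsec > -1000000000LL && dt->tv_nsec < 1000000000LL;
}
```

With the max_dt value from the session (tv_nsec = -5859009220405757068) such a check fails, while the value seen through t (tv_nsec = 999995887) passes.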
The problem looks like the example on this website:
Tamtrajnana
I think this happens because, while one thread is using t, another thread can enter time_FloatToD() and redirect t: first to the static time variable at the beginning of the function, then to the dt of that other thread. The value returned by time_FloatToD() in the first thread then makes no sense, because part of the result may have been written to the static time variable or to the other thread's memory.
Usually I get the correct tv_sec, but garbage in tv_nsec.
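A minimal sketch of the suspected pattern, to make the interleaving concrete. It is not the actual co_time.c source: delta_t stands in for pwr_tDeltaTime and float_to_d_unsafe() for time_FloatToD(), both names are made up, and the body is only an assumption based on the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pwr_tDeltaTime. */
typedef struct {
  int64_t tv_sec;
  int64_t tv_nsec;
} delta_t;

/* Hypothetical stand-in for time_FloatToD(). Both statics occupy one
   memory location that every thread in the process shares. */
delta_t *float_to_d_unsafe(delta_t *dt, float f)
{
  static delta_t time;
  static delta_t *t;

  t = &time;        /* (1) another thread may execute this ...         */
  if (dt != NULL)
    t = dt;         /* (2) ... or this, between any two steps below    */

  t->tv_sec = (int64_t)f;                  /* (3) stored via our dt    */
  t->tv_nsec = (int64_t)((f - (float)t->tv_sec) * 1.0e9f);
                    /* (4) may be stored via the other thread's t      */
  return t;
}
```

If a second thread runs steps (1) and (2) after the first thread has finished step (3), the first thread writes tv_nsec somewhere else and its own dt keeps the uninitialized tv_nsec. That would match the session above, where *dt shows tv_sec = 2 with garbage in tv_nsec. Even in a single thread the sharing is visible: two calls with dt == NULL return the same buffer, and the second call overwrites the first result.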
Just as a test, I wrapped the code in the function with a mutex, and the problem went away. On the website above they use a __thread declaration. I'm not sure whether that would be safe given what Proview intends with those static variables, so I'm not sure which is the correct way to solve it, or even whether what I've found is really a bug or I'm missing something.
Suggestions?
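For comparison, the two options could look roughly like this. It is a sketch under assumptions, not a patch for co_time.c: delta_t, float_to_d_mutex(), float_to_d_tls() and float_to_d_lock are made-up names, and the function body is a guess at the pattern rather than the real source.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pwr_tDeltaTime. */
typedef struct {
  int64_t tv_sec;
  int64_t tv_nsec;
} delta_t;

static pthread_mutex_t float_to_d_lock = PTHREAD_MUTEX_INITIALIZER;

/* Variant A: keep the shared statics, serialize the body with a mutex. */
delta_t *float_to_d_mutex(delta_t *dt, float f)
{
  static delta_t time;
  static delta_t *t;
  delta_t *result;

  pthread_mutex_lock(&float_to_d_lock);
  t = &time;
  if (dt != NULL)
    t = dt;
  t->tv_sec = (int64_t)f;
  t->tv_nsec = (int64_t)((f - (float)t->tv_sec) * 1.0e9f);
  result = t;  /* read the shared pointer before releasing the lock */
  pthread_mutex_unlock(&float_to_d_lock);

  /* With dt == NULL the caller still receives the shared static buffer. */
  return result;
}

/* Variant B: one copy of the statics per thread.
   __thread is a GCC/Clang extension (C11 spells it _Thread_local). */
delta_t *float_to_d_tls(delta_t *dt, float f)
{
  static __thread delta_t time;
  static __thread delta_t *t;

  t = &time;
  if (dt != NULL)
    t = dt;
  t->tv_sec = (int64_t)f;
  t->tv_nsec = (int64_t)((f - (float)t->tv_sec) * 1.0e9f);
  return t;
}
```

The mutex only protects the body of the function: a caller that passes NULL still gets a buffer shared with every other thread once the lock is released. With __thread the NULL case returns a per-thread buffer instead, which changes behaviour only for code that expected different threads to see the same static.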
Alfonso Abella,