Welcome, Guest
Username: Password: Remember me
  • Page:
  • 1

TOPIC: PLC assertion crash: time_FloatToD() in co_time.c

PLC assertion crash: time_FloatToD() in co_time.c 9 years 7 months ago #6706

  • abella
  • abella's Avatar
  • OFFLINE
  • Senior Boarder
  • Posts: 42
  • Karma: 0
Hi all,

I have been having problems in a PLC program (here: PLC assertion crash).
The calls were generated in mb_recv_data() in rt_io_m_mb_tcp_slave.c, while processing the timeouts. Luckly that modbus code in proview is familiar to me :laugh: .

Investigating a bit more with gdb, I have found something that I think is a reentrancy/concurrency problem. The static declaration of time and t variables in time_FloatToD() function is not thread-safe, that memory is shared between threads, which can call the function "at the same time"... :S

I think that is detectable now because I've some problems in network conmmunications, so sometimes all threads in the PLC are delayed at exactly the same time. And I've 13 threads currently.

Using gdb I saw that the pointer t, which is usually pointing to dt, changes internally (sometimes...) when stepping the code in gdb:
Breakpoint 1, mb_recv_data (local=0xa8803558, rp=0xa88006d8, sp=0xb2be8388) at ../rt_io_m_mb_tcp_slave.c:216
216     ../rt_io_m_mb_tcp_slave.c: No existe el archivo o el directorio.

< ....... >

(gdb) step
[Cambiando a Thread 0xaeb0ab40 (LWP 3844)]   -----> [Changing to Thread 0xaeb0ab40 (LWP 3844)]

Breakpoint 1, mb_recv_data (local=0xa8707a90, rp=0xa87006d8, sp=0xb2bde658) at ../rt_io_m_mb_tcp_slave.c:216
216     in ../rt_io_m_mb_tcp_slave.c

< ........ >

(gdb) p max_timeout
$56 = 2.14199996
(gdb) step
220     in ../rt_io_m_mb_tcp_slave.c
(gdb) step
time_FloatToD (dt=0xaeb09dec, f=2.14199996) at ../../co_time.c:928
928     ../../co_time.c: No existe el archivo o el directorio.
(gdb) p *dt
$57 = {tv_sec = 595337119464038556, tv_nsec = -5859009220405757068}

< ... steping and repeated prints .... >

(gdb) p *t
$61 = {tv_sec = 0, tv_nsec = 999995887}
(gdb) p *dt
$62 = {tv_sec = 2, tv_nsec = -5859009220405757068}
(gdb) p t
$63 = (pwr_tDeltaTime *) 0xb3aeae98
(gdb) p dt
$64 = (pwr_tDeltaTime *) 0xaeb09dec

(gdb) p f
$66 = 2.14199996
(gdb) up
#1  0x0839845c in mb_recv_data (local=0xa8707a90, rp=0xa87006d8, sp=0xb2bde658) at ../rt_io_m_mb_tcp_slave.c:220
220     ../rt_io_m_mb_tcp_slave.c: No existe el archivo o el directorio.
(gdb) p max_dt
$67 = {tv_sec = 2, tv_nsec = -5859009220405757068}

After that, in the call to time_Adiff() in line 222 file rt_io_m_mb_tcp_slave.c the assert fails, because the value of tv_nsec.

It's the example in this website: Tamtrajnana
I think that's because if we are using t in one thread, and another one starts time_FloatToD() function, t is redirected to time at the begining dt from other thread, so the value returned by time_FloatToD() in the first thread has no sense, because part could be writen in the time static variable other thread memory... :pinch:. Usually, I get the tv_sec correct, but some garbage in tv_nsec.

Just to test, I use mutex surrounding the code in the function, and problem is solved. In the website above, they use __thread declaration. I'm not sure if this could be safe to the proview intentions of that statics variables... so I'm not sure wich is the correct way to solve it, or even if what I've found is really a bug or if I'm missing something.

Sugestions?? :dry:

Alfonso Abella,
Last Edit: 9 years 7 months ago by abella. Reason: Some errors explaining the bug. It's no pointing to time, it's pointing to other threads dt.
The administrator has disabled public write access.

PLC assertion crash: time_FloatToD() in co_time.c 9 years 7 months ago #6708

  • claes
  • claes's Avatar
  • OFFLINE
  • Platinum Boarder
  • Posts: 3170
  • Thank you received: 497
  • Karma: 133
Hi Alfonso,

I agree with you, time_FloatToD() doesn't seem to be thread safe, and a mutex will solve the problem. But I think there is an easier way to solve it. If you look at the code of time_FloatToD
pwr_tDeltaTime *
time_FloatToD (
  pwr_tDeltaTime  *dt,
  pwr_tFloat32    f
)
{
  static pwr_tDeltaTime time;
  static pwr_tDeltaTime *t = &time;

  if (dt != NULL) t = dt;

  t->tv_sec  = f;
  t->tv_nsec = (f - t->tv_sec) * 1e9;

  return t;
}

there is actually no reason that t should be static and common for all calls. By just removing static it will be dynamically allocated when the function is called and unique for each call. If dt is NULL, it's still not thread safe and this usage should be restricted to single thread usage. The correct appearance should then be
pwr_tDeltaTime *
time_FloatToD (
  pwr_tDeltaTime  *dt,
  pwr_tFloat32    f
)
{
  static pwr_tDeltaTime time;
  pwr_tDeltaTime *t = &time;

  if (dt != NULL) t = dt;

  t->tv_sec  = f;
  t->tv_nsec = (f - t->tv_sec) * 1e9;

  return t;
}

/Claes
The administrator has disabled public write access.
The following user(s) said Thank You: abella

PLC assertion crash: time_FloatToD() in co_time.c 9 years 7 months ago #6709

  • abella
  • abella's Avatar
  • OFFLINE
  • Senior Boarder
  • Posts: 42
  • Karma: 0
Hi Claes,

If the static declaration could be removed, I think is the best solution too. I don't know the proview internals so I wasn't sure if I could remove the static. The mutex was just to test, it's a bit overkill.

It's quite obvious, but I'm going to test the code without the static, just in case.


Thank you very much!


Alfonso Abella
The administrator has disabled public write access.
The following user(s) said Thank You: marc

PLC assertion crash: time_FloatToD() in co_time.c 9 years 7 months ago #6726

  • marc
  • marc's Avatar
  • OFFLINE
  • Platinum Boarder
  • Posts: 710
  • Thank you received: 70
  • Karma: 24
Hi Abella,

Good hunting and thanks for your feedback.

/Marc
Please, use the Wiki if you succeeded your project or solved your problem. Share your work, so we can learn from each other.
The administrator has disabled public write access.

PLC assertion crash: time_FloatToD() in co_time.c 9 years 7 months ago #6735

  • abella
  • abella's Avatar
  • OFFLINE
  • Senior Boarder
  • Posts: 42
  • Karma: 0
Hi,

Removing static as Claes sugested solved the problem.

Thanks!


Alfonso Abella
The administrator has disabled public write access.
  • Page:
  • 1
Time to create page: 8.761 seconds