GenServer timeout

lud · October 18, 2019, 10:45am

I mean this problem is so obvious

I agree, but the problem exists because we use a Registry for our processes. I think there is no way to guarantee that your process will not stop between the lookup in the registry and the call to the process.
I believe you can only mitigate the problem, and that people either go optimistic (ignore the problem because the window is so small that it virtually never happens) or just try/catch the noproc exit.

Possible solutions

You could create a fetch_pid function that looks up the pid in the registry, then calls the process to ask if it is alive (which refreshes the timeout), and finally returns the pid. Not very satisfying, and it adds more load on the system.
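A minimal sketch of that fetch_pid idea, assuming a unique Registry named EntityRegistry and a 60 second idle timeout. The module name, the registry name, the :keepalive message and the timeout value are all made up for illustration. Note that it only narrows the window: the caller still has to use the pid before the refreshed timeout expires.

```elixir
defmodule Entity do
  use GenServer

  # Assumed values, not taken from the original discussion.
  @registry EntityRegistry
  @idle_timeout 60_000

  def start_link(id) do
    GenServer.start_link(__MODULE__, id, name: {:via, Registry, {@registry, id}})
  end

  # Looks up the pid, then pings the process so its idle timeout is reset.
  # Returns {:ok, pid}, or :error if the process is missing or died meanwhile.
  def fetch_pid(id) do
    with [{pid, _value}] <- Registry.lookup(@registry, id),
         :ok <- keepalive(pid) do
      {:ok, pid}
    else
      _ -> :error
    end
  end

  defp keepalive(pid) do
    GenServer.call(pid, :keepalive)
  catch
    # The process stopped between the lookup and the call.
    :exit, _reason -> :error
  end

  @impl true
  def init(id), do: {:ok, id, @idle_timeout}

  # Replying with a timeout restarts the idle countdown.
  @impl true
  def handle_call(:keepalive, _from, state), do: {:reply, :ok, state, @idle_timeout}

  # No message arrived within @idle_timeout: stop the temporary process.
  @impl true
  def handle_info(:timeout, state), do: {:stop, :normal, state}
end
```

The registry has to be started first, for example with Registry.start_link(keys: :unique, name: EntityRegistry).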

If you do not have many users yet you could say it is premature optimisation and just reload your data. Or use an ETS cache without concurrency so the data either exists or not, period. I personally use a queue for that instead of a dedicated process for each entity, but the data is copied back and forth between ETS and the worker.

Another solution, which is technically correct but heavy, is to use a synchronous registry and have the registry handle stopping your temporary processes: on timeout, the temporary process sets a flag idle: true and tells the registry “I am idle” (with a cast). The registry then calls the process, asking “still idle?”, and terminates it if the answer is yes. But it is so heavy that I would not do that.

Use a pool: instead of temporary processes that each handle one entity, have N workers that handle multiple entities, hash each entity to a worker with erlang:phash2/1, and have each worker manage the memory cleanup for its own entities.
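The routing part of the pool could look roughly like this. It is only a sketch: the pool size and the worker naming scheme are assumptions, and it uses the two-argument :erlang.phash2/2, which returns a value in 0..range-1 directly, instead of phash2/1 followed by rem.

```elixir
defmodule EntityPool do
  # Assumed pool size; the workers themselves would be started elsewhere,
  # each registered under the name returned by worker_name/1.
  @pool_size 8

  # Maps an entity id to a worker index in 0..@pool_size - 1.
  # The same id always hashes to the same worker.
  def worker_index(entity_id), do: :erlang.phash2(entity_id, @pool_size)

  # Hypothetical naming scheme: :entity_worker_0 .. :entity_worker_7.
  # Only @pool_size distinct atoms are ever created.
  def worker_name(entity_id), do: :"entity_worker_#{worker_index(entity_id)}"
end
```

Because the workers are permanent, a call such as GenServer.call(EntityPool.worker_name(id), {:get, id}) no longer races against an idle timeout; the worker just drops stale entities from its own state.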

I also have an Elixir library called mutex. There is a feature that I would add to it: when a process locks a key that was previously locked, the new process can inherit data from the previous owner. I can add the feature if you want. So you lock the key, inherit the data (or load it from the database if it expired), do your work, and release the key with the new data. But again, your 100K of data will be copied multiple times in memory.

In the end, a try/catch is the simple thing to do.
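For example, something along these lines (SafeCall is a made-up helper name, and the returned tuples are just one possible convention). GenServer.call/3 exits with {:noproc, _} when the target is already gone, and with {:normal, _} or {:shutdown, _} when it stops while the call is in flight:

```elixir
defmodule SafeCall do
  # Wraps GenServer.call/3 and turns "the process is gone" exits into
  # tagged tuples so the caller can restart the process or reload the data.
  def call(server, request, timeout \\ 5_000) do
    {:ok, GenServer.call(server, request, timeout)}
  catch
    # The process was already dead (or never registered) when we called.
    :exit, {:noproc, _} -> {:error, :noproc}
    # The process stopped cleanly while our call was pending.
    :exit, {reason, _} when reason in [:normal, :shutdown] -> {:error, :stopped}
  end
end
```

Other exits (timeouts, crashes) are deliberately not caught here, so real bugs still surface.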