Our long-standing THREE.js application runs 24/7 but crashes after a few days of continuous use. I have been running stress tests that simulate user interactions in a loop until a webglcontextlost event occurs, which suggests that the GPU process may have crashed.
I examined the JavaScript heap with the Chrome DevTools heap profiler after multiple test runs that simulate user interactions and found no memory leaks.
The screenshots below show only system objects remaining, with no significant increase leading up to the crashes, which suggests that a JavaScript heap leak is not the cause.
Both JavaScript and GPU memory usage increase in the Chrome task manager and then stabilize over time, probably because garbage collection is delayed by the frequent operations.
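For reference, one way to watch GPU-side resources alongside the heap snapshots is to sample the counters that THREE.WebGLRenderer exposes through renderer.info. This is only a minimal sketch: it assumes `renderer` is the application's WebGLRenderer, and `sampleRendererInfo` and `growingCounters` are hypothetical helper names, not part of THREE.js.

```javascript
// Snapshot the resource counters exposed via renderer.info.
// `renderer` is assumed to be the application's THREE.WebGLRenderer instance.
function sampleRendererInfo(renderer) {
  const info = renderer.info;
  return {
    time: Date.now(),
    geometries: info.memory.geometries,
    textures: info.memory.textures,
    // info.programs lists compiled shader programs; it may be unset before the first render.
    programs: info.programs ? info.programs.length : 0,
  };
}

// Hypothetical helper: list the counters that are higher in the last
// sample than in the first one.
function growingCounters(samples) {
  if (samples.length < 2) return [];
  const first = samples[0];
  const last = samples[samples.length - 1];
  return ['geometries', 'textures', 'programs'].filter(
    (key) => last[key] > first[key]
  );
}
```

Sampling at the same point of every simulation cycle (for example, each time the app returns to the home screen) should make the numbers comparable. A counter that keeps growing across cycles would point at geometries, textures, or materials that are never passed to dispose().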
System Details: Chrome version 65-66, Windows 10, THREE.js r91
Questions:
Can there be a scenario where the JavaScript heap is free from leaks, but the GPU experiences memory leaks?
What tools are available to identify GPU memory leaks?
Is it possible to determine the exact cause of a webglcontextlost event, perhaps through Chrome logs?
Has anyone encountered a similar issue before?
Any suggestions or ideas?
Thank you in advance for any assistance provided.
UPDATE:
I ran the simulation and captured heap snapshots along with Chrome task manager screenshots every 30 minutes. As far as I know, capturing a heap snapshot also triggers garbage collection.
5:00 - Initial Snapshot from Home Screen
https://i.sstatic.net/STPcK.png
5:30
https://i.sstatic.net/sFJYC.png
6:00
https://i.sstatic.net/p1wxe.png
6:30
https://i.sstatic.net/YePFO.png
7ish
https://i.sstatic.net/C8od5.png
8PM
https://i.sstatic.net/1bknO.png
One interesting observation: manually triggering garbage collection did not reduce GPU memory consumption. It only dropped after I switched tabs.
https://i.sstatic.net/Qhaia.png
This makes me wonder whether the way Chrome handles GPU object disposal could put memory pressure on the machine and eventually lead to the crashes.
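For context, THREE.js does not free GPU resources when JavaScript objects are garbage collected; geometries, materials, and textures have to be released explicitly through their dispose() methods. Below is a minimal sketch of what that could look like when removing a subtree from the scene. The name `disposeHierarchy` is hypothetical, and `root` is assumed to be a THREE.Object3D.

```javascript
// Release the GPU resources referenced by an object tree.
// `root` is assumed to be a THREE.Object3D (anything with a traverse() method).
function disposeHierarchy(root) {
  root.traverse((node) => {
    if (node.geometry) node.geometry.dispose();
    if (!node.material) return;

    // A mesh may carry a single material or an array of materials.
    const materials = Array.isArray(node.material) ? node.material : [node.material];
    for (const material of materials) {
      // Dispose texture-valued properties such as map or normalMap.
      for (const value of Object.values(material)) {
        if (value && value.isTexture) value.dispose();
      }
      material.dispose();
    }
  });
}
```

This sketch disposes everything it finds, so it is only safe when the geometries, materials, and textures in the subtree are not shared with objects that are still in use.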
Note: Tests conducted on Intel i5 with Iris Graphics 540 running latest drivers (ver. 23.20.16.4973) and Iris 640 with updated drivers.
A comparison between heap snapshots captured at 7:30 and 5:30 can be viewed here:
https://i.sstatic.net/aM0Wz.png
UPDATE 2 - leaning towards driver issues
After a page reload, the GPU crashed within 2 minutes with a "Rats! WebGL hit a snag" error message, which points to a sudden failure rather than a gradual memory leak.
Windows System logs show warnings of graphics driver failures occurring simultaneously with the GPU crashes.
https://i.sstatic.net/AAy76.png
Timestamp of WebGL Context lost error in Chrome: 10:07:52.938PM
Timestamp of Windows System log driver issue (rounded up): 10:07:53PM
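A Chrome-side timestamp like the one above can be recorded with listeners on the canvas. This is a rough sketch, assuming `canvas` is renderer.domElement; `watchContextLoss` and the `log` callback are hypothetical names. The event's statusMessage property is part of the WebGL API, but browsers often leave it empty, so it may not reveal the cause.

```javascript
// Log when the WebGL context is lost or restored.
// `canvas` is assumed to be renderer.domElement; `log` is a hypothetical sink.
function watchContextLoss(canvas, log = console.log) {
  canvas.addEventListener('webglcontextlost', (event) => {
    // preventDefault() tells the browser the page wants a chance to restore the context.
    event.preventDefault();
    log({
      type: 'lost',
      at: new Date().toISOString(), // UTC, millisecond precision
      statusMessage: event.statusMessage || '',
    });
  }, false);

  canvas.addEventListener('webglcontextrestored', () => {
    log({ type: 'restored', at: new Date().toISOString() });
  }, false);
}
```

Note that toISOString() reports UTC, while the Windows event viewer normally shows local time, so the two need to be converted to the same time zone before comparing.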
1. Can we attribute this issue to a driver problem?
2. Did Chrome initiate the GPU process termination due to the misbehaving driver, or vice versa?
The current driver was installed via Windows Update; I plan to uninstall it and install the latest driver from Intel for further testing.