I am currently experimenting with a new technique, and based on my understanding, the most fundamental approach involves:
- Requesting permission to use the user's webcam to capture video.
- Upon receiving permission, creating a canvas and drawing the video feed onto it.
- Applying a grayscale (black-and-white) filter to the video stream, since the difference step below only needs intensity, not color.
- Adding control points within the canvas area (a defined region where pixel colors are sampled on each frame).
- Implementing a function that runs on each frame; for demonstration purposes it will detect left/right gestures (a sketch of this setup follows the list).
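For the setup itself, something like this should work; it is only a sketch, and it assumes a `<canvas id="output">` element in the page (the id and the function names are illustrative):

    // Grab the webcam, draw each frame onto the canvas, and convert it to grayscale.
    const video = document.createElement('video');
    const canvas = document.getElementById('output');
    const ctx = canvas.getContext('2d');

    navigator.mediaDevices.getUserMedia({ video: true }).then(function (stream) {
      video.srcObject = stream;
      video.play();
      requestAnimationFrame(tick);
    });

    function tick() {
      ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
      const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
      const d = frame.data; // RGBA bytes
      for (let i = 0; i < d.length; i += 4) {
        const gray = (d[i] + d[i + 1] + d[i + 2]) / 3; // average R, G, B
        d[i] = d[i + 1] = d[i + 2] = gray;
      }
      ctx.putImageData(frame, 0, 0);
      requestAnimationFrame(tick);
    }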
During each frame:
- If it is the first frame (F0), simply store it; there is nothing to compare yet.
- Otherwise, compare the current frame's pixels (Fn) with the previous frame's (F(n-1)).
- If nothing moved between F(n-1) and Fn, the difference image will be entirely black.
- If movement is detected, visualize the difference Delta = Fn - F(n-1) as white pixels.
- Check which control points fall inside the lit (white) areas and record the pattern, e.g. `( ** )x = Delta(N)` (see the sketch after this list).
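The differencing could look like this, reusing the grayscale ImageData from the capture sketch above; the noise threshold value is an assumption, there to suppress camera noise:

    // Keep the previous frame's pixels and threshold |Fn - F(n-1)|:
    // unchanged pixels come out black, moving pixels come out white.
    let previous = null;
    const NOISE = 32; // assumed per-pixel noise threshold, tune to taste

    function difference(frame) {
      const out = new Uint8ClampedArray(frame.data.length);
      if (previous) {
        for (let i = 0; i < frame.data.length; i += 4) {
          // Frames are grayscale, so comparing the red channel is enough.
          const moved = Math.abs(frame.data[i] - previous[i]) > NOISE;
          out[i] = out[i + 1] = out[i + 2] = moved ? 255 : 0;
          out[i + 3] = 255; // fully opaque
        }
      }
      previous = frame.data.slice(); // remember Fn for the next call
      return new ImageData(out, frame.width, frame.height);
    }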
Repeat these operations every frame so you accumulate successive Delta snapshots. By subtracting the control-point patterns recorded at different times, you obtain a motion vector. For example, if the lit points shift two positions to the right along x between two frames:

- `(   ** )x = Delta(N)`
- `( **   )x = Delta(N-1)`
- `(   +2 )x = Delta(N) - Delta(N-1)`
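A sketch of that sampling step; the control-point layout (a single horizontal row) and the helper names are assumptions:

    // A horizontal row of control points; a white pixel at a point means motion there.
    const controlPoints = [];
    for (let x = 10; x < 310; x += 20) controlPoints.push({ x: x, y: 120 });

    // Average index of the lit control points in a difference image, or null if none.
    function litCenter(diff) {
      let sum = 0, count = 0;
      controlPoints.forEach(function (p, index) {
        const i = (p.y * diff.width + p.x) * 4;
        if (diff.data[i] === 255) { sum += index; count += 1; }
      });
      return count > 0 ? sum / count : null;
    }

    // Delta(N) - Delta(N-1): positive means the lit area drifted right, negative left.
    function motionVector(deltaN, deltaPrev) {
      const a = litCenter(deltaN), b = litCenter(deltaPrev);
      return (a === null || b === null) ? 0 : a - b;
    }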
You can then test whether the vector is positive or negative on x, or compare its value to a predefined threshold:

    if (vectorX > 5) { /* positive on x and above the threshold */ }

If the conditions are met, trigger a custom event and listen for it elsewhere:

    $(document).trigger('MyPlugin/MoveLeft', values);
    $(document).on('MyPlugin/MoveLeft', doSomething);
To improve precision, consider caching the vectors, or aggregating them and only triggering an event once the total reaches a significant value, as in the sketch below.
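One way to do that aggregation; the accumulator threshold and the `MyPlugin/MoveRight` event name are assumptions, the latter simply mirroring the `MyPlugin/MoveLeft` event above:

    let accumulated = 0;

    function onVector(vectorX) {
      accumulated += vectorX;
      if (Math.abs(accumulated) > 20) { // only fire on a significant total movement
        const name = accumulated > 0 ? 'MyPlugin/MoveRight' : 'MyPlugin/MoveLeft';
        $(document).trigger(name, [accumulated]);
        accumulated = 0; // reset after firing
      }
    }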
You can also try to recognize shapes in the early difference images, identifying a "hand" or a "box" and then tracking how its coordinates change. Note, however, that gestures happen in 3D space while the analysis is done in 2D, so a shape will deform as it moves.
For a more detailed explanation, refer to this resource. I hope this clarifies the concept for you.