When Projector.unprojectVector() is called, it treats the Vector3 as a position, so the result includes the camera's translation. That is why we call .sub(camera.position) on it: subtracting the camera's position turns the unprojected point into a vector pointing away from the camera. After this subtraction, the vector still needs to be normalized before it can serve as a ray direction.
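The subtract-then-normalize step can be sketched in plain JavaScript. This is not the three.js code itself; `cameraPosition` and `unprojectedPoint` are hypothetical stand-ins for `camera.position` and the result of unprojection.

```javascript
// Turn an unprojected world-space point into a ray direction.
// `cameraPosition` and `unprojectedPoint` are hypothetical inputs.
function directionFrom(cameraPosition, unprojectedPoint) {
  // Subtracting the camera position converts the point into a vector
  // pointing from the camera toward that point.
  const d = {
    x: unprojectedPoint.x - cameraPosition.x,
    y: unprojectedPoint.y - cameraPosition.y,
    z: unprojectedPoint.z - cameraPosition.z,
  };
  // Normalizing keeps only the direction, discarding the distance.
  const len = Math.hypot(d.x, d.y, d.z);
  return { x: d.x / len, y: d.y / len, z: d.z / len };
}
```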
In this post, I will be adding some graphics to help illustrate these concepts. For now, let's delve into the geometry behind these operations.
Geometrically, imagine the camera's viewing volume as a pyramid with the tip cut off - a frustum. It is bounded by 6 planes: left, right, top, bottom, near, and far (with near being the plane closest to the tip).
If we could watch these operations in a 3D environment, we would see this pyramid sitting at some arbitrary position and rotation in space. Let's put the pyramid's origin at its tip, with its negative z-axis pointing toward the base - the same convention cameras use, looking down -z.
Anything contained within these 6 planes will ultimately be rendered on our screen through the application of various matrix transformations. In OpenGL, the sequence typically looks like:
clipCoordinates = projectionMatrix * viewMatrix * modelMatrix * position.xyzw;
// NDC = clipCoordinates.xyz / clipCoordinates.w (the perspective divide)
This series of transformations takes a vertex from object space to world space, then to camera (view) space, and finally through the perspective projection matrix, which squeezes everything inside the frustum into a small cube spanning -1 to 1 on each axis - normalized device coordinates (NDC).
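The chain above can be sketched with plain arrays as row-major 4x4 matrices, so nothing depends on a math library. The model matrix below is a toy translation; view and projection are left as identity just to show the plumbing.

```javascript
// Row-major 4x4 matrix multiply.
function mulMat4(a, b) {
  const out = new Array(16).fill(0);
  for (let r = 0; r < 4; r++)
    for (let c = 0; c < 4; c++)
      for (let k = 0; k < 4; k++)
        out[4 * r + c] += a[4 * r + k] * b[4 * k + c];
  return out;
}

// Apply a row-major 4x4 matrix to a homogeneous vector [x, y, z, w].
function applyMat4(m, v) {
  return [0, 1, 2, 3].map(r =>
    m[4 * r + 0] * v[0] + m[4 * r + 1] * v[1] +
    m[4 * r + 2] * v[2] + m[4 * r + 3] * v[3]);
}

const identity = [1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1];
const model = [1,0,0,1, 0,1,0,2, 0,0,1,3, 0,0,0,1]; // translate by (1, 2, 3)
const view = identity, projection = identity;       // toy values for brevity

// clip = projectionMatrix * viewMatrix * modelMatrix * position
const clip = applyMat4(mulMat4(projection, mulMat4(view, model)), [0, 0, 0, 1]);
// NDC = clip.xyz / clip.w (the perspective divide)
const ndc = clip.slice(0, 3).map(c => c / clip[3]);
```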
Object space is the xyz coordinate system in which a mesh is generated procedurally or modeled by an artist - usually centered and aligned neatly with the axes for symmetry. Architectural models from programs like REVIT or AutoCAD often are not aligned this way; an additional objectMatrix can sit between the model matrix and view matrix to handle such adjustments beforehand.
Our screen is flat and 2D, but we can imagine it having depth, like the NDC cube. Because the screen is rarely square, we correct for the aspect ratio: the field of view is defined over the screen height, and x coordinates are scaled by the width-to-height ratio so the image is not stretched.
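A minimal sketch of that aspect correction, assuming a vertical field of view as in three.js's PerspectiveCamera - the y scale comes from the fov, and x is divided by the aspect ratio:

```javascript
// Vertical field of view sets the y scale; x is divided by the aspect
// ratio (width / height) so a wide screen doesn't stretch the image.
function projectionScales(fovDegrees, aspect) {
  const f = 1 / Math.tan((fovDegrees * Math.PI / 180) / 2); // y scale
  return { sx: f / aspect, sy: f };
}
```

With a 90° fov and a 2:1 screen, x ends up scaled at half the rate of y, exactly compensating for the doubled width.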
Now back into the 3D realm...
Suppose we are inside a 3D scene that contains our pyramid. Stripping away the surroundings and moving everything so the pyramid's tip sits at the origin with its base pointing down the -z axis is exactly what this part of the transformation does:
viewMatrix * modelMatrix * position.xyzw
Multiplying this by the projection matrix then warps the shape: x and y are scaled with depth, so the small near face and the large far face end up the same size, and the point at the tip is effectively stretched into a square face - the pyramid becomes a box. At the same time everything is scaled to fit the -1 to 1 range on each axis; this is the perspective projection, and the frustum has become the rectangular NDC volume.
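The depth part of this mapping can be sketched directly. Assuming the standard OpenGL-style perspective matrix, view-space z on the near plane lands at NDC -1 and the far plane at +1:

```javascript
// Depth row of the standard OpenGL perspective projection.
// zView is negative in front of the camera (looking down -z).
function ndcDepth(zView, near, far) {
  const zClip = (-(far + near) / (far - near)) * zView
              - (2 * far * near) / (far - near);
  const wClip = -zView; // the perspective divide uses -z_view
  return zClip / wClip;
}
```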
The mouse lives on this flat 2D screen, which we can think of as one face of the NDC cube. A mouse event gives us only X and Y; there is no depth information, which is why we need ray casting to find out what lies under the cursor along Z.
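Getting the mouse into NDC is a small conversion worth showing. This hypothetical helper maps pixel coordinates into the -1 to 1 range, flipping y because screen y points down while NDC y points up:

```javascript
// Convert a mouse event's pixel coordinates into NDC.
// (px, py) are pixels from the top-left; width/height is the viewport size.
function mouseToNdc(px, py, width, height) {
  return {
    x: (px / width) * 2 - 1,   // left edge -> -1, right edge -> 1
    y: -(py / height) * 2 + 1, // top edge -> 1, bottom edge -> -1 (y flipped)
  };
}
```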
In NDC the ray is trivial: it enters through the near face of the cube at the mouse's X and Y and travels straight through, perpendicular to that face, looking for objects to intersect. To actually compute those intersections, the ray must be transformed from NDC space back into world space.
A ray, unlike a plain vector, is a half-line: it starts at a specific point in space (its origin) and extends infinitely in one direction. The Raycaster handles this setup for us.
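A minimal sketch of that origin-plus-direction structure - not the three.js implementation, just the idea a Raycaster stores internally:

```javascript
// A ray is a point plus a direction: every point on it is
// origin + t * direction for t >= 0.
class Ray {
  constructor(origin, direction) {
    this.origin = origin;       // [x, y, z]
    this.direction = direction; // [x, y, z], assumed normalized
  }
  // Point at parameter t along the ray.
  at(t) {
    return this.origin.map((o, i) => o + t * this.direction[i]);
  }
}
```

Because the direction is normalized, `t` is simply the distance traveled from the origin.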
Returning to the pyramid-box analogy: applying the inverse transformations squeezes the box back into the original pyramid, and our straight NDC ray becomes a ray originating at the camera's tip, traveling out through the scene between the near and far planes.
The method involved transforms the vector as a direction - ignoring translation - and keeps it normalized, which makes the later intersection computations straightforward.
Throughout all of this, the NDC cube keeps things consistent: the near plane always maps to -1 and the far plane to 1.
Unpacking this: we build a point (mouse.x, mouse.y, someZ) in NDC and push it back through the inverse of the camera's matrices; the camera's world matrix supplies the translation and rotation that place the resulting ray correctly in the scene.
Unprojection reverses the projection procedure, turning an NDC point back into an actual position in space. Once the camera's translation and rotation are accounted for, subtracting the camera's position converts that recovered point into the direction the ray should travel.
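The round trip can be sketched in view space, assuming the standard OpenGL-style perspective matrix with scales `sx`, `sy` (as derived earlier from the fov and aspect). `project` applies the matrix and the perspective divide; `unproject` inverts both analytically:

```javascript
// Project a view-space point [x, y, z] (z negative in front of the
// camera) into NDC, using the standard perspective depth mapping.
function project(v, sx, sy, near, far) {
  const A = -(far + near) / (far - near);
  const B = -(2 * far * near) / (far - near);
  const w = -v[2]; // perspective divide by -z_view
  return [sx * v[0] / w, sy * v[1] / w, (A * v[2] + B) / w];
}

// Invert the mapping: recover view-space z from NDC depth, then undo
// the divide and the x/y scales.
function unproject(ndc, sx, sy, near, far) {
  const A = -(far + near) / (far - near);
  const B = -(2 * far * near) / (far - near);
  const zView = -B / (A + ndc[2]);
  const w = -zView;
  return [ndc[0] * w / sx, ndc[1] * w / sy, zView];
}
```

In a full pipeline you would then apply the camera's world matrix - its translation and rotation - to carry this view-space point into world space, which is exactly where the `.sub(camera.position)` step from the beginning comes back in.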