Procedural grass rendering
Introduction
Detailed vegetation plays an important part in improving immersion in video games. One part of this vegetation is grass. In this blog post, we want to utilize mesh shaders to generate patches of grass on the GPU. To do this, we took inspiration from Jahrmann and Wimmer’s 2017 i3D paper Responsive real-time grass rendering for general 3D scenes, which uses tessellation shaders to subdivide predefined blades of grass. This comes with the benefit that additional detail can be generated without needing to store it explicitly. We take this concept further by using one mesh shader thread group to render a whole patch of grass. While in this post we create a stylized meadow to keep things simple, we are confident that our technique can be applied to more realistic scenes as well.
From a blade of grass to a meadow
This blog post is structured as follows: First, we explain which parameters are used to represent a blade of grass and how we use Bézier curves to represent our grass. Next, we calculate vertices, their normals and primitives from our Bézier representation of a blade of grass, and illustrate how blades of grass are combined into a patch. We then explain how we write the index and vertex buffers in a GPU-efficient way. Next, we outline how we reduce the amount of geometry in the distance while keeping the appearance consistent and explain how we simulate the effect of wind on our meadow. We describe how our pixel shader is used to improve the appearance of our grass, and finally, we provide some ideas on how our work could be extended and improved.
Growing one blade of grass
Each blade of grass has a position bladePosition, a direction bladeDirection and a height bladeHeight. We use these to calculate the control points P_0, P_1 and P_2 of a quadratic Bézier curve to represent the shape of a blade of grass.
P_0 is simply bladePosition. P_1 is P_0 translated upwards by the bladeHeight h. To obtain P_2, we translate P_1 by the bladeDirection vector \vec{d} scaled with bladeHeight times a leaning factor of grassLeaning = 0.3. Because this offset is proportional to bladeHeight, the shape of the blade of grass is preserved regardless of its height.
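As a minimal CPU-side sketch of the construction above, the following Python snippet builds the three control points. It assumes that z is the up axis (as the shader in the appendix does) and that bladeDirection is a 2D unit vector in the ground plane; the function name control_points is chosen for illustration only.

```python
GRASS_LEANING = 0.3  # leaning factor quoted in the text

def control_points(blade_position, blade_direction, blade_height):
    """Return (P0, P1, P2) of the quadratic Bezier curve of one blade.

    Assumption: z is up and blade_direction is a 2D unit vector (x, y).
    """
    px, py, pz = blade_position
    dx, dy = blade_direction
    p0 = (px, py, pz)
    p1 = (px, py, pz + blade_height)      # P0 moved up by the blade height
    lean = blade_height * GRASS_LEANING   # the lean grows with the height
    p2 = (p1[0] + dx * lean, p1[1] + dy * lean, p1[2])
    return p0, p1, p2

p0, p1, p2 = control_points((1.0, 2.0, 0.0), (1.0, 0.0), 0.5)
print(p0, p1, p2)
```

Because the horizontal offset of P_2 scales with the height, doubling bladeHeight doubles the lean as well, which is why the blade keeps its proportions.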
To animate the grass, we move P_2, which changes the length of the Bézier curve. To preserve the length of the curve, we adjust P_1 and P_2 using the function from Jahrmann and Wimmer (listed as Make persistent length in the appendix).
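The length correction can be checked on the CPU. The sketch below is a Python port of the MakePersistentLength function from the appendix: it estimates the curve length as (2 L_0 + L_1) / 3, where L_0 is the chord length and L_1 the length of the control polygon, and rescales both segments so that this estimate equals the blade height. The tuple helpers and the sample values are illustrative assumptions.

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def length(a): return math.sqrt(sum(x * x for x in a))

def approx_length(v0, v1, v2):
    """Arc length estimate of a quadratic Bezier: (2 * chord + polygon) / 3."""
    chord = length(sub(v2, v0))
    polygon = length(sub(v1, v0)) + length(sub(v2, v1))
    return (2.0 * chord + polygon) / 3.0

def make_persistent_length(v0, v1, v2, height):
    """Rescale both segments so the estimated curve length equals height."""
    ldiff = height / approx_length(v0, v1, v2)
    new_v1 = add(v0, scale(sub(v1, v0), ldiff))
    new_v2 = add(new_v1, scale(sub(v2, v1), ldiff))
    return new_v1, new_v2

# A blade of height 1 whose tip P2 has been pushed far sideways, e.g. by wind.
v0, v1, v2 = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.8, 0.0, 1.0)
v1, v2 = make_persistent_length(v0, v1, v2, height=1.0)
print(approx_length(v0, v1, v2))
```

Since both segments are scaled by the same factor, the chord and the control polygon shrink together, so the estimate lands on the requested height in a single step.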
A width w for each control point defines the width of the grass blade. To apply the width, we translate each control point outwards by length w using the vector perpendicular to our bladeDirection. The resulting offset control points are called P_0^-, P_0^+, P_1^-, P_1^+, P_2^- and P_2^+. All positive P^+ form one blade edge, and all negative P^- form the other one. Thus, we now have two Bézier curves representing the edges of the grass blade as can be seen in the following figure:
Bézier to triangles
To create geometry, we evaluate each of our two edge curves n=4 times. Thus we get 4 vertices per blade edge, or 8 vertices in total. A Bézier curve can be evaluated as follows:
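For reference, the standard (textbook) form of a quadratic Bézier curve with control points P_0, P_1 and P_2 is given below. We assume this is what the bezier(...) helper used in the shader code evaluates.

```latex
B(t) = (1-t)^2 \, P_0 + 2\,(1-t)\,t \, P_1 + t^2 \, P_2, \qquad t \in [0, 1]
```

With n=4 samples per edge, the curve is evaluated at t = 0, \frac{1}{3}, \frac{2}{3} and 1.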
Connecting these V=8 vertices results in T=6 triangles.
Assuming a counterclockwise winding order, this results in the following primitives:
| Primitive | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| i0 | 0 | 3 | 2 | 5 | 4 | 7 |
| i1 | 1 | 2 | 3 | 4 | 5 | 6 |
| i2 | 2 | 1 | 4 | 3 | 6 | 5 |
To perform shading, we calculate the normal vectors from the derivative of the Bézier curve.
To get the normal vector, we first need to calculate a normalized vector perpendicular to the bladeDirection. We then calculate the cross product between this sideVec and the derivative at the current interpolation parameter t.
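A CPU-side sketch of this step in Python: bezier_derivative uses the standard derivative of a quadratic Bézier curve, B'(t) = 2(1-t)(P_1 - P_0) + 2t(P_2 - P_1), and the normal is the cross product of sideVec with the normalized derivative. The helper names, the z-up convention and the sample control points are assumptions for illustration.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(sum(x * x for x in a))
    return tuple(x / n for x in a)

def bezier(p0, p1, p2, t):
    return tuple((1 - t) ** 2 * a + 2 * (1 - t) * t * b + t * t * c
                 for a, b, c in zip(p0, p1, p2))

def bezier_derivative(p0, p1, p2, t):
    return tuple(2 * (1 - t) * (b - a) + 2 * t * (c - b)
                 for a, b, c in zip(p0, p1, p2))

def blade_normal(p0, p1, p2, t, blade_direction):
    dx, dy = blade_direction
    side_vec = normalize((dy, -dx, 0.0))  # perpendicular to the blade direction
    tangent = normalize(bezier_derivative(p0, p1, p2, t))
    return cross(side_vec, tangent)

# A blade of height 1 leaning towards +x.
p0, p1, p2 = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.3, 0.0, 1.0)
n_root = blade_normal(p0, p1, p2, 0.0, (1.0, 0.0))
n_tip = blade_normal(p0, p1, p2, 1.0, (1.0, 0.0))
print(n_root, n_tip)
```

For this blade the normal is horizontal at the root, where the curve rises vertically, and points straight up at the tip, where the curve runs horizontally.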
Combining grass blades to a grass patch
One mesh shader work group generates the geometry for one patch of grass. A patch of grass is described by a GrassPatchArguments structure, which provides the patch position, the ground normal and the patch height.
We assume the buffer of GrassPatchArguments as given. We access the buffer at the index gid, with gid being the SV_GroupID of our thread group. We randomly scatter the blades of grass in a circle around patchPosition. Since the ground we place the grass on is typically not flat, blades further away from patchPosition would start to float in mid-air. To fix this, we use the groundNormal to project the blade scattering circle onto the terrain surface. The variable patchRadius is a global parameter and describes the radius of the scattering circle, and thus the maximum distance of a blade from the center of the grass patch. To calculate the bladePosition of a blade of grass in a patch, we obtain a random radius r_{\mathrm{blade}} (bladeRadius) and a random angle \alpha (alpha). With these, we can calculate the bladeOffset from the center of the patch, patchPosition. Each blade is then rotated by a random angle \beta (beta).
In the following code example, we compute bladeDirection and bladePosition. Note that the function rand(...) provides a seeded and uniformly distributed pseudorandom value between 0 and 1.
...
uint seed = combineSeed(globalSeed, bladeId);
float beta = 2. * PI * rand(seed);
float2 bladeDirection = float2(cos(beta), sin(beta));
float3 tangent = normalize(cross(float3(0,1,0),groundNormal));
float3 bitangent = normalize(cross(groundNormal, tangent));
float alpha = 2. * PI * rand(seed);
float bladeRadius = patchRadius * sqrt(rand(seed));
float3 bladeOffset = bladeRadius * (cos(alpha) * tangent + sin(alpha) * bitangent);
float3 bladePosition = patchPosition + bladeOffset;
...
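The scattering step can be sanity-checked on the CPU with the Python sketch below. It uses Python's random.Random as a stand-in for the shader's seeded rand(...), and it assumes that groundNormal is a unit vector that is not parallel to (0, 1, 0), since the tangent frame is undefined in that case. The sample normal and radius are illustrative values.

```python
import math
import random

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def scatter_blade(patch_position, ground_normal, patch_radius, rng):
    """Return (bladePosition, bladeOffset) for one randomly placed blade."""
    tangent = normalize(cross((0.0, 1.0, 0.0), ground_normal))
    bitangent = normalize(cross(ground_normal, tangent))
    alpha = 2.0 * math.pi * rng.random()
    # sqrt() makes the blades uniform per area instead of clustering at the center
    blade_radius = patch_radius * math.sqrt(rng.random())
    offset = tuple(blade_radius * (math.cos(alpha) * t + math.sin(alpha) * b)
                   for t, b in zip(tangent, bitangent))
    position = tuple(p + o for p, o in zip(patch_position, offset))
    return position, offset

rng = random.Random(1234)  # stand-in for the shader's seeded rand(...)
normal = normalize((0.2, 0.1, 1.0))
offsets = [scatter_blade((0.0, 0.0, 0.0), normal, 0.5, rng)[1] for _ in range(1000)]
print(max(abs(dot(o, normal)) for o in offsets))   # distance from the tilted ground plane
print(max(math.sqrt(dot(o, o)) for o in offsets))  # largest distance from the patch center
```

Every offset lies in the plane perpendicular to the ground normal, so the blades follow the slope of the terrain, and no blade ends up further than patchRadius from the patch center.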
We also get a height for the whole patch, which is the mean height of all grass blades of the patch. For a more diverse appearance, we slightly vary the height of each grass blade in a patch by adding a small random offset per blade, as done in the full grass mesh shader in the appendix.
Thread allocation
Since the DirectX-Specs state that mesh shaders can output at most 256 vertices, our patch of grass consists of a maximum of \frac{256}{8}=32 blades of grass. We have 6 primitives and 8 vertices per blade. This results in 192 primitives and 256 vertices per patch. Our vertices carry the attributes of the Vertex structure listed in the appendix: clip-space position, world-space position, world-space normal, root height and blade height.
To write the index and vertex buffer, we use the best practices described in an earlier blog post of this series, Mesh Shader Optimizations and Best Practices. To recap, we set the thread group size to its limit of GROUP_SIZE = 128. We have to make sure that the ith primitive and the ith vertex are written by the ith thread in the thread group. Since our vertex and primitive counts are greater than the thread group size of 128, we use a thread-group-sized stride of 128. Each thread then calculates a maximum of two vertices and two primitives.
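The budget arithmetic above can be written out in a few lines of Python. This is only a worked check of the numbers quoted in the text; the constant names are chosen for illustration.

```python
MAX_VERTICES = 256        # mesh shader vertex output limit quoted in the text
GROUP_SIZE = 128          # threads per thread group
VERTICES_PER_BLADE = 8
TRIANGLES_PER_BLADE = 6

max_blades = MAX_VERTICES // VERTICES_PER_BLADE
vertex_count = max_blades * VERTICES_PER_BLADE
primitive_count = max_blades * TRIANGLES_PER_BLADE

# Ceiling division: how many strided loop iterations a thread needs at most.
iterations = -(-max(vertex_count, primitive_count) // GROUP_SIZE)

print(max_blades, vertex_count, primitive_count, iterations)
```

The vertex limit is the binding constraint here: 32 blades use all 256 vertices but only 192 primitives.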
Writing to the vertex buffer
First, we look at how vertices are generated and written, given the group thread ID gtid.
The for-loop over the vertices (listed in the full grass mesh shader in the appendix) runs up to two times per thread. When vertId reaches the number of vertices we want to generate (at most V=256), we exit the loop. With this arithmetic, each thread of the group computes the following values in the first loop iteration:
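| GTID | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | … |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vertId | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | … |
| bladeId | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | … |
| vertIdLocal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 0 | 1 | 2 | 3 | 4 | … |
| offsetSign | - | + | - | + | - | + | - | + | - | + | - | + | - | … |
| t | 0 | 0 | \frac{1}{3} | \frac{1}{3} | \frac{2}{3} | \frac{2}{3} | 1 | 1 | 0 | 0 | \frac{1}{3} | \frac{1}{3} | \frac{2}{3} | … |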
In the second iteration, vertId is offset by GROUP_SIZE = 128:
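| GTID | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | … |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vertId | 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | … |
| bladeId | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 17 | 17 | 17 | 17 | 17 | … |
| vertIdLocal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 0 | 1 | 2 | 3 | 4 | … |
| offsetSign | - | + | - | + | - | + | - | + | - | + | - | + | - | … |
| t | 0 | 0 | \frac{1}{3} | \frac{1}{3} | \frac{2}{3} | \frac{2}{3} | 1 | 1 | 0 | 0 | \frac{1}{3} | \frac{1}{3} | \frac{2}{3} | … |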
With the maximum value of gtid = 127, we get the following ranges for our variables:
| Value | Range |
|---|---|
| vertId | 0..255 |
| bladeId | 0..31 |
| vertIdLocal | 0..7 |
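The tables above can be reproduced with a short Python sketch of the index arithmetic. The function name vertex_slot is hypothetical; the formulas follow the vertex loop of the full mesh shader in the appendix.

```python
GROUP_SIZE = 128
VERTICES_PER_BLADE_EDGE = 4
VERTICES_PER_BLADE = 2 * VERTICES_PER_BLADE_EDGE

def vertex_slot(gtid, i):
    """Values computed by thread gtid in loop iteration i (0 or 1)."""
    vert_id = gtid + GROUP_SIZE * i                  # strided by the group size
    blade_id = vert_id // VERTICES_PER_BLADE         # which blade of the patch
    vert_id_local = vert_id % VERTICES_PER_BLADE     # which vertex of that blade
    offset_sign = 1 if vert_id_local & 1 else -1     # odd: P^+ edge, even: P^- edge
    t = (vert_id_local // 2) / (VERTICES_PER_BLADE_EDGE - 1)
    return vert_id, blade_id, vert_id_local, offset_sign, t

for gtid in (0, 1, 8, 12, 127):
    for i in range(2):
        print(gtid, i, vertex_slot(gtid, i))
```

Two neighboring threads always share the same t and differ only in offsetSign, so each pair produces the left and right vertex of one cross-section of the blade.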
With these values, we can determine which vertex has to be generated. But first, we generate the control points from the GrassPatchArguments. Depending on our vertIdLocal, we modify our control points P to P^- or P^+:
//vector perpendicular to the blade direction
float3 sideVec = normalize(float3(bladeDirection.y, -bladeDirection.x, 0));
float3 offset = tsign(vertIdLocal, 0) * WIDTH_SCALE * sideVec;
const static float w0 = 1.f;
const static float w1 = .7f;
const static float w2 = .3f;
p0 += offset * w0;
p1 += offset * w1;
p2 += offset * w2;
The utility function tsign(uint value, int bitPos) returns +1 if the bit at bitPos in value is set and -1 otherwise. Thus, when vertIdLocal is even, we move P in the negative direction, and when it is odd, in the positive direction. We scale the offset at each control point with w_0, w_1 and w_2, respectively.
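To make the edge construction concrete, here is a small Python sketch of tsign and the per-control-point offset. The width scale of 0.03 is taken from the appendix shader, and the sample control points and side vector are illustrative assumptions.

```python
WIDTHS = (1.0, 0.7, 0.3)  # w0, w1, w2 from the snippet above

def tsign(value, bit_pos):
    """+1 if the bit at bit_pos is set in value, -1 otherwise."""
    return 1 if value & (1 << bit_pos) else -1

def offset_control_points(points, side_vec, width_scale, vert_id_local):
    """Move P0, P1, P2 sideways to obtain the P^- or P^+ edge curve."""
    sign = tsign(vert_id_local, 0)
    return [tuple(c + sign * width_scale * w * s for c, s in zip(p, side_vec))
            for p, w in zip(points, WIDTHS)]

points = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.3, 0.0, 1.0)]
side_vec = (0.0, -1.0, 0.0)  # perpendicular to a blade leaning towards +x
minus_edge = offset_control_points(points, side_vec, 0.03, 0)  # even vertIdLocal
plus_edge = offset_control_points(points, side_vec, 0.03, 1)   # odd vertIdLocal

root_width = abs(plus_edge[0][1] - minus_edge[0][1])
tip_width = abs(plus_edge[2][1] - minus_edge[2][1])
print(root_width, tip_width)
```

Because w_2 is smaller than w_0, the two edge curves converge towards the tip, which gives the blade its tapered shape.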
Since we evaluate the Bézier curve at 4 locations, we need 4 different values for the interpolation parameter t.
float t = (vertIdLocal/2) / float(verticesPerBladeEdge - 1);
Vertex vertex;
vertex.height = height;
vertex.rootHeight = p0.z;
vertex.worldSpacePosition = bezier(p0, p1, p2, t);
vertex.worldSpaceNormal = cross(sideVec, normalize(bezierDerivative(p0, p1, p2, t)));
vertex.clipSpacePosition = mul(DynamicConst.viewProjectionMatrix, float4(vertex.worldSpacePosition, 1));
verts[vertId] = vertex;
} //end for-loop
...
The previous tables show the different values for t depending on gtid and the loop index i. After calculating each needed value, we write the vertex at index vertId to the output buffer.
We can see that the first thread with gtid = 0 writes the vertices vertId = 0 and vertId = 128.
Writing to the index buffer
Writing to the index buffer works analogously to writing to the vertex buffer. The topology of the primitives is described in Bézier to triangles.
Similarly to how we create our vertex IDs, we generate the triangle IDs: instead of dividing by verticesPerBlade, we divide by trianglesPerBlade.
The offset depends on the vertices, so we multiply bladeId with verticesPerBlade. Depending on whether triIdLocal is even or odd, we write either the right or the left triangle of the quad.
The following table shows how the gtid maps to the primitives written.
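| GTID | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | … |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| triId | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | … |
| bladeId | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | … |
| triIdLocal | 0 | 1 | 2 | 3 | 4 | 5 | 0 | 1 | 2 | 3 | 4 | 5 | … |
| offset | 0 | 0 | 2 | 2 | 4 | 4 | 8 | 8 | 10 | 10 | 12 | 12 | … |
| Primitive (i0, i1, i2) | (0, 1, 2) | (3, 2, 1) | (2, 3, 4) | (5, 4, 3) | (4, 5, 6) | (7, 6, 5) | (8, 9, 10) | (11, 10, 9) | (10, 11, 12) | (13, 12, 11) | (12, 13, 14) | (15, 14, 13) | … |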
We can see that the first thread with gtid = 0 writes the first primitive in the index buffer at triId = 0. In the second iteration, it writes at triId = 128.
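The primitive mapping can be checked with the following Python sketch, which mirrors the triangle loop of the full mesh shader in the appendix. The function name triangle_slot is hypothetical.

```python
GROUP_SIZE = 128
VERTICES_PER_BLADE = 8
TRIANGLES_PER_BLADE = 6

def triangle_slot(gtid, i):
    """Return (triId, offset, indices) written by thread gtid in iteration i."""
    tri_id = gtid + GROUP_SIZE * i
    blade_id = tri_id // TRIANGLES_PER_BLADE
    tri_id_local = tri_id % TRIANGLES_PER_BLADE
    # First vertex of the quad this triangle belongs to.
    offset = blade_id * VERTICES_PER_BLADE + 2 * (tri_id_local // 2)
    # Even triangles use (0, 1, 2), odd ones (3, 2, 1), relative to the quad.
    local = (0, 1, 2) if tri_id_local % 2 == 0 else (3, 2, 1)
    return tri_id, offset, tuple(offset + k for k in local)

for gtid in range(8):
    print(gtid, triangle_slot(gtid, 0))
print(0, triangle_slot(0, 1))
```

With 192 primitives and 128 threads, only the threads with gtid 0 to 63 have a second primitive to write; the remaining threads leave the loop early in the second iteration.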
Level of detail
To improve the performance of our grass mesh shader, we reduce the amount of geometry rendered when a patch is further away from the camera. For this, we reduce the number of blades of grass in the distance. To compensate for this, we increase the width of the remaining grass blades for the whole patch.
Fractional scaling
To hide the transition, we implemented fractional scaling for the number of grass blades. For this, we introduce two variables: bladeCount and its real-valued version bladeCountF. bladeCountF decreases with the distance to the camera, and bladeCount is bladeCountF rounded up.
All grass blades with a bladeId smaller than bladeCount - 1 are drawn without modification. The width of the last grass blade at bladeId = bladeCount - 1 gets scaled with the fractional part of bladeCountF.
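The following Python sketch mirrors how the appendix shader derives bladeCountF and bladeCount from the camera distance and how the last blade is scaled. GRASS_END_DISTANCE is not given a value in this post, so the 100.0 used here is purely a placeholder assumption.

```python
import math

MAX_BLADE_COUNT = 32
MIN_BLADE_COUNT = 2.0
GRASS_END_DISTANCE = 100.0  # hypothetical value; the post does not state it

def saturate(x):
    return min(max(x, 0.0), 1.0)

def blade_counts(distance_to_camera):
    """Return (bladeCount, bladeCountF) for a patch at the given distance."""
    fade = saturate(distance_to_camera / (GRASS_END_DISTANCE * 1.05)) ** 0.75
    # lerp(MAX_BLADE_COUNT, MIN_BLADE_COUNT, fade)
    blade_count_f = MAX_BLADE_COUNT + (MIN_BLADE_COUNT - MAX_BLADE_COUNT) * fade
    return math.ceil(blade_count_f), blade_count_f

def width_factor(blade_id, blade_count, blade_count_f):
    """1.0 for full blades, frac(bladeCountF) for the last blade.

    As in the appendix code, a whole-numbered bladeCountF gives the
    last blade a factor of 0.
    """
    if blade_id == blade_count - 1:
        return blade_count_f - math.floor(blade_count_f)
    return 1.0

for distance in (0.0, 25.0, 50.0, 1000.0):
    print(distance, blade_counts(distance))
```

As bladeCountF falls continuously, the last blade shrinks smoothly to zero width before bladeCount drops by one, so no blade disappears abruptly.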
[Animations: Without fractional scaling | With fractional scaling]

Geometry compensation
To keep the visual appearance consistent across all distances from the camera, we scale the width of each grass blade in a patch by maxBladeCount / bladeCountF, so that fewer blades cover roughly the same area.
The animation shows the effect in a greatly exaggerated manner, but in a dense meadow, this effect is barely noticeable.
[Animation: With exaggerated widening]
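A quick way to see why the widening compensates for the removed blades is to sum the blade widths of a patch. The Python sketch below uses the base width of 0.03 and the scaling from the appendix shader; the function names are illustrative.

```python
import math

MAX_BLADE_COUNT = 32
BASE_WIDTH = 0.03  # base blade width used in the appendix shader

def blade_width(blade_id, blade_count_f):
    blade_count = math.ceil(blade_count_f)
    # Fewer blades in the patch means wider blades.
    width = BASE_WIDTH * MAX_BLADE_COUNT / blade_count_f
    if blade_id == blade_count - 1:
        # The last blade is scaled by the fractional part of bladeCountF.
        width *= blade_count_f - math.floor(blade_count_f)
    return width

def total_width(blade_count_f):
    return sum(blade_width(b, blade_count_f)
               for b in range(math.ceil(blade_count_f)))

for f in (31.5, 16.25, 4.75):
    print(f, total_width(f))
```

For non-integer values of bladeCountF, the summed width equals BASE_WIDTH * MAX_BLADE_COUNT regardless of how many blades remain, which is why the meadow keeps its overall density in the distance.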

Wind animation
To simulate the effect of wind, we use a simple approach inspired by the GDC talk Between Tech and Art: The Vegetation of Horizon Zero Dawn by Gilbert Sanders from Guerrilla Games, which uses sine waves in the x- and y-direction. To enhance the effect, we add some Perlin noise to the time.
float3 GetWindOffset(float2 pos, float time){
float posOnSineWave = cos(WindDirection) * pos.x - sin(WindDirection) * pos.y;
float t = time + posOnSineWave + 4 * PerlinNoise2D(0.1 * pos);
float windx = 2 * sin(.5 * t);
float windy = 1 * sin(1. * t);
return ANIMATION_SCALE * float3(windx, windy, 0);
}
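For experimenting with the wind parameters outside the shader, the function can be ported to Python as sketched below. WIND_DIRECTION and ANIMATION_SCALE are placeholder values, and the noise argument is a stand-in for PerlinNoise2D that returns 0 by default; none of these values come from the original implementation.

```python
import math

WIND_DIRECTION = 0.0   # radians; hypothetical value
ANIMATION_SCALE = 0.1  # hypothetical value

def get_wind_offset(pos, time, noise=lambda x, y: 0.0):
    """Python port of GetWindOffset; noise stands in for PerlinNoise2D."""
    pos_on_sine_wave = (math.cos(WIND_DIRECTION) * pos[0]
                        - math.sin(WIND_DIRECTION) * pos[1])
    t = time + pos_on_sine_wave + 4.0 * noise(0.1 * pos[0], 0.1 * pos[1])
    wind_x = 2.0 * math.sin(0.5 * t)
    wind_y = 1.0 * math.sin(1.0 * t)
    return (ANIMATION_SCALE * wind_x, ANIMATION_SCALE * wind_y, 0.0)

for step in range(5):
    time = step * 0.5 * math.pi
    print(time, get_wind_offset((0.0, 0.0), time))
```

The x component oscillates at half the frequency and twice the amplitude of the y component, so the tip of a blade traces a figure-eight-like path instead of swinging back and forth on a line.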
[Animations: Wind effect on a single patch of grass | Wind effect on a meadow]

Pixel shader
To improve the look of our grass when shading, we utilize two simple tricks: First, we fake a self-shadow effect by darkening the grass near its roots. Second, we apply Perlin noise to create dark patches in the meadow.
[Figures: Self shadow | Perlin noise grass color]

...
static const float3 grassGreen = float3(0.41, 0.44, 0.29);
float selfshadow = clamp(pow((input.worldSpacePosition.y - input.rootHeight) / input.height, 1.5), 0, 1);
output.baseColor.rgb = pow(grassGreen, 2.2) * selfshadow;
output.baseColor.rgb *= 0.75 + 0.25 * PerlinNoise2D(0.25 * input.worldSpacePosition.xz);
...
Note that, as we use a deferred renderer for development, we leave the implementation of the actual shading to the reader. We darken each pixel depending on its height above the root of the blade of grass and apply Perlin noise depending on its world-space position.
Furthermore, from experimentation we found that interpolating the grass normal with the up vector gave the blades a softer look.
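The two shading tricks can be explored on the CPU with the Python sketch below. It evaluates the self-shadow term and the normal softening with the constants from the appendix pixel shader (an exponent of 1.5, an up vector of (0, 0, 1) and a blend weight of 0.25). Unlike the shader, it clamps before raising to the power, because a negative base with exponent 1.5 is not a real number; for inputs at or above the root both orders give the same result.

```python
import math

def self_shadow(y, root_height, blade_height):
    """Darkening factor: 0 at the root, 1 at the tip of the blade."""
    rel = (y - root_height) / blade_height
    rel = min(max(rel, 0.0), 1.0)  # clamp first, see the note above
    return rel ** 1.5

def soften_normal(normal, up=(0.0, 0.0, 1.0), weight=0.25):
    """normalize(lerp(up, normal, weight)): pulls the normal towards up."""
    blended = tuple(u + (n - u) * weight for u, n in zip(up, normal))
    length = math.sqrt(sum(c * c for c in blended))
    return tuple(c / length for c in blended)

for rel_height in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(rel_height, self_shadow(rel_height, 0.0, 1.0))
print(soften_normal((1.0, 0.0, 0.0)))
```

The exponent above 1 keeps the lower part of the blade darker for longer than a linear ramp would, and the low blend weight means even a sideways-facing blade is lit mostly as if it faced upwards.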
Future work
Our grass system could be extended and improved in many different areas.
Seasonal effects
By applying a downward force to P_2 we could simulate the effects of seasons. Grass has more springiness in the warmer seasons. During the colder seasons, it is less stiff and lower to the ground.
Further geometry reduction
To further reduce the geometry in the distance we could implement a sparse grass shader. This shader would mimic the appearance of grass with much less geometry by using billboarding.
Other types of vegetation
The mesh shader could be modified to generate different kinds of vegetation. This could include different species of grass, flowers, shrubs and other clutter.
Conclusion
In this blog post, we described how mesh shaders can be used to generate meadows. We explained how grass can be represented by Bézier curves and how to efficiently write our generated geometry to index and vertex buffer. We provided ways to reduce the amount of geometry based on camera distance and illustrated how to animate the grass moving in the wind. We described a simple pixel shader implementation to improve the visuals of our grass. Finally, we provided some ideas on how to improve our implementation.
Appendix
Full grass mesh shader
int tsign(in uint gtid, in int id) {
return (gtid & (1u << id)) ? 1 : -1;
}
struct Vertex
{
float4 clipSpacePosition : SV_POSITION;
float3 worldSpacePosition : POSITION0;
float3 worldSpaceNormal : NORMAL0;
float rootHeight : BLENDWEIGHT0;
float height : BLENDWEIGHT1;
};
static const int GROUP_SIZE = 128;
static const int GRASS_VERT_COUNT = 256;
static const int GRASS_PRIM_COUNT = 192;
[NumThreads(GROUP_SIZE, 1, 1)]
[OutputTopology("triangle")]
void MeshShader(
uint gtid : SV_GroupThreadID,
uint gid : SV_GroupID,
out indices uint3 tris[GRASS_PRIM_COUNT],
out vertices Vertex verts[GRASS_VERT_COUNT]
)
{
const GrassPatchArguments arguments = //Load arguments
SetMeshOutputCounts(GRASS_VERT_COUNT, GRASS_PRIM_COUNT);
static const int verticesPerBladeEdge = 4;
static const int verticesPerBlade = 2 * verticesPerBladeEdge;
static const int trianglesPerBlade = 6;
static const int maxBladeCount = 32;
const float3 patchCenter = arguments.position;
const float3 patchNormal = arguments.normal;
const float spacing = DynamicConst.grassSpacing;
const int seed = combineSeed(asuint(int(patchCenter.x / spacing)), asuint(int(patchCenter.y / spacing)));
float distanceToCamera = distance(arguments.position, DynamicConst.cullingCameraPosition.xyz);
float bladeCountF = lerp(float(maxBladeCount), 2., pow(saturate(distanceToCamera / (GRASS_END_DISTANCE * 1.05)), 0.75));
int bladeCount = ceil(bladeCountF);
const int vertexCount = bladeCount * verticesPerBlade;
const int triangleCount = bladeCount * trianglesPerBlade;
for (uint i = 0; i < 2; ++i){
int vertId = gtid + GROUP_SIZE * i;
if (vertId >= vertexCount) break;
int bladeId = vertId / verticesPerBlade;
int vertIdLocal = vertId % verticesPerBlade;
const float height = arguments.height + float(rand(seed, bladeId, 20)) / 40.;
//position the grass in a circle around the patchPosition and angled using the patchNormal
float3 tangent = normalize(cross(float3(0, 1, 0), patchNormal));
float3 bitangent = normalize(cross(patchNormal, tangent));
float bladeDirectionAngle = 2. * PI * rand(seed, 4, bladeId);
float2 bladeDirection = float2(cos(bladeDirectionAngle), sin(bladeDirectionAngle));
float offsetAngle = 2. * PI * rand(seed, bladeId);
float offsetRadius = spacing * sqrt(rand(seed, 19, bladeId));
float3 bladeOffset = offsetRadius * (cos(offsetAngle) * tangent + sin(offsetAngle) * bitangent);
float3 p0 = patchCenter + bladeOffset;
float3 p1 = p0 + float3(0, 0, height);
float3 p2 = p1 + bladeDirection * height * 0.3;
p2 += GetWindOffset(p0.xy, DynamicConst.shaderTime);
MakePersistentLength(p0, p1, p2, height);
float width = 0.03;
width *= maxBladeCount / bladeCountF;
if (bladeId == (bladeCount - 1)){
width *= frac(bladeCountF);
}
Vertex vertex;
vertex.height = arguments.height;
vertex.worldSpaceGroundNormal = arguments.normal;
vertex.rootHeight = p0.z;
float3 sideVec = normalize(float3(bladeDirection.y, -bladeDirection.x, 0));
float3 offset = tsign(vertIdLocal, 0) * width * sideVec;
p0 += offset * 1.0;
p1 += offset * 0.7;
p2 += offset * 0.3;
float t = (vertIdLocal/2) / float(verticesPerBladeEdge - 1);
vertex.worldSpacePosition = bezier(p0, p1, p2, t);
vertex.worldSpaceNormal = cross(sideVec, normalize(bezierDerivative(p0, p1, p2, t)));
vertex.clipSpacePosition = mul(DynamicConst.viewProjectionMatrix, float4(vertex.worldSpacePosition, 1));
verts[vertId] = vertex;
}
for (uint i = 0; i < 2; ++i){
int triId = gtid + GROUP_SIZE * i;
if (triId >= triangleCount) break;
int bladeId = triId / trianglesPerBlade;
int triIdLocal = triId % trianglesPerBlade;
int offset = bladeId * verticesPerBlade + 2 * (triIdLocal / 2);
uint3 triangleIndices = (triIdLocal & 1) == 0? uint3(0, 1, 2) :
uint3(3, 2, 1);
tris[triId] = offset + triangleIndices;
}
}
Full pixel shader
struct PixelShaderOutput {
float3 patchPosition : SV_Target0;
float4 baseColor : SV_Target1;
float3 normal : SV_Target2;
};
PixelShaderOutput GrassPatchPixelShader(const Vertex input, bool isFrontFace : SV_IsFrontFace)
{
PixelShaderOutput output;
output.position = input.worldSpacePosition;
float selfshadow = clamp(pow((input.worldSpacePosition.y - input.rootHeight)/input.height, 1.5), 0, 1);
output.baseColor.rgb = pow(float3(0.41, 0.44, 0.29), 2.2) * selfshadow;
output.baseColor.rgb *= 0.75 + 0.25 * PerlinNoise2D(0.25 * input.worldSpacePosition.xy);
output.baseColor.a = 1;
float3 normal = normalize(input.worldSpaceNormal);
if (!isFrontFace) {
normal = -normal;
}
output.normal.xyz = normalize(lerp(float3(0, 0, 1), normal, 0.25));
return output;
}
Make persistent length
void MakePersistentLength(in float3 v0, inout float3 v1, inout float3 v2, in float height)
{
//Persistent length
float3 v01 = v1 - v0;
float3 v12 = v2 - v1;
float lv01 = length(v01);
float lv12 = length(v12);
float L1 = lv01 + lv12;
float L0 = length(v2 - v0);
float L = (2.0f * L0 + L1) / 3.0f; //http://steve.hollasch.net/cgindex/curves/cbezarclen.html
float ldiff = height / L;
v01 = v01 * ldiff;
v12 = v12 * ldiff;
v1 = v0 + v01;
v2 = v1 + v12;
}
Disclaimers
Links to third-party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites, and no endorsement is implied. GD98
Microsoft is a registered trademark of Microsoft Corporation in the US and/or other countries. Other product names used in this publication are for identification purposes only and may be trademarks of their respective owners.
DirectX is a registered trademark of Microsoft Corporation in the US and/or other countries.