
# Procedural grass rendering

## Introduction

Detailed vegetation plays an important part in immersion in video games, and grass is a large part of that vegetation. In this blog post, we use mesh shaders to generate patches of grass on the GPU. We took inspiration from Jahrmann and Wimmer's 2017 i3D paper *Responsive real-time grass rendering for general 3D scenes*, which uses tessellation shaders to subdivide predefined blades of grass. This has the benefit that additional detail can be generated without storing it explicitly. We take this concept further by using one mesh shader thread group to render a whole patch of grass. While we create a stylized meadow in this post to keep things simple, we are confident that our technique can be applied to more realistic scenes as well.

This blog post is structured as follows: First, we explain which parameters represent a blade of grass and how we use Bézier curves to describe its shape. Next, we calculate vertices, their normals, and primitives from this Bézier representation, and illustrate how blades of grass are combined into a patch. We then explain how we write the index and vertex buffers in a GPU-efficient way. Next, we outline how we reduce the amount of geometry in the distance while keeping the appearance consistent, and explain how we simulate the effect of wind on our meadow. We describe how our pixel shader improves the appearance of our grass, and finally, we provide some ideas on how our work could be extended and improved.

### Growing one blade of grass

Each blade of grass has a position bladePosition, a direction bladeDirection, and a height bladeHeight. We use these to calculate the control points $P_0$, $P_1$, and $P_2$ of a quadratic Bézier curve that represents the shape of the blade.

```hlsl
const static float grassLeaning = 0.3f;

float3 p0 = bladePosition;
float3 p1 = p0 + float3(0, 0, bladeHeight);
float3 p2 = p1 + bladeDirection * bladeHeight * grassLeaning;
```


$P_0$ is simply bladePosition. $P_1$ is $P_0$ translated upwards (along $z$) by bladeHeight $h$. To obtain $P_2$, we translate $P_1$ by the bladeDirection vector $\vec{d}$ scaled by bladeHeight times a leaning factor grassLeaning $= 0.3$, that is, $P_2 = P_1 + 0.3\,h\,\vec{d}$. Because the lean scales with the height, the shape of the blade is preserved regardless of its bladeHeight.

To animate the grass, we move $P_2$, which changes the length of the Bézier curve. To restore the original length, we adjust $P_1$ and $P_2$ using the length-preserving function from Jahrmann and Wimmer:

```hlsl
MakePersistentLength(p0, p1, p2, bladeHeight);      //Function body in the appendix
```

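The length correction can be prototyped on the CPU. The C++ sketch below is a port of the appendix's MakePersistentLength, with a minimal float3 stand-in and the helper name approxBezierLength as assumptions. Because the arc-length estimate scales linearly when the whole control polygon is scaled around $P_0$, a single scale factor restores the target length, up to floating-point error.

```cpp
#include <cmath>

struct float3 { float x, y, z; };

float3 operator+(float3 a, float3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
float3 operator-(float3 a, float3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float3 operator*(float3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float length(float3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// Approximate arc length of a quadratic Bezier curve: weighted mean of the
// chord |P2 - P0| and the control polygon length |P1 - P0| + |P2 - P1|.
float approxBezierLength(float3 p0, float3 p1, float3 p2)
{
    float chord   = length(p2 - p0);
    float polygon = length(p1 - p0) + length(p2 - p1);
    return (2.0f * chord + polygon) / 3.0f;
}

// Port of MakePersistentLength: uniformly rescales the control polygon around
// P0 so that the approximate length equals `height`.
void makePersistentLength(float3 p0, float3& p1, float3& p2, float height)
{
    float3 v01 = p1 - p0;
    float3 v12 = p2 - p1;
    float scale = height / approxBezierLength(p0, p1, p2);
    p1 = p0 + v01 * scale;
    p2 = p1 + v12 * scale;
}
```

For example, after pushing $P_2$ sideways, calling makePersistentLength brings the estimated length back to bladeHeight while keeping the direction of each control-polygon segment.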

A width $w$ for each control point defines the width of the grass blade. To apply it, we translate each control point outwards by $w$, in both directions along the vector perpendicular to bladeDirection. This yields the control points $P_0^-, P_0^+, P_1^-, P_1^+, P_2^-$, and $P_2^+$. The points $P_i^+$ form one edge of the blade, and the points $P_i^-$ form the other. Thus, we now have two Bézier curves representing the edges of the grass blade, as can be seen in the following figure.

### Bézier to triangles

To create geometry, we evaluate each of our two edge curves $n = 4$ times. Thus, we get 4 vertices per blade edge, or 8 vertices in total. A quadratic Bézier curve can be evaluated with repeated linear interpolation (de Casteljau's algorithm):

```hlsl
float3 bezier(float3 p0, float3 p1, float3 p2, float t)
{
    float3 a = lerp(p0, p1, t);
    float3 b = lerp(p1, p2, t);
    return lerp(a, b, t);
}
```


Connecting these $|V| = 8$ vertices results in $|T| = 6$ triangles.

Assuming a counter-clockwise winding order, this results in the following primitives:

| Primitive | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| i0 | 0 | 3 | 2 | 5 | 4 | 7 |
| i1 | 1 | 2 | 3 | 4 | 5 | 6 |
| i2 | 2 | 1 | 4 | 3 | 6 | 5 |

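As a sanity check of the table, this C++ sketch (hypothetical helper names) regenerates the six per-blade triangles from the quad-based rule used later in the index buffer code and checks their orientation. It assumes an unfolded 2D layout of the blade: even vertices on the left edge, odd vertices on the right edge, and vertex $v$ in row $\lfloor v/2 \rfloor$. A positive signed area means counter-clockwise, so all six triangles share one winding order.

```cpp
#include <array>

using Tri = std::array<int, 3>;

// Per-blade triangle triIdLocal (0..5): two triangles per quad of the blade strip.
Tri bladeTriangle(int triIdLocal)
{
    int base = 2 * (triIdLocal / 2);  // first vertex of the current quad
    return (triIdLocal & 1) == 0 ? Tri{base + 0, base + 1, base + 2}
                                 : Tri{base + 3, base + 2, base + 1};
}

// Signed area in the unfolded layout: x = -1 for even (left edge) vertices,
// x = +1 for odd (right edge) vertices, y = row index v / 2.
float signedArea(Tri tri)
{
    float x[3], y[3];
    for (int k = 0; k < 3; ++k) {
        x[k] = (tri[k] % 2 == 0) ? -1.0f : 1.0f;
        y[k] = float(tri[k] / 2);
    }
    return 0.5f * ((x[1] - x[0]) * (y[2] - y[0]) - (x[2] - x[0]) * (y[1] - y[0]));
}
```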
To perform shading, we calculate the normal vectors from the derivative of the Bézier curve:

```hlsl
float3 bezierDerivative(float3 p0, float3 p1, float3 p2, float t)
{
    return 2. * (1. - t) * (p1 - p0) + 2. * t * (p2 - p1);
}
```


To get the normal vector, we first need to calculate a normalized vector perpendicular to the bladeDirection. We then calculate the cross product between this sideVec and the derivative at the current interpolation parameter t.

```hlsl
float3 sideVec = normalize(float3(bladeDirection.y, -bladeDirection.x, 0));
float3 normal  = cross(sideVec, normalize(bezierDerivative(p0, p1, p2, t)));
```

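The same computation in C++, with a minimal float3 stand-in and the hypothetical helper bladeNormal. It assumes a horizontal bladeDirection and a blade that bends only within the vertical plane through bladeDirection (no wind), so sideVec is perpendicular to the curve tangent and their cross product is already unit length.

```cpp
#include <cmath>

struct float3 { float x, y, z; };

float3 operator+(float3 a, float3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
float3 operator-(float3 a, float3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float3 operator*(float s, float3 a) { return {s * a.x, s * a.y, s * a.z}; }
float dot(float3 a, float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
float3 cross(float3 a, float3 b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
float3 normalize(float3 v) { return (1.0f / std::sqrt(dot(v, v))) * v; }

// B'(t) = 2(1-t)(P1-P0) + 2t(P2-P1), as in bezierDerivative().
float3 bezierDerivative(float3 p0, float3 p1, float3 p2, float t)
{
    return 2.0f * (1.0f - t) * (p1 - p0) + 2.0f * t * (p2 - p1);
}

// Horizontal side vector crossed with the normalized curve tangent.
float3 bladeNormal(float3 p0, float3 p1, float3 p2, float3 bladeDirection, float t)
{
    float3 sideVec = normalize(float3{bladeDirection.y, -bladeDirection.x, 0.0f});
    return cross(sideVec, normalize(bezierDerivative(p0, p1, p2, t)));
}
```

With wind, $P_2$ can leave that vertical plane; the cross product is then no longer unit length, which the pixel shader handles by normalizing the interpolated normal.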

### Combining grass blades to a grass patch

One mesh shader work group generates the geometry for one patch of grass. A patch of grass has the following arguments:

```hlsl
struct GrassPatchArguments {
    float3 patchPosition;
    float3 groundNormal;
    float  height;
};
```


We assume the buffer of GrassPatchArguments is given. We access it at index gid, where gid is the SV_GroupID of our thread group. We randomly scatter the blades of grass in a circle around patchPosition. Since the ground we place the grass on is typically not flat, blades further away from patchPosition would float in mid-air. To fix this, we use groundNormal to project the scattering circle onto the terrain surface. The global parameter patchRadius describes the radius of the scattering circle, that is, the maximum distance of a blade from the center of the patch. To calculate the bladePosition of a blade of grass, we draw a random radius $r_{\mathrm{blade}}$ (bladeRadius) and a random angle $\alpha$ (alpha). From these, we calculate the bladeOffset from the patch center patchPosition. Each blade is then rotated by a random angle $\beta$ (beta).

In the following code example, we compute bladeDirection and bladePosition. Note that the function rand(...) provides a seeded and uniformly distributed pseudo-random value between 0 and 1.

```hlsl
...

float beta = 2. * PI * rand(seed);

float3 tangent   = normalize(cross(float3(0, 1, 0), groundNormal));
float3 bitangent = normalize(cross(groundNormal, tangent));

float alpha = 2. * PI * rand(seed);

...
```

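The following C++ sketch shows the scattering step with the random values passed in as parameters (in the shader they come from rand()). The float3 helpers and the name scatterOffset are hypothetical. It assumes groundNormal is unit length and not parallel to the y-axis, where cross(float3(0, 1, 0), groundNormal) would vanish.

```cpp
#include <cmath>

struct float3 { float x, y, z; };

float3 operator+(float3 a, float3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
float3 operator*(float s, float3 a) { return {s * a.x, s * a.y, s * a.z}; }
float dot(float3 a, float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
float3 cross(float3 a, float3 b)
{
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
float3 normalize(float3 v) { return (1.0f / std::sqrt(dot(v, v))) * v; }

// Offset of a blade from the patch center, lying in the plane orthogonal to
// groundNormal, at distance `radius` and polar angle `alpha`.
float3 scatterOffset(float3 groundNormal, float radius, float alpha)
{
    float3 tangent   = normalize(cross(float3{0, 1, 0}, groundNormal));
    float3 bitangent = normalize(cross(groundNormal, tangent));
    return radius * (std::cos(alpha) * tangent + std::sin(alpha) * bitangent);
}
```

If the radius is drawn uniformly from [0, patchRadius], blades end up denser near the patch center; drawing it as patchRadius * sqrt(u) for a uniform u in [0, 1] spreads them evenly over the disk.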

We also get a height for the whole patch, which is the mean height of all grass blades of the patch. For a more diverse appearance, we slightly vary the height of each grass blade in a patch:

```hlsl
const float bladeHeight = height + float(rand(seed)) * RAND_HEIGHT_SCALE;
```


Since the DirectX specification limits mesh shader output to at most 256 vertices, our patch of grass consists of at most $\frac{256}{8} = 32$ blades of grass. With 6 primitives and 8 vertices per blade, this results in 192 primitives and 256 vertices per patch, which also stays within the limit of 256 primitives. Our vertices have the following attributes:

```hlsl
struct Vertex
{
    float4 clipSpacePosition   : SV_POSITION;
    float3 worldSpacePosition  : POSITION0;
    float3 worldSpaceNormal    : NORMAL0;
    float  rootHeight          : BLENDWEIGHT0; //Used for fake self shadow
    float  height              : BLENDWEIGHT1; //Used for fake self shadow
};
```


To write the index and vertex buffers, we follow the best practices described in an earlier blog post of this series, *Mesh Shader Optimizations and Best Practices*. To recap, we set the thread group size to its limit of GROUP_SIZE = 128. We have to make sure that the i-th primitive and the i-th vertex are written by the i-th thread in the thread group. Since our vertex and primitive counts are greater than the thread group size of 128, we loop with a stride of GROUP_SIZE. Each thread then calculates at most two vertices and two primitives.

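To check the arithmetic, this C++ sketch (constants from the text, function name hypothetical) simulates the strided loop on the CPU and confirms that two iterations of 128 threads write every vertex and primitive index exactly once, for 256 vertices and 192 primitives.

```cpp
#include <vector>

constexpr int GROUP_SIZE        = 128;
constexpr int verticesPerBlade  = 8;
constexpr int trianglesPerBlade = 6;
constexpr int maxVertices       = 256;  // mesh shader output limit
constexpr int maxBladeCount     = maxVertices / verticesPerBlade;  // 32

static_assert(maxBladeCount * trianglesPerBlade == 192, "192 primitives per patch");
static_assert(maxBladeCount * verticesPerBlade <= 2 * GROUP_SIZE,
              "two loop iterations per thread cover all vertices");

// Thread gtid handles indices gtid and gtid + GROUP_SIZE (if below count).
// Returns how often each output index in [0, count) is written.
std::vector<int> writesPerIndex(int count)
{
    std::vector<int> writes(count, 0);
    for (int gtid = 0; gtid < GROUP_SIZE; ++gtid) {
        for (int i = 0; i < 2; ++i) {
            int id = gtid + GROUP_SIZE * i;
            if (id >= count) break;
            ++writes[id];
        }
    }
    return writes;
}
```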
#### Writing to the vertex buffer

First, we look at how vertices are generated and written, given the group thread ID gtid.

```hlsl
...
for (uint i = 0; i < 2; ++i) {
    int vertId = gtid + GROUP_SIZE * i;

    if (vertId >= vertexCount) break;            //Depends on the number of blades generated

    int vertIdLocal = vertId % verticesPerBlade;
    ...
```


With this loop, each thread runs at most two iterations. Once vertId reaches vertexCount, the number of vertices we want to generate (at most $|V| = 256$), we exit the loop. With this arithmetic, each thread of the group computes the following values in the first loop iteration:

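| GTID | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vertId | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| bladeId | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| vertIdLocal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 0 | 1 | 2 | 3 | 4 |
| offsetSign | − | + | − | + | − | + | − | + | − | + | − | + | − |
| t | 0 | 0 | $\frac{1}{3}$ | $\frac{1}{3}$ | $\frac{2}{3}$ | $\frac{2}{3}$ | 1 | 1 | 0 | 0 | $\frac{1}{3}$ | $\frac{1}{3}$ | $\frac{2}{3}$ |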

In the second iteration vertId is offset by GROUP_SIZE = 128:

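| GTID | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| vertId | 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 |
| bladeId | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 17 | 17 | 17 | 17 | 17 |
| vertIdLocal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 0 | 1 | 2 | 3 | 4 |
| offsetSign | − | + | − | + | − | + | − | + | − | + | − | + | − |
| t | 0 | 0 | $\frac{1}{3}$ | $\frac{1}{3}$ | $\frac{2}{3}$ | $\frac{2}{3}$ | 1 | 1 | 0 | 0 | $\frac{1}{3}$ | $\frac{1}{3}$ | $\frac{2}{3}$ |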

With the maximum value of gtid = 127, we get the following ranges for our variables:

| Value | Range |
|---|---|
| vertId | 0..255 |
| bladeId | 0..31 |
| vertIdLocal | 0..7 |

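The index arithmetic behind these tables can be reproduced in a few lines of C++. The sketch assumes bladeId = vertId / verticesPerBlade (the tables list bladeId, but its computation is elided in the snippet above); VertexSlot and vertexSlot are hypothetical names.

```cpp
struct VertexSlot {
    int vertId, bladeId, vertIdLocal, offsetSign;
    float t;
};

constexpr int GROUP_SIZE           = 128;
constexpr int verticesPerBladeEdge = 4;
constexpr int verticesPerBlade     = 2 * verticesPerBladeEdge;

// Mirrors the shader's index arithmetic for thread gtid in loop iteration i.
VertexSlot vertexSlot(int gtid, int i)
{
    VertexSlot s;
    s.vertId      = gtid + GROUP_SIZE * i;
    s.bladeId     = s.vertId / verticesPerBlade;
    s.vertIdLocal = s.vertId % verticesPerBlade;
    s.offsetSign  = (s.vertIdLocal & 1) ? 1 : -1;  // same as tsign(vertIdLocal, 0)
    s.t           = (s.vertIdLocal / 2) / float(verticesPerBladeEdge - 1);
    return s;
}
```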
With these values, we can determine which vertex has to be generated. But first, we generate the control points from GrassPatchArguments. Depending on vertIdLocal, we move the control points $P$ to $P^-$ or $P^+$:

```hlsl
    //vector perpendicular to the blade direction
    float3 offset = tsign(vertIdLocal, 0) * WIDTH_SCALE * sideVec;

    const static float w0 = 1.f;
    const static float w1 = .7f;
    const static float w2 = .3f;

    p0 += offset * w0;
    p1 += offset * w1;
    p2 += offset * w2;
```


The utility function tsign(uint value, int bitPos) returns +1 if the bit at bitPos in value is set and -1 otherwise. Thus, when vertIdLocal is even, we move $P$ in the negative direction, and when it is odd, in the positive direction. We scale the offset at the three control points by $w_0$, $w_1$, and $w_2$, respectively, so the blade tapers from root to tip.

Since we evaluate the Bézier curve at 4 locations, we need 4 different values for the interpolation parameter t.

```hlsl
    float t = (vertIdLocal / 2) / float(verticesPerBladeEdge - 1);

    Vertex vertex;
    vertex.height             = height;
    vertex.rootHeight         = p0.z;
    vertex.worldSpacePosition = bezier(p0, p1, p2, t);
    vertex.worldSpaceNormal   = cross(sideVec, normalize(bezierDerivative(p0, p1, p2, t)));
    vertex.clipSpacePosition  = mul(DynamicConst.viewProjectionMatrix, float4(vertex.worldSpacePosition, 1));

    verts[vertId] = vertex;
}   //end for-loop
...
```


The previous tables show those different values for t depending on gtid and i. After calculating each needed value, we write the vertex at index vertId in the output buffer.

We can see that the first thread with gtid = 0 writes the vertex vertId = 0 and vertex vertId = 128.

#### Writing to the index buffer

Writing to the index buffer works analogously to writing to the vertex buffer. The topology of the primitives is described in the section *Bézier to triangles*.

```hlsl
for (uint i = 0; i < 2; ++i) {
    int triId = gtid + GROUP_SIZE * i;

    if (triId >= triangleCount) break;

    int bladeId    = triId / trianglesPerBlade;
    int triIdLocal = triId % trianglesPerBlade;
```


We derive the triangle IDs the same way as the vertex IDs, except that we divide by trianglesPerBlade instead of verticesPerBlade to obtain bladeId and triIdLocal.

```hlsl
    int offset = bladeId * verticesPerBlade + 2 * (triIdLocal / 2);

    uint3 triangleIndices = (triIdLocal & 1) == 0 ? uint3(0, 1, 2)
                                                  : uint3(3, 2, 1);

    tris[triId] = offset + triangleIndices;
}   //end for-loop
```
}   //end for-loop


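The offset indexes into the vertex buffer, so we multiply bladeId by verticesPerBlade and add the first vertex of the current quad, 2 * (triIdLocal / 2). Depending on whether triIdLocal is even or odd, we write either the first triangle (0, 1, 2) or the second triangle (3, 2, 1) of the quad.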

The following table shows how the gtid maps to primitives written.

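| GTID | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| triId | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| bladeId | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| triIdLocal | 0 | 1 | 2 | 3 | 4 | 5 | 0 | 1 | 2 | 3 | 4 | 5 |
| offset | 0 | 0 | 2 | 2 | 4 | 4 | 8 | 8 | 10 | 10 | 12 | 12 |
| Primitive | (0,1,2) | (3,2,1) | (2,3,4) | (5,4,3) | (4,5,6) | (7,6,5) | (8,9,10) | (11,10,9) | (10,11,12) | (13,12,11) | (12,13,14) | (15,14,13) |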

We can see that the first thread, gtid = 0, writes the first primitive in the index buffer at triId = 0, and in the second iteration, it writes triId = 128.

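For reference, this C++ sketch builds the complete index buffer for a patch by iterating over triId directly, which gives the same result as the two-iteration loop because each triId is written exactly once. The type alias uint3 and the function name are hypothetical; the checks reproduce entries of the table above and confirm that no index exceeds 255.

```cpp
#include <array>
#include <vector>

using uint3 = std::array<unsigned, 3>;

constexpr int verticesPerBlade  = 8;
constexpr int trianglesPerBlade = 6;

// Index buffer for bladeCount blades, using the same formulas as the shader.
std::vector<uint3> buildIndexBuffer(int bladeCount)
{
    std::vector<uint3> tris(bladeCount * trianglesPerBlade);
    for (int triId = 0; triId < (int)tris.size(); ++triId) {
        int bladeId     = triId / trianglesPerBlade;
        int triIdLocal  = triId % trianglesPerBlade;
        unsigned offset = bladeId * verticesPerBlade + 2 * (triIdLocal / 2);
        uint3 local = (triIdLocal & 1) == 0 ? uint3{0, 1, 2} : uint3{3, 2, 1};
        tris[triId] = {offset + local[0], offset + local[1], offset + local[2]};
    }
    return tris;
}
```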
## Level of detail

To improve the performance of our grass mesh shader, we reduce the amount of geometry rendered when a patch is further away from the camera. For this, we reduce the number of blades of grass in the distance. To compensate for this, we increase the width of the remaining grass blades for the whole patch.

### Fractional scaling

To hide the transition, we implemented fractional scaling for the number of grass blades. For this, we introduce two variables: the integer bladeCount and its real-valued counterpart bladeCountF.

```hlsl
...
}
...
```


All the grass blades with a bladeId smaller than bladeCount-1 are drawn without modification. The width of the last grass blade at bladeId = bladeCount-1 gets scaled with the fractional part of bladeCountF.

*[Animation: without fractional scaling vs. with fractional scaling]*

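The article does not show how bladeCountF is derived from the camera distance, so the C++ sketch below takes it as an input. It assumes bladeCount = ceil(bladeCountF) and scales the last blade by bladeCountF - (bladeCount - 1), which equals the fractional part of bladeCountF for non-integer values and 1 for integer values, so the last blade never collapses to zero width. Both function names are hypothetical.

```cpp
#include <cmath>

// Number of blades drawn for a real-valued blade count (assumed rule).
int lodBladeCount(float bladeCountF)
{
    return int(std::ceil(bladeCountF));
}

// Width scale of blade bladeId: 1 for all but the last drawn blade, the
// remainder bladeCountF - (bladeCount - 1) for the last one, 0 if not drawn.
float bladeWidthScale(int bladeId, float bladeCountF)
{
    int bladeCount = lodBladeCount(bladeCountF);
    if (bladeId < bladeCount - 1) return 1.0f;
    if (bladeId == bladeCount - 1) return bladeCountF - float(bladeCount - 1);
    return 0.0f;
}
```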
### Geometry compensation

To keep the visual appearance consistent at every distance from the camera, we widen each grass blade in a patch to compensate for the removed blades:

```hlsl
width *= maxBladeCount / bladeCountF;
```

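Why this compensation works can be seen from a short calculation, under the assumption that the per-blade width scale factors $s_b$ sum to bladeCountF $= N_f$ (all blades unscaled except the last, which is scaled by the remainder). With maxBladeCount $= N_{\max}$ and base width $w$, the summed width of a patch stays constant:

$$
\sum_{b=0}^{\text{bladeCount}-1} s_b \, w \, \frac{N_{\max}}{N_f} = N_f \, w \, \frac{N_{\max}}{N_f} = N_{\max}\, w
$$

For example, at $N_f = 16$, every remaining blade is twice as wide as at full detail.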

The animation shows the effect in a greatly exaggerated manner, but in a dense meadow, this effect is barely noticeable.

*[Animation: with exaggerated widening]*

## Wind animation

To simulate the effect of wind, we use a simple approach inspired by the GDC talk *Between Tech and Art: The Vegetation of Horizon Zero Dawn* by Gilbert Sanders of Guerrilla Games, which uses sine waves in the x- and y-directions. To enhance the effect, we add some Perlin noise to the time.

```hlsl
float3 GetWindOffset(float2 pos, float time){
    float posOnSineWave = cos(WindDirection) * pos.x - sin(WindDirection) * pos.y;

    float t     = time + posOnSineWave + 4 * PerlinNoise2D(0.1 * pos);
    float windx = 2 * sin(.5 * t);
    float windy = 1 * sin(1. * t);

    return ANIMATION_SCALE * float3(windx, windy, 0);
}
```

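A CPU-side C++ transcription of GetWindOffset can help when tuning parameters. PerlinNoise2D is not reproduced; the caller passes its value in as noise (an assumption of this sketch), and the function and parameter names are hypothetical. The phase term is the dot product of the position with (cos(WindDirection), -sin(WindDirection)), so the waves travel along that direction.

```cpp
#include <cmath>

struct float2 { float x, y; };
struct float3 { float x, y, z; };

// noise stands in for PerlinNoise2D(0.1 * pos), evaluated by the caller.
float3 getWindOffset(float2 pos, float time, float windDirection,
                     float animationScale, float noise)
{
    // Phase along the direction (cos(windDirection), -sin(windDirection)).
    float posOnSineWave = std::cos(windDirection) * pos.x - std::sin(windDirection) * pos.y;

    float t     = time + posOnSineWave + 4.0f * noise;
    float windx = 2.0f * std::sin(0.5f * t);  // slower, stronger sway in x
    float windy = 1.0f * std::sin(1.0f * t);  // faster, weaker sway in y
    return {animationScale * windx, animationScale * windy, 0.0f};
}
```

Since the article animates the grass by moving $P_2$, one would presumably add this offset to $P_2$ and then call MakePersistentLength; that combination is an assumption here and is not shown in the original code.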

*[Animation: wind effect on a single patch of grass]*

To improve the look of our grass when shading, we utilize two simple tricks: First, we fake a self-shadow effect by darkening the grass near its roots. Secondly, we apply Perlin noise to create dark patches in the meadow.

*[Figure: Perlin noise grass color]*

```hlsl
...
static const float3 grassGreen = float3(0.41, 0.44, 0.29);

// z is up, consistent with rootHeight = p0.z
float selfshadow     = clamp(pow((input.worldSpacePosition.z - input.rootHeight) / input.height, 1.5), 0, 1);
output.baseColor.rgb = pow(grassGreen, 2.2) * selfshadow;
output.baseColor.rgb *= 0.75 + 0.25 * PerlinNoise2D(0.25 * input.worldSpacePosition.xy);
...
```


Note that, as we use a deferred renderer for development, we leave the implementation of the actual shading to the reader. We darken each pixel depending on its height above the root of the blade of grass and apply Perlin noise depending on its world-space position.

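The self-shadow term and the color modulation can be checked numerically in C++. This sketch assumes z is up (consistent with rootHeight = p0.z) and that PerlinNoise2D returns values in [0, 1]; the function names are hypothetical. Unlike the shader, it clamps the height ratio before calling pow(), which avoids raising a negative base to a fractional power for fragments slightly below the root.

```cpp
#include <algorithm>
#include <cmath>

// Fake self-shadow: 0 at the root, rising to 1 at `height` above it.
float selfShadow(float worldZ, float rootHeight, float height)
{
    float h = std::min(std::max((worldZ - rootHeight) / height, 0.0f), 1.0f);
    return std::pow(h, 1.5f);
}

// One color channel: sRGB value linearized with a 2.2 gamma approximation,
// darkened by the self-shadow and by a noise tint in [0.75, 1].
float grassChannel(float srgb, float shadow, float noise)
{
    return std::pow(srgb, 2.2f) * shadow * (0.75f + 0.25f * noise);
}
```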
Furthermore, from experimentation we found that interpolating the grass normal with the up vector gave the blades a softer look.

```hlsl
output.normal.xyz = normalize(lerp(float3(0, 0, 1), normal, 0.25));
```

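In C++, with a minimal float3 stand-in and a hypothetical function name, the blend looks as follows. lerp(up, n, 0.25) keeps 75% of the up vector, so the result cannot degenerate to zero length, even for a normal pointing straight down.

```cpp
#include <cmath>

struct float3 { float x, y, z; };

// normalize(lerp(up, n, amount)) with up = (0, 0, 1).
float3 softenNormal(float3 n, float amount = 0.25f)
{
    float3 up{0.0f, 0.0f, 1.0f};
    float3 v{up.x + (n.x - up.x) * amount,
             up.y + (n.y - up.y) * amount,
             up.z + (n.z - up.z) * amount};
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}
```

For a horizontal normal such as (1, 0, 0), the result is roughly (0.32, 0, 0.95), tilting the shading normal strongly toward the sky.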

## Future work

Our grass system could be extended and improved in many different areas.

Seasonal effects
By applying a downward force to P_2 we could simulate the effects of seasons. Grass has more springiness in the warmer seasons. During the colder seasons, it is less stiff and lower to the ground.

Further geometry reduction
To further reduce the geometry in the distance we could implement a sparse grass shader. This shader would mimic the appearance of grass with much less geometry by using billboarding.

Other types of vegetation
The mesh shader could be modified to generate different kinds of vegetation. This could include different species of grass, flowers, shrubs and other clutter.

## Conclusion

In this blog post, we described how mesh shaders can be used to generate meadows. We explained how grass can be represented by Bézier curves and how to efficiently write the generated geometry to the index and vertex buffers. We provided ways to reduce the amount of geometry based on camera distance and illustrated how to animate the grass moving in the wind. We described a simple pixel shader implementation to improve the visuals of our grass. Finally, we provided some ideas on how to improve our implementation.

## Appendix

```hlsl
// Defined elsewhere (not shown in this listing): PI, rand(), combineSeed(),
// patchRadius, DynamicConst, GrassPatchArguments, bezier(), bezierDerivative(),
// MakePersistentLength().

// Returns +1 if the bit at bitPos in value is set, otherwise -1.
int tsign(in uint value, in int bitPos) {
    return (value & (1u << bitPos)) ? 1 : -1;
}

struct Vertex
{
    float4 clipSpacePosition   : SV_POSITION;
    float3 worldSpacePosition  : POSITION0;
    float3 worldSpaceNormal    : NORMAL0;
    float  rootHeight          : BLENDWEIGHT0;
    float  height              : BLENDWEIGHT1;
};

static const int GROUP_SIZE       = 128;
static const int GRASS_VERT_COUNT = 256;
static const int GRASS_PRIM_COUNT = 192;

[numthreads(GROUP_SIZE, 1, 1)]
[OutputTopology("triangle")]
void GrassMeshShader(                       // entry-point name is a placeholder
    uint gtid : SV_GroupThreadID,
    uint gid  : SV_GroupID,
    out indices uint3 tris[GRASS_PRIM_COUNT],
    out vertices Vertex verts[GRASS_VERT_COUNT]
)
{
    const GrassPatchArguments arguments = /* load the arguments of patch gid (not shown) */;

    static const int verticesPerBladeEdge = 4;
    static const int verticesPerBlade     = 2 * verticesPerBladeEdge;
    static const int trianglesPerBlade    = 6;
    static const int maxBladeCount        = 32;

    // Placeholder: the level-of-detail blade count is not part of this listing.
    const int bladeCount    = maxBladeCount;
    const int vertexCount   = bladeCount * verticesPerBlade;
    const int triangleCount = bladeCount * trianglesPerBlade;

    SetMeshOutputCounts(vertexCount, triangleCount);

    const float3 patchCenter = arguments.patchPosition;
    const float3 patchNormal = arguments.groundNormal;
    const float  spacing     = DynamicConst.grassSpacing;
    const int    seed        = combineSeed(asuint(int(patchCenter.x / spacing)), asuint(int(patchCenter.y / spacing)));

    // Input to the level-of-detail selection (not shown).
    float distanceToCamera = distance(arguments.patchPosition, DynamicConst.cullingCameraPosition.xyz);

    for (uint i = 0; i < 2; ++i) {
        int vertId = gtid + GROUP_SIZE * i;

        if (vertId >= vertexCount) break;

        int bladeId     = vertId / verticesPerBlade;
        int vertIdLocal = vertId % verticesPerBlade;

        const float height = arguments.height + float(rand(seed, bladeId, 20)) / 40.;

        // Position the grass in a circle around the patch center, angled using the patch normal.
        float3 tangent   = normalize(cross(float3(0, 1, 0), patchNormal));
        float3 bitangent = normalize(cross(patchNormal, tangent));

        // Not defined in this listing: bladeDirection (from the random angle beta)
        // and offsetRadius (random radius up to patchRadius); see
        // "Combining grass blades to a grass patch".
        float  offsetAngle = 2. * PI * rand(seed, bladeId);
        float3 bladeOffset = offsetRadius * (cos(offsetAngle) * tangent + sin(offsetAngle) * bitangent);

        float3 p0 = patchCenter + bladeOffset;
        float3 p1 = p0 + float3(0, 0, height);
        float3 p2 = p1 + bladeDirection * height * 0.3;

        MakePersistentLength(p0, p1, p2, height);

        float width = 0.03;
        // (level-of-detail width adjustment not shown in this listing)

        Vertex vertex;
        vertex.height     = arguments.height;
        vertex.rootHeight = p0.z;

        // Offset the control points sideways to build the two blade edges.
        float3 sideVec = normalize(float3(bladeDirection.y, -bladeDirection.x, 0));
        float3 offset  = tsign(vertIdLocal, 0) * width * sideVec;

        p0 += offset * 1.0;
        p1 += offset * 0.7;
        p2 += offset * 0.3;

        float t = (vertIdLocal / 2) / float(verticesPerBladeEdge - 1);
        vertex.worldSpacePosition = bezier(p0, p1, p2, t);
        vertex.worldSpaceNormal   = cross(sideVec, normalize(bezierDerivative(p0, p1, p2, t)));
        vertex.clipSpacePosition  = mul(DynamicConst.viewProjectionMatrix, float4(vertex.worldSpacePosition, 1));

        verts[vertId] = vertex;
    }

    for (uint i = 0; i < 2; ++i) {
        int triId = gtid + GROUP_SIZE * i;

        if (triId >= triangleCount) break;

        int bladeId    = triId / trianglesPerBlade;
        int triIdLocal = triId % trianglesPerBlade;

        int offset = bladeId * verticesPerBlade + 2 * (triIdLocal / 2);

        uint3 triangleIndices = (triIdLocal & 1) == 0 ? uint3(0, 1, 2)
                                                      : uint3(3, 2, 1);

        tris[triId] = offset + triangleIndices;
    }
}
```


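```hlsl
struct PixelShaderOutput {
    float3 position  : SV_Target0;
    float4 baseColor : SV_Target1;
    float3 normal    : SV_Target2;
};

// Entry-point name is a placeholder; the original listing omits the signature.
PixelShaderOutput GrassPixelShader(Vertex input, bool isFrontFace : SV_IsFrontFace)
{
    PixelShaderOutput output;
    output.position = input.worldSpacePosition;

    // Fake self-shadow: darker near the root (z is up).
    float selfshadow = clamp(pow((input.worldSpacePosition.z - input.rootHeight) / input.height, 1.5), 0, 1);
    output.baseColor.rgb = pow(float3(0.41, 0.44, 0.29), 2.2) * selfshadow;
    output.baseColor.rgb *= 0.75 + 0.25 * PerlinNoise2D(0.25 * input.worldSpacePosition.xy);
    output.baseColor.a = 1;

    float3 normal = normalize(input.worldSpaceNormal);

    // Blades are two-sided: flip the normal for back faces.
    if (!isFrontFace) {
        normal = -normal;
    }

    // Blend toward the up vector for a softer look.
    output.normal.xyz = normalize(lerp(float3(0, 0, 1), normal, 0.25));

    return output;
}
```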


### Make persistent length

```hlsl
// Rescales the control polygon around v0 so that the approximate arc length
// of the quadratic Bezier curve (v0, v1, v2) equals height.
void MakePersistentLength(in float3 v0, inout float3 v1, inout float3 v2, in float height)
{
    //Persistent length
    float3 v01 = v1 - v0;
    float3 v12 = v2 - v1;
    float lv01 = length(v01);
    float lv12 = length(v12);

    float L1 = lv01 + lv12;              // length of the control polygon
    float L0 = length(v2 - v0);          // chord length
    float L  = (2.0f * L0 + L1) / 3.0f;  //http://steve.hollasch.net/cgindex/curves/cbezarclen.html

    float ldiff = height / L;            // scale factor that restores the target length
    v01 = v01 * ldiff;
    v12 = v12 * ldiff;
    v1 = v0 + v01;
    v2 = v1 + v12;
}
```


## Disclaimers

Links to third-party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites, and no endorsement is implied. GD-98

Microsoft is a registered trademark of Microsoft Corporation in the US and/or other countries. Other product names used in this publication are for identification purposes only and may be trademarks of their respective owners.

DirectX is a registered trademark of Microsoft Corporation in the US and/or other countries.
