UE4 Graphics Profiling: Pipeline and Bottlenecks

July 31, 2019 posted by

Hi everyone! Oskar Świerad here. Welcome to part 2 of the tutorial about the graphics pipeline in Unreal Engine 4. In this video I’ll explain the general graphics pipeline of modern GPUs and what you should learn to be able to find bottlenecks in your scenes and places you can optimize.

Before we begin, we should be able to define the problem. Is our frame drop happening in a certain area? In some specific game scenario? For example, some barrels exploding or some enemies entering the area? Or maybe on hardware from a certain vendor? Then, as we proceed, we should isolate what we are testing. To come up with a meaningful test we have to be able to make assumptions. That’s why it’s very important that you learn about the rendering pipeline, both in general on the GPU and in Unreal itself, because each engine has its own path and its own potential problems.

I encourage you to approach it like a scientist. Make observations about what’s happening in your scene and which areas cause the trouble, then formulate some hypotheses, for example: “I think that translucency is the biggest problem”. Then you should be able to develop some testable predictions. Like: put your camera in a fixed place, insert a heavy, translucent object, measure performance, delete the object, measure performance again. With this approach you’ll be able to gather data to test your predictions.

Now let me proceed to
the anatomy of a frame. To display a frame, the calculations on both the CPU and the GPU have to be finished. So all the game code has to be computed, all the pixels have to be shaded and so on, and only then can we dispatch a frame to the screen. That’s why the cost of a frame, the time it takes to show a frame, is the bigger of the two numbers: the CPU time or the GPU time. If the GPU finishes first, it waits for the CPU to dispatch new commands for it. And if the GPU is taking too long, the CPU has to wait for it. If you press [~] and enter: STAT UNIT you’ll be shown these values you can see here.

On the GPU we have the concepts of
parallelism and the pipeline. If something is parallel, it means that multiple cores, hundreds or thousands of cores, work on the same task simultaneously. So obviously, cores are best utilized when they are working on the same task. For example, the same big triangle that is shaded with a single material. This is important because all cores assigned to a specific task on the GPU have to be in exactly the same place, for example at exactly the same moment in the shader.

Now, the pipeline is like an assembly line. The frame is divided into steps: for example, the vertex shader passes its data on to the pixel shader, where the pixels are computed, and only then is the result passed on to the screen. This means that the pixel shader can’t continue before it’s fed with data from the vertex shader.

Now, the simplified pipeline in OpenGL and DirectX 11 looks like this: we start with providing vertices to the vertex shader, which does its transformations, then comes the tessellation phase, then the geometry shader, and last comes the pixel shader, which is the only one we have direct access to as artists in Unreal Engine and which is responsible for materials, lighting and postprocess.

An important thing that can
really affect your framerate are the draw calls. Draw calls are the commands sent by the CPU to control the GPU. For example, commands like: “change my mesh” or: “change my material”. If you want to draw a set of triangles with a different material, a different shader, you have to dispatch a command from the CPU first, which then goes through the driver, only then is translated and only then is submitted to the GPU. So having a lot of materials, a lot of different, separate objects, is a lot of work for the CPU. You can check your number of draw calls by pressing [~] (tilde) and entering: STAT SceneRendering.

Now let me explain pixel-bound
sources of trouble, because pixels are most probably the slowest part in your pipeline. The bigger the resolution, the more pixels have to be shaded! Unsurprisingly, I think… So heavy lighting, which is done per pixel, heavy shaders and also post process effects depend on the resolution of the screen. And given the current Full HD and 4K resolutions, this can really mean a lot.

So how to check if we’re pixel-bound? While running our game, we can press [~] and enter: r.ScreenPercentage 25 or r.SetRes 480x270, for example. If your framerate improves a lot, it means you’re pixel-bound. Not, for example, vertex-bound or memory-bound. The pixels are the problem.

The biggest common problem with shading pixels is translucency. Opaque geometry is very cheap because only the mesh closest to the camera is being rendered. But when you have a translucent object, you have to draw everything: the translucent object, the next one behind it, sometimes even the next one, and only then the rest of the scene. So it’s a big cost.
And of course translucent particles can be an unexpected but very heavy source of computation. So you can use level of detail not only on meshes but also on particle emitters. You can have very detailed, translucent particles when you’re nearby, but as you go further away from the emitter, you can replace them with some cruder particle systems with fewer particles, to minimize the impact on performance.

Now, an unexpected source
of performance issues that I think is not so well known among artists, while it should be, is the so-called “quad overdraw”. This is the reason why small polygons waste GPU time. In this case, a “quad” means a block of 4 pixels (2 by 2). Most of the operations on the GPU when it comes to pixel shading are done on full quads or even bigger tiles, like 8×8. Not on single individual pixels. It’s easier for the GPU, or sometimes necessary, to perform operations on bigger tiles and only then discard unnecessary pixels. So the discarded pixels are basically wasted. As you can imagine, the smaller or thinner the triangle, the bigger the problem.

So the triangle count itself is actually not a problem very often. Quite contrary to popular opinion, the triangle count by itself doesn’t matter. It’s much more important to avoid small triangles and long, thin triangles. Use level of detail to control this. Try to keep polygons big and even in screen space. So look from your camera, from your game, not just in the 3D package. And empty pixels in foliage are extra waste, because you have translucency or some overdraw. So clip like crazy. Spend more polygons, but keep closer to the actual shape of the plant to be drawn.

So back to the triangle count: is it really not a problem? Mostly it isn’t, but keep in mind that it can explode after tessellation. When you don’t control tessellation too well or assign values that are too big, you can end up with a lot of quad overdraw. And the number of vertices matters for shadow casting, because a light that has some meshes in its way needs to, sort of, make a copy of the mesh to draw a shadow map. So the more vertices, the bigger the problem.

Now, this is a nice thing to keep in mind, but don’t think about it too much, it’s just good to know: hard edges (or “smoothing groups”) or splits in your UVs add to the actual vertex count, because the information has to be stored several times for a single vertex. But it only affects memory usage and disk space. Not the actual rendering performance.

So as I said before,
dynamic shadows can be a vertex-bound source of trouble. The heavier the meshes the light has in its way, the bigger the performance cost. So to avoid it, it’s best to disable dynamic shadows wherever you can. Probably, for many of your lights you don’t need dynamic shadows, or you can change them to static. It’s also a very good practice to keep the attenuation radius as small as you can. This will limit not only the cost of shadowing but also the cost of rendering the light in pixel shaders.

Now, what issues
can we encounter when dealing with memory? The first thing is that too many texture samples in a material use up the bandwidth. Bandwidth is the amount of data per second that can be transferred between memory and the actual cores that perform the calculations. So compression helps a lot. Compression is supported directly on the GPU, so don’t disable it in your textures unless you need to.

And the so-called “texture packing” helps too. It helps you to save on the number of texture samplers and to optimize when it comes to memory. It means that you take, for example, 3 grayscale textures like roughness, metalness and ambient occlusion and pack them into specific channels of a single RGB texture. Each one occupies only a single channel, so you can store them as a single texture.

The GPU tries to keep the last accessed area of a texture in the cache. The cache is a very fast memory that is located very close to the actual cores that perform the computations. So instead of going all the way to VRAM, the GPU can fetch the data it needs from the nearby cache. If you want the GPU to use the cache, which is very small, keep the UVs continuous. I know that some clever shader tricks can, for example, jump around the UVs for some noisy effects or something like that, but use them sparingly. Keep it in mind, because this is not what the GPU expects and it can waste the cache. But normally, with standard art and materials, you should be fine by default.

There is also the
streaming cache. It’s not related to the cache on the GPU. It’s a cache in normal RAM that is maintained by Unreal. This is like a storage of textures that can be loaded at once. So too many textures in a level, and remember that lightmaps are textures too, can fill the cache. Unreal will try to load the lowest resolution that’s needed. For example, some faraway object doesn’t need the biggest resolution to be loaded. But after the cache is full, you can encounter the same behavior, low-resolution textures, even on close objects, because Unreal can’t load any more textures. So press [~] and enter: STAT streaming when you see such behavior in your scenes.

Currently, Unreal has two
ways of rendering lighting. One is called “deferred rendering” and it is the default method, and the second is called “forward rendering”. Forward rendering is an older technique, but it was recently optimized, and this new approach to forward rendering is called “Forward+” [errata: in UE it’s “Cluster Forward Shading”].

“Deferred rendering” means that all the data is collected first, per pixel, for example normals and normal maps, roughness, base color, and only then, in a kind of post process, the light is rendered using this data. So only the actual pixels on the screen are being lit. It also applies to reflections. But there’s a significant limitation: if we only draw the pixel closest to the camera into the final buffer, we have no way to render transparent objects. So this is when forward rendering kicks in. In the standard, default settings you actually have both renderers working at once.

You can switch to using only forward rendering. It can be helpful, because it can lower the amount of memory you need to store all the buffers and the final result on the screen. And it’s useful when you have a statically lit scene. For example, for some VR projects you probably can’t afford too many dynamic lights anyway, so it can be helpful to switch to forward. While if you have many light sources, it can be cool to stay with the default, deferred rendering. It doesn’t matter for shadows. It only matters for direct lighting.

Now let me show you some
optimization view modes. Okay. So this is our test scene again. In the upper left corner, click “Lit”, go to “Optimization Viewmodes” and here we have them. These are our basic tools to check some performance issues before we proceed with the actual profiling.

So for example, this one is Light Complexity. Light Complexity shows us the radius of each particular light. And we can see them overlapping. Let me move this light away… As you can see, this is the area… …that is affected by the light. Okay. Now it should be better. So this is this light’s area of influence. If I copy the light… …you can see how the cost increases in the place where they overlap. That’s because the pixels have to be shaded with two sources of lighting. Obviously, more overlapping lights mean more trouble with performance. Here is a particularly bad area. So it’s a good place to change some lights to static or just delete them.

Another interesting mode when it comes to the complexity of rendering pixels is Shader Complexity. Now you can see that this area is very, very bad when it comes to rendering of shaders. That’s because the cost of shading translucent objects includes not only the closest one but also all the objects behind it. So this is a sum of the cost of shading all the objects from the camera up to the end of visibility. That’s why if we have only a few translucent things overlapping, it’s not that bad. But as we go further with the number of additional planes being blended together, we start to have a problem. As you can see here, when I go back to Shader Complexity, it’s… extremely bad.

Now, in “Optimization Viewmodes”
we also have the Quad Overdraw. This is the problem that I mentioned earlier. As you can see, it shows the general overdraw, not just the quad overdraw problem. So when we have multiple translucent things behind each other, it goes all the way to white. But for example, our level of detail settings for the asteroids seem to be set a bit too high, because we have very small polygons, frequently overlapping each other. This building is very well optimized, because we have almost no overlapping here.

Now let me disable the grass. Come on, grass… Disappeared. Thank you. Currently, the tessellation of the landscape is quite fine. But if you ever have some problem with that, there is a thing you can do. Go here to… “Landscape” > “Manage Mode” > the “Selection” tool for components. Select the components that you think have too many triangles and in the “Details” tab, go to “LOD bias” and set, for example, 1 or 2 or sometimes even higher. And as you can see, the number of vertices used decreased. This is a very good practice to lower the number of triangles in your landscape, except for the areas that really matter.

When it comes to memory, an interesting
optimization view mode is Lightmap Density. This mode shows the size of the actual pixels of lightmaps that will be calculated and stored. So the landscape has quite a low resolution and it’s OK, because the entire landscape is big. And remember that lightmaps are stored for each object separately. Each [occurrence] of the same mesh will get its unique lightmap anyway, because it stores lighting, which is different for all the places in the scene. So if you’re not careful, this can really explode into huge numbers. And the lightmaps don’t have to be too detailed. They can be quite blurry most of the time. It’s better to change your idea for the lighting a bit, or to use some smoother shadows, than for example to increase lightmap density. Here each module has 32 pixels, I think… no, 64. As I change it, you can see how the density of pixels increases and the color changes through green all the way to red.

Another way to check how well you’re
doing with the amount of textures used is to go to “Window” > “Statistics” and this window pops up, where you can see all the textures, and you can sort them by the amount of memory used. For example, this texture is 2k x 2k [2048 px] but the current usage, the size that was really loaded into the streaming pool, is just 64 x 64. Probably it’s used only by some small object. I don’t know which, so I can check it here. Okay, some asteroid is using this texture in a very low resolution.

Now, if I want to limit the resolution anyway, I can click on the texture name. It shows up in the content browser. And open it. Then in the “Compression” tab, I can expand this setting and say, for example, “Maximum Texture Size”: 256. As you can see, the resolution lowered, and this is a very good method to control your resolution without going to the actual texture editor. You can change it anytime. “Max In-Game” is shown as 512.

So, thank you for watching this part about the GPU pipeline and optimization view modes. Please try this stuff in your projects. If you get stuck, ask in the comments. If you know the answer, help each other. In part 3, I will show you the tool
that is called GPU Visualizer. You run it by pressing Ctrl+Shift+, (comma) and with the knowledge you acquired in this video you’ll be able to examine your scene pass by pass. For example here, the pass of light rendering is the biggest cost. We’ll dive into this in the next part. Thank you again for watching and – see ya!
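
As a recap of the pixel-bound check described earlier in the transcript, the short Python sketch below works out why a screen percentage of 25 and a resolution of 480x270 are the same test on a Full HD screen. It assumes the percentage scales each axis of the render target, which is my understanding of how r.ScreenPercentage behaves; the function names `scaled_resolution` and `pixel_fraction` are hypothetical helpers, not engine API.

```python
def scaled_resolution(width, height, screen_percentage):
    """Resolution actually rendered when each axis is scaled by the percentage."""
    scale = screen_percentage / 100.0
    return round(width * scale), round(height * scale)


def pixel_fraction(screen_percentage):
    """Fraction of the original pixel count that still has to be shaded."""
    return (screen_percentage / 100.0) ** 2


# Full HD at 25% matches the 480x270 example from the transcript.
print(scaled_resolution(1920, 1080, 25))  # (480, 270)
print(pixel_fraction(25))                 # 0.0625, i.e. 1/16 of the pixels
```

With only 1/16 of the pixels left to shade, a frame time that barely changes suggests the bottleneck is somewhere other than pixel shading, while a large improvement points to a pixel-bound scene.
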


34 Replies to “UE4 Graphics Profiling: Pipeline and Bottlenecks”

  1. Captain Black says:

    How to contact with you?

  2. iamisandisnt says:

    I haven't even watched this yet and I love it already 😀

  3. German Viktorovich says:

    Very useful, as usual, thanks

  4. Ronak Singh says:

    this is awesome thanks

  5. SellusionStar says:

    thank you mate!!

  6. Bruno Afonseca says:

    Such good information! Thanks a lot!

  7. Johan Ronner says:

    The tesselation landspace trick helped me out alot! thanks, Our deadline is in less than a week now is there any chance you'll be done with part 3 by then for a final check regarding the GPU visualizer to understand some of the processes before we hand it in? (:

  8. Michał Miłkowski says:

    Hey, do you plan to make closed captions?

  9. Joseph Govi says:

    Love your videos, hoping to see more

  10. N nOni says:

    I can truly say this series is one of the best tutorial about optimization , i found at least 2 point i need to immediate check on my project wihc is tesselation landspace and texture compress size .

  11. N nOni says:

    Can you make a small tutorial on what is Object pooling – Object caching and can it be done only in BP without C++ or plugin?

  12. Serge Lyukshin says:

    Yeah man. That's it.

  13. Carlos A. says:

    I know your channel is mainly focused on art, but i would really love and appreciate a series like this (awesome by the way) about UE4's Game Thread! Profiling both graphics and gameplay would help me tons! Thanks for this.

  14. Hansi HansHans says:

    first of all a big compliment how you gather and bring these Tech-Art infos to the point. keep this channel like it is, spacially for people who are seriously interested in the UE4 pipeline.
    Could you give some hints about how to avoid lightbleeds? giving objects just a bigger lightmap resolution is probably not the best solution i guess.
    A big THX for this channel and plz keep going 😉

  15. Zloty says:

    Hey, Could you help me with this problem?

  16. Ninjin says:

    I could listen to you all day <3. Don't be shy to go overboard, I don't mind if the tutorial is 1 hour and you explain every detail 🙂 *Also, the Links and Sources look like high quality stuff, any pro tips for googling?

  17. esparafucio says:

    Awesome video! In regards to Forward Rendering, Unreal seems to use "Clustered Forward Rendering" based on Ola Olsson's work, which compute lights in a frustrum-space grid, instead of a screen space grid, like Forward+.

  18. IceeyIceey says:

    Many thanks for this series. Today I checked the lightmaps in my game and found a lot of unnecessarily high lightmap resolutions. The whole series will come in handy once more before the game is released!

  19. Silence Moon says:

    Love it!Is really Help me too much 🙂

  20. Priareos says:

    Hey man, great video! I'm curious about what you said concerning UVs and the Cache (min 11:20). Do you mean UV manipulations like Flowmaps/UV Distortion, UV Tiling changes etc. in the material should be avoided? I do this a lot in my particles and decals, so maybe you could give me some more details on how it will affect performance?

  21. Jacob Ben-David says:

    This content is great, just great. Thank you so much!

  22. Adityasingh Sisodiya says:

    Made my day. Greetings Mate!!! Have a nice cup of coffee

  23. Pepito Grillo says:

    Thank you so much. You are helping me a lot to improve my work.

  24. lesha1955 says:

    Can you please tell me, texture streaming pool locate on RAM or VRAM?

  25. Project Rat says:

    This is some really good stuff. Thanks!

  26. charlie brownau says:

    Do you have any advice or tutorials to support
    OpenGL 4.5 + OpenAL + SDL2 in UE4 instead of DirectX11

  27. Ni says:

    Great tutorial,
    can I ask you what is the difference between Mesh draw calls and Draw primitive calls? I found Draw primitive calls on "stat rhi" and it seems few times bigger. What does it even mean?

  28. dipeyes says:

    very useful!

  29. Amir Saeed Isazadeh says:

    It was very useful thank you 🙂

  30. Navoda Maduwantha says:

    Didn't get

  31. omri1324 says:

    I am so glad I found this! great resource!!

  32. Polygon Academy says:

    dude this is awesome, I'm an environment artist and have a basic technical knowledge base but this really helps when optimizing a scene. thanks for the info 🙂

  33. Kailash Diengdoh says:

    very helpful…thank you

  34. Rogel Salaysay says:

    for every technical surely I'll subscribe!
