Description
So, I noticed that when I switch a local game on the lowest graphics preset from prediction off to prediction on (cg_nopredict off on the client or g_synchronousClients 0 on the server), performance drops from 1000fps to 500fps.
In an online game on a public server with the ultra graphics preset, disabling prediction gave me 200fps more. I actually reached 500fps in a public game.
It turns out that when client-side prediction is enabled, most of the CPU time is spent in CG_PredictPlayerState, and in that function, most of the time is spent in trap_GetUserCmd.
There is this code in CG_PredictPlayerState:
for ( cmdNum = current - CMD_BACKUP + 1; cmdNum <= current; cmdNum++ )
{
	// get the command
	trap_GetUserCmd( cmdNum, &cg_pmove.cmd );
This code is calling trap_GetUserCmd 63 times per frame… 😱️
In the engine, CMD_BACKUP is 64, and this value was already 64 when the Quake 3 source code was released. A comment that was already there at the time also said:
allow a lot of command backups for very fast systems
So, although the code always uses the maximum, the comment suggests that this maximum is not meant for everyone.
My first attempt to save performance without entirely disabling prediction was to add a cvar that limits how many command backups are fetched. And it works. We may still use such a cvar in the graphics presets to do less prediction on the lower ones. The good thing about such a patch is that it doesn't break engine compatibility.
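A minimal sketch of how such a limit could be computed, as self-contained C++. The parameter maxReplayedCmds stands in for the value of the new cvar and FirstReplayedCmd is a hypothetical helper; neither name comes from the actual patch or the existing code.

```cpp
#include <algorithm>

// Same value as in the engine.
constexpr int CMD_BACKUP = 64;

// Hypothetical helper: returns the first command number to fetch when at
// most maxReplayedCmds commands should be replayed.
// maxReplayedCmds stands in for the value of the new cvar (assumed name).
int FirstReplayedCmd( int current, int maxReplayedCmds )
{
	// Never fetch fewer than one command or more than the engine keeps.
	int count = std::max( 1, std::min( maxReplayedCmds, CMD_BACKUP ) );

	return current - count + 1;
}
```

The loop in CG_PredictPlayerState would then start at FirstReplayedCmd( current, value ) instead of current - CMD_BACKUP + 1, so a lower cvar value means fewer trap_GetUserCmd calls per frame.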
But, but, but. I assume doing IPC is slow, very slow, compared to just running code directly in CPU cache.
On the engine side, the function behind trap_GetUserCmd just does this:
bool CL_GetUserCmd( int cmdNumber, usercmd_t *ucmd )
{
	// cmds[cmdNumber] is the last properly generated command

	// can't return anything that we haven't created yet
	if ( cmdNumber > cl.cmdNumber )
	{
		Sys::Drop( "CL_GetUserCmd: %i >= %i", cmdNumber, cl.cmdNumber );
	}

	// the usercmd has been overwritten in the wrapping
	// buffer because it is too far out of date
	if ( cmdNumber <= cl.cmdNumber - CMD_BACKUP )
	{
		return false;
	}

	*ucmd = cl.cmds[ cmdNumber & CMD_MASK ];

	return true;
}
I see nothing in that function that can eat 200 or 500 fps. But then, IPC is always slower than code running from the CPU cache.
So, I thought… What if we add a trap_GetUserCmds function that would fetch the commands in one go? We would query the whole array of 64 commands at once (or only the amount we want, if using a cvar to customize this), in a single IPC call, and then iterate over all the commands?
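To make the idea concrete, here is a rough sketch of what the engine-side half of a batched fetch might look like. It is self-contained C++ with reduced stand-ins for usercmd_t and the client state. The signature of CL_GetUserCmds is an assumption, and the IPC serialization of the result is not shown.

```cpp
#include <vector>

// Stand-ins for the engine definitions, reduced for illustration.
constexpr int CMD_BACKUP = 64;
constexpr int CMD_MASK = CMD_BACKUP - 1;

struct usercmd_t
{
	int serverTime; // the real struct has more fields
};

struct clientActive_t
{
	usercmd_t cmds[ CMD_BACKUP ]; // ring buffer of generated commands
	int cmdNumber;                // number of the last generated command
};

// Hypothetical batched variant of CL_GetUserCmd (assumed signature):
// copies up to `count` commands ending at cl.cmdNumber, oldest first.
// Handling of the first frames, before the buffer is filled, is omitted.
std::vector<usercmd_t> CL_GetUserCmds( const clientActive_t &cl, int count )
{
	std::vector<usercmd_t> out;

	if ( count <= 0 )
	{
		return out;
	}

	// Older commands have been overwritten in the wrapping buffer.
	if ( count > CMD_BACKUP )
	{
		count = CMD_BACKUP;
	}

	out.reserve( count );

	for ( int cmdNum = cl.cmdNumber - count + 1; cmdNum <= cl.cmdNumber; cmdNum++ )
	{
		out.push_back( cl.cmds[ cmdNum & CMD_MASK ] );
	}

	return out;
}
```

With something like this, the game code would make one call per frame and then iterate over the returned commands locally, instead of crossing the IPC boundary once per command.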
I tried to implement a CL_GetUserCmds function that does just that, but my code failed to build because it was missing some dedicated Write function.
What do you think about it? To implement the "single trap" I would need some help…