
Optimizing trap calls and giving extra 200 fps to anyone #846

@illwieckz


So, I noticed that in a local game with the lowest graphics preset, switching prediction from off to on (cg_nopredict off on the client, or g_synchronousClients 0 on the server) makes performance drop from 1000 fps to 500 fps.

In an online game on a public server with the ultra graphics preset, disabling prediction gave me 200 fps more. I actually reached 500 fps in a public game.

It turns out that when client-side prediction is enabled, most of the CPU time is spent in CG_PredictPlayerState, and within that function, most of the time is spent in trap_GetUserCmd.

CG_PredictPlayerState contains this loop:

```cpp
for ( cmdNum = current - CMD_BACKUP + 1; cmdNum <= current; cmdNum++ )
{
	// get the command
	trap_GetUserCmd( cmdNum, &cg_pmove.cmd );

	// ... rest of the prediction step for this command ...
}
```

This code calls trap_GetUserCmd once for every cmdNum from current - CMD_BACKUP + 1 through current, that is CMD_BACKUP = 64 times per frame… 😱️
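The loop bounds can be checked with a small counting sketch. It assumes, as in Quake III, that CMD_MASK is CMD_BACKUP - 1 and that the slot read for each cmdNum is cmdNum & CMD_MASK (the engine function quoted further down indexes cl.cmds that way). CountPredictionFetches is a hypothetical helper written for this illustration, not engine or cgame code.

```cpp
#include <cassert>

// Assumed values: CMD_BACKUP is 64 as in the engine, and CMD_MASK is
// CMD_BACKUP - 1 as in Quake III (which requires a power of two).
constexpr int CMD_BACKUP = 64;
constexpr int CMD_MASK = CMD_BACKUP - 1;
static_assert( ( CMD_BACKUP & CMD_MASK ) == 0, "CMD_BACKUP must be a power of two" );

// Hypothetical helper: counts how many trap_GetUserCmd calls the
// prediction loop makes for a given latest command number, and reports
// whether the ring-buffer slots it reads (cmdNum & CMD_MASK) are distinct.
int CountPredictionFetches( int current, bool *allSlotsDistinct )
{
	assert( current >= CMD_BACKUP - 1 ); // keep cmdNum non-negative
	bool seen[ CMD_BACKUP ] = {};
	int calls = 0;
	*allSlotsDistinct = true;

	for ( int cmdNum = current - CMD_BACKUP + 1; cmdNum <= current; cmdNum++ )
	{
		int slot = cmdNum & CMD_MASK;
		if ( seen[ slot ] )
		{
			*allSlotsDistinct = false;
		}
		seen[ slot ] = true;
		calls++; // one trap_GetUserCmd call per iteration
	}

	return calls;
}
```

For example, CountPredictionFetches( 1000, &distinct ) returns 64 and sets distinct to true: every slot of the 64-entry ring buffer is fetched once per frame, each through its own trap call.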

In the engine, CMD_BACKUP is 64, and it already was 64 when the Quake III source code was released. A comment that was already there at the time also says:

allow a lot of command backups for very fast systems

So, although the code always walks the whole backup, the comment suggests that such a large maximum is meant for very fast systems, not for everyone.

My first attempt to save performance without disabling prediction entirely was to add a cvar limiting how many backed-up commands are fetched. And it works. We may still use such a cvar in the graphics presets to do less prediction on the lower ones. The good thing about this patch is that it doesn't break engine compatibility.
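A minimal sketch of the cvar idea, assuming the cvar simply caps how many of the newest commands the loop fetches. The cvar name mentioned in the comments and the helpers ClampPredictionWindow, FirstCmdToFetch and FetchCount are hypothetical, not the actual patch. In the Quake III version of this loop, commands whose serverTime is not newer than the predicted state's commandTime are skipped anyway, so presumably the window only has to cover the commands the server hasn't acknowledged yet, which depends on ping and frame rate.

```cpp
#include <algorithm>
#include <cassert>

constexpr int CMD_BACKUP = 64; // engine value

// Hypothetical: the value of a cvar such as "cg_predictionBackups"
// (name invented for this sketch), clamped to [1, CMD_BACKUP].
int ClampPredictionWindow( int requested )
{
	return std::max( 1, std::min( requested, CMD_BACKUP ) );
}

// First command number to fetch: only the newest `requested` commands
// are fetched instead of all CMD_BACKUP of them.
int FirstCmdToFetch( int current, int requested )
{
	return current - ClampPredictionWindow( requested ) + 1;
}

// Number of trap_GetUserCmd calls the loop would make with this window.
int FetchCount( int current, int requested )
{
	assert( current >= 0 );
	int calls = 0;
	for ( int cmdNum = FirstCmdToFetch( current, requested ); cmdNum <= current; cmdNum++ )
	{
		calls++; // stands in for trap_GetUserCmd( cmdNum, &cg_pmove.cmd )
	}
	return calls;
}
```

With a window of 16, FetchCount( 1000, 16 ) returns 16, a quarter of the trap calls; values above CMD_BACKUP fall back to the full 64, so the engine side is untouched.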

But, but, but. I assume that doing IPC is slow, very slow, compared to running code directly from the CPU cache.

On the engine side, the function behind trap_GetUserCmd only does this:

```cpp
bool CL_GetUserCmd( int cmdNumber, usercmd_t *ucmd )
{
	// cmds[cmdNumber] is the last properly generated command

	// can't return anything that we haven't created yet
	if ( cmdNumber > cl.cmdNumber )
	{
		Sys::Drop( "CL_GetUserCmd: %i >= %i", cmdNumber, cl.cmdNumber );
	}

	// the usercmd has been overwritten in the wrapping
	// buffer because it is too far out of date
	if ( cmdNumber <= cl.cmdNumber - CMD_BACKUP )
	{
		return false;
	}

	*ucmd = cl.cmds[ cmdNumber & CMD_MASK ];

	return true;
}
```

I see nothing in that function that could eat 200 or 500 fps. But, well, an IPC call is always slower than code running straight from the CPU cache.
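A back-of-the-envelope estimate from the local-game numbers: the frame-time difference between 500 fps and 1000 fps, spread over the 64 trap_GetUserCmd calls per frame (CMD_BACKUP iterations of the loop), gives an upper bound on the cost of each call, assuming all of the extra time is spent in them (the profile says most of it is, not all):

$$
\Delta t = \frac{1}{500\ \mathrm{s^{-1}}} - \frac{1}{1000\ \mathrm{s^{-1}}} = 2\ \mathrm{ms} - 1\ \mathrm{ms} = 1\ \mathrm{ms\ per\ frame}
$$

$$
\frac{\Delta t}{64\ \text{calls}} = \frac{1000\ \mu\mathrm{s}}{64} \approx 15.6\ \mu\mathrm{s\ per\ call}
$$

That is far more than copying one usercmd_t should take, which would be consistent with the per-call round-trip overhead, rather than the function body, dominating.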

So, I thought… What if we added a trap_GetUserCmds function that fetches the commands in one go? We would query the whole array of 64 commands (or only as many as we want, if a cvar customizes this) in a single IPC call, and then iterate over all the commands locally.
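A minimal sketch of what the batched call could look like on the engine side, simulated in a single process so it can be run and tested. CL_GetUserCmds, the ClientState struct and the ipcRoundTrips counter are hypothetical stand-ins for the real client state and IPC layer, usercmd_t is trimmed to one field, and CMD_MASK = CMD_BACKUP - 1 is assumed as in Quake III. It keeps the same two checks as CL_GetUserCmd: asking for a command that doesn't exist yet is an error, and commands already overwritten in the ring buffer are left out.

```cpp
#include <cassert>
#include <vector>

constexpr int CMD_BACKUP = 64;
constexpr int CMD_MASK = CMD_BACKUP - 1;

// Trimmed stand-in for the engine's usercmd_t; the real one has more fields.
struct usercmd_t
{
	int serverTime;
};

// Stand-in for the engine's client state (cl.cmds / cl.cmdNumber).
struct ClientState
{
	usercmd_t cmds[ CMD_BACKUP ];
	int cmdNumber;
};

// Counts simulated engine <-> cgame round trips.
static int ipcRoundTrips = 0;

// Hypothetical batched engine call: returns the commands
// [firstCmd, cl.cmdNumber], oldest first, in a single round trip.
std::vector<usercmd_t> CL_GetUserCmds( const ClientState &cl, int firstCmd )
{
	ipcRoundTrips++;

	// can't return anything that we haven't created yet
	assert( firstCmd <= cl.cmdNumber );

	// skip commands already overwritten in the wrapping buffer
	int oldest = cl.cmdNumber - CMD_BACKUP + 1;
	if ( firstCmd < oldest )
	{
		firstCmd = oldest;
	}

	std::vector<usercmd_t> out;
	out.reserve( cl.cmdNumber - firstCmd + 1 );
	for ( int cmdNum = firstCmd; cmdNum <= cl.cmdNumber; cmdNum++ )
	{
		out.push_back( cl.cmds[ cmdNum & CMD_MASK ] );
	}
	return out;
}
```

On the cgame side, the prediction loop would then make one call and iterate over the returned commands in order instead of calling trap_GetUserCmd for each cmdNum. With a full buffer, fetching all 64 commands costs one round trip instead of 64, and a prediction-window cvar would just change firstCmd.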

I tried to implement a CL_GetUserCmds function that does just that, but my code failed to build because it was missing a dedicated Write function.
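Which Write function the build was missing isn't stated, so this is only a generic sketch of the idea behind one: serializing an array of commands as a count followed by the raw bytes of each trivially copyable usercmd_t, and reading it back. ByteWriter, ByteReader, WriteUserCmds and ReadUserCmds are hypothetical names, not the engine's serialization API; usercmd_t is a trimmed stand-in, and copying raw bytes assumes both sides share the same struct layout.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Trimmed stand-in for usercmd_t (the real struct has more fields).
struct usercmd_t
{
	int serverTime;
	signed char forwardmove;
};

static_assert( std::is_trivially_copyable<usercmd_t>::value, "raw byte copy must be safe" );

// Hypothetical byte-stream writer, NOT the engine's IPC class.
struct ByteWriter
{
	std::vector<std::uint8_t> data;

	void WriteBytes( const void *src, std::size_t size )
	{
		const std::uint8_t *bytes = static_cast<const std::uint8_t *>( src );
		data.insert( data.end(), bytes, bytes + size );
	}
};

// Hypothetical byte-stream reader over a writer's buffer.
struct ByteReader
{
	const std::vector<std::uint8_t> &data;
	std::size_t pos = 0;

	explicit ByteReader( const std::vector<std::uint8_t> &d ) : data( d ) {}

	void ReadBytes( void *dst, std::size_t size )
	{
		if ( pos + size > data.size() )
		{
			throw std::runtime_error( "ByteReader: read past end" );
		}
		std::memcpy( dst, data.data() + pos, size );
		pos += size;
	}
};

// Writes a count, then all commands as one contiguous block.
void WriteUserCmds( ByteWriter &w, const std::vector<usercmd_t> &cmds )
{
	std::uint32_t count = static_cast<std::uint32_t>( cmds.size() );
	w.WriteBytes( &count, sizeof( count ) );
	if ( count > 0 )
	{
		w.WriteBytes( cmds.data(), count * sizeof( usercmd_t ) );
	}
}

// Reads back what WriteUserCmds produced.
std::vector<usercmd_t> ReadUserCmds( ByteReader &r )
{
	std::uint32_t count = 0;
	r.ReadBytes( &count, sizeof( count ) );
	std::vector<usercmd_t> cmds( count );
	if ( count > 0 )
	{
		r.ReadBytes( cmds.data(), count * sizeof( usercmd_t ) );
	}
	return cmds;
}
```

Writing the whole array as one block keeps the serialization cost to a single copy per call, which fits the goal of one round trip per frame.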

What do you think? I would need some help to implement the "single trap"…
