a multiplayer game of parenting and civilization building
I profiled the server for 4 straight minutes just now, sampling the call stack 20 times per second, using this nifty new profiler that I just wrote:
https://github.com/jasonrohrer/wallClockProfiler
The result was 3840 stack samples (so the timing is a little off, and it's not hard-and-fast 20 times per second, but close).
Here's the raw profile data:
In particular, this is the most telling part of the profile. Specific lines in main that occupy a large portion of the execution time:
Partial stacks of depth [1] with more than one sample:
32.214% ===================================== (1237 samples)
1: main (at server.cpp:7128)
7128:| readySock = sockPoll.wait( (int)( pollTimeout * 1000 ) );
14.740% ===================================== (566 samples)
1: main (at server.cpp:12404)
12404:| recomputeHeatMap( players.getElement( r ) );
6.979% ===================================== (268 samples)
1: main (at server.cpp:13297)
13297:| playerYD );
4.844% ===================================== (186 samples)
1: main (at server.cpp:7832)
7832:| int curOverID = getMapObject( curPos.x, curPos.y );
4.740% ===================================== (182 samples)
1: main (at server.cpp:8049)
8049:| readSocketFull( nextPlayer->sock, nextPlayer->sockBuffer );
3.099% ===================================== (119 samples)
1: main (at server.cpp:14290)
14290:| false, false );
3.047% ===================================== (117 samples)
1: main (at server.cpp:8066)
8066:| nextPlayer->id, message );
2.552% ===================================== (98 samples)
1: main (at server.cpp:13743)
13743:| false, false );
2.292% ===================================== (88 samples)
1: main (at server.cpp:13601)
13601:| false, false );
1.823% ===================================== (70 samples)
1: main (at server.cpp:12551)
12551:| stepMap( &mapChanges, &mapChangesPos );
1.745% ===================================== (67 samples)
1: main (at server.cpp:14229)
14229:| playerListString );
1.432% ===================================== (55 samples)
1: main (at server.cpp:12557)
12557:| getNewCurseTokenHolders( &newCurseTokenEmails );
1.016% ===================================== (39 samples)
1: main (at server.cpp:13843)
13843:| false, false );
Also, note that 32% of the time is spent waiting on epoll. This is normal and fine (it just means that the server is idle 32% of the time).
Okay, this way of profiling gets you good wall-clock times, at least for a single-logical-thread app.
However, without summing up, I find this hard to read. This way highlights single lines where a lot of time is spent, rather than functions or logical blocks.
I know it's not the brightest approach, but when I had to profile things like this, I just put manual counters into the code (switchable with the preprocessor) and summed them up, writing a report at the end. At least when external computation on the graphics card (CUDA) was also involved, it was the only sensible approach.
PS: If most of the time is spent in kernel space waiting for I/O, one idea might also be to move toward non-blocking I/O. While this may open a can of worms for race conditions, if done well, it should improve things considerably.
I mean, there is no reason why a map I/O operation should block the whole server thread and keep it from handling player stuff for a far-away player at the same time. Properly clumping logically connected chunks together is not super trivial, though.
For example, the game "Eve Online" chunked players in close proximity together onto a "grid", and I suppose each grid was one logical block on the server side. If you came into effective proximity of another player, your grids were joined. Two fleets could be on separate grids or on a joined grid. Theoretically it should be invisible to the player, though as far as I recall there were some arcane effects exploited by players, like extending their grid by making a chain of players, etc.
As for profiling with non-blocking I/O pseudo-threads, I don't see any reasonable approach other than manual clock counting in interesting places in the code.
The full profile linked above shows a break-down by functions. Here's a sample of it:
Functions with more than one sample:
99.818% ===================================== (3833 samples)
main
32.214% ===================================== (1237 samples)
epoll_wait
32.214% ===================================== (1237 samples)
SocketPoll::wait
14.740% ===================================== (566 samples)
recomputeHeatMap
13.802% ===================================== (530 samples)
getMapObject
11.016% ===================================== (423 samples)
Socket::send
9.271% ===================================== (356 samples)
LINEARDB3_getOrPut
9.219% ===================================== (354 samples)
__libc_send
7.344% ===================================== (282 samples)
LINEARDB3_get
7.161% ===================================== (275 samples)
sendMapChunkMessage
7.057% ===================================== (271 samples)
new_do_write
7.057% ===================================== (271 samples)
_IO_new_do_write
7.005% ===================================== (269 samples)
write
7.005% ===================================== (269 samples)
_IO_new_file_write
6.849% ===================================== (263 samples)
getChunkMessage
6.797% ===================================== (261 samples)
LINEARDB3_considerFingerprintBucket
6.484% ===================================== (249 samples)
getMapObjectRaw
6.432% ===================================== (247 samples)
checkDecayObject
6.250% ===================================== (240 samples)
FileLog::logStringV
6.250% ===================================== (240 samples)
PrintLog::logStringV
6.250% ===================================== (240 samples)
AppLog::infoF
5.260% ===================================== (202 samples)
dbGet
4.844% ===================================== (186 samples)
readSocketFull
4.818% ===================================== (185 samples)
Socket::receive
4.583% ===================================== (176 samples)
timed_read
4.245% ===================================== (163 samples)
select
4.062% ===================================== (156 samples)
_IO_puts
4.036% ===================================== (155 samples)
getMapFloor
4.036% ===================================== (155 samples)
_IO_new_file_overflow
4.036% ===================================== (155 samples)
getPTrans
4.010% ===================================== (154 samples)
dbFloorGet
3.984% ===================================== (153 samples)
getTrans
3.906% ===================================== (150 samples)
fseeko
3.802% ===================================== (146 samples)
_IO_new_file_seekoff
3.776% ===================================== (145 samples)
getMapBiomeIndex
3.646% ===================================== (140 samples)
computeMapBiomeIndex
3.229% ===================================== (124 samples)
read
3.151% ===================================== (121 samples)
operator
2.969% ===================================== (114 samples)
getXYFractal
2.917% ===================================== (112 samples)
File::readFileContents
2.865% ===================================== (110 samples)
_IO_vfprintf_internal
2.630% ===================================== (101 samples)
getXYRandomBN
2.630% ===================================== (101 samples)
HashTable<double>::lookupBin
2.578% ===================================== (99 samples)
SettingsManager::getFloatSetting
2.448% ===================================== (94 samples)
HashTable<double>::lookupPointer
2.318% ===================================== (89 samples)
getMapBiome
2.188% ===================================== (84 samples)
Time::getCurrentTime
2.031% ===================================== (78 samples)
dbGetCached
1.979% ===================================== (76 samples)
LINEARDB3_put
1.979% ===================================== (76 samples)
_IO_new_file_xsputn
1.979% ===================================== (76 samples)
__printf
1.901% ===================================== (73 samples)
SettingsManager::getStringSetting
1.875% ===================================== (72 samples)
SettingsManager::getSetting
1.823% ===================================== (70 samples)
stepMap
1.745% ===================================== (67 samples)
SettingsManager::getSettingContents
1.719% ===================================== (66 samples)
zipCompress
1.589% ===================================== (61 samples)
xxTweakedHash2D
1.536% ===================================== (59 samples)
__GI___libc_malloc
1.458% ===================================== (56 samples)
SettingsManager::getIntSetting
1.458% ===================================== (56 samples)
stepCurses
1.432% ===================================== (55 samples)
getNewCurseTokenHolders
1.432% ===================================== (55 samples)
SimpleVector<int>::getElementDirect
1.276% ===================================== (49 samples)
getFirstEmptyBucketIndex
1.224% ===================================== (47 samples)
dbPut
1.224% ===================================== (47 samples)
PrintLog::generateLogMessage
1.224% ===================================== (47 samples)
getBaseMap
1.172% ===================================== (45 samples)
biomeGetCached
1.146% ===================================== (44 samples)
__GI___xstat
1.146% ===================================== (44 samples)
__tz_convert
1.068% ===================================== (41 samples)
SimpleVector<double>::size
1.016% ===================================== (39 samples)
autoSprintf
1.016% ===================================== (39 samples)
Socket::setNoDelay
As far as detecting time spent in kernel space goes, this method does that just fine. In fact, that's the whole point: where is the program actually spending its time? If you interrupt it right now, where is it? Is it blocked waiting for I/O?
Counting profilers can't do this, because they can't "count" while blocked on a system call.
However, there is one case that I might be missing: if the process is in an uninterruptible sleep state:
https://support.microfocus.com/kb/doc.php?id=7002725
GDB can't interrupt it in that case. So the signal would be ignored for a bit, and then it would land... where? I'm not sure. I hope it would land in the system call somewhere after the uninterruptible sleep is over (so it would still pinpoint the I/O operation, at least).
Okay, another thing I realized, after looking at some logs from the server1 stress test.
The server step loop runs each time new messages arrive from clients.
This means that it runs more often if there are more players connected, because messages arrive more often.
In one sample, when there were 100 players connected, the server step loop ran 65 times in a single second. That's a lot of updating.
Yes, there are some "realtime" things that happen in response to client messages, and those should be processed right away.
But some other "regular maintenance" things, like recomputing heat maps, are ridiculous to run 65 times per second. I'm computing heat maps for two players per step, round-robin, and this means that we are recomputing heat maps for 130 players every second (recomputing them for everyone more than once a second). This is obviously a waste.
So, a number of these "regular maintenance" things should be triggered on a fixed timestep, regardless of how many players are connected and how fast messages are arriving.
That will result in a dramatic reduction in CPU usage as the number of connected players grows.
"That will result in a dramatic reduction in CPU usage as the number of connected players grows."
Nice, 100+ players server incoming
Yeah, unnecessary realism hurting game balance.
Temp running was OP, I get it, but this high a refresh rate on heat maps isn't really necessary.
So even if it were like 5 seconds, that wouldn't be much of a problem; if it reduces lag a bit, it still wouldn't be the same as temp running.
Thing is, most cities are desert cities, which have higher chances of keeping more people alive, but it means the heat can change a lot of times.
I guess it doesn't matter if it recalculates anyway.
Jungle won't fluctuate much, so heat could be calculated over a much larger area and it would be the same anyway.
https://onehouronelife.com/forums/viewtopic.php?id=7986 livestock pens 4.0
https://onehouronelife.com/forums/viewtopic.php?id=4411 maxi guide
Playing OHOL optimally is like cosplaying a cactus: stand still and don't waste the water.
I don't even know what the data stuff means. Guess my past 10 years in China are all going to trash.
I don't think the heat map is the actual temp of your character. It's everything about you. Unfortunate name.