You now have 1.5M null lookups/s. I don't think you need more than that.
So now the real test. Set the key and value sizes the same as in the production database. Insert a similar number of records. Clear the cache (allocating 900MB for StackDB, or 900MB minus the size of the hash tables for KissDB, on the Linode will do). Run one of the tests (e.g. null lookups). Clear the cache. Run the next test. Rinse, repeat. Try to give KissDB a hash table large enough for 50% load, if memory permits.
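An alternative to the allocate-900MB trick is dropping the Linux page cache directly between runs. A minimal sketch (needs root; the standard knob is /proc/sys/vm/drop_caches):

#include <stdio.h>
#include <unistd.h>

/* Drop the Linux page cache (and reclaimable dentries/inodes) so each
   benchmark run starts cold. Requires root; "3" drops both. */
int drop_caches(void) {
    sync();  /* flush dirty pages first so they can actually be dropped */
    FILE *f = fopen("/proc/sys/vm/drop_caches", "w");
    if (!f) return -1;
    fputs("3\n", f);
    return fclose(f);
}

int main(void) {
    return drop_caches() == 0 ? 0 : 1;
}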
We need to reproduce the hotspot in StackDB. Maybe just giving it a larger hash table will help enough. Then there will be no need to go back to KissDB.
Yes, I could strip the Ubuntu build. It's the default Linode build, so I don't think it has any GUI or anything installed. Actually, it doesn't even come with a web server installed.
Here's top output sorted by RES column:
Tasks:  88 total,   1 running,  49 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1009052 total,   160104 free,    80288 used,   768660 buff/cache
KiB Swap:   262140 total,   262140 free,        0 used.   753708 avail Mem
You don't need to do anything. It uses only 80MB; the 160+768 is free or reclaimable. Almost all of it is in cache because that's the only sensible use for it in this situation. Once some program starts needing it, the system will reclaim almost all of it.
The micro Linodes that I'm using have 1GiB system RAM.
I just installed a fresh Ubuntu node for testing, and with nothing of mine running, there are 226M free.
Where does it go? Are you sure you aren't counting cache as used?
Can you strip it? I always run such systems with no GUI and nothing unimportant running, so I'm used to having 90%+ RAM free after boot. Just checked now: even with a simple GUI it's 188M/1536M, and everything else is free or cache.
The only point I can see to holding the hash table in RAM is that it speeds up NULL results. And those are rather common, I suppose.
Yes, no I/O for a null result. You can compress all pages except the first. They will take next to nothing in the 50% load case. But it's a stopgap solution, so probably not worth looking into.
Also, all this testing is happening on a magnetic platter hard drive.
If it fits in cache, it shouldn't matter. If it doesn't, it'll be 100-1000 times slower.
Ah. Max bin depth for KISSDB is 31.
Thus, there's not some insane "hot spot" in the djb2 hash that's causing a bunch of values to get piled in one bin.
31 for 50% load? If that's not a typo, it's horrible. I would expect 2-4.
EDIT: I missed that it's for the 2500% load version.
On collision, KISSDB is going to insert a whole new page of the entire table, so that's not good either.
That's the weakest point of its design. The goal is to get the right hash function and a load low enough to keep the longest chain at 2-3 items. Almost all items should be in the first table.
Also the "Used ______ bytes" lines are recording changes in the total size of allocated memory. So you can see that KISSDB is using quite a bit of RAM here
It uses 8 bytes per hash table pointer. You can halve memory use by changing uint64 to uint32 (this will alter the disk format as well).
Also, interesting to note that STACKDB with 2500% loading is 2.8x faster overall than KISSDB with 50% loading, but also that all the slowdown in KISSDB comes from INSERT, which is a relatively rare operation.
I would ignore insert performance for now. Most of it comes from writing the huge hash table once.
KISS with 50% loading is on-par or better than STACK with 2500% loading for the other operations.
When you ignore the initial insert and the final sequential scan, it's not:
KDB Random look used 0 bytes, took 2.839000 sec
SDB1 Random look used 0 bytes, took 4.563000 sec
SDB2 Random look used 0 bytes, took 5.093000 sec
KBD Last-inserted lookup used 0 bytes, took 2.056000 sec
SDB1 Last-inserted lookup used 0 bytes, took 2.033000 sec
SDB2 Last-inserted lookup used 0 bytes, took 3.058000 sec
KBD Random look/miss used 0 bytes, took 1.365000 sec
SDB1 Random look/miss used 0 bytes, took 3.774000 sec
SDB2 Random look/miss used 0 bytes, took 2.468000 sec
KBD Inserts after miss used 31997952 bytes, took 6.675000 sec
SDB1 Inserts after miss used 0 bytes, took 5.525000 sec
SDB2 Inserts after miss used 0 bytes, took 7.326000 sec
Anyway, it's clear that "just make the table bigger" isn't going to solve all of the problems here. With 12M map cells recorded, we'd need a KISS table with 24M cells, which gets doubled when there are collisions, and KISS stores these tables entirely in RAM. I'm pretty sure I'd be exhausting system memory.
12M or 120M rows? It uses 8 bytes per hash entry per table. With 3 tables it's ~300MB for 12M rows. Changing uint64 to uint32 will halve it: ~150MB. Well, I had a hundred times more RAM in my previous laptop.
BTW, I don't suggest that as a final solution. It's just a quick and dirty way to fix the lag issues for now, until a better solution is implemented.
sc0rp, STACKDB is 7x faster than KISSDB for common operations, and uses infinitely less RAM (because it uses no RAM), and slightly less disk space. The reason we could ever support 100+ players on one server was because of the switch to STACKDB. Compare this benchmark:
KISSDB:
Opening DB (table size 4000000)
Inserted 1999396
Raise the table size to 8M elements (aka reduce the load to 25%) and rerun the test.
EDIT: Alternatively, can you easily check the longest chain? I want to make sure that the crappy hash function is not the main culprit.
It is really not so hard to hit somebody while he is running. People tend to do things in patterns, and the same goes for running, so if you are patient enough you can use the knife or the bow with prediction.
But if you miss, you put down the knife. Then if it's a real griefer, not a noob, he'll pick it up, and now he has two knives and you have none. GG. Your next occasion to practice may be next week. So it's a very slow way to learn how to fight. The only way to train is to become a griefer yourself. That's why I asked for practice weapons:
https://onehouronelife.com/forums/viewtopic.php?id=2145
After all, you don't practice fighting with real weapons against your colleague and then rush him to the hospital to patch him up before the next session. At least most people don't do that. Those who do end up with a Darwin Award or a life prison sentence, respectively.
Wow, senpai noticed me!
Quite situational as an artwork, though. I doubt that teleportation will remain a thing for long...
Nah, Jason has a pretty good track record of keeping fun stuff.
Dying while on a mushroom trip makes you perma-trip in the next life:
This is not intentional.
But now that you have discovered it, I'm intentionally leaving it in there...
Eve locked in tuto house:
Wonder... did you by any chance die of old age inside the tutorial a while back?
That might have set your "eve respawn" location.
I'm tempted to leave this in there... seems like a cool easter egg.
This jogged a memory of some posts I've seen:
"A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set"."
I thought about this. It's not that useful in this case. It needs ~10 bits per key for a 1% false positive rate, with random access all over it, so you'll pretty much have to keep it in RAM, and it'll grow proportionally to the number of records in the db. You'll also need to completely rebuild it from time to time to keep the 1% false positive rate as the number of records increases. And it deals with only part of the problem - null lookups. Getting old data will still cause lag, with hundreds of reads multiplied by a hundred tiles/thousand items and subitems. Fixing that will require fixing the underlying structure anyway, and then your Bloom filter will be useless. So it's a waste of time IMO.
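For reference, the standard sizing formulas behind that "~10 bits per key" figure, as a small illustrative sketch (the numbers here are just example inputs, not from the game code):

#include <math.h>
#include <stdio.h>

/* Standard Bloom filter sizing:
     bits per key  m/n = -ln(p) / (ln 2)^2
     hash count    k   = (m/n) * ln 2       */
int main(void) {
    double p   = 0.01;                       /* 1% false positive rate */
    double ln2 = log(2.0);
    double bpk = -log(p) / (ln2 * ln2);      /* ~9.6 bits per key      */
    double k   = bpk * ln2;                  /* ~7 hash functions      */
    double n   = 12e6;                       /* e.g. 12M map cells     */
    printf("%.1f bits/key, %.0f hashes, ~%.0f MB of RAM for %.0fM keys\n",
           bpk, ceil(k), bpk * n / 8.0 / 1e6, n / 1e6);
    return 0;
}

So even at 12M records the filter already wants ~14MB of always-resident RAM, and it keeps growing with the record count.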
So, instead of linear probing or chains to handle collisions, why not always handle any collision by adding a bucket at the end and rehashing the bucket pointed to by S? Then try inserting again, if there's a collision, add a bucket, and so on?
I'm not sure if I understand your question correctly.
If you're asking why not rehash only the bucket with the collision: lookup needs some deterministic ordering of buckets to find items later on. If you keep adding buckets in arbitrary places, you'll need to keep some mapping to find them later on. With a naive implementation this mapping will grow to be as big as the hash table itself. A non-naive implementation leads to virtual/extendible hashing.
If you're asking why not rehash everything up to the bucket with the collision: that's really just a much more elaborate way of doubling the whole table in one go, which is the problem you want to solve. On the first collision you will, on average, need to rehash half of the elements in the table. On a collision in an already-split bucket you'll have to first finish rehashing all the remaining elements and then start another rehash pass up to the collision bucket. So in the long run it's just doubling on collision.
Otherwise, you need some other metric to determine when you need to add a bucket (how long is the list in this bucket, or how many failed linear probing steps did I execute?)
Typical choices are:
1. Count the number of inserted items. You know how many elements you can store, so every time you go past the expected load, add a bucket.
2. Add a bucket every Nth insert. If your bucket holds 16 elements and you want 50% load, just add a new bucket every 8 inserts. That's pretty much equivalent to point 1, but you need to start with one bucket and do something sane on restarts.
3. Add a bucket every time some bucket overflows.
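A minimal sketch of option 2, assuming 16-entry buckets and a 50% target load (the names here are illustrative, not from any existing codebase):

#include <stdint.h>

#define BUCKET_CAPACITY   16
#define TARGET_LOAD_PCT   50
/* With 16-entry buckets and 50% target load, split every 8 inserts. */
#define INSERTS_PER_SPLIT (BUCKET_CAPACITY * TARGET_LOAD_PCT / 100)

typedef struct {
    uint64_t num_inserts;  /* total items inserted so far */
    uint64_t num_buckets;  /* current number of buckets   */
} lh_state;

/* Call after every successful insert; returns nonzero when the table
   should grow by one bucket (i.e. split the next unsplit bucket). */
int should_add_bucket(lh_state *s) {
    s->num_inserts++;
    if (s->num_inserts % INSERTS_PER_SPLIT == 0) {
        s->num_buckets++;
        return 1;
    }
    return 0;
}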
Seems like a collision is a good enough signal that a new bucket is needed, and the new bucket operation is cheap enough that we don't need to do it sparingly (it's not like we're doubling the table at the first sign of a collision, we're just adding one new bucket).
Yes, that's one of the approaches. It will stabilize at some load, which will depend mainly on how many items you have per bucket. The drawback is that you cannot control the load, so you may not like the performance. E.g. it will still be O(1), but you may need on average 8 reads per lookup/insert, with high variation on occasional long chains.
HOWEVER, I also did spend some time today looking around for simple, single C-file databases, just to see how they were doing things. Besides KISSDB, there's really nothing else out there, especially not something with a permissive license. Most of the other options are bloated, heavy-weight libraries.
So, if you finish your implementation, I'd highly recommend putting it on GitHub so that other people can use it.
Maybe I'll put it there. But IMHO it has a very narrow use case: when your dataset is big enough that you constantly need to tweak KissDB parameters, when you don't need readers concurrent with the writer, and when you don't need perfect durability, so a few records going missing on power loss is not something you care about. Quite rare IMO.
It would be really interesting to see how it performs on various test data sets when compared to KISSDB and other similar embedded databases like Kyoto or LMDB.
It should be equal to or faster than KissDB for all cases that matter. It'll be faster than LMDB, but we're not competing in the same category. LMDB does full ACID, readers don't block the writer, and your values can have variable size. I can't provide that in a simple way.
And why not use LMDB? Looking at this header, my eyes glaze over:
It's a pretty solid choice. You will be better off with it than with any hand-made solution. You can also implement all the further optimizations I've suggested with it (prefetch thread, grouping data).
The other option to consider is Berkeley DB. Version 5 has a quite permissive license. But most people are moving away from it after Oracle acquired it and changed the license in v6 to the Affero GPL.
If you want a quick solution with minimal changes to the code, there is another option: move back to KissDB. IIRC you rewrite the whole database on shutdown to wipe old data. Instead of wiping, calculate the number of elements and allocate a new KissDB at 25% load. That should work great until the next restart. There is only one small fix I'd make to ensure there are no problems. KissDB uses a very bad hash function, so even at 25% load you may end up with some long chain that wastes a lot of memory on all the (almost empty) extra hash tables. Replace it with MurmurHash2 or SipHash.
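To illustrate the kind of drop-in replacement meant here, a 64-bit FNV-1a hash used as a simple stand-in for MurmurHash2/SipHash (I'm assuming the hash lives in a single static function inside kissdb.c; swapping it changes bucket placement, so existing db files would have to be rebuilt):

#include <stdint.h>
#include <stddef.h>

/* 64-bit FNV-1a, a simple stand-in for MurmurHash2/SipHash.
   Not the original KISSDB hash; any such change alters the on-disk
   bucket placement, so existing databases must be regenerated. */
uint64_t hash_fnv1a_64(const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}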
Kubassa posts are the best though.. Plz no ban!
He has classic insults and never breaks character.
You have to earn your Kubassa insults, but they are always worth the effort <3
Kubassa 2020!
Yep, he is really oldskool. Maybe we should keep him around as an endangered species. And his posts work very well as a litmus test for anti-griefer solutions: the more insults you get, the better the outcome.
But the standard "it's too full, rehash everything" approach to preventing a hash table from overloading is exactly the O(n) worst case operation that we're seeking to avoid.
It's O(1) amortized cost. If you have 10 items and you need to double, you need to rehash all of them. But then you can insert 10 more elements. Then to double you need to rehash 20, but after that you can insert 20 more. Then to double you need to rehash 40, but after that you can insert 40 more. And so on. So the amortized cost is O(1). Most applications don't care whether it's amortized or not. It matters a lot for real-time apps, though, and there you need something like Linear Hashing to de-amortize it - spread the cost evenly over all inserts.
The balanced trees and other dynamic data structures will never have an insert operation that goes over O(log n).
Linear hashing is a dynamic data structure, and it keeps all inserts O(1) whp. You can't beat it with a tree, unless you need other operations that a tree supports and a hash does not. A tree needs both more lookups and more overhead space.
Here's a worst case O(1) time data structure for you:
1. Tell me your largest key K.
2. I allocate an array A with K elements.
3. Insert(k,d) => A[k] = d
4. Lookup(k) => return A[k]

A hash table is like the randomized, shrunken cousin of this. By knowing all about your dataset ahead of time, sheer miracles are possible!
If you know your keys in advance, there are methods to generate a perfect hash - N elements will go uniquely into exactly N bins. Key size doesn't matter then at all. You'll get a perfect O(1) lookup with the smallest space possible. This array can be seen as the ultimate version of a perfect hash, for when you want to keep all possible keys.
I think Chard's point is that it's kinda dumb to get excited about O(1) average case for hash tables, which makes them look better than O(log n) trees and such, because they can't really be compared to dynamic data structures. In this application, a tree would have been way better, because log of any imaginable number is way less than 152.
They are always faster and more space efficient than trees. You use trees only if you want in-order iteration.
The wrong assumption you're making is that you should keep the hash table size static. In practice you do that only if you know in advance how big your dataset will be. For every other use case, if the load gets too high (say >50%) you double the size and rehash; if it gets too low (say <25%) you halve the size and rehash.
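A minimal sketch of that resize policy, assuming an in-memory table and a hypothetical rehash_into() helper that does the O(n) reinsert (neither name is from KISSDB or STACKDB):

#include <stddef.h>

#define MIN_SLOTS 16

typedef struct {
    size_t used;   /* live entries        */
    size_t slots;  /* current table slots */
} table;

/* Hypothetical helper, assumed to exist elsewhere: allocates a table
   with new_slots slots and reinserts every live entry (the rehash). */
void rehash_into(table *t, size_t new_slots);

/* Keep the load between 25% and 50%: double when it gets too high,
   halve when it gets too low. */
void maybe_resize(table *t) {
    if (t->used * 2 > t->slots)
        rehash_into(t, t->slots * 2);                      /* load > 50% */
    else if (t->slots > MIN_SLOTS && t->used * 4 < t->slots)
        rehash_into(t, t->slots / 2);                      /* load < 25% */
}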
I mean, yeah, if you pre-allocate an array that is 2x as large as your entire data set, you get O(1) performance with a hash table... but... at the end of the day, you have an array. It's really O(n) under the hood.
Linear hashing may deal with some of these issues, but it's a pretty obscure algorithm, relatively speaking.
It deals with all the problems there. And it's pretty simple, if you know what it tries to achieve. Usually, if the load gets too high, you just allocate a new table twice as big and rehash everything. This still gives you amortized O(1), so it's fine for most purposes. It's not fine when you have a real-time application, because then you get this huge spike from time to time.
Linear hashing de-amortizes it by extending the hash table one bucket at a time instead of all at once. It takes the first unsplit bucket and moves half of its elements to a new bucket at the end. Then lookup needs to take this into account. You have three zones: the first zone at the beginning holds already-split buckets, the middle one holds not-yet-split buckets, and at the end you have a section of split buckets the same size as the first. So to get an item you need to figure out which section your bucket is in and use the new hash for the first and third sections and the old hash for the middle one. That's all.
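A minimal sketch of that bucket-address computation, assuming n0 is the bucket count at the start of the current round and split is the index of the next bucket to be split (illustrative names, not the actual db code):

#include <stdint.h>

typedef struct {
    uint64_t n0;     /* buckets at the start of the current round        */
    uint64_t split;  /* next bucket to split; buckets [0, split) are     */
                     /* already split, so total buckets = n0 + split     */
} lh_table;

/* Map a key's hash to its bucket index across the three zones. */
uint64_t lh_bucket(const lh_table *t, uint64_t h) {
    uint64_t b = h % t->n0;       /* old hash: assumes the bucket is unsplit */
    if (b < t->split)
        b = h % (2 * t->n0);      /* already split: new hash lands in zone 1 */
                                  /* or in the new zone 3 at the end         */
    return b;
}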
There's a prevailing "when in doubt, hash table" sentiment out there....
Because they are the best if you don't have additional requirements like ordered iteration. The heresy here is keeping their size static no matter the load and expecting good performance. If you go over the recommended load (50-85%, depending on collision handling), it degenerates from a nearly perfect O(1) structure into a totally useless O(N) one.
What new players may not understand too is the PTSD suffered by veteran players, lol. I've been murdered dozens of times for being a female of child-bearing age, for being the "wrong" color, or just for fun. The longer you play, the warier you become of out-of-place behavior and weapons.
This. In the past I couldn't care less about somebody making a bow or knife. Nowadays I just drop everything and investigate whether it's somebody sane. It doesn't make much sense to continue doing anything else if we're about to have a bloodbath in a minute.
and curse points
It is clear to me that we aren't communicating well here. I'm not a computer scientist or an algorithm specialist.
I didn't know that.
I'm a mathematician and I was making a joke about an ambiguity in the choice of language surrounding complexity. Clearly it didn't land, so I'm dropping this.
It did land. But I thought you were a programmer, and your assumptions about this data structure were so off that I felt more inclined to correct them instead of going along with the joke.
sc0rp, none of the profiling has yet demonstrated that random access of spots on the disk is the main bottleneck. I.e., that cache misses are the main problem.
Did you clear the disk cache between profiler runs? It may not be a problem now with a warm cache, but it will be if the db size grows beyond RAM. You'll hit a very steep performance cliff when the working data set gets larger than RAM.
And later on, if you'd like to optimize responses to below 1 ms, you will hit this bottleneck even with a few I/O requests per user request. But that's a problem for future Jason.
Instead, we are marching through 152 (on average) entries of a linked list to determine that a value is not present. That's a problem whether the data access is sequential or random.
That said, the basic linear hashing algorithm described here does sound like an elegant way to handle an ever-growing data set:
That's mostly what I implemented, with two important improvements from the paper I quoted - linear probing instead of chains, and extending the hash table on overflow in the last bucket instead of wrapping around.
It was a joke. There was no N in O(1). Saying that the hash tables are O(1) if you keep them small is kinda like saying that they are O(1) if N is small.
If you keep the LOAD small (as in 50%-85%). You cannot go over 100% anyway without some additional data structure tacked on, as you need to store the items somewhere. There is no limit to the size of a hash table, and you can scale it easily up and down with inserts and deletes. There is also no additional space required for collisions if you use the right method of handling them (e.g. double hashing, linear probing).
Anyway, as I followed up, the key issue is one of space/compute tradeoffs and their relative complexities, which I think is well understood and which Jason mentioned as a workaround in his original post.
There are no such tradeoffs if you choose the right algorithm. Scale the hash table itself; don't append long collision chains to it - that degrades into a linked list. With the right approach you have O(1) time whp and O(N) space no matter how big your dataset is.
BTW, over the weekend I wrote most of an implementation of Linear Hashing (KissDB API compatible). If you're interested I may contribute it, once I have unit tests with good coverage. It does keep gets (including misses) and inserts O(1), no matter how much data you put in there.
sc0rp wrote: 3. Run a separate thread that will prefetch map data for players. You send 32x30 tiles, so the prefetch thread should pre-cache the map in all directions, something like 96x90. When a player moves in any direction, you should have the tile and item data already in RAM. Special-case Eve spawn to precache all data before she's born.
The servers are currently single-core, I believe, so this would need extra fixing.
Only a mutex or two around the player list and map cache. The thread is only needed to get around blocking I/O, so it's not CPU-bound.
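A rough sketch of what such a prefetch thread could look like, using POSIX threads and a hypothetical db_get_tile() for the blocking read (none of these names are from the actual server code; queue overflow handling is omitted):

#include <pthread.h>

/* Hypothetical: blocking read of one tile record from the on-disk db.
   Stubbed here so the sketch is self-contained. */
void db_get_tile(int x, int y) { (void)x; (void)y; }

#define QUEUE_SIZE 1024

typedef struct { int x, y; } tile_req;

static tile_req queue[QUEUE_SIZE];
static int q_head = 0, q_tail = 0;
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_cond = PTHREAD_COND_INITIALIZER;

/* Main thread: request that a tile (or a 96x90 area of them) be warmed. */
void prefetch_request(int x, int y) {
    pthread_mutex_lock(&q_lock);
    queue[q_tail] = (tile_req){x, y};
    q_tail = (q_tail + 1) % QUEUE_SIZE;
    pthread_cond_signal(&q_cond);
    pthread_mutex_unlock(&q_lock);
}

/* Prefetch thread: drain the queue, touching tiles so the cache is warm
   before the player actually moves there. Blocking I/O stays off the
   main thread, so a single core is fine. */
void *prefetch_thread(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (q_head == q_tail)
            pthread_cond_wait(&q_cond, &q_lock);
        tile_req r = queue[q_head];
        q_head = (q_head + 1) % QUEUE_SIZE;
        pthread_mutex_unlock(&q_lock);
        db_get_tile(r.x, r.y);
    }
    return NULL;
}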
sc0rp wrote: Hash tables are O(1) whp. At this point you have a bunch of linked lists, not a hash table. Hash tables work fine with load up to 50%-85% (depending on how you handle collisions). You run it with a load of 15200%, so it has degenerated into a linked list.
O(1) for small N sounds like a bit of a cheat to me.
What do you mean by small N?
Although we are only discussing compute complexity here, not space complexity.
Compute for a proper hash table is O(1) whp; space is O(N) (in practice 2N is enough).
Doing a full profile of server1 today, with and without a big map database, showed that the large databases are indeed the main bottleneck. DB_GET operations are 66x slower with the "full" server1 db as compared with an empty database running on the same hardware, for a test case where I'm walking around an unexplored part of the map solo.
So the things I suggested blindly would have fixed the problem. This one solves it completely:
1. Get rid of long random-access chains in the db. Linear probing, with doubling of the table size when the load gets high, should work fine. As a side effect it will make it possible to run read requests in parallel (necessary for point 3).
And this one would reduce lookups hundredfold:
4. Exploit spatial locality of requests. Instead of having multiple records per tile (biome, items, subitems, etc.), clump all the information about the tiles and items in an 8x8 or 16x16 block into one record. For some small write amplification, it would give you a two to three orders of magnitude speedup on reading map data.
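A minimal sketch of keying the map db by block instead of by tile (the record layout and names here are purely illustrative, not the actual map db format):

#include <stdint.h>

#define BLOCK_BITS 4                 /* 16x16 tile blocks */
#define BLOCK_SIZE (1 << BLOCK_BITS)

/* One db record per block instead of one per tile: a single read brings
   in data for 256 neighbouring tiles at once. */
typedef struct {
    int32_t biome[BLOCK_SIZE][BLOCK_SIZE];
    int32_t item[BLOCK_SIZE][BLOCK_SIZE];
} map_block;

/* Pack block coordinates into one 64-bit key.
   (Assumes arithmetic right shift for negative tile coordinates.) */
uint64_t block_key(int32_t tile_x, int32_t tile_y) {
    uint32_t bx = (uint32_t)(tile_x >> BLOCK_BITS);
    uint32_t by = (uint32_t)(tile_y >> BLOCK_BITS);
    return ((uint64_t)bx << 32) | by;
}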
I think this one will also be necessary later on to get to sub-ms responses:
3. Run a separate thread that will prefetch map data for players. You send 32x30 tiles, so the prefetch thread should pre-cache the map in all directions, something like 96x90. When a player moves in any direction, you should have the tile and item data already in RAM. Special-case Eve spawn to precache all data before she's born.
But in general, the problem will still remain: as the database gets bigger, access times for the NULL result will get slower. A 2x bigger database will make these operations 2x slower, etc.
That's what you need:
Linear Hashing with Overflow-Handling by Linear Probing
http://www.cs.tau.ac.il/~shanir/advance … larson.pdf
Note that if you keep the load <50% (and for a real-time app I would recommend doing so), a significantly simpler algorithm will do. You can get rid of the multiple passes/sweeps. They are only needed if you want to save space and keep the load at 80-85%, but that costs a few extra lookups for each get and insert, so it's kind of a bad tradeoff for a real-time app. Disk space is cheap.
It's interesting that so many people talk about hash tables being O(1) average case, when in fact they are actually O(n) average case, if they are of fixed size. Here we have O( n / 80000 ), which is still O(n). Linear growth in execution time as n grows.
Hash tables are O(1) whp. At this point you have a bunch of linked lists, not a hash table. Hash tables work fine with load up to 50%-85% (depending on how you handle collisions). You run it with a load of 15200%, so it has degenerated into a linked list.
Thanks--not sure how I missed that. It'd be great if adopted kids had a link to their adoptive parent, the same way that "killed by" names are linked.
+1
And probably milk from cows.
Commits on Jul 23, 2018
Working on bison.
https://github.com/jasonrohrer/OneLifeD … its/master