Scalability and Asserts I’m Not Yet Fixing

I’ve been a bit quiet because I’ve been working on (hopefully) completely invisible stuff involving backend server scalability.  What does that mean, you ask?  In practical player-facing terms, it means I’m trying to get the lobby and registration system robust enough to first invite the rest of the waiting list in,[1] and then to open the beta up completely, allowing people to hear about the game, visit the site, pay their money, and get into the beta instantly.  Between 10 and 500 people sign up a day, depending on how much press the game is getting at the time, and I’m very curious to find out how many of those will join the beta if it’s available immediately, and they aren’t told they have to wait.

I’m currently working on the text-mode robot client.  I’ve got it logging into the lobby and pretending it’s a full SpyParty client, and it can chat, and invite people to games.  I posted on Facebook and Twitter asking for suggestions of what the robots should say to each other as they’re hanging out in the lobby.

Next up, I’m going to get the robots playing fake matches and games and reporting the game results to the lobby, and logging in and out a bunch.  Then, I’m going to hack up the Bees with Machine Guns load-testing app to launch a bunch of EC2 micro instances with multiple robots running on each, and aim them all at my test server.  I expect chaos.

Once I’ve got the obvious stuff fixed, and have figured out how many clients my system can host at a given machine size, I’ll do some tests with beefier machines to make sure it scales linearly.  There are four basic machine resources: CPU, memory, network, and disk…I assume I’ll run out of memory first, but I don’t know for sure.  I was originally going to try to get the entire backend infrastructure running in the cloud, but I think for the near term I’m going to just make sure I get a machine that can accept way more clients signing up and playing than I think I’m going to get when I open up the beta, and hope it holds up.  If it dies due to too much traffic, well, I guess that’s a good problem to have, at least!  I did this same kind of thing when I started taking beta signups, and I stress tested and optimized it well enough that it didn’t even break a sweat, so hopefully the beta launch itself will go as smoothly. If not, then if Blizzard can do it, so can I! Yikes.

Then, once I’ve got the backend scalable for the robots, I’ll start inviting much larger groups into the beta to loadtest with live players.  There are some features I’m going to need to add to make this work, like chat rooms and colored chat text, and it’s going to be pretty raw to start with, but it should work okay.  The beta testers have been very forgiving of my incredibly primitive lobby so far, so hopefully that attitude will continue!

After everybody’s in, I’ll shut off the signups and let it stew for a few days, and if it’s working, I’ll open it up!  I hesitate to give time estimates on this stuff, because I’ve never hit a date, but most of this will be happening over the next few weeks.

Asserts!

Programming is complicated, and handling errors makes it even more complicated.  Oftentimes, it’s good programming practice to not handle some types of errors gracefully, but simply assert that you aren’t in a state you shouldn’t be in, and if the assertion fails, you exit with an error.

assert(1 + 1 == 2);  // integer arithmetic better work or we're hosed

Of course, during development, the impossible is not only possible, but likely, and so your asserts fire.  And, often in complicated code, you’ll assert things that should be true, but aren’t catastrophic if they’re not true, and so you usually pop up a dialog that gives you the choice to break into the debugger, abort the program, or ignore the assert (just this once, or forever).  The problem is, once you let yourself have that ignore option, you can get lazy, and start using asserts as popup reminders of things to fix.  This tends to be really bad on a large team, because the game is asserting every 2 seconds, and you’re just hitting ignore all the time on other people’s asserts, and it becomes a habit.  However, on an indie-sized team, which in my case means exactly one programmer, one can use asserts this way and they can still be useful.  I can get a feel for how often certain types of things are going wrong while I’m testing, and I can remember what I was doing when a specific assert fired most of the time…it goes from a purely quantitative tool to a qualitative tool.
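To make the break / abort / ignore-once / ignore-forever choice concrete, here is a minimal sketch of how such an assert could be structured in C++.  It is not SpyParty’s actual assert system: GAME_ASSERT, AssertAction, CurrentHandler and the rest are hypothetical names, and the popup dialog is stood in for by a replaceable callback so the sketch stays self-contained.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical names throughout; this is an illustration, not the game's code.
enum class AssertAction { Break, Abort, IgnoreOnce, IgnoreForever };

// Stand-in for the popup dialog: a replaceable callback that picks an action.
using AssertHandler = AssertAction (*)(const char* expr, const char* file, int line);

inline AssertAction DefaultHandler(const char* expr, const char* file, int line) {
    std::fprintf(stderr, "ASSERT FAILED: %s (%s:%d)\n", expr, file, line);
    return AssertAction::Abort;
}

inline AssertHandler& CurrentHandler() {
    static AssertHandler handler = DefaultHandler;
    return handler;
}

// Asks the handler what to do. Returns true if the caller should break.
inline bool ReportFailure(bool& ignore_forever, const char* expr,
                          const char* file, int line) {
    switch (CurrentHandler()(expr, file, line)) {
        case AssertAction::Abort:         std::abort();
        case AssertAction::IgnoreForever: ignore_forever = true; return false;
        case AssertAction::IgnoreOnce:    return false;
        case AssertAction::Break:         return true;
    }
    return false;
}

#define GAME_ASSERT(expr)                                                     \
    do {                                                                      \
        static bool ignore_forever_ = false; /* one flag per call site */     \
        if (!ignore_forever_ && !(expr)) {                                    \
            if (ReportFailure(ignore_forever_, #expr, __FILE__, __LINE__))    \
                std::abort(); /* a real build would break into the debugger */ \
        }                                                                     \
    } while (0)
```

The static flag lives inside the macro expansion, so “ignore forever” is remembered separately for each place an assert appears in the code, and only for the current run of the program.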

I also can leave the asserts on in beta builds, but in a way that fires silently once and then auto-ignores-forever, and then have them shipped off over the network to the server, so I can see what kinds of things are going wrong on player machines, and how frequently they go wrong.
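A rough sketch of the “fire silently once, then auto-ignore forever” behavior might look like the following.  Everything here is an assumption made for illustration: BETA_ASSERT, AssertReport, PendingReports and FlushReports are hypothetical names, and the actual network send is left to whatever callback the caller passes in.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical report record; the real wire format is not described in the post.
struct AssertReport {
    std::string expr;
    std::string file;
    int line;
};

// Reports waiting to be shipped to the server.
inline std::vector<AssertReport>& PendingReports() {
    static std::vector<AssertReport> pending;
    return pending;
}

// Silent assert: no dialog, records the first failure at each call site,
// then ignores that site for the rest of the run. Once a site has fired,
// its expression is no longer evaluated.
#define BETA_ASSERT(expr)                                                        \
    do {                                                                         \
        static bool fired_ = false; /* one flag per call site */                 \
        if (!fired_ && !(expr)) {                                                \
            fired_ = true;                                                       \
            PendingReports().push_back(AssertReport{#expr, __FILE__, __LINE__}); \
        }                                                                        \
    } while (0)

// Hands each queued report to `send` (for example, a function that writes it
// to a socket), empties the queue, and returns how many reports were handed over.
template <typename Send>
std::size_t FlushReports(Send send) {
    std::vector<AssertReport> batch;
    batch.swap(PendingReports());
    for (const AssertReport& r : batch) send(r);
    return batch.size();
}
```

Queuing the reports and flushing them separately keeps the assert itself cheap and means a failure inside networking code does not try to send a packet from the middle of the code that just failed.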

I tend to be pretty liberal with asserts in my code, and so they fire a lot, and in turn the server logs a lot of them.  About 30,000 of them so far in the beta.  Here they are, sorted by frequency, each shown as the count and the assert expression, with the source file on the following line:

4918 internalMcostTest(sx, sy)
object_system\subsystems\pathing\pathing.cpp
2237 holding
object_system\subsystems\animation\animation.cpp
2237 ad
object_system\subsystems\animation\animation.cpp
2145 am && (right || left)
object_system\subsystems\animation\animation.cpp
1780 am->playing.blend_out == am->queue.front().blend_in
object_system\subsystems\animation\animation.cpp
1485 am && (boneId != -1) && ad
object_system\subsystems\animation\animation.cpp
1415 cd->object_of_interest
character.cpp
1350 d->NearBookcaseID
situations\bookcase\bookcase.cpp
1114 (err = glGetError()) == GL_NO_ERROR, “code: 0x00000501”
spyparty.cpp
1105 cd->object_of_interest && (cd->object_of_interest == BriefcaseID)
situations\briefcase\briefcase.cpp
994 StatueAD && (StatueAD->Object->Type == object_types::STATUE) && (object_system::GetObject(StatueAD->ParentID)->Type == object_types::PEDESTAL)
situations\pedestal\pedestal.cpp
952 am && (am->playing.type == &core_talks)
situations\conversation\conversation.cpp
813 BriefcaseAD && HoldingBriefcase && d->PlayingCycle
situations\briefcase\briefcase.cpp
801 !d->PlayingCycle
situations\drinks\drinks.cpp
628 verify( pathing::pathGetCharacterValue(x, y) == BriefcasePathValue )
situations\briefcase\briefcase.cpp
539 verify( animation::HandleDetach(am, am->Events[i].event->boneId, cd->holding_right, cd->holding_left, &OnRight) && OnRight )
situations\briefcase\briefcase.cpp
478 cd->object_of_interest == BriefcaseID
situations\briefcase\briefcase.cpp
469 !PoppedYesNoQuestioner
spyparty.cpp
413 d->GoalPedestalID == d->NearPedestalID
situations\pedestal\pedestal.cpp
278 HoldingBriefcase && BriefcaseAD
situations\briefcase\briefcase.cpp
252 verify( HandleDetach(am, am->Events[i].event->boneId, cd->holding_right, cd->holding_left, &OnRight) && OnRight )
situations\pedestal\pedestal.cpp
236 IK.Bone && IK.Target && IK.MeshHardpoint
object_system\subsystems\animation\animation.cpp
229 am->queue.empty()
object_system\subsystems\animation\animation.cpp
216 x == p->ex && y == p->ey
object_system\subsystems\pathing\pathing.cpp
193 0, “unknown packet type: 9”
spy_server.cpp
188 !”stuck!”
character.cpp
172 cd->disposable_left
situations\pedestal\pedestal.cpp
160 verify( pathing::pathGetCharacterValue(x, y) == pathing::PATH_VALUE_INFINITE )
situations\briefcase\briefcase.cpp
151 !cd->holding_right || (cd->holding_right == d->BookID)
situations\bookcase\bookcase.cpp
139 (cd->holding_left == d->BookID) && (!cd->holding_right || (cd->holding_right == d->BookID))
situations\bookcase\bookcase.cpp
110 e.MatchTimestamp > RoundTimeline[LastMarkSuspectIdx].MatchTimestamp
round_events.cpp
97 verify( GetConnectionNames(Us, sizeof(Us), Them, sizeof(Them)) )
spyparty.cpp
78 0, “unknown packet type: 15”
spy_server.cpp
77 0, “unknown packet type: 9”
sniper_client.cpp
74 Distance2(cd->Object->Position, ps_sci->Object->Position) <= MaxHandoffDistance2
situations\briefcase\briefcase.cpp
70 !gd->ForceGoToPedestalID
situations\steal_statue\steal_statue.cpp
66 am->playing.type == &core_briefcase_pickups
situations\briefcase\briefcase.cpp
59 !”spy stuck!”
spy_server.cpp
59 0, “unknown packet type: 9”
spyparty.cpp
55 pathTestOpen(sx, sy)
object_system\subsystems\pathing\pathing.cpp
50 0, “unknown packet type: 7”
spy_server.cpp
48 d->BookBookcaseID && cd->IsGoalOwner(this)
situations\bookcase\bookcase.cpp
43 !”what do to here?”
examples\lobby\lobbyclient.cpp
40 verify( animation::GetPlayingAnimationInfo(am, &time, &duration) )
situations\conversation\conversation.cpp
40 HoldingBriefcase
situations\briefcase\briefcase.cpp
40 am->playing.animid == -1
situations\pedestal\pedestal.cpp
35 n
spyparty.cpp
32 e.MatchTimestamp > RoundTimeline[LastMarkBookIdx].MatchTimestamp
round_events.cpp
26 verify( animation::HandleDetach(am, am->Events[i].event->boneId, cd->holding_right, cd->holding_left, &OnRight) && !OnRight )
situations\bookcase\bookcase.cpp
26 object_system::GetObject(cd->holding_left) && (object_system::GetObject(cd->holding_left)->Type == FNV1(“BOOK”))
situations\bookcase\bookcase.cpp
26 it != am->AnimHandleMap.end()
network.cpp
26 IsTypingString
spyparty.cpp
26 0, “unknown packet type: 8”
spyparty.cpp
25 DefaultCharacterStatePacket && (ndata == CharacterStatePacketSizeBytes)
sniper_client.cpp
25 cd->object_of_interest == d->BookID
situations\bookcase\bookcase.cpp
24 !p2pauth_con && (p2pauthn_state == WAITING_AUTHN)
examples\lobby\lobbyclient.cpp
24 !IsSpy && d->TargetBookcaseID
situations\bookcase\bookcase.cpp
24 HoldingDrink
situations\drinks\drinks.cpp
23 obj->Rotation.IsIdentity()
character.cpp
23 HoldingBook
situations\bookcase\bookcase.cpp
23 !err, “krb5_rd_priv err: -1765328342”
examples\lobby\lobbyclient.cpp
22 (w >= 0) && (w <= 1)
object_system\subsystems\animation\animation_cal3dutils.cpp
22 d_cust && (d_cust->State == drinks_data::INVALID)
situations\serving\serving.cpp
21 !”somehow in a valid state but !HoldingDrink?!”
situations\drinks\drinks.cpp
20 propid && ( (Spy->holding_right == propid) || (Spy->holding_left == propid))
situations\steal_statue\steal_statue.cpp
20 !HoldingDrink
situations\drinks\drinks.cpp
18 !”somehow didn’t get statue”
situations\pedestal\pedestal.cpp
18 ad && (ad->Object->Type == object_types::STATUE)
situations\pedestal\pedestal.cpp
16 o && !(o->Flags & object::UNMANAGED)
object_system\object_manager.cpp
16 am->queue.empty()
situations\drinks\drinks.cpp
13 !”something went wrong picking up briefcase!”
situations\briefcase\briefcase.cpp
12 !”should have detached”
situations\bookcase\bookcase.cpp
11 !”should not get here”
situations\conversation\conversation.cpp
11 0, “unknown packet type: 11”
spyparty.cpp
10 (w >= 0) && (w <= 1) && (u >= 0) && (u <= 1) && (v >= 0) && (v <= 1)
checkerlib\misc\geomutils.cpp
10 verify( network::SendPacket(&gs, sizeof(gs), true) )
spy_server.cpp
10 (rem >= 0.0f) && (rem < 1.0f)
spyparty.cpp
10 Pedestal
situations\pedestal\pedestal.cpp
10 (err = glGetError()) == GL_NO_ERROR, “code: 0x00000505”
spyparty.cpp
9 Statue && (Statue->Type == object_types::STATUE)
object_utils.cpp
9 Level
network.cpp
9 !err
examples\lobby\async_krb5_wrapper.cpp
9 0, “unknown packet type: 1”
sniper_client.cpp
8 SpyCheck == player_control::state::TESTING
situations\check_watch\check_watch.cpp
8 ObjectIDToByte.empty() || nettest_mode
network.cpp
8 !IsSpy && cd->object_of_interest
situations\pedestal\pedestal.cpp
8 ByteToObjectID.empty() || nettest_mode
network.cpp
8 0, “unknown packet type: 20”
sniper_client.cpp
8 0, “unknown packet type: 13”
spy_server.cpp
8 0, “unknown packet type: 13”
spyparty.cpp
7 ObjectIDToByte.empty() && ByteToObjectID.empty()
network.cpp
7 (CameraMode == SNIPER_CAMERA) && Level
round_events.cpp
7 0, “unknown packet type: 24”
spy_server.cpp
6 SwapStatueID && ( (Spy->holding_right == SwapStatueID) || (Spy->holding_left == SwapStatueID))
situations\steal_statue\steal_statue.cpp
6 RoundTimeline.size() < round_events_packet::MAX_NUM_EVENTS
round_events.cpp
6 o
network.cpp
6 mc && (mc->Object->Type == object_types::STATUE) && (StatueMeshIndex < mc->Meshes.size())
object_utils.cpp
6 IsChatAllowed()
spyparty.cpp
6 ad
object_utils.cpp
6 0, “unknown packet type: 21”
sniper_client.cpp
5 !”shouldn’t get here”
situations\steal_statue\steal_statue.cpp
5 mark_value <= 1.0f
spyparty.cpp
5 client && client->OtherClientID
spyparty_lobby.cpp
5 0, “unknown packet type: 25”
spy_server.cpp
4 !”shouldn’t get here, must have moved while holding book”
situations\book_transfer\book_transfer.cpp
4 object_system::GetObject(History.States[i].ID)
network.cpp
4 Level
sniper_client.cpp
4 Level->ActiveGameTypeIndex < Level->GameTypes.size()
sniper_client.cpp
4 !err
examples\lobby\lobbyclient.cpp
4 am
network.cpp
4 0, “unknown packet type: 10”
spyparty.cpp
3 SpyCheck == player_control::state::TESTING
situations\seduction\seduction.cpp
3 !”shouldn’t get here”
examples\lobby\lobbyclient.cpp
3 i_ind == i_dep+1
character.cpp
3 !Focus
player_control.cpp
3 Control->GetSpyTriggeredResult(this) == player_control::state::TESTING
situations\double_agent\double_agent.cpp
3 !(ChooserCurrentCharacter->Object->Flags & object_system::object::UNMANAGED)
ui.cpp
3 am->Events[i].animation->getCoreAnimation()
situations\drinks\drinks.cpp
3 0, “unknown packet type: 16”
spyparty.cpp
3 0, “unknown packet type: 15”
spyparty.cpp
2 verify( network::SendPacket(&p, sizeof(p), true) )
network.cpp
2 verify( network::SendPacket(&gs, sizeof(gs), true) )
spyparty.cpp
2 verify( ConfirmGameIDToLobby(CurrentPlayID, CurrentGameID) )
spy_server.cpp
2 verify( animation::GetPlayingAnimationInfo(am, &time, &duration) )
situations\book_transfer\book_transfer.cpp
2 t >= 0
round_events.cpp
2 State.CurrentNode->Parent->StringSoFar == State.CurrentLeaf->StringSoFar
spyparty.cpp
2 s
checkerlib\misc\glutils.cpp
2 mc
situations\steal_statue\steal_statue.cpp
2 glMultiTexCoord2f_ && glActiveTexture_
spy_server.cpp
2 GameIDs.find(PacketPlayID) == GameIDs.end()
examples\lobby\lobbyclient.cpp
2 !err && p2pauth_con && krbtgt.data && (krbtgt.length < KRBTGT_LIMIT)
examples\lobby\lobbyclient.cpp
2 (err = glGetError()) == GL_NO_ERROR, “code: 0x00000506”
spyparty.cpp
2 !decoder.underflowed() && decoder.on_last_byte()
examples\lobby\lobbyclient.cpp
2 CurrentNetworkObjectID
network.cpp
2 !ChooserScrollDrag
ui.cpp
2 (cd->object_of_interest == d->BookID) || (ps->status != pathing::PATH_success)
situations\bookcase\bookcase.cpp
2 cd->object_of_interest && (cd->object_of_interest == d->BookID)
situations\bookcase\bookcase.cpp
2 ATPCachedDescription
player_control.cpp
2 Array && Num
checkerlib\misc\utils.h
2 am->playing.type == core_book_hidefilm_okay
situations\book_transfer\book_transfer.cpp
2 0, “unknown packet type: 8”
sniper_client.cpp
2 0, “unknown packet type: 2”
spy_server.cpp
2 0, “unknown packet type: 22”
sniper_client.cpp
1 verify( network::SendPacket(DefaultCharacterStatePacket, CharacterStatePacketSizeBytes, true) )
spy_server.cpp
1 verify( network::SendPacket(CommandsPacketBuffer, SizeBytes, true) )
network.cpp
1 verify( Camera.Project(vector_3(0, 0,0), &origin) )
spyparty.cpp
1 StatueAD && (StatueAD->Object->Type == object_types::STATUE)
situations\pedestal\pedestal.cpp
1 !”somehow didn’t get book”
situations\bookcase\bookcase.cpp
1 ScreamSound
spy_server.cpp
1 network::IsConnected()
spy_server.cpp
1 it != NetTestByteMap.end()
network.cpp
1 fabs(1-Length2(Axis)) < 1e-5
checkerlib\misc\math4d.h
1 (err = glGetError()) == GL_NO_ERROR, “code: 0x00000502”
spyparty.cpp
1 DefaultCharacterStatePacket && (ndata == sizeof(*DefaultCharacterStatePacket) + DefaultCharacterStatePacket->NumCharacters*sizeof(DefaultCharacterStatePacket->States[0]))
sniper_client.cpp
1 !decoder.underflowed()
network.cpp
1 !decoder.underflowed() && decoder.on_last_byte()
network.cpp
1 (d >= 0) && (d <= 1)
c:\users\checker\dev\spyparty\project\spyparty\code\network.h
1 CharacterStatePacket && (CharacterStatePacket->NumCharacters <= MaxNumCharacters)
network.cpp
1 ByteToObjectID.find(CurrentNetworkObjectID) == ByteToObjectID.end()
network.cpp
1 0, “unknown packet type: 8”
spy_server.cpp
1 0, “unknown packet type: 6”
spy_server.cpp
1 0, “unknown packet type: 6”
sniper_client.cpp
1 0, “unknown packet type: 4”
sniper_client.cpp
1 0, “unknown packet type: 2”
sniper_client.cpp
1 0, “unknown packet type: 25”
sniper_client.cpp
1 0, “unknown packet type: 19”
spyparty.cpp
1 0, “unknown packet type: 18”
spyparty.cpp
1 0, “got new play id with existing: 0xca98 “
sniper_client.cpp
1 0, “got new play id with existing: 0x57a5 “
sniper_client.cpp
1 0, “got new play id with existing: 0x5267 “
sniper_client.cpp
1 0, “got new play id with existing: 0x3e15 “
sniper_client.cpp
1 0, “got new play id with existing: 0x33fc “
sniper_client.cpp
1 0, “got new play id (0xabeb) with existing (0xcdb6) “
sniper_client.cpp
1 0, “got new play id (0x6f9f) when already playing with existing (0xfb2e) “
sniper_client.cpp
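For the curious, a tally like the one above can be produced with a few lines of code.  This is only a sketch of one way to do it, not the script actually used: it assumes each logged report has already been reduced to an (expression, file) pair, and Site and TallyByFrequency are made-up names.  Keying on the pair matches the list above, where the same expression (such as ad or an unknown packet type) shows up as separate rows for different files.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One logged assert, reduced to (expression, file). Hypothetical representation.
using Site = std::pair<std::string, std::string>;

// Counts how often each (expression, file) pair occurs and returns
// (count, site) rows sorted from most to least frequent.
std::vector<std::pair<int, Site>> TallyByFrequency(const std::vector<Site>& reports) {
    std::map<Site, int> counts;
    for (const Site& s : reports) ++counts[s];

    std::vector<std::pair<int, Site>> rows;
    for (const auto& kv : counts) rows.emplace_back(kv.second, kv.first);

    // stable_sort keeps the map's alphabetical order among rows with equal counts.
    std::stable_sort(rows.begin(), rows.end(),
                     [](const auto& a, const auto& b) { return a.first > b.first; });
    return rows;
}
```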
  1. Currently at 16,681 people as of this post.

11 Comments

  1. jordy says:

    Interested to see how a full-sized community will change and evolve SpyParty game-play.

  2. nat says:

    I cannot wait until I can get into the beta. Every time I get an email notification on my phone I hope it is my beta invite. Excited once you roll out all the invites

  3. Quirken says:

    I think I’d go crazy if I had a todo list as long as yours is. Still, the promise of finally getting paid has got to be a solid motivator.

    Nice to hear an update!

    • checker says:

      Getting paid will be nice, since I’m definitely spending lots of my savings, but the bigger motivator is finally getting all these people invited in and playing!  Once that’s mostly working I can go back to working on the game itself!

  4. Wessel says:

    Hey, great, a new post! Thanks!

    • Keith says:

      I agree, i enjoy reading about what’s going on in that studio of his. I would demand a new post every week, but i don’t think that’d be possible.

  5. Phil says:

    My thoughts when reading the list of asserts: “Ah yes… of course… we need to compound the hyperlink structure code and hack into the main cpu to boost optimal paths.”

    My signup request date-time is 2011-05-10 17:01:05 and last week you tweeted you’re at 2011-05-10 16:18:43… I’m so very, very close…

  6. Exetera says:

    Why make bots when you can just invite real players? I only have sixteen minutes and forty seconds left…

    • checker says:

      Bots don’t judge me when my code breaks…that I know of.

    • m0r7if3r says:

      Bots judging people’s bad code was what started the robot uprising before the beginning of Terminator.
