<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SpyParty - A Spy Game About Subtle Behavior</title>
	<atom:link href="http://www.spyparty.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.spyparty.com</link>
	<description>Chris Hecker&#039;s new espionage game about subtle behavior, deception, performance, and perception.</description>
	<lastBuildDate>Tue, 21 May 2013 19:43:10 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Loadtesting for Open Beta, Part 4: Done optimizing the lobbyserver!</title>
		<link>http://www.spyparty.com/2013/05/21/loadtesting-for-open-beta-part-4-done-optimizing-the-lobbyserver/</link>
		<comments>http://www.spyparty.com/2013/05/21/loadtesting-for-open-beta-part-4-done-optimizing-the-lobbyserver/#comments</comments>
		<pubDate>Tue, 21 May 2013 06:13:11 +0000</pubDate>
		<dc:creator>checker</dc:creator>
				<category><![CDATA[beta]]></category>
		<category><![CDATA[indie games]]></category>
		<category><![CDATA[metrics]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.spyparty.com/?p=3210</guid>
		<description><![CDATA[Check out Loadtesting for Open Beta, Part 1, Part 2, and Part 3 to read the previous installments of this epic tale! It&#8217;s been a while since the last update in this series, sorry about that!  At the end of Part 3, I mentioned the SimCity launch giving me pause about my goal of testing [...]]]></description>
				<content:encoded><![CDATA[<p><em>Check out <a title="Loadtesting for Open Beta, Part 1" href="http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/">Loadtesting for Open Beta, Part 1</a>, <a title="Loadtesting for Open Beta, Part 2" href="http://www.spyparty.com/2013/03/03/loadtesting-for-open-beta-part-2/">Part 2</a>, and <a title="Loadtesting for Open Beta, Part 3" href="http://www.spyparty.com/2013/03/18/loadtesting-for-open-beta-part-3/">Part 3</a> to read the previous installments of this epic tale!</em></p>
<p>It&#8217;s been a while since the last update in this series, sorry about that!  At the end of <a title="Loadtesting for Open Beta, Part 3" href="http://www.spyparty.com/2013/03/18/loadtesting-for-open-beta-part-3/">Part 3</a>, I mentioned the <a href="http://kotaku.com/tag/sim-city">SimCity launch</a> giving me pause about my goal of testing the <strong>SpyParty</strong> lobbyserver to 1000 simultaneous robots.  Well, I got scared enough after their launch that I increased my optimization target to 2000 simultaneous robots on my old and slow server, and then I also decided to bite the bullet and upgrade the server hardware after I hit 2000 to give myself some extra headroom.  I really don&#8217;t think I&#8217;m going to hit these numbers at Open Beta launch or even for a long time after that, but I&#8217;d rather err on the conservative side and have it purr along nicely.</p>
<p>Since I waited so long to post this Part 4, I can&#8217;t really give a play-by-play of all the optimizations I did as they happened, so I&#8217;m going to give the general arc I followed, and then talk about some of the interesting stops along the way.</p>
<a name="iprof%2C+atop%2C+oprofile%2C+et+al."></a><h3>iprof, atop, oprofile, et al.</h3>
<p>As I mentioned at the end of the last post, I&#8217;d fixed some of the huge and obvious things with the network bandwidth usage, so it was time to start profiling the CPU usage.  There are lots of different kinds of profilers, but the one I use the most is based on <a href="http://silverspaceship.com/src/iprof/">Sean Barrett&#8217;s iprof</a>.  I&#8217;ve modified it a fair bit over the years,<sup><a href="http://www.spyparty.com/2013/05/21/loadtesting-for-open-beta-part-4-done-optimizing-the-lobbyserver/#footnote_0_3210" id="identifier_0_3210" class="footnote-link footnote-identifier-link" title="I&rsquo;ll&nbsp; release my changes at some point.">1</a></sup> but the core of the system is still the same.  It&#8217;s a runtime profiler that requires instrumenting your code into blocks, it&#8217;s efficient enough that you can leave it on all the time as long as you don&#8217;t stick a &#8220;prof block&#8221; in an inner loop, and you can generally see where you&#8217;re spending your time hierarchically.  It can draw to the screen, but I also have it output to a string, and so on the lobbyserver I can have it output to the log after a spike, and also catch a signal I send and it&#8217;ll force a prof dump.  Here&#8217;s an example:</p>
<pre style="padding-left: 30px;"><span style="font-size: x-small;">2013/04/17-16:12:10: 85.156 ms/frame (fps: 11.74)  sort self - current frame</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10: zone                                                     self     hier    count</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:  ProcessMessages                                      59.2910  59.2910     1.00</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10: +Send                                                 17.8164  18.4493  1120.69</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10: +ClientsUpkeepAndCloseLoop                             2.3034   2.6989   793.15</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:  Log                                                   1.3559   1.3559    25.97</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:  unpack_bytes                                          0.7311   0.7311    26.56</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:  unpack                                                0.6674   0.6674  3551.71</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10: +ClientsPacketLoop                                     0.5492   3.7160   792.58</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10: +ClientsUpdated                                        0.4023  10.9843     0.56</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10: +SendQueuedClientRoomMessages                          0.2524   7.3086     1.00</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:  FindClientByID                                        0.2494   0.2494   267.05</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10: +JournalQueuedSave                                     0.2374   0.3882     1.43</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10: +Tick                                                  0.2112  25.6313     1.00</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:  iprof_update                                          0.2051   0.2051     1.00</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10: +JournalSavePrep                                       0.1560   0.5442     1.43</span></pre>
<p>As you can see, it&#8217;s pretty easy to read, and you can drill down on individual blocks and see who calls them and who they call:</p>
<pre style="padding-left: 30px;"><span style="font-size: x-small;">2013/04/17-16:12:10: 85.156 ms/frame (fps: 11.74)  sort graf - current frame</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10: zone                                                     self     hier    count</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:     LoginReply                                         0.0006   0.0006     0.01</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:     JOINING                                            0.0007   0.0008     0.03</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:    +TYPE_CLIENT_GAME_ID_REQUEST_PACKET                 0.0011   0.0014     0.44</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:    +NewWaitingForJoinClients                           0.0019   0.0019     0.01</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:    +TYPE_CLIENT_PLAY_PACKET                            0.0086   0.0087     0.12</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:    +TYPE_CLIENT_INVITE_PACKET                          0.0213   0.0225     2.37</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:    +TYPE_CLIENT_IN_MATCH_PACKET                        0.0280   0.0282     0.49</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:    +JournalQueuedSave                                  0.0563   0.0571     1.18</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:    +RoomsChanged                                       0.0852   0.0939    15.38</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:    +NewInLobbyClients                                  0.1155   0.1342    14.55</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:    +ClientsUpkeepAndCloseLoop                          0.1362   0.2093   150.55</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:    +TYPE_CLIENT_MESSAGE_PACKET                         0.3122   0.3170     8.87</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:    +SendQueuedClientRoomMessages                       6.7122   7.0013   501.99</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:    +ClientsUpdated                                    10.3361  10.5720   424.67</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10: -Send                                                 17.8164  18.4493  1120.69</span><br /><span style="font-size: x-small;">2013/04/17-16:12:10:     unpack                                             0.6329   0.6329  3362.06</span></pre>
<p>This is super useful.  The biggest downside to it is that it&#8217;s not thread-aware, but I&#8217;ve made it thread-safe via the brute force method of having it ignore all threads that aren&#8217;t the &#8220;main&#8221; thread.  My code is mostly single-threaded, but the threadedness increased a fair bit during these optimizations, so I hope to eventually modify iprof to be thread-aware without losing too much simplicity and performance.  However, until I make those modifications, the time taken by any background thread activity will show up attributed to one of these main-thread blocks.  You can still get useful data, you just have to be aware of this.  For example, ProcessMessages in the profile above is hiding a WaitForMultipleObjectsEx call on Windows, or a call to select/epoll on POSIX, so it&#8217;s not actually taking that much active time on the main thread.</p>
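<p>To make the &#8220;self&#8221; and &#8220;hier&#8221; columns concrete, here is a minimal sketch of a block-instrumented profiler.  It is not iprof and does not use iprof&#8217;s API; the <code>ZoneProfiler</code> class, its <code>zone</code> method, and the injectable <code>clock</code> are all hypothetical names made up for illustration.  Each block records its total elapsed time (hier) and subtracts the time spent in nested blocks to get its own time (self).</p>

```python
import time
from contextlib import contextmanager


class ZoneProfiler:
    """Toy hierarchical block profiler (hypothetical, not iprof's API)."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.stack = []   # one [name, child_time] entry per open zone
        self.stats = {}   # name -> {"self": t, "hier": t, "count": n}

    @contextmanager
    def zone(self, name):
        entry = [name, 0.0]
        self.stack.append(entry)
        start = self.clock()
        try:
            yield
        finally:
            elapsed = self.clock() - start
            self.stack.pop()
            if self.stack:
                # Charge this zone's whole time to its parent as child time.
                self.stack[-1][1] += elapsed
            s = self.stats.setdefault(
                name, {"self": 0.0, "hier": 0.0, "count": 0})
            s["hier"] += elapsed              # time including children
            s["self"] += elapsed - entry[1]   # time excluding children
            s["count"] += 1

    def report(self):
        """Return (name, self, hier, count) rows, largest self time first."""
        rows = sorted(self.stats.items(),
                      key=lambda kv: kv[1]["self"], reverse=True)
        return [(name, s["self"], s["hier"], s["count"]) for name, s in rows]
```

<p>Wrapping a chunk of code in <code>with prof.zone("Send"):</code> inside a <code>with prof.zone("Tick"):</code> block gives the same kind of parent/child accounting as the dumps above.  Like the real thing, this sketch measures elapsed time on one thread only, so time spent blocked in a wait still lands in whatever block was open.</p>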
<p>I also used <a href="http://oprofile.sourceforge.net/news/">oprofile</a>, which is a nice sampling profiler on Linux that can profile per-thread using just the debug information in an application, and <a href="http://atoptool.nl/">atop</a> for keeping track of things happening on the machine as a whole.</p>
<p>Here&#8217;s a list of the stuff I ended up optimizing:</p>
<ul>
<li>I was originally sending out the chat messages to all clients as they came in, but I started queuing them up and sending them all out at once to reduce send calls.  Of course, once you do this, you have to make sure you don&#8217;t overflow the network packet if you have queued a lot of messages that tick, which makes the code more complicated and harder to modify.  That&#8217;s a tradeoff one often has to make while optimizing, and it&#8217;s why you want to put off most optimization until you need it&#8230;although you should have a rough plan for how you&#8217;ll optimize a piece of code in the future even if you write it the dumb way first.</li>
<li>I made more threads, including putting network sending and receiving on separate threads, making a separate thread for logging, and a thread for saving files to the disk.  There were already threads for talking to the database and Kerberos, for receiving network packets, and for checking for new client builds.  These are all relatively simple threads to add, because they&#8217;re all just throwing data into a queue on one thread and taking it out on another, although multithreading a program always makes it harder to understand.  I discovered a fair number of deadlock bugs in <a href="https://developers.google.com/talk/libjingle/">libjingle</a>, the library I&#8217;m using for <a href="http://en.wikipedia.org/wiki/NAT_traversal">NAT traversal</a> and some cross-platform threading stuff, and I&#8217;ve fixed some of them.  I&#8217;ve veered far enough from the original libjingle code that I&#8217;m probably just going to have to put my version up as a fork, sadly.</li>
<li>I timesliced the login phase for the clients.  Previously, when a client would log in, I&#8217;d process a bunch of stuff immediately, including some authentication stuff which can be somewhat time consuming.  In a load test where hundreds of clients log in to the server at the same time, this would bog down, so I now process a maximum of 20ms worth of clients each tick.  This makes some clients wait a bit longer before they&#8217;re logged in, but doesn&#8217;t result in a positive feedback loop where there&#8217;s a really long tick, so a lot of packets will have arrived while it was happening, so the next tick is really long too, etc.</li>
<li>Like the player list packets, I also made the room list packets incremental, and able to span multiple network packets.  This way all the lists of players and rooms that the lobby sends to the clients can be differential and arbitrarily long, so there&#8217;s no more hard limit on the number of clients that can join the lobby.  I think there&#8217;s actually a bug in this code, but I&#8217;ve only ever seen it once, even after tens of thousands of robot sessions, so I just hope it shows up more at some point.</li>
<li>I switched the POSIX networking inner loop in libjingle from <a href="http://linux.die.net/man/2/select">select</a> to <a href="http://linux.die.net/man/4/epoll">epoll</a>.  This was not so much an optimization as it was simply to allow more than 1024 sockets to work at all.  epoll is also a lot faster, but I&#8217;m currently kinda using it in a dumb way, so I&#8217;m not benefiting from that speed boost much yet.</li>
<li>There were also a bunch of smaller traditional code optimizations, like using maps to cache lookups, using free lists to avoid some allocations, and whatnot.  Oh, and don&#8217;t forget to <a href="https://twitter.com/checker/status/335503826939424768">change the ulimit -n settings in limits.conf</a> on Linux, so your process can actually accept a lot of connections!</li>
</ul>
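<p>The login timeslicing from the list above is simple enough to sketch.  This is not the lobbyserver code, just a minimal illustration under some assumptions: pending logins sit in a queue, and the names <code>process_pending_logins</code>, <code>handle_login</code>, and the injectable <code>clock</code> are made up for the example.  Only the 20ms budget comes from the description above.</p>

```python
import time
from collections import deque

LOGIN_BUDGET_SECONDS = 0.020  # the 20ms per-tick budget mentioned above


def process_pending_logins(pending, handle_login,
                           clock=time.perf_counter,
                           budget=LOGIN_BUDGET_SECONDS):
    """Process queued logins until this tick's time budget is spent.

    Always handles at least one client so the queue cannot stall, and
    leaves the rest queued for the next tick. Returns the number handled.
    """
    deadline = clock() + budget
    handled = 0
    while pending:
        handle_login(pending.popleft())
        handled += 1
        if clock() >= deadline:
            break
    return handled


if __name__ == "__main__":
    queue = deque(f"robot{i}" for i in range(500))
    done = process_pending_logins(queue, lambda client: time.sleep(0.001))
    print(done, "logged in this tick,", len(queue), "still waiting")
```

<p>The point of the cap is that a burst of logins stretches out over several short ticks instead of producing one very long tick, which is what breaks the feedback loop described above.</p>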
<p>As I was doing these optimizations, I would run a loadtest with a bunch of robots and profile the lobby.  I was at 500 robots at the end of Part 3, and I slowly raised the ceiling as I improved the code over the weeks:  569 robots&#8230;741 robots&#8230;789 robots, 833, 923, 942, 990, 997, 1008, 1076, 1122, 1158, 1199, 1330, 1372, 1399, 1404, 1445, 1503, 1614, 1635, 1653, 1658, 1659&#8230;</p>
<p>When I hit 1659 it was late one night, and so I stopped for the day.  When I resumed work and did the next couple of optimizations, I figured I&#8217;d get it to 1800 or something.  I always launched 20% or so more robots than I was hoping to support in a given test to account for internet and <a href="http://aws.amazon.com/ec2/">EC2</a> variation, and for plain old bugs in the clients that would sometimes manifest themselves, so this time I must have launched 2500 robots, because when I looked up from the profiles running in ssh terminals and over to my <strong>SpyParty</strong> client logged into the test lobby, I saw this:</p>
<div id="attachment_3224" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/05/SpyParty-v0.1.2681.1-20130501-15-23-57-0.png"><img class="size-large wp-image-3224" title="SpyParty-v0.1.2681.1-20130501-15-23-57-0" alt="" src="http://cdn.spyparty.com/wp-content/uploads/2013/05/SpyParty-v0.1.2681.1-20130501-15-23-57-0-600x415.png" width="600" height="415" /></a><p class="wp-caption-text">This wasn&#8217;t supposed to happen yet.</p></div>
<p>Uh, I guess I was done optimizing?</p>
<p>I was actually kind of disappointed, to be honest.  I had all sorts of cool ideas for optimizations I was planning to do that I&#8217;d come up with while testing and profiling the code, and now, if I was going to follow my own plan and stop when I hit 2000 simultaneous robots, I would have to just take a bunch of notes for next time I optimized so I could pick up where I left off, and move on.  The good news is I&#8217;m pretty sure I can make the lobby almost twice as efficient if and when the time comes to do that!</p>
<a name="Room+at+the+Inn"></a><h3>Room at the Inn</h3>
<p>If you look closely at that screenshot, you&#8217;ll see the thumb on the scrollbar for the player list is pretty small.  That&#8217;s because all 2010 players are in a single room, which is not going to work very well for a lobby full of real people.  In fact, the only reason there were 2010 players in that room was because at the time I&#8217;d limited the room size to 2010 because I didn&#8217;t want to bother teaching the robots how to use rooms.  There were actually a few hundred more robots knocking on the door but they couldn&#8217;t get in.  But, now that I&#8217;d hit my 2k target, it was time to fix that.</p>
<p>I immediately realized I had a problem.  Currently, when you connect to the lobby, it sends you a list of rooms, and you have to pick one to log in.  But, what if the rooms are full?  Oops, you couldn&#8217;t log in.  So, as soon as I set the room size down to something more reasonable, like 100, then the first 100 robots got in and the rest just sat there failing to join.</p>
<p>It seemed like there were a number of solutions to this problem, including allowing players to create new rooms before logging in, but in the end I went with the simplest and most robust solution, which is to have the lobby create a new empty room if all the current rooms are full.  The initial room is always called <em>Headquarters</em>, so I named these new dynamic rooms <em>Headquarters 2</em> and onward.  Very creative, I know.  Somebody suggested using spy movie titles for these room names, but I figured that wouldn&#8217;t scale very well, ignoring the potential copyright issues.  If the lobby ever finds one of these dynamic rooms empty, it kills it, unless all the other rooms are full.  I also have the lobby automatically put you in a now-guaranteed-to-exist-non-full-room if you log in and try to join a full room, even if it wasn&#8217;t full when you clicked on it, so this eliminated a login race condition too, which is always a good sign.</p>
<p>This last bit also made it so I didn&#8217;t need to make the loadtesting robots know very much about rooms:  they always try to join Headquarters and if they don&#8217;t end up there, oh well.  As they join, they kind of spill over into the latest dynamic room until it fills up, and then they continue to the next, kind of like filling up an ice tray with water from one end.  I should probably make them test the actual room features by creating and changing rooms and whatnot, but the single giant 2010 player room was a way more intense loadtest than having 20 rooms with 100 players in each due to the chat broadcasting.</p>
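<p>Here is a rough sketch of the overflow room logic described above.  It is an illustration under assumptions, not the real lobby code: the <code>Lobby</code> class and its methods are hypothetical, it creates a new room lazily when someone needs one (the real lobby may create it ahead of time), and it fills the first room with space rather than specifically the latest one.</p>

```python
ROOM_CAPACITY = 100          # the per-room limit discussed above
BASE_ROOM = "Headquarters"   # the permanent first room


class Lobby:
    """Toy lobby with overflow rooms (hypothetical, not the real code)."""

    def __init__(self, capacity=ROOM_CAPACITY):
        self.capacity = capacity
        self.rooms = {BASE_ROOM: set()}   # room name -> set of player ids
        self.next_suffix = 2              # "Headquarters 2" and onward

    def _room_with_space(self):
        # Fill existing rooms in creation order before making a new one.
        for name, players in self.rooms.items():
            if self.capacity > len(players):
                return name
        name = f"{BASE_ROOM} {self.next_suffix}"
        self.next_suffix += 1
        self.rooms[name] = set()
        return name

    def join(self, player, requested=BASE_ROOM):
        """Join the requested room, or get redirected if it is full or gone."""
        players = self.rooms.get(requested)
        if players is None or len(players) >= self.capacity:
            requested = self._room_with_space()
        self.rooms[requested].add(player)
        return requested

    def leave(self, player, room):
        self.rooms[room].discard(player)
        self.prune()

    def prune(self):
        """Kill empty dynamic rooms unless every other room is full."""
        for name in list(self.rooms):
            if name == BASE_ROOM or self.rooms[name]:
                continue
            others_full = all(len(players) >= self.capacity
                              for other, players in self.rooms.items()
                              if other != name)
            if not others_full:
                del self.rooms[name]
```

<p>Because <code>join</code> always returns the room the player actually ended up in, a robot can ask for Headquarters every time and simply accept wherever it lands, which is the behavior the loadtest relies on.</p>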
<p>I don&#8217;t know if 100 is the right limit for room populations.  100 would still be way too many people to have in a single reasonable conversation, but I didn&#8217;t want to put too low of a limit on the size before I have tested things with humans instead of just robots.</p>
<a name="The+Client"></a><h3>The Client</h3>
<p>There&#8217;s this annoying thing that happens when you&#8217;re testing computer code, and it&#8217;s that you encounter problems and bugs not only in the code you&#8217;re trying to test, but also in your test code.  This was no different.  I was constantly fixing various bugs in the robots that would keep them from all connecting correctly, and I even made sure some of the optimizations helped the client side so I could run more robots on a given EC2 server.  Plus, just making sure the robots keep trying to connect and log in was important, because if there was a timeout due to an initial burst, you want them to try again automatically after it dies down, rather than just sitting there not doing anything.</p>
<p>As I said in <a title="Loadtesting for Open Beta, Part 1" href="http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/">Part 1</a>, I started out running about 50 robots on each m1.small EC2 instance.  That worked okay with a low number of instances, but it didn&#8217;t scale, for some reason I&#8217;m still trying to figure out: as I increased the number of instances, I had to lower the number of robots on each instance, eventually to around 20 per m1.small.  A new AWS account can only start 20 instances, so I did a total of two instance limit requests to Amazon, first to 100 and then to 300.  It&#8217;s scary to have 300 instances running&#8230;even though m1.small instances are only 6¢ an hour each, that&#8217;s still $18 an hour when there are 300 of them running, and Amazon rounds up to the hour, so if you miss shutting them down by a minute you just lost a large pizza!  It looks like Google&#8217;s new <a href="https://cloud.google.com/pricing/compute-engine">Compute Engine</a> thing is about twice as expensive for their somewhat similar low end machine (ignoring performance differences), but charges in 1 minute increments after the first 10 minutes, which might be cheaper for this very transient use-case.</p>
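<p>The billing arithmetic is easy to check with a few lines of code.  This is a back-of-the-envelope sketch, not real pricing: the 6¢ per hour figure is the m1.small number quoted above, while the 12¢ per hour Compute Engine rate is just &#8220;about twice as expensive&#8221; taken literally, and the 10 minute minimum is one reading of how the per-minute billing works.</p>

```python
import math

EC2_CENTS_PER_HOUR = 6    # m1.small price quoted above
GCE_CENTS_PER_HOUR = 12   # assumption: "about twice as expensive"
GCE_MINIMUM_MINUTES = 10  # assumption: 10 minute minimum, then per minute


def ec2_cost_cents(instances, minutes):
    """Hourly billing, rounded up to the next full hour."""
    return instances * math.ceil(minutes / 60) * EC2_CENTS_PER_HOUR


def gce_cost_cents(instances, minutes):
    """Per-minute billing with a minimum charge of 10 minutes."""
    billed_minutes = max(minutes, GCE_MINIMUM_MINUTES)
    return instances * billed_minutes * GCE_CENTS_PER_HOUR / 60


if __name__ == "__main__":
    for minutes in (20, 60, 61):
        ec2 = ec2_cost_cents(300, minutes) / 100
        gce = gce_cost_cents(300, minutes) / 100
        print(f"{minutes} min x 300 instances: EC2 ${ec2:.2f}, GCE ${gce:.2f}")
```

<p>Under those assumptions, a 20 minute test on 300 instances comes to $18 with hourly rounding versus $12 with per-minute billing, while a 61 minute test comes to $36 versus $36.60, so the per-minute model only wins for short runs.</p>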
<p>I seem to remember reading somewhere that Amazon allocates instances for the same account to the same physical machine if possible, which might explain this scaling problem, since it means I was probably maxing out a given piece of server hardware with too many instances bursting at the same time.  It&#8217;s hard to tell if this is the case, and I need to do more testing before saying for sure.  A <a href="http://www.spyparty.com/2013/03/03/loadtesting-for-open-beta-part-2/comment-page-1/#comment-67637">commenter</a> said there might be a packets-per-second limitation in EC2, as well, but I haven&#8217;t verified that.  Once I&#8217;ve tried a few different things, I&#8217;ll do a long technical post on <a href="http://chrishecker.com">chrishecker.com</a> about EC2, <a href="https://www.linode.com/">linode</a>, and my dedicated host machine, comparing the different results I got.</p>
<p>Finally, I had to do some optimization on the <strong>SpyParty</strong> game client when the numbers started getting high.  I went a little nuts with the chat system early on and it has completion on all commands, room names, and player names, but the code that builds the completion tree was calling the memory allocator 35k times per update when the numbers of players got high, so I had to remove some of the stupid in that code as well.</p>
<a name="The+New+Server"></a><h3>The New Server</h3>
<p>With all that done, and 2010 robots running on the old server, I haggled with my hosting provider and started renting a newer and much faster server.  I use <a href="http://www.softlayer.com/">SoftLayer</a> for dedicated hosting, and have for years.<sup><a href="http://www.spyparty.com/2013/05/21/loadtesting-for-open-beta-part-4-done-optimizing-the-lobbyserver/#footnote_1_3210" id="identifier_1_3210" class="footnote-link footnote-identifier-link" title="Well, they were servermatrix when I started, and then The Planet, and now SoftLayer.">2</a></sup> My old server was a Pentium 4 with a single hyperthreaded core, 1GB ram, and a 100Mbps uplink, and the new server is a Xeon 3460 with four hyperthreaded cores, 4GB ram, and 1Gbps uplink, so it&#8217;s slightly more expensive but a lot faster.  That said, everybody seems to be using <a href="http://en.wikipedia.org/wiki/Virtual_private_server">VPS</a> hosts these days.  I talked to some other indie game developers, but I didn&#8217;t have time to do a full evaluation of the tradeoffs, so I went with the devil I knew, so to speak.  It seems like VPS is going to be a bit slower but also a bit cheaper, but the big advantage of VPS to me is that you can move the virtual machine image to faster hardware and have it up and running again in minutes.  That&#8217;s a pretty great scaling sweetspot between having a single physical server and praying it doesn&#8217;t melt, and a scalable system that elastically uses cloud computing The Right Way™, but it&#8217;s also hundreds of times easier to get a VPS image working and then move it to a faster machine than it is to scale elastically.  So, I dunno, it&#8217;s definitely something worth looking into more during the year as I see how things are scaling.</p>
<p>The new server ate the robots for lunch:</p>
<div id="attachment_3230" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/05/SpyParty-v0.1.2703.1-20130518-20-57-40-0.png"><img class="size-large wp-image-3230" title="SpyParty-v0.1.2703.1-20130518-20-57-40-0" alt="" src="http://cdn.spyparty.com/wp-content/uploads/2013/05/SpyParty-v0.1.2703.1-20130518-20-57-40-0-600x421.png" width="600" height="421" /></a><p class="wp-caption-text">The new server works pretty well.</p></div>
<p>For reference, 4850 simultaneous players is pretty far up the <a href="http://store.steampowered.com/stats/">top 100 Steam games by player count</a>, so I don&#8217;t think I have to worry about those numbers for a while.  Here&#8217;s atop&#8217;s view of things:</p>
<div id="attachment_3237" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/05/2013-05-18-20_47_06-atop.png"><img class="size-large wp-image-3237" title="2013-05-18 20_47_06-atop" alt="" src="http://cdn.spyparty.com/wp-content/uploads/2013/05/2013-05-18-20_47_06-atop-600x344.png" width="600" height="344" /></a><p class="wp-caption-text">Well within parameters.</p></div>
<a name="What%26%238217%3Bs+Next%3F"></a><h3>What&#8217;s Next?</h3>
<p>So, that&#8217;s it for the lobbyserver loadtesting.  Now I need to move the website and registration system over to the new server, test them a bit, and start inviting everybody in in big batches.  Soon I&#8217;ll send out email to the beta testers to set up some scheduled human loadtests as well.  The robots will be jealous, left out in the cold, looking in at all the humans actually playing the game. </p>
<p>Open Beta is fast approaching.</p>
<hr/><ol class="footnotes"><li id="footnote_0_3210" class="footnote">I&#8217;ll release my changes at some point.</li><li id="footnote_1_3210" class="footnote">Well, they were servermatrix when I started, and then The Planet, and now SoftLayer.</li></ol>]]></content:encoded>
			<wfw:commentRss>http://www.spyparty.com/2013/05/21/loadtesting-for-open-beta-part-4-done-optimizing-the-lobbyserver/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Come ask questions of me and Jonathan Blow in San Francisco on Wednesday!</title>
		<link>http://www.spyparty.com/2013/05/06/come-ask-questions-of-me-and-jonathan-blow-in-san-francisco-on-wednesday/</link>
		<comments>http://www.spyparty.com/2013/05/06/come-ask-questions-of-me-and-jonathan-blow-in-san-francisco-on-wednesday/#comments</comments>
		<pubDate>Tue, 07 May 2013 00:25:15 +0000</pubDate>
		<dc:creator>checker</dc:creator>
				<category><![CDATA[design]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[indie games]]></category>
		<category><![CDATA[miscellaneous]]></category>

		<guid isPermaLink="false">http://www.spyparty.com/?p=3200</guid>
		<description><![CDATA[Update:  Okay, the video is up from this talk, it&#8217;s a mix of general indie game stuff, some SpyParty and The Witness stuff, and miscellaneous stuff: Jonathan Blow and I will be answering—or trying to answer—questions from the audience on Wednesday, May 8th, at 7pm PDT in San Francisco as guests of the SF IGDA. [...]]]></description>
				<content:encoded><![CDATA[<p>Update:  Okay, the video is up from this talk, it&#8217;s a mix of general indie game stuff, some <strong>SpyParty</strong> and <a href="http://the-witness.net">The Witness</a> stuff, and miscellaneous stuff:</p>
<p><a href="http://www.spyparty.com/2013/05/06/come-ask-questions-of-me-and-jonathan-blow-in-san-francisco-on-wednesday/"><em>Click here to view the embedded video.</em></a></p>
<p><a href="http://the-witness.net">Jonathan Blow</a> and I will be answering—or trying to answer—questions from the audience on Wednesday, May 8th, at 7pm PDT in San Francisco as guests of the <a href="https://www.facebook.com/IGDASanFrancisco">SF IGDA</a>.  We did a kind of similar thing <a href="http://kotaku.com/5923134/weve-got--jonathan-blow-the-witness-braid-and-chris-hecker-spy-party-here-to-answer-your-best-questions">a while back on Kotaku</a>, and it turned out pretty well, so hopefully it&#8217;ll work again this time.  It&#8217;s open to the public, and it&#8217;s hosted by Dolby Labs:</p>
<p style="padding-left: 30px;">Dolby Laboratories Inc.<br />100 Potrero Ave, San Francisco, CA</p>
<p>Here&#8217;s the <a href="https://groups.google.com/forum/?fromgroups=#!topic/igda-sf/W3IcSo0D554">official announcement</a>.  I assume the questions will mostly be about indie game development.  I hope most of the questions are about indie game development.  Come ask questions about indie game development, please.</p>
<div id="attachment_3203" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/05/jon-loves-chris-DSCN0932.jpg"><img class="size-large wp-image-3203" title="Friendship" src="http://cdn.spyparty.com/wp-content/uploads/2013/05/jon-loves-chris-DSCN0932-600x450.jpg" alt="" width="600" height="450" /></a><p class="wp-caption-text">This post was really just an excuse to show you this picture.</p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.spyparty.com/2013/05/06/come-ask-questions-of-me-and-jonathan-blow-in-san-francisco-on-wednesday/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>One Year of SpyParty</title>
		<link>http://www.spyparty.com/2013/04/17/one-year-of-spyparty/</link>
		<comments>http://www.spyparty.com/2013/04/17/one-year-of-spyparty/#comments</comments>
		<pubDate>Thu, 18 Apr 2013 04:40:19 +0000</pubDate>
		<dc:creator>checker</dc:creator>
				<category><![CDATA[art]]></category>
		<category><![CDATA[beta]]></category>
		<category><![CDATA[miscellaneous]]></category>

		<guid isPermaLink="false">http://www.spyparty.com/?p=3189</guid>
		<description><![CDATA[Wow, SpyParty has the best beta testers in the world!  Not only are they really patient while they&#8217;re waiting for invites, then they&#8217;re patient while finding good repros for bugs, then more patient while waiting for fixes for all the bugs they find, and they&#8217;re patient and super helpful mentoring new players&#8230;but they&#8217;re also hilarious [...]]]></description>
				<content:encoded><![CDATA[<p>Wow, <strong>SpyParty</strong> has the best beta testers in the world!  Not only are they really patient while they&#8217;re waiting for invites, then they&#8217;re patient while <a title="How to Report Bugs the SpyParty Way" href="http://www.spyparty.com/2012/04/12/how-to-report-bugs-the-spyparty-way/">finding good repros for bugs</a>, then more patient while <a title="One Bug’s Story, or, Assume it’s a bug!" href="http://www.spyparty.com/2013/02/09/one-bugs-story-or-assume-its-a-bug/">waiting for fixes for all the bugs they find</a>, and they&#8217;re patient and super helpful mentoring new players&#8230;but they&#8217;re also hilarious and fun and creative and just generally great to hang out with!</p>
<p>I&#8217;ve been heads down on <a title="Loadtesting for Open Beta, Part 3" href="http://www.spyparty.com/2013/03/18/loadtesting-for-open-beta-part-3/">trying to get the lobby scalable for open beta</a>, so I didn&#8217;t even realize today is the one year anniversary of <a href="http://www.spyparty.com/2012/04/17/the-welcome-post-and-tutorial-video-from-the-beta-forums/comment-page-1/#comment-39116">the first batch of invites I sent to strangers</a>!  To celebrate this auspicious day, <a href="https://twitter.com/zerotka"><strong>zerotka</strong></a> made this amazing video and posted it on the forums; it&#8217;s simply wonderful:</p>
<p><a href="http://www.spyparty.com/2013/04/17/one-year-of-spyparty/"><em>Click here to view the embedded video.</em></a></p>
<p>Thank you so much, I <span style="color: #ff0000;">♥</span> you all!  More invites soon!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.spyparty.com/2013/04/17/one-year-of-spyparty/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Loadtesting for Open Beta, Part 3</title>
		<link>http://www.spyparty.com/2013/03/18/loadtesting-for-open-beta-part-3/</link>
		<comments>http://www.spyparty.com/2013/03/18/loadtesting-for-open-beta-part-3/#comments</comments>
		<pubDate>Mon, 18 Mar 2013 06:37:06 +0000</pubDate>
		<dc:creator>checker</dc:creator>
				<category><![CDATA[beta]]></category>
		<category><![CDATA[indie games]]></category>
		<category><![CDATA[metrics]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.spyparty.com/?p=3139</guid>
		<description><![CDATA[Read Loadtesting for Open Beta, Part 1 and Part 2 to catch up on the spine-tingling story so far! When we last left our hero, our differential state update change was a resounding success and reduced the network bandwidth utilization from 98% to 3%, and it looked like we could move on to optimizing the [...]]]></description>
				<content:encoded><![CDATA[<p><em>Read <a title="Loadtesting for Open Beta, Part 1" href="http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/">Loadtesting for Open Beta, Part 1</a> and <a title="Loadtesting for Open Beta, Part 2" href="http://www.spyparty.com/2013/03/03/loadtesting-for-open-beta-part-2/">Part 2</a> to catch up on the spine-tingling story so far!</em></p>
<p><a title="Loadtesting for Open Beta, Part 2" href="http://www.spyparty.com/2013/03/03/loadtesting-for-open-beta-part-2/">When we last left our hero</a>, our differential state update change was a resounding success and reduced the network bandwidth utilization from 98% to 3%, and it looked like we could move on to optimizing the lobbyserver code itself to get to our goal of 1000 simultaneous loadtesting robots, until we noticed <a title="Loadtesting for Open Beta, Part 2" href="http://www.spyparty.com/2013/03/03/loadtesting-for-open-beta-part-2/#Up+Next,+The+Case+of+the+Missing+Robots">some of our robots were missing</a>!  This led me on a wild and wooly chase through the code, which I will recount for you now&#8230;</p>
<a name="Where%26%238217%3Bd+the+robots+go%3F"></a><h3>Where&#8217;d the robots go?</h3>
<p>The first order of business was to figure out why some robots were dying when they <em>weren&#8217;t</em> supposed to, and some weren&#8217;t dying when they <em>were</em> supposed to.  Robots: they never do what you tell them.</p>
<p>If you look at this graph of the number of running robots from last time, you can see that right off the bat, a bunch of them die on all the machines, and then they keep dying for about 30 seconds, and then it stabilizes.  Each of these machines should have 50 robots running solidly during the test period.</p>
<div id="attachment_3116" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/03/ec2-loadtest-client-counts.png"><img class="size-large wp-image-3116" title="ec2-loadtest-client-counts" src="http://cdn.spyparty.com/wp-content/uploads/2013/03/ec2-loadtest-client-counts-600x344.png" alt="" width="600" height="344" /></a><p class="wp-caption-text">The number of loadtest robots running on each EC2 instance.</p></div>
<p>Then, to make matters worse, some of them don&#8217;t die when they&#8217;re supposed to at the end of the test.  In the graph above, they only all finally die when I kill them manually from a separate script at 03:16:30.  This points towards two different problems I&#8217;m going to have to debug on the test machines&#8230;that only manifest themselves intermittently&#8230;with gdb&#8230;in the cloud. Good times!</p>
<p>Okay, first things first, let&#8217;s see if the robots will tell me where they&#8217;re going.  The lobbyclient robots can output verbose log files, but I had them turned off because I was worried about bogging down the client machines.  It turns out this isn&#8217;t much of a problem as I&#8217;ll discuss below, so I turned on logging and re-ran a test.  Then I ssh&#8217;d into one of the servers, and looked at the log files.  Well, before I looked at the files themselves, I just did an <span style="font-family: courier new,courier;">ls</span> of the directory:</p>
<pre style="padding-left: 30px;">-rw-r--r-- 1 root root 258577 Mar  5 03:02 out59<br />-rw-r--r-- 1 root root 332320 Mar  5 03:02 out60<br />-rw-r--r-- 1 root root 177743 Mar  5 03:02 out61<br />-rw-r--r-- 1 root root 181639 Mar  5 03:02 out62<br />-rw-r--r-- 1 root root 264535 Mar  5 03:02 out63<br />-rw-r--r-- 1 root root 333515 Mar  5 03:02 out64<br />-rw-r--r-- 1 root root 282875 Mar  5 03:02 out65<br />-rw-r--r-- 1 root root 271040 Mar  5 03:02 out66<br />-rw-r--r-- 1 root root    264 Mar  5 03:01 out67<br />-rw-r--r-- 1 root root    264 Mar  5 03:01 out68<br />-rw-r--r-- 1 root root 284838 Mar  5 03:02 out69<br />-rw-r--r-- 1 root root 332967 Mar  5 03:02 out70<br />-rw-r--r-- 1 root root 303352 Mar  5 03:02 out71<br />-rw-r--r-- 1 root root 310596 Mar  5 03:02 out72<br />-rw-r--r-- 1 root root 194669 Mar  5 03:02 out73<br />-rw-r--r-- 1 root root 313193 Mar  5 03:02 out74<br />-rw-r--r-- 1 root root 238246 Mar  5 03:02 out75<br />-rw-r--r-- 1 root root 264190 Mar  5 03:02 out76<br />-rw-r--r-- 1 root root 198096 Mar  5 03:02 out77<br />-rw-r--r-- 1 root root 233980 Mar  5 03:02 out78<br />-rw-r--r-- 1 root root    264 Mar  5 03:01 out79<br />-rw-r--r-- 1 root root    264 Mar  5 03:01 out80<br />-rw-r--r-- 1 root root 301029 Mar  5 03:02 out81<br />-rw-r--r-- 1 root root 299694 Mar  5 03:02 out82<br />-rw-r--r-- 1 root root    264 Mar  5 03:01 out83<br />-rw-r--r-- 1 root root 351158 Mar  5 03:02 out84<br />-rw-r--r-- 1 root root 188071 Mar  5 03:02 out85<br />-rw-r--r-- 1 root root 242228 Mar  5 03:02 out86</pre>
<p>Well, there&#8217;s a clue, at least for the early-dyers.  The contents of those 264 byte log files look like this:</p>
<pre style="padding-left: 30px;">Lobby Standalone Client: 1000.0.0.5<br />init genrand w/0, first val is 1178568022<br />Running for 61 seconds.<br />LobbyClient started, v1000.0.0.5 / v12<br />LobbyClient UDP bound to port 32921<br />lobbyclient: sendto_kdc.c:617: cm_get_ssflags: Assertion `i &lt; selstate-&gt;nfds' failed.</pre>
<p>A-ha!  sendto_kdc.c is a file in the <a href="http://web.mit.edu/Kerberos/">Kerberos</a> libraries, which I use for login authentication.</p>
<p>I really love Kerberos, <a href="http://web.mit.edu/kerberos/dialogue.html">the architecture just feels right to me</a>, the API is simple, clean, and flexible, it&#8217;s cross-platform and open source, so I&#8217;ve been able to contribute features and bug fixes as I&#8217;ve used it and trace into the code when I was confused about something, and the folks at MIT that develop it are smart, knowledgeable, open-minded, and <a href="http://www.google.com/search?hl=en&amp;q=%2B&quot;Chris Hecker&quot; site%3Amail-archive.com kerberos">don&#8217;t mind some crazy indie game developer asking dumb questions</a> about the best way to do things that were pretty clearly not part of the original university and enterprise use-cases.  Most importantly, it&#8217;s battle-tested; it&#8217;s used by tons of different applications, and it&#8217;s the foundation of the modern Windows domain and Xbox authentication systems, so I know it works.  <strong>The last thing you ever want to do is roll your own authentication system.</strong></p>
<p>So, that assert&#8217;s the first place to look for the early-dying robots.</p>
<p>Next, I looked into the never-dying robots.  I logged into one of the machines that still had zombie robots<sup><a href="http://www.spyparty.com/2013/03/18/loadtesting-for-open-beta-part-3/#footnote_0_3139" id="identifier_0_3139" class="footnote-link footnote-identifier-link" title="ZOMBIE ROBOTS!!!">1</a></sup> running, ran <span style="font-family: Courier New,Courier,mono;">pidof lobbyclient</span> to figure out the process ID of one of them, and attached gdb to the robot.  A quick <span style="font-family: Courier New,Courier,mono;">thread apply all backtrace full</span> and I found the thread that was hanging while the main thread was trying to join them and exit cleanly.  It looked like the bad code was in a call to <a href="http://linux.die.net/man/2/poll">poll</a>, and it just so happened it was in sendto_kdc.c as well! I realized I was going to need some debug symbols, but this was easy since I build the Kerberos libraries myself,<sup><a href="http://www.spyparty.com/2013/03/18/loadtesting-for-open-beta-part-3/#footnote_1_3139" id="identifier_1_3139" class="footnote-link footnote-identifier-link" title="I have some local patches I haven&rsquo;t cleaned up enough to contribute yet">2</a></sup> so a quick scp of the debuginfo rpm and reattaching gdb and I could dig down a bit deeper.</p>
<p>The Kerberos libraries are built with optimizations on, which always makes debugging interesting, but I think it builds programming character to debug optimized code, so I don&#8217;t mind.<sup><a href="http://www.spyparty.com/2013/03/18/loadtesting-for-open-beta-part-3/#footnote_2_3139" id="identifier_2_3139" class="footnote-link footnote-identifier-link" title="gdb is not the best for assembly language debugging, but I did learn about &ldquo;layout asm&rdquo;, which helps a bit.">3</a></sup>  Here&#8217;s the code in question:</p>
<pre>    if (in-&gt;end_time.tv_sec == 0)<br />        timeout = -1;<br />    else {<br />        e = k5_getcurtime(&amp;now);<br />        if (e)<br />            return e;<br />        timeout = (in-&gt;end_time.tv_sec - now.tv_sec) * 1000 +<br />            (in-&gt;end_time.tv_usec - now.tv_usec) / 1000;<br />    }<br />    /* We don't need a separate copy of the selstate for poll, but use one<br />     * anyone for consistency with the select wrapper. */<br />    *out = *in;<br />    *sret = poll(out-&gt;fds, out-&gt;nfds, timeout);</pre>
<p>Well, these loadtesting machines are under some load themselves so they can be a bit sluggish, and there&#8217;s a problem with this code in that scenario if the call to k5_getcurtime() happens later than the in-&gt;end_time passed in by the caller.  As it says on the <a href="http://linux.die.net/man/2/poll">poll manpage</a>, <em>&#8220;Specifying a negative value in timeout means an infinite timeout.&#8221;</em>  Digging around on the stack verified the timeout was negative.</p>
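<p>As a minimal sketch of the failure mode (not the actual upstream Kerberos fix, which is not reproduced here), the deadline arithmetic can be wrapped in a helper that clamps at zero, so a deadline that has already passed becomes an immediate return from poll rather than an infinite wait.  The name remaining_ms is hypothetical, and the helper only covers the case where a deadline is set; the original code still needs its separate no-deadline branch that passes -1 on purpose.</p>

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <sys/time.h>

/* Milliseconds left until `end`, measured at `now` (hypothetical helper).
 * A deadline in the past yields 0, never a negative value, because poll()
 * treats a negative timeout as "wait forever". */
int remaining_ms(const struct timeval *end, const struct timeval *now)
{
    long long ms;

    assert(end != NULL && now != NULL);
    ms = (long long)(end->tv_sec - now->tv_sec) * 1000 +
         (long long)(end->tv_usec - now->tv_usec) / 1000;
    if (ms < 0)
        return 0;                           /* deadline already passed */
    return ms > INT_MAX ? INT_MAX : (int)ms; /* keep within poll()'s int */
}
```

<p>With a deadline of 10.0 seconds, a clock reading of 9.5 seconds gives 500, and a clock reading of 11.0 seconds gives 0 where the unclamped expression would give -1000.</p>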
<p>Okay, so now we have a pretty good clue for each of the problems.  The second problem with the poll timeout seemed easy to fix, but the first one was pretty mysterious and might take some real debugging.  I decided to <a href="http://mailman.mit.edu/pipermail/krbdev/2013-March/011451.html">check with the krbdev mailing list</a> to see if they had any ideas while I looked into the problems more deeply.  While doing so, I looked at the main Kerberos source repository and <a href="http://mailman.mit.edu/pipermail/krbdev/2013-March/011452.html">found a commit for the timeout problem</a>, so it had already been fixed in a later version.  I was hoping maybe this was true of the assert as well.  True to form, the most excellent Greg Hudson <a href="http://mailman.mit.edu/pipermail/krbdev/2013-March/011453.html">replied with three more commits</a> he thought might help.  Meanwhile, I hacked the code to loop on a call to sleep() instead of asserting to convert the early-dyers into never-dying zombies so I could attach the debugger, since that&#8217;d worked so well on the second problem.</p>
<p>Sadly, although the negative-timeout-check fixed the original zombies, none of the fixes prevented the assert problem.  It wasn&#8217;t asserting anymore because the asserters were now looping, so now I had more zombies to deal with.</p>
<div id="attachment_3151" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-17-16_45_07-50-not-working.png"><img class="size-large wp-image-3151" title="2013-03-17 16_45_07-50-not-working" src="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-17-16_45_07-50-not-working-600x302.png" alt="" width="600" height="302" /></a><p class="wp-caption-text">Lots of zombie robots!</p></div>
<p>Time to get down and dirty and debug it for real.</p>
<p>As an aside, it&#8217;s a weird feeling when you&#8217;re debugging something on an <a href="http://aws.amazon.com/ec2/">EC2 instance</a>, since you&#8217;re paying for it hourly.  I felt a definite pressure to hurry up and debug faster&#8230;oh no, there went another $0.06 * 5 instances!</p>
<a name="Too+deep+we+delved+there%2C+and+woke+the+nameless+fear%21"></a><h3>Too deep we delved there, and woke the nameless fear!</h3>
<p>Like I said, debugging optimized code builds character, and I built a lot of character with this bug.  The assert was in a function that was inlined by the optimizer, which was in a function that was inlined by the optimizer, which was in a loop, which looked like it had been unrolled.  It was slow going, with lots of restarts and stuffing values into memory and registers so the code would execute again.  At one point, I thought I&#8217;d <a href="http://mailman.mit.edu/pipermail/krbdev/2013-March/011466.html">narrowed it down to a compiler bug in gcc</a>, because it seemed like a variable wasn&#8217;t getting reloaded from the stack correctly sometimes, but it was really hard to tell with all the inlining.  Even thinking it was a compiler bug was pretty silly and that thought always violates <a title="One Bug’s Story, or, Assume it’s a bug!" href="http://www.spyparty.com/2013/02/09/one-bugs-story-or-assume-its-a-bug/">Assume it&#8217;s a Bug</a>, so I should have known better, but it happens. </p>
<p>Finally, a combination of stepping through the code, and looking at the code, and modifying the code revealed the problem. Here&#8217;s <a href="https://github.com/krb5/krb5/blob/krb5-1.9.2-final/src/lib/krb5/os/sendto_kdc.c#L1255">the source file at the version I was debugging</a>, linked to the area of the code where the bug lurked.  If you search for &#8220;host+1&#8221;, you will see that it occurs twice, once inside the loop, and once outside the loop.  This is what threw me when I was debugging&#8230;initially I didn&#8217;t notice there were two separate calls to service_fds(), so in the debugger I thought it was looping again but loading weird values.  I can only assume the second call almost never occurred in the wild for anybody but me after the inner loop on hosts completed, because in that case host+1 is n_conns+1, which is out-of-bounds for the connections.<sup><a href="http://www.spyparty.com/2013/03/18/loadtesting-for-open-beta-part-3/#footnote_3_3139" id="identifier_3_3139" class="footnote-link footnote-identifier-link" title="It never crashed because conns has a preallocated number of connections that was always bigger than n_conns+1">4</a></sup>  This bug was easy for me to fix locally, and it looks like it was (inadvertently?) fixed in <a href="https://github.com/krb5/krb5/commit/8b9d249e40601047e69c92d7acb578fd0bbafc00">this commit</a> in the main Kerberos code.</p>
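<p>The off-by-one described above can be illustrated with a deliberately tiny stand-in.  This is not the Kerberos code; index_after_loop and count_in_bounds are hypothetical names, and the sketch only assumes that host+1 is used as a count of valid connections.  Inside the loop that count is at most the number of hosts, but once the loop has run to completion the index equals the number of hosts, so host+1 is one too many.</p>

```c
#include <stddef.h>

/* Hypothetical stand-in for the library's loop: walk n_hosts entries and
 * return the value the index holds after the loop has finished. */
size_t index_after_loop(size_t n_hosts)
{
    size_t host;

    for (host = 0; host < n_hosts; host++) {
        /* in here host + 1 <= n_hosts, so host + 1 is a valid count */
    }
    /* out here host == n_hosts, so host + 1 is one past the valid count */
    return host;
}

/* A count of connections is usable only if it does not exceed n_valid. */
int count_in_bounds(size_t count, size_t n_valid)
{
    return count <= n_valid;
}
```

<p>For three hosts, the last in-loop count is 3 and passes the check, while the same expression evaluated after the loop is 4 and fails it.</p>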
<p>Thank goodness for open source code, where you can modify it and debug it when you run into troubles!</p>
<div id="attachment_3150" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-17-16_36_15-50-working.png"><img class="size-large wp-image-3150" title="2013-03-17 16_36_15-50-working" src="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-17-16_36_15-50-working-600x302.png" alt="" width="600" height="302" /></a><p class="wp-caption-text">No more zombies!</p></div>
<a name="Moar+Robots%21"></a><h3>Moar Robots!</h3>
<p>Now that I (thought I) was done debugging the robots, and I still had 5 EC2 instances running, I decided to see how well the instances did with 100 robots on each.  My original tests indicated I could only run about 50 per <a href="http://aws.amazon.com/ec2/instance-types/">m1.small</a> instance, but the client also got a lot more efficient with the differential state update change described last time, and it turns out 100 per instance is no problem, as you can see here:</p>
<div id="attachment_3147" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-16-02_53_11-100-robots.png"><img class="size-large wp-image-3147" title="2013-03-16 02_53_11-100-robots" src="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-16-02_53_11-100-robots-600x379.png" alt="" width="600" height="379" /></a><p class="wp-caption-text">Top on an m1.small instance running 100 robots at only 20% CPU.</p></div>
<p>The lobby was a little more grim with 501 clients:</p>
<div id="attachment_3153" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/03/SpyParty-v0.1.2602.1-20130316-02-53-39-0.png"><img class="size-large wp-image-3153" title="SpyParty-v0.1.2602.1-20130316-02-53-39-0" src="http://cdn.spyparty.com/wp-content/uploads/2013/03/SpyParty-v0.1.2602.1-20130316-02-53-39-0-600x415.png" alt="" width="600" height="415" /></a><p class="wp-caption-text">500 robots and me.</p></div>
<p>Here&#8217;s how the CPU looks with all these robots in the lobby, chatting at each other:</p>
<div id="attachment_3148" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-16-02_53_25-100-in-lobby.png"><img class="size-large wp-image-3148" title="2013-03-16 02_53_25-100-in-lobby" src="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-16-02_53_25-100-in-lobby-600x322.png" alt="" width="600" height="322" /></a><p class="wp-caption-text">atop in CPU mode with 500 robots in the lobby jabbering.</p></div>
<p>There are two cores in this machine, which is why the lobbyserver is at 115% CPU.  It&#8217;s mostly single-threaded for simplicity, but it uses threads for servicing network connections.</p>
<p>However, once the robots start playing each other, the CPU usage drops a bunch:</p>
<div id="attachment_3149" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-16-02_53_49-100-playing.png"><img class="size-large wp-image-3149" title="2013-03-16 02_53_49-100-playing" src="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-16-02_53_49-100-playing-600x322.png" alt="" width="600" height="322" /></a><p class="wp-caption-text">Stop talking, start playing!</p></div>
<p>This is pretty good news.  I think it means the chat system needs some work, because when everybody&#8217;s in the lobby all the chats go to all the players, but when people are in a match, chats only go between those two players, and they don&#8217;t get any of the lobby chats.  We&#8217;ll find out soon, as I describe below.  Memory looks pretty good with 501 clients, staying at about 256kb per client:</p>
<pre style="padding-left: 30px;">2013/03/16-04:53:11: MEMORY_POSIX 501/993/492: resident 25540/25540, virtual 198000/198000<br />2013/03/16-04:53:11: MEMORY_NEW 501/993/492: bytes 132098963, news 69166, deletes 55478</pre>
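<p>The per-client figure can be checked with a line of arithmetic.  This sketch assumes that the bytes value on the MEMORY_NEW line is the live heap total and that the leading 501 is the current client count; neither is spelled out in the log itself, and bytes_per_client is a made-up helper name.</p>

```c
#include <stdint.h>

/* Average heap bytes per connected client, using integer division.
 * Returns 0 when there are no clients to avoid dividing by zero. */
uint64_t bytes_per_client(uint64_t heap_bytes, uint64_t clients)
{
    return clients == 0 ? 0 : heap_bytes / clients;
}
```

<p>Under those assumptions, 132098963 bytes over 501 clients is 263670 bytes each, or roughly 257 when divided by 1024, which matches the quoted ballpark.</p>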
<p>One last atop screenshot&#8230;this one is while the robots are starting up and connecting, but before they&#8217;re in the lobby:</p>
<div id="attachment_3146" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-16-02_52_57-startup.png"><img class="size-large wp-image-3146" title="2013-03-16 02_52_57-startup" src="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-16-02_52_57-startup-600x322.png" alt="" width="600" height="322" /></a><p class="wp-caption-text">Loadtest startup performance.</p></div>
<p>This one shows Kerberos and <a href="http://www.openldap.org/">OpenLDAP</a> taking a fair amount of time at the start of a new loadtest.  I use LDAP as the database backend for Kerberos, among other things, and when all of these robots are trying to get login tickets at the same time, it bogs down a bit.  I&#8217;m not too worried about this profile, since this scenario of 500 people all needing tickets at the same time is going to be rare (the tickets last a while, so this doesn&#8217;t happen every time), and there are well-known ways of scaling Kerberos and OpenLDAP if I need them.</p>
<p>Finally, here&#8217;s a shot of the 100 robots per instance:</p>
<div id="attachment_3152" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-17-16_47_28-100-working-plus-deadlock.png"><img class="size-large wp-image-3152" title="2013-03-17 16_47_28-100-working-plus-deadlock" src="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-17-16_47_28-100-working-plus-deadlock-600x302.png" alt="" width="600" height="302" /></a><p class="wp-caption-text">Wait a second&#8230;</p></div>
<p>Oh no!  Who the hell is that single zombie robot at the end on instance 4!?!  Sigh.  I find that machine, log in, attach the debugger, and check it out.  It looks like I have a pretty rare deadlock between two threads during shutdown.  I&#8217;m just going to ignore it for now and deal with it later.  All the bugs above were preventing robots from doing a good job at loadtesting, while this one is just preventing 1 out of 500 from shutting down completely&#8230;it can wait.  Here&#8217;s a shot of this guy, still in the lobby, mocking me:</p>
<div id="attachment_3154" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/03/SpyParty-v0.1.2602.1-20130316-02-55-54-0.png"><img class="size-large wp-image-3154" title="SpyParty-v0.1.2602.1-20130316-02-55-54-0" src="http://cdn.spyparty.com/wp-content/uploads/2013/03/SpyParty-v0.1.2602.1-20130316-02-55-54-0-600x415.png" alt="" width="600" height="415" /></a><p class="wp-caption-text">At least I have one more Sniper win on this debug server than this troll!</p></div>
<p>There&#8217;s actually another bug I found in the new differential state update code while I was testing this, where the server will send a duplicate client sometimes, but I had a comment in the code that I thought it might be possible, and now I know it is.  It turns out when you have 500 clients pounding on a server, you find bugs.</p>
<a name="Coming+Up+Next+Time"></a><h3>Coming Up Next Time</h3>
<p>Okay, so now we&#8217;ve got things where I can easily run a predictable number of loadtesting robots against the debug lobbyserver, and I&#8217;ve got some high level profiles telling me that I&#8217;m now CPU bound inside the server itself.  That points to a clear next step:  profile the code.  I use an old hacked up version of <a href="http://silverspaceship.com/src/iprof/">Sean Barrett&#8217;s iprof</a> for all my client runtime profiling, so my next task is to integrate that into the server code, and get it running on Linux.  That shouldn&#8217;t be too hard, and then I&#8217;ll be able to tell what&#8217;s actually taking the time<sup><a href="http://www.spyparty.com/2013/03/18/loadtesting-for-open-beta-part-3/#footnote_4_3139" id="identifier_4_3139" class="footnote-link footnote-identifier-link" title="This is only partially true, because iprof is single-threaded&hellip;I really wish there was a good cross-platform light-weight way to get per-thread timings.">5</a></sup> when a lot of clients are in the lobby.</p>
<p>My prediction, based on the above, is that the chat message handling is going to be the main culprit.  If so, it&#8217;ll be easy to queue up the chats and send them out in bunches, but I need to be careful here, because the robots chat a lot more than real humans would right now, so I don&#8217;t want to spend too much time optimizing this.  I think I&#8217;ll keep the robots as they are for the initial profiles, and then dial back their chattiness to more realistic levels after I&#8217;ve plucked the low-hanging chat fruit.  I also need to teach the robots how to use lobby rooms for a more realistic test.</p>
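<p>Queuing chats and sending them in bunches could look something like the following sketch.  It is only an illustration under stated assumptions, not the lobbyserver&#8217;s code: the chat_batch structure, the 1024 byte buffer, and the send_fn callback are all hypothetical.  The point is that each recipient gets one send per flush, no matter how many chat lines were queued in between.</p>

```c
#include <stddef.h>
#include <string.h>

#define BATCH_CAP 1024          /* assumed batch buffer size in bytes */

/* Pending lobby chat lines, stored back to back as NUL-terminated strings. */
struct chat_batch {
    char   data[BATCH_CAP];
    size_t used;                /* bytes of data[] in use */
    size_t count;               /* number of queued lines */
};

/* Queue one chat line; returns 0 if it does not fit and a flush is needed. */
int batch_add(struct chat_batch *b, const char *line)
{
    size_t len = strlen(line) + 1;

    if (len > BATCH_CAP - b->used)
        return 0;
    memcpy(b->data + b->used, line, len);
    b->used += len;
    b->count++;
    return 1;
}

/* Hand the whole batch to send_fn once per recipient, then reset the batch.
 * send_fn may be NULL for a dry run. Returns the number of sends counted. */
size_t batch_flush(struct chat_batch *b, size_t recipients,
                   void (*send_fn)(size_t to, const char *data, size_t len))
{
    size_t i, sends = 0;

    if (b->count == 0)
        return 0;               /* nothing queued, nothing to send */
    for (i = 0; i < recipients; i++) {
        if (send_fn != NULL)
            send_fn(i, b->data, b->used);
        sends++;
    }
    b->used = 0;
    b->count = 0;
    return sends;
}
```

<p>With two queued chats and 500 recipients, a flush counts 500 sends, where sending each chat separately would take 1000.  The trade-off is a little added latency, since a chat waits in the queue until the next flush.</p>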
<p>Finally, I&#8217;m wondering if my usage of select() is going to be an issue as I get close to 1000 robots.  I may need to port to epoll().  We shall see!</p>
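<p>The post does not say which aspect of select() is the worry.  One well-known limit is that select() can only watch descriptors numbered below FD_SETSIZE (commonly 1024 on Linux), which would matter near 1000 clients if the server holds one descriptor per client; that is an assumption here, and a server multiplexing clients over a few sockets would not hit it.  A minimal guard, with the hypothetical name fd_usable_with_select, might look like this:</p>

```c
#include <sys/select.h>

/* select() can only watch descriptors numbered below FD_SETSIZE; calling
 * FD_SET() with a larger descriptor is undefined behaviour, so check first. */
int fd_usable_with_select(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}
```

<p>A server that sees this check fail has to refuse the connection or move to an interface without the limit, such as poll() or epoll.</p>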
<p>&#8220;Assume Nothing!&#8221;</p>
<p>And finally, the SimCity launch has given me pause&#8230;I&#8217;m still forging ahead with my 1000 simultaneous goal, but I really hope it&#8217;s enough and things go smoothly.  I would much rather have a slow buildup of players over the next year as I roll out more cool stuff than a giant spike that melts everything and makes players grumpy.</p>
<p><a title="Loadtesting for Open Beta, Part 4: Done optimizing the lobbyserver!" href="http://www.spyparty.com/2013/05/21/loadtesting-for-open-beta-part-4-done-optimizing-the-lobbyserver/">On to Part 4&#8230;</a></p>
<hr/><ol class="footnotes"><li id="footnote_0_3139" class="footnote">ZOMBIE ROBOTS!!!</li><li id="footnote_1_3139" class="footnote">I have some local patches I haven&#8217;t cleaned up enough to contribute yet</li><li id="footnote_2_3139" class="footnote">gdb is not the best for assembly language debugging, but I did learn about &#8220;layout asm&#8221;, which helps a bit.</li><li id="footnote_3_3139" class="footnote">It never crashed because conns has a preallocated number of connections that was always bigger than n_conns+1</li><li id="footnote_4_3139" class="footnote">This is only partially true, because iprof is single-threaded&#8230;I really wish there was a good cross-platform light-weight way to get per-thread timings.</li></ol>]]></content:encoded>
			<wfw:commentRss>http://www.spyparty.com/2013/03/18/loadtesting-for-open-beta-part-3/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Come to drawnonward&#8217;s 100th stream party on Thursday, March 7th at midnight PST!</title>
		<link>http://www.spyparty.com/2013/03/05/come-to-drawnonwards-100th-stream-party-on-thursday-march-7th-at-midnight-pst/</link>
		<comments>http://www.spyparty.com/2013/03/05/come-to-drawnonwards-100th-stream-party-on-thursday-march-7th-at-midnight-pst/#comments</comments>
		<pubDate>Tue, 05 Mar 2013 05:16:24 +0000</pubDate>
		<dc:creator>checker</dc:creator>
				<category><![CDATA[events]]></category>
		<category><![CDATA[streams]]></category>

		<guid isPermaLink="false">http://www.spyparty.com/?p=3131</guid>
		<description><![CDATA[I guess Thursday midnight is technically Friday morning, unless midnight is defined as 00:00-ε or something, but anyway, the Official SpyParty Late Night Streamer™ drawnonward is celebrating his 100th stream on Thursday night at midnight, Pacific Standard Time (GMT-8), which means it&#8217;s Friday at 8am GMT (I think, unless I messed that math up), on his [...]]]></description>
				<content:encoded><![CDATA[<p>I guess Thursday midnight is technically Friday morning, unless midnight is defined as 00:00-ε or something, but anyway, the Official <strong>SpyParty</strong> Late Night Streamer™ <strong><a href="http://www.twitch.tv/drawnonward">drawnonward</a></strong> is celebrating his 100th stream on Thursday night at midnight, Pacific Standard Time (GMT-8), which means it&#8217;s Friday at 8am GMT (I think, unless I messed that math up), on his <a href="http://twitch.tv">twitch.tv</a> channel:</p>
<p style="text-align: center;"><strong><a href="http://www.twitch.tv/drawnonward">http://www.twitch.tv/drawnonward</a></strong></p>
<p>It should be fun and rather silly&#8230;he posted in the beta forums asking for ideas and people are gifting him random Steam stuff to give away to people in stream chat, he&#8217;s going to dress up as one of <a href="http://spyparty.wikia.com/wiki/Category:Characters">the current <strong>SpyParty</strong> characters</a> (he&#8217;s not telling which one!), and I&#8217;ll be joining him from midnight to 1am to play on his stream. Sadly, my internet this week is kind of slow and terrible, so I don&#8217;t think I can stream the other side on the <a href="http://twitch.tv/spyparty"><strong>SpyParty</strong> channel</a>, but I should be able to join him via voice chat.  My personal dream is that he&#8217;ll actually start using a non-default icon/image for his channel by then.</p>
<p>Come along, hang out in the stream chat, and it should be a good time. The big question is whether I&#8217;ll have this build done and tested before then. Maybe you can watch a brand new build fail spectacularly, live, while I curse over TeamSpeak!</p>
<p>PS. I think I have figured out <a href="http://www.spyparty.com/2013/03/03/loadtesting-for-open-beta-part-2/#Up+Next%2C+The+Case+of+the+Missing+Robots">who is killing my robots</a>, so Part 3 of the <a title="Loadtesting for Open Beta, Part 1" href="http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/">Loadtesting for Open Beta</a> series will be posted soon!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.spyparty.com/2013/03/05/come-to-drawnonwards-100th-stream-party-on-thursday-march-7th-at-midnight-pst/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Loadtesting for Open Beta, Part 2</title>
		<link>http://www.spyparty.com/2013/03/03/loadtesting-for-open-beta-part-2/</link>
		<comments>http://www.spyparty.com/2013/03/03/loadtesting-for-open-beta-part-2/#comments</comments>
		<pubDate>Sun, 03 Mar 2013 23:28:11 +0000</pubDate>
		<dc:creator>checker</dc:creator>
				<category><![CDATA[beta]]></category>
		<category><![CDATA[indie games]]></category>
		<category><![CDATA[metrics]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.spyparty.com/?p=3109</guid>
		<description><![CDATA[In our last exciting episode of Loadtesting for Open Beta, we did some initial profiling to see how the lobbyserver held up under attack by a phalanx of loadtesting robots spawned in the cloud. It didn&#8217;t hold up, obviously, or the beta would already be open. Specifically, it failed by saturating the server&#8217;s 100Mbps network [...]]]></description>
				<content:encoded><![CDATA[<p>In our <a title="Loadtesting for Open Beta, Part 1" href="http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/">last exciting episode of <em>Loadtesting for Open Beta</em></a>, we did some initial profiling to see how the lobbyserver held up under attack by a phalanx of loadtesting robots spawned in the cloud. It didn&#8217;t hold up, obviously, or the beta would already be open.</p>
<p>Specifically, it failed by saturating the server&#8217;s 100Mbps network link, which turned out to be a great way to fail because it meant there were some pretty simple things I could do to optimize the bandwidth utilization.  I had done the initial game<span style="font-size: medium;">↔</span>lobby protocol in the simplest way possible, so every time any player state changed, like a new connection, or switching from chatting in the lobby to playing, it sent out the entire list of player states to everybody.  This doesn&#8217;t scale at all, since as you add more players, most aren&#8217;t changing state, but you&#8217;re sending all of their states out to everybody even if only one changes.  This doesn&#8217;t mean it was the wrong way to program it initially; it&#8217;s really important when you&#8217;re writing complicated software<sup><a href="http://www.spyparty.com/2013/03/03/loadtesting-for-open-beta-part-2/#footnote_0_3109" id="identifier_0_3109" class="footnote-link footnote-identifier-link" title="especially by yourself!">1</a></sup> to do things the simplest way possible, as long as you have a vague plan for what you&#8217;ll do if it turns into a problem later.  In this case, I knew what I was doing was probably not going to work in the long run, but it got things up and running more quickly than overengineering some fancy solution I might not have needed, and I waited until it actually <em>was</em> a problem before fixing it.</p>
<a name="Tell+Me+Something+I+Don%26%238217%3Bt+Know"></a><h3>Tell Me Something I Don&#8217;t Know</h3>
<p>The solution to this problem is pretty obvious: differential state updates.  Or, in English, only send the stuff that&#8217;s changed to the people who care about it.  Doing differential updates is significantly more complicated than just spamming everybody with everything, however.  You still have to send the initial state of all the current players when new players log in, you have to be able to add and remove players in the protocol, which you didn&#8217;t have to before because you were just sending the complete new state every time, etc.</p>
<p>This was going to be a fairly large change, so I took it by steps.  I knew that I&#8217;d have to send out the complete state of everybody to new logins, so it made sense to start by optimizing that initial packet using normal data size optimization techniques.  I pretty easily got it from about 88 bytes per player down to 42 bytes per player, which is nice, because my goal for these optimizations is 1000 simultaneous players, and at 88 bytes they wouldn&#8217;t all fit in my 64kb maximum packet size, where at 42 bytes they should fit, no problem, so I don&#8217;t have to add any kind of break-up-the-list-across-packets thing.  However, it turns out I actually got the ability to send the entire list across multiple packets while I was doing this, because I had to program the ability to add players as part of the differential updates, so now I could just use that packet type to send any clients in a really large player list that didn&#8217;t fit in a single packet.  But, like I said in the last episode, although I don&#8217;t think I&#8217;ll hit 1000 simultaneous outside of load testing for a while, it&#8217;s always nice to know you have that sort of thing in your back pocket for the future.</p>
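<p>To make the packet math concrete, here&#8217;s a quick Python sketch.  It assumes &#8220;64kb&#8221; means 65536 bytes and ignores any packet header, and the function names are made up for illustration; it isn&#8217;t the lobbyserver&#8217;s actual code.</p>
<pre style="padding-left: 30px;">
```python
MAX_PACKET_BYTES = 64 * 1024  # assumption: the 64kb limit is 65536 bytes


def players_per_packet(bytes_per_player, header_bytes=0):
    """How many player records fit in one packet."""
    return (MAX_PACKET_BYTES - header_bytes) // bytes_per_player


def split_into_packets(player_ids, bytes_per_player):
    """Chunk a player list so that each chunk fits in a single packet."""
    per_packet = players_per_packet(bytes_per_player)
    return [player_ids[i:i + per_packet]
            for i in range(0, len(player_ids), per_packet)]


players = list(range(1000))
print(players_per_packet(88))                # 744
print(players_per_packet(42))                # 1560
print(len(split_into_packets(players, 88)))  # 2
print(len(split_into_packets(players, 42)))  # 1
```
</pre>
<p>Under those assumptions, 1000 players at 88 bytes each spill into a second packet, while at 42 bytes they fit in one with room to spare.</p>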
<p>Once I&#8217;d tested the new optimized player list, I started making the updates differential.  New players get the initial list, and then they&#8217;re considered up-to-date and just get updates along with everybody else.  The list of new players is sent as additions to players already in the lobby.  For each player, I track some simple flags about what&#8217;s been updated in their state, so if they set or clear their /away message for example, that flag is set, and I only send that information.</p>
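<p>As a rough illustration of the dirty-flag idea, here&#8217;s a minimal Python sketch.  The class names, field names, and the use of a set instead of bit flags are all assumptions made for the example; the real lobbyserver protocol isn&#8217;t shown in this post.</p>
<pre style="padding-left: 30px;">
```python
class Player:
    def __init__(self, name):
        self.name = name
        self.state = {"status": "lobby", "away": "", "wins": 0}
        self.dirty = set()  # names of fields changed since the last update

    def set(self, field, value):
        if self.state[field] != value:
            self.state[field] = value
            self.dirty.add(field)


class Lobby:
    def __init__(self):
        self.players = {}

    def join(self, name):
        """Return (full list for the newcomer, addition for everyone else)."""
        snapshot = {n: dict(p.state) for n, p in self.players.items()}
        player = Player(name)
        self.players[name] = player
        return snapshot, {name: dict(player.state)}

    def flush(self):
        """Build one differential update and clear the dirty flags."""
        update = {}
        for name, player in self.players.items():
            if player.dirty:
                update[name] = {f: player.state[f] for f in sorted(player.dirty)}
                player.dirty.clear()
        return update


lobby = Lobby()
lobby.join("alice")
lobby.join("bob")
lobby.players["alice"].set("away", "brb")
print(lobby.flush())  # {'alice': {'away': 'brb'}}
print(lobby.flush())  # {}
```
</pre>
<p>The second flush is empty because nothing changed in between, which is the whole point: idle players cost no bandwidth.</p>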
<p>In programming, usually when you&#8217;ve got the right design, you get some unintentional upside, and this case was no different.  Previously, I was not sending live updates to player stats (wins, game time, etc.) to the players in the lobby until the player was done playing the match, or some other state changed that caused everybody&#8217;s state to be re-sent.  Now, since the differential updates are efficient, I&#8217;m updating player stats in real time as well, so people in the lobby can see wins as they accumulate for players in matches, which is nice and how you&#8217;d expect it to work.</p>
<a name="Results"></a><h3>Results</h3>
<p>It basically worked exactly as planned.  After lots of debugging, of course.  Here you can see the profiles for one of the loadtests, which got to 340 simultaneous players in the lobby:</p>
<div id="attachment_3117" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/03/SpyParty-v0.1.2553.1-20130303-00-13-24-0.png"><img class="size-large wp-image-3117" title="SpyParty-v0.1.2553.1-20130303-00-13-24-0" src="http://cdn.spyparty.com/wp-content/uploads/2013/03/SpyParty-v0.1.2553.1-20130303-00-13-24-0-600x447.png" alt="" width="600" height="447" /></a><p class="wp-caption-text">I really need to have the robot Sniper win sometimes.</p></div>
<p>&nbsp;</p>
<div id="attachment_3115" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-03-00_13_34-atop-mem.png"><img class="size-large wp-image-3115" title="2013-03-03 00_13_34-atop-mem" src="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-03-00_13_34-atop-mem-600x243.png" alt="" width="600" height="243" /></a><p class="wp-caption-text">atop in memory mode</p></div>
<p>&nbsp;</p>
<div id="attachment_3114" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-03-00_13_31-atop-cpu.png"><img class=" wp-image-3114" title="2013-03-03 00_13_31-atop-cpu" src="http://cdn.spyparty.com/wp-content/uploads/2013/03/2013-03-03-00_13_31-atop-cpu-600x243.png" alt="" width="600" height="243" /></a><p class="wp-caption-text">atop in cpu mode</p></div>
<p>Look ma, 3% network utilization!  That&#8217;s what&#8217;s so awesome about a really spiky profile&#8230;when you pound one of the spikes down, things just get better!</p>
<p>Here&#8217;s the new table of packet sizes for this run.  If you compare this with the <a title="Loadtesting for Open Beta, Part 1 - Packet Size Table" href="http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/#Update:+Assuming+More+Nothing&amp;#8230;Er,+Less+Nothing?">previous results</a>, you can see the PLAYER_LIST packets are way way way smaller, and this table was accumulated from two longer test runs, so it&#8217;s not even a fair comparison!  It&#8217;s interesting, because the TYPE_LOBBY_MESSAGE_PACKET is smaller as well, and I think that&#8217;s because now the robots can actually start games since the network isn&#8217;t saturated, and this means they don&#8217;t broadcast chats to the entire lobby while they&#8217;re playing, so that&#8217;s a nice side effect of optimizing the bandwidth.</p>
<table border="0" cellspacing="0" cellpadding="0" align="center">
<thead>
<tr>
<td><strong>Packet Type</strong></td>
<td align="right"><strong>Total Bytes</strong></td>
</tr>
</thead>
<tbody>
<tr>
<td>TYPE_LOBBY_MESSAGE_PACKET</td>
<td align="RIGHT">58060417</td>
</tr>
<tr>
<td>TYPE_LOBBY_PLAYER_LIST_UPDATE_PACKET</td>
<td align="RIGHT">29751413</td>
</tr>
<tr>
<td>TYPE_CLIENT_GAME_JOURNAL_PACKET</td>
<td align="RIGHT">18006186</td>
</tr>
<tr>
<td>TYPE_LOBBY_ROOM_LIST_PACKET</td>
<td align="RIGHT">16674479</td>
</tr>
<tr>
<td>TYPE_LOBBY_PLAYER_LIST_ADDITION_PACKET</td>
<td align="RIGHT">4280563</td>
</tr>
<tr>
<td>TYPE_LOBBY_PLAYER_LIST_PACKET</td>
<td align="RIGHT">3482691</td>
</tr>
<tr>
<td>TYPE_CLIENT_MESSAGE_PACKET</td>
<td align="RIGHT">1501822</td>
</tr>
<tr>
<td>TYPE_CLIENT_LOGIN_PACKET</td>
<td align="RIGHT">477356</td>
</tr>
<tr>
<td>TYPE_CLIENT_INVITE_PACKET</td>
<td align="RIGHT">435368</td>
</tr>
<tr>
<td>TYPE_LOBBY_INVITE_PACKET</td>
<td align="RIGHT">275781</td>
</tr>
<tr>
<td>TYPE_LOBBY_LOGIN_PACKET</td>
<td align="RIGHT">235878</td>
</tr>
<tr>
<td>TYPE_LOBBY_GAME_ID_PACKET</td>
<td align="RIGHT">96000</td>
</tr>
<tr>
<td>TYPE_LOBBY_GAME_OVER_PACKET</td>
<td align="RIGHT">68901</td>
</tr>
<tr>
<td>TYPE_CLIENT_GAME_ID_CONFIRM_PACKET</td>
<td align="RIGHT">40257</td>
</tr>
<tr>
<td>TYPE_LOBBY_PLAY_PACKET</td>
<td align="RIGHT">32498</td>
</tr>
<tr>
<td>TYPE_CLIENT_IN_MATCH_PACKET</td>
<td align="RIGHT">25714</td>
</tr>
<tr>
<td>TYPE_LOBBY_IN_MATCH_PACKET</td>
<td align="RIGHT">21204</td>
</tr>
<tr>
<td>TYPE_CLIENT_CANDIDATE_PACKET</td>
<td align="RIGHT">16089</td>
</tr>
<tr>
<td>TYPE_CLIENT_PLAY_PACKET</td>
<td align="RIGHT">12419</td>
</tr>
<tr>
<td>TYPE_CLIENT_GAME_ID_REQUEST_PACKET</td>
<td align="RIGHT">9610</td>
</tr>
<tr>
<td>TYPE_LOBBY_WELCOME_PACKET</td>
<td align="RIGHT">4494</td>
</tr>
<tr>
<td>TYPE_CLIENT_JOIN_PACKET</td>
<td align="RIGHT">4494</td>
</tr>
<tr>
<td>TYPE_KEEPALIVE_PACKET</td>
<td align="RIGHT">1011</td>
</tr>
<tr>
<td>TYPE_CLIENT_IDLE_PACKET</td>
<td align="RIGHT">24</td>
</tr>
</tbody>
</table>
<p>Hmm, I just noticed as I&#8217;m writing this that the resident memory utilization in the atop screenshot is way lower now than before&#8230;I wonder why&#8230; On the application side I take about 250kb per player right now, which at 340 players should be about 85MB.  Looking at the lobbyserver logs, right about when the screenshot was taken, the lobby self-reported this data:</p>
<pre style="padding-left: 30px;">2013/03/03-02:13:15: MEMORY_POSIX 348/757/409: resident 12808/12808, virtual 160276/160276<br />2013/03/03-02:13:15: MEMORY_NEW 348/757/409: bytes 91766974, news 45707, deletes 36155</pre>
<p>The MEMORY_NEW stats look about right for this load and my quick math, but the MEMORY_POSIX stats (which are read from /proc/pid/status) match the atop results: expected virtual but low resident.   Maybe it was just paged out for a second, or maybe I&#8217;m not touching much of that 250kb and so it doesn&#8217;t stay resident.  A lot of it is network buffers, so it makes some sense with this lower bandwidth protocol that it wouldn&#8217;t be resident compared to the last profile, because less buffering has to be done.  I&#8217;ll have to investigate this more.</p>
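<p>For anyone who wants to reproduce that kind of check, here&#8217;s a small Python sketch of pulling the resident and virtual sizes out of /proc/pid/status text.  It assumes the standard Linux <em>VmRSS</em> and <em>VmSize</em> lines (reported in kB) and that the numbers in the log above are also in kB; the sample text is made up in that format, and <em>parse_vm_kb</em> is a hypothetical helper, not the lobbyserver&#8217;s code.</p>
<pre style="padding-left: 30px;">
```python
def parse_vm_kb(status_text):
    """Pull VmRSS and VmSize (both reported in kB) out of /proc/pid/status text."""
    wanted = {"VmRSS": "resident_kb", "VmSize": "virtual_kb"}
    result = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            result[wanted[key]] = int(rest.split()[0])
    return result


# Made-up sample in the /proc/pid/status format, using the numbers from the log.
sample = "Name:\tlobbyserver\nVmSize:\t  160276 kB\nVmRSS:\t   12808 kB\n"
print(parse_vm_kb(sample))  # {'virtual_kb': 160276, 'resident_kb': 12808}

# The back-of-the-envelope estimate: about 250kb per player at 340 players.
expected_kb = 250 * 340
print(expected_kb)          # 85000, i.e. about 85MB
```
</pre>
<p>On a Linux box the same function can be fed the contents of /proc/self/status to compare a process&#8217;s own view against what atop reports.</p>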
<a name="Up+Next%2C+The+Case+of+the+Missing+Robots"></a><h3>Up Next, The Case of the Missing Robots</h3>
<p>So, the bandwidth optimizations were a resounding success!  Plus, both the CPU and memory utilization of the lobbyserver are really reasonable and haven&#8217;t been optimized at all, so we&#8217;re sitting pretty for getting to 1000 simultaneous robots&#8230;</p>
<p>Except, where are the remaining 160 robots?  In the test above, I ran 10 EC2 instances, each with 50 robots, thinking the optimizations might let me get to 500 simultaneous and find the next performance issue&#8230;but it never got above 340 in the lobby.  I updated my perl loadtesting framework and had each instance output how many lobbyclients were running every two seconds with this shell command over ssh:</p>
<pre style="padding-left: 30px;">'while true; do echo `date +%T`,`pidof lobbyclient | wc -w`; sleep 2; done'</pre>
<p>And then I loaded that into gnuplot,<sup><a href="http://www.spyparty.com/2013/03/03/loadtesting-for-open-beta-part-2/#footnote_1_3109" id="identifier_1_3109" class="footnote-link footnote-identifier-link" title="&hellip;which I hate, but I forgot to install excel on my new laptop, and Google&rsquo;s spreadsheet sucks at pivottables, and the Office for Web excel doesn&rsquo;t even have them as far as I could tell!">2</a></sup> and graphed the number of robots on each instance:</p>
<div id="attachment_3116" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/03/ec2-loadtest-client-counts.png"><img class="size-large wp-image-3116" title="ec2-loadtest-client-counts" src="http://cdn.spyparty.com/wp-content/uploads/2013/03/ec2-loadtest-client-counts-600x344.png" alt="" width="600" height="344" /></a><p class="wp-caption-text">The number of loadtest robots running on each EC2 instance.</p></div>
<p>You can see that they all started up with 50, but then a bunch of them lost clients until they found a steady state.   Something is killing my robots, and I need to figure out what it is&#8230;</p>
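<p>The shell loop above prints lines of the form <em>HH:MM:SS,count</em>, so the per-instance logs are easy to crunch without a spreadsheet.  Here&#8217;s a minimal Python sketch; the instance names and sample numbers are invented for illustration (the real logs aren&#8217;t included in this post), and &#8220;steady state&#8221; is just taken to be the smallest count over the last few samples.</p>
<pre style="padding-left: 30px;">
```python
import csv
import io


def parse_counts(text):
    """Turn 'HH:MM:SS,count' lines into a list of (time, count) tuples."""
    return [(row[0], int(row[1])) for row in csv.reader(io.StringIO(text)) if row]


def steady_state(samples, tail=3):
    """Smallest count seen over the last few samples."""
    return min(count for _, count in samples[-tail:])


# Hypothetical output from two instances; the real logs are not shown in the post.
logs = {
    "instance-a": "00:10:00,50\n00:10:02,50\n00:10:04,50\n00:10:06,50\n",
    "instance-b": "00:10:00,50\n00:10:02,41\n00:10:04,33\n00:10:06,33\n",
}
for name, text in logs.items():
    samples = parse_counts(text)
    print(name, samples[0][1], steady_state(samples))
print(sum(steady_state(parse_counts(t)) for t in logs.values()))  # 83
```
</pre>
<p>Summing the steady-state counts across all instances gives a number to compare against how many robots actually show up in the lobby.</p>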
<p><a title="Loadtesting for Open Beta, Part 3" href="http://www.spyparty.com/2013/03/18/loadtesting-for-open-beta-part-3/">Turn the page to Part 3&#8230;</a></p>
<hr/><ol class="footnotes"><li id="footnote_0_3109" class="footnote">especially by yourself!</li><li id="footnote_1_3109" class="footnote">&#8230;which I hate, but I forgot to install excel on my new laptop, and Google&#8217;s spreadsheet sucks at pivottables, and the Office for Web excel doesn&#8217;t even have them as far as I could tell!</li></ol>]]></content:encoded>
			<wfw:commentRss>http://www.spyparty.com/2013/03/03/loadtesting-for-open-beta-part-2/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Loadtesting for Open Beta, Part 1</title>
		<link>http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/</link>
		<comments>http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/#comments</comments>
		<pubDate>Thu, 28 Feb 2013 03:21:24 +0000</pubDate>
		<dc:creator>checker</dc:creator>
				<category><![CDATA[beta]]></category>
		<category><![CDATA[indie games]]></category>
		<category><![CDATA[metrics]]></category>
		<category><![CDATA[programming]]></category>

		<guid isPermaLink="false">http://www.spyparty.com/?p=3072</guid>
		<description><![CDATA[Way back in 2011, right before I opened up Early-Access Beta signups, I loadtested and optimized the signup page to make sure it wouldn&#8217;t crash if lots of people were trying to submit their name and email and confirm their signup. I always intended to write up a technical post or two about that optimization [...]]]></description>
				<content:encoded><![CDATA[<p><a title="Here we go…it’s SpyParty Beta time!" href="http://www.spyparty.com/2011/05/10/here-we-go-its-spyparty-beta-time/">Way back in 2011</a>, right before I opened up <a title="Sign Up for the SpyParty Early-Access Beta!" href="http://www.spyparty.com/beta-sign-up/"><em>Early-Access Beta</em></a> signups, I loadtested and optimized the signup page to make sure it wouldn&#8217;t crash if lots of people were trying to submit their name and email and confirm their signup. I always intended to write up a technical post or two about that optimization process because it was an interesting engineering exercise, but I have yet to get around to it. However, I can summarize the learnings here pretty quickly: <a href="http://wordpress.org/">WordPress</a> is excruciatingly slow, <a href="https://www.varnish-cache.org/">Varnish</a> is incredibly fast, I ♥ <a href="http://www.perl.org/">Perl</a>,<sup><a href="http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/#footnote_0_3072" id="identifier_0_3072" class="footnote-link footnote-identifier-link" title="See this thread for how I wrote the dynamic loadtesting form submission in a way that would saturate the network link.">1</a></sup> <a href="http://httpd.apache.org/">Apache</a> with plain old mod_php (meaning <em>not</em> loading WordPress) was actually <em>way</em> faster than I expected, slightly faster even than <a href="http://nginx.org/">nginx</a> + php-fpm in my limited tests, <a href="http://aws.amazon.com/cloudfront/">CloudFront</a> is pretty easy to use,<sup><a href="http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/#footnote_1_3072" id="identifier_1_3072" class="footnote-link footnote-identifier-link" title="I use CF for images and other static stuff, with W3 Total Cache to keep them synced to S3, but I only use W3TC for this CDN sync, since Varnish blows it out of the water for actual caching.">2</a></sup> and even cheap and small dedicated servers 
can handle a lot of traffic if you&#8217;re smart about it.</p>
<p>Like with any kind of optimization, <em><a href="http://www.phatcode.net/res/224/files/html/ch03/03-01.html">Assume Nothing</a></em>, so you should always write the loadtester first, and run it to get a baseline performance profile, and continue running it as you optimize the hotspots. When I started, the signup submission could only handle 2 or 3 submits per second. When I was done, it could handle 400 submissions per second. I figured that was enough.<sup><a href="http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/#footnote_2_3072" id="identifier_2_3072" class="footnote-link footnote-identifier-link" title="Let me be clear, I think 400 submissions per second is really pretty slow for raw performance on a modern computer, but web apps these days have so many layers that you lose a ton of performance relative to what would happen if you wrote the whole thing in C. For an interesting example of this, there&rsquo;s a wacky high performance web server called G-WAN that gets rid of all the layers and lets you write the pages directly in compiled C.">3</a></sup> If more than 400 people were signing up for the <strong>SpyParty</strong> beta every second, well, let&#8217;s file that under &#8220;good problem to have&#8221;.</p>
<p>After all the loadtesting and optimizing, the signups <a title="Beta Data" href="http://www.spyparty.com/2011/05/12/beta-data/">went off without a hitch</a>.</p>
<p>Loadtesting and optimizing the beta signup process was important, because the entire reason I took signups instead of just letting people play immediately was &#8220;fear of the unknown&#8221;. I couldn&#8217;t know in advance how many people would be interested in the game, and getting a couple web forms scalable in case that number was &#8220;a lot&#8221; was much easier than getting the full game and its server scalable, and that&#8217;s ignoring the very real need to exert some control over the growth of the community, to make sure the game wasn&#8217;t incredibly buggy on different hardware configurations or that there wasn&#8217;t some glaring balance issue, etc. Overall, starting with signups and a closed beta was great for the game, even if it&#8217;s meant frustrating people who signed up and want to play.</p>
<p>But it&#8217;s been long enough, and I&#8217;m now finally actively loadtesting and optimizing for opening the beta!</p>
<a name="Lobby+Loadtesting+Framework"></a><h3>Lobby Loadtesting Framework</h3>
<p>Like with the signup form, I&#8217;m loadtesting first. This will tell me where I need to optimize, and allow me to test my progress against the baseline. However, loadtesting a game lobby server is a lot more complicated than loadtesting a web form, so it&#8217;s a bit slower-going. I&#8217;ve had to create a robot version of the game client that logs into the lobby, chats, invites other robots to play, and then reports on the results of the fake games played. I built this on top of the game&#8217;s client interface, so it looks just like a real game to the lobby.</p>
<p>As with all testing, you need to make sure you aren&#8217;t <a href="http://en.wikipedia.org/wiki/Werner_Heisenberg">Heisenberg</a>-ing<sup><a href="http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/#footnote_3_3072" id="identifier_3_3072" class="footnote-link footnote-identifier-link" title="I just read on wikipedia that the uncertainty principle is often confused with the observer effect, and so on the surface this verbing of Heisenberg&rsquo;s name isn&rsquo;t correct, except he apparently also confused the two, so I&rsquo;m going to keep on verbing.">4</a></sup> your results, so I wanted to get fairly close to the same load that would happen with multiple real game clients hitting the server. This means I had to have a good number of machines running these robots hitting the test lobby at the same time, and that means using cloud computing. I was inspired by the <a href="http://blog.apps.chicagotribune.com/2010/07/08/bees-with-machine-guns/"><em>bees with machine guns</em></a> article about using Amazon Web Services&#8217;s Elastic Compute Cloud (EC2) to launch a bunch of cheap http load testers. I use AWS for <strong>SpyParty</strong> already, distributing updates and uploading crashdumps using S3, so this seemed like a good fit. At first I tried modifying the bees code to do what I want, but I found the Python threading technique they used for controlling multiple instances didn&#8217;t scale well running on Windows, and since I wanted more control over the instances anyway and the core idea was not terribly difficult to implement, I wrote my own version in Perl, which I&#8217;m much more familiar with. 
The code uses <a href="http://search.cpan.org/~mallen/Net-Amazon-EC2-0.23/lib/Net/Amazon/EC2.pm">Net::Amazon::EC2</a> to talk to AWS to start, list, and stop EC2 instances, and <a href="http://search.cpan.org/~rkitover/Net-SSH2-0.48/lib/Net/SSH2.pm">Net::SSH2</a> to talk to the instances themselves, executing commands and waiting for exit codes, downloading logs, and whatnot. I just use an existing CentOS EC2 AMI<sup><a href="http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/#footnote_4_3072" id="identifier_4_3072" class="footnote-link footnote-identifier-link" title="ami-c9846da0">5</a></sup> and then have the scripts download and install my robots onto it from S3 every time I start one up; I didn&#8217;t want to bother with creating a custom AMI when my files are pretty small. I&#8217;m going to post all the loadtest framework code once I&#8217;ve got it completely working so others can use it.</p>
<a name="How+Much+is+Enough%3F"></a><h3>How Much is Enough?</h3>
<p>In loadtesting the loadtesters, I found that an <a href="http://aws.amazon.com/ec2/instance-types/"><em>m1.small</em> instance</a> could run about 50 loadtest bots simultaneously with my current client code. I can switch to larger and more expensive EC2 instance types if I need to run more robots per instance, and as I optimize the server I&#8217;m pretty sure the client code will get optimized as well, which will allow more concurrency. Amazon limits accounts to 20 simultaneous EC2 instances until you apply for an exception, so I&#8217;ve done that,<sup><a href="http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/#footnote_5_3072" id="identifier_5_3072" class="footnote-link footnote-identifier-link" title="although they haven&rsquo;t gotten back to me so I guess I&rsquo;ll apply again&hellip;sigh, customer service &ldquo;in the cloud&rdquo; &nbsp;Update: Woot! &nbsp;My limit has been increased, now I can DDOS myself to my heart&rsquo;s content!">6</a></sup> but even with that limitation, I can loadtest to about 1000 concurrent clients, which seems like more than enough for now.</p>
<p>I still don&#8217;t know exactly what to expect when I open up the beta, but I don&#8217;t think I&#8217;ll hit 1000 simultaneous <strong>SpyParty</strong> players outside of loadtesting anytime soon. If you look at <a href="http://store.steampowered.com/stats/">the Steam Stats page</a>, 1000 simultaneous players is right in the middle of the top 100 games on the entire service, including some pretty popular mainstream games with mature player communities. In the current closed beta, I think our maximum number of simultaneous players has been around 25, and it&#8217;s usually between 10 and 15 on any given night at peak times, assuming there&#8217;s no event happening and I haven&#8217;t just sent out a big batch of invites. I still have about 6000 people left to invite for the first time from the signup list, and 9000 who didn&#8217;t register on their first invite to re-invite, all of whom I&#8217;ll use for live player loadtesting after the 1000 robots are happily playing without complaints. I think the spike from those last closed invites will be bigger than the open beta release spike, unless there are a ton of people who didn&#8217;t want to sign up with their email address, but who will buy the game once the beta is open. I guess that&#8217;s possible, but who knows? Again, if we go over 1000 simultaneous, I guess I will scramble to move the lobby to a bigger server, and keep repeating the &#8220;good problem to have&#8221; mantra over and over again, but I&#8217;m betting it&#8217;s not going to happen and things will go smoothly.</p>
<p>After open beta there will be a long list of awesome stuff coming into the game, including new maps and missions, spectation and replays, the <a title="The New SpyParty Character Art Style" href="http://www.spyparty.com/2012/08/27/the-new-spyparty-character-art-style/">new art</a>, and lots more, but once things are open it&#8217;ll be easier to predict the size of those spikes and plan accordingly. Eventually I&#8217;ll probably (hopefully?) have to move the lobby off my current server, but I&#8217;m pretty sure based on my initial testing that the old girl can keep things going smoothly a bit longer.</p>
<a name="Initial+Loadtesting+Baseline"></a><h3>Initial Loadtesting Baseline</h3>
<p>Okay, so what happens when I unleash the robots? Well, I haven&#8217;t let 1000 of them loose yet, but I&#8217;ve tried 500, and things fall over, as you might expect. It looks like around 250 is the maximum that can even connect right now, which is actually more than I thought I&#8217;d start out with.</p>
<div id="attachment_3075" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/02/SpyParty-v0.1.2532.0-20130227-11-56-33-0.png"><img class="size-large wp-image-3075" title="SpyParty-v0.1.2532.0-20130227-11-56-33-0" src="http://cdn.spyparty.com/wp-content/uploads/2013/02/SpyParty-v0.1.2532.0-20130227-11-56-33-0-600x505.png" alt="" width="600" height="505" /></a><p class="wp-caption-text">The loadtesting robots are not very good conversationalists.</p></div>
<p>Things don&#8217;t work very well even with 250 clients, though, with connections failing, and match invites not going through.<sup><a href="http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/#footnote_6_3072" id="identifier_6_3072" class="footnote-link footnote-identifier-link" title="Let&rsquo;s ignore the lobby UI also drawing all over itself for now.">7</a></sup> However, when I looked at <a href="http://www.atoptool.nl">atop</a> while the robots were pounding on the lobby, a wonderful thing was apparent:</p>
<div id="attachment_3096" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/02/2013-02-27-12_32_00-atop-cpu.png"><img class="size-large wp-image-3096" title="2013-02-27 12_32_00-atop-cpu" src="http://cdn.spyparty.com/wp-content/uploads/2013/02/2013-02-27-12_32_00-atop-cpu-600x256.png" alt="" width="600" height="256" /></a><p class="wp-caption-text">atop in CPU mode</p></div>
<p>&nbsp;</p>
<div id="attachment_3095" class="wp-caption aligncenter" style="width: 610px"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/02/2013-02-27-12_31_53-atop-mem.png"><img class="size-large wp-image-3095" title="2013-02-27 12_31_53-atop-mem" src="http://cdn.spyparty.com/wp-content/uploads/2013/02/2013-02-27-12_31_53-atop-mem-600x256.png" alt="" width="600" height="256" /></a><p class="wp-caption-text">atop in memory mode</p></div>
<p>Neither the CPU utilization nor the memory utilization was too terrible, but the lobbyserver was saturating the 100 Mbps ethernet link! That&#8217;s awesome, because that&#8217;s going to be easy to fix!</p>
<p>Before I explain, let me say that the best kind of profile is one with a single giant spike, one thing that&#8217;s obviously completely slow and working poorly. The worst kind of profile is a flat line, where everything is taking 3% of the time and there&#8217;s no single thing you can optimize. This is a great profile, because it points right towards the first thing I need to fix, which is the network bandwidth.</p>
<p>My protocol between the game clients and the lobby server is really pretty dumb in a lot of ways, but the biggest way it&#8217;s dumb is that on any state change of any client, it sends the entire list of clients and their current state to every client. This is the simplest thing to do and means there&#8217;s no need to track which clients have received which information, and this in turn means it&#8217;s the right thing to do first when you&#8217;re getting things going, but it&#8217;s also terribly wasteful performance-wise compared to just sending out the clients who changed each tick. So, I was delighted to see that bandwidth was my first problem, because it&#8217;s easy to see that I have to fix the protocol. I&#8217;m guessing switching to a differential player state update will cut the bandwidth by 50x, which will then reveal the next performance spike.</p>
<p style="text-align: left;">I can&#8217;t wait to find out what it will be!<sup><a href="http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/#footnote_7_3072" id="identifier_7_3072" class="footnote-link footnote-identifier-link" title="You can see the CPU usage is pretty high relative to the memory usage, and seeing slapd and krb5kdc in there is a bit worrying, since that&rsquo;s kerberos and ldap, which are used for the login and client authentication and are going to be a bit harder to optimize if they start poking their heads up too high, but both of them have very battle-tested enterprise-scale optimization solutions via replication, so worst-case is I&rsquo;ll have to get another machine for them, I think. If the lobbyserver itself is still CPU-bound after fixing the bandwidth issue, then I&rsquo;ll start normal code optimization for it, including profiling, of course. I&rsquo;ll basically recurse on the lobbyserver executable!">8</a></sup></p>
<p>Oh, and the total EC2 bill for my loadtesting over the past few days: $5.86</p>
<a name="So%26%238230%3BOpen+Beta%3F"></a><h3>So&#8230;Open Beta?</h3>
<p>Within weeks! Weeks, I tell you!</p>
<p>Oh, and as I&#8217;ve said before, everybody who is signed up will get invited in before open beta. I will then probably have a short &#8220;quiet period&#8221; where I let things settle down before really opening it up, so if you want in before open beta, <a title="Sign Up for the SpyParty Early-Access Beta!" href="http://www.spyparty.com/beta-sign-up/">sign up now</a>.</p>
<a name="Update%3A+Assuming+More+Nothing%26%238230%3BEr%2C+Less+Nothing%3F"></a><h3>Update: Assuming More Nothing&#8230;Er, Less Nothing?</h3>
<p>After posting this article, I was about to start optimizing the client list packets, when it occurred to me I wasn&#8217;t <a href="http://www.phatcode.net/res/224/files/html/ch03/03-01.html">assuming enough nothing</a>, because I was assuming it was the client list taking all the bandwidth. This made me a bit nervous, which is the right feeling to have when you&#8217;re not following your own advice,<sup><a href="http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/#footnote_8_3072" id="identifier_8_3072" class="footnote-link footnote-identifier-link" title="&hellip;let alone Mike Abrash&rsquo;s advice!">9</a></sup> so I implemented a really simple bit of code that accumulated the per-packet-type send and receive sizes, and printed them on exit, and then threw another 250 robots at the server for 60 seconds. The results validated the client list assumption: it&#8217;s by far the biggest bandwidth consumer, sending 1.6GB in 60 seconds.<sup><a href="http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/#footnote_9_3072" id="identifier_9_3072" class="footnote-link footnote-identifier-link" title="Or actually trying to send, since 1.6GB in 60 seconds is 200Mbps, which is not happening on a 100Mbps link!">10</a></sup> However, it did show that the lobby sending chat and status messages to the clients is also maybe going to be a problem, so yet again: <em>measuring things is crucial</em>.</p>
<table border="0" cellspacing="0" cellpadding="0" align="center">
<thead>
<tr>
<td><strong>Packet Type</strong></td>
<td align="right"><strong>Total Bytes</strong></td>
</tr>
</thead>
<tbody>
<tr>
<td>TYPE_LOBBY_PLAYER_LIST_PACKET</td>
<td align="right">1632549877</td>
</tr>
<tr>
<td>TYPE_LOBBY_MESSAGE_PACKET</td>
<td align="right">66687600</td>
</tr>
<tr>
<td>TYPE_LOBBY_ROOM_LIST_PACKET</td>
<td align="right">9474937</td>
</tr>
<tr>
<td>TYPE_CLIENT_INVITE_PACKET</td>
<td align="right">303056</td>
</tr>
<tr>
<td>TYPE_CLIENT_MESSAGE_PACKET</td>
<td align="right">226779</td>
</tr>
<tr>
<td>TYPE_CLIENT_LOGIN_PACKET</td>
<td align="right">157795</td>
</tr>
<tr>
<td>TYPE_LOBBY_INVITE_PACKET</td>
<td align="right">131667</td>
</tr>
<tr>
<td>TYPE_LOBBY_LOGIN_PACKET</td>
<td align="right">77951</td>
</tr>
<tr>
<td>TYPE_KEEPALIVE_PACKET</td>
<td align="right">43032</td>
</tr>
<tr>
<td>TYPE_CLIENT_GAME_JOURNAL_PACKET</td>
<td align="right">5478</td>
</tr>
<tr>
<td>TYPE_LOBBY_PLAY_PACKET</td>
<td align="right">1888</td>
</tr>
<tr>
<td>TYPE_LOBBY_WELCOME_PACKET</td>
<td align="right">1491</td>
</tr>
<tr>
<td>TYPE_CLIENT_JOIN_PACKET</td>
<td align="right">1491</td>
</tr>
<tr>
<td>TYPE_CLIENT_PLAY_PACKET</td>
<td align="right">836</td>
</tr>
<tr>
<td>TYPE_CLIENT_IN_MATCH_PACKET</td>
<td align="right">713</td>
</tr>
<tr>
<td>TYPE_LOBBY_IN_MATCH_PACKET</td>
<td align="right">532</td>
</tr>
<tr>
<td>TYPE_CLIENT_CANDIDATE_PACKET</td>
<td align="right">490</td>
</tr>
<tr>
<td>TYPE_LOBBY_GAME_ID_PACKET</td>
<td align="right">300</td>
</tr>
<tr>
<td>TYPE_CLIENT_GAME_ID_REQUEST_PACKET</td>
<td align="right">30</td>
</tr>
</tbody>
</table>
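<p>The kind of per-type accounting behind a table like this can be sketched in a few lines. This is a minimal Python illustration, not the lobbyserver&#8217;s actual code; the <code>record_packet</code> and <code>report</code> names and the sample sizes are made up for the example.</p>
<pre>
```python
from collections import Counter

# Hypothetical accumulator: total bytes seen for each packet type.
bytes_by_type = Counter()


def record_packet(packet_type, size_in_bytes):
    """Add one packet's size to the running total for its type."""
    bytes_by_type[packet_type] += size_in_bytes


def report():
    """Return (packet_type, total_bytes) pairs, largest total first."""
    return bytes_by_type.most_common()


if __name__ == "__main__":
    # Made-up sizes, just to show the shape of the output.
    record_packet("TYPE_KEEPALIVE_PACKET", 12)
    record_packet("TYPE_LOBBY_PLAYER_LIST_PACKET", 6500)
    record_packet("TYPE_KEEPALIVE_PACKET", 12)
    for packet_type, total in report():
        print(packet_type, total)
```
</pre>
<p>Calling something like <code>record_packet</code> from the send and receive paths and dumping <code>report()</code> on exit is cheap enough to leave in during a loadtest, and it replaces a guess about where the bandwidth goes with a sorted list.</p>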
<p>It&#8217;s interesting that the clients are only sending 300KB worth of chat messages to the lobby while it&#8217;s sending 66MB back to them, but 66MB is around 250 * 300KB, so it makes back-of-the-envelope sense. I&#8217;m probably going to need to investigate that more once I&#8217;ve hammered the player list traffic down. Maybe I&#8217;ll have to accumulate the messages every tick, compress them all, and send them out.</p>
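<p>Plugging the table&#8217;s exact byte counts into that back-of-the-envelope check gives roughly the same picture. The Python below is only a sketch of the arithmetic; the constant and function names are invented here, and 1 Mbps is assumed to mean 1,000,000 bits per second.</p>
<pre>
```python
# Byte totals copied from the table above (60-second run, 250 robots).
CLIENT_MESSAGE_BYTES = 226779    # TYPE_CLIENT_MESSAGE_PACKET, clients to lobby
LOBBY_MESSAGE_BYTES = 66687600   # TYPE_LOBBY_MESSAGE_PACKET, lobby to clients
PLAYER_LIST_BYTES = 1632549877   # TYPE_LOBBY_PLAYER_LIST_PACKET
ROBOTS = 250
SECONDS = 60


def fan_out_ratio():
    """Bytes the lobby sends back per byte of client chat it receives."""
    return LOBBY_MESSAGE_BYTES / CLIENT_MESSAGE_BYTES


def pure_broadcast_bytes():
    """Bytes expected if every client message were echoed to every robot."""
    return CLIENT_MESSAGE_BYTES * ROBOTS


def megabits_per_second(total_bytes, seconds):
    """Average rate in Mbps, taking 1 Mbps as 1,000,000 bits per second."""
    return total_bytes * 8 / seconds / 1_000_000


if __name__ == "__main__":
    print(round(fan_out_ratio()))
    print(pure_broadcast_bytes())
    print(round(megabits_per_second(PLAYER_LIST_BYTES, SECONDS)))
```
</pre>
<p>The ratio comes out near 294, the same order as the 250 robots, and a pure broadcast of the client chat would account for about 57MB of the 66MB. The remainder is plausibly the status messages and per-packet overhead, though that is a guess the numbers alone can&#8217;t confirm. The player list works out to roughly 218Mbps, in line with the 200Mbps quoted in the footnote.</p>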
<p><a title="Loadtesting for Open Beta, Part 2" href="http://www.spyparty.com/2013/03/03/loadtesting-for-open-beta-part-2/">This way to Part 2&#8230;</a></p>
<hr/><ol class="footnotes"><li id="footnote_0_3072" class="footnote">See <a href="http://www.perlmonks.org/?node_id=901638">this thread</a> for how I wrote the dynamic loadtesting form submission in a way that would saturate the network link.</li><li id="footnote_1_3072" class="footnote">I use CF for images and other static stuff, with <a href="http://wordpress.org/extend/plugins/w3-total-cache/">W3 Total Cache</a> to keep them synced to S3, but I only use W3TC for this CDN sync, since Varnish blows it out of the water for actual caching.</li><li id="footnote_2_3072" class="footnote">Let me be clear, I think 400 submissions per second is really pretty slow for raw performance on a modern computer, but web apps these days have so many layers that you lose a ton of performance relative to what would happen if you wrote the whole thing in C. For an interesting example of this, there&#8217;s a wacky high performance web server called <a href="http://gwan.com/benchmark/babel.html">G-WAN</a> that gets rid of all the layers and lets you write the pages directly in compiled C.</li><li id="footnote_3_3072" class="footnote">I just read on wikipedia that the <a href="http://en.wikipedia.org/wiki/Uncertainty_principle">uncertainty principle</a> is often confused with the <a href="http://en.wikipedia.org/wiki/Observer_effect_%28physics%29">observer effect</a>, and so on the surface this verbing of Heisenberg&#8217;s name isn&#8217;t correct, except he apparently also confused the two, so I&#8217;m going to keep on verbing.</li><li id="footnote_4_3072" class="footnote">ami-c9846da0</li><li id="footnote_5_3072" class="footnote">although they haven&#8217;t gotten back to me so I guess I&#8217;ll apply again&#8230;sigh, customer service &#8220;in the cloud&#8221;  Update: Woot!  
My limit has been increased, now I can DDOS myself to my heart&#8217;s content!</li><li id="footnote_6_3072" class="footnote">Let&#8217;s ignore the lobby UI also drawing all over itself for now.</li><li id="footnote_7_3072" class="footnote">You can see the CPU usage is pretty high relative to the memory usage, and seeing slapd and krb5kdc in there is a bit worrying, since that&#8217;s <a href="http://web.mit.edu/Kerberos/">kerberos</a> and <a href="http://www.openldap.org/">ldap</a>, which are used for the login and client authentication and are going to be a bit harder to optimize if they start poking their heads up too high, but both of them have very battle-tested enterprise-scale optimization solutions via replication, so worst-case is I&#8217;ll have to get another machine for them, I think. If the lobbyserver itself is still CPU-bound after fixing the bandwidth issue, then I&#8217;ll start normal code optimization for it, including profiling, of course. I&#8217;ll basically recurse on the lobbyserver executable!</li><li id="footnote_8_3072" class="footnote">&#8230;let alone Mike Abrash&#8217;s advice!</li><li id="footnote_9_3072" class="footnote">Or actually <em>trying to send</em>, since 1.6GB in 60 seconds is 200Mbps, which is not happening on a 100Mbps link!</li></ol>]]></content:encoded>
			<wfw:commentRss>http://www.spyparty.com/2013/02/27/loadtesting-for-open-beta-part-1/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Happy Valentine&#8217;s Day, Belated</title>
		<link>http://www.spyparty.com/2013/02/15/happy-valentines-day-belated/</link>
		<comments>http://www.spyparty.com/2013/02/15/happy-valentines-day-belated/#comments</comments>
		<pubDate>Fri, 15 Feb 2013 19:37:12 +0000</pubDate>
		<dc:creator>checker</dc:creator>
				<category><![CDATA[miscellaneous]]></category>

		<guid isPermaLink="false">http://www.spyparty.com/?p=3065</guid>
		<description><![CDATA[The ever-awesome ZeroTKA had an awesome idea for a Valentine&#8217;s Day surprise, and then my daughter gave me a cold and I couldn&#8217;t get it done for the actual day.  But hey, it&#8217;s still Valentine&#8217;s Day in Damon&#8217;s heart. This build, v0.1.2487.0, was supposed to be a quicky, but it snowballed bigtime.  Here are the release notes [...]]]></description>
				<content:encoded><![CDATA[<p>The ever-awesome <a href="https://twitter.com/zerotka">ZeroTKA</a> had an awesome idea for a Valentine&#8217;s Day surprise, and then my daughter gave me a cold and I couldn&#8217;t get it done for the actual day.  But hey, it&#8217;s still Valentine&#8217;s Day in Damon&#8217;s heart.</p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-3066" title="sniperheart" src="http://cdn.spyparty.com/wp-content/uploads/2013/02/sniperheart.png" alt="" width="484" height="424" /></p>
<p style="text-align: left;">This build, v0.1.2487.0, was supposed to be a quicky, but it snowballed bigtime.  Here are the release notes from the private <a title="Sign Up for the SpyParty Early-Access Beta!" href="http://www.spyparty.com/beta-sign-up/">beta</a> forums:</p>
<blockquote>
<p style="text-align: left;">Uh, this was supposed to be a quick update.</p>
<ul class="tightlist">
<li>add idle timestamp to invite confirmation if player is idle</li>
<li>s/In Lobby/In Room/ in lobby status</li>
<li>auto-idle state in lobby after 5 minutes of no input to SpyParty</li>
<li>display away message when playing</li>
<li>tiebreaker for both players hitting spy or sniper at the same time in match menu &#8211; kind of a hack</li>
<li>wait a bit after putting down briefcase</li>
<li>couriers never stand on a pad to pick up or put down briefcase</li>
<li>ambassador won&#8217;t put down briefcase unless it can be picked up without standing on a pad</li>
<li>display game type and level on match results screen</li>
<li>Happy Valentine&#8217;s Day (belated due to a cold) &#8211; thank <strong>zerotka</strong> for the awesome idea!</li>
<li>pointier convo arrow</li>
<li>reenable previous talker arrow</li>
<li>add concept of &#8220;important&#8221; NPCs (like green statue swapper)</li>
<li>sort important NPCs higher/lower than unknown cast</li>
<li>put a grey bar on them in the portraits on result screen</li>
<li>play microfilm failure animation if bailing on action test</li>
<li>drink events for accept, reject, offer, waiter giving up</li>
<li>okay/good/bad -&gt; white/green/red in mission preview text</li>
<li>much more visible sniper crosshairs</li>
<li>try harder to find a spot if you&#8217;re forced to go to pedestal</li>
<li>delay forcing swapper until outside min radius and then random timer</li>
<li>finally solved the mystery of the floor pad rotations, old click-to-move ui vestige, thanks kcmmmmm!</li>
<li>round event for cast member picking up pending statue</li>
<li>handle room name escapes before eating whitespace</li>
<li>ignore escape until endgame results synced to avoid lobby kicks and disabled Spy/Sniper menu items</li>
<li>setting to not print /help on lobby entry</li>
<li>allow big lobby font, allow ellipsis in settings and lobby escape menu</li>
<li>Damon at least looks at the corpse, for god&#8217;s sake</li>
<li>arrow ui for conversation talking turns</li>
<li>version number in screenshot name</li>
<li>toby offers drink and waits until accept or refuse</li>
<li>break up too-long chat lines correctly with wrapping</li>
<li>settings in lobby escape menu uses new lobby layout</li>
<li>small timestamp font, only do time, not date (clipboard still gets full date)</li>
<li>sync up /time format</li>
<li>block actions while putting down or handing off briefcase</li>
<li>allow Toby to go more places and offer more drinks (like the back side of Balcony, which he basically was afraid to visit)</li>
<li>new walking bug tuning, midpoint, tighter radii, etc</li>
<li>display autolock animation times correctly in spy disk</li>
<li>move ambassador personal space radius in a bit so not as jumpy</li>
<li>bulge cy2 pillar out a bit more, but NPCs really hug it, better practice this or get shot by r7</li>
<li>whiter whites for your username</li>
<li>fix bug if spawn at convo center you&#8217;re going to</li>
<li>if you&#8217;re finishing convo because you&#8217;re lonely, and somebody shows up, go back to listening</li>
<li>retry look areas if your spot is taken</li>
<li>clear the customer&#8217;s queue so don&#8217;t take drinks long after waiter leaves</li>
<li>make veranda front a little wider for pathing sanity</li>
<li>walking people count as violating ambassador&#8217;s personal space, so spy can&#8217;t twitch</li>
<li>version check content package to avoid weird bugs</li>
<li>fix lobby chat crash bug</li>
<li>draw cast bars with quads, size properly</li>
<li>sort known-to-sniper cast members low, unknown (like seduction target, no-suspected double agents) high</li>
<li>splash screen works on wineskin (non-transparent)</li>
<li>expand results screen portraits on mouseover now that we have a mouse on that screen</li>
<li>fix results display for multiple consecutive time-adds</li>
<li>only push round events if playing</li>
<li>push round event on shift-m add time</li>
<li>flicker ambassador ui during leaving, not go to red (ST color)</li>
<li>grey outlines in results, not yellow</li>
</ul>
<p style="text-align: left;">So, the briefcase/floor-pad thing is kinda huge. Having to actively reject Toby is kinda huge. New green swap is big. I don&#8217;t even remember what else. Yikes.</p>
<p>Chris</p>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.spyparty.com/2013/02/15/happy-valentines-day-belated/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>One Bug&#8217;s Story, or, Assume it&#8217;s a bug!</title>
		<link>http://www.spyparty.com/2013/02/09/one-bugs-story-or-assume-its-a-bug/</link>
		<comments>http://www.spyparty.com/2013/02/09/one-bugs-story-or-assume-its-a-bug/#comments</comments>
		<pubDate>Sun, 10 Feb 2013 03:07:44 +0000</pubDate>
		<dc:creator>checker</dc:creator>
				<category><![CDATA[beta]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[streams]]></category>

		<guid isPermaLink="false">http://www.spyparty.com/?p=3024</guid>
		<description><![CDATA[This is the story of a bug in SpyParty.  This story has a happy ending, because the SpyParty beta testers are amazing, and they are constantly helping find bugs, of course, but they are also constantly helping me reproduce bugs, and narrow down the potential causes of bugs, and triage them, and are generally providing [...]]]></description>
				<content:encoded><![CDATA[<p>This is the story of a bug in <strong>SpyParty</strong>.  This story has a happy ending, because the <strong>SpyParty</strong> beta testers are amazing, and they are constantly helping find bugs, of course, but they are also constantly helping me reproduce bugs, and narrow down the potential causes of bugs, and triage them, and are generally providing me with incredible support so I can make the game better.</p>
<p>This bug also has an interesting story, because it turned out to have a very subtle cause, one that manifested itself intermittently in ways that looked almost random, and for a long time there was no &#8220;repro case&#8221;.  Getting a repro on a bug is the key to fixing it, as I discuss in the <a title="How to Report Bugs the SpyParty Way" href="http://www.spyparty.com/2012/04/12/how-to-report-bugs-the-spyparty-way/">How to Report Bugs the <strong>SpyParty</strong> Way</a> post.  It can make the difference between a 10 minute fix and a 10 day fix&#8230;or never managing to find and fix it.</p>
<p><em>*shudder*</em></p>
<p>But let&#8217;s start at the beginning&#8230;</p>
<p>I first noticed the bug long before I&#8217;d invited people into the beta, but it was so rare that I didn&#8217;t prioritize finding it, and in fact I would forget about it for stretches of time.  Yes, I make notes of bugs I see, but I&#8217;ve got so much high priority stuff to do right now that I don&#8217;t go back and look at that list very often&#8230;the important bugs get fixed immediately, but a bug like this can stay in for a long time.</p>
<p>So, what&#8217;s the bug?  Well, let&#8217;s go to the videotape:</p>
<p><a href="http://www.spyparty.com/2013/02/09/one-bugs-story-or-assume-its-a-bug/"><em>Click here to view the embedded video.</em></a></p>
<p>This video was from a <a href="http://www.youtube.com/watch?v=ASkd9vpner4">Spy Commentary game</a> played between <strong>buxx</strong> and <strong>dieffenbachj</strong>, two early beta testers who were some of the first to upload gameplay videos.</p>
<p>It turns out, in addition to just being plain awesome for games overall, the rise of videos and streams is also an amazing resource for bug finding and fixing!  The more people record their games, the more they&#8217;ll be able to point to a video of exactly what went wrong so the developers can see it almost first-hand.  The bane of a developer&#8217;s life is a bug report that says, &#8220;the game broke&#8221; with no other description.  You know something&#8217;s probably wrong, but it&#8217;s basically useless for finding a problem.  With a video, you can usually see exactly what&#8217;s going on, so that problem is eliminated, or at least massively reduced.</p>
<p>So, as you can see in that video, the Spy is facing the wrong way on the floor pad.  The Spy should be facing the pedestal with the statue on it in that position, but in this case the Spy is turned 90 degrees.</p>
<p>I watched <strong>buxx</strong>&#8217;s videos when he posted them<sup><a href="http://www.spyparty.com/2013/02/09/one-bugs-story-or-assume-its-a-bug/#footnote_0_3024" id="identifier_0_3024" class="footnote-link footnote-identifier-link" title="&hellip;because they&rsquo;re great for learning, the only thing better than video is commentated video!">1</a></sup> and I noticed this, and so I posted a bug in the Bugs Forum on the private beta website myself on May 6th, 2012.</p>
<p>Obviously, trying to repro it by replaying the steps in that video was fruitless.</p>
<p>Next up, <strong>bishop</strong> caught it with a screenshot and posted on May 13th, 2012:</p>
<p><a href="http://cdn.spyparty.com/wp-content/uploads/2013/02/SpyParty-20120512-08-45-22.png"><img class="aligncenter size-large wp-image-3034" title="SpyParty-20120512-08-45-22" src="http://cdn.spyparty.com/wp-content/uploads/2013/02/SpyParty-20120512-08-45-22-600x349.png" alt="" width="600" height="349" /></a></p>
<p>It&#8217;s a perfect shot, but his next post is &#8220;I haven&#8217;t had much luck on the repro.&#8221;</p>
<p><strong>ardonite</strong> chimes in the next day:</p>
<blockquote>
<p>I got the rotation bug once at a statue.</p>
<p>I think I was rriiiiight in the bounding box. So maybe if it&#8217;s on a border pixel of the box then it glitches out?</p>
<p>Edit: no, hypothesis incorrect.</p>
</blockquote>
<p>That&#8217;s how it goes: make a hypothesis, check it, repeat.</p>
<p>More shots every couple months over the summer from <strong>bishop</strong> and <strong>r7stuart</strong>:</p>
<p style="text-align: center;"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/02/SpyParty-20120527-15-36-37.png"><img class="size-medium wp-image-3040 aligncenter" title="SpyParty-20120527-15-36-37" src="http://cdn.spyparty.com/wp-content/uploads/2013/02/SpyParty-20120527-15-36-37-300x174.png" alt="" width="300" height="174" /></a></p>
<p style="text-align: center;"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/02/2012-08-09_00003.jpg"><img class="size-medium wp-image-3032 aligncenter" title="2012-08-09_00003" src="http://cdn.spyparty.com/wp-content/uploads/2013/02/2012-08-09_00003-300x159.jpg" alt="" width="300" height="159" /></a></p>
<p><a href="http://cdn.spyparty.com/wp-content/uploads/2013/02/2012-08-10_00001.jpg"><img class="aligncenter size-medium wp-image-3033" title="2012-08-10_00001" src="http://cdn.spyparty.com/wp-content/uploads/2013/02/2012-08-10_00001-300x159.jpg" alt="" width="300" height="159" /></a></p>
<p>Still no repro.</p>
<p>This entire time—in fact since I first saw it myself pre-beta—I&#8217;ve been trying to resist thinking it&#8217;s some kind of &#8220;numerical issue&#8221; with the code that handles the facing angle.  Yes, angles are finicky to deal with due to wrapping, but I&#8217;ve found programmers, including myself, tend to immediately go to vague concepts like &#8220;floating point error&#8221; for anything like this.  To fight this tendency back when we were working on physical simulation code together, <a href="http://www.mollyrocket.com/136">Casey Muratori</a> and I developed a mantra:  <strong>&#8220;Assume it&#8217;s a bug!&#8221;</strong>  It means that instead of assuming it&#8217;s some subtle floating point error creeping in, or anything mysterious like that, it&#8217;s almost certainly just some dumb programming bug.  That mantra has never failed me.  It&#8217;s always just a plain old bug.</p>
<p>Onward&#8230;</p>
<p>In the fall of 2012, <a href="http://www.spyparty.com/streams/">streaming <strong>SpyParty</strong></a> took off bigtime, and so people were recording their games more often, and we started to get more videos, this one from <strong>r7stuart</strong> in October:</p>
<p><a href="http://www.spyparty.com/2013/02/09/one-bugs-story-or-assume-its-a-bug/"><em>Click here to view the embedded video.</em></a></p>
<p>And <strong>tytalus</strong> in November:</p>
<p><a href="http://www.spyparty.com/2013/02/09/one-bugs-story-or-assume-its-a-bug/"><em>Click here to view the embedded video.</em></a></p>
<p>I saw this last one live on <a href="http://www.twitch.tv/tytaluswarden">tytalus&#8217;s stream</a>, and grimaced when it happened, but also was happy to have more data to some day find a repro, or just have a random brainwave and fix it by intuition.<sup><a href="http://www.spyparty.com/2013/02/09/one-bugs-story-or-assume-its-a-bug/#footnote_1_3024" id="identifier_1_3024" class="footnote-link footnote-identifier-link" title="This happens in programming a lot, but you don&rsquo;t want to count on it if you don&rsquo;t have to.">2</a></sup></p>
<p>At this point, people were reporting NPCs doing it, which at least made me happy, because it meant it wasn&#8217;t a <em>tell</em>.  Tells and anti-tells<sup><a href="http://www.spyparty.com/2013/02/09/one-bugs-story-or-assume-its-a-bug/#footnote_2_3024" id="identifier_2_3024" class="footnote-link footnote-identifier-link" title="Where the NPCs can do something the Spy can&rsquo;t.">3</a></sup> are the most serious <strong>SpyParty</strong> bugs, because they undermine the delicate balance of the game, so I prioritize them highest, even above crash bugs sometimes!</p>
<p>A couple on New Year&#8217;s Eve from <strong>jorjon</strong>:</p>
<p><a href="http://cdn.spyparty.com/wp-content/uploads/2013/02/SpyParty-20121231-12-40-42-0.png"><img class="aligncenter size-medium wp-image-3036" title="SpyParty-20121231-12-40-42-0" src="http://cdn.spyparty.com/wp-content/uploads/2013/02/SpyParty-20121231-12-40-42-0-300x159.png" alt="" width="300" height="159" /></a></p>
<p><a href="http://cdn.spyparty.com/wp-content/uploads/2013/02/SpyParty-20121231-12-49-28-0.png"><img class="aligncenter size-medium wp-image-3029" title="SpyParty-20121231-12-49-28-0" src="http://cdn.spyparty.com/wp-content/uploads/2013/02/SpyParty-20121231-12-49-28-0-300x159.png" alt="" width="300" height="159" /></a></p>
<p>Then two clips from streams, the first from <strong>slappydavis</strong>, whose Seduction Target appears to do it at the bookshelf on January 6th:</p>
<p><a href="http://www.spyparty.com/2013/02/09/one-bugs-story-or-assume-its-a-bug/"><em>Click here to view the embedded video.</em></a></p>
<p>And then from james1221 on January 12th during the <a title="SpyParty New Years Cup Tournament Starting Tonight!" href="http://www.spyparty.com/2013/01/02/spyparty-new-years-cup-tournament-starting-tonight/"><strong>SpyParty</strong> New Years Cup Tournament</a>:</p>
<p><a href="http://www.spyparty.com/2013/02/09/one-bugs-story-or-assume-its-a-bug/"><em>Click here to view the embedded video.</em></a></p>
<p>Both of these are different, however.  In both cases, the NPC is being blocked by another character, and instead of repathing to a new place, they just wait until the blocking character leaves.  This is both good news and bad news.  It means it&#8217;s easy to project this bug onto other bugs, it means there are other bugs,<sup><a href="http://www.spyparty.com/2013/02/09/one-bugs-story-or-assume-its-a-bug/#footnote_3_3024" id="identifier_3_3024" class="footnote-link footnote-identifier-link" title="duh">4</a></sup> and it means all the real examples of this bug so far have involved the Spy.  I don&#8217;t mention this last part in the hopes that nobody notices.  Luckily it&#8217;s rare enough that it&#8217;s not going to be a game balance changer even if it is a tell.</p>
<p>Finally, <strong>kcmmmmm</strong> finds a reliable repro on February 7th, two days ago, and 9 months after the first post in the Bugs thread!  These pictures are beautiful to me:</p>
<p style="text-align: center;"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/02/SpyParty-20130207-12-09-59-0.png"><img class="aligncenter  wp-image-3031" title="SpyParty-20130207-12-09-59-0" src="http://cdn.spyparty.com/wp-content/uploads/2013/02/SpyParty-20130207-12-09-59-0-600x556.png" alt="" width="360" height="334" /></a></p>
<p style="text-align: center;"><a href="http://cdn.spyparty.com/wp-content/uploads/2013/02/SpyParty-20130207-12-03-01-0.png"><img class="aligncenter  wp-image-3030" title="SpyParty-20130207-12-03-01-0" src="http://cdn.spyparty.com/wp-content/uploads/2013/02/SpyParty-20130207-12-03-01-0-600x556.png" alt="" width="360" height="334" /></a></p>
<p>You can stand at that position, with that camera angle, and repro the bug most tries.  He also figures out that it&#8217;s very camera angle dependent, which is another clue, but once I could repro it locally, its remaining time on this earth was measured in minutes.</p>
<p>I had some trouble reproing it reliably here, including a couple wild-goose chases where I thought it wouldn&#8217;t repro with the debugger running<sup><a href="http://www.spyparty.com/2013/02/09/one-bugs-story-or-assume-its-a-bug/#footnote_4_3024" id="identifier_4_3024" class="footnote-link footnote-identifier-link" title="Sometimes this happens with an uninitialized variable, since the debugger initializes most memory to zero for you.">5</a></sup> or in my debugging modes,<sup><a href="http://www.spyparty.com/2013/02/09/one-bugs-story-or-assume-its-a-bug/#footnote_5_3024" id="identifier_5_3024" class="footnote-link footnote-identifier-link" title="Could mean some debug code was correcting for the bug?">6</a></sup> but in the end I got a case where I could catch it in the debugger, and I looked at the source, and there it was, suspiciously rotten code.</p>
<p>It was an old check from when I used to support click-to-move, as opposed to direct-control of the spy.  There was a case in the code that would check if you were clicking on the bookshelf itself, rather than the floor pad in front of the bookshelf, and it would helpfully direct you to the floor pad.  The position part of this got taken out long ago (I think), but the angle part remained in, so when the Spy stopped moving, if the mouse was over the bookshelf that code would return the angle for facing the bookshelf.</p>
<p>Wait, you say, there&#8217;s no mouse cursor in Spy mode?  Ah, yes there is, it&#8217;s just hidden and forced into the middle of the screen.  So, most of the time it hits your back, but sometimes, if you&#8217;re turning or leaning down or whatever when you stop, it&#8217;ll miss you and hit what&#8217;s behind you, and if it hits a bookshelf on the frame when you stop, you get the wrong facing angle.</p>
<p>Now, with this knowledge, go back and watch the videos and look at the pictures above.  Always a nearby guilty bookcase, isn&#8217;t there?</p>
<p>But wait, you say again, what about the very first video, the green bookshelf is nowhere near the middle of the screen!  Ah, but <strong>buxx</strong> uses a controller,<sup><a href="http://www.spyparty.com/2013/02/09/one-bugs-story-or-assume-its-a-bug/#footnote_6_3024" id="identifier_6_3024" class="footnote-link footnote-identifier-link" title="You can see the action UI in the video uses the controller icons! No detail is too small to matter in a bug!">7</a></sup> and the mouse gets hidden but doesn&#8217;t get centered if you&#8217;re using a controller!  It probably should, but it doesn&#8217;t.  So, <strong>buxx</strong>&#8217;s hidden mouse pointer is probably off to the right of the window, over the green bookshelf, until he moves one pedestal pad to the left, and then the mouse pointer is no longer on the bookshelf, and he faces the right way!</p>
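<p>To make the mechanism concrete, here is a toy Python model of the bug as described above. It is not the game&#8217;s code: the function, the dictionary fields, the angles, and the idea that the fix was simply removing the leftover check are all assumptions made for illustration.</p>
<pre>
```python
def facing_angle_on_stop(pad_angle, under_cursor, legacy_check=True):
    """Pick the angle a character faces when it stops on a floor pad.

    pad_angle    -- the angle the pad itself wants (e.g. toward the statue)
    under_cursor -- whatever the hidden mouse cursor is over, or None
    legacy_check -- True models the leftover click-to-move code path
    """
    if legacy_check and under_cursor is not None:
        if under_cursor["kind"] == "bookshelf":
            # Stale behaviour: the bookshelf under the cursor wins.
            return under_cursor["facing_angle"]
    return pad_angle


if __name__ == "__main__":
    statue_pad_angle = 0
    bookshelf = {"kind": "bookshelf", "facing_angle": 90}
    spy_back = {"kind": "character"}

    # Cursor centred on the Spy's back: the pad angle is used.
    print(facing_angle_on_stop(statue_pad_angle, spy_back))
    # Cursor over a bookshelf on the stop frame: the bookshelf angle leaks in.
    print(facing_angle_on_stop(statue_pad_angle, bookshelf))
    # With the leftover check disabled, the pad angle is always used.
    print(facing_angle_on_stop(statue_pad_angle, bookshelf, legacy_check=False))
```
</pre>
<p>The model also shows why the bug looked random: the result depends on an input (what the hidden cursor happens to be over on one particular frame) that nobody playing or watching can see.</p>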
<p>Awesome, all the cases explained, and the bug was trivial to fix!</p>
<p>Okay, there&#8217;s actually one more case in the Bugs thread I didn&#8217;t post here, because it&#8217;s funny enough that I&#8217;m going to make an entire post about it soon.</p>
<p>So, just remember, always, <strong>Assume it&#8217;s a bug!</strong></p>
<hr/><ol class="footnotes"><li id="footnote_0_3024" class="footnote">&#8230;because they&#8217;re great for learning, the only thing better than video is commentated video!</li><li id="footnote_1_3024" class="footnote">This happens in programming a lot, but you don&#8217;t want to count on it if you don&#8217;t have to.</li><li id="footnote_2_3024" class="footnote">Where the NPCs can do something the Spy can&#8217;t.</li><li id="footnote_3_3024" class="footnote">duh</li><li id="footnote_4_3024" class="footnote">Sometimes this happens with an uninitialized variable, since the debugger initializes most memory to zero for you.</li><li id="footnote_5_3024" class="footnote">Could mean some debug code was correcting for the bug?</li><li id="footnote_6_3024" class="footnote">You can see the action UI in the video uses the controller icons! No detail is too small to matter in a bug!</li></ol>]]></content:encoded>
			<wfw:commentRss>http://www.spyparty.com/2013/02/09/one-bugs-story-or-assume-its-a-bug/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Full Commentated Live Casting for Tournament Finals, January 19th, 7pm PST!</title>
		<link>http://www.spyparty.com/2013/01/19/full-commentated-live-casting-for-tourney-finals-january-19th-7pm-pst/</link>
		<comments>http://www.spyparty.com/2013/01/19/full-commentated-live-casting-for-tourney-finals-january-19th-7pm-pst/#comments</comments>
		<pubDate>Sat, 19 Jan 2013 09:26:16 +0000</pubDate>
		<dc:creator>checker</dc:creator>
				<category><![CDATA[competitive gaming]]></category>
		<category><![CDATA[streams]]></category>
		<category><![CDATA[tournaments]]></category>

		<guid isPermaLink="false">http://www.spyparty.com/?p=2988</guid>
		<description><![CDATA[Update:  Here&#8217;s the video of the finals match between  james1221 and bl00dw0lf with the full commentary!1 I am ludicrously excited about this!  If you follow this blog and SpyParty on Twitter or Facebook,2 you probably know I&#8217;ve been really interested in live streaming recently.  In fact, just the other day I set up the SpyParty Streams Notifier and I&#8217;m writing [...]]]></description>
				<content:encoded><![CDATA[<p>Update: <strong> </strong>Here&#8217;s the video of the finals match between  <strong>james1221</strong> and <strong><a href="http://twitter.com/bl00dw0lf6">bl00dw0lf</a> </strong>with the full commentary!<sup><a href="http://www.spyparty.com/2013/01/19/full-commentated-live-casting-for-tourney-finals-january-19th-7pm-pst/#footnote_0_2988" id="identifier_0_2988" class="footnote-link footnote-identifier-link" title="As a youtube commentor pointed out, I shouldn&rsquo;t have had the winner above the video because it ruins the surprise!">1</a></sup></p>
<p><a href="http://www.spyparty.com/2013/01/19/full-commentated-live-casting-for-tourney-finals-january-19th-7pm-pst/"><em>Click here to view the embedded video.</em></a></p>
<p>I am ludicrously excited about this!  If you follow this blog and <strong>SpyParty</strong> on <a href="http://twitter.com/spyparty">Twitter</a> or <a href="http://facebook.com/spyparty">Facebook</a>,<sup><a href="http://www.spyparty.com/2013/01/19/full-commentated-live-casting-for-tourney-finals-january-19th-7pm-pst/#footnote_1_2988" id="identifier_1_2988" class="footnote-link footnote-identifier-link" title="And now Google+, because the world definitely needed another social network!">2</a></sup> you probably know I&#8217;ve been really interested in live streaming recently.  In fact, just the other day I set up the <a title="SpyParty Streams Lists and Notification Sign Up" href="http://www.spyparty.com/streams/"><strong>SpyParty</strong> Streams Notifier</a> and I&#8217;m writing up a longer post about that and streaming in general, and I was <a href="http://indiestatik.com/2013/01/15/games-on-stream-spyparty-clairvoyance-and-the-livestreaming-phenomenon/">recently interviewed by Chris Priestman on Indie Statik about the growth of streaming for indie games</a>, along with <a href="https://twitter.com/e_svedang">Erik Svedäng</a> who&#8217;s working on the awesome <a href="http://www.gameofclairvoyance.com/">Clairvoyance</a>, which makes great use of online replays. </p>
<p>You may also know we&#8217;re at the final game of beta-tester<strong> <a href="http://twitter.com/benswinden">insight’</a></strong>s <a title="SpyParty New Years Cup Tournament Starting Tonight!" href="http://www.spyparty.com/2013/01/02/spyparty-new-years-cup-tournament-starting-tonight/"><strong>SpyParty</strong> New Year’s Cup</a>, which has been a really fun and exciting set of matches, most of which have been live streamed or at least recorded and posted after the match.  Watching the live streams and chatting with the other viewers during the games is incredible.</p>
<div id="attachment_2991" class="wp-caption aligncenter" style="width: 532px"><a href="https://twitter.com/ZeroTKA/status/289320879781974017"><img class="size-full wp-image-2991" title="zero-tourney" src="http://cdn.spyparty.com/wp-content/uploads/2013/01/zero-tourney.png" alt="" width="522" height="94" /></a><p class="wp-caption-text">Even grizzled live streaming veterans are having a blast with the tournament streams!</p></div>
<p>So, what better to do than combine the tournament and the focus on live streaming and go all-in?</p>
<p>We are going to live commentate the finals match of the tournament between <strong>james1221</strong> and <a href="http://twitter.com/bl00dw0lf6"><strong>bl00dw0lf</strong></a>, re-streaming both sides of the match into a single stream so we can fade between the two views dynamically based on context, and we&#8217;ll have expert-level commentary by <strong><a href="http://twitch.tv/virifaux">virifaux</a></strong> and <strong><a title="Live streamed interview tonight, January 10th, 8pm PST (GMT-8)" href="http://www.spyparty.com/2013/01/10/live-streamed-interview-tonight-january-10th-8pm-pst/">ky</a></strong>, with occasional color by me.  <strong>ky</strong> is going to play director while he casts, picking the best view from the two streams, just like the dude in the video truck at the Super Bowl!  We tested all the technical and social aspects of this today, it worked incredibly well, and you can see the results here, with a test game between <strong>james1221</strong> and <strong><a href="http://twitch.tv/slappydavis">slappydavis</a></strong>:</p>
<div class="aligncenter wp-caption" style="width: 650px;"><p><a href="http://www.spyparty.com/2013/01/19/full-commentated-live-casting-for-tourney-finals-january-19th-7pm-pst/"><em>Click here to view the embedded video.</em></a></p>
<p class="wp-caption-text">Check out <strong>ky</strong>&#8216;s awesome temporary name tags for this test, whipped up in mspaint while we were streaming!</p>
</div>
<a name="Tune+In"></a><h3>Tune In</h3>
<p>So, at <strong>10pm EST / 7pm PST / 3am GMT</strong>,<sup><a href="http://www.spyparty.com/2013/01/19/full-commentated-live-casting-for-tourney-finals-january-19th-7pm-pst/#footnote_2_2988" id="identifier_2_2988" class="footnote-link footnote-identifier-link" title="sorry!">3</a></sup> <strong>on Saturday, January 19th</strong>,<sup><a href="http://www.spyparty.com/2013/01/19/full-commentated-live-casting-for-tourney-finals-january-19th-7pm-pst/#footnote_3_2988" id="identifier_3_2988" class="footnote-link footnote-identifier-link" title="20th if you&rsquo;re GMT">4</a></sup> point your browser to the <strong>SpyParty</strong> twitch.tv channel:</p>
<p style="text-align: center;"><a href="http://twitch.tv/spyparty"><span style="font-size: medium;"><strong>http://twitch.tv/spyparty</strong></span></a></p>
<p>If today&#8217;s practice session is any indication, it&#8217;s going to be totally awesome!</p>
<p>Assuming this goes as well as I hope, we&#8217;re going to start doing this kind of casting more often, with a rotating mix of commentators, so leave comments here on this post with feedback on what you saw and let us know if you liked it, or if there&#8217;s anything we can change to make it even better!</p>
<hr/><ol class="footnotes"><li id="footnote_0_2988" class="footnote">As a YouTube commenter pointed out, I shouldn&#8217;t have had the winner above the video because it ruins the surprise!</li><li id="footnote_1_2988" class="footnote">And now <a href="https://plus.google.com/111962592569202160089/posts">Google+</a>, because the world definitely needed another social network!</li><li id="footnote_2_2988" class="footnote">sorry!</li><li id="footnote_3_2988" class="footnote">20th if you&#8217;re GMT</li></ol>]]></content:encoded>
			<wfw:commentRss>http://www.spyparty.com/2013/01/19/full-commentated-live-casting-for-tourney-finals-january-19th-7pm-pst/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
	</channel>
</rss>
