
How Nouvola Made Us ‘Launch-Ready’ For Shark Tank

POSTED BY ALEX WILHELM, VALIDATED CO-FOUNDER & CTO

We were approached by Shark Tank last year during our participation in Techstars Seattle 2016 and filmed our episode in June 2016. Ten months later the phone rang: our pitch to the Sharks would air on April 14th, 2017.

With a heads-up of roughly five weeks, it was time to hunker down. The episode was scheduled to coincide with our nationwide rollout to ten additional cities across the U.S. and we had some pretty lofty goals on top of the time constraint.

Over and above overall product improvements and optimizations (to both the app and backend), social login integration, and security hardening, we were switching from our long-term cloud provider Heroku to AWS for cost and performance reasons. Furthermore, we were revamping our website to capture as many leads as possible with a sweepstakes, live chat, and B2B dashboard.

No pressure 🙂

As an early stage startup that – up till then – was only live in two cities, in one time zone, we had never experienced such a surge in traffic and users. Right off the bat, we knew we needed to tackle three big unknowns:

  • How will our app and backend perform at scale with the current code base?
  • How will the new infrastructure scale and perform?
  • What needs to be done to ensure that our corporate webpage stays responsive and reachable?

If we were going to capitalize on the massive exposure of such a huge media event, we needed to ensure our capacity to weather the storm. But where to begin?

RESEARCH & EVALUATION

Step 1:  Establish a Baseline

No system is ever infinitely scalable so our first task was research into baseline numbers to orient ourselves. We started with blog posts from other Shark Tank Survivors and relevant articles. Here are three articles we found most useful:

In sum, the research said we could expect up to ~130,000 requests per minute on the corporate web page and roughly 1,500 app downloads per minute. And while those are pretty massive numbers, the fact that they’re real world examples makes them perfectly suited to serve as benchmarks to test against.
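Load testing tools usually think in seconds rather than minutes, so it helps to convert the benchmarks first. The sketch below is only a convenience calculation on the two figures quoted above; the function and variable names are made up for illustration.

```python
# Convert the per-minute benchmark figures from the research into per-second rates.

def per_second(per_minute: float) -> float:
    """Return the average per-second rate for a per-minute figure."""
    return per_minute / 60.0

web_requests_per_minute = 130_000   # expected peak on the corporate web page
app_downloads_per_minute = 1_500    # expected peak app downloads

web_rps = per_second(web_requests_per_minute)
downloads_per_s = per_second(app_downloads_per_minute)

print(f"Web page: ~{web_rps:,.0f} requests per second")
print(f"App: ~{downloads_per_s:.0f} downloads per second")
```

These are averages within a minute; real traffic is burstier, so treat them as a floor for what the tests should cover.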

Step 2:  Assess Our Infrastructure

Like many others, we use WordPress to easily manage and update our content. We enjoy the flexibility and features it offers for marketing purposes and, of course, not having to upload static HTML files via FTP left and right. Up to this point, we had hosted with a shared hosting provider that, upon notice of the upcoming Shark Tank spike, politely informed us that they would likely have to shut us down during the surge to prevent impairing their infrastructure. With such high stakes, we could not risk being shut down. If your app is slow, nobody cares what it does. So we had no choice but to add to our already overfull plate and do one of two things: (a) move our website to another provider that would guarantee against downtime; or (b) host our website ourselves.

Since we were in the middle of moving our backend and had already gotten our feet wet in AWS, we decided to host it ourselves. Using a PaaS on our own cloud, we would be in complete control of both horizontal and vertical scaling of our infrastructure; could forgo the need for a static HTML page (as recommended by past Shark Tank survivors); and ultimately guarantee everything would work come showtime.

Step 3: Migrate our Infrastructure

We will spare you the nitty-gritty details of moving our Heroku app to AWS, as there are enough articles about this out there. (I might write a follow-up article explaining how we moved specifically using Convox; contact me on LinkedIn if this is of particular interest to you.) WordPress was actually pretty straightforward. We set up the new EC2 instances (3xt2.medium), CDN, etc. (thank you CLI), and moved the entire website in one fell swoop. A surprise bonus: AWS also provides free SSL certificates for top and wildcard domains! If you haven’t worked with WordPress in ages, here is an amazing Backup & Restore solution that made the move a breeze: All-in-One WP Migration, a paid plugin worth EVERY PENNY if your backup size is over 512 MB.

Once we moved all of Validated to our own private cloud, it was time to prime it for success. While the infrastructure move was successful, there was no guarantee it could operate at scale. We needed to make absolutely sure our infrastructure would withstand the traffic for both our products (web and app). But how could we simulate the anticipated loads on our infrastructure quickly and efficiently?

MEET DIVECLOUD BY NOUVOLA

Nouvola helped us engineer a system that scales to 3,000 req/s in less than two weeks!  Nouvola offers real-world performance and load testing for web and mobile apps so developers can immediately know how a site or app will do with real load. We chose Nouvola because we were impressed by its turnkey simplicity and simultaneous ability to run large loads seamlessly, model realistic scenarios, and visualize dynamic data in a comprehensive dashboard.

With Nouvola, we were able to identify otherwise unforeseen cliffs, tipping points, and bottlenecks — and fix them right away.  As a B2B2C, traffic would be split between (1) users looking to download the app from the Apple App/Google Play store and (2) merchants looking for more information on our corporate web page. In addition, merchants needed to be able to onboard themselves through our business portal on our web page, which ties directly into our production backend. Consequently, the corporate page and infrastructure were crucial to optimizing the opportunity. Here is a step-by-step account of our wonderful experience.

PRIMING THE NEW CORPORATE WEBSITE

Through Nouvola’s continuous deployment and testing, we were able to assess the impact of every change and optimize our capacity accordingly.

TEST 1: WORDPRESS – NEW INFRASTRUCTURE

With the new page in place on 3xt2.medium for web and a 1xt2.medium RDS instance, we ran the first test by setting up a SESSION test: a normal GET to https://validated.co with up to 3,000 concurrent users, a 3-second think time, and a linear ramp-up over a 10-minute timespan.
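To get a feel for what these settings mean in terms of load, here is a minimal sketch of the test profile. It assumes that a linear ramp-up grows the user count proportionally over the 10 minutes and that each simulated user sends one request per think-time cycle; Nouvola's actual scheduling may differ, and both function names are hypothetical.

```python
# Rough model of the test profile: linear ramp-up plus a fixed think time.

def users_at(t_seconds, peak_users=3_000, ramp_seconds=600):
    """Concurrent users at time t under a linear ramp from 0 to peak_users."""
    t = min(max(t_seconds, 0), ramp_seconds)
    return round(peak_users * t / ramp_seconds)

def max_request_rate(users, think_seconds=3.0, response_seconds=0.0):
    """Upper bound on requests per second: one request per user per cycle."""
    return users / (think_seconds + response_seconds)

for minute in (1, 2.5, 5, 10):
    users = users_at(minute * 60)
    print(f"minute {minute}: {users} users, up to {max_request_rate(users):.0f} req/s")
```

Under these assumptions the request rate can never exceed the user count divided by the think time, which is why the think time matters as much as the number of users.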

OUTCOME: FAIL

We had to cancel the test halfway through as it was failing and the server stopped responding at around 750 concurrent users.

 

Fig 1: Nouvola Dashboard, TEST 1 results.

(All graphs and stats are taken directly from the AWS and Nouvola dashboards.)

So what happened? Let’s get to the bottom of it. After monitoring the web page for some time, we saw the dreaded WordPress ‘Error establishing a database connection’ message:

Fig 2: WordPress ‘Error establishing a database connection’ message.

We also took a look at the RDS logs and confirmed the instance was overwhelmed by the incoming traffic.

Fig 3: RDS log TEST 1: DB – t2.medium queue length overload.

Finally, we turned to Google to home in on the precise cause.  What we found out: WordPress opens a new DB connection every time a user visits the page, which is not good for page load times and resource strain. So caching was the next logical step, which is fortunately made very easy by a multitude of plugins.
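To illustrate why caching takes so much pressure off the database, here is a minimal page-cache sketch. It is not how any particular WordPress plugin works internally; FakeDatabase and PageCache are hypothetical stand-ins that only count how often the database is touched.

```python
import time

class FakeDatabase:
    """Stand-in for the database; counts how many connections are opened."""
    def __init__(self):
        self.connections_opened = 0

    def render_page(self, path):
        # Every uncached page view costs one database connection.
        self.connections_opened += 1
        return f"<html><body>content for {path}</body></html>"

class PageCache:
    """Serve rendered pages from memory until they are older than ttl_seconds."""
    def __init__(self, db, ttl_seconds=60.0, clock=time.monotonic):
        self.db = db
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store = {}  # path -> (rendered_at, html)

    def get(self, path):
        now = self.clock()
        hit = self._store.get(path)
        if hit is not None and now - hit[0] < self.ttl_seconds:
            return hit[1]  # cache hit: the database is not touched
        html = self.db.render_page(path)
        self._store[path] = (now, html)
        return html

uncached_db = FakeDatabase()
for _ in range(1_000):
    uncached_db.render_page("/")

cached_db = FakeDatabase()
cache = PageCache(cached_db)
for _ in range(1_000):
    cache.get("/")

print("uncached connections:", uncached_db.connections_opened)
print("cached connections:", cached_db.connections_opened)
```

The point is simply that, with a cache in front, the database load depends on how often pages expire rather than on how many visitors arrive.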

FIX 1: Caching

After testing a few plugins, we ultimately settled on Comet Cache, which worked great and offered a very easy way to verify results and make sure the caching is actually working.

Fig 4: WordPress plugin Comet Cache adds an optional entry in the page source for verification, loaded in 0.03s.

To counter server errors, we also increased the web instance count from three (3) to 10, as CPU load spiked during sustained load on the EC2 instances.

TEST 2: WORDPRESS

Rinse and repeat. Nouvola saves your tests so you can conveniently rerun them (and you can’t change a test’s details once it has been run, so you don’t have to worry about screwing up your test data). So we ran our second test with the same settings as TEST 1.

OUTCOME: FAIL

The test completed with no server errors (so increasing the instance count fixed that), but we still got the DB error. Somehow the caching wasn’t working correctly even though it showed up fine in the page source.

Fig 5: Nouvola Dashboard, TEST 2 results. Still no responses, but server errors are gone.

FIX 2: Set headers correctly

After looking back through the SESSION test case in the Nouvola Dashboard, we found that we hadn’t set the headers correctly, which in turn didn’t trigger the caching agent. This was quickly remedied:

Fig 6: Nouvola Dashboard. Add any browser header to GET test.
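If you script your own checks outside the dashboard, the same idea applies: send the headers a real browser would send. The sketch below only builds a request and prints its headers without sending anything. The header values are hypothetical examples, and which header a given caching plugin actually looks at is not something we verified.

```python
from urllib.request import Request

# Hypothetical browser-like headers; the values are examples only.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) LoadTest/1.0",
    "Accept": "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}

def build_page_request(url, headers=BROWSER_HEADERS):
    """Build (but do not send) a GET request that carries browser-like headers."""
    return Request(url, headers=dict(headers), method="GET")

request = build_page_request("https://validated.co")
for name, value in request.header_items():
    print(f"{name}: {value}")
```

A bare GET with no such headers can look quite different to the server than real visitor traffic, which is the kind of mismatch a load test should avoid.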

Hoping this fixed the problem, we ran the next test, fingers crossed.

TEST 3: TEST FIXES

We ran the third test using the same settings: 10xt2.medium web, 3,000 concurrent users, 3-second think time with linear ramp-up over a 10-minute time span and 1xt2.medium DB instance.

OUTCOME: SUCCESS

Running with 3,000 concurrent users, everything went well. Response time is amazing with 60ms at peak and we are looking at an average of 32,000 requests per minute handled by our infrastructure with 10x t2.medium instances. Not too shabby.

Fig 7: Nouvola Dashboard, TEST 3 results. Those are pretty good numbers!

Next step was to scale up the number of web instances that served the web page and run a final test to ensure our infrastructure could withstand the anticipated surge.

TEST 4: TEST AT SCALE

We added a new test case with 3,000 concurrent users and a random think time, which was expected to triple traffic and more closely resemble real-world behavior, over a 10-minute period with linear ramp-up. We also increased the number of EC2 instances to 30xt2.medium (for good measure and to be on the safe side).

OUTCOME: SUCCESS

Everything fell into place. With roughly 120,000 requests per minute and a response time of under 500ms at peak load, we were ready for showtime.

Fig 8: Nouvola Dashboard, TEST 4 results. Web test up to 3,000 concurrent users.
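For a back-of-the-envelope view of these results, the sketch below divides the reported throughput by the instance count and compares it with the 130,000 requests per minute benchmark from our research. It assumes throughput scales linearly with instance count, which is an assumption and not something the tests proved. Note also that the test figures are throughput under the offered load, not saturation limits, so the estimate errs on the conservative side. The helper names are illustrative.

```python
import math

def per_instance(requests_per_minute, instances):
    """Average requests per minute handled by a single web instance."""
    return requests_per_minute / instances

def instances_needed(target_rpm, per_instance_rpm, headroom=1.0):
    """Instances required for a target rate, assuming linear scaling."""
    return math.ceil(target_rpm * headroom / per_instance_rpm)

test3 = per_instance(32_000, 10)    # TEST 3: 10 x t2.medium
test4 = per_instance(120_000, 30)   # TEST 4: 30 x t2.medium, approximate figure

print(f"TEST 3: {test3:,.0f} requests per minute per instance")
print(f"TEST 4: {test4:,.0f} requests per minute per instance")
print("Instances for the 130,000/min benchmark:", instances_needed(130_000, test4))
```

Passing a headroom value above 1.0 adds a safety margin on top of the target rate.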

Looking at the caching timestamp during peak load directly in the browser also showed a fantastic picture. Even at up to 10x the unloaded cache response time of 0.03s, pages were still served in a fantastic 0.3s!

This proves pretty well that WordPress can indeed handle high traffic events and offers a viable alternative to static HTML pages in a crunch. Granted we had a nicely scalable system thanks to Convox and AWS, but this shows that it’s not terribly difficult to set up yourself!

Once we confirmed the capacity of our corporate web page, we moved on to assess and prime the app.

PRIMING THE VALIDATED APP

Knowing our backend inside and out, we went into this task expecting more of a CPU heavy load on the web instances to create the offer list on a per user basis.

Based on what we learned with load testing for the web page, we knew what kind of EC2 instances we wanted to pick and how many would be running; so we settled on 4xt2.large for web and a t2.large + follower DB to help with I/O spread.

First, we concentrated on optimizing and checking the main contact points for new users:

  • Query optimizations: No sequential scans during signup, user profile, or offer list creation.
  • CPU utilization was not that high, and changing between t2.medium, t2/m4.large and t2/m4.xlarge didn’t impact processing time significantly. We would just have to monitor what happens at scale.

Feeling confident that this would work without a problem, we set up a staging environment and multi-session test case that would faithfully follow our normal onboarding flow:

  1. Create a user with random credentials/emails and geolocation coordinates.
  2. Give it a think time of 2-3 seconds to simulate input, view switching, etc.
  3. Request the offer list with those randomized parameters.

This is easily done by (1) creating two sessions in the Nouvola Dashboard:

Fig 9: Nouvola Dashboard. Created two separate actions as two separate sessions.

And then (2) setting the session specific variable parameters directly in the applicable session:

Fig 10: Nouvola Dashboard, small overview of parameter optimization per applicable session.

Lastly, (3) we brought everything together by combining them into a MULTI-SESSION giving them the necessary sequence of execution:

Fig 11: Nouvola Dashboard; add and order the previously created sessions.

The whole process didn’t take longer than 30 minutes and it will carry over all session IDs, cookies, and variables that are necessary to simulate a real user.
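For readers who prefer to see the flow as code, here is a minimal sketch of the same three-step onboarding scenario. It only generates the randomized parameters and the order of steps, and it sends nothing over the network. The step names, field names, and the example.com addresses are all hypothetical and not our real API.

```python
import random
import string

def random_user(rng):
    """Step 1: random credentials/email and geolocation coordinates."""
    local = "".join(rng.choices(string.ascii_lowercase, k=10))
    password = "".join(rng.choices(string.ascii_letters + string.digits, k=16))
    return {
        "email": f"{local}@example.com",
        "password": password,
        "lat": round(rng.uniform(-90.0, 90.0), 6),
        "lon": round(rng.uniform(-180.0, 180.0), 6),
    }

def think_time(rng, low=2.0, high=3.0):
    """Step 2: a pause of 2-3 seconds standing in for user input."""
    return rng.uniform(low, high)

def onboarding_plan(rng):
    """Return the ordered steps one simulated user would run."""
    user = random_user(rng)
    return [
        ("signup", {"email": user["email"], "password": user["password"]}),
        ("think", {"seconds": think_time(rng)}),
        ("offer_list", {"lat": user["lat"], "lon": user["lon"]}),  # step 3
    ]

rng = random.Random(42)  # seeded so a run can be reproduced
for step, params in onboarding_plan(rng):
    print(step, params)
```

Each simulated user gets its own credentials and coordinates, so the backend has to do the real per-user work instead of serving one repeated request.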

TEST 1: INITIAL INFRASTRUCTURE TEST

As previously done with the web page, we created a test that would mimic 500 concurrent users and 3 seconds think time with linear ramp-up over a timespan of 10 minutes running 4xt2.large web EC2 containers and 1xt2.large DB instance plus read follower.

OUTCOME: SUCCESS

Fig 12: Nouvola Dashboard, TEST 5 results. Up to 500 concurrent users.

We were off to a great start. With 1.4 seconds from signup to offer list creation, loading was swift and New Relic reported an Apdex score of 0.9!

TEST 2: 1,000 CONCURRENT USERS FOR 10 MINUTES

Now we needed to make sure the app behaved as well at scale so we went back and created a new test with 1,000 concurrent users. Otherwise, we kept all the same parameters.

OUTCOME: FAIL

This was a bit surprising as CPU load on the instances was below 40% (same as with the previous test) and it started failing about seven minutes in (~700 concurrent users). So we looked into both the Nouvola reports and AWS logs and found something interesting:

Fig 13: Nouvola Dashboard, TEST 6 results. App starts not responding.

So why did this test fail? The RDS instance’s CPU (t2.large) started to choke, DB connections spiked, and the system became unresponsive.

Fig 14: DB CPU maxed out and the system screeched to a halt.

Since all of the queries were optimized, the problem came down to the DB instance size; so we increased the CPU horsepower and tried again.

TEST 3: 1,000 CONCURRENT USERS FOR 10 MINUTES (TEST 2 #2)

We decided to upgrade both the main and follower DB to m4.xlarge (4 vCPUs, 16 GiB of memory) to be on the safe side, with the expectation that it would perform very well during any prime time broadcast (and resultant surges).

OUTCOME: SUCCESS

The results were promising!

Fig 15: Nouvola Dashboard, TEST 7 results.

We dug into the Nouvola data to pinpoint the cause of the small proportion of ‘No Responses’: an error in one of our EC2 instances, which was easily fixed with a simple restart.

Fig 16: DB CPU; plenty of room to scale.

End result? We could easily serve 5,200+ app downloads per minute and with CPU utilization hovering at 44%, there was plenty of room for further scaling!
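To put that number in context, the quick calculation below compares it with the roughly 1,500 downloads per minute we expected from our research. It is plain arithmetic on the two figures from this post; the variable names are illustrative.

```python
expected_downloads_per_minute = 1_500   # benchmark from the research step
measured_downloads_per_minute = 5_200   # result of the final app test

headroom = measured_downloads_per_minute / expected_downloads_per_minute
print(f"Headroom over the benchmark: {headroom:.1f}x")
```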

PRIMED FOR PRIME TIME

Nouvola made it stupidly easy to test our infrastructure and stack, and ensured that we didn’t have any blind spots. On the big day, everything went off without a hitch. The East Coast broadcast kicked off and we had three big waves, one hour apart, and experienced no issues whatsoever. People were able to reach us via live chat on our web page, merchants were able to sign up via our B2B portal, and new users were able to download the app and create accounts without delay. We also captured thousands of leads.

You read a lot online about command line tools and nightmare stories about test setups that need entire teams, but Nouvola greatly exceeded our expectations. In fact, we were so impressed with the outcome and ease of use that Nouvola is now a part of our development cycle and we include it in our default deployment and testing workflow. On top of that, we are using Nouvola dashboard data to optimize our infrastructure costs and reliably test autoscaling triggers.

NOUVOLA: DEVELOPMENT TAKEAWAYS

Proper performance testing doesn’t have to be a dauntingly complicated process.

Good planning, the right tools, and real world scenarios in your testing process will go a long way.

You need the right data to know that your app and website are ready every day.

Performance testing shouldn’t be restricted to big events or product launches. Every software change has to be validated because every software change has implications for the user experience.

It pays to be prepared.

Past performance is not a guarantee of future performance, so test early, test often, test always. If your app is slow, nobody cares what the app does; and slowdowns on your web page are insidious.


The ease and utility of Nouvola are unparalleled and we couldn’t recommend it more highly.

Sign up for free and try Nouvola today.


About Nouvola

Nouvola is the leading provider of performance testing and load testing helping developers, DevOps and engineering managers release better code faster and ensuring their web, mobile, API and IoT applications are real-world ready. Designed for and by developers, Nouvola’s innovative platform significantly simplifies performance testing at any stage of the development flow. Founded in 2013 with headquarters in Portland, Oregon and offices in San Francisco, Nouvola has a broad and fast-growing user base with customers from Fortune 1000 companies to market disruptors appearing on Shark Tank. For more information on Nouvola solutions, visit www.nouvola.com or find us on Twitter at @Nouvolatech.