The way we develop software has changed radically over the last five or ten years, and that change has dramatically affected the way we have to test software as well. Yet performance testing is one of the last areas of testing where most people still follow the old way of doing things.
In this episode Michael Sage from BlazeMeter shares how performance testing should be done in an Agile DevOps world.
Listen to the Audio
In this episode, you’ll discover:
- How Taurus can help you create a user-friendly framework for Continuous Performance Testing.
- Why the old way of doing performance testing is no longer working in a DevOps world.
- Is performance testing an activity that every sprint team should be involved in?
- Tips to improve your performance testing efforts.
- What is Docker and why you should use it.
#Taurus lets you write #perftest in YAML without having to bring up a vendor specific recording tool @mondosage
Join the Conversation
My favorite part of doing these podcasts is participating in the conversations they provoke. Each week, I pull out one question that I’d like to get your thoughts on.
This week, it is this:
Question: What tools or techniques are you using to improve your Continuous Performance Testing efforts? Share your answer in the comments below.
Want to Test Talk?
If you have a question, comment, thought or concern, you can share it by clicking here. I’d love to hear from you.
How to Get Promoted on the Show and Increase your Karma
Subscribe to the show in iTunes and give us a rating and review. Make sure you put your real name and website in the text of the review itself. We will definitely mention you on this show.
We are also on Stitcher.com so if you prefer Stitcher, please subscribe there.
Read the Full Transcript
Joe: Hey Michael. Welcome to Test Talks.
Michael: Great to be here. Thanks.
Joe: Awesome. Today I’d like to talk about BlazeMeter and performance testing in general. Before we get into it though could you just tell us a little bit more about yourself?
Michael: Yeah. I’ve been working in the testing world for about fifteen years now. I used to be a systems engineer with Mercury Interactive way back in the day. I remember the old time tools: WinRunner, TestDirector, LoadRunner, and so forth.
Michael: I kind of followed that trajectory for a while. A few years ago I broke out into the emerging world of startups, DevOps, and so forth. Spent a little time with New Relic and now I’m working with BlazeMeter.
Joe: Great. You bring up a great point: you’ve worked with more established technologies, I would say, like HP’s LoadRunner. I have that background too. I’m an old time LoadRunner performance engineer, but I haven’t really kept up with performance testing in the past few years. Is there anything new with performance that people need to be aware of if they’re getting started now with the tools, with the technology?
Michael: Yeah. That’s a great question. I think like everything else, the way that we develop software has so radically changed over the last five or ten years, and that’s impacted the way we have to test software as well. I think that one of the last bastions, if you will, of the old way of doing things has been performance testing. People still think about the war room test. We do a big test towards the end of a release cycle, maybe these days toward the end of a sprint or something like that, that involves very well thought out scripts that take a little while to develop in a vendor specific tool. We have to schedule everyone together for this event that we call the Load Test. We run that Load Test and find some bottlenecks, then we go fix them and do it again. That’s turning out to not work, because the same logic that we use for continuous integration and continuous delivery with regard to other kinds of testing applies here. In other words, finding the defects as soon as possible after the code was written and being able to manage that process in the same mental context. A developer writes some code, we discover some performance bottlenecks related to that commit or that build, and we fix them quickly. That’s basically where I think the cutting edge on performance testing is right now: just getting it done earlier in the delivery workflow.
Joe: Awesome. Great answer, and the reason why is I see a lot of teams struggling with performance testing. Just like you said, a lot of times they still have this mentality where we’ll wait till after the code is written, after we have everything tested, then we’ll go into performance. That’s really losing the benefits of finding performance issues early, when you really can fix them. How does an agile team in a sprint go about performance testing? Do you recommend every sprint team does performance testing? How does that break out nowadays?
Michael: I don’t know that we’ve really found a discipline that makes sense for everybody. I think it’s really based on the team’s structure. That’s one of the interesting things emerging in the delivery styles and workflows that are out there. I think that if it makes sense, a developer could start doing performance testing on his or her features really right from their desktop. We have the open source tools now where you don’t really need to invest a lot of up-front effort in getting a performance testing script written and ready. You can actually do that pretty quickly.
In fact, BlazeMeter has some technology. We sponsor an open source project called Taurus which allows developers to write their tests in YAML. They can actually describe a full-blown script in about ten lines of text. A big, big leap forward from the old days of having to bring up a big, heavy, vendor specific recording and scripting tool. You could have developers writing a test and running some load against their local build for the features that they’re working on during this sprint. I think the better place, the more obvious place to be running Load Tests is actually in CI. Whether it’s agile or some customized, combined style of workflow that a team is using, hopefully they’re starting to commit to an integration server and do continuous integration builds. That point, I think, is the real first place that a Load Test should be executed. For example, we have a Jenkins plugin for our product, but the Taurus tool I mentioned you could also just run from a shell. In Jenkins you could have a build job with a shell script that runs a Load Test. It just tells you whether or not … It can even be at the level of a smoke test. Low concurrency, really not a lot of think time or any elaborations. Just throw a bunch of load at that build and see if it falls down.
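As a rough illustration of the kind of short test description Michael mentions here, the sketch below builds a minimal Taurus-style configuration in Python and prints it as JSON, which Taurus reads as well as YAML. The target URL, the concurrency, the duration, and the function name are hypothetical placeholders chosen for this example, not BlazeMeter recommendations.

```python
import json


def build_smoke_config(url, concurrency=5, hold_for="30s"):
    """Return a Taurus-style config dict for a quick, low-concurrency smoke test.

    All default values are illustrative assumptions.
    """
    return {
        "execution": [
            {
                "concurrency": concurrency,  # number of virtual users
                "hold-for": hold_for,        # how long to keep the load running
                "scenario": "smoke",         # name of the scenario defined below
            }
        ],
        "scenarios": {
            "smoke": {
                # No think time or elaborate flow: just hit the endpoint.
                "requests": [url],
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical URL of a local build.
    config = build_smoke_config("http://localhost:8080/")
    print(json.dumps(config, indent=2))
```

Saved to a file such as smoke.json (a made-up name), the output could be run from a shell or a Jenkins shell build step with `bzt smoke.json`, assuming Taurus is installed on that machine.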
Joe: Awesome. I definitely agree with that approach. I definitely want to get more into BlazeMeter and Taurus but I just want to follow up on this concept. If you’re running in CI, what do you recommend? Do you have an environment specific to performance? Back in the day we used to try to make a performance environment as realistic as possible to production. I never had success with that. I always ended up running in production during off hours to try to get realistic results. I like the idea you mentioned of maybe running a baseline test to get a feel for what the relative performance is. Or are you saying you can also run a full-blown performance test in CI every night or something like that?
Michael: That’s a good point. I think that we can actually … Again, because we have the tools now to really flexibly create these tests, we can run different kinds of tests at different meaningful points. If I’m a developer in my local environment, naturally I’m running some scaled-down version. Maybe I have just some MongoDB running locally. When it’s in production it’s part of a much more complex cluster or whatever. I still have the main components of the stack available to me in my local environment. I can run a performance test and again, like you said, it’s kind of a baseline. It gives me just some idea of whether or not my code is performing as I expect it to. Then of course the next stage, the CI stage: a lot of people committing to the same master branch or trunk. There we have the next level up. The early tests can be low concurrency and they should be fast, small-scoped tests. We don’t want to spend a lot of time trying to emulate real world conditions. We just want to fire a bunch of requests at an endpoint and see if it falls down. See if there are concurrency defects or 500 errors or whatever.
The next stage, the CI stage, we may want to get a little more elaborate but still we don’t want to slow down the build. We don’t want to artificially delay the feedback that is the whole point of CI. There we also want smaller scope, fast running tests but then, yes. Then we want to deploy to a staging or rehearsal environment. While it won’t look exactly like production, obviously these days with Docker and other kinds of technologies we can actually get pretty close to production. We want to run some more realistic scenarios there. That’s where we’re talking about the more traditional kind of Load Test, where we introduce think time. We have staggered workloads in our load profile. Some users are doing one path through the app, others are doing something else. We’re mixing and matching those conditions a little more.
Finally, we really should run Load Tests in production. Obviously we need to be careful about that, but there is no replacement for those production conditions. Find the time where user impact is negligible or the lowest it can possibly be and run a reasonable Load Test. Don’t try to break it. Don’t run a stress test, but run a reasonable Load Test and collect those metrics. I think there are four different places, local, CI, staging or QA, and production, where we can run a growing body of tests along that left to right spectrum.
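One way to picture the four places Michael lists is as a small table of load profiles, one per stage. The Python sketch below is only an illustration: every number in it is invented for the example, and the class and function names are hypothetical rather than part of any tool discussed in this episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadProfile:
    stage: str
    concurrency: int      # virtual users (hypothetical numbers)
    duration_s: int       # how long the test runs, in seconds
    think_time_s: float   # pause between user actions; 0 means none
    realistic_mix: bool   # whether users follow varied paths through the app


# Illustrative values only: early stages are small and fast,
# later stages run longer and look more like real usage.
PROFILES = [
    LoadProfile("local", 5, 30, 0.0, False),
    LoadProfile("ci", 20, 60, 0.0, False),
    LoadProfile("staging", 500, 1800, 3.0, True),
    LoadProfile("production", 200, 900, 3.0, True),
]


def profile_for(stage):
    """Look up the load profile for a pipeline stage by name."""
    for profile in PROFILES:
        if profile.stage == stage:
            return profile
    raise ValueError(f"unknown stage: {stage}")
```

Keeping the profiles in one place like this would let the same test script be reused at each stage with different settings, which is the spirit of running a growing body of tests from left to right.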
Joe: Awesome. What is Docker? How does that fit into performance? Is that something you see as a trend that’s going to grow in the future? How does technology like BlazeMeter handle that?
Michael: Yeah. To a degree, in that kind of context, BlazeMeter is somewhat technology neutral. The place I see Docker being most meaningful to testing is in providing those kinds of environments where we can spin up a test lab super easily, right? We have these Docker images. They can work together in kind of a cluster, and we can create in a staging environment a relatively accurate performance lab. I think the infrastructural characteristics of that performance lab get closer and closer to production. I think the piece that generally is missing is the network behaviors. You can’t really duplicate the way the internet is, the backbones and latencies and things like that. Docker gives us this tremendous flexibility in creating test environments that we can spin up and throw away. We don’t have to put any effort into ordering any hardware or managing any lab and paying people who just manage our QA lab. We can literally just spin up a lab that’s everything we need, and then when the test is done we can dispose of it. That kind of flexibility is new.
I think Docker containers in an architectural sense are making their way into production around what are generally called microservices, which are discrete business services. This is a rethinking of SOA in a lot of ways. It’s a way of separating out functionality. Let’s say it’s an online bank: we have an account management service. We also have maybe a mortgage service. These are two separate services and they may be created by separate teams. The benefit of microservices is first that they lend themselves very well to continuous delivery. You have a single team working on a single stack that is just focused on their code. When they have a performance problem with their account management service they can fix that performance problem much more easily than with a monolithic app where there are dozens of services running in one big stack. We have that kind of flexibility of managing the continuous delivery process better because we’re focused on a known domain of business logic and code.
When you’re performance testing that, obviously you reap the benefits of CI as we discussed earlier. When you bring all those microservices together you have those complex scenarios that we want to test with the more traditional style of performance testing, load testing. The place that Docker occupies there is when you do have problems. You can actually just swap out one machine with another. You can kind of hot swap the machines that provide this service to the rest of the business. Essentially Docker gives us a kind of flexibility for both development and testing that we’re still sort of new to. It’s even more flexible and more adaptable than virtual machines. Sort of [inaudible 00:11:15].
Joe: Awesome. This is awesome stuff. Talk a little bit more about what you said. We have newer tools, and a lot of things have changed in the past five years with testing and performance testing. It almost sounds like we’re going towards continuous integration, continuous delivery. For that to happen, a lot of things need to be automated. It’s not so much just functional automation now, it could be anything automated. It sounds like Docker and Vagrant fit into that, where we’re able to automate the deployment of services and servers and bring them up automatically rather than have someone manually go in and do these types of activities.
Michael: That’s right. What we’re seeing as far as that continuous delivery workflow … Everybody finds the tool chain that works for them but the commonalities are starting to stabilize a little bit. GitHub is the big version control player now. Everyone’s using GitHub. Maybe they’re using [inaudible 00:12:04] version but GitHub’s very dominant. When it comes to functional testing, of course Selenium has really exploded over the last few years. Largely for the same kinds of reasons that I’ve been talking about for the other technologies. We have a programmatic approach. A developer-friendly approach to traditional functional testing, right? User space functional testing. Kind of the presentation layer, where we’re interacting with the web app and making sure that the controls are working, various calculations are being handled properly and so forth. There’s no need to launch a Windows-only record and replay tool that uses an outdated language like VBScript.
I can write a full-blown Selenium test that tests my user journeys through a particular important part of the application in Ruby, Python, Java, or C#, whatever. I think there are still testers who do that, but I think that’s developer friendly enough that if I’m a developer and I want to see how my code works, I want to start writing some of those tests. It gives me a sense of knowing that the code works and also there’s just a bit of pride and satisfaction in seeing it work. That’s kind of the Selenium piece.
Then we have of course the Load Testing piece. That’s the new one, right? That’s Taurus. There are some other solutions like Gatling, which is a Scala-based Load Testing solution, or Locust.io, which is Python. JMeter is the heavyweight. JMeter is the tool that we mostly use at BlazeMeter. It’s the most popular open source tool in the Load Testing space. The Taurus piece that I talked about really makes it a lot more automation friendly. You have GitHub, you’re checking out some code, then you have Jenkins, which is the big player in CI. You run some Selenium tests, run some JMeter tests. Then you’re talking about Chef, Puppet, or Ansible or one of these configuration automation solutions that will then stand up that deployment environment for you. Then further tests can be executed there. Yeah, we are automating the whole pipeline. I do a lot of work with AWS. They have a solution called CodePipeline which does a lot of this as well. I think in general, those technologies or those automation approaches are going to proliferate throughout the community. In a few years I think automation will just be second nature for everybody.
Joe: We’ve talked about Taurus a few times. We’ve touched upon JMeter and BlazeMeter but at a high level what is BlazeMeter? What are the pieces of technology that make it up? Can you just tell us a little bit more about BlazeMeter?
Michael: Sure. Yeah. BlazeMeter got its start addressing the problems, let’s say not even problems, but some of the shortcomings around the JMeter project. It became pretty clear, I think, to our founder a few years ago that open source was the next wave. I think a lot of people saw this coming. Open source tools are largely built by their community, and what gets built is often based on the community’s level of interest. What that means ultimately is some features may not get to where they need to be for enterprise grade consumption, right? In JMeter’s case, for example, it’s usually a little bit difficult to scale JMeter yourself for a large distributed test. You have to set up a bunch of machines. You have to configure them to talk to each other. Even then, depending on the nature of your JMeter script, you may get it wrong and have to do it over. There are a lot of orchestration problems in executing large JMeter tests. That was the problem that our founder and the team set out to solve: how can we run JMeter in the Cloud and automatically scale out the infrastructure required to run these very large tests? If you have to run a hundred concurrent users, you’re fine with JMeter on your desktop. When you want to run a ten thousand, twenty thousand, hundred thousand, or million concurrent user test, then it becomes very challenging.
BlazeMeter was founded to solve that problem as well as the problem on the other end, which is reporting. Like many tools, JMeter just gives you some XML files and some log files. What the business community certainly wants and what testers really appreciate are rich graphs and rich reports that they can quickly use to find the bottleneck, right? That’s the whole point of Load Testing. BlazeMeter was founded with those two targets in mind: making the orchestration of very, very large tests in the Cloud super, super simple, and providing very rich and useful reports, both while the test is running as well as over time after the test is done. What’s happened over the last few years is that we’ve seen [inaudible 00:16:35] and this continuous delivery mania taking over the landscape.
Some of the other shortcomings of JMeter have started to come to the fore. Notably, there’s no domain specific language for it. There’s no way to easily programmatically interact with a JMeter test. It’s got some command line options and some other things that you can sort of do, but it’s sort of a square peg in a round hole kind of situation. A guy named Andrey Pokhilko, who is our chief scientist … He lives in Russia and he created a tool called Taurus. Like the bull, T-A-U-R-U-S. Taurus provides sort of an abstraction layer on top of JMeter as well as some other tools like Locust, Gatling, the Grinder, and Selenium. That gives teams the ability to describe these tests in a YAML file or a JSON file. YAML is the human-readable, editable approach, where I can describe a test in a simple text file in the editor that I am using to write code, or I can programmatically interact with other components of my automation workflow with the JSON tests and execute these tests wherever it makes sense. The next level for BlazeMeter is just really bringing Load Testing into these modern delivery pipelines in native ways that make it really, really easy for people to adopt Load Testing now that we want to do it frequently.
Joe: Can you walk us through a very simple example of how somebody would get started? Do they just create a JMeter script, upload it to the Cloud on BlazeMeter, and then BlazeMeter handles all the scaling and the reporting aspects of it after?
Michael: Yeah. JMeter is a topic all to itself, right? It’s a tool that was created … I think it’s probably about twelve years old. Don’t quote me on that but it’s about twelve or fifteen years old. It was built pretty much to be the open source alternative to LoadRunner, so you’ll find in JMeter a lot of sophisticated features. It’s very, very capable. It’s a different kind of paradigm. It’s a thick client Java app that’s largely driven by right clicks. It’s a little bit weird but it’s got a lot of power. All the kinds of features that a professional Load Tester wants are available in JMeter. That also makes it a little bit hard for new users to figure it out. We have lots of good videos online and there are some really good JMeter blogs. We also have a Chrome extension that works as an interactive recorder. Simply launch our Chrome extension and start working with your app just like you would with VuGen or one of the other tools. It will capture a script for you. That’s actually the easiest way to get started.
Either way, whether you have a JMeter script that you built in native JMeter or you use our Chrome extension, you’ll end up with that test plan file, which is the set of steps you want to emulate for driving the load into the app. You upload that script to BlazeMeter and then you select a couple of options. We have a lot of advanced options. We integrate, for example, with APM. If you need to integrate with New Relic or AppDynamics or Dynatrace, we make those available. That gives you a great way to find the needle in the haystack while that load is being driven. We also have things like network emulation. If you want to emulate the speeds that come from mobile devices, you can do that.
You upload your script, choose some of these advanced options and then you choose where that load should come from. As a Cloud provider, we make it really, really easy for people to drive load from throughout the internet, right? You can choose a European location or an Asian location or a South American location or wherever you want. You can spread your load across a bunch of different source locations so that you get a kind of real world representation of network traffic. Finally you choose how many concurrent users you want. We can scale up to two million. If you come to BlazeMeter, upload your script and you decide you need fifty thousand concurrent users, it’s a simple slider in our web UI. You slide that up to fifty thousand, press start and your test will start running. We’ll do all the work to make sure the right infrastructure is spun up to be able to execute that load.
Joe: I like how you said the reporting aspect is something that BlazeMeter really focuses on. In my experience with performance testing, usually the creation of the script was the easy part but analyzing the results was the hardest. Is there anything specific to BlazeMeter’s reporting that helps people identify issues quicker than normal?
Michael: Yeah. Absolutely. That’s the thing we really try to make simple for folks. A person who is using BlazeMeter, as soon as they start the test, will start to get immediate feedback. The KPIs will come in. Those are the high level stats like number of concurrent users, the requests per second, the hits per second that the servers are handling. Also response time of course. The overall response time and then also some percentiles, like ninetieth percentile response time. Just to give the testers or the developers the sense of those big important marquee metrics. Then what we provide is what we call a timeline graph, which shows each individual transaction in the script. For example, the home page versus the search function. The user can go into those individual labeled transactions and work with an interactive graph that’s building out in real time during the test but is also available over time. The way they can do it is they can say, “You know what? I see that the home page is slow, now I want to drill in and see exactly what’s happening there. Are there errors? Is the [inaudible 00:22:16] throttled? Are we seeing response times go up at the same time the load is hitting this certain point against that home page?”
From there we typically go into the monitoring data. We’ll provide the monitoring data directly, but we [inaudible 00:22:29] with CloudWatch, and certainly teams will be using APM: New Relic, AppDynamics and so forth. At that point you’ll corroborate what BlazeMeter is showing as the choke point or the bottleneck with the application stack data that you’re getting from the other tools that your team is using.
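To make the headline metrics Michael lists a little more concrete, here is a small Python sketch that derives hits per second, average response time, ninetieth percentile response time, and error rate from raw samples. It is not how BlazeMeter computes its reports; the sample format, the function names, and the choice of the nearest-rank percentile method are all assumptions made for this example.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples, duration_s):
    """Summarize (response_time_ms, succeeded) samples collected over duration_s seconds."""
    times = [t for t, _ in samples]
    errors = sum(1 for _, ok in samples if not ok)
    return {
        "hits_per_second": len(samples) / duration_s,
        "avg_ms": sum(times) / len(times),
        "p90_ms": percentile(times, 90),
        "error_rate": errors / len(samples),
    }
```

The percentile is worth reporting next to the average because a handful of very slow requests can hide behind a healthy looking mean.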
Joe: Awesome. What always gets me nervous about performance testing with more people getting involved, and once again maybe this is an old school thing, is that in my experience it’s very easy to create false alerts or fires. Someone says our application is broken because they aimed a thousand concurrent users at their application with no think time, and they told management it took thirty, forty seconds for our page to load when it should only take two seconds. Is there an approach you recommend to people for creating realistic scenarios, especially as we get more diverse applications interacting with your application? It may not necessarily be a computer; now you have all these different mobile devices that are also interacting with it.
Michael: Right. Yeah. This is another good place where we’re seeing really progressive teams mining their APM data. Dynatrace, New Relic, AppDynamics, as I mentioned before. They all do a lot around understanding what real users are doing against the site. You might also be using Google Analytics or Mixpanel or KISSmetrics or things like that. Marketing oriented analytics tools that are also showing you the same thing. They’re showing you what my most frequently hit page is and what my normal response time is for these particular pages. What kind of activity is happening with the mobile app versus the web store, and so forth. That becomes a great source of data that we can bring back to the front end of our pipeline and use to develop the right tests. In fact, we’re doing a little bit of automation there and we’re playing with some Python code that will query the New Relic API, find the top ten slowest URLs, and then immediately turn them into a performance test automatically.
That’s the exciting stuff for me: having that full feedback loop happening, where we understand the performance profile in production and then we take that data to make sure that we’re testing the right things. Especially as we move faster and faster with continuous delivery. We don’t really have time to maintain scripts. If we have a script that was written three months ago, it may no longer match what users do, and probably the worst thing we can do with Load Testing is to optimize code that nobody ever uses. Why optimize code that doesn’t execute? We’re wasting time. Being able to understand what’s happening in the production app up to the moment and use that to feed our test design scenarios is a huge thing. A huge thing that we’re seeing. Then from there, also to further answer your question, if you adopt a continuous performance testing approach like we do with BlazeMeter, what you end up with is much greater familiarity with your baselines, right? You’re running tests so frequently you get a much better set of data over time, and in BlazeMeter, for example, you can compare multiple test runs day over day or hour over hour, however you’re doing it. You get a much better sense of what the deviations are. Both of those become factors in having realistic load tests and realistic expectations for what those thresholds should be.
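The feedback loop described here, picking the slowest production URLs and turning them into a test, might look roughly like the Python sketch below. It is not the code Michael’s team wrote, and it deliberately leaves out the step of querying New Relic or any other APM tool: the input is assumed to be a plain mapping of URL to average response time that has already been exported, and the function names and sample data are hypothetical.

```python
def slowest_urls(page_stats, limit=10):
    """Return the URLs with the highest average response time, slowest first.

    page_stats maps URL -> average response time in milliseconds. In practice
    this would come from an APM or analytics export; fetching it is out of
    scope for this sketch.
    """
    ranked = sorted(page_stats.items(), key=lambda item: item[1], reverse=True)
    return [url for url, _ in ranked[:limit]]


def scenario_from_urls(urls):
    """Build a Taurus-style scenario fragment that requests each URL in order."""
    return {"requests": list(urls)}


if __name__ == "__main__":
    # Made-up numbers standing in for production data.
    stats = {
        "http://example.com/": 120.0,
        "http://example.com/search": 950.0,
        "http://example.com/cart": 400.0,
    }
    print(scenario_from_urls(slowest_urls(stats, limit=2)))
```

Regenerating the scenario from fresh production data on every run is one way to avoid maintaining a script that drifts away from what users actually do.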
Joe: Awesome. Is there any functionality built into BlazeMeter that you think most people aren’t aware of, or that’s underutilized, that you’re really excited about?
Michael: I think there are two things that I’m not sure folks are using all that frequently. One is the Chrome recorder that I mentioned earlier. It’s such a simple, simple way to get a script up and running. It’s a simple Chrome extension. You launch it and capture your interactions with the browser. It immediately exports a JMeter script and it can also export a JSON file and a Taurus script. It does all this great little work. It’s a brainless way to get started. You can probably take that script and do much more sophisticated stuff with it. JMeter’s recording is less than optimal so I like the Chrome recorder a lot.
The other is our mobile recorder, which I don’t think a lot of people know about. With the mobile recorder, we host a secure proxy that’s spun up just for you. You download a certificate to your device and it will interact with your mobile app. The reason that’s important is because mobile APIs behave differently than websites. The structure of the communication is different. Then of course the actual way the load is distributed is different. People don’t tend to stay engaged with a mobile app for a long session. They’ll do a little something, they’ll get a phone call or the bus will come and they’ll put their phone down, whatever. Then they’ll go back to it later. Our mobile recorder not only captures that mobile traffic in its unique structure, with all of its headers and user agents and all the other kinds of stuff that impacts what the service needs from the client, it also gives people the ability to replay that traffic in those bursty and unique kind of mobile ways. Those I think are the two technologies that I would like to see people adopt more.
Joe: Awesome. Very cool. Sounds like great technology. I think if you can have something that helps you create a script rather than starting from scratch and you can just build it out, I think that’s a big benefit. Do you have any recommendations for books or resources that someone can go to to learn more about performance testing in general? Or JMeter?
Michael: There are some JMeter books. I think there’s one in particular. I think it’s called the JMeter Cookbook. There’s a new edition of it. Second edition of JMeter Cookbook, that’s great. It takes you right to specific exercises. Here’s how to record a script. Here’s how to replay a script with so many concurrent users and some custom headers and cookies. Things like that, so you can get right into meaningful exercises that teach you how to use JMeter for specific operations and such. I like that one a lot. Also, our BlazeMeter resources page, BlazeMeter dot com slash resources, has a ton of tutorials and training videos. Lots of helper tools for people to get up and running with JMeter. I think that’s one of our more popular pages that people visit.
Joe: Okay Michael, before we go, is there one piece of actionable advice you can give someone to improve their performance testing efforts? And let us know the best way to find or contact you.
Michael: That’s a good question. I don’t know that there’s a magic bullet. I would say just do it more.
Michael: My big advice is to do it. Get it done. We’re seeing reports coming up from different providers, different research that’s out there. Akamai comes to mind. There’s an Akamai report that came out from last year’s holiday season telling us that forty nine percent of users that they surveyed expect a page to load in under two seconds. Thirty percent expect under one second. Eighteen percent want sub-second response time. They want instant response. I think the importance of performance is increasing intensely. The way we need to handle that is to just do it more often. Figure out the discipline. You’ll figure out JMeter or some other tool. You’ll learn about it but just start doing it. Engage with your developers. Engage the community of practice around your stack. Get performance testing done. Then we’ll start to work on all those details. The best way to get in contact with me is Sage at BlazeMeter dot com. S-A-G-E at BlazeMeter dot com and also on Twitter I’m Mondo Sage.