Performance Testing in New Contexts
Welcome to Episode 80 of TestTalks. In this episode, we’ll discuss performance testing in new contexts as well as interpreting and reporting results with Eric Proegler, a performance engineering expert at SOASTA. Discover how to create readable performance results and tips for performance testing against modern environments.
Performance testing has changed dramatically over the years. Gone are the days where organizations controlled all the components in their application’s infrastructure. The changes in application architecture also require some adjustments to the way load injectors work. While performance testing back in the day required us to run hundreds of virtual users per box, we now have to manage a couple thousand per box.
In this episode, Eric walks us through many of these modern testing scenario dilemmas, as well as how to handle some common issues that most performance testers will encounter when testing in the Cloud.
Listen to the Audio
In this episode, you’ll discover:
- Why performance testing is more important than ever
- How to monitor common Cloud-based performance issues that can derail your performance efforts
- Tips to find out if you’re performance testing the right things
- What you should never do when sending out a performance testing report
- Why emulating and monitoring geographic locations in your performance scenarios is critical to create realistic performance results
- Much, much more!
Join the Conversation
My favorite part of doing these podcasts is participating in the conversations they provoke. Each week, I pull out one question that I’d like to get your thoughts on.
This week, it is this:
Question: With cloud-based testing and other modern software infrastructure, how have your performance testing efforts changed? Share your answer in the comments below.
Want to Test Talk?
If you have a question, comment, thought or concern, you can share it by clicking here. I’d love to hear from you.
How to Get Promoted on the Show and Increase your Karma
Subscribe to the show in iTunes and give us a rating and review. Make sure you put your real name and website in the text of the review itself. We will definitely mention you on this show.
We are also on Stitcher.com so if you prefer Stitcher, please subscribe there.
Read the Full Transcript
Eric: Hi Joe, good to be here.
Joe: Cool, before we get into it, could you just tell us a little bit more about yourself?
Eric: Okay, I am a performance tester by background, or at least I have been for the last 15 years. I started working in software about 20 years ago. I moved from development to testing in ’99 and I did some functional test leadership, started doing a little automation, and then moved to performance full time in 2002/2003, and that’s the space I’ve been in since then. Today I work in a testing tools start-up called SOASTA and I do a little independent work on the side as well.
Joe: Awesome, so today I’d like to talk about your presentations at Øredev: you had one on performance testing in new contexts and one on interpreting and reporting performance test results. I was able to actually sit in on the performance testing results one and I really enjoyed it. I’ve always said that, for some reason, creating the scripts for the performance test is actually easier than interpreting the results. Why did you feel like you had to put together this presentation on reporting performance test results?
Eric: First of all, if you like the scripting better, then we should work together.
Eric: My opinion about a performance test project is that there are a couple of phases to it. Coming up with a plausible simulation is a big part of that, and then coding that is another part of it. When you actually interpret what you learn from your experiment and then turn that into something actionable, I think that’s a really key part of the process. In the performance testing world we talk about tool jockeys… about people who can do scripting but don’t understand what it means… can’t put it in the context of something that the business could make a decision on. That’s sort of taking an interesting engineering exercise and turning it into something that has an impact on what the business does.
Joe: Awesome. I’m actually an old-school performance test engineer. I haven’t done much in some years, so how have performance testing tools and approaches changed in the last few years? I know you covered this in your other presentation, I’m just curious to know, with virtualization and the cloud… how have you seen things evolve in the performance space over the 15+ years you’ve been involved with performance testing?
Eric: They’ve evolved a lot. When I first learned how to performance test, the VP of QA at the time was familiar with the Compuware sales rep, so I got QALoad put in my hands, and then within a year or 2 I managed to trade that in for LoadRunner. So where I got started was the old-school client-server load tool type model. Those solved a need; going back to the 90’s, that was what was there at the time. There are still some pretty good tools out there and I’ve used most of them in my consulting practice between back then and now, but these days, if you’re talking about a SaaS application or you’re talking about a public-facing website, the users go from the hundreds concurrently to the thousands or tens of thousands, and that change in application architecture also requires some change in how load injectors work. We’re able to go from what was a couple hundred virtual users per box to a couple thousand per box these days. It’s still not practical to manage a couple dozen boxes to get together a load against the system, particularly if you want to run load from different geographical locations, which is another requirement that I certainly see a lot more of now than I used to.
Joe: Since we’re able to scale up almost to infinity when we use cloud solutions, why do we even need to still do performance testing? Back in the day we just said throw hardware at it, and this is actually the ultimate dream I’d think: just scale up. It’s infinite, so why do we still need to do performance testing?
Eric: Because it isn’t just hardware. The hardware is what the hardware is and yeah it’s the promise of cloud and elastic resources I think has been largely met in that whatever machine resources are needed, we can continue to scale horizontally, but there’s still the code that runs on top of it. The part that we actually have control over. There’s soft resources involved like the size of a database connection pool or the number of sessions that you can hold in memory on a single server. There’s still all of these things to check against and to test for even when it appears that the pool of resources is bottomless. By the way I’m just letting that pass for now but my experience is not that the resources are bottomless.
Joe: So in this new context then, you named a few resources, but are there any must monitor resources that you think everyone needs to be aware of nowadays that they should be monitoring that may not have been popular back in the day?
Eric: Network bandwidth is a little more key, I think, when you’re talking about a distributed user base. The example we always use in troubleshooting here is the person who’s connecting via their phone over Panera wifi at lunchtime, where network conditions are not fantastic. So paying attention to what the load is on the client end and then also coming into your data center is key. It’s typical that you’ll have to provision some amount of bandwidth coming in and then maybe you’ll need to handle larger surges. I’m seeing: oh, I have a 100 megabit line, well, sometimes I need more. That’s okay, AT&T says it will burst to 1 gigabit a second when I need it. Then I test that, and we find out that it doesn’t respond as quickly as we might like. Even if you can describe a technical capability, that isn’t the same thing as verifying that it works as intended.
Joe: You mentioned earlier about geographically placed load generators, and back in the day, once again, I would have to load up software on a machine and ship it to an office in one of our locations and then have them turn it on to run a script in something called a business availability center. How is it done nowadays? Do you just point to a cluster in some location on Amazon and you’re able to pretty much get that type of data?
Eric: How that works with cloud-based injection depends on which tool you’re using. I’ve used 4 tools that can use cloud-based load injection. Two of them are previous-generation client-server tools: LoadRunner and NeoLoad. Two of them are cloud-only plays: BlazeMeter and SOASTA. For LoadRunner and NeoLoad, they have a system where there is a cloud-based injector that you would control with your conductor. Not to make too many digressions, but I’m also not talking about StormRunner right now, which is a different issue. HP’s new offering looks more like SOASTA or BlazeMeter.
Eric: As for the traditional client-server tools, I’ve not used the LoadRunner cloud injector. I have used the NeoLoad cloud injectors. There is a token system where you could rent them by the hour, and there was some sizing work to do, but you could get a load injected from cloud locations. The cloud-based tools tend to provision the load injectors as needed. With BlazeMeter, they kind of control that for you and you can pick the Amazon region you want to rent things in, and they have support for some other cloud providers. In the SOASTA world we have a couple of different providers we work with: primarily Amazon and Azure, but also HP, at least until those get turned off in a couple of months, IBM, GoGrid, and some local providers in Europe and Asia. There are no load injectors that exist until I push the button to create them. They cost something like $0.14 an hour per box, and then I push a button when I don’t need them anymore and they get destroyed. The idea is that the cloud-based load tool handles the provisioning and installation of software for you and is on demand.
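To make the on-demand sizing concrete, here is a rough sketch of the arithmetic behind renting injectors. The rate of about $0.14 per box per hour comes from Eric's comment above; the figure of 2,000 virtual users per box is an assumption standing in for his "couple thousand per box", and the 50,000-user target and 2-hour duration are purely hypothetical. Real capacity per injector depends on the script and the tool, so treat this as an estimate only.

```python
import math


def injector_plan(target_users, users_per_box=2000, hourly_rate=0.14, test_hours=1.0):
    """Estimate how many injectors to rent and what the rental might cost.

    users_per_box and hourly_rate are illustrative assumptions, not vendor figures.
    """
    # Round up: a partially used injector still has to be provisioned.
    boxes = math.ceil(target_users / users_per_box)
    cost = boxes * hourly_rate * test_hours
    return boxes, round(cost, 2)


# Hypothetical test: 50,000 concurrent users for 2 hours.
boxes, cost = injector_plan(50_000, test_hours=2)
print(f"{boxes} injectors, roughly ${cost:.2f} in rental")
```

Under these assumptions the injector rental itself is a small cost; the sizing work is mostly about confirming that each box can really sustain its share of the load.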
Joe: A lot of engineers I think still… I still hear this, not as often as I used to, but: we need to test against a bare metal machine. But every organization now is using virtual machines. Do you see that as a hurdle that someone coming from a performance testing background still needs to get over, that it’s just a fact of life now that we need to test against VMs?
Eric: I agree it’s just a fact of life. This is just the way that application architecture has changed. I’m still seeing the old-school LoadRunner people who are balking at using virtualized load injectors and claiming that their instruments won’t work correctly if they’re virtualized. There is some risk that you can virtualize incorrectly, but I mean this is just the world we live in now. Even the companies whose primary business isn’t technology virtualize virtually everything in their data center. There are holdouts, there are legacy applications, and there are legacy engineers who haven’t gotten with that, but I think that battle is over.
Joe: I definitely agree. I think in your presentation you said you’ve been using VMs since 2004. In your experience it doesn’t cloud the performance results, does it?
Eric: It can. This is always a concern. When it’s virtualized inside somebody’s private cloud, I have to kind of depend on whoever did that virtualizing that they’ve provided enough resources to the pool. A load test is a perfectly designed approach for defeating a leveling algorithm, which is what VMware vSphere uses to put virtual machines on the proper physical hardware. The idea is you put stuff on vSphere and after some period of time it has sorted all the machines in a way to load balance their requirements onto the different physical nodes. Then a load test shows up and changes those equations. I’ve definitely been in a situation where I’ve run out of resources at the virtualization layer and had my results be corrupted. I was able to tell that that was occurring, though, because you should always monitor your injectors and because my response times were erratic. I didn’t understand any of them.
Joe: All these services are out of our hands now. They’re in some other person’s cloud. I used to be able to go to the admins of these machines and say, “hey I need your help. This is a team effort. What did you do here? Can you figure this out?” How do we handle this now in this new world where we don’t necessarily own the servers that we’re running our applications on?
Eric: It’s difficult. It’s taking a step back to a previous level of sophistication in how I was able to do performance testing, back to before resources were monitored. In my current context with the cloud-based load injectors, we do a lot of on-demand testing where a customer brings us in and says, “you turn on the load, we’ll do the monitoring.” I’m working with an ops team that will tell me when they think a resource is being starved. At one point I was able to give somebody a report that said, “you’re out of this resource and you need to scale this resource to get to whatever work level.” That’s just gone now.
Joe: I definitely agree. That’s something that we still need to find better ways to overcome that, but it’s just the way of doing business now, so there’s no way round it unfortunately. I don’t think.
Eric: I don’t. Again, this is just something that has already happened and as testers, we’re the tail. We’re getting wagged at best, so that we don’t get to tell people well they should virtualize or not to solve our problems.
Joe: Absolutely. Talking about reporting, one thing I’ve noticed about newer tools (maybe, like I said, because I haven’t really done a lot with it): when I used a vendor-based tool, the report was really robust. I was able to create nice graphs and layer graphs on top of one another and do that kind of analysis for the performance information, but I don’t see that with some of the open source tools. Do you have suggestions on how we could create better graphs with all the performance information we’re getting back when we’re using our open source performance tools?
Eric: So acknowledging first that I do work for a tools vendor and one of the differentiators against open source is that we have that kind of functionality.
Eric: That being said, my experience with open source is that it’s not as bad as everybody thinks. It does take some work, and the first time you do it, it will seem frustrating if you’re used to “I push a button and a graph appears.” What I’ve found is that with a tool like JMeter you can do some Excel graphing… I’ve even dumped a fairly large set of test results into Tableau and made some scatter charts with that. It’s more work, but that’s part of what you’re paying for when you buy a commercial tool. I haven’t got to use The Grinder on a real project yet, but the visualization it shows is not too bad. I’d consider showing that to somebody if it showed the right thing.
On the whole idea of reporting, I think people talk about graphs, but any report that I give to an executive is going to have 2, maybe 3 graphs that have 2 or 3 data points on them, and each will have a paragraph. I think I got this from Mike Kelly, I don’t know where he got it from: every graph deserves a paragraph. There’s a paragraph under the graph that says what you’re looking at and what it means. I don’t use the graphs to try to tell a story so much as to support the narrative that the rest of my analysis has generated. You can’t send somebody 50 graphs, like the default tool report button will give you, and expect them to understand what occurred. You have to explain it, especially if you’re working in performance testing, where most everybody you’re going to talk to knows less about what you’re doing than you do. You need to be explicit about what you learned, what you saw, and what that tells you.
Joe: Awesome. That’s a great point. That’s one of the biggest points I enjoyed about your presentation when you mentioned this: that you really need to put some context around a graph, and don’t just send graphs to people and assume they understand what they mean because, like I said, to me analyzing the results is sometimes the hardest piece of the performance test. So having that context, the paragraph actually talking about what’s going on, is better than just sending the graph. You have a nice slide there where you actually show a mock-up graph that explains it.
Eric: Right, and there’s just the 2 or 3 graphs. It’s a human psychology thing, I’m sure, but if I send somebody 20 graphs, that’s less useful to them than if I send 2, because when I send them 20 I don’t tell them what they need to focus on. If I want to communicate something key to somebody, I have maybe an hour to present results and take questions. I probably have less. If I’m trying to do it in writing, like in a report, then I can communicate even less. The idea then is not just to produce the right data but to filter it down to the right size chunks so that something can be done with them.
Joe: Awesome. Hopefully I’m not putting you on the spot, but you have a method that you use called CAVIAR that you actually went over when you’re presenting your results. I thought that was really helpful just having that acronym in the back of my head.
Eric: So CAVIAR came out of attempting to generate a pithy acronym for what it is we do after a test is complete and we turn results into actionable information. There may be an extra A in there, but the idea was that generating data, a lot of data, is pretty easy to do once you’ve written the scripts and decided what you’re going to simulate. It was the thinning that out to the one little rich spoonful of what’s really useful to the business stakeholder that made it appropriate. I’m actually not a huge caviar fan myself, but I’ve tried it and one little scoop has so much flavor and is very satisfying and you only need a little tiny bit because it’s so rich. That’s sort of the idea with the CAVIAR process: to get to that one little spoonful on one little toast point that you can hand to somebody and they can enjoy, and then we can all get on with our lives.
Joe: I believe the acronym stood for Collecting, Aggregating, Visualizing, Interpreting, Assessing, and Reporting.
Eric: I think you can guess what the extra A is, but that is definitely the process as described. One of the important things is that there’s no jumping to a conclusion. That’s really easy to do when you’ve been performance testing for a while is to look at results and then spit out root cause. The danger in doing that is a) you’re often wrong, and only House MD is allowed to be wrong several times and get it right in the end and everybody forgets. The rest of us who live in the real world have to watch our credibility. The decisions we’re asking people to make based on our performance test often represent a lot of money and time. If we say you need to delay the launch of your e-commerce website by 3 weeks and fix these problems, that could be an awful lot of money on the line.
To be able to speak to those sorts of requirements and to be able to report on results from our test and attest to what they mean, we have to protect that credibility carefully. Going through the process of: I saw this, I saw this, I saw this, it might mean this thing, and these are the things I could check to support that argument. Kind of like the scientific method, it is going to be a lot easier to do, partly because it helps you use safety language to talk about it: well, it might mean this, and it could be this. And it invites people to collaborate with you. If you say your database sucks, the DBA is not going to help you. If you say we’re seeing an increase in response time and I see a corresponding increase in IO latency on the database log volume, the DBA will push you out of the way to get in there and figure out what that means. By presenting it as a finding instead of a conclusion, you’ve been able to enlist that person and you’ve been trusted as a scientific instrument, instead of being just somebody else who spouts off what I call dart-throwing troubleshooting, which is immensely frustrating if you’re trying to get some help on something and the approach is just “check this and get back to me.” That’s a bad way to troubleshoot. It’s a really bad way to report on a performance test.
The idea with going through those steps is I figure out what I know, I figure out what it means, I figure out what that could represent. Then I’m able to negotiate with all the people involved in that performance test project: the subject matter experts, the business stakeholders. Okay, this seems to be happening, let’s work together to figure out what it means. Then it helps us get to the proper conclusion, and the SMEs are engaged, and they’re going to be able to take the data that you provided and do something useful with it. If I tell the DBA your storage is slow, or your storage appears to be slow, is there anything we could do? I’m going to get an entirely different answer and hopefully their engagement on how to fix it. Hopefully, if I could go talk about that privately, what I present back to the business stakeholder is a completed plan of what the DBA is going to do before our next test.
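One way to back a finding like "response time rose along with IO latency on the database log volume" is to check how closely the two series move together before bringing it to the DBA. The sketch below computes a plain Pearson correlation by hand and phrases the outcome as an observation rather than a root cause. The sample values, the 0.8 cutoff, and the function names are all hypothetical, and a strong correlation still does not prove that one metric caused the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (assumes neither is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def finding(metric_a, metric_b, r, strong=0.8):
    """Word the result as a finding to investigate together, not as a conclusion."""
    if r >= strong:
        return f"{metric_a} rose together with {metric_b} (r={r:.2f}); worth reviewing together."
    return f"No strong relationship seen between {metric_a} and {metric_b} (r={r:.2f})."


# Hypothetical per-interval averages sampled during one test run.
response_time_s = [0.8, 0.9, 1.4, 2.1, 2.0]
log_io_latency_ms = [4, 5, 9, 15, 14]

r = pearson(response_time_s, log_io_latency_ms)
print(finding("Response time", "IO latency on the database log volume", r))
```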
Joe: That’s great advice. It’s almost like you want to ask open-ended questions, because if you just pinpoint something you may derail the whole troubleshooting process, and usually a DBA knows a little bit more than the performance tester. Maybe not, but usually, like you said, asking an open-ended question or just not blaming them, getting their input, almost always leads to better results, I think, overall.
Eric: Sure, that’s just the reality of things: there’s a lot of focus in technology on being right and proving that your stuff works right, and needing it to be proven to you that your stuff isn’t right before you’ll do anything.
Eric: It’s good to just kind of sidestep that process and collaborate together. If I look at the most interesting stuff that’s coming out of the agile community, it’s about safety, it’s about people working together. That’s the kind of thing that really, in my experience, you have to have for a successful technical project. If you have a bunch of big brains having a pissing match with each other, then you’re usually not going to get anywhere until somebody calls them into it and demands people work together.
Joe: I completely agree. Once again, back in the day, I would always run the same test at least 3 times and then get the average of those three tests, and now when I think about it, I don’t even know if that was correct, because someone told me one time you need statistical relevance, and just running a performance test one time doesn’t necessarily give you the picture of what’s going on. What’s your approach? Do you run the same test multiple times, the same exact test in the same scenario, just to make sure you didn’t have an anomaly when you first ran a test and got a weird result? Or is that anomaly something that you need to look into because it could be a potential issue?
Eric: Wow, that’s a great question. Thanks for saving it for now.
Eric: I’m going to take some time to answer it. All models are wrong, some are useful. Our load test is going to be some model. We pretend that we’re going to have this closed deterministic system where the resources available are always going to be the same, and the software is always going to be the same, and the way that a multi-user system responds to a flow of requests coming in is always going to be the same, and then we introduce randomness into the delays between the requests that we make. Then we introduce randomness into the pacing before a virtual user starts doing another thing. There are factors in our simulation that are going to make it hard to reproduce. The idea of a lot of precision and four digits after the decimal point is, I think, one of the not-so-great legacy things in performance testing. I think that running multiple tests is necessary, that you do need to understand what’s going on to a certain extent. I try not to worry about chasing anything less than a 5 or 10% variance too far.
I do want to understand, if there’s a variance, where it came from. And to finally get back to where you started: yes, I think you should run multiple tests, so 2 or 3 tests is typical for me. Sometimes it’ll be a shakedown test where I just make sure everything is working, a baseline test where I run at 30% of peak and just kind of get what a response time should be, and then the 100% of the peak load model to see what occurs. Taking those measurements separately, I can at least test whether they all agree with each other, or they’re all plausible against each other. Running the exact same test multiple times, I have done that. It can be expensive to run a load test depending on how many users you’re simulating and how many people have to be on the conference call while you’re doing it.
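For repeated runs of the same scenario, the 5 to 10% rule of thumb can be sketched as a quick check on the spread between runs. The measure used here (the range of the per-run averages divided by their overall mean) is just one simple choice and an assumption for illustration, not something Eric prescribes; the response times are hypothetical.

```python
from statistics import mean


def run_variance(run_means):
    """Spread between runs, as a fraction of the average across runs."""
    avg = mean(run_means)
    return (max(run_means) - min(run_means)) / avg


def worth_chasing(run_means, threshold=0.10):
    """True when the runs disagree by more than the chosen threshold (10% here)."""
    return run_variance(run_means) > threshold


# Hypothetical average response times (seconds) from three runs of one scenario.
consistent_runs = [1.20, 1.26, 1.23]
suspicious_runs = [1.20, 1.50, 1.23]

print(worth_chasing(consistent_runs))  # spread is roughly 5% of the average
print(worth_chasing(suspicious_runs))  # spread is roughly 23% of the average
```

A run that fails this check is not automatically wrong; it is a prompt to find out where the variance came from before reporting anything.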
Joe: Awesome, that’s great advice. It’s good to know that I was kind of doing the right thing.
Eric: Sure. I’d love to run 10 load tests on each project, but generally… in testing we always run out of time before we run out of things we could do to reduce risk.
Joe: Absolutely, it goes back to why I was always afraid to give results early, so I would at least run 3 of them so I didn’t send out… for some reason with performance testing it’s so easy to start fires with issues that aren’t real issues because you could be using the tool wrong, putting the load on, making too many transactions that would never happen in a normal day. I always get more confidence running it multiple times before I send out results.
Eric: That’s a key. You touched on some of the things we talked about before about your credibility. Once you’ve been wrong, and you’ve gotten a person who works on that technology to chase some mirage that you created with your tool, it’s going to be very difficult for people to pay attention to you next time. What I used to like to do when I was a LoadRunner consultant was: okay, I saw this during the test, don’t hold me to it, I’ll give you a full report in 24-48 hours. So I’d have time to assess what all the errors meant. If I broke the system, I’d make sure that I wasn’t reporting anything that occurred after the system was broken. I’d make sure that I had changed the reporting window to after I reached peak load so that I didn’t report the lower response times that happened when I was still ramping. That kind of careful, thoughtful analysis is what CAVIAR is trying to encourage, rather than just doing the tool jockey thing: okay, I pushed play, then I pushed generate report, then I emailed it, I’m done.
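The reporting-window adjustment described above can be sketched as a small filter that drops samples taken during ramp-up and anything recorded after the system broke. The tuple layout, the function name, and the timestamps are hypothetical; most load tools expose an equivalent time-range setting in their own reporting.

```python
from statistics import mean


def steady_state(samples, peak_reached_at, broke_at=None):
    """Keep samples from when peak load was reached until the system broke (if it did).

    samples: list of (elapsed_seconds, response_time_seconds) tuples.
    """
    return [
        (t, rt)
        for t, rt in samples
        if t >= peak_reached_at and (broke_at is None or t < broke_at)
    ]


# Hypothetical run: ramping until 120 s, system falls over at 300 s.
samples = [(0, 0.2), (60, 0.3), (120, 0.9), (180, 1.1), (240, 1.0), (300, 9.5)]

window = steady_state(samples, peak_reached_at=120, broke_at=300)
average = mean(rt for _, rt in window)
print(f"{len(window)} samples in window, average {average:.2f} s")
```

Averaging all six samples instead would mix in the artificially fast ramp-up responses and the post-failure outlier, which is exactly the kind of number that is hard to defend later.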
Joe: I definitely agree, and I think performance testing is almost an art and it takes a lot of thinking and analyzing like you said rather than just… it’s not just a tool. It really is a lot more going on than just like you said running a test, looking at results. There’s a lot that goes into it to make it… because once again performance testing for some reason a lot of times once it goes into production, usually the results are totally different than what you’ve been working on for a few months.
Eric: Right, because something about the resources is going to be different. When I started… this is what happens when you get into performance testing: everyone wants to talk about tools. A fool with a tool is still a fool, and what I found by switching tools, based on my context, is that the tools generally all do the same things. The implementation might be slightly different, but it’s a bit like different programming languages or different scripting languages: they all generally do the same things, it’s all generally the same generation, it’s just maybe different names for them and different methods. Focusing on what the tool does means that you’re not adding any value. If all you do is operate a tool, then you should be replaced with a shell script.
Joe: That’s the whole point of automation. You may be able to automate something, but it’s not necessarily doing the thinking that really needs to be involved in the process. Before we go, is there one piece of actionable advice that you can give someone to improve their performance testing efforts? And let us know the best way to find or contact you.
Eric: Okay, so the first one: I think the best way to improve your performance testing efforts is to take a step back and find somebody to take a look at what you’re doing. I think that these projects turn into these multi-week, multi-month, multi-year efforts where you go down a rabbit hole and you’re trying to build a certain test that may not necessarily be relevant to what your stakeholders want, and may not tell you what you think it’s telling you. In testing it’s really important to be able to focus and de-focus, but when I look at where I really wasted time, or the mistakes I really wish I could have back, it was spending 5% on planning and then a lot of time on executing, and figuring out that I didn’t spend enough time on planning.
The second piece of advice, I think, is encapsulated in what we were talking about with the reporting: don’t be a tool operator, or else you’re just being a tool yourself. Be somebody who can understand what it is that the test is trying to determine or prove, and be able to talk about the test results in that light. As opposed to “I achieved so many transactions at this CPU percentage,” and suddenly you’ve lost half the room that doesn’t even follow what you’re talking about anymore, and the other half doesn’t care. Always come back to the business requirement, always come back to what the real question is. Sometimes it’s do we go live or not. Sometimes it’s do we need to buy more hardware or not. Make sure you understand what that is and who’s asking and how to communicate to them what they’re looking to know.
My full name, Eric Proegler, is my Twitter handle. I think that works for now. I don’t want to broadcast my e-mail address out into the whole world. On LinkedIn, a lot of people have connected with me that I’ve never met, but it’s certainly a way that you can send me a message, and I’d like to talk to you. I’m really at a point in my career where I’m trying to help people out and volunteer and give something back, and there are an awful lot of people who gave me a helping hand and got me this far. It certainly isn’t something where I just studied in monkish isolation for 15 years. It was taking help from people who were willing to give it to me, and I’m hoping to pass that forward.