So you’ve got your basic testing framework, 100% unit test coverage, a pile of functional tests, automated builds, and a decent deployment script. In short, you’ve achieved the one-click holy grail of test automation. Your functional tests run automatically every hour, after every checkin, or whenever you feel like it — against a clean system with the newest code.
Why haven’t you attained testing nirvana yet?
Well, there are still a few more challenges you might be facing. Here are some that I’ve run into:
Email & Other Services
Many applications need to send (or worse, receive) email. How do you handle that? At the unit level, you can use a simple mock, and at a higher level you can use a dummy server like MailTrap or Dumbster. But what if you really need to test that emails are being sent? You could use a simple SMTP server like James, but sometimes you need to test against a production-like mail server, which can require special expertise.
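At the unit level, a mock really is all it takes. Here's a minimal sketch in Python using the standard library's mocking support; the myapp.notifications module and send_welcome_email function are hypothetical stand-ins for whatever your app does with smtplib:

```python
from unittest import mock

import myapp.notifications  # hypothetical module under test


def test_welcome_email_is_sent():
    # Patch smtplib.SMTP as seen by the code under test, so no real
    # connection is ever opened. Assumes myapp.notifications does
    # "import smtplib" and calls SMTP(host).sendmail(...) directly.
    with mock.patch("myapp.notifications.smtplib.SMTP") as smtp_cls:
        myapp.notifications.send_welcome_email("user@example.com")

    # Verify the app at least tried to hand a message to a mail server.
    smtp_cls.return_value.sendmail.assert_called_once()
```

At the functional level, a dummy server like Dumbster gives you a real SMTP conversation to assert against instead, without any mail ever leaving the box.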
If you’re sending actual emails, that opens up a whole slew of new issues.
What about network policies, spam filters, etc? Besides the technical challenges, there can be bureaucratic issues to deal with.
How do you check reply-to addresses in the test environment if the server names are different?
How do you test external mail systems? If you’re sending email, where do you send it? You could set up test Hotmail or Yahoo accounts or use a temporary service like Mailinator.
How do you verify email results? That can be challenging, especially once you step outside your own app. Webmail apps (like Gmail/Hotmail/Yahoo) resist automation efforts, and rich clients aren’t automatable with web automation tools like Selenium.
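One possible workaround is to point the test at an ordinary IMAP-accessible test account and poll it directly, rather than trying to drive a webmail UI. A rough sketch; the host, credentials, and mailbox here are placeholders:

```python
import email
import imaplib


def fetch_test_message(subject):
    """Poll a throwaway test mailbox over IMAP and return the first message
    whose subject matches, or None. Host and credentials are placeholders."""
    conn = imaplib.IMAP4_SSL("imap.example.com")
    try:
        conn.login("qa-mailbox@example.com", "not-a-real-password")
        conn.select("INBOX")
        # UNSEEN keeps reruns from picking up mail left over from earlier runs.
        _, data = conn.search(None, "UNSEEN", "SUBJECT", f'"{subject}"')
        ids = data[0].split()
        if not ids:
            return None
        _, msg_data = conn.fetch(ids[0], "(RFC822)")
        return email.message_from_bytes(msg_data[0][1])
    finally:
        conn.logout()
```

That verifies delivery and content, but it still says nothing about how the message renders in each client.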
What if you need to test how it looks in multiple mail clients, or formats? One time, I wrote a test to verify text vs. html formatting and had it send mail to my personal email address where I could check it with mutt. Someone “borrowed” my code, and I ended up with over 100 test emails every night when the billing regression scripts ran.
That’s just email. It’s the most complex, but you might depend on other services and systems as well: FTP, cron, VPN, SNMP, etc.
External Systems
Besides services such as email that are often considered part of the underlying operating system, there are other applications that may interact with the system under test. There are a variety of reasons it may not be feasible or practical to use them, including licensing issues, complex installation, and often an insurmountable hurdle: you don’t own the system.
Web services can fall into this category, as well as SAAS systems. Examples include Payment Processing, Address Verification, etc.
While test systems might exist (if you’re lucky), they are often not as robust or performant, and may have restrictions on the number of requests and the complexity of the responses. There are also potential firewall and security issues.
You may have to build mocks, which don’t cover the range of potential circumstances, or even simulated services, which can have their own bugs (or fail to expose actual bugs in the production system).
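A simulated service doesn’t have to be elaborate; a tiny HTTP stub that returns canned responses is sometimes enough to keep integration tests running. A minimal sketch; the endpoint behavior and response format are invented for illustration, not taken from any real gateway:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakePaymentGateway(BaseHTTPRequestHandler):
    """Approves every charge request with a canned response. Mirror whatever
    your real gateway actually returns; this shape is made up."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"status": "approved", "amount": request.get("amount")})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode())

    def log_message(self, *args):
        pass  # keep test output quiet


def start_fake_gateway(port=8099):
    """Run the stub on a background thread; call .shutdown() in teardown."""
    server = HTTPServer(("localhost", port), FakePaymentGateway)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Point the system under test at the stub’s port via configuration, and keep the caveat above in mind: a stub this simple will happily hide bugs that only the real service would expose.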
External systems are a significant challenge to integration tests, and sometimes isolating your system from them is not desirable or even possible.
Firewalls, Networks, Load Balancers, Latency
I’ve touched twice already on network and firewall issues, but besides the corporate LAN firewall, there is the issue of testing against a realistic production network, which can include firewalls, a DMZ, load balancers, clustered servers, and traffic shaping.
How can you test a complex deployment environment? I can tell you from first-hand experience that some of the trickiest bugs surface when you start using multiple application servers, or when your load balancer mangles your session information, or when the database isn’t on the same server as the web front end. How can you be sure that your intrusion detection system won’t proactively block a web service (or that it will stop a real DoS attack)?
A complex network also means that response times aren’t going to be the same. Not only do clients not connect over 100 Mbit Ethernet, but a production network topology will have its own latency issues.
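You can at least approximate that latency in test. Dedicated traffic-shaping tools exist for this, but even a tiny delay-injecting TCP proxy between the client and the server can make latency-sensitive bugs show up earlier. A rough sketch, with the delay, ports, and target host as placeholders:

```python
import socket
import threading
import time

DELAY_SECONDS = 0.2  # artificial one-way latency to inject


def _pipe(src, dst, delay):
    """Copy bytes from src to dst, sleeping before each chunk to fake latency."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            time.sleep(delay)
            dst.sendall(data)
    except OSError:
        pass  # one side went away; just stop forwarding
    finally:
        dst.close()


def slow_proxy(listen_port, target_host, target_port):
    """Listen locally and forward to the real server with added delay."""
    listener = socket.socket()
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("0.0.0.0", listen_port))
    listener.listen(5)
    while True:
        client, _ = listener.accept()
        upstream = socket.create_connection((target_host, target_port))
        threading.Thread(target=_pipe, args=(client, upstream, DELAY_SECONDS), daemon=True).start()
        threading.Thread(target=_pipe, args=(upstream, client, DELAY_SECONDS), daemon=True).start()
```

It won’t reproduce a mangled session or a misconfigured DMZ, but it will tell you how your app behaves when every round trip costs something.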
Performance
Besides network latency, there are other performance issues that are challenging to tackle.
How do you know if performance in your test environment is adequate for production? (I actually had an incident once where the test server was significantly newer than production, so while the application performed adequately in test, it was unacceptable in production.)
You may not have budget for the same hardware in test that you have in production. You might not be able to load balance across multiple application servers because you don’t have licenses for it. Your database won’t have the same volume of data. All these and more can be challenges to your test environment.
Production Data
I mentioned production data in relation to performance, but besides the volume of data, there are other issues in attaining production-like data.
One of my favorite bugs was one that depended on production data. I don’t remember exactly what it was, but some data in production had invalid characters (Unicode or double-byte Asian characters) that were causing our system to blow up, and we couldn’t reproduce it in test, no matter how hard we tried. It turned out that recent versions of the content management system disallowed or escaped them, so they weren’t an issue in test, even if we cut and pasted the offending text. It was literally impossible to reach that state. But it was happening in production. The only way I was able to find the issue was by running tcpdump on the production server to capture the stream. Thank goodness for a cooperative sysadmin.
I suppose that story has more to do with production-like systems than with data, but the bug was triggered by real production data.
I remember another issue with production credit cards failing. Of course we couldn’t tell what the cards were, because we didn’t store the card numbers, just a hash. That one turned out to be because our code didn’t expect card numbers longer than 16 digits, and they were being truncated.
The point of these stories is that production data sometimes does things you don’t expect. And sometimes there’s no substitute for production data.
In the case of credit card data or other sensitive info, how can you pull from production? You might need to scrub data to avoid identification. And knowing just what you need to pull from production can be challenging. You don’t need that 20 million row transaction log, or do you?
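When I do need to pull data from production, a scrubbing pass over the extract helps with the identification problem. A minimal sketch of the idea; the column names and salting scheme are invented:

```python
import csv
import hashlib

SALT = "rotate-me-per-refresh"  # keeps scrubbed values from being reversed by lookup


def scrub_value(value):
    """Replace a sensitive field with a stable but meaningless token."""
    digest = hashlib.sha256((SALT + value).encode()).hexdigest()
    return digest[:16]


def scrub_export(in_path, out_path, sensitive_columns=("email", "card_hash", "ssn")):
    """Copy a CSV export, scrubbing the named columns along the way."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            for col in sensitive_columns:
                if col in row and row[col]:
                    row[col] = scrub_value(row[col])
            writer.writerow(row)
```

Hashing rather than randomizing keeps the scrubbed values stable, so the same customer still lines up across tables and the data stays realistic enough to join on.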
Time Dependent Results
Credit card processing reminds me of another issue: Time Dependency. You might run credit cards as a batch every night at 12:30, but can your simulator or test system handle that? Do you have to wait until midnight to check your credit card failure test cases?
What about billing, which may only run once a month? Or email newsletters?
What about daylight savings? Everyone testing time sensitive applications a couple years ago remembers the daylight savings switch. (I’m expecting it to switch back real soon now — and billions more in lost productivity, but it’d be worth it for more realistic times.)
How can you check cron (and other scheduled) events? Changing the system time may work for some, but not for others.
What about timezone issues? Are your test servers and production servers running on UTC or local time? Are they all in the same time zone? What about external systems?
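When changing the system clock isn’t an option, mocking the clock at the library level sometimes is. A sketch using Python’s freezegun; the billing module and its functions are hypothetical:

```python
from datetime import datetime, timezone

from freezegun import freeze_time

import billing  # hypothetical module that runs the nightly batch


@freeze_time("2011-03-31 23:59:30", tz_offset=0)
def test_month_end_billing_runs_without_waiting_for_midnight():
    # The code under test believes it is just before month end, in UTC,
    # without touching the server clock or waiting until 12:30 AM.
    assert datetime.now(timezone.utc).day == 31
    billing.run_nightly_batch()  # hypothetical entry point for the batch job
```

The catch is that this only fools the application code. A real cron daemon, or the database’s own NOW(), won’t be affected, which is exactly the “works for some, but not for others” problem.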
Manual Intervention
With many of these issues, I’ve found there are some things that can’t be fully automated, or where automation isn’t worth the effort. You might have to separate out the tests that require manual intervention. How do you flag a test that is 99% automated, but requires one step that can’t be automated, or can’t be automated easily?
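One small thing that helps is tagging those tests explicitly, so unattended runs skip them but they stay visible in the suite. With pytest, for example (the marker name is just a convention, not anything built in):

```python
import pytest

# Register the marker in pytest.ini so pytest doesn't warn about it:
#   [pytest]
#   markers =
#       manual: requires a human step; excluded from unattended runs


@pytest.mark.manual
def test_invoice_pdf_renders_correctly():
    """Everything up to generating the PDF is automated; a person still
    has to eyeball the layout in an actual PDF viewer."""
    # ... automated setup and PDF generation would go here ...
```

Unattended runs can then select with pytest -m "not manual", and a separate, human-attended pass picks up the rest.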
Test Organization and Maintenance
Apart from all the technical hurdles, there are the challenges of test organization and maintenance. When you have hundreds (or thousands) of functional tests, how do you keep track of them all? They are not simple and isolated like unit tests. They are environment and system dependent. They may require manual steps. They may depend on data setup, or on other tests passing.
A test may pass but no longer be testing anything of value. Or it may fail due to some UI change or process modification. They are brittle. They are complex. You can follow good practices to combat these problems, but you can’t avoid them.
And then there’s the matter of how long it takes. Eventually, your functional tests are going to start taking longer to run. How do you break them down? You can have a core/full regression. You can separate acceptance tests and regressions. You can have smoke tests. It’d be nice to be able to tell which tests are affected by which system changes, but sometimes you can’t be sure. Or you can be sure and wrong.
One incident I remember involved changes to the admin interface, a completely separate application. So the customer application didn’t need regression testing, right? Except that a base class both applications inherited from was accidentally modified by an ambitious IDE refactoring.
This long rambling post describes some of the challenges that may arise once you have successful test automation in place. I could say that the moral of the story is that there’s no silver bullet, but that would be glib. Being aware of the issues and potential workarounds can only help.
Have you encountered similar (or other) challenges with test environments? What have you done to surmount them? I’d love to know.