Buggy code isn’t the problem


I recently read a post by @TestinGil about the cause of fragile tests:
https://www.everydayunittesting.com/2024/08/testing-basic-fragile-tests.htm

In this essay, he argues that the problem with fragile tests is complex code.

Here is my response:

I disagree with both the premise of this post and the solution.

While sometimes tests appear flaky due to buggy or complex code, that is not the case the majority of the time.

Tests are flaky most often because of limitations in automation — timing issues and the like. Secondarily, tests are flaky due to incorrectly written test code. Together, these two account for the vast majority of cases — so much so that it is usually justified to “blame the test,” and understandable why all other causes get lost in the noise.
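To make the first point concrete, here is a minimal sketch of the most common timing fix in UI automation (Selenium, in Java; the locator and timeout are hypothetical): replace a fixed sleep, which assumes the application is always equally fast, with an explicit wait on the condition you actually need.

import java.time.Duration;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

class TimingFix {
    // Flaky: passes or fails depending on how fast the page happens to load.
    static void flaky(WebDriver driver) throws InterruptedException {
        Thread.sleep(2000);
        driver.findElement(By.id("result")).click();
    }

    // Robust: waits only as long as needed, up to a 10-second cap.
    static void robust(WebDriver driver) {
        new WebDriverWait(driver, Duration.ofSeconds(10))
            .until(ExpectedConditions.elementToBeClickable(By.id("result")))
            .click();
    }
}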

The third most common source of test flakiness is environment-related issues — in either the test infrastructure or the system under test. This is an addressable problem, but since it is often not under the control of either testers or developers, it is often neglected. Typically this is an operations issue, but devops are cranky, and we should excuse them for being dismissive after encountering problems of the first and second type so often.

Finally, system complexity (not code complexity) is the real source of test complexity. Having to automate a complex workflow — one that need not be so complex — is a real problem, and it exacerbates the problems of inherently brittle automation and poor-quality test automation code.

One way to alleviate this is to simplify tests, for example, by testing functionality at the API layer instead of the UI layer where possible. Or by using lower layers (API or DB) for data setup and validation.
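A rough sketch of what that looks like in practice (Java; the endpoint, payload, and status check are hypothetical): seed test data through the API, and reserve the UI steps for the behavior actually under test.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class ApiSetupSketch {
    static void seedTestUser() throws Exception {
        // Create the test user via the API instead of clicking through
        // a registration wizard in the UI.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://app.example.test/api/users")) // hypothetical endpoint
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString("{\"name\": \"test-user\"}"))
            .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // Fail fast if setup didn't work; the UI test proper starts after this.
        if (response.statusCode() != 201) {
            throw new IllegalStateException("Test data setup failed: " + response.statusCode());
        }
    }
}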

But my main complaint here is with putting the cart before the horse. If you have flaky tests that fail because of defective code, the problem isn’t that the developers have written defective code, and the solution isn’t that they need to test it better — that’s what testing is for!

If your tests are finding bugs in code, that is their intended purpose. A test that does not find a defect is wasted effort. We do not always know which tests will find bugs, so this waste is expected.

The theory is that QA provides value by being able to look at the result of developer code and find defects more efficiently than the developers themselves. If that is not the case, then the problem is QA — but I don’t believe that. I believe that dedicated testers provide a fresh perspective, and have specific goals and incentives to find defects in a way that developers cannot — and that they can do it more cheaply.

That doesn’t mean that developer tests are not also valuable. They are, but should not be expected to catch everything.

The issue, I think, is that tests that are themselves flaky, slow, or uninformative make the effort of finding real defects too costly — and that is something that should be addressed.

Simplifying systems reduces the opportunity for defects, obviously, but is really outside the scope of the problem.

Unit tests longer than the code they test?

Here is my response to the following question on Quora:

Is it typical for unit tests to have a longer length than the code they are testing?

Yep. And that’s a bad thing.

There are two things that tend to bloat unit tests.

1. Extensive setup

If you find there is a lot of setup code, the problem is that it is difficult for you to test in isolation. Either you are creating complex mocks and stubs because of too many dependencies, or you are having to handle external state — reading and writing to files or databases, instantiating other objects, configuring environments, etc.

In which case it’s not really a unit test. Which is okay — those tests may have value, but you should probably put them in a different bucket and run them at different times.
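If you’re using JUnit 5, one lightweight way to keep those buckets separate is tagging. A sketch (the tag name is just a convention you’d choose):

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

class OrderTests {

    @Test
    void totalIsSumOfLineItems() {
        // Pure in-memory logic: cheap, isolated, runs on every build.
    }

    @Tag("integration")
    @Test
    void orderIsPersistedToDatabase() {
        // Touches a real database: still a valid test, but a different
        // bucket, run on a different schedule (nightly, pre-merge, etc.).
    }
}

Build tools can then include or exclude the bucket by tag — for example, Gradle’s useJUnitPlatform { excludeTags 'integration' } or Maven Surefire’s groups/excludedGroups configuration.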

2. Overly complex tests

If there are many steps to perform (that are not strictly setup) to accomplish the goal you need to test, then there may be an abstraction you’re missing. Or your code is tightly coupled.

If there are many validations needed, then you’re probably trying to test more than one thing, and need to think about what the goal of this particular test is. You don’t necessarily need to verify that every value in your result is correct, or inspect that every expected method has been called, in a single test. It can be multiple targeted tests, as sketched below.
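For example, a sketch of what splitting one bloated test into targeted tests might look like (JUnit; the Invoice type is a hypothetical stand-in for your code under test):

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class InvoiceTests {

    // Hypothetical code under test, just enough to make the sketch concrete.
    static class Invoice {
        private final int[] items;
        Invoice(int... items) { this.items = items; }
        int subtotal() { return java.util.Arrays.stream(items).sum(); }
        int tax() { return subtotal() / 10; } // flat 10% rate, for the sketch
    }

    // One intent per test: a failure points at exactly one behavior.
    @Test
    void subtotalSumsLineItems() {
        assertEquals(350, new Invoice(100, 250).subtotal());
    }

    @Test
    void taxIsTenPercentOfSubtotal() {
        assertEquals(10, new Invoice(100).tax());
    }
}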

If you find that you are duplicating a lot of steps or that you are wanting to validate a lot of things in your tests, it may be because setup is complex, or execution is slow. Again, you may not be dealing with unit tests.

The main goal of unit tests is not that your code is tested. Test coverage metrics are garbage anyway. The goal of writing unit tests should be that your code is testable in small units. And that may mean refactoring and thinking about it in terms of independent components. Which is a good thing.

Having good unit tests enables safer refactoring. But refactoring is often needed to have good unit tests. It’s not a catch-22, it’s an iterative process. You may start with a bloated end-to-end test. But being able to run that confidently allows you to make small internal changes until you can have small, isolated units that are quickly and easily testable.

Choosing testing tools

I’d like to see something that is target / platform / process agnostic.

I don’t mean cross browser or mobile app automation tools.

I mean something for testing systems — database to cloud, network to user interface.

Something that is test framework, reporting tool, continuous delivery platform, project management process, and target environment agnostic.

The only thing that comes close is to use low-level test runners of the *unit variety, and then roll your own for the other 99%.

The library that wraps the Chrome DevTools Protocol is such a small part of the decision process, but ends up making so many decisions for you.

The trick is to not let it.

Testers don’t create value, do they?

When asked what value testers provide you often hear something like the following:

  • I’ll just hire good developers who don’t make mistakes.
  • Developers can test their own software anyway.

Here is my response to the question about what value testers provide.
Originally posted as a response to this post on LinkedIn by Bridesh about whether testers should be paid less than developers:

I like to think of testing as a force multiplier.

Maybe testing software isn’t as complex as developing it. But having someone focusing on testing the software allows developers to be more productive as well as produce higher quality work.

Testing allows developers to focus on what they do best, product managers to focus on what they do best, and designers and operations to focus on what they do best — instead of having them all do something that is outside their expertise, that causes context switching, and that they may be too close to see their own mistakes in. Having someone who specializes in making sure things work, finding out how it doesn’t work, and informing all those other roles — in a disciplined (and diplomatic) way — of how they can improve their work maybe doesn’t produce anything tangible, but it increases the value and velocity of everyone else’s efforts.

Next time management questions what value testers bring, ask them what value managers deliver, and they will probably see the value of enabling others to be productive more clearly.

Make sure your tests fail

When you write automated tests, it’s important to make sure that those tests can fail. This can be done by mutating the test so its expected conditions are not met, so that the test fails (rather than errors). When you then satisfy those conditions by correcting the inputs to the test, you can have more confidence that your test is actually testing what you think it is — or at least that it is testing something.
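A quick sketch of what that mutation looks like in practice (JUnit; slugify is a hypothetical function under test):

import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

class SlugTests {

    // Hypothetical function under test.
    static String slugify(String title) {
        return title.toLowerCase().replace(' ', '-');
    }

    @Test
    void titlesBecomeUrlSlugs() {
        // To test the test: temporarily change the expected value
        // (e.g. to "hello_world") and confirm the test FAILS with an
        // assertion error, not an exception. Then restore it.
        assertEquals("hello-world", slugify("Hello World"));
    }
}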

It’s easy to make a test fail and then change it to make it pass, but testing your tests can be more complex than that — which is a good reason why tests should be simple. You’ll only catch the error that you explicitly tested for (or completely bogus tests like assert true == true).

Not to say those simple sanity checks don’t have value, but an even better check is to write a test that fails before a change and passes after the change is applied to the system under test. This is easier to do with unit tests, but for system tests there is great value in seeing a test fail before a feature (or bug fix) is deployed and then seeing it succeed afterwards.

It can still lead to bogus tests (or at least partially bogus tests), but a few of these types of tests run after a deployment are extremely valuable and can catch all kinds of issues, as well as give greater confidence that what you added actually works. This is especially useful when moving code (and configuration) through a delivery pipeline across multiple environments — from dev to test to stage to production.

Having (and tracking) these sorts of tests — which pass only when the change is applied — makes the delivery pipeline much more valuable.

Also don’t forget the other tests — those that make sure what you changed didn’t break anything else — although these are the more common types of automated regression tests.

Originally posted in response to Bas Dijkstra on LinkedIn:

Never trust a test you haven’t seen fail

Do we really need QA (anymore)?

In recent years there has been a trend toward empowering (or at least encouraging) developers to do their own testing. Automation frameworks, continuous integration, unit testing, contract-based testing, and various other tools and processes make developer testing easier and more effective than ever before.

So do we really need testers? Do we need a separate QA department?

Obviously, we need to assure quality, but should there be a separate QA role?

A good craftsman, after all, should make certain to deliver quality work, and who better to make sure that happens than the expert who designed and built it? And software development is craftsmanship.

There are several reasons why I think that QA should still exist as a separate role, but it boils down to 2 major arguments:

  1. Perspective
  2. Time

Perspective

The saying goes that “you can’t see the forest for the trees,” and this applies to developers and testing — they are too close to the product to spot its flaws. Or, looking at it another way, they are focused on creating something, not on finding out how it can be broken. Much like a writer needs an editor, and an artist needs a critic, having someone with a different perspective is valuable for seeing things someone intimately involved in the creation process may not see. There is definitely value in having an expert examine the system, but there is also value in having someone inspect it with a naive approach.

And then there is the issue with my statement above about a craftsman judging his own work. Even if my premise about a naive perspective bringing value in testing is wrong, the problem lies with the assumption that a single craftsman creates his product in isolation. Many software projects are so big that no one person can understand the whole scope of the system, so no one person can be responsible for ensuring that everything works together correctly.

And it is primarily in the cracks between different systems (or parts of a system) that bugs are found.

Time

Secondly, developers don’t always have the time to fully test their software, and even if they make the time (and ignoring any issues about perspective), the question becomes: is this the most effective use of their time?

Developers are experts, not only in the craft of making software, but also in the systems they are making software for. As such, their time is valuable, and that is why they are (and should be) handsomely paid for their expertise.

Having someone whose expertise is different (if not less) perform the more labor-intensive, repetitive tasks where less expertise (at least in software development) is required only makes sense. The CEO isn’t expected to do everything at a company, even in situations where they might actually be able to. Doctors have nurses, technicians, and other assistants to help them maximize the value of their expertise, and thus the time they can spend on their specialty, and developers should be no different.

My Perspective on QA

I look at my role in QA as providing a fresh perspective, but especially as providing a way to maximize the productivity of developers, product owners, and others — offering services that they might very well be able to perform on their own, but that would slow down their velocity if they had to.

I look at test automation as an extension of that. The goal of test automation is to make testers more effective: to relieve them of the boring drudgery of repetitive regressions that might otherwise cause them to miss important defects, and to free them up for creative, exploratory, destructive testing.

DynamoDB might be the right database for your application

DynamoDB was designed to fit Amazon’s pricing model.

In other words, the primary architecture goal of DynamoDB is to charge you for data storage and transfer. So it was designed to force you to scan the whole table for every item of data, and to put every attribute in every record.

Some clever people have discovered ways to (sort of) make it work for simple scenarios using GSIs and LSIs, but Amazon’s response was to charge you for using them and limit how much you can use them.

If you want to find a piece of data in DynamoDB without scanning the whole table, you have to copy the whole table to create an index (GSI) — and then be charged for querying (and storing) both tables. And you’re only allowed to have 20 GSIs per table.
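To make the trade-off concrete, here is a minimal sketch using the AWS SDK for Java v2 (the table, index, and attribute names are hypothetical). Without an index you scan, and you are billed for every item read before the filter is applied; with a GSI you can query efficiently, but against a second copy of the data:

import java.util.Map;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.QueryRequest;
import software.amazon.awssdk.services.dynamodb.model.ScanRequest;

class DynamoSketch {
    static void lookupOrders(DynamoDbClient db) {
        Map<String, AttributeValue> email =
            Map.of(":email", AttributeValue.builder().s("a@example.com").build());

        // Without an index: read (and pay for) the whole table;
        // the filter is applied after the read is billed.
        db.scan(ScanRequest.builder()
            .tableName("Orders")
            .filterExpression("customerEmail = :email")
            .expressionAttributeValues(email)
            .build());

        // With a GSI on customerEmail: an efficient query, but against
        // a second, separately stored and billed copy of the data.
        db.query(QueryRequest.builder()
            .tableName("Orders")
            .indexName("customerEmail-index")
            .keyConditionExpression("customerEmail = :email")
            .expressionAttributeValues(email)
            .build());
    }
}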

By the way, you cannot relate data between tables in DynamoDB, so you have to have all data for your entire application in a single table (or else download the full contents of each table and do the comparison by scanning each one in your own code — hopefully using EC2). That means for your entire application, you can only have 20 relations. And that’s not relations between 20 attributes. If you want to look up information where 4 different attributes relate to each other, every combination of two or more of those attributes needs its own index: that’s 6 + 4 + 1 = 11 of your 20 right there. And 11 copies of the full table, just to be able to index them and search. If you need 5 attributes, that’s 10 + 10 + 5 + 1 = 26 relations — you can’t have that many.

If you want a Local Secondary Index (and to avoid copying the full table), well, you’re only allowed 5 per table, and you must define them at table creation time. How can you know that up front? Don’t worry, Amazon recommends destroying and recreating your table (more data transfer costs) once you need a relation.

Transactions are possible, but are also limited, and charged.

But it’s not the annoying copying, rebuilding, and miscellaneous costs that Amazon is going for. What they really hope is that you live with the initial design, and need to scan the entire table on every query, so they can charge you for “usage”.

So, if you have a very simple application that will never increase in complexity or size, and you can always look up records by ID or will always want to download the whole data set to load into application memory, DynamoDB may be a solution you can use. Or a CSV file.

Test Automation Meetup Talk on Test Frameworks

I gave a talk for the Test Automation Meetup sponsored by Zapple on building and evaluating a test automation framework.

Here is a link to the test automation meetup video:
https://www.youtube.com/watch?v=7m9NKpJVvQ0

And here are my slides:

In this presentation I talk about:

  • What I mean by “test framework”
  • The parts of a test automation framework
  • Which different tools can be called a framework, or may be considered part of your test automation framework
  • The goals you have for a test automation framework — what do you need it to do, and what do you need to know?
  • How to go about building a test automation framework
  • The criteria for evaluating a test automation framework
  • How to measure the success of your test automation framework

QA testing should do these 4 things:

  1. Make sure software works
  2. Make sure software does what it should
  3. Make sure software doesn’t do what it shouldn’t do
  4. Make sure software doesn’t break when you (the user) do something wrong, for example

  Bonus:
  5. Make sure software delights the user

Most of the time, test automation is really only doing #1 — making sure that it works, by navigating around the application, and performing basic functions.

This is ok.

This is what automation is good at.

But you also need to do other things. Things that are harder. Things that are difficult for humans to figure out how to do, and even harder for computers, despite the hype around “AI”.

Sidebar: Artificial intelligence and artificial sweeteners

Artificial intelligence is like artificial sweetener. It tricks your brain without providing any of the benefits, like:

  • Energy (in the form of calories)
  • Pleasure (to the taste)
  • Tooth decay

Artificial sweeteners only simulate the pleasure from taste, which is really an anticipation of energy. It’s like the dopamine from a drug, or a video game that tricks your brain into thinking you’ve accomplished something.
Likewise, AI only simulates thinking, and its outputs give you a false confidence that someone has thought about the implications.
A large language model (LLM) like ChatGPT literally has no idea what it has written: whether it is bad or good, right or wrong, self-contradictory or repetitive, or if it makes any sense at all.
The generative AI models don’t know how many fingers a human should have, whether a picture is a cat or a dog, or whether the thing they draw is representational at all — much less if it could exist in any real space or follows any rules of logic or consistency.
The idea of leaving testing up to generative computer “AI” models is preposterous, given that testing is supposed to answer exactly these types of questions.

1. Does it work?

As I said, making sure something works is the easy part. Does an application start when I launch it? Can I login? Can I see a report when I click a button?

But does it work right?

B. Does it do what it should?

This is the area of functional testing. How can I tell whether software does what it should unless I know what it should do?

For that you need requirements.

Some requirements can be inferred. A tester can often figure out whether working software is doing the right thing using common sense and their knowledge of a topic.

Sidebar: Shopping carts and usability dark patterns

Wouldn’t it be nice if your shopping cart at the grocery store could scan the items as you take them off the shelf and tell you how much you’re going to spend? Retailers discovered that you’re less likely to buy as much stuff if you realize how much you’re spending.

Look for this feature in online shopping carts as competition vanishes. When it’s hard to figure out your total, you can be pretty sure we are in an effective monopoly.

But some requirements are subtle. Does this item calculate price per item or per weight? What taxes are applied to which people and what products?

And some requirements require specialized knowledge. Domain knowledge. Knowledge about the business developing the software, or about how and what the software will be used for — medical diagnostics or aeronautic controls, for example.

Sidebar: Agile considered as a helix of combinatorial complexity

If you have the requirements, that is, you can perhaps test them — assuming you understand them. But in this day and age of big “Agile” top-down bureaucracy, time-filling meaningless ceremonies, complex processes, and un-user-friendly tools, requirements are less clear than ever.
But I digress. Again.

If you have tests (and automation) that are going to make sure software does what it should, you’re going to need to

1. Know what it should do, and

2. Map your tests to those requirements

That is, assuming your tests (which are software), are actually doing what they should do.

Oh yeah, and you also need to

3. Know how to verify that those requirements are being met.

Because it’s virtually impossible for anyone to understand, much less enumerate, ***all requirements***, it stands to reason that you won’t be able to automate them all, or track what they are doing.

Combine this with the numerous ways you can accomplish something:

- click a button
- click back
- no, click it again
- hit refresh
- and so on

And you have a nearly infinite variety of ways that requirements can be met.

Not to mention the potential number of ways software can break along the way.

Actually, I am going to mention it. Because that is our next point:

4. Will it break?

Using the software will tell you if it works, and more than likely, as a tester you will discover novel and interesting 

(some may say “perverse”, or “diabolical”, but we’re not developers or project managers here.)

ways the software can break.

In fact, good testers relish it. They love finding bugs. They love breaking things. They love seeing the smoke come out and the server room catch on fire.

Cloud data centers have really made this less fun, but there are additional benefits (*ahem* risks) to running your code over the network to servers halfway across the world controlled by someone else. And additional ways things can go wrong.

And they (I should say “we,” because I am myself a tester) get even more satisfaction the more esoteric or bizarre the ways they trigger these bugs.

Perhaps nothing gives us more delight than hearing a developer scratch their head and say “It worked for me!” when we can clearly prove that’s not the case in all cases. Or for all definitions of “work”.

Breaking things is the delight of the tester, and while there are tools that can put stress on software with high loads and random inputs, nothing beats a dumb human for making dumb mistakes.

And finally, since I like to do things out of order (to see if something breaks) we need to see what else software can do (that it probably shouldn’t do):

X. Bugs

Have you ever come across an unexpected behavior in software and been told,

“That’s not a bug, that’s a feature”

No, dear developer, that’s a bug, not a feature.
If it’s doing what it’s not supposed to, it’s not supposed to do it.
So it stands to reason that any undocumented feature should be considered a bug.
But as we pointed out earlier, not every feature requirement can be documented, and the effort probably isn’t even worth it, because, let’s be honest: no one will ever read comprehensive documentation, much less test it all

  • Every.
  • Single.
  • Time.
  • Something changes.

What’s the difference between a bug and a defect?

Some would have you say spelling is the only difference. I disagree. I think the pronunciation is also different.

A defect is when something you wanted in the system isn’t in it. Something is missing. A requirement isn’t met.

A bug (as Admiral Grace Hopper *allegedly* found out the hard way) is when something you didn’t want gets into the system.

Whether it’s a moth or Richard Pryor doesn’t matter. The point is, it’s not supposed to be there. But it is.

Sometimes this breaks the system (like in Admiral Hopper’s case); other times, it doesn’t break the system (as in Richard Pryor’s case).

It could be a security issue, but it doesn’t have to be. It could just live there happily, taking up bits, and burning cycles and nobody ever notices anything is wrong (except whoever pays the AWS bill).

Anyway, it shouldn’t be there if it isn’t intended to be there, even if it’s beneficial. If you discover it, and it turns out useful, you can document it, and then it becomes a documented feature.

No, adding a note to the bug report “Working as intended” does not count as documenting a feature.

But it’s very hard to prove a negative: that is, that the software doesn’t have a feature it shouldn’t have.

***

So to reiterate, there are 4 things that testing (or Quality Assurance) should be accomplishing:

1. Making sure it works

Automation, or any random user, can see that this is the case. However, just because something works doesn’t mean that it does what it’s supposed to, that it doesn’t do what it shouldn’t, or that it will keep working when it comes into contact with the enemy — I mean users.

Smoke tests fit well into this category, but it should go beyond just making sure it doesn’t burst into flames when you first turn it on.

B. Making sure it does what it’s supposed to

You need to know what it’s supposed to do to test this. Some things are obvious, but in some cases, requirements are needed. But comprehensive documentation is not practical.

This is often considered functional testing. Some of this can be automated, but due to many factors (including the reasons above), it’s not practical to automate everything.

4. Making sure it doesn’t break

This can be harder to prove. But it’s important. Just because something is working at one point, doesn’t mean it always will be.

Load and stress testing are part of this. But so are “monkey testing” and “chaos testing,” which, as the names imply, are unguided.

Testers with their pernicious creativity and reasoning abilities can go beyond random behavior and deliberately try to break things.

The goal here is to make the system stable.

X. Making sure it doesn’t do what it’s not supposed to do

This is the hardest part, but the funnest part of testing. Often when something breaks (e.g., a buffer overrun), it can also exhibit unexpected behavior.

It can have serious security implications, but also may cause usability issues.

Which brings us to our bonus point:

Bonus: Making sure it delights the user

Something can work flawlessly, be perfectly secure, fulfill all requirements, and still be an unmitigated pain in the neck to use.

In actuality, trying to make something robust, reliable, secure, and complete ***usually*** ends up harming usability.

Add to this the simple principle that someone who created the system is ***probably*** going to understand it better than someone who didn’t, and they may make assumptions about how to use it that are either not valid or not obvious to the intended user.

Usability testing is an important part of testing and pretty much can’t be automated (although I’d be interested to hear ideas about how you think it could.)

Usability testing is also often neglected, or not done from the user perspective.

Anyway, that’s all I have to say about that, for now.

What are some Selenium WebDriver locator strategies?

Here is my answer to the following question on Quora:

What are some locator strategies that can be used in Selenium WebDriver?

Selenium WebDriver has several locator strategies — or methods for locating elements.

When you want to find an element, you need to locate it on the page. The way Selenium does this is conceptually similar to how JavaScript locates elements in the HTML document. In JavaScript you can do the following:

document.getElementById(locator)
document.getElementsByName(locator)
document.getElementsByTagName(locator)
document.getElementsByClassName(locator)

WebDriver has corresponding locator strategies:

driver.findElement(By.id(locator))
driver.findElement(By.name(locator))
driver.findElement(By.tagName(locator))
driver.findElement(By.className(locator))

It also has additional methods for locating by XPath, CSS selector, and link text:

driver.findElement(By.xpath(locator))
driver.findElement(By.cssSelector(locator))
driver.findElement(By.linkText(locator))
driver.findElement(By.partialLinkText(locator))

XPath and CSS selectors are ways to parse the HTML document and give more precise locators including a combination of element tag hierarchies, attributes, CSS classes, and relative position (parent, child, sibling). I won’t go into details, but these are powerful strategies for parsing a document and finding specific elements based on several criteria.

LinkText and PartialLinkText search for anchor <a> tags that contain the given text for the locator.

By.linkText("Click Here")
By.partialLinkText("Click")

WebDriver also has corresponding findElements() (plural) methods that locate a list of matching elements. For instance, you can find all elements with tag name <div>, or all elements matching the XPath //table//h1 (all H1 tags within a table). By default, findElement() (singular) will return the first matching element.
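For example (Java, matching the snippets above; driver is an existing WebDriver instance):

import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;

// All matching elements, in document order; an empty list if none match.
List<WebElement> links = driver.findElements(By.tagName("a"));

// First match only; throws NoSuchElementException if nothing matches.
WebElement firstLink = driver.findElement(By.tagName("a"));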

Selenium 4 also introduced Relative Locators, which can modify an existing locator with the terms “above”, “below”, “toLeftOf”, “toRightOf”, or “near” (near meaning within 50 pixels). In practice, relative locators are often unreliable, because they depend on a fixed screen size and layout. One use for relative locators is to check responsive layouts given a known screen size. For example, to make sure a button is below a div on mobile devices, but beside it on a full screen:

By mobileButton = RelativeLocator.with(By.id("myButton")).below(By.id("myDiv"))
By desktopButton = RelativeLocator.with(By.id("myButton")).toRightOf(By.id("myDiv"))

Now, the next question is: Which locator strategy should I use — and why?

By.id is the most efficient locator, the most concise, and the least likely to change. But not every element has a unique id attribute. Use it when you can.

<button id="login">
driver.findElement(By.id("login"))

By.name is useful for form elements, and is also concise and specific.

<input name="email">
driver.findElement(By.name("email"))

Tag name and class name are often useful for finding all elements that match a specific criterion:

driver.findElements(By.tagName("a")) ← this will find all links on the page
driver.findElements(By.className("dark")) ← this will find all elements with the “dark” class attribute.

XPath is definitely the most versatile, but XPath locators can be very ugly and easy to break:

driver.findElement(By.xpath("//table//div[@class='result']/a[contains(text(), 'edit')]")) ← find the “edit” link inside a div with class “result” within a table

But CSS selectors can do most of the same things as XPath (except selecting parents, preceding siblings, or text nodes) and are often more readable.

driver.findElement(By.cssSelector("table div.result > a")) ← find the first link that is a direct child of a div with class “result” inside a table.

Note: CSS selectors cannot find elements by their text content.

As you can see, CSS (and XPath) locators can incorporate the above strategies (tag name, class name, id) into one locator. Many people prefer to use one of these locator strategies exclusively, for consistency.

However, an important “strategy” when using XPath or CSS selectors is to avoid complex selectors that depend on the document hierarchy. You should try to find a unique locator that is as specific as possible (by id, name, or tag/class combination) and that is not likely to change as the page layout changes.

If you cannot identify a single element definitively, you can look for the closest unique parent element. Using a relative XPATH or CSS selector (different from a relative locator like “above” or “below”) from that parent is a good strategy.

driver.findElement(By.cssSelector("#uniqueId > div")) ← find the first div child of the element with id “uniqueId”.

In CSS Selectors:

div#uniqueId ← matches a <div id="uniqueId"> element by its id attribute
div.myClass ← matches a <div class="myClass"> element by its class attribute

Personally, given the choice between XPath and CSS selectors, I recommend choosing CSS when possible, both for readability and as a practical consideration: web developers know CSS selectors well, but usually do not use XPath.

Finally, you can locate one element, and then search for other elements below it by performing two searches.

driver.findElement(By.xpath("//table")).findElement(By.cssSelector(".results")) ← find the first table, then find the first element within it with class name “results”.

This does incur a slight performance penalty by making multiple WebDriver findElement calls. You should try to find elements with a single locator when possible, but not at the expense of either readability (complex locators) or maintainability (locators likely to change); the two often coincide.

In summary, you should try to find unique locators for elements that will not break as the page layout changes. Finding elements by ID is the most efficient. XPath, and then CSS selectors, are the most versatile, and you can often get whatever you want with one of these two. You should strive for simple locators that identify an element uniquely, and avoid complex hierarchical locator strategies when possible, because they can lead to difficult-to-maintain code.