Programmers Stack Exchange is a question and answer site for professional programmers interested in conceptual questions about software development.

Imagine you are creating a video player in JavaScript. This video player loops the user's video repeatedly using a recursive function and, because of that, the browser will eventually throw a "too much recursion" RangeError.

Probably no one will use the loop feature that much. In practice your application will never throw this error, not even if the user leaves it looping for a week, but the bug still exists. Solving it will require you to redesign the way looping works in your application, which will take a considerable amount of time. What do you do? Why?

  • Fix the bug

  • Leave the bug

Shouldn't you only fix bugs people will stumble upon? When does bug fixing become overkill, if it ever does?

57  
Don't mess with my example case scenario mate – Tiago Marinho yesterday
6  
@PlasmaHH I'm using this hypothetical scenario to explain my question. If the bug exists or not it doesn't matter at all – Tiago Marinho 23 hours ago
10  
@TiagoMarinho: the point I am trying to make is: sometimes it's just the right thing to do to define such a scenario as the intended behaviour. – PlasmaHH 23 hours ago
14  
Why on Earth would you run such a loop using recursion in the first place? You might not want to fix the bug, but you sure ought to reconsider your design process :-) – jamesqf 21 hours ago
8  
This seems more like a business question. You have to prioritize based on the cost-to-fix, and the impact/frequency of the bug. – Darthfett 20 hours ago

12 Answers

Accepted answer (85 votes)

You have to be pragmatic.

If the error is unlikely to be triggered in the real world and the cost to fix is high, I doubt many people would consider it a good use of resources to fix. On that basis I'd say leave it but ensure the hack is documented for you or your successor in a few months (see last paragraph).

That said, you should use this issue as a "learning experience" and the next time you do looping do not use a recursive loop unnecessarily.
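
To make the difference concrete, here is a minimal sketch, not the asker's actual player. `playOnce` is a hypothetical stand-in for one playback pass, and the loop is assumed to be synchronous as in the question's scenario. The recursive version keeps one stack frame per repetition; the iterative version does the same work at constant stack depth.

```javascript
// Hypothetical stand-in for playing the video once; a real player would
// drive a video element instead of incrementing a counter.
function playOnce(state) {
  state.plays += 1;
}

// Recursive loop: every repetition adds a stack frame that is not released
// until the whole loop ends, so a long enough loop overflows the call stack.
function loopRecursive(state, remaining) {
  if (remaining === 0) return 0;
  playOnce(state);
  // The "1 +" keeps this call out of tail position, so engines that
  // eliminate tail calls still grow the stack.
  return 1 + loopRecursive(state, remaining - 1);
}

// Iterative loop: same behaviour, constant stack depth.
function loopIterative(state, repetitions) {
  for (let i = 0; i < repetitions; i++) {
    playOnce(state);
  }
  return repetitions;
}

const iterativeState = { plays: 0 };
loopIterative(iterativeState, 1000000); // completes normally

let recursionFailed = false;
try {
  loopRecursive({ plays: 0 }, 1000000);
} catch (err) {
  // The exact error type and message are engine-specific.
  recursionFailed = true;
}
console.log(iterativeState.plays, recursionFailed);
```

In a browser, restarting playback from an `ended` event handler, or setting the video element's `loop` attribute, would likely avoid the problem as well, since each event callback starts on a fresh stack.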

Also, be prepared for that bug report. You'd be amazed how good end users are at pushing against the boundaries and uncovering defects. If it does become an issue for end users, you're going to have to fix it - then you'll be glad you documented the hack.

71  
Totally agree with "You'd be amazed how good end users are at pushing against the boundaries and uncovering defects." – Spotted yesterday
41  
End users are in no way restricted by what you think is a reasonable use of your software. There will be users who want to loop a video forever. It's a feature that your software provides, so they will use it. – gnasher729 yesterday
16  
@gnasher729 "10-hour XXXX" videos on YouTube are a good indicator that, indeed, some people just want to loop something forever. – Chris Cirefice yesterday
13  
Another problem: If your software is popular, then someone encounters a bug that indeed happens in a rare situation only, posts it on the internet, and suddenly everyone and their dog says "this software is rubbish, it crashes if I loop a video for a day". Or a competitor uses it to demonstrate how easy it is to crash your application. – gnasher729 yesterday
5  
+1 for 'ensure the hack is documented' – Pureferret 22 hours ago

There was a similar bug in Windows 95 that caused computers to crash after 49.7 days. It was only noticed some years after release, since very few Win95 systems stayed up that long anyway. So there's one point: bugs may be rendered irrelevant by other, more important bugs.
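
The 49.7-day figure is what you would expect from an unsigned 32-bit counter of milliseconds wrapping around, which is the commonly cited cause of that bug. The answer itself does not state the cause, so treat that as an assumption. The arithmetic, as a quick check:

```javascript
// Assumption: uptime is tracked in milliseconds in an unsigned 32-bit counter.
const COUNTER_VALUES = 2 ** 32;         // distinct values before wrap-around
const MS_PER_DAY = 24 * 60 * 60 * 1000; // 86,400,000 ms

const daysUntilWrap = COUNTER_VALUES / MS_PER_DAY;
console.log(daysUntilWrap.toFixed(1)); // "49.7"
```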

What you have to do is a risk assessment for the program as a whole and an impact assessment for individual bugs.

  • Is this software on a security boundary?
  • If so, can this bug result in an exploit?
  • Is this software "mission critical" to its intended users? (See the list of things the Java EULA bans you from using it for)
  • Can the bug result in data loss? Financial loss? Reputational loss?
  • How likely is this bug to occur? (You've included this in your scenario)

And so on. This affects bug triage, the process of deciding which bugs to fix. Pretty much all shipping software has very long lists of minor bugs which have not yet been deemed important enough to fix.
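
As a rough illustration of how those questions can feed triage, here is a sketch that turns them into a sortable score. The field names, the weights, and the sample backlog are all hypothetical; a real team would pick and calibrate its own, and would likely temper the result with judgment.

```javascript
// Hypothetical weights; a real team would choose and calibrate its own.
const WEIGHTS = {
  exploitable: 5,     // on a security boundary and can lead to an exploit
  missionCritical: 4, // the software is mission critical to its users
  dataLoss: 4,        // data, financial, or reputational loss
};

// likelihood is an estimate between 0 (never hit) and 1 (certain to be hit).
function triageScore(bug) {
  let impact = 1; // every bug has at least nuisance value
  for (const [flag, weight] of Object.entries(WEIGHTS)) {
    if (bug[flag]) impact += weight;
  }
  return impact * bug.likelihood;
}

function triage(bugs) {
  // Highest score first; copy so the caller's array is not reordered.
  return [...bugs].sort((a, b) => triageScore(b) - triageScore(a));
}

// Made-up backlog for illustration only.
const backlog = [
  { id: "loop-overflow", likelihood: 0.001 },
  { id: "login-bypass", exploitable: true, likelihood: 0.05 },
  { id: "save-corrupts-file", dataLoss: true, likelihood: 0.2 },
];
console.log(triage(backlog).map((bug) => bug.id));
// → ["save-corrupts-file", "login-bypass", "loop-overflow"]
```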

1  
I also recall the (hardware) bug in some Intel CPUs where a specific floating point value went all wrong. – William Kappler 22 hours ago
3  
@WilliamKappler en.wikipedia.org/wiki/Pentium_FDIV_bug is what I believe you are referring to. Was up for a year before anybody noticed it. – Jeutnarg 17 hours ago
1  
That 49.7 day bug cost Microsoft dearly in reputation. – gnasher729 15 hours ago
3  
@gnasher729 - Not really, they were already at the bottom and still digging :) Most people had to re-install Win 95 more frequently than 49.7 days IIRC. – mcottle 11 hours ago
2  
@Luaan The comment was intended as a lighthearted dig at M$, hence the smiley after the first sentence. They were behind the eightball with '95 because it came out very late in 95 (probably because having Win95 released in 1996 would have been a bad look), half baked (Remember the USB BSOD?) and inclined to become unstable and require regular reinstalls hence my second sentence - which never mentioned running a server on Windows 95, I don't know where you got THAT from (flashbacks?). The second release CD improved matters but the initial release of '95 was a doozy. – mcottle 5 hours ago

The other answers are already very good, and I know your example is just an example, but I want to point out a big part of this process that hasn't been discussed yet:

You need to identify your assumptions, and then test those assumptions against corner cases.

Looking at your example, I see a couple of assumptions:

  • The recursive approach will eventually cause an error.
  • Nobody will see this error because videos take too long to play to reach the stack limit.

Other people have discussed the first assumption, but look at the second assumption: what if my video is only a fraction of a second long?

And sure, maybe that's not a very common use case. But are you really sure that nobody will upload a very short video? You're assuming that videos are a minimum duration, and you probably didn't even realize you were assuming anything! Could this assumption cause any other bugs in other places in your application?
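
A back-of-the-envelope calculation shows how much that second assumption depends on video length. The sketch below assumes one stack frame per playback and a stack limit of 10,000 frames; both numbers are illustrative guesses, since real limits vary by browser and by how much each frame holds.

```javascript
// Assumption: one stack frame per playback and a limit of 10,000 frames.
// Real limits vary by browser, engine version, and frame size.
const ASSUMED_MAX_DEPTH = 10000;

// Hours of continuous looping before the assumed limit is reached.
function hoursUntilOverflow(videoSeconds, maxDepth = ASSUMED_MAX_DEPTH) {
  return (videoSeconds * maxDepth) / 3600;
}

console.log(hoursUntilOverflow(180)); // 3-minute video: 500 hours, about 3 weeks
console.log(hoursUntilOverflow(0.5)); // half-second clip: under 1.5 hours
```

Under these assumptions, the same bug that takes weeks to surface with an ordinary video shows up within an afternoon for a very short clip.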

Unidentified assumptions are a huge source of bugs.

Like I said, I know that your example is just an example, but this process of identifying your assumptions (which is often harder than it sounds) and then thinking of exceptions to those assumptions is a huge factor in deciding where to spend your time.

So if you find yourself thinking "I shouldn't have to program around this, since it will never happen" then you should take some time to really examine that assumption. You'll often think of corner cases that might be more common than you originally thought.

That being said, there is a point where this becomes an exercise in futility. You probably don't care if your JavaScript application works perfectly on a TI-89 calculator, so spending any amount of time on that is just wasted.

The other answers have already covered this, but coming up with that line between "this is important" and "this is a waste of time" is not an exact science, and it depends on a lot of factors that can be completely different from one person or company to another.

But a huge part of that process is first identifying your assumptions and then trying to recognize exceptions to those assumptions.

    
Very good point Kevin. Note my comment on the selected answer above that focuses on the analysis question What's the worst thing that could happen? – O.M.Y. 10 mins ago

I would recommend that you read the following paper:

Dependability and Its Threats: A Taxonomy

Among other things, it describes various types of faults that can occur in your program. What you described is called a dormant fault, and in this paper it is described like this:

A fault is active when it produces an error, otherwise it is dormant. An active fault is either a) an internal fault that was previously dormant and that has been activated by the computation process or environmental conditions, or b) an external fault. Fault activation is the application of an input (the activation pattern) to a component that causes a dormant fault to become active. Most internal faults cycle between their dormant and active states

Having described this, it all boils down to a cost-benefit ratio. The cost would consist of three parameters:

  • How often would the issue present itself?
  • What would the consequences be?
  • How much does it bother you personally?

The first two are crucial. If the bug would manifest itself once in a blue moon and/or nobody cares about it, or it has a perfectly good and practical workaround, then you can safely document it as a known issue and move on to more challenging and more important tasks. However, if the bug would cause a money transaction to fail, or interrupt a long registration process, thus frustrating the end user, then you have to act on it. The third parameter is something I strongly advise against. In the words of Michael Corleone:

It's not personal. It's business.

If you are a professional, leave the emotions aside and act optimally. However, if the application you are writing is a hobby of yours, then you are emotionally involved, and the third parameter is as valid as any in terms of deciding whether to fix a bug or not.

That bug only stays undiscovered until the day someone puts your player on a lobby screen running a company presentation 24/7. So it's still a bug.

The answer to What do you do? is really a business decision, not an engineering one:

  • If the bug only impacts 1% of your users, and your player lacks support for a feature required by another 20%, the choice is obvious. Document the bug, then carry on.
  • If the bugfix is on your todo list, it's often better to fix it before you start adding new features. You'll get the benefits of a zero-defect software development process, and you won't lose much time since it's on your list anyway.

Especially in big companies (or big projects) there's a very pragmatic way to establish what to do.

If the cost of fixing is greater than the return the fix will bring, keep the bug. Conversely, if the fix will return more than it costs, fix the bug.

In your sample scenario it depends on how many users you expect to lose versus how many users you will gain if you develop new features instead of correcting that expensive bug.
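
The comparison can be sketched as a few lines of code. Everything here is hypothetical: the field names, the idea of pricing a lost user, and the sample figures are illustrative estimates, and in practice these inputs are rarely known with any precision.

```javascript
// All figures are hypothetical estimates supplied by the team.
function expectedLoss({ expectedAffectedUsers, lossPerAffectedUser }) {
  return expectedAffectedUsers * lossPerAffectedUser;
}

// Fix only when leaving the bug in is expected to cost more than fixing it.
function worthFixing(bug, fixCost) {
  return expectedLoss(bug) > fixCost;
}

const loopBug = { expectedAffectedUsers: 5, lossPerAffectedUser: 20 };
const paymentBug = { expectedAffectedUsers: 500, lossPerAffectedUser: 20 };

console.log(worthFixing(loopBug, 4000));    // false: 100 lost vs 4000 to fix
console.log(worthFixing(paymentBug, 4000)); // true: 10000 lost vs 4000 to fix
```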

4  
The ROI for fixing a bug is seldom easy to evaluate - you generally have to rely on your judgment. – Ant P 22 hours ago
    
The return that the fix will bring is mostly reputation which is almost impossible to quantify. If I am the only one that even knows that there might be a bug and then in a year or two I switch jobs and the new company is thinking of embedding a video player in their product (possibly selling millions of units) would I recommend using this one? – Jerry Jeremiah 12 hours ago
    
@JerryJeremiah if the bug prevents a business process from running it's not about reputation, it depends on the business process' importance. And in every case, whatever policy you apply to decide whether to correct bugs, you have to make a subjective evaluation based on your experience and knowledge. Even if you know the exact number of users who will face the bug you still have to make a human choice (an ROI policy can also include bug hit stats to estimate costs). As of today there's no mechanical way to know a priori the perfect thing to do. – JoulinRouge 5 hours ago

tl;dr This is why RESOLVED/WONTFIX is a thing. Just don't overuse it - technical debt can pile up if you're not careful. Is this a fundamental problem with your design, likely to cause other problems in the future? Then fix it. Otherwise? Leave it be until it becomes a priority (if it ever does).

There are actually three errors in the situation you describe:

  1. The lack of a process to evaluate all logged errors (you did log the error in your ticket/backlog/whatever system you have in place, right?) to determine whether it should be fixed or not. This is a management decision.

  2. The lack of skills in your team that leads to the use of faulty solutions like this. It is urgent to address this to avoid future problems. (Start learning from your mistakes.)

  3. The problem that the video may stop displaying after a very long time.

Of the three errors only (3) might not need to be fixed.

One thing I've learned in my years of coding is that a bug will come back. The end user will always discover it and report back. Whether you will fix the bug or not is "merely" a priority and deadline matter.

We've had bugs (major ones, in my opinion) that we decided not to fix in one release, only for them to become show stoppers in the next release because end users stumbled upon them over and over again. The reverse happens too: we were pushed to fix a bug in a feature that nobody uses, because it was handy for management to see.

I once wrote a program in C# to display the image of a license plate whenever you clicked on the plate number. A QA manager was able to make it crash by hooking up two keyboards and pressing the up arrow on one keyboard and the down arrow on another keyboard. I fixed it by putting requests to display license plate images in a queue for a separate thread to fetch. If the queue got large, the fetch thread deleted items from the queue and then started sleeping for 300 milliseconds between image displays. On many occasions, I have had to go back and make my desktop programs resistant to a denial-of-service attack by QA.

For instance, if QA clicks on buttons as fast as they can, I code it so that when one button is clicked, all buttons become disabled until the first action has finished. This is a consequence of starting a background thread to complete a task when the user clicks a button.
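
The same idea can be sketched in JavaScript, to stay with the question's language rather than the C# of the original program. `guardWhileBusy` and `setDisabled` are hypothetical names, and a promise stands in for the background thread; this is an illustration of the pattern, not the code described above.

```javascript
// Wraps an action so that repeated triggers are ignored while one is running.
// setDisabled is a hypothetical callback that greys out the UI's buttons.
function guardWhileBusy(action, setDisabled = () => {}) {
  let busy = false;
  const release = () => {
    busy = false;
    setDisabled(false);
  };
  return function guarded(...args) {
    if (busy) return false; // a previous click is still being processed
    busy = true;
    setDisabled(true);
    let pending;
    try {
      // action may return a plain value or a promise (background work).
      pending = Promise.resolve(action(...args));
    } catch (err) {
      release(); // synchronous failure: re-enable before rethrowing
      throw err;
    }
    // Re-enable whether the work succeeded or failed. Reporting a failure
    // is left to the action itself in this sketch.
    pending.then(release, release);
    return true;
  };
}

// Usage: the second click arrives while the first is still pending.
let fetches = 0;
const showPlate = guardWhileBusy(() => {
  fetches += 1;
  return new Promise(() => {}); // stands in for slow background work
});
showPlate(); // accepted
showPlate(); // ignored, the first request has not finished
console.log(fetches); // 1
```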

My recommendation is that if QA does something crazy to make your program crash, you fix the bug. These things happen because someone is trying to make you look bad or for some other political reason. You just have to smile, work through lunch, and fix it.

9  
You make it sound like QA was being unreasonable but, on the contrary, they are doing exactly what they need to do to ensure that your software is robust enough to survive malicious use. – Lightness Races in Orbit 19 hours ago

There are lots of answers here discussing evaluating the cost of the bug being fixed as opposed to leaving it. They all contain good advice, but I'd like to add that the cost of a bug is often underestimated, possibly hugely underestimated. The reason is that existing bugs muddy the waters for continued development and maintenance. Making your testers keep track of several "won't fix" bugs while navigating your software trying to find new bugs makes their work slower and more prone to error. A few "won't fix" bugs that are unlikely to affect end users will still make continued development slower, and the result will be buggier.

A post-it on a senior developer's desk at my workplace says

Does it help anyone?

I think that's often a good starting point for the thought process. There are always lots of things to fix and improve - but how much value are you actually adding? ...whether that's in usability, reliability, maintainability, readability, performance... or any other aspect.
