Programmers Stack Exchange is a question and answer site for professional programmers interested in conceptual questions about software development.

Should I write unit tests for complex regular expressions in my application?

  • On the one hand: they are easy to test because the input and output formats are often simple and well defined, and they can become so complex that tests targeting them specifically are valuable.
  • On the other hand: they themselves are seldom part of the interface of some unit. It might be better to only test the interface and do that in a way that implicitly tests the regexes.
"they themselves are seldom part of the interface of some unit.": if your classes have interesting code buried deep under the interface, break up your classes. This is an example of how thinking about tests can improve design. – Nathan Cooper 21 hours ago
The same question in a more general manner: which internal components should be unit tested? See programmers.stackexchange.com/questions/16732/… – Doc Brown 20 hours ago
Sorta related: see Regex101. They have a section for writing unit tests for your regex. For example: regex101.com/r/tR3mJ2/2 – David Grinberg 13 hours ago
Disclaimer: this comment is my humble opinion. (1) First of all, I believe that complex regexps are pure evil; see also blog.codinghorror.com/… (2) The real value of testing such expressions comes when you test them over a large database of real data: blog.codinghorror.com/testing-with-the-force (3) I have a strange feeling that these tests are not exactly unit tests. – Boris Treukhov 11 hours ago

Testing dogmatism aside, the real question is whether unit testing complex regular expressions provides value. It seems pretty clear that it does (regardless of whether the regex is part of a public interface) if the regex is complex enough, since it allows you to find and reproduce bugs and to guard against regressions.
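
As a minimal sketch of what such a test can look like in Python (the date pattern, the `is_date` helper and the test names are hypothetical, not taken from the question), every bug you find becomes one more case that pins the behaviour down. The tests are plain `test_*` functions using `assert`, so a runner such as pytest could collect them, assuming you use one.

```python
import re

# Hypothetical example: an ISO-8601-style calendar date (YYYY-MM-DD).
# It checks month 01-12 and day 01-31 but does NOT check month
# lengths, so "2015-02-30" is still accepted.
DATE_RE = re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")


def is_date(text: str) -> bool:
    # fullmatch anchors the pattern at both ends of the string.
    return DATE_RE.fullmatch(text) is not None


def test_accepts_valid_dates():
    for text in ["2015-06-30", "1999-12-31", "2000-01-01"]:
        assert is_date(text), text


def test_rejects_invalid_dates():
    for text in ["2015-13-01", "2015-00-10", "2015-06-32", "15-06-30", "2015-6-30"]:
        assert not is_date(text), text


def test_regression_trailing_newline():
    # Hypothetical regression: an earlier version used re.match with "$",
    # which also accepts a trailing "\n". fullmatch rejects it.
    assert not is_date("2015-06-30\n")
```

The last test shows the "reproduce bugs" part: once a bad input has been found, it stays in the suite, so a later rewrite of the pattern cannot quietly reintroduce it.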

+1 sanity prevails – Tony Ennis 12 hours ago
+1, though if a regular expression is complex enough that this is an issue, then it probably makes sense to move it into a "wrapper" unit with appropriate methods (isValid, parse, tryParse, or whatnot, depending exactly how it's being used), so that the client code doesn't have to know that it's currently implemented using a regex. The wrapper unit would then have detailed tests, which -- again -- wouldn't need to know the current implementation. These tests, of course, are de facto testing the regex, but in an implementation-agnostic way. – ruakh 11 hours ago
A regex is a program, albeit in a specialized and very terse language. As such, testing is appropriate for nontrivial expressions... And certainly the code that invokes the expression should be tested, which may implicitly test the regex. – keshlam 10 hours ago
@ruakh Well said. The benefit to a wrapper class for a regex is that you can neatly replace it with ordinary code if that becomes necessary. Code with complex input/output should always have unit testing, because it is remarkably difficult to debug without. If you need to refer to documentation to understand the code's effects, it should have unit tests. If it's just a quick 1:1 mapping like type conversion, then there's no problem. Regexes get past that point of requiring docs very quickly. – Aaron3468 6 hours ago
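
A minimal Python sketch of that wrapper idea, assuming a made-up "MAJOR.MINOR.PATCH" version format. The names `Version`, `try_parse`, `parse` and `is_valid` are hypothetical stand-ins for the isValid/parse/tryParse methods mentioned in the comments; the regex stays a private implementation detail.

```python
import re
from typing import NamedTuple, Optional


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


# Implementation detail: callers never see this pattern.
# Hypothetical format "MAJOR.MINOR.PATCH" with no leading zeros.
_VERSION_RE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def try_parse(text: str) -> Optional[Version]:
    """Return a Version, or None if text is not a valid version string."""
    m = _VERSION_RE.fullmatch(text)
    if m is None:
        return None
    return Version(*(int(g) for g in m.groups()))


def parse(text: str) -> Version:
    """Like try_parse, but raise ValueError on invalid input."""
    v = try_parse(text)
    if v is None:
        raise ValueError(f"not a version: {text!r}")
    return v


def is_valid(text: str) -> bool:
    return try_parse(text) is not None


# These tests target the wrapper's behaviour, not the regex, so the regex
# could later be replaced by hand-written parsing without touching them.
def test_wrapper():
    assert parse("1.10.0") == Version(1, 10, 0)
    assert try_parse("01.2.3") is None  # leading zero rejected
    assert not is_valid("1.2")
    assert not is_valid("1.2.3.4")
```

The tests still exercise the regex de facto, but if it is ever swapped for ordinary code, as suggested above, they keep working unchanged.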

Regex can be a powerful tool, but it is not a tool you can trust to keep working after even minor changes to a complex regex.

So create lots of tests that document the cases it should cover. And, if it is used for validation, create lots of tests that document the cases it should reject.

Whenever you need to change a regex, add the new cases as tests, modify the regex and hope for the best.
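
Here is a small Python sketch of that approach, using a hypothetical US ZIP code validator (the pattern, the case lists and the `failures` helper are made up for illustration). The two lists double as documentation of what the regex should accept and reject.

```python
import re

# Hypothetical validation regex: US ZIP code, "12345" or "12345-6789".
ZIP_PATTERN = r"\d{5}(-\d{4})?"

# The cases double as documentation of what the regex is meant to do.
SHOULD_MATCH = ["12345", "12345-6789"]
SHOULD_REJECT = ["1234", "123456", "12345-678", "12345-", "12345-6789-0000", "abcde"]


def failures(pattern: str) -> list:
    """Return every documented case the pattern gets wrong."""
    rx = re.compile(pattern)
    wrong = [s for s in SHOULD_MATCH if not rx.fullmatch(s)]
    wrong += [s for s in SHOULD_REJECT if rx.fullmatch(s)]
    return wrong


def test_zip_pattern():
    assert failures(ZIP_PATTERN) == []
```

This also shows why a "minor" change is not safe: changing `?` to `*` in `ZIP_PATTERN` still matches every valid ZIP code, but `failures(r"\d{5}(-\d{4})*")` returns `["12345-6789-0000"]`, because the edited pattern now accepts repeated suffixes.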

If I were in an organization that in general didn't use unit tests, I would still write a test program that would test any regex we'd use. I would even do it on my own time if I had to; my hair does not need to lose any more colour.
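
Without any test framework, such a test program can be a single script. The following is one possible Python sketch; the `REGISTRY` contents, the pattern names and the exit-code convention are assumptions for illustration, not something from this answer.

```python
import re

# Hypothetical registry: every regex the application uses, with the
# inputs each one must accept and the inputs it must reject.
REGISTRY = {
    "hex_colour": (
        r"#[0-9a-fA-F]{6}",
        ["#00ff00", "#ABCDEF"],
        ["00ff00", "#00ff0", "#00gg00"],
    ),
    "identifier": (
        r"[A-Za-z_]\w*",
        ["x", "_tmp", "row2"],
        ["2row", "", "a-b"],
    ),
}


def check_all(registry: dict) -> list:
    """Return a human-readable message for every case that behaves wrongly."""
    problems = []
    for name, (pattern, accept, reject) in registry.items():
        rx = re.compile(pattern)
        problems += [f"{name}: should match {s!r}" for s in accept if not rx.fullmatch(s)]
        problems += [f"{name}: should reject {s!r}" for s in reject if rx.fullmatch(s)]
    return problems


def main() -> int:
    problems = check_all(REGISTRY)
    for p in problems:
        print(p)
    print(f"{len(problems)} problem(s) found")
    return 1 if problems else 0


if __name__ == "__main__":
    status = main()
    # Exit non-zero only when something is wrong.
    if status:
        raise SystemExit(status)
```

Run as a plain script (for example from a build step), it prints each problem and exits with status 1 only if some documented case fails.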


In short, you should test your application, period. Whether you test your regex with automated tests that run it in isolation, as part of a bigger black box or if you just fiddle around with it by hand is secondary to the point that you need to make sure it works.

The main advantage of unit tests is that they save time. They let you test the thing as many times as you like, now or at any point in the future. If there's any reason at all to believe that your regex will at some point be refactored, tweaked, or given more constraints, then yeah, you probably want some regression tests for it. Otherwise, when you do change it, you'll have to spend an hour thinking through all the edge cases to make sure you didn't break it. That, or you learn to live with being scared of your code and simply never change it.

A rule of thumb I've come to realize: if I needed docs to write and inspect the code, then I will need a unit test. They've saved me many headaches, catching null pointers, None types, and incorrect output. They also give the end user the ability to repair your code to spec with minimal effort when it inevitably breaks. – Aaron3468 6 hours ago

"On the other hand: they themselves are seldom part of the interface of some unit. It might be better to only test the interface and do that in a way that implicitly tests the regexes."

I think with this you answered it yourself. Regexes in a unit are most likely an implementation detail.

What goes for testing your SQL probably also goes for regexes. When you change a piece of SQL, you probably run it through some SQL client by hand to see if it yields what you expect. Likewise, when I change a regex, I use a regex tool with some sample input to see if it does what I expect.

What I find useful is a comment near the regex with a sample of text which it should match.
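
One way to keep that sample text from silently going stale is to put it in a doctest, so the comment becomes an executable check. This is a Python sketch; the log-line format, `LOG_RE` and `parse_log_line` are made up for illustration. `re.VERBOSE` lets each piece of the pattern carry its own comment, and `python -m doctest yourfile.py` runs the examples in the docstring.

```python
import re

# Hypothetical log-line pattern. re.VERBOSE ignores whitespace in the
# pattern and lets each part carry a comment.
LOG_RE = re.compile(
    r"""
    (?P<level>INFO|WARN|ERROR)   # severity
    \s+
    \[(?P<module>[\w.]+)\]       # module name in square brackets
    \s+
    (?P<message>.*)              # rest of the line
    """,
    re.VERBOSE,
)


def parse_log_line(line):
    """Split a log line into (level, module, message), or return None.

    The sample text doubles as an executable check:

    >>> parse_log_line("WARN [db.pool] connection slow")
    ('WARN', 'db.pool', 'connection slow')
    >>> parse_log_line("DEBUG [db.pool] ignored") is None
    True
    """
    m = LOG_RE.fullmatch(line)
    return m.groups() if m else None
```

This sits between the two positions in this thread: the sample stays right next to the regex, as suggested here, but it is also checked automatically instead of by hand.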

"When you change a piece of SQL, you probably run it through some SQL client by hand to see if it yields what you expect." But this kind of answers the question the other way... If I need to test the regexes by hand, or think it's useful to, then I should write a unit test for that instead. This is exactly what makes it a tricky thing to decide! – Lii 21 hours ago
It really depends. What you want your unit tests for is the ability to make changes. How often do you change a specific regex? If the answer is often then by all means create a test for it. – Christiaan 21 hours ago
If the regex is part of a bigger whole and it is difficult to test you can always extract the regex into its own module/function and write tests for that module/function/unit. – Christiaan 21 hours ago
All other things being equal, it's better to have an automated test than a "test by hand." – Robert Harvey 19 hours ago
Why would you not test a regex using automation? – Tony Ennis 12 hours ago
