Programmers Stack Exchange is a question and answer site for professional programmers interested in conceptual questions about software development.


I have recently encountered a class which provides pretty much every single character as a constant, everything from COMMA to BRACKET_OPEN. Wondering whether this was necessary, I read an "article" which suggests that it may be helpful to pull single-character literals into constants. So I'm skeptical...

The main appeal of using constants is they minimize maintenance when a change is needed. But when are we going to start using a different symbol than ',' to represent a comma?

The only reason I see for using constants instead of literals is to make the code more readable. But is city + CharacterClass.COMMA + state (for example) really more readable than city + ',' + state?

For me the cons outweigh the pros, mainly that you introduce another class and another import. And I believe in less code where possible. So I'm wondering what the general consensus is here.

Very related: programmers.stackexchange.com/questions/221034/… – Blrfl Jul 6 at 17:48
Hmm... it might be useful for different locales, maybe? For example, some languages use guillemets (angle quotes, « and ») as quotation marks instead of English's standard " (or nicer-looking “ and ”). Apart from that, it just sounds like a set of magic characters. Assuming two instances of CharacterClass called englishChars and frenchChars, it's possible that englishChars.LEFT_QUOTE might be “, while frenchChars.LEFT_QUOTE might be «. – Justin Time Jul 6 at 19:28
There are lots of different variants on commas: en.wikipedia.org/wiki/Comma#Comma_variants - perhaps this is not such a dumb idea, especially if your source code can be encoded as utf-8. – Aaron Hall Jul 6 at 19:57
In your case, it's like calling a variable "number". Your constant should've been called DELIMITER. Or it should be CITY_STATE = "{0}, {1}" – the_lotus Jul 7 at 11:23
That article you linked is very terrible. Constants should never be thrown into a bucket like that. Put them on the classes where they have context: in essence, the class with the constant provides the context in which the constant is used. For example, Java's File.separator. The class tells you the type of separator. Having a class named Consts or Constants provides no context and makes constants harder to use correctly. – Snowman 2 days ago

11 Answers

Accepted answer (115 votes)

Tautology:

It is very clear, if you read the very first sentence of the question, that this question is not about appropriate uses like eliminating magic numbers; it is about terrible, mindless, foolish consistency at best. That is what this answer addresses.

Common sense tells you that const char UPPER_CASE_A = 'A'; or const char A = 'A' does not add anything but maintenance and complexity to your system. const char STATUS_CODE.ARRIVED = 'A' is a different case.
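A minimal Java sketch of the difference, assuming a hypothetical legacy record that stores a shipment status as a single character (the `ShipmentStatus` enum and its codes are invented for illustration). The name `ARRIVED` says something about the domain; a constant named `UPPER_CASE_A` would only restate the literal.

```java
// Hypothetical domain: a one-character status code stored in a legacy record.
enum ShipmentStatus {
    ARRIVED('A'),
    DEPARTED('D');

    private final char code;

    ShipmentStatus(char code) {
        this.code = code;
    }

    char code() {
        return code;
    }

    // Map a stored code back to its meaning; callers work with the name, not the letter.
    static ShipmentStatus fromCode(char code) {
        for (ShipmentStatus status : values()) {
            if (status.code == code) {
                return status;
            }
        }
        throw new IllegalArgumentException("Unknown status code: " + code);
    }
}

class StatusDemo {
    public static void main(String[] args) {
        System.out.println(ShipmentStatus.fromCode('A')); // prints ARRIVED
    }
}
```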

Constants are supposed to represent things that are immutable at runtime, but may need to be modified in the future at compile time. When would const char A = 'A' correctly equal anything other than A?

If you see public static final char COLON = ':' in Java code, find whoever wrote that and break their keyboards. If the representation for COLON ever changes from : you will have a maintenance nightmare.

Obfuscation:

What happens when someone changes it to COLON = '-' because where they are using it needs a - instead everywhere? Are you going to write unit tests that basically say assertThat(':' == COLON) for every single const reference to make sure they do not get changed? Only to have someone fix the test when they change them?

If someone actually argues that public static final String EMPTY_STRING = ""; is useful and beneficial, you have just qualified their knowledge and can safely ignore them on everything else.

Having every printable character available with a named version just demonstrates that whoever did it is not qualified to be writing code unsupervised.

Cohesion:

It also artificially lowers cohesion, because it moves things away from the things that use them and are related to them.

In computer programming, cohesion refers to the degree to which the elements of a module belong together. Thus, cohesion measures the strength of relationship between pieces of functionality within a given module. For example, in highly cohesive systems functionality is strongly related.

Coupling:

It also couples lots of unrelated classes together because they all end up referencing files that are not really related to what they do.

Tight coupling is when a group of classes are highly dependent on one another. This scenario arises when a class assumes too many responsibilities, or when one concern is spread over many classes rather than having its own class.

If you used a better name like DELIMITER = ',' you would still have the same problem, because the name is generic and carries no semantics. Reassigning the value does no more to help an impact analysis than searching and replacing the literal ','. What if some code uses it and needs the , while some other code uses it but now needs ;? You still have to look at every use manually and change them.

In the Wild:

I recently refactored a 1,000,000+ LOC application that was 18 years old. It had things like public static final String COMMA = SPACE + "," + SPACE;. That is in no way better than just inlining " , " where it is needed.

If you want to argue readability, you need to learn how to configure your IDE to display whitespace characters so you can see them; readability is just an extremely lazy reason to introduce entropy into a system.

It also had , defined multiple times, with multiple misspellings of the word COMMA, in multiple packages and classes, with references to all the variations intermixed in the code. It was nothing short of a nightmare to try to fix something without breaking something completely unrelated.

The same went for the alphabet: there were multiple UPPER_CASE_A, A, UPPER_A and A_UPPER constants that were equal to A most of the time, but in some cases were not. This was true for almost every character, but not all characters.

In no sane reality can you argue that this practice does anything but start out at maximum entropy.

I refactored all this mess out and inlined all the tautologies, and the new college hires were much more productive because they did not have to hunt through multiple levels of indirection to find out what these const references actually pointed to; their names were not a reliable guide to what they contained.

Maybe you should add a counterexample: const char DELIMITER = ':' would be actually useful. – Bergi Jul 6 at 19:03
I would make several arguments that EMPTY_STRING is beneficial. (1) I can much more easily find all uses of EMPTY_STRING in a file than I can find all uses of "". (2) when I see EMPTY_STRING I know for darn sure that the developer intended that string to be empty, and that it is not a mis-edit or a placeholder for a string to be supplied later. Now, you claim that by me making this argument that you may qualify my knowledge, and safely ignore me forever. So, how do you qualify my knowledge? And are you planning on ignoring my advice forever? I have no issue either way. – Eric Lippert Jul 6 at 22:32
@immibis: We can stop thinking about these things as useful in the context of managing change. They're constants. They don't change. Think of them as useful in the context of humans searching and comprehending the semantics of code. Knowing that something is a key-value-pair-delimiter is much more useful than knowing it is a colon; that is a fact about the semantic domain of the program's concern, not its syntax. – Eric Lippert Jul 6 at 23:14
@EricLippert: I'm kinda seeing the point of others here who point out that the only guarantee that a const provides is that it won't change at runtime (after compilation), though I do agree with you that the semantic meaning of the const is far more important than its use as a change management tool. That said, I can certainly imagine a const EARLIEST_OS_SUPPORTED which is not only semantically consistent, but will also change over time as the program evolves and old cruft is removed. – Robert Harvey Jul 6 at 23:23
@DanielJour: So this then is a third argument for EMPTY_STRING; that a well-designed IDE will surface tools that allow me to treat this entity symbolically, rather than syntactically. Generalize this to a fourth argument: that the library of code analysis tools that sits below the IDE may allow for advanced programmatic analysis of code correctness at the symbolic level. A developer who wishes to take advantage of tools more advanced than those written literally 40 years ago need only make small changes to their habits in order to reap the rewards of advanced tooling. – Eric Lippert 2 days ago

The main appeal of using constants is they minimize maintenance when a change is needed.

ABSOLUTELY NOT. This is not at all the reason to use constants because constants do not change by definition. If a constant ever changes then it was not a constant, was it?

The appeal of using constants has nothing whatsoever to do with change management and everything to do with making programs amenable to being written, understood and maintained by people. If I want to know everywhere in my program where a colon is used as a URL separator, then I can know that very easily if I have the discipline to define a constant URLSeparator, and cannot know that easily at all if I have to grep for : and get every single place in the code where : is used to indicate a base class, or a ?: operator, or whatever.

I thoroughly disagree with the other answers which state that this is a pointless waste of time. Named constants add meaning to a program, and those semantics can be used by both humans and machines to understand a program more deeply and maintain it more effectively.

The trick here is not to eschew constants, but rather to name them with their semantic properties rather than their syntactical properties. What is the constant being used for? Don't call it Comma unless the business domain of your program is typography, English language parsing, or the like. Call it ListSeparator or some such thing, to make the semantics of the thing clear.
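As a sketch of that discipline in Java, assuming a program that handles both URL schemes and `key:value` settings (both constant names and the helper methods are hypothetical): the two constants happen to hold the same character, but a search for either name finds only the uses that share its meaning, and either could later be changed without touching the other.

```java
class Separators {
    // Same character, two unrelated meanings.
    static final char URL_SCHEME_SEPARATOR = ':';
    static final char KEY_VALUE_SEPARATOR = ':';

    // "https://example.com" -> "https"; returns "" when there is no scheme.
    static String scheme(String url) {
        int i = url.indexOf(URL_SCHEME_SEPARATOR);
        return i < 0 ? "" : url.substring(0, i);
    }

    // "timeout:30" -> ["timeout", "30"]; splits on the first separator only.
    static String[] keyValue(String setting) {
        int i = setting.indexOf(KEY_VALUE_SEPARATOR);
        if (i < 0) {
            throw new IllegalArgumentException("Missing key-value separator: " + setting);
        }
        return new String[] {setting.substring(0, i), setting.substring(i + 1)};
    }

    public static void main(String[] args) {
        System.out.println(scheme("https://example.com")); // prints https
        System.out.println(keyValue("timeout:30")[1]);     // prints 30
    }
}
```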

While I agree with the spirit of what you're saying here, your second/third sentences aren't really correct. A constant can change between versions of a file. In fact, most programs I write have a constant named something like MY_VER, which contains the current version number of the program, which can then be used throughout the remainder of the program rather than a magic string like "5.03.427.0038". The added benefit is as you say that it's provided semantic information. – Monty Harder Jul 6 at 19:50
To be fair, the point of a constant is that it doesn't change during runtime after being initialised, not that it doesn't change between compilations. From a compiler's perspective, the point is that the compiler can make assumptions that the program is unable to modify it; whether the programmer is allowed to modify it when they recompile doesn't change its constant-ness. There can also be cases where software takes a read-only value from hardware, maybe by dereferencing a const volatile T* pointer to a predetermined address; while the program can't change it, the hardware can. – Justin Time Jul 6 at 20:01
@MontyHarder: Good point. My opinion is informed by the fact that I typically use languages that distinguish between constants -- which must be forever unchanging -- and variables which may be assigned once -- which can change from version to version, run to run, or whatever. A constant and a variable are different things; one stays the same and one varies over time. – Eric Lippert Jul 6 at 20:04
@SteveCox: I agree; the way C/C++ characterize "const" is weird and of limited use. The property I want of constants is that their values do not change, not that I am restricted from changing them in some functions but not in others. – Eric Lippert Jul 6 at 22:27
"This is not at all the reason to use constants because constants do not change by definition. If a constant ever changes then it was not a constant, was it?" Changing constants at compile time (not runtime obviously) is perfectly normal. That's why you made them a clearly labeled "thing" in the first place. Of course, the constants of the OP are junk, but think of something like const VERSION='3.1.2' or const KEYSIZE=1024 or whatever. – AnoE Jul 7 at 9:14

No, that is dumb.

What is not necessarily dumb is pulling things like that into named labels for localization reasons. For example, the thousands delimiter is a comma in America (1,000,000), but not a comma in other locales. Pulling that into a named label (with an appropriate, non-comma name) allows the programmer to ignore/abstract those details.
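In Java, for instance, such a label already exists in the standard library: `java.text.DecimalFormatSymbols` reports the grouping separator for a locale, and `NumberFormat` applies it. A short sketch (the class and method names are my own):

```java
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;

class GroupingDemo {
    // The "thousands" separator is looked up per locale instead of hard-coded as COMMA.
    static char groupingSeparator(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getGroupingSeparator();
    }

    // Let the locale-aware formatter decide where and how digits are grouped.
    static String formatGrouped(long value, Locale locale) {
        return NumberFormat.getIntegerInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(formatGrouped(1_000_000L, Locale.US));      // prints 1,000,000
        System.out.println(formatGrouped(1_000_000L, Locale.GERMANY)); // prints 1.000.000
    }
}
```

Delegating to the formatter also leaves grouping rules that are not "a separator every three digits" to the runtime's locale data rather than to your own constants.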

But making a constant because "magic strings are bad" is just cargo culting.

Localization is usually more complicated than just string constants. For example, some languages want list delimiter between all list items, while others exclude the delimiter before the last item. So, usually one needs not localized constants, but localized rules. – Vlad Jul 7 at 8:51
Actually the thousands delimiter is not necessarily a thousands delimiter in other locales (China/Japan). It's not even set after a constant number of digits (India). Oh, and there may be different delimiters depending on if it's a 1000 delimiter or the 1000000 delimiter (Mexico). But that's less of a problem than not using ASCII digits 0-9 in some locales (Farsi). ux.stackexchange.com/questions/23667/… – Peter Jul 7 at 9:43
@Vlad Localization is much more complex than that, however, the thousands separator is a well-known example that people recognize. – Snowman 2 days ago
    
It depends on the localization strategy ... do you change all the constants in your program to translate it? Or should you rather read the values from a file (or other data store), making them effectively runtime variables? – Paŭlo Ebermann 18 hours ago

There are a few characters that can be ambiguous or are used for several different purposes. For example, we use '-' as a hyphen, a minus sign, or even a dash. You could make separate names as:

static const wchar_t HYPHEN = '-';
static const wchar_t MINUS = '-';
static const wchar_t EM_DASH = '-';

Later, you could choose to modify your code to disambiguate by redefining them as:

static const wchar_t HYPHEN = L'-';
static const wchar_t MINUS = L'\u2212';
static const wchar_t EM_DASH = L'\u2014';

That might be a reason why you'd consider defining constants for certain single characters. However, the number of characters that are ambiguous in this manner is small. At most, it seems you'd do it only for those. I'd also argue that you could wait until you actually have a need to distinguish the ambiguous characters before you factor the code in this manner.

As typographical conventions can vary by language and region, you're probably better off loading such ambiguous punctuation from a translation table.
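The same idea in Java, as a sketch (the constant names follow the answer; `formatNegative` is a hypothetical helper). A Java `char` can hold these code points directly, so code written against `MINUS` picks up the typographic minus sign without being edited:

```java
class Dashes {
    static final char HYPHEN = '-';       // U+002D HYPHEN-MINUS
    static final char MINUS = '\u2212';   // U+2212 MINUS SIGN
    static final char EM_DASH = '\u2014'; // U+2014 EM DASH

    // Hypothetical display helper: prefix a magnitude with a true minus sign.
    static String formatNegative(int magnitude) {
        return MINUS + Integer.toString(magnitude);
    }

    public static void main(String[] args) {
        System.out.println(formatNegative(5));
        System.out.println("well" + HYPHEN + "known");
    }
}
```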

For me this is the only valid reason one might create character constants – Florian Peschka Jul 7 at 8:03
    
Using - as an em dash is quite misleading ... it is much too short for that in most fonts. (It is even shorter than an en dash.) – Paŭlo Ebermann 18 hours ago

The idea that a constant COMMA is better than ',' or "," is rather easy to debunk. Sure, there are cases where it makes sense; for example, final String QUOTE = "\""; helps readability considerably by avoiding all the backslashes. But barring language control characters like \ ' and " I haven't found them to be very useful.

Using final String COMMA = "," is not only bad form, it's dangerous! When someone wants to change the separator from "," to ";" they might go change the constants file to COMMA = ";" because it's faster for them to do so and it just works. Except, you know, all the other things that used COMMA now also are semicolons, including things sent to external consumers. So it passes all your tests (because all the marshalling and unmarshalling code was also using COMMA) but external tests will fail.

What is useful is to give them useful names. And yes, sometimes multiple constants will have the same contents but different names. For example final String LIST_SEPARATOR = ",".

So your question is "are single char constants better than literals?" and the answer is unequivocally no, they aren't. But even better than both of those is a narrowly scoped variable name that explicitly says what its purpose is. Sure, you'll spend a few extra bytes on those extra references (assuming they don't get compiled out, which they probably will), but in long-term maintenance, which is where most of the cost of an application is, they are worth the time to make.

How about conditionally defining DISP_APOSTROPHE as either an ASCII 0x27 or a Unicode single right quote character (which is a more typographically-appropriate rendition of an apostrophe), depending upon the target platform? – supercat Jul 6 at 21:21
actually QUOTE example proves it is a bad idea as well since you are assigning it to what is generally/popularly known as the DOUBLE QUOTE and QUOTE implies SINGLE_QUOTE which is more correctly referred to as APOSTROPHE. – Jarrod Roberson Jul 6 at 21:38
@JarrodRoberson I don't feel quote implies single quote, personally - but that's another good reason to remove ambiguity where you can! – corsiKa Jul 6 at 21:59
I don't like the QUOTE example for an additional reason - it makes reading strings constructed with it even harder "Hello, my name is " + QUOTE + "My Name" + QUOTE this is a trivial example and yet it still looks bad. Oh, sure, instead of concatenation you can use replace tokens, too "Hello, my name is %sMy Name%s".format(QUOTE, QUOTE) may just be worse. But, hey, let's try indexed tokens "Hello, my name is {0}My Name{0}".format(QUOTE) ugh, not that much better. Any non-trivial string generated with quotes in it would be even worse. – Vld yesterday
@corsiKa - I'll live with the escaped actual quotes. If I miss escaping one, the IDE I use would immediately complain. Code most likely won't compile, either. It's fairly easy to spot. How easy it is to make a mistake when doing "My name is" + QUOTE + "My Name" + QUOTE I actually made that same mistake three times writing the above comment. Can you spot it? If it takes you a bit, it's the missing space after is. Do you format the string? In which case, a string with multiple tokens to replace is going to be even worse to work out. How am I to use it so that it's more readable? – Vld yesterday

Some answers seem like they want to contradict each other, like the one by Eric, and the one by Jarrod. But they are in perfect agreement.

A constant must add meaning.

Defining COMMA to be a comma doesn't add meaning, because we know that a comma is a comma. Instead we destroy meaning, because now COMMA might actually not be a comma anymore.

If you use a comma for <purpose> you can declare <PURPOSE> to be a comma.

So while city + CharacterClass.COMMA + state is horribly wrong, city + CITY_STATE_DELIMITER + state is acceptable. How acceptable it is depends on many factors that are currently unknown.

Use functions for formatting

I'd personally prefer Format<PurposeOfFormat>(city, state) and wouldn't care about how the body of that function looks as long as it's short and passes the test cases. In other words, if you have to worry about using constants to implement formatting rules, you may consider using functions for the formatting rules instead; that way the question of using constants inside these functions becomes so irrelevant that nobody has to care.
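A minimal sketch of that approach in Java (the method name `formatCityState` and the exact output format are assumptions): callers ask for the purpose, and whichever punctuation the rule uses stays inside one short, testable function.

```java
class AddressFormat {
    // Hypothetical formatting rule for a "City, ST" display line.
    static String formatCityState(String city, String state) {
        return city + ", " + state;
    }

    public static void main(String[] args) {
        System.out.println(formatCityState("Springfield", "IL")); // prints Springfield, IL
    }
}
```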

Ah, but a comma is not always the same comma. I could define COMMA = '\u055D' or '\u060C' etc. (see Unicode) or even turn it into a variable later and read it from a config file. That way, it will still have the same meaning, but just a different value. How about that. – Mr Lister 22 hours ago

In addition to all the fine answers here, I'd like to add as food for thought, that good programming is about providing appropriate abstractions that can be built upon by yourself and maybe others, without having to repeat the same code over and over.

Good abstractions make the code easy to use on the one hand, and easy to maintain on the other hand.

I totally agree that DELIMITER=':' in and of itself is a poor abstraction, and only just better than COLON=':' (since the latter is totally impoverished).

A good abstraction involving strings and separators would, first and foremost, include a way to pack one or more individual content items into the string and to unpack them from the packed string, before telling you what the delimiter is. Such an abstraction would be bundled as a concept, in most languages as a class, so that its use would be practically self-documenting: you can search for all places where this class is used and be confident of the programmer's intention regarding the format of the packed strings in each case.

Once such an abstraction is provided, it would be easy to use without ever having to consult what the value of the DELIMITER or COLON is, and, changing the implementation details would generally be limited to the implementation. So, in short, these constants should really be implementation details hidden within an appropriate abstraction.
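To illustrate, a minimal Java sketch of such an abstraction (the `PackedList` class is hypothetical and, for brevity, does no escaping, so it simply rejects items that contain the delimiter). The delimiter is a private detail; callers only ever pack and unpack:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class PackedList {
    // Implementation detail: no caller needs to know which character this is.
    private static final String DELIMITER = ":";

    private PackedList() {}

    static String pack(List<String> items) {
        for (String item : items) {
            if (item.contains(DELIMITER)) {
                // Without escaping, such an item could not be unpacked correctly.
                throw new IllegalArgumentException("Item contains the delimiter: " + item);
            }
        }
        return String.join(DELIMITER, items);
    }

    // Limit -1 keeps trailing empty items, so "a::" unpacks to ["a", "", ""].
    // Note: an empty list and a list holding one empty string both pack to "".
    static List<String> unpack(String packed) {
        return new ArrayList<>(Arrays.asList(packed.split(Pattern.quote(DELIMITER), -1)));
    }

    public static void main(String[] args) {
        String packed = pack(Arrays.asList("alpha", "beta"));
        System.out.println(packed);         // prints alpha:beta
        System.out.println(unpack(packed)); // prints [alpha, beta]
    }
}
```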

The main appeal of using constants is they minimize maintenance when a change is needed.

Good abstractions, which are typically compositions of several related capabilities, are better at minimizing maintenance. First, they clearly separate the provider from the consumers. Second, they hide the implementation details and instead provide directly useful functionality. Third, they document at a high level when and where they are being used.


I've done some work writing lexers and parsers and used integer constants to represent terminals. Single-character terminals happened to have the ASCII code as their numeric value for simplicity's sake, but the code could have been something else entirely. So, I'd have a T_COMMA that was assigned the ASCII-code for ',' as its constant value. However, there were also constants for nonterminals which were assigned integers above the ASCII set. From looking at parser generators such as yacc or bison, or parsers written using these tools, I got the impression that's basically how everybody did it.
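A rough Java sketch of that convention (the token names, the value 256 for identifiers, and the tiny lexer are illustrative assumptions, not taken from yacc or bison). Single-character terminals reuse their character code; multi-character tokens get values above the ASCII range:

```java
import java.util.ArrayList;
import java.util.List;

class Tokens {
    // Single-character terminals: the token value is the character code.
    static final int T_COMMA = ',';
    static final int T_LPAREN = '(';
    static final int T_RPAREN = ')';
    // Multi-character terminal: numbered above the ASCII range.
    static final int T_IDENT = 256;

    // Tiny lexer for inputs like "f(a, b)"; an identifier is a run of letters.
    static List<Integer> lex(String input) {
        List<Integer> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isLetter(c)) {
                while (i < input.length() && Character.isLetter(input.charAt(i))) {
                    i++;
                }
                tokens.add(T_IDENT);
            } else if (c == T_COMMA || c == T_LPAREN || c == T_RPAREN) {
                tokens.add((int) c); // the character code doubles as the token value
                i++;
            } else if (Character.isWhitespace(c)) {
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("f(a, b)")); // prints [256, 40, 256, 44, 256, 41]
    }
}
```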

So, while, like everybody else, I think it's pointless to define constants for the express purpose of using the constants instead of the literals throughout your code, I do think there are edge cases (parsers, say) where you might encounter code riddled with constants such as you describe. Note that in the parser case, the constants aren't just there to represent character literals; they represent entities that might just happen to be character literals.

I can think of a few more isolated cases where it might make sense to use constants instead of the corresponding literals. For example, you might define NEWLINE to be "\n" on a Unix box, but "\r\n" on a Windows box or "\r" on a classic Mac OS box. The same goes for parsing files which represent tabular data; you might define FIELDSEPARATOR and RECORDSEPARATOR constants. In these cases, you're actually defining a constant to represent a character that serves a certain function. Still, if you were a novice programmer, maybe you'd name your field separator constant COMMA, not realizing you should have called it FIELDSEPARATOR, and by the time you realized, the code would be in production and you'd be on the next project, so the wrongly named constant would stay in the code for someone to later find and shake his head at.
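A minimal Java sketch of role-named separators for a simple tabular format (assumed here to have no quoting or escaping, so it is not a full CSV parser, and input with "\r\n" line endings would leave a trailing carriage return on each record; the class and constant names are hypothetical). For platform-dependent line endings in output, Java already provides `System.lineSeparator()`, so there is no need to hand-roll a `NEWLINE` constant:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TableParser {
    // Named for their role in the format, not for the characters they contain.
    static final String FIELD_SEPARATOR = ",";
    static final String RECORD_SEPARATOR = "\n";

    // Splits text into records, then each record into fields.
    // Empty records (e.g. after a trailing separator) are skipped; empty fields are kept.
    static List<String[]> parse(String text) {
        List<String[]> rows = new ArrayList<>();
        for (String record : text.split(Pattern.quote(RECORD_SEPARATOR))) {
            if (!record.isEmpty()) {
                rows.add(record.split(Pattern.quote(FIELD_SEPARATOR), -1));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = parse("Springfield,IL\nPortland,OR\n");
        // System.lineSeparator() gives the platform's line ending for output.
        System.out.print(rows.get(1)[0] + System.lineSeparator()); // prints Portland
    }
}
```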

Finally, the practice you describe might make sense in a few cases where you write code to handle data encoded in a specific character encoding, say ISO-8859-1, but expect the encoding to change later on. Of course, in such a case it would make much more sense to use localization or encoding and decoding libraries to handle it. But if for some reason you couldn't use such a library, constants that you'd only have to redefine in a single file, instead of hard-coded literals littered all over your source code, might be a way to go.

As to the article you linked to: I don't think it tries to make a case for replacing character literals with constants. I think it's trying to illustrate a method to use interfaces to pull constants into other parts of your code base. The example constants used to illustrate this are chosen very badly, but I don't think they matter in any way.

"I think it's trying to illustrate a method to use interfaces to pull constants into other parts of your code base." That is an even worse anti-pattern, with tight coupling and low cohesion as well; there is no valid reason to do that either. – Jarrod Roberson Jul 6 at 21:26

The one time I have seen such constants used effectively is to match an existing API or document. I've seen symbols such as COMMA used because a particular piece of software was directly connected to a parser which used COMMA as a tag in an abstract syntax tree. I've also seen it used to match a formal specification. In formal specifications, you'll sometimes see symbols like COMMA rather than ',' because they want to be as utterly clear as possible.

In both cases, the use of a named symbol like COMMA helps provide cohesiveness to an otherwise disjoint product. That value can often outweigh the cost of overly verbose notations.


Observe that you are trying to make a list.

So, refactor it as: String makeList(String[] items)

In other words, factor out the logic instead of the data.
Languages might be different in how they represent lists, but commas are always commas (that's a tautology). So if the language changes, changing the comma character won't help you -- but this will.
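For instance, a Java sketch (the English-style "a, b and c" rule is an assumption chosen for illustration): the separator and the conjunction live inside one method, so supporting another language means replacing the rule, not a COMMA constant.

```java
import java.util.Arrays;

class Lists {
    // Hypothetical English-style rule: "a", "a and b", "a, b and c".
    static String makeList(String[] items) {
        if (items.length == 0) {
            return "";
        }
        if (items.length == 1) {
            return items[0];
        }
        // Join all but the last item with ", ", then attach the last with " and ".
        String head = String.join(", ", Arrays.copyOf(items, items.length - 1));
        return head + " and " + items[items.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(makeList(new String[] {"red", "green", "blue"})); // prints red, green and blue
    }
}
```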


Maybe.

Single character constants are relatively hard to distinguish. So it can be rather easy to miss the fact that you're adding a period rather than a comma:

city + '.' + state

whereas that's a relatively hard mistake to make with

city + Const.PERIOD + state

Depending on your internationalization and globalization environment, the difference between an ASCII apostrophe and the Windows-1252 open and close apostrophe (or the ASCII double quote and the Windows-1252 open and close double quote) may be significant and is notoriously difficult to visualize looking at code.
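One cheap way to make such look-alikes visible, sketched in Java (the `reveal` helper is hypothetical): escape every character outside printable ASCII. Once decoded to Unicode, the Windows-1252 closing apostrophe is U+2019 RIGHT SINGLE QUOTATION MARK, so it shows up plainly next to an ASCII apostrophe.

```java
class Reveal {
    // Keep printable ASCII as-is; replace anything else with a backslash-u escape
    // (four uppercase hex digits) so look-alike characters stand out.
    static String reveal(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7F) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(reveal("it's"));      // prints it's
        System.out.println(reveal("it\u2019s")); // prints the escaped form of the curly apostrophe
    }
}
```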

Now, presumably, if mistakenly putting a period rather than a comma was a significant functional issue, you would have an automated test that would find the typo. If your software is generating CSV files, I would expect that your test suite would discover pretty quickly that you had a period between the city and the state. If your software is supposed to run for clients with a variety of internationalization configurations, presumably your test suite will run in each environment and will pick up if you have a Microsoft open quote where you meant to have an apostrophe.

I could imagine a project where it made more sense to opt for more verbose code that could head off these issues, particularly when you've got older code that doesn't have a comprehensive test suite, even though I probably wouldn't code this way in a greenfield development project. And adding a constant for every punctuation character, rather than just those that are potentially problematic in your particular application, is probably gross overkill.

What happens when some moron changes Const.PERIOD to be equal to ~? There is no justification for a tautology of named characters; it just adds maintenance and complexity that is unneeded in modern-day programming environments. Are you going to write a suite of unit tests that basically say assert(Const.PERIOD == '.')? – Jarrod Roberson Jul 6 at 17:54
@JarrodRoberson - That would suck, sure. But you'd be in just as much trouble if someone added a Unicode constant that looks almost exactly like a comma rather than an actual comma. Like I said, this isn't the sort of thing that I'd do in a greenfield development project. But if you have a legacy code base with a spotty test suite where you've tripped over the comma/ period or apostrophe/ Microsoft abomination apostrophes issues a couple times, creating some constants and telling people to use them may be a reasonable way to make the code better without spending a year writing tests. – Justin Cave Jul 6 at 17:58
Your legacy example is a poor one; I just finished refactoring a 1,000,000+ LOC code base that is 18 years old. It had every printable character defined like this multiple times, even with different conflicting names. And many times things named COMMA were actually set = SPACE + "," + SPACE. Yes, some idiot had a SPACE constant. I refactored them ALL out and the code was orders of magnitude more readable, and college hires were much more able to track things down and fix them without having 6 levels of indirection to find out what something was actually set to. – Jarrod Roberson Jul 6 at 18:02
