How are globals any different from a database?

Question

I just ran across this old question asking what's so evil about global state, and the top-voted, accepted answer asserts that you can't trust any code that works with global variables, because some other code somewhere else might come along and modify its value and then you don't know what the behavior of your code will be because the data is different! But when I look at that, I can't help but think that that's a really weak explanation, because how is that any different from working with data stored in a database?

When your program is working with data from a database, you don't care if other code in your system is changing it, or even if an entirely different program is changing it, for that matter. You don't care what the data is; that's the entire point. All that matters is that your code deals correctly with the data that it encounters. (Obviously I'm glossing over the often-thorny issue of caching here, but let's ignore that for the moment.)

But if the data you're working with is coming from an external source that your code has no control over, such as a database (or user input, or a network socket, or a file, etc...) and there's nothing wrong with that, then how is global data within the code itself--which your program has a much greater degree of control over--somehow a bad thing when it's obviously far less bad than perfectly normal stuff that no one sees as a problem?

It's nice to see veteran members challenge the dogmas a little ... — svidgen, yesterday
In an application, you usually provide a means to access the database, and this means is passed to the functions that want to access the database. You don't do that with global variables; you simply know they're at hand. That's a key difference right there. — David Packer, yesterday
Global state is like having a single database with a single table with a single row with infinitely many columns accessed concurrently by an arbitrary number of applications. — BevynQ, yesterday
@BevynQ that makes no sense at all to me, could you elaborate? — kai, 17 hours ago

Jules · Answer 1 · 2016-05-24 20:05:18Z

First, I'd say that the answer you link to overstates that particular issue and that the primary evil of global state is that it introduces coupling in unpredictable ways that can make it difficult to change the behaviour of your system in future.

But delving into this issue further, there are differences between global state in a typical object-oriented application and the state that is held in a database. Briefly, the most important of these are:

Object-oriented systems allow replacing an object with a different class of object, as long as it is a subtype of the original type. This allows behaviour to be changed, not just data.
Global state in an application does not typically provide the strong consistency guarantees that a database does -- there are no transactions during which you see a consistent state for it, no atomic updates, etc.

Additionally, we can see database state as a necessary evil; it is impossible to eliminate it from our systems. Global state, however, is unnecessary. We can entirely eliminate it. So even were the issues with a database just as bad, we can still eliminate some of the potential problems and a partial solution is better than no solution.

I think the point of the consistency is actually the main reason: When global variables are used in code, there is usually no telling when they are actually initialized. The dependencies between the modules are deeply hidden inside the sequence of calls, and simple stuff like swapping two calls can produce really nasty bugs because suddenly some global variable is not correctly initialized anymore when it's first used. At least that is the problem I have with the legacy code that I need to work with, and which makes refactoring a nightmare. – cmaster yesterday

@DavidHammen I've actually worked on world-state simulation for an online game, which is clearly in the category of application you're talking about, and even there I would not (and did not) use global state for it. Even if some efficiency gains can be made by using global state, the issue is that global state is not scalable. It becomes difficult to use once you move from a single-threaded to multi-threaded architecture. It becomes inefficient when you move to a NUMA architecture. It becomes impossible when you move to a distributed architecture. The paper you cite dates from... – Jules 13 hours ago

1993. These problems were less of an issue then. The authors were working on a single processor system, simulating interactions of 1,000 objects. In a modern system you'd likely run a simulation of that kind on at the very least a dual-core system, but quite likely it could be at least 6 cores in a single system. For larger problems still, you'd run it on a cluster. For this kind of change, you must avoid global state because global state cannot be effectively shared. – Jules 13 hours ago

I think calling database state a "necessary evil" is a bit of a stretch. I mean, since when did state become evil? State is the entire purpose of a database. State is information. Without state, all you have are operators. What good are operators without something to operate on? That state has to go somewhere. At the end of the day, functional programming is just a means to an end and without state to mutate there would be no point in doing anything at all. It's a bit like a baker calling the cake a necessary evil - it's not evil. It's the entire point of the thing. – J... 11 hours ago

Snowman · Answer 2 · 2016-05-24 20:11:18Z

First, what are the problems with global variables, based on the accepted answer to the question you linked?

Very briefly, it makes program state unpredictable.

Databases are, the vast majority of the time, ACID compliant. ACID specifically addresses the underlying issues that would make a data store unpredictable or unreliable.

Further, global state hurts the readability of your code.

This is because global variables exist in a scope far away from their usage, maybe even in a different file. When using a database, you are using a record set or ORM object that is local to the code you are reading (or should be).

Database drivers typically provide a consistent, understandable interface to access data that is the same regardless of problem domain. When you get data from a database, your program has a copy of the data. Updates are atomic. Contrast to global variables, where multiple threads or methods may be operating on the same piece of data with no atomicity unless you add synchronization yourself. Updates to the data are unpredictable and difficult to track down. Updates may be interleaved, causing bog-standard textbook examples of multithreaded data corruption (e.g. interleaved increments).
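To make the "interleaved increments" case concrete, here is a minimal sketch in Python. It replays the bad interleaving by hand so the outcome is deterministic, then does the same two increments as single UPDATE statements against an in-memory SQLite database. The names `counter`, `interleaved_increments`, `database_increments`, and the `counters` table are made up for illustration, and the database half runs sequentially, so it only shows where the read-modify-write happens, not a real concurrent workload.

```python
import sqlite3

# Hypothetical module-level global, shared by every caller.
counter = 0


def interleaved_increments():
    """Replay two unsynchronized 'counter += 1' operations step by step."""
    global counter
    counter = 0
    a = counter      # worker A reads 0
    b = counter      # worker B reads 0 before A has written
    counter = a + 1  # worker A writes 1
    counter = b + 1  # worker B also writes 1, so A's update is lost
    return counter


def database_increments():
    """Do the same two increments, each as one UPDATE statement."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, n INTEGER)")
    conn.execute("INSERT INTO counters VALUES ('hits', 0)")
    for _ in range(2):
        with conn:  # commits on success, rolls back on an exception
            # The read-modify-write happens inside the database engine,
            # so the application never holds a stale copy of n.
            conn.execute("UPDATE counters SET n = n + 1 WHERE name = 'hits'")
    (n,) = conn.execute("SELECT n FROM counters WHERE name = 'hits'").fetchone()
    conn.close()
    return n
```

The global version ends up one increment short, while the database version keeps both, because the increment is expressed as one statement rather than a separate read and write in application code.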

Databases typically model different data than global variables to begin with, but leaving that aside for a moment, databases are designed from the ground-up to be an ACID-compliant data store that mitigates many of the concerns with global variables.

+1 What you're saying is that databases have transactions, making it possible to read and write multiple pieces of global state atomically. Good point, which can only be circumvented by using global variables for each completely independent piece of information. — l0b0, 16 hours ago

svidgen · Answer 3 · 2016-05-24 22:57:48Z

I'd offer a few observations:

Yes, a database is global state.

In fact, it's a super-global state, as you pointed out. It's universal! Its scope entails anything or anyone that connects to the database. And, I suspect lots of folks with years of experience can tell you horror stories about how "strange things" in the data led to "unexpected behavior" in one or more of the relevant applications...

One of the potential consequences of using a global variable is that two distinct "modules" will use that variable for their own distinct purposes. And to that extent, a database table is no different. It can fall victim to the same problem.

Hmm ... Here's the thing:

If a module doesn't operate extrinsically in some way, it does nothing.

A useful module can be given data or it can find it. And, it can return data or it can modify state. But, if it doesn't interact with the external world in some way, it may as well do nothing.

Now, our preference is to receive data and return data. Most modules are simply easier to write if they can be written with utter disregard for what the outside world is doing. But ultimately, something needs to find the data and modify that external, global state.

Furthermore, in real-world applications, the data exists so that it can be read and updated by various operations. Some issues are prevented by locks and transactions. But, preventing these operations from conflicting with each other in principle, at the end of the day, simply involves careful thinking. (And making mistakes...)

But also, we're generally not working directly with the global state.

Unless the application lives in the data layer (in SQL or whatever), the objects our modules work with are actually copies of the shared global state. We can do whatever we want with those without any impact on the actual, shared state.

And, in cases where we need to mutate that global state, under the assumption that the data we were given hasn't changed, we can generally perform the same-ish sort of locking that we would on our local globals.

And finally, we usually do different things with databases than we might with naughty globals.

A naughty, broken global looks like this:

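// One counter shared by every method: nothing stops two methods from
// using it for unrelated purposes.
Int32 counter = 0;

public void someMethod() {
  for (counter = 0; counter < whatever; counter++) {
    // do other stuff.
  }
}

public void otherMethod() {
  // Resets and decrements the same counter, clobbering someMethod's loop
  // if the two ever overlap.
  for (counter = 100; counter < whatever; counter--) {
    // do other stuff.
  }
}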

We simply don't use databases for in-process/operational stuff like that. And it might be the slow nature of the database and the relative convenience of a simple variable that deters us: Our sluggish, awkward interaction with databases simply makes them bad candidates for many of the mistakes we've historically made with variables.

The way to guarantee (since we can't assume) "that the data we were given hasn't changed" in a database would be a transaction. — l0b0, 16 hours ago

Jeffrey Sweeney · Answer 4 · 2016-05-24 20:03:05Z

Agreed: the fact that a global variable's state can be changed somewhere else is not, by itself, reason enough to avoid globals (it's a pretty good reason, though!). The answer was most likely describing cases where it would make more sense to restrict a variable's access to only the areas of code that it's concerned with.

Databases are a different matter, however, because they're designed for the purpose of being accessed "globally" so to speak.

For example:

Databases typically have built-in type and structure validation that goes further than that of the language accessing them.
Databases almost universally update through transactions, which prevents inconsistent states, whereas there are no guarantees about what the end state of a global object will look like (unless it's hidden behind a singleton).
Database structure is at least implicitly documented by its table or object structure, more so than the application using it.

Most importantly though, databases serve a different purpose than a global variable. Databases are for storing and searching large quantities of organized data, where global variables serve specific niches (when justifiable).

Huh. You beat me to it while I was half-way through writing an almost identical answer. :) — Jules, yesterday

Michael Anderson · Answer 5 · 2016-05-25 01:10:45Z

I disagree with the fundamental claim that:

When your program is working with data from a database, you don't care if other code in your system is changing it, or even if an entirely different program is changing it, for that matter.

My initial thought was "Wow. Just Wow". So much time and effort is spent trying to avoid exactly this - and working out what trade-offs and compromises work for each application. To just ignore it is a recipe for disaster.

But I also disagree on an architectural level. A global variable is not just global state. It's global state that is accessible from anywhere transparently. In contrast, to use a database you need to have a handle to it (unless you store that handle in a global variable...).

For example using a global variable might look like this

int looks_ok_but_isnt() {
  return global_int++;
}

int somewhere_else() {
  ...
  int v = looks_ok_but_isnt();
  ...
}

But doing the same thing with a database would have to be more explicit about what it's doing

int looks_like_its_using_a_database( MyDB * db ) {
   return db->get_and_increment("v");
}

int somewhere_else( MyDB * db ) {
   ...
   int v = looks_like_its_using_a_database(db);
   ...
}

The database one is obviously mucking with a database. If you wanted to not use a database you can use explicit state and it looks almost the same as the database case.

int looks_like_it_uses_explicit_state( MyState * state ) {
   return state->v++;
}


int somewhere_else( MyState * state ) {
   ...
   int v = looks_like_it_uses_explicit_state(state);
   ...
}

So I would argue using a database is much more like using explicit state, than using global variables.

David Hammen · Answer 6 · 2016-05-24 23:24:40Z

But when I look at that, I can't help but think that that's a really weak explanation, because how is that any different from working with data stored in a database?

Or any different from working with an interactive device, with a file, with shared memory, etc. A program that does exactly the same thing every time it runs is a very boring and rather useless program. So yes, it's a weak argument.

To me, the difference that makes a difference with regard to global variables is that they form hidden and unprotected lines of communication. Reading from a keyboard is very obvious and protected. I have to make a certain function call, and I cannot access the keyboard driver. The same applies to file access, shared memory, and your example, databases. It's obvious to the reader of the code that this function reads from the keyboard, that function accesses a file, some other function accesses shared memory (and there had better be protections around that), and yet some other function accesses a database.

With global variables, on the other hand, it's not obvious at all. The API says to call foo(this_argument, that_argument). There's nothing in the calling sequence that says the global variable g_DangerWillRobinson should be set to some value before calling foo (or examined after calling foo).

Google banned the use of non-const reference arguments in C++ primarily because it is not obvious to the reader of the code that foo(x) will change x when foo takes a non-const reference as an argument. (Compare with C#, which dictates that both the function definition and the call site must qualify a reference parameter with the ref keyword.) While I do not agree with the Google standard on this, I do understand their point.

Code is written once and modified a few times, but if it's at all good, it is read many, many times. Hidden lines of communication are very bad karma. C++'s non-const references represent a minor hidden line of communication. A good API or a good IDE will show me that "Oh! This is call by reference." Global variables are a huge hidden line of communication.

5gon12eder · Answer 7 · 2016-05-24 20:19:56Z

I think that the quoted explanation oversimplifies the issue to the point where the reasoning becomes ridiculous. Of course, the state of an external database contributes to the global state. The important question is how your program depends on the (mutable) global state. If a library function to split strings on white-space depended on intermediary results stored in a database, I would object to this design at least as much as I would object to a global character array used for the same purpose. On the other hand, if you decide that your application doesn't need a full-blown DBMS to store business data at this point and a global in-memory key-value structure will do, this is not necessarily a sign of poor design. What is important is that – no matter what solution you pick to store your data – this choice is isolated to a very small portion of the system, so that most components can be agnostic to the solution chosen for deployment and unit-tested in isolation, and the deployed solution can be changed at a later time with little effort.
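One way to picture "isolating the choice" is a small storage interface that the rest of the code depends on, with the in-memory structure and the database as interchangeable implementations. This is only a sketch under assumptions: `KeyValueStore`, `InMemoryStore`, `SqliteStore`, `rename_customer`, and the `kv` table are hypothetical names, and SQLite stands in for whatever DBMS you would actually deploy.

```python
import sqlite3
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """The only storage interface the rest of the code is allowed to see."""

    def get(self, key: str) -> Optional[str]: ...
    def put(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Plain dictionary: enough for early development and unit tests."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


class SqliteStore:
    """Same interface, backed by a database."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)"
        )

    def get(self, key):
        row = self._conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def put(self, key, value):
        with self._conn:  # commit on success, roll back on an exception
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )


def rename_customer(store: KeyValueStore, customer_id: str, new_name: str) -> None:
    """Business logic that does not know which store it was handed."""
    store.put(f"customer:{customer_id}:name", new_name)
```

Because `rename_customer` receives its store as an argument, a test can pass `InMemoryStore()` while production passes `SqliteStore("some/path")`, and neither call site needs to change when the backing store does.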

Luaan · Answer 8 · 2016-05-25 08:02:13Z

Okay, let's start from the historical point.

We're in an old application, written in your typical mix of assembly and C. There are no functions, just procedures. When you want to pass an argument or return value from a procedure, you use a global variable. Needless to say, this is quite hard to keep track of, and in general, every procedure can do whatever it wants with every global variable. Unsurprisingly, people turned to passing arguments and return values in a different way as soon as it was feasible (unless it was performance critical not to do so - e.g. look at the Build Engine (Duke 3D) source code). The hate of global variables was born here - you had very little idea what piece of global state each procedure would read and change, and you couldn't really nest procedure calls safely.

Does this mean that global variable hate is a thing of the past? Not quite.

First, I have to mention that I've seen the exact same approach to passing arguments in the project I'm working on right now: passing two reference type instances in C#, in a project that's about 10 years old. There's literally no good reason to do it like this, and it was most likely born out of either cargo-culting or a complete misunderstanding of how C# works.

The bigger point is that by adding global variables, you're expanding the scope of every single piece of code that has access to that global variable. Remember all those recommendations like "keep your methods short"? If you have 600 global variables (again, real-world example :/), all your method scopes are implicitly expanded by those 600 global variables, and there's no simple way to keep track of who has access to what.

If done wrong (the usual way :)), global variables may have coupling between each other. But you have no idea how they are coupled, and there's no mechanism to ensure that the global state is always consistent. Even if you introduce critical sections to try and keep things consistent, you'll find that it compares poorly to a proper ACID database:

There's no way to roll back a partial update, unless you preserve the old values before the "transaction". Needless to say, by this point, passing a value as an argument is already a win :)
Everyone accessing the same state must adhere to the same synchronization process. But there's no way to enforce this - if you forget to set up the critical section, you're screwed.
Even if you correctly synchronize all access, there might be nested calls that access partially modified state. This means that you either deadlock (if your critical sections aren't reëntrant), or deal with inconsistent data (if they are reëntrant).
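The rollback point in the list above can be sketched in a few lines of Python. A two-step update fails halfway: the global dictionary is left half-modified, while the SQLite transaction is rolled back as a unit. The `balances` data, the function names, and the `fail` flag that simulates a crash are all hypothetical.

```python
import sqlite3

# Hypothetical global state: the two balances should always sum to 100.
balances = {"alice": 60, "bob": 40}


def transfer_globals(amount, fail=False):
    """Two-step update on global state with no way to undo the first step."""
    balances["alice"] -= amount
    if fail:
        raise RuntimeError("crashed mid-update")
    balances["bob"] += amount


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE balances (name TEXT PRIMARY KEY, amount INTEGER)")
    conn.executemany(
        "INSERT INTO balances VALUES (?, ?)", [("alice", 60), ("bob", 40)]
    )
    conn.commit()
    return conn


def transfer_db(conn, amount, fail=False):
    """Same two steps, wrapped in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE balances SET amount = amount - ? WHERE name = 'alice'",
            (amount,),
        )
        if fail:
            raise RuntimeError("crashed mid-update")
        conn.execute(
            "UPDATE balances SET amount = amount + ? WHERE name = 'bob'",
            (amount,),
        )


def total_db(conn):
    return conn.execute("SELECT SUM(amount) FROM balances").fetchone()[0]
```

After a failed `transfer_globals(10, fail=True)` the globals no longer sum to 100, and the only fix is the manual "preserve the old values" bookkeeping mentioned above. After a failed `transfer_db(conn, 10, fail=True)` the database total is unchanged.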

Is it possible to resolve these issues? Not really. You need encapsulation to handle this, or really strict discipline. It's hard to do things right, and that's generally not a very good recipe for success in software development :)

Smaller scope tends to make code easier to reason about. Global variables make even the simplest pieces of code include huge swathes of scope.

Of course, this doesn't mean that global scoping is evil. It just shouldn't be the first solution you go for - it's a typical example of "simple to implement, hard to maintain".

Stig Hemmer · Answer 9 · 2016-05-25 10:33:28Z

A global variable is a tool, it can be used for good and for evil.

A database is a tool, it can be used for good and for evil.

As the original poster notes, the difference isn't all that big.

Inexperienced students often think that bugs are something that happens to other people. Teachers use "Global variables are evil" as a simplified reason to penalize bad design. Students generally don't understand that just because their 100-line program is bug free doesn't mean that the same methods can be used for 10000-line programs.

When you work with databases, you cannot just ban global state since that's what the program is all about. Instead you get more detailed guidelines like ACID and Normal Forms and so on.

If people used the ACID approach to global variables, they wouldn't be so bad.

On the other hand, if you design databases badly, they can be nightmares.

Typical student claim on stackoverflow: Help me! My code is perfect, but it isn't working right! — David Hammen, 10 hours ago

G DeMasters · Answer 10 · 2016-05-25 01:28:52Z

To me, the primary evil is that globals have no protection against concurrency issues. You can add mechanisms to handle such issues with globals, but you'll find that the more concurrency issues you solve, the more your globals start to mimic a database. The secondary evil is that there is no contract on usage.

For example, errno in C. – David Hammen 10 hours ago

MichelHenrich · Answer 11 · 2016-05-25 02:20:39Z

Depending on what aspect you're judging, global variables and database access may be worlds apart, but as long as we're judging them as dependencies, they are the same.

Consider functional programming's definition of a pure function, which states that it must depend solely on the parameters it takes as inputs, producing a deterministic output. That is, given the same set of arguments twice, it must produce the same result.

When a function depends on a global variable, it can no longer be considered pure, since, for the same set of arguments, it may yield different outputs because the value of the global variable may have changed between the calls.

However, the function can still be seen as deterministic if we consider the global variable as much a part of the function's interface as its other arguments, so it isn't the problem. The problem is only that this is hidden until the moment we are surprised by unexpected behavior from seemingly obvious functions, then go read their implementations to discover the hidden dependencies.

This part, the moment where a global variable becomes a hidden dependency is what is considered evil by us programmers. It makes the code harder to reason about, hard to predict how it will behave, hard to reuse, hard to test and especially, it increases debug and fix time when a problem occurs.
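As a small illustration of the hidden dependency described above (a sketch with made-up names, not anything from the original question): both functions below compute the same thing, but only the second one shows everything it depends on in its signature. `tax_percent`, `set_tax_percent`, and both pricing functions are hypothetical.

```python
# Hypothetical global that the first function silently depends on.
tax_percent = 20


def set_tax_percent(value):
    """Stands in for 'some other code somewhere else' changing the global."""
    global tax_percent
    tax_percent = value


def price_with_hidden_dependency(net):
    # Same argument, different result whenever tax_percent has changed.
    return net * (100 + tax_percent) // 100


def price_with_explicit_dependency(net, percent):
    # Everything the result depends on is visible in the signature.
    return net * (100 + percent) // 100
```

Calling `price_with_hidden_dependency(100)` before and after `set_tax_percent(10)` gives two different answers for the same argument, and nothing at the call site hints at why. The explicit version is trivially testable because the caller supplies the rate.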

The same thing happens when we hide the dependency on the database. We can have functions or objects making direct calls to database queries and commands, hiding these dependencies and causing us the exact same trouble that global variables cause; or we can make them explicit, which, as it turns out, is considered a best-practice that goes by many names, such as repository pattern, data-store, gateway, etc.

P.S.: There are other aspects which are important to this comparison, such as whether concurrency is involved, but that point is covered by other answers here.

Arnab Datta · Answer 12 · 2016-05-25 12:35:59Z

There are several differences:

A database value can be modified on the fly. The value of a global that is set in code, on the other hand, cannot be changed unless you modify your code and redeploy your application. In fact, this is intentional. A database is for values that might change over time, but global variables should only be used for things that will never change and that do not contain actual data.
A database value (row, column) has a context and a relational mapping in the database. This relation can be easily extracted and analysed using tools like Jailer (for instance). A global variable, on the other hand, is slightly different. You can find all the usages, but it would be impossible for you to tell me all the ways in which the variable interacts with the rest of your world.
Global variables are faster. Getting something from a database requires a database connection to be made, a select to be run, and then the database connection to be closed. Any type conversions you might need come on top of that. Compare that to a global being accessed in your code.

These are the only ones that I can think of right now, but I'm sure there are more. Simply put, they are two different things and should be used for different objectives.

JeffO · Answer 13 · 2016-05-25 13:25:23Z

A database can be a global state, but it doesn't have to be all the time. I disagree with the assumption that you don't have control. One way to manage that is locking and security. This can be done at the level of the record, the table, or the entire database. Another approach is to have some sort of version field that would prevent the changing of a record if the data are stale.
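The version-field approach is often called optimistic locking. A minimal sketch with SQLite, assuming a hypothetical `items` table with a `version` column: the UPDATE only matches if the row still carries the version the caller originally read, so a stale writer changes nothing and can tell from the row count.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)"
    )
    conn.execute("INSERT INTO items VALUES (1, 'widget', 1)")
    conn.commit()
    return conn


def read_item(conn, item_id):
    """Return (name, version) as currently stored."""
    return conn.execute(
        "SELECT name, version FROM items WHERE id = ?", (item_id,)
    ).fetchone()


def update_if_unchanged(conn, item_id, new_name, seen_version):
    """Apply the change only if nobody has bumped the version since we read it."""
    with conn:
        cur = conn.execute(
            "UPDATE items SET name = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_name, item_id, seen_version),
        )
    return cur.rowcount == 1  # False means our copy was stale
```

Two writers that both read version 1 can both call `update_if_unchanged`. The first succeeds and bumps the version to 2, the second gets `False` back and has to re-read before trying again.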

Like a global variable, the value(s) in a database can be changed once they are unlocked, but there are many ways to control the access (don't give all the devs the password to the account allowed to change data). If you have a variable that has limited access, it's not very global.

Graham · Answer 14 · 2016-05-25 14:39:47Z

As a software engineer working predominantly with embedded firmware, I'm almost always using global variables for anything going between modules. In fact, it's best practise for embedded. They are assigned statically, so there's no risk of blowing the heap/stack and there's no extra time taken for stack allocation/clean-up on function entry/exit.

The downside of this is that we do have to consider how those variables are used, and a lot of that comes down to the same kind of thought that goes into database-wrangling. Any asynchronous reads/writes of variables MUST be atomic. If more than one place can write a variable, some thought must go into making sure they always write valid data, so the previous write is not arbitrarily replaced (or that arbitrary replacement is a safe thing to do). If the same variable is read more than once, some thought must go into considering what happens if the variable changes value between reads, or a copy of the variable must be taken at the start so that processing is done using a consistent value, even if that value becomes stale during processing.
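The "changes value between reads" hazard is easy to reproduce deterministically. The sketch below is in Python rather than embedded C, so it only illustrates the logic (on real hardware the single read itself must also be atomic). `sensor_reading`, `set_sensor_reading`, and the `between_reads` hook that stands in for an interrupt or another thread are all hypothetical.

```python
# Hypothetical shared value that an interrupt or another thread may change.
sensor_reading = 10


def set_sensor_reading(value):
    global sensor_reading
    sensor_reading = value


def in_range_double_read(low, high, between_reads=lambda: None):
    """Reads the global twice; the value may change between the two checks."""
    ok_low = sensor_reading >= low
    between_reads()  # stands in for an asynchronous writer
    ok_high = sensor_reading <= high
    return ok_low and ok_high


def in_range_snapshot(low, high, between_reads=lambda: None):
    """Takes one copy up front; both checks use the same (possibly stale) value."""
    reading = sensor_reading
    ok_low = reading >= low
    between_reads()
    ok_high = reading <= high
    return ok_low and ok_high
```

With a range of 5 to 8, a reading that starts at 10 and changes to 3 between the checks passes the double-read version, even though neither 10 nor 3 is in range. The snapshot version judges the value 10 consistently and rejects it.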

(For that last one, on my very first day of a contract working on an aircraft countermeasures system, so highly safety-related, the software team were looking at a bug report they'd been trying to figure out for a week or so. I'd had just enough time to download the dev tools and a copy of the code. I asked "couldn't that variable be updated between reads and cause it?" but didn't really get an answer. Hey, what does the new guy know, after all? So whilst they were still discussing it, I added protective code to read the variable atomically, did a local build, and basically said "hey guys, try this". Way to prove I was worth my contracting rate. :)

So global variables are not an unambiguously bad thing, but they do leave you open to a wide range of issues if you don't think about them carefully.

Byron Jones · Answer 15 · 2016-05-25 18:19:29Z

Globals are not evil; they are simply a tool. MISUSE of globals is problematic, as is the misuse of any other programming feature.

My general recommendation is that globals should only be used in situations that are well understood and thought out, where other solutions are less optimal. Most importantly, you want to ensure that you have documented well where that global value might be modified, and, if you are running multithreaded, that the global and any co-dependent globals are accessed in a way that is transactional.

o11c · Answer 16 · 2016-05-25 03:47:14Z

What makes you think databases are global state?

A single program can create multiple independent connections to an external database, each with its own transactions, etc.

I assume for the same reason that a filesystem is global state -- code anywhere in the program can create a connection to your "main" database, via whatever discovery mechanisms your database provider has for specifying a database by name, URL, etc. Anything discoverable has unrestricted scope. A temporary database (for example an sqlite file that you unlink from the directory structure on creation) could be non-global state since it's accessible only from the code that creates it plus anything that code passes a reference to. – Steve Jessop 13 hours ago
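For what it's worth, the non-global case is easy to demonstrate with SQLite's in-memory databases, where each connection to `":memory:"` gets its own private database. In this sketch (the `notes` table and helper names are made up), only code that is handed the connection object can reach the data, which makes it behave like explicitly passed state rather than a discoverable global.

```python
import sqlite3


def make_private_db():
    """Each call creates a separate in-memory database, reachable only via conn."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (body TEXT)")
    return conn


def add_note(conn, body):
    with conn:
        conn.execute("INSERT INTO notes VALUES (?)", (body,))


def count_notes(conn):
    return conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
```

Two connections created this way do not see each other's rows, so there is no name or URL by which unrelated code could discover either database.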

asked	yesterday
viewed	3825 times
active	today
