I only heard about Robert Martin today, and he seems to be a notable figure in the software world, so I don't mean for my title to come across as clickbait or as putting words in his mouth; this is simply how I interpreted what I heard from him, with my limited experience and understanding.

I was watching a talk by Robert C. Martin today (on software architecture), and in the latter half of the video, databases were the main focus.

From my understanding of what he said, it seemed like he was saying that SSDs will considerably reduce the usefulness of databases.

To explain how I came to this interpretation:

He discussed how retrieving data from HDDs/spinning disks is slow. However, these days we use SSDs, he noted. He starts off with "RAM is coming" and then mentions RAM disks, but says he can't call it a RAM disk, so he resorts to just saying RAM. So with RAM, we don't need the indexes, because every byte takes the same time to access. (This paragraph is my paraphrase.)

So, his suggesting RAM (as in computer memory) as a replacement for DBs (which is how I interpreted his statement) doesn't make sense, because that's like saying all the records are processed in memory over the lifetime of an application (unless you pull them from a disk file on demand).

So I resorted to thinking that by RAM, he means SSD. In that case, he's saying SSDs reduce the usefulness of databases. He even says, "If I was Oracle, I'd be scared. The very foundation of why I exist is evaporating."

From my limited understanding of SSDs, unlike HDDs, which I'd think have O(n) seek time, SSDs are near O(1), i.e., almost truly random access. So his suggestion was interesting to me, because I'd never thought about it like that. When I was first introduced to databases a few years ago and a professor was describing their benefits over a plain filesystem, I concluded that the primary role of a database is essentially to be a heavily indexed filesystem (plus optimizations, caching, concurrent access, etc.). Thus, if indexes aren't needed on an SSD, that does make databases somewhat less useful.

Regardless of that, though (and prefacing that I'm a newb), I find it hard to believe that they become less useful, as everyone still uses DBs as the primary data store for their applications instead of the raw filesystem, and I felt as if he was oversimplifying the role of databases.

Note: I did watch till the end to make sure he didn't say something different.

For reference: 42:22 is when the whole database topic comes up, and 43:52 is when he starts off with "Why do we even have databases?"

This answer does say SSDs speed DBs up considerably, and this question asks how optimization changes.

To TL;DR my question: does the advent of widespread SSD use in the server market (whether it's upcoming or has already happened) reduce the usefulness of databases?

Reduce the usefulness of databases compared to what? – Forrest 14 hours ago
I would say just the opposite. Since read/write speeds are so fast, you can now get a GPU-accelerated database to crunch numbers even faster. Now even more complex queries run faster, and queries people wouldn't previously have considered running can be run at a reasonable speed. The more complex the queries and the more data you have, the better off you are. – cybernard 12 hours ago
This is a very odd claim. It's like saying you don't need to have both parties sign a contract if you write it with a pen instead of a quill. – Aaroninus 11 hours ago
While Bob Martin has been around for a long time and his opinions are generally worth listening to (if not agreeing with :-), in this case I think he's diving into the "The Death Of Relational Databases Is Upon Us" crowd (of which I'm an associate member :-). For some things under limited circumstances a somewhat convincing argument can be made that non-relational database technologies can provide an edge. That having been said, however, IMO the relational model, flawed in various and sundry ways as it may be, still provides the best general purpose database model available today. YMMV. – Bob Jarvis 10 hours ago
The primary reason that we use databases isn't because disks are slow (indeed, originally, that was cited as a reason not to use databases), but rather because data is complicated. The primary purpose of a database is to enable multiple apps/users to be able to find the correct data and even to be able to simultaneously alter it in a controlled manner. Doing that quickly is only a secondary goal of databases. – RBarryYoung 9 hours ago

There are some things in a database that should be tweaked when you use SSDs. For instance, speaking for PostgreSQL, you can adjust effective_io_concurrency and random_page_cost. However, faster reads and faster random access aren't what a database is for; it ensures correctness and controlled, concurrent access to your data.
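
As a rough sketch of what that tuning might look like (the values here are illustrative, not recommendations; benchmark on your own hardware):

    -- On spinning disks, random reads cost far more than sequential reads,
    -- which is why random_page_cost defaults to 4.0. On SSDs the gap is
    -- much smaller, so a value near seq_page_cost (1.0) is common.
    ALTER SYSTEM SET random_page_cost = 1.1;

    -- effective_io_concurrency tells the planner how many concurrent I/O
    -- requests the storage can service; SSDs can handle far more than HDDs.
    ALTER SYSTEM SET effective_io_concurrency = 200;

    -- Reload the configuration so the new settings take effect.
    SELECT pg_reload_conf();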

He's just wrong about indexes. Even if the whole table can be read into RAM, an index is still useful. Don't believe me? Let's do a thought experiment:

  • Imagine you have a table with one indexed column.

    CREATE TABLE foobar ( id text PRIMARY KEY );
    
  • Imagine that there are 500 million rows in that table.

  • Imagine all 500 million rows are concatenated together into a file.

What's faster:

  1. grep 'keyword' file
  2. SELECT * FROM foobar WHERE id = 'keyword'

Even with the whole file in RAM, grep has to scan all 500 million rows, an O(n) operation, while the B-tree behind the primary key finds the row in roughly log₂(500 million) ≈ 29 comparisons. It's not just about where the data is; it's about how you order it and what operations you can do on it. PostgreSQL supports B-tree, Hash, GiST, SP-GiST, GIN, and BRIN indexes (and Bloom through an extension). You'd be foolish to think that all of that math and functionality goes away because you have faster random access.
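
For instance, sticking with the foobar table above (the index names are just illustrative), a few of those index types look like this:

    -- The PRIMARY KEY already created a B-tree index: O(log n) equality
    -- and range lookups. An explicit equivalent would be:
    CREATE INDEX foobar_id_btree ON foobar (id);

    -- A hash index supports only equality tests, but with O(1) average
    -- lookup time:
    CREATE INDEX foobar_id_hash ON foobar USING hash (id);

    -- With the pg_trgm extension, a GIN index can even accelerate
    -- substring searches (LIKE '%keyword%'), which grep can only do by
    -- scanning every line:
    CREATE EXTENSION IF NOT EXISTS pg_trgm;
    CREATE INDEX foobar_id_trgm ON foobar USING gin (id gin_trgm_ops);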

Just an addendum - OP should be careful not to conflate "random access" with "content-addressable access". As OP noted, "random access" means that getting to each byte of memory is O(1). However, FINDING data in that "random-access memory" still requires sequentially searching through it; that is, you can't ask the memory "find me the data that looks like this" and have it magically handed to you. – Bob Jarvis 11 hours ago
@BobJarvis You're correct. Your comment further clarifies @EvanCarroll's "What's faster" example of why indexing (and even sub-indexing) matters, and why just grabbing data in O(1) isn't sufficient for the use cases a DB serves. – Abdul 11 hours ago

Based on your post, it appears the clear message is that RDBMS lookup-time optimizations are being replaced with hardware that makes I/O time negligible.

This is absolutely true. SSDs on database servers, combined with plenty of (physical) RAM, make I/O waits significantly shorter. However, RDBMS indexing and caching are still of value, because even systems with this huge I/O boon can and will have I/O bottlenecks from poorly performing queries caused by bad indexing. This is typically only seen in high-workload applications or poorly written ones.
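
As an illustrative sketch (the orders table and its columns are hypothetical), this is the kind of check that exposes such a bottleneck no matter how fast the storage is:

    -- Hypothetical table: orders(customer_id int, total numeric).
    -- Without an index on customer_id, the planner has no choice but a
    -- sequential scan over the whole table, SSD or not:
    EXPLAIN ANALYZE
    SELECT sum(total) FROM orders WHERE customer_id = 42;

    -- Adding the index lets the same query use an index scan instead:
    CREATE INDEX orders_customer_id_idx ON orders (customer_id);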

The key value of RDBMS systems in general is data consistency, data availability, and data aggregation. Keeping your data in an Excel spreadsheet, a CSV file, or some other ad hoc "database" gives you none of those guarantees.
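
A minimal sketch of that consistency guarantee in action (the accounts table is hypothetical):

    -- Either both updates happen or neither does, and a concurrent
    -- reader never sees the money "in flight". A CSV file offers no
    -- equivalent of this.
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    COMMIT;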

An SSD doesn't protect you from your primary server becoming unavailable for any reason (network, OS corruption, power loss). An SSD doesn't protect you from a bad data modification. And an SSD doesn't give you analytics; "just having" the data on fast storage is not the same as being able to aggregate it.

Although I've gained better insight, I was asking in the context of raw SSD data storage vs. data storage in a DB on an HDD, and your answer is in the context of a DB on an SSD (due to poor question phrasing on my part). – Abdul 11 hours ago
