Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

Sign up
Here's how it works:
  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

This is a simple question that I can't seem to find the answer for.

In terms of performance, If I have a WHERE clause such as a=0 and b=0 and ... z=0, Would I gain any performance if I replaced that condition with a+b+...+z=0?

In other words, is there any performance gain by replacing the following

Select * 
From MyTable 
Where A=0 and B=0 and C=0 and D=0...

With

Select * 
From MyTable 
Where A+B+C+D=0...

I know it can depend on indexes, but for this purpose, let's just say no indexes exist. Does the arithmetic operator (+) perform better than an "OR" or "AND" Logical Operator?

I'm under the impression that the addition performs better than multiple conditions with ANDs or ORs.

TEST RESULTS

On a Table of 4.2 million rows Returning rows Where A=0 B=0 and C=0 -> 351748 Rows

The addition (A+B+C=0) took 5 seconds while the Logical conditions A=0 and B=0 and C=0 took 11 seconds.

On the Other hand

Returning rows Where A<>0 B<>0 or C<>0 -> 3829750 Rows 58 seconds

Returning rows Where F65+ F67+f64<>0 -> 3829750 Rows 57 seconds

For the OR, it seems that there is no significant difference.

I agree with gbn (+1)"... If A is -1 and B is 1, A+B=0 but A=0 and B= 0 is false...." and with AMtwo (+1): "...ABS(A)+ABS(B)+ABS(C)+ABS(D)... Even if you expect only positive values, if the column accepts negative values, you should assume that you might encounter one"

The results are very impressive, as I thought, It seems that the addition is much quicker than the logical operators.

share|improve this question
13  
Performance aside, you would likely need to have your math to be ABS(A)+ABS(B)+ABS(C)+ABS(D)... Even if you expect only positive values, if the column accepts negative values, you should assume that you might encounter one. – AMtwo 2 days ago
4  
You really think a logical comparison takes longer than addition. – Paparazzi 2 days ago
5  
@JohnG: are you letting the rows be rendered in SSMS? If so, you are measuring SSMS performance.With no indexes, the query requires a table scan and will take the same time pretty much. Addition won't be quicker, it's quite simple – gbn 2 days ago
2  
Do SELECT * INTO to test. Believe what you want, but don't use addition on my servers.... – gbn 2 days ago
1  
Don't do SELECT * INTO .. as you are then adding in possible disk contention and possibly concurrency issues if a schema lock is required to create the new table, and at the very least there will be additional locks. You might even run into auto-growth latency. – srutzky 2 days ago
up vote 41 down vote accepted

In your question, you detail some tests that you've prepared where you "prove" that the addition option is quicker than comparing the discrete columns. I suspect your test methodology may be flawed in several ways, as @gbn and @srutzky have alluded to.

First, you need to ensure you're not testing SQL Server Management Studio (or whatever client you're using). For instance, if you are running a SELECT * from a table with 3 million rows, you're mostly testing SSMS's ability to pull rows from SQL Server and render them on-screen. You're far better off to use something like SELECT COUNT(1) which negates the need to pull millions of rows across the network, and render them on screen.

Second, you need to be aware of SQL Server's data cache. Typically, we test the speed of reading data from storage, and processing that data, from a cold-cache (i.e. SQL Server's buffers are empty). Occasionally, it makes sense to do all your testing with a warm-cache, but you need to approach your testing explicitly with that in mind.

For a cold-cache test, you need to run CHECKPOINT and DBCC DROPCLEANBUFFERS prior to each run of the test.

For the test you've asked about in your question, I created the following test-bed:

IF COALESCE(OBJECT_ID('tempdb..#SomeTest'), 0) <> 0
BEGIN
    DROP TABLE #SomeTest;
END
CREATE TABLE #SomeTest
(
    TestID INT NOT NULL
        PRIMARY KEY 
        IDENTITY(1,1)
    , A INT NOT NULL
    , B FLOAT NOT NULL
    , C MONEY NOT NULL
    , D BIGINT NOT NULL
);

INSERT INTO #SomeTest (A, B, C, D)
SELECT o1.object_id, o2.object_id, o3.object_id, o4.object_id
FROM sys.objects o1
    , sys.objects o2
    , sys.objects o3
    , sys.objects o4;

SELECT COUNT(1) 
FROM #SomeTest;

This returns a count of 260,144,641 on my machine.

To test the "addition" method, I run:

CHECKPOINT 5;
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;

SET STATISTICS IO, TIME ON;
GO
SELECT COUNT(1)
FROM #SomeTest st
WHERE (st.A + st.B + st.C + st.D) = 0;
GO
SET STATISTICS IO, TIME OFF;

The messages tab shows:

Table '#SomeTest'. Scan count 3, logical reads 1322661, physical reads 0, read-ahead reads 1313877, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 49047 ms, elapsed time = 173451 ms.

For the "discrete columns" test:

CHECKPOINT 5;
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;

SET STATISTICS IO, TIME ON;
GO
SELECT COUNT(1)
FROM #SomeTest st
WHERE st.A = 0
    AND st.B = 0
    AND st.C = 0
    AND st.D = 0;
GO

SET STATISTICS IO, TIME OFF;

again, from the messages tab:

Table '#SomeTest'. Scan count 3, logical reads 1322661, physical reads 0, read-ahead reads 1322661, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 8938 ms, elapsed time = 162581 ms.

From the stats above you can see the second variant, with the discrete columns compared to 0, the elapsed time is about 10 seconds shorter, and the CPU time is about 6 times less. The long durations in my tests above are mostly a result of reading a lot of rows from disk. If you drop the number of rows to 3 million, you see the ratios remain about the same but the elapsed times drop noticeably, since the disk I/O has much less of an effect.

With the "Addition" method:

Table '#SomeTest'. Scan count 3, logical reads 15255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 499 ms, elapsed time = 256 ms.

With the "discrete columns" method:

Table '#SomeTest'. Scan count 3, logical reads 15255, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 94 ms, elapsed time = 53 ms.

What will make a really really big difference for this test? An appropriate index, such as:

CREATE INDEX IX_SomeTest ON #SomeTest(A, B, C, D);

The "addition" method:

Table '#SomeTest'. Scan count 3, logical reads 14235, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 546 ms, elapsed time = 314 ms.

The "discrete columns" method:

Table '#SomeTest'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

The execution plan for each query (with the above index in-place) is quite telling.

The "addition" method, which must perform a scan of the entire index:

enter image description here

and the "discrete columns" method, which can seek to the first row of the index where the leading index column, A, is zero:

enter image description here

share|improve this answer
2  
Well done on the analysis. Thanks for taking the time to put together this test code. – srutzky 2 days ago
1  
@Max Vernon, Very well detailed. You have convinced me that using Logical Operators is the way to go. Thank you. That answers my question. – JohnG yesterday

Let's say you have an index on A, B, C and D. Could be filtered too.

This is more likely to use the index then addition.

Where A=0 and B=0 and C=0 and D=0

In other news, If A is -1 and B is 1, A+B=0 is true but A=0 and B=0 is false.

share|improve this answer
9  
If A is -1 and B is 1, A+B=0 but A=0 and B= 0 is false is a critical point... – Rozwel 2 days ago
1  
There are certainly times where you balance correctness with speed, but generally you want it to be intentional. =) – corsiKa 2 days ago
    
I wouldn't even say "more likely to use the index". I'm pretty sure there's no way for addition to use an index, unless specifically optimized for this exact scenario which would really only happen if more people did "clever" solutions like this. Not to mention that addition requires quite a bit more CPU, so concurrent queries are going to hurt particularly badly. – Luaan 23 hours ago

I would guess that the separate AND conditions would be preferred since the optimizer would be more able to short-circuit the operation if a single one of them is 0, without needing to do a computation first.

Still, since this is a question of performance, you should first set up a test to determine the answer on your hardware, and then report here what the result was (and showing your test code) and ask for others to look it over to make sure it was a good test, and if there are other factors that need to be considered.

share|improve this answer

Some general reasoning, if you don't have any indexes at hand I don't think it will matter much which of the two solutions you choose, both will perform badly. If you on the other hand have an index on one or more of the columns in the predicate the first one will likely perform better than the second one, since the second one probably wont be able to utilize the index(es).

Disjunctions (OR) in general performs worse than conjunctions (AND), but even if you have a query with disjunctions I'll put my money on the first one.

share|improve this answer

This is a simple question

No, it is not. This (kind of) question is what plagues many DBAs and software developers day in day out, and it is all but trivial.

that I can't seem to find the answer for.

Yes, you won't. At least not a general answer. First of all, it will depend hugely on which RDBMS you are using (OK, you are using , but still). It may even change when you go from one version of your RDBMS to the next.

Then, it can depend on any amount of other small details, for example how your DB stores the data, if you have sub-selects/joins that confuse the issue for the plan optimizer etc.. The optimizer might give you different execution plans depending on how many rows you have...

Doing a real-world test is usually the only useful way to solve questions like this. Also, any gains gotten by "arcane" optimizations like this are usually swallowed up tenfold by smart choice of indexes, so I would not bother spending too much time on it, before a use of indexes has really been ruled out.

share|improve this answer

I would guess that the separate AND conditions would be preferred since the optimizer would be more able to short-circuit the operation if a single one of them is 0, without needing to do a computation first.

Still, since this is a question of performance, you should first set up a test to determine the answer on your hardware, and then report here what the result was (and showing your test code) and ask for others to look it over to make sure it was a good test, and if there are other factors that need to be considered.

                       (or)

Some general reasoning, if you don't have any indexes at hand I don't think it will matter much which of the two solutions you choose, both will perform badly. If you on the other hand have an index on one or more of the columns in the predicate the first one will likely perform better than the second one, since the second one probably wont be able to utilize the index(es).

Disjunctions (OR) in general performs worse than conjunctions (AND), but even if you have a query with disjunctions I'll put my money on the first one.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.