Why did x86 designers (and other CPU architects) decide not to include a NAND instruction? NAND is a universal logic gate that can build every other gate, so it would be fast as a single instruction. Rather than chaining NOT and AND instructions (both of which can be built from NAND), why is there no NAND instruction?

What use case do you have for a nand instruction? Probably the x86 designers never found one. – PlasmaHH 23 hours ago
Something like this in C code !(a & b) can be translated into a single instruction instead of 2. So, why not? It's just a few nand gates. – Amumu 23 hours ago
Well, they would be 32-input nand gates. Or whatever the register size of a and b. Maybe it is just more common to use "and," "or," "xor" and "not." I think our brains have trouble thinking in "nor" and "nand." – mkeith 23 hours ago
ARM has the BIC instruction, which is a & ~b. ARM Thumb-2 has the ORN instruction, which is a | ~b. ARM is pretty modern. Encoding an instruction in a CPU instruction set has its costs, so only the most "useful" ones make their way into the ISA. – Eugene Sh. 22 hours ago
@Amumu We could have ~(((a << 1) | (b >> 1)) | 0x55555555) instruction too. The purpose would be so that ~(((a << 1) | (b >> 1)) | 0x55555555) can be translated into a single instruction instead of 6. So, why not? – immibis 18 hours ago

POWER has NAND: http://www.ibm.com/support/knowledgecenter/ssw_aix_61/com.ibm.aix.alangref/idalangref_nand_nd_instrs.htm

But generally modern CPUs are built to match automated code generation by compilers, and bitwise NAND is very rarely called for. Bitwise AND and OR get used more often for manipulating bitfields in data structures. In fact, SSE has AND-NOT but not NAND.

Every instruction has a cost in the decode logic and consumes an opcode that could be used for something else. Especially in variable-length encodings like x86, you can run out of short opcodes and have to use longer ones, which potentially slows down all code.

AND-NOT is useful in about as many situations as XOR, but never seems to have become popular as a gate, instruction, or language operator. – supercat 19 hours ago
@supercat AND-NOT is commonly used to mask out bits when testing a bit-set variable, e.g. if(windowType & ~WINDOW_RESIZABLE) { ... do stuff for variable-sized windows ... } – adib 18 hours ago
@adib: Yup. An interesting feature of "and-not" is that unlike the "bitwise not" operator [~] the result size won't matter. If foo is a uint64_t, the statement foo &= ~something; may sometimes clear out more bits than intended, but if there were a &~= operator such problems could be avoided. – supercat 17 hours ago
I think you have your cause and effect backwards - the instructions emitted by a compiler must necessarily be the ones provided by a processor. The early processors didn't include NAND, therefore the compilers were designed to live without it. – Mark Ransom 17 hours ago
@MarkRansom: No, the cause and effect is entirely correct from computing history. This phenomenon of designing CPUs optimized for compilers rather than for human assembly programmers was part of the RISC movement (though the RISC movement itself is wider than just that aspect). CPUs designed for compilers include ARM and the Atmel AVR. In the late 90s and early 00s people hired compiler writers and OS programmers to design CPU instruction sets. – slebetman 10 hours ago

The cost of such an ALU function is:

1) the logic that performs the function itself

2) the selector that selects this function result instead of the others out of all ALU functions

3) the cost of having this option in the instruction set (and not having some other useful function)

I agree with you that cost 1) is very small. Costs 2) and 3), however, are almost independent of the function. I think in this case cost 3) (the bits occupied in the instruction encoding) was the reason not to have this specific instruction. Bits in an instruction are a very scarce resource for a CPU/architecture designer.


Turn it around - first see why Nand was popular in hardware logic design - it has several useful properties there. Then ask whether those properties still apply in a CPU instruction...

TL;DR – they don't, so there's no downside to using And, Or or Not instead.

The biggest advantage to hardwired Nand logic was speed, gained by reducing the number of logic levels (transistor stages) between a circuit's inputs and outputs. In a CPU, the clock speed is determined by the speed of much more complex operations like addition, so speeding up an AND operation won't enable you to increase clock rate.

And the number of times you'd need to combine other instructions into a NAND is vanishingly small – small enough that NAND really doesn't earn its space in the instruction set.


First off, don't confuse bitwise and logical operations.

Bitwise operations are usually used to set/clear/toggle/check bits in bitfields. None of these operations requires NAND ("and not", also known as "bit clear", is more useful).

Logical operations in most modern programming languages short-circuit, so a branch-based approach to implementing them is usually needed. Even when the compiler can prove that short-circuit vs. complete evaluation makes no difference to program behaviour, the operands of the logical operations are usually not in a convenient form to implement the expression using bitwise asm operations.


I'd like to agree with Brian here, and Wouter and pjc50.

I'd also like to add that on general-purpose, especially CISC, processors, instructions don't all have the same throughput – a complicated operation may simply take more cycles than an easy one.

Consider x86: AND (a bitwise "and") is very fast, and so is NOT. Let's look at a bit of disassembly:

Input code:

#include <immintrin.h>
#include <stdint.h>

__m512i nand512(__m512i a, __m512i b){return ~(a&b);}
__m256i nand256(__m256i a, __m256i b){return ~(a&b);}
__m128i nand128(__m128i a, __m128i b){return ~(a&b);}
uint64_t nand64(uint64_t a, uint64_t b){return ~(a&b);}
uint32_t nand32(uint32_t a, uint32_t b){return ~(a&b);}
uint16_t nand16(uint16_t a, uint16_t b){return ~(a&b);}
uint8_t nand8(uint8_t a, uint8_t b){return ~(a&b);}

Command to produce assembly:

gcc -O3 -c -S  -mavx512f test.c

Output Assembly (shortened):

    .file   "test.c"
nand512:
.LFB4591:
    .cfi_startproc
    vpandq  %zmm1, %zmm0, %zmm0
    vpternlogd  $0xFF, %zmm1, %zmm1, %zmm1
    vpxorq  %zmm1, %zmm0, %zmm0
    ret
    .cfi_endproc
nand256:
.LFB4592:
    .cfi_startproc
    vpand   %ymm1, %ymm0, %ymm0
    vpcmpeqd    %ymm1, %ymm1, %ymm1
    vpxor   %ymm1, %ymm0, %ymm0
    ret
    .cfi_endproc
nand128:
.LFB4593:
    .cfi_startproc
    vpand   %xmm1, %xmm0, %xmm0
    vpcmpeqd    %xmm1, %xmm1, %xmm1
    vpxor   %xmm1, %xmm0, %xmm0
    ret
    .cfi_endproc
nand64:
.LFB4594:
    .cfi_startproc
    movq    %rdi, %rax
    andq    %rsi, %rax
    notq    %rax
    ret
    .cfi_endproc
nand32:
.LFB4595:
    .cfi_startproc
    movl    %edi, %eax
    andl    %esi, %eax
    notl    %eax
    ret
    .cfi_endproc
nand16:
.LFB4596:
    .cfi_startproc
    andl    %esi, %edi
    movl    %edi, %eax
    notl    %eax
    ret
    .cfi_endproc
nand8:
.LFB4597:
    .cfi_startproc
    andl    %esi, %edi
    movl    %edi, %eax
    notl    %eax
    ret
    .cfi_endproc

As you can see, the sub-64-bit data types are all handled as 32-bit values (hence the andl and notl), since that appears to be the natural operand width here.

The mov in between is only there because eax is the register that holds a function's return value in the calling convention; otherwise the computation could simply stay in the edi general-purpose register.

For 64 bits, it's the same – just with "quad" (hence, trailing q) words, and rax/rsi instead of eax/edi.

It seems that for 128-bit and larger operands, Intel didn't provide a vector "not" operation; instead, the compiler produces an all-ones register (by comparing a register with itself using vpcmpeqd, or via vpternlogd in the 512-bit case) and XORs with it.

In short: implementing a complicated operation with multiple elementary instructions doesn't necessarily slow things down – there's simply no advantage to a single instruction that does the job of several if it isn't faster.

