Background

We already have a challenge about throwing SIGSEGV, so why not a challenge about throwing SIGILL?

What is SIGILL?

SIGILL is the signal sent when the processor encounters an illegal instruction, which happens very rarely. The default action on receiving SIGILL is to terminate the program and write a core dump. The signal ID of SIGILL is 4. I have absolutely no idea how to generate it in your code except via sudo kill -s 4 <pid>.
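
As a rough illustration of that default action, here is a minimal Python sketch, assuming a POSIX system. A child interpreter sends SIGILL to itself (no illegal instruction is actually executed), and the parent sees a negative return code equal to the signal number. The names CHILD and run_and_get_signal are made up for this example.

```python
import signal
import subprocess
import sys

# Child program: send SIGILL to our own process (nothing illegal is executed).
CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGILL)"

def run_and_get_signal(code):
    """Run `code` in a fresh interpreter; return the signal that killed it, or None."""
    proc = subprocess.run([sys.executable, "-c", code])
    # On POSIX, subprocess reports death by signal N as returncode -N.
    return -proc.returncode if proc.returncode < 0 else None

if __name__ == "__main__":
    print(run_and_get_signal(CHILD) == signal.SIGILL)
```

The parent only inspects the exit status, so this says nothing about which message the shell would print.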

Rules

You will have root in your programs, but if you don't want that for any reason, you may also use a normal user. I'm on a Linux computer with a German locale and I do not know the English text that is displayed after catching SIGILL, but I think it's something like 'Illegal instruction'. The shortest program that throws SIGILL wins.

You might want to clarify whether the instruction has to be generated by the kernel or not. In particular, do you want to allow the program just generating it directly using the libc call raise(SIGILL)? – ais523 2 days ago
    
It really does say Illegal instruction (core dumped). – Erik the Golfer 2 days ago
    
@ais523 Everything is allowed. – Mega Man 2 days ago
2  
"I'm on a German Linux computer"? I think you mean that you have set your locale to German. – Carsten S yesterday
2  
For any hardware that can raise SIGILL, the answer will be the same as the instruction length. Just put an illegal instruction somewhere and try to execute it. The only interesting thing will be the convoluted toolchain involved. – OrangeDog yesterday

17 Answers

PDP-11 Assembler (UNIX Sixth Edition), 1 byte

9

Instruction 9 is not a valid instruction on the PDP-11 (in octal, it would be 000011, which does not appear on the list of instructions (PDF)). The PDP-11 assembler that ships with UNIX Sixth Edition apparently echoes everything it doesn't understand into the file directly; in this case, 9 is a number, so it generates a literal instruction 9. It also has the odd property (unusual in assembly languages nowadays) that files start running from the start, so we don't need any declarations to make the program work.

You can test out the program using this emulator, although you'll have to fight with it somewhat to input the program.

Here's how things end up once you've figured out how to use the filesystem, the editor, the terminal, and similar things that you thought you already knew how to use:

% a.out
Illegal instruction -- Core dumped

I've confirmed with the documentation that this is a genuine SIGILL signal (and it even had the same signal number, 4, all the way back then!)

It had the same signal number because POSIX and UNIX and the SUS are closely related :) – cat 2 days ago
2  
Almost all of the signal numbers in V6 still have the same meanings today; the mnemonics have actually been less stable than the numbers. Compare minnie.tuhs.org/cgi-bin/utree.pl?file=V6/usr/sys/param.h with github.com/freebsd/freebsd/blob/master/sys/sys/signal.h — identical semantics for 1 through 13, but only 1, 2, and 13 have exactly the same names. (SIGALRM/14 and SIGTERM/15 were only added in V7.) (The System V lineage has a couple of changes, notably moving SIGBUS from 10 to 7 (replacing the useless SIGEMT) and SIGSYS above 15, to make room for SIGUSR1 and SIGUSR2.) – zwol yesterday
3  
@cat POSIX and SUS don't actually specify the values of signals - they do specify the meaning of some numbers when passed as arguments to the kill command, but SIGILL is not included. – Random832 yesterday
1  
@Random832: For the record: this is where POSIX documents the required name->number mappings for the kill command. Wikipedia copied that into a table at the end of this section. I'm not sure if anything prevents the kill shell command from remapping the numbers itself to make kill -9 work even if SIGKILL is actually a different number in the C API, but that would be really silly (like Deathstation 9000 kind of behaviour). – Peter Cordes yesterday

C (x86_64, TCC), 7 bytes

main=6;

Inspired by this answer.

How it works

The generated assembly looks like this.

    .globl  main
main:
    .long 6

Note that TCC doesn't place the defined "function" in a data segment.

After compilation, _start will point to main as usual. When the resulting program is executed, it expects code in main and finds the little-endian(!) 32-bit integer 6, which is encoded as 0x06 0x00 0x00 0x00. The first byte – 0x06 – is an invalid opcode, so the program terminates with SIGILL.
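
The byte layout described above can be checked with a small Python sketch. The helper name as_le32 is hypothetical; it only mimics how a 32-bit value is laid out on a little-endian machine and does not execute anything.

```python
import struct

def as_le32(value):
    """Encode an integer as 4 little-endian bytes, like `.long value` on x86."""
    return struct.pack("<I", value)

code_bytes = as_le32(6)
first_opcode = code_bytes[0]  # the byte the CPU would try to decode first

print(" ".join(f"{b:02x}" for b in code_bytes))  # 06 00 00 00
```

On a big-endian layout the 06 would be the last byte rather than the first, which is why the endianness matters here.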

Verification

$ xxd -g 1 ill.c
0000000: 6d 61 69 6e 3d 36 3b                             main=6;
$ tcc ill.c
$ ./a.out
Illegal instruction

C (x86_64, GCC), 13 bytes

const main=6;

Test it with rextester.

How it works

Without the const modifier, the generated assembly looks like this.

    .globl  main
    .data
main:
    .long   6
    .section    .note.GNU-stack,"",@progbits

GCC's linker treats the last line as a hint that the generated object does not require an executable stack. Since main is explicitly placed in a data section, the opcode it contains isn't executable, so the program terminates with SIGSEGV (segmentation fault).

Removing either the second or the last line will make the generated executable work as intended. The last line could be ignored with the compiler flag -z execstack, but this costs 13 bytes.

A shorter alternative is to declare main with the const modifier, resulting in the following assembly.

        .globl  main
        .section    .rodata
main:
        .long   6
        .section    .note.GNU-stack,"",@progbits

This works without any compiler flags. Note that main=6; would place the defined "function" in data, but the const modifier makes GCC place it in rodata instead, which (at least on my platform) is allowed to contain code.

Verification

$ gcc --version
gcc (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064]
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ xxd -g 1 ill.c
0000000: 63 6f 6e 73 74 20 6d 61 69 6e 3d 36 3b           const main=6;
$ gcc ill.c
$ ./a.out
Illegal instruction
    
Love the complete avoidance of even using a function :) How does this work in C terms, though? Does the compiler see that main is a 6 and try to call it (which I guess would make it give up and try the instruction)? – Jack Dobson 2 days ago
5  
@JackDobson it's undefined behavior, so it doesn't work in terms of C; you're at the compiler's mercy. Clang even has a warning for this for some reason: "variable named 'main' with external linkage has undefined behavior". – Bobby Sacamano 2 days ago
1  
GCC will complain about main not being a function but only if you turn on the warnings (either -Wall or -pedantic will do it). – zwol yesterday
    
I think it's pretty standard for executables for Unix-like systems to have text/data/bss segments. The linker places the .rodata section inside the text segment of the executable, and I expect this will be the case on pretty much any platform. (The kernel's program-loader only cares about segments, not sections). – Peter Cordes yesterday
2  
Also note that 06 is only an invalid instruction in x86-64. In 32-bit mode, it's PUSH ES, so this answer only works with compilers that default to -m64. See ref.x86asm.net/coder.html#x06. The only byte sequence that's guaranteed to decode as an illegal instruction on all future x86 CPUs is the 2 byte UD2: 0F 0B. Anything else could be some future prefix or instruction-encoding. Still, upvoted for a cool way to get a C compiler to stick a main label on some bytes! – Peter Cordes yesterday

C (x86_64), 11, 30, 34, or 34+15 = 49 bytes

main[]="/";
c=6;main(){((void(*)())&c)();}
main(){int c=6;((void(*)())&c)();}

I've submitted a couple of solutions that use library functions to throw SIGILL via various means, but arguably that's cheating, in that the library function solves the problem. Here's a range of solutions that use no library functions, and make varying assumptions about where the operating system is willing to let you execute non-executable code. (The constants here are chosen for x86_64, but you could change them to get working solutions for most other processors that have illegal instructions.)

06 is the lowest-numbered byte of machine code that does not correspond to a defined instruction on an x86_64 processor. So all we have to do is execute it. (Alternatively, 2F is also undefined, and corresponds to a single printable ASCII character.) Neither of these is guaranteed to always be undefined, but they aren't defined as of today.

The first program here executes 2F from the read-only data segment. Most linkers aren't capable of producing a working jump from .text to .rodata (or their OS's equivalent) as it's not something that would ever be useful in a correctly segmented program; I haven't found an operating system on which this works yet. You'd also have to allow for the fact that many compilers want the string in question to be a wide string, which would require an additional L; I'm assuming that any operating system that this works on has a fairly outdated view of things, and thus is building for a pre-C94 standard by default. It's possible that there's nowhere this program works, but it's also possible that there's somewhere this program works, and thus I'm listing it in this collection of more-dubious-to-less-dubious potential answers. (After I posted this answer, Dennis also mentioned the possibility main[]={6} in chat, which is the same length, and which doesn't run into problems with character width, and even hinted at the potential for main=6; I can't reasonably claim these answers as mine, as I didn't think of them myself.)

The second program here executes 06 from the read-write data segment. On most operating systems this will cause a segmentation fault, because executable writable data segments are considered a design flaw that makes exploits likely. This hasn't always been the case, though, so it probably works on a sufficiently old version of Linux, but I can't easily test it.

The third program executes 06 from the stack. Again, this causes a segmentation fault nowadays, because the stack is normally classified as nonexecutable for security reasons. The linker documentation I've seen heavily implies that it used to be legal to execute from the stack (unlike the preceding two cases, doing so is occasionally useful), so although I can't test it, I'm pretty sure there's some version of Linux (and probably other operating systems) on which this works.

Finally, if you give -Wl,-z,execstack (15 byte penalty) to gcc (if using GNU ld as part of the backend), it will explicitly turn off executable stack protection, allowing the third program to work and give an illegal instruction signal as expected. I have tested and verified this 49-byte version to work. (Dennis mentions in chat that this option apparently works with main=6, which would give a score of 6+15. I'm pretty surprised that this works, given that the 6 blatantly isn't on the stack; the link option apparently does more than its name suggests.)

On x86-64/linux with gcc6 in its default (lenient) mode, const main=6; works, as do several variations. This linker (which I suspect is also your linker) is capable of generating a jump from .text to .rodata; the problem you were having is that, without const, you're jumping into the writable data segment (.data), which is not executable on modern hardware. It would have worked on older x86, where the memory protection hardware could not mark pages as readable-but-not-executable. – zwol 2 days ago
    
Note that even in C89, main is required to be a function (§5.1.2.2.1) -- I don't know why gcc considers declaring main as a data object to only deserve a warning, and only with -pedantic on the command line. Someone back in the early 1990s perhaps thought that nobody would do that by accident, but it's not like it's a useful thing to do on purpose except for this sort of game. – zwol 2 days ago
    
... Rereading again, it seems you expected main[]="/" to jump to the read-only data segment, because string literals go in rodata. You've been caught out by the difference between char *foo = "..." and char foo[] = "...". char *foo = "..." is syntactic sugar for const char __inaccessible1[] = "..."; char *foo = (char *)&__inaccessible1[0];, so the string literal does go in rodata, and foo is a separate, writable global variable that points to it. With char foo[] = "...", however, the entire array goes in the writable data segment. – zwol 2 days ago

Swift, 5 bytes

[][0]

Access index 0 of an empty array. This calls fatalError(), which prints an error message and crashes with a SIGILL. You can try it here.

This is one of the trickier ones ;) – Mega Man 34 mins ago

GNU C, 25 bytes

main(){__builtin_trap();}

GNU C (a specific dialect of C with extensions) contains an instruction to crash the program intentionally. The exact implementation varies from version to version, but often the developers make an attempt to implement the crash as cheaply as possible, which normally involves the use of an illegal instruction.

The specific version I used to test is gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0; however, this program causes a SIGILL on a fairly wide range of platforms, and thus is fairly portable. Additionally, it does it via actually executing an illegal instruction. Here's the assembly code that the above compiles into with default optimization settings:

main:
    pushq %rbp
    movq %rsp, %rbp
    ud2

ud2 is an instruction that Intel guarantees will always remain undefined.

−6: main(){asm("ud2");} – wchargin 2 days ago
    
Also, I don't know how we count bytes for raw assembly, but 00 00 0f 0b is the machine language for ud2 – wchargin 2 days ago
1  
@wchargin: that's an x86 + GNU C answer. This one is portable to all GNU systems. Also note that UD2 is only 2 bytes. IDK where you got those 00 bytes; they're not part of the machine code for UD2. BTW, as I commented on Dennis's answer, there are one-byte illegal instructions in x86-64 for now, but they're not guaranteed to stay that way. – Peter Cordes yesterday
    
@wchargin: We count bytes for machine-code functions / programs like you'd expect. See some of my answers, like Adler32 in 32 bytes of x86-64 machine code, or GCD in 8 bytes of x86-32 machine code. – Peter Cordes yesterday

Bash on Raspbian on QEMU, 4 (1?) bytes

Not my work. I merely report the work of another. I'm not even in a position to test the claim. Since a crucial part of this challenge seems to be finding an environment where this signal will be raised and caught, I'm not including the size of QEMU, Raspbian, or bash.

On Feb 27, 2013 8:49 pm, user emlhalac reported "Getting 'illegal instruction' when trying to chroot" on the Raspberry Pi fora.

ping

producing

qemu: uncaught target signal 4 (Illegal instruction) - core dumped
Illegal instruction (core dumped)

I imagine much shorter commands will produce this output, for instance, tr.

EDIT: Based on @fluffy's comment, reduced the conjectured lower bound on input length to "1?".

I'd think the [ command would win. :) – fluffy 20 hours ago

GNU as (x86_64), 3 bytes

ud2

$ xxd sigill.S

00000000: 7564 32                                  ud2

$ as --64 sigill.S -o sigill.o ; ld -S sigill.o -o sigill

sigill.S: Assembler messages:
sigill.S: Warning: end of file not at end of a line; newline inserted
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400078

$ ./sigill

Illegal instruction

$ objdump -d sigill

sigill:     file format elf64-x86-64

Disassembly of section .text:

0000000000400078 <__bss_start-0x200002>:
  400078:       0f 0b                   ud2
    
There must be a way to express that in a two-byte "source"... – OrangeDog yesterday
    
Oh, clever. I was wondering if there was a way to build this which would put the entry point at the start of the file (with no declarations) and which wouldn't incur penalties for an unusual build system configuration. I couldn't find one, but it looks like you did. – ais523 17 hours ago

Perl, 9 bytes

kill+4,$$

Simply calls the appropriate library function for signalling a process, and gets the program to signal itself with SIGILL. No actual illegal instructions are involved here, but it produces the appropriate result. (I think this makes the challenge fairly cheap, but if anything's allowed, this is the loophole you'd use…)

Came here to post the same, with a space instead of the +. :) – simbabque yesterday
1  
When people learn Perl for non-golf programming, they learn it with a +. After golfing for a while, they do + occasionally to show off. Eventually, they've written enough programs where they needed to avoid whitespace for some reason or other that the + becomes habit. (It also parses less ambiguously, because it works around triggering the special case in the parser for parentheses.) – ais523 yesterday

Ruby, 13 bytes

`kill -4 #$$`

I guess it's safe to assume that we are running this from a *nix shell. The backtick literal runs the given shell command. $$ is the PID of the running Ruby process, and the # is for string interpolation.


Without calling the shell directly:

Ruby, 17 bytes

Process.kill 4,$$

C (32-bit Windows), 34 bytes

f(i){(&i)[-1]-=9;}main(){f(2831);}

This only works if compiling without optimizations (else, the illegal code in the f function is "optimized out").

Disassembly of the main function looks like this:

68 0f 0b 00 00    push 0b0f
e8 a1 d3 ff ff    call _f
...

We can see that it uses a push instruction with a literal value 0b0f (little-endian, so its bytes are swapped). The call instruction pushes a return address (of the ... instruction), which is situated on the stack near the parameter of the function. By using a [-1] displacement, the function overwrites the return address so it points 9 bytes earlier, where the bytes 0f 0b are.
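
The offset arithmetic can be double-checked with a short Python sketch. It only models the two instructions shown in the disassembly, with offsets measured from the start of the push. The constant names are made up for illustration, and the 5-byte call length is read off the e8 encoding above.

```python
import struct

PUSH_IMM32 = 0x68      # opcode byte of the push seen in the disassembly
CALL_REL32_LEN = 5     # e8 plus a 4-byte displacement

arg = 2831             # the argument passed to f(), i.e. 0x0b0f
push_insn = bytes([PUSH_IMM32]) + struct.pack("<I", arg)  # 68 0f 0b 00 00

# Offsets relative to the start of the push instruction.
return_address = len(push_insn) + CALL_REL32_LEN  # first byte after the call
patched_target = return_address - 9               # effect of the -=9 in f()

landing = push_insn[patched_target:patched_target + 2]
print(hex(arg), patched_target, landing.hex())    # 0xb0f 1 0f0b
```

So the patched return address lands one byte into the push instruction, exactly on the immediate operand.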

These bytes cause an "undefined instruction" exception, as designed.

Microsoft C (Visual Studio 2005 onwards), 16 bytes

main(){__ud2();}

I can't easily test this, but according to the documentation it should produce an illegal instruction by intentionally trying to execute a kernel-only instruction from a user-mode program. (Note that because the illegal instruction crashes the program, we don't have to try to return from main, meaning that this K&R-style main function is valid. Visual Studio never having moved on from C89 is normally a bad thing, but it came in useful here.)

Can you compile for linux with VS2015? Because I do not think SIGILL is defined in Windows, is it? – Andrew Savinykh 14 hours ago

x86 MS-DOS COM file, 2 bytes

EDIT: May not be valid, due to the way DOS processes CPU exceptions. See comments below.

0F 0B

From the documentation:

Generates an invalid opcode. This instruction is provided for software testing to explicitly generate an invalid opcode.

Which is pretty self-explanatory. Save as a .com file, and run in any DOS emulator.

Does it really generate anything like exception or signal? One emulator I have access to just hangs; which one did you use? – anatolyg 2 days ago
    
It technically does cause the processor to generate an illegal instruction exception. However, DOS has very limited memory protection and exception handling capabilities, and I wouldn't be surprised if it just led to undefined behavior / OS crash. The OP didn't say the kernel had to catch the error and print "Illegal Instruction" to the console. – maservant yesterday
2  
I thought of this solution, but I don't believe it's valid. The question requires an illegal instruction signal, not just an illegal instruction processor trap, so the aim was to find an operating system which would actually generate a signal in response to the #UD trap. (Also, I decided to actually test it, and it appeared to throw my DOS emulator into an infinite loop.) – ais523 yesterday
1  
I tested this on actual MS-DOS on an AMD K6-II. Running it just hangs the system, both with and without EMM386 running. (EMM386 traps some errors and halts the system with a message, so it was worth testing to see if it made a difference.) – Mark yesterday
    
Awesome, some people still have actual MS-DOS systems! I'm stuck with an emulator :( . I do kind of agree with ais523, this is not actually a signal, it just triggers an interrupt and passes control to the operating system, which emits the actual signal. If the OS doesn't have a way to handle this situation, it will most likely just crash (which may not count as a solution). – maservant yesterday

AutoIt, 93 bytes

Using flatassembler inline assembly:

#include<AssembleIt.au3>
Func W()
_("use32")
_("ud2")
_("ret")
EndFunc
_AssembleIt("int","W")

When run in SciTE interactive mode, it'll crash immediately. The Windows debugger should pop up for a fraction of a second. The console output will be something like this:

--> Press Ctrl+Alt+Break to Restart or Ctrl+Break to Stop
0x0F0BC3
!>14:27:09 AutoIt3.exe ended.rc:-1073741795

Where -1073741795 is the process exit code reported by Windows. Read as an unsigned 32-bit value it is 0xC000001D, the status code for an illegal instruction.
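
The exit code in the console output is easier to read as an unsigned 32-bit number. A small Python sketch of that conversion follows; to_ntstatus is a hypothetical helper name, and 0xC000001D is the value the Windows headers define as STATUS_ILLEGAL_INSTRUCTION.

```python
def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF

STATUS_ILLEGAL_INSTRUCTION = 0xC000001D  # NTSTATUS value from the Windows headers

code = to_ntstatus(-1073741795)
print(hex(code))  # 0xc000001d
```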

Similar using my own assembler LASM:

#include<LASM.au3>
$_=LASM_ASMToMemory("ud2"&@CRLF&"ret 16")
LASM_CallMemory($_,0,0,0,0)

ELF + x86 machine code, 45 bytes

This should be the smallest executable program on a Unix machine that throws SIGILL (due to Linux not recognizing the executable if made any smaller).

Compile with nasm -f bin -o a.out tiny.asm, tested on an x64 virtual machine.

Actual 45-byte binary:

0000000 457f 464c 0001 0000 0000 0000 0000 0001
0000020 0002 0003 0020 0001 0020 0001 0004 0000
0000040 0b0f c031 cd40 0080 0034 0020 0001

Assembly listing (see source below):

;tiny_sigill.asm      
BITS 32


            org     0x00010000

            db      0x7F, "ELF"             ; e_ident
            dd      1                                       ; p_type
            dd      0                                       ; p_offset
            dd      $$                                      ; p_vaddr 
            dw      2                       ; e_type        ; p_paddr
            dw      3                       ; e_machine
            dd      _start                  ; e_version     ; p_filesz
            dd      _start                  ; e_entry       ; p_memsz
            dd      4                       ; e_phoff       ; p_flags


_start:
                ud2                             ; e_shoff       ; p_align
                xor     eax, eax
                inc     eax                     ; e_flags
                int     0x80
                db      0
                dw      0x34                    ; e_ehsize
                dw      0x20                    ; e_phentsize
                db      1                       ; e_phnum
                                                ; e_shentsize
                                                ; e_shnum
                                                ; e_shstrndx

  filesize      equ     $ - $$

Disclaimer: code from the following tutorial on writing the smallest assembly program to return a number, but using opcode ud2 instead of mov: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html
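
To see how the dump and the listing line up, here is a rough Python sketch that rebuilds the 45 bytes from the dump above and looks up the entry point. It assumes the dump shows 16-bit little-endian words (so the last word carries one padding byte) and uses the ELF32 layout in which e_entry sits at offset 24. The names DUMP_WORDS, FILE_SIZE and LOAD_ADDRESS are made up for illustration.

```python
import struct

# The dump above, copied verbatim: 16-bit little-endian words.
DUMP_WORDS = """
457f 464c 0001 0000 0000 0000 0000 0001
0002 0003 0020 0001 0020 0001 0004 0000
0b0f c031 cd40 0080 0034 0020 0001
""".split()

FILE_SIZE = 45  # the final word includes one padding byte from the dump tool

image = b"".join(struct.pack("<H", int(w, 16)) for w in DUMP_WORDS)[:FILE_SIZE]

LOAD_ADDRESS = 0x00010000                       # the `org` value in the listing
e_entry, = struct.unpack_from("<I", image, 24)  # e_entry is at offset 24 in ELF32
entry_offset = e_entry - LOAD_ADDRESS

print(image[:4], hex(e_entry), image[entry_offset:entry_offset + 2].hex())
```

The two bytes at the entry point are 0f 0b, the encoding of ud2, so execution starts directly on the undefined instruction.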

Java, 48 bytes

void a(){Runtime.getRuntime.exec("kill -4 $$");}

Command stolen from fluffy's answer.

For the sake of doing something more impressive in such a verbose language, here's a 70-byte bonus:

void a(int A){for(;;A++)Runtime.getRuntime.exec("sudo kill -s 4 "+A);}

Goes through ALL processes (including itself) and makes each of them throw a SIGILL. Better brace for a violent crash.

Upvoted for scorched earth solution – Toadfish 6 hours ago

Python, 32 bytes

from os import*;kill(getpid(),4)
    
Same byte count: import os;os.kill(os.getpid(),4) – Oliver 22 hours ago

Any shell (sh, bash, csh, etc.), any POSIX (10 bytes)

Trivial answer but I hadn't seen anyone post it.

kill -4 $$

Just sends SIGILL to the current process. Example output on OSX:

bash-3.2$ kill -4 $$
Illegal instruction: 4