Background

We already have a challenge about throwing SIGSEGV, so why no challenge about throwing SIGILL?

What is SIGILL?

SIGILL is the signal a process receives when the processor encounters an illegal instruction, which happens very rarely. The default action on receiving SIGILL is to terminate the program and write a core dump. The signal ID of SIGILL is 4. I have absolutely no idea how to generate it from your own code, other than via sudo kill -s 4 <pid>.
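The `kill -s 4 <pid>` route can be sketched in Python. This is only an illustration, assuming a POSIX system and Python 3.8 or later; the helper name `signal_child` and the 30-second sleep are made up for the example.

```python
import signal
import subprocess
import sys


def signal_child(signum: int) -> int:
    """Start a sleeping child process, send it `signum`, return its exit status."""
    # The child only sleeps; it never executes an illegal instruction itself.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    child.send_signal(signum)  # same effect as `kill -s <signum> <pid>`
    return child.wait()


if __name__ == "__main__":
    status = signal_child(signal.SIGILL)
    # On POSIX, subprocess reports "killed by signal N" as a return code of -N.
    print(int(signal.SIGILL), signal.strsignal(signal.SIGILL), status)
```

The child is terminated by the default action for SIGILL even though no illegal instruction was ever executed, which is why the comments below ask whether the signal has to come from the kernel.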

Rules

You will have root in your programs, but if you don't want it for any reason, you may also use a normal user. I'm on a German Linux computer and I do not know the English text that is displayed when a program dies from SIGILL, but I think it's something like 'Illegal instruction'. The shortest program that throws SIGILL wins.

1  
You might want to clarify whether the signal has to be generated by the kernel or not. In particular, do you want to allow the program to generate it directly using the libc call raise(SIGILL)? – ais523 18 hours ago
    
It really does say Illegal instruction (core dumped). – Erik the Golfer 18 hours ago
    
@ais523 Everything is allowed. – Mega Man 17 hours ago

11 Answers

PDP-11 Assembler (UNIX Sixth Edition), 1 byte

9

Instruction 9 is not a valid instruction on the PDP-11 (in octal, it would be 000011, which does not appear on the list of instructions (PDF)). The PDP-11 assembler that ships with UNIX Sixth Edition apparently echoes everything it doesn't understand into the file directly; in this case, 9 is a number, so it generates a literal instruction 9. It also has the odd property (unusual in assembly languages nowadays) that execution begins at the start of the file, so we don't need any declarations to make the program work.
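The octal form quoted above can be checked with a couple of lines of Python. This is just a sketch of the arithmetic, assuming Python 3.8 or later; the variable names are illustrative and nothing here runs PDP-11 code.

```python
import struct

INSTRUCTION = 9  # the value the one-byte source file "9" assembles to

# PDP-11 instruction words are 16 bits, conventionally written as six octal digits.
octal_word = format(INSTRUCTION, "06o")

# The same word as it sits in memory, low byte first (PDP-11 words are little-endian).
memory_bytes = struct.pack("<H", INSTRUCTION)

print(octal_word, memory_bytes.hex(" "))
```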

You can test out the program using this emulator, although you'll have to fight with it somewhat to input the program.

Here's how things end up once you've figured out how to use the filesystem, the editor, the terminal, and similar things that you thought you already knew how to use:

% a.out
Illegal instruction -- Core dumped

I've confirmed with the documentation that this is a genuine SIGILL signal (and it even had the same signal number, 4, all the way back then!)

    
It had the same signal number because POSIX and UNIX and the SUS are closely related :) – cat 14 hours ago
2  
Almost all of the signal numbers in V6 still have the same meanings today; the mnemonics have actually been less stable than the numbers. Compare minnie.tuhs.org/cgi-bin/utree.pl?file=V6/usr/sys/param.h with github.com/freebsd/freebsd/blob/master/sys/sys/signal.h — identical semantics for 1 through 13, but only 1, 2, and 13 have exactly the same names. (SIGALRM/14 and SIGTERM/15 were only added in V7.) (The System V lineage has a couple of changes, notably moving SIGBUS from 10 to 7 (replacing the useless SIGEMT) and SIGSYS above 15, to make room for SIGUSR1 and SIGUSR2.) – zwol 8 hours ago
2  
@cat POSIX and SUS don't actually specify the values of signals - they do specify the meaning of some numbers when passed as arguments to the kill command, but SIGILL is not included. – Random832 5 hours ago
    
@Random832: For the record: this is where POSIX documents the required name->number mappings for the kill command. Wikipedia copied that into a table at the end of this section. I'm not sure if anything prevents the kill shell command from remapping the numbers itself to make kill -9 work even if SIGKILL is actually a different number in the C API, but that would be really silly (like Deathstation 9000 kind of behaviour). – Peter Cordes 2 hours ago

C (TCC), 7 bytes

main=6;

Inspired by this answer.

How it works

The generated assembly looks like this.

    .globl  main
main:
    .long 6

Note that TCC doesn't place the defined "function" in a data segment.

After compilation, _start will point to main as usual. When the resulting program is executed, it expects code in main and finds the little-endian(!) 32-bit integer 6, which is encoded as 0x06 0x00 0x00 0x00. The first byte – 0x06 – is an invalid opcode, so the program terminates with SIGILL.
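The byte order described above is easy to verify. The following Python sketch (assuming Python 3.8 or later, with illustrative variable names) packs the integer 6 the way a little-endian target stores it and, for contrast, the way a big-endian target would.

```python
import struct

VALUE = 6  # the initializer from `main=6;`

# A 32-bit little-endian integer stores its least significant byte first.
little = struct.pack("<I", VALUE)

# On a big-endian target the first byte would be 0x00 instead.
big = struct.pack(">I", VALUE)

first_byte = little[0]  # the byte the CPU tries to decode as an opcode
print(little.hex(" "), big.hex(" "), hex(first_byte))
```

This is why the little-endian layout matters: only there does execution start at the 0x06 byte.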

Verification

$ xxd -g 1 ill.c
0000000: 6d 61 69 6e 3d 36 3b                             main=6;
$ tcc ill.c
$ ./a.out
Illegal instruction

C (GCC), 13 bytes

const main=6;

Test it with rextester.

How it works

Without the const modifier, the generated assembly looks like this.

    .globl  main
    .data
main:
    .long   6
    .section    .note.GNU-stack,"",@progbits

GCC's linker treats the last line as a hint that the generated object does not require an executable stack. Since main is explicitly placed in a data section, the opcode it contains isn't executable, so the program terminates with SIGSEGV (segmentation fault).

Removing either the second or the last line will make the generated executable work as intended. The last line could be ignored with the compiler flag -z execstack, but this costs 13 bytes.

A shorter alternative is to declare main with the const modifier, resulting in the following assembly.

        .globl  main
        .section    .rodata
main:
        .long   6
        .section    .note.GNU-stack,"",@progbits

This works without any compiler flags. Note that main=6; would write the defined "function" in data, but the const modifier makes GCC write it in rodata instead, which (at least on my platform) is allowed to contain code.

Verification

$ gcc --version
gcc (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064]
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ xxd -g 1 ill.c
0000000: 63 6f 6e 73 74 20 6d 61 69 6e 3d 36 3b           const main=6;
$ gcc ill.c
$ ./a.out
Illegal instruction
    
Love the complete avoidance of even using a function :) How does this work in C terms, though? Does the compiler see that main is a 6 and try to call it (which I guess would make it give up and try the instruction)? – Jack Dobson 12 hours ago
4  
@JackDobson it's undefined behavior, so it doesn't work in terms of C; you're at the compiler's mercy. Clang even has a warning for this for some reason: "variable named 'main' with external linkage has undefined behavior". – Bobby Sacamano 11 hours ago
    
If you compile with -z execstack, it'll give you an illegal instruction error in gcc and clang, without the const qualifier. I guess that must just set the program's non-heap data segment to text. – Bobby Sacamano 11 hours ago
    
@BobbySacamano Yes, precisely. I was just writing up what GCC does internally. – Dennis 11 hours ago
    
@JackDobson I've added an explanation. – Dennis 11 hours ago

C (x86_64), 11, 30, 34, or 34+15 = 49 bytes

main[]="/";
c=6;main(){((void(*)())&c)();}
main(){int c=6;((void(*)())&c)();}

I've submitted a couple of solutions that use library functions to throw SIGILL via various means, but arguably that's cheating, in that the library function solves the problem. Here's a range of solutions that use no library functions, and make varying assumptions about where the operating system is willing to let you execute non-executable code. (The constants here are chosen for x86_64, but you could change them to get working solutions for most other processors that have illegal instructions.)

06 is the lowest-numbered byte of machine code that does not correspond to a defined instruction on an x86_64 processor. So all we have to do is execute it. (Alternatively, 2F is also undefined, and corresponds to a single printable ASCII character.) Neither of these is guaranteed to remain undefined forever, but they aren't defined as of today.

The first program here executes 2F from the read-only data segment. Most linkers aren't capable of producing a working jump from .text to .rodata (or their OS's equivalent) as it's not something that would ever be useful in a correctly segmented program; I haven't found an operating system on which this works yet. You'd also have to allow for the fact that many compilers want the string in question to be a wide string, which would require an additional L; I'm assuming that any operating system that this works on has a fairly outdated view of things, and thus is building for a pre-C94 standard by default. It's possible that there's nowhere this program works, but it's also possible that there's somewhere this program works, and thus I'm listing it in this collection of more-dubious-to-less-dubious potential answers. (After I posted this answer, Dennis also mentioned the possibility main[]={6} in chat, which is the same length, and which doesn't run into problems with character width, and even hinted at the potential for main=6; I can't reasonably claim these answers as mine, as I didn't think of them myself.)

The second program here executes 06 from the read-write data segment. On most operating systems this will cause a segmentation fault, because data segments that are both writable and executable are considered a design flaw that makes exploits likely. This hasn't always been the case, though, so it probably works on a sufficiently old version of Linux, but I can't easily test it.

The third program executes 06 from the stack. Again, this causes a segmentation fault nowadays, because the stack is normally marked as non-executable for security reasons. The linker documentation I've seen heavily implies that it used to be legal to execute from the stack (unlike the preceding two cases, doing so is occasionally useful), so although I can't test it, I'm pretty sure there's some version of Linux (and probably other operating systems) on which this works.

Finally, if you give -Wl,-z,execstack (15 byte penalty) to gcc (if using GNU ld as part of the backend), it will explicitly turn off executable stack protection, allowing the third program to work and give an illegal operation signal as expected. I have tested and verified this 49-byte version to work. (Dennis mentions in chat that this option apparently works with main=6, which would give a score of 6+15. I'm pretty surprised that this works, given that the 6 blatantly isn't on the stack; the link option apparently does more than its name suggests.)

    
On x86-64/linux with gcc6 in its default (lenient) mode, const main=6; works, as do several variations. This linker (which I suspect is also your linker) is capable of generating a jump from .text to .rodata; the problem you were having is that, without const, you're jumping into the writable data segment (.data), which is not executable on modern hardware. It would have worked on older x86, where the memory protection hardware could not mark pages as readable-but-not-executable. – zwol 12 hours ago
    
Note that even in C89, main is required to be a function (§5.1.2.2.1) -- I don't know why gcc considers declaring main as a data object to only deserve a warning, and only with -pedantic on the command line. Someone back in the early 1990s perhaps thought that nobody would do that by accident, but it's not like it's a useful thing to do on purpose except for this sort of game. – zwol 11 hours ago
    
... Rereading again, it seems you expected main[]="/" to jump to the read-only data segment, because string literals go in rodata. You've been caught out by the difference between char *foo = "..." and char foo[] = "...". char *foo = "..." is syntactic sugar for const char __inaccessible1[] = "..."; char *foo = (char *)&__inaccessible1[0];, so the string literal does go in rodata, and foo is a separate, writable global variable that points to it. With char foo[] = "...", however, the entire array goes in the writable data segment. – zwol 11 hours ago

GNU C, 25 bytes

main(){__builtin_trap();}

GNU C (a specific dialect of C with extensions) contains a built-in function, __builtin_trap, to crash the program intentionally. The exact implementation varies from version to version, but often the developers make an attempt to implement the crash as cheaply as possible, which normally involves the use of an illegal instruction.

The specific version I used to test is gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0; however, this program causes a SIGILL on a fairly wide range of platforms, and thus is fairly portable. Additionally, it does so by actually executing an illegal instruction. Here's the assembly code that the above compiles into with default optimization settings:

main:
    pushq %rbp
    movq %rsp, %rbp
    ud2

ud2 is an instruction that Intel guarantees will always remain undefined.

1  
−6: main(){asm("ud2");} – wchargin 12 hours ago
    
Also, I don't know how we count bytes for raw assembly, but 0f 0b is the machine language for ud2 – wchargin 12 hours ago

Microsoft C (Visual Studio 2005 onwards), 16 bytes

main(){__ud2();}

I can't easily test this, but according to the documentation it should produce an illegal instruction by intentionally trying to execute a kernel-only instruction from a user-mode program. (Note that because the illegal instruction crashes the program, we don't have to try to return from main, meaning that this K&R-style main function is valid. Visual Studio never having moved on from C89 is normally a bad thing, but it came in useful here.)


Perl, 9 bytes

kill+4,$$

Simply calls the appropriate library function for signalling a process, and gets the program to signal itself with SIGILL. No actual illegal instructions are involved here, but it produces the appropriate result. (I think this makes the challenge fairly cheap, but if anything's allowed, this is the loophole you'd use…)
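For comparison, the same self-signalling approach can be sketched in Python. This is not a golfed answer, just an illustration assuming a POSIX system; the names `CHILD_SOURCE` and `run_self_signalling_child` are made up, and the signalling is done in a child interpreter so that the observing process survives.

```python
import signal
import subprocess
import sys

# Program run in a child interpreter: it looks up its own PID and signals itself,
# mirroring Perl's `kill 4, $$`.
CHILD_SOURCE = "import os, signal; os.kill(os.getpid(), signal.SIGILL)"


def run_self_signalling_child() -> int:
    """Run CHILD_SOURCE in a fresh interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", CHILD_SOURCE]).returncode


if __name__ == "__main__":
    # On POSIX, a return code of -N means the child was killed by signal N.
    print(run_self_signalling_child() == -signal.SIGILL)
```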


Ruby, 13 bytes

`kill -4 #$$`

I guess it's safe to assume that we are running this from a *nix shell. The backtick literal runs the given shell command. $$ is the PID of the running Ruby process, and the # is for string interpolation.


Without calling the shell directly:

Ruby, 17 bytes

Process.kill 4,$$
    
This is not guaranteed to be portable to all POSIX systems; The table here lists n/a in the "portable number" column for SIGILL, unlike 9=SIGKILL and 2=SIGINT across all systems, and a few other numbers which are defined by POSIX. Of course, most of these answers are x86-only, so that's not a problem, just something you should mention. SIGILL=4 on x86 Linux at least, and probably many other systems. edit: the question says SIGILL=4, so I should comment there. – Peter Cordes 2 hours ago

x86 MS-DOS COM file, 2 bytes

0F 0B

From the documentation:

Generates an invalid opcode. This instruction is provided for software testing to explicitly generate an invalid opcode.

Which is pretty self-explanatory. Save as a .com file, and run in any DOS emulator.

    
Does it really generate anything like exception or signal? One emulator I have access to just hangs; which one did you use? – anatolyg 9 hours ago
    
It technically does cause the processor to generate an illegal instruction exception. However, DOS has very limited memory protection and exception handling capabilities, and I wouldn't be surprised if it just led to undefined behavior / OS crash. The OP didn't say the kernel had to catch the error and print "Illegal Instruction" to the console. – maservant 7 hours ago
    
I thought of this solution, but I don't believe it's valid. The question requires an illegal instruction signal, not just an illegal instruction processor trap, so the aim was to find an operating system which would actually generate a signal in response to the #UD trap. (Also, I decided to actually test it, and it appeared to throw my DOS emulator into an infinite loop.) – ais523 6 hours ago
    
I tested this on actual MS-DOS on an AMD K6-II. Running it just hangs the system, both with and without EMM386 running. (EMM386 traps some errors and halts the system with a message, so it was worth testing to see if it made a difference.) – Mark 2 hours ago

Swift, 5 bytes

[][0]

Access index 0 of an empty array. This calls fatalError(), which prints an error message and crashes with a SIGILL. You can try it here.


C (32-bit Windows), 34 bytes

f(i){(&i)[-1]-=9;}main(){f(2831);}

This only works if compiling without optimizations (else, the illegal code in the f function is "optimized out").

Disassembly of the main function looks like this:

68 0f 0b 00 00    push 0b0f
e8 a1 d3 ff ff    call _f
...

We can see that it uses a push instruction with the literal value 0b0f (stored little-endian, so its bytes appear swapped as 0f 0b). The call instruction pushes a return address (that of the ... instruction), which sits on the stack right next to the function's parameter. By using a [-1] displacement, the function overwrites the return address so that it points 9 bytes earlier, where the bytes 0f 0b are.

These bytes cause an "undefined instruction" exception, as designed.
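The offset arithmetic can be checked against the ten bytes shown in the disassembly. The Python sketch below only models those bytes (assuming Python 3.8 or later); it does not execute them, and the variable names are illustrative.

```python
import struct

ARGUMENT = 2831  # the value passed to f() in the answer

# Machine code for `push 0b0f` followed by `call _f`, copied from the disassembly.
code = bytes.fromhex("68 0f 0b 00 00 e8 a1 d3 ff ff")

# push imm32 is one opcode byte (0x68) followed by a 4-byte little-endian immediate.
immediate = code[1:5]

# The return address pushed by `call` is the address of the next instruction,
# i.e. offset 10 counted from the start of the push.
return_offset = len(code)

# f() subtracts 9 from the saved return address.
target_offset = return_offset - 9

print(hex(ARGUMENT), immediate.hex(" "), target_offset,
      code[target_offset:target_offset + 2].hex(" "))
```

The new return address lands one byte past the push opcode, which is exactly where the immediate begins, so the first two bytes decoded there are 0f 0b.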


AutoIt, 93 bytes

Using flatassembler inline assembly:

#include<AssembleIt.au3>
Func W()
_("use32")
_("ud2")
_("ret")
EndFunc
_AssembleIt("int","W")

When run in SciTE interactive mode, it'll crash immediately. The Windows debugger should pop up for a fraction of a second. The console output will be something like this:

--> Press Ctrl+Alt+Break to Restart or Ctrl+Break to Stop
0x0F0BC3
!>14:27:09 AutoIt3.exe ended.rc:-1073741795

Here -1073741795 is the process exit code reported by Windows. Read as an unsigned 32-bit value it is 0xC000001D, which matches the NTSTATUS code STATUS_ILLEGAL_INSTRUCTION.
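The two numbers in that console output can be decoded with a few lines of Python. This is only a sketch of the arithmetic; the constant name `STATUS_ILLEGAL_INSTRUCTION` and its value 0xC000001D are taken from the Windows NTSTATUS definitions as I understand them, not from the AutoIt output itself.

```python
EXIT_CODE = -1073741795  # the rc value from the console output

# Reinterpret the signed 32-bit exit code as an unsigned 32-bit value.
unsigned_code = EXIT_CODE & 0xFFFFFFFF

# Assumption: this is the NTSTATUS value documented for an illegal instruction.
STATUS_ILLEGAL_INSTRUCTION = 0xC000001D

# The "0x0F0BC3" line is the assembled function: 0F 0B is ud2, C3 is ret.
assembled = bytes.fromhex("0F0BC3")

print(f"0x{unsigned_code:08X}", unsigned_code == STATUS_ILLEGAL_INSTRUCTION,
      assembled[:2].hex(" "), hex(assembled[2]))
```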

Similar using my own assembler LASM:

#include<LASM.au3>
$_=LASM_ASMToMemory("ud2"&@CRLF&"ret 16")
LASM_CallMemory($_,0,0,0,0)
