Guide To Distributed Code Jam

What's the least I need to know to participate?

There's a lot of important information in the official Terms and the FAQ, and if you're competing, you should read them. However, here we provide the main pieces of information on how to actually compete. We assume you know what Google Code Jam is about (since you need to do well in Google Code Jam round 1 to be qualified for Distributed Code Jam), so here's the bare bones for the distributed contest:

In Distributed Code Jam, you submit code (and not outputs), and we compile and run your code.
Your code will run on multiple computers (nodes). Use the message library to communicate between nodes. You can send at most 1000 messages from each node, and the total size of all messages sent from a node cannot exceed 8MB, unless specified otherwise in the problem statement. Regardless of the specified total limit, each individual message cannot exceed 8MB. Also, you cannot send a message from node A to node B if there is already a message from A to B that has not been read.
You should not read from standard input. The input will be provided through a library function specified for each problem. Each node will get the same input. You need to include/import the library into your solution, using:
- #include "problem_name.h" in C++
- it gets imported automatically in Java
Exactly one node has to print the correct answer to standard output. The other nodes should print nothing.
When you submit a solution to the small input, you will learn whether it was accepted after two minutes. You cannot submit other solutions to that problem in the meantime, so take care!
All nodes have to finish (with the exit code 0) within the time limit given in the problem.

As an example, we provide a sample solution for a very simple problem – output the sum of all the numbers given in the input – in C++ and Java.

A few more details about the contest, please?

The "message" library

Your program runs on multiple computers (nodes). The message library provides the functions needed for the nodes to identify themselves and communicate. You can find the interface for this library here: C++, Java and you can find usage examples in the "Sum all numbers" example above.

The number of nodes on which your program runs is given in the problem statement. However, you don't need to copy it from there, the NumberOfNodes function will also give you this number. Each node is identified by an "ID", between 0 and NumberOfNodes() - 1, inclusive. The MyNodeId() function will return the ID of a node.

The nodes use messages to communicate. A message is a sequence of values. A node constructs a message using the PutChar, PutInt and PutLL methods (to append a byte, a 32-bit integer and a 64-bit integer, respectively), and then sends it using the Send method. The target of the message can call Receive, and then can retrieve the values using GetChar, GetInt and GetLL. The order in which the values are retrieved has to be the same as the order in which they were inserted; otherwise, the behavior is undefined.

The Send method is non-blocking. That is, when you call Send, the sender does not wait for the receiver to receive the message; it just fires the message off and proceeds. The Receive method, on the other hand, is blocking - that is, when a node calls Receive, it will not return until it actually receives a message. In particular, if the sender never sends anything, the receiver will hang forever, and your solution will be judged as timed out.

Internally, the "message" library has buffers, one buffer for each possible source/target. The buffers for messages on which you called Receive, but didn't yet read the contents; and for the messages which you constructed with Put*, but didn't yet Send, count against your solution's memory limit.

Your submission results

When you submit a solution and it gets judged, you can view the results of the submission in the "View my submissions" section (the link is in the tab on the left). All your submissions will be listed there. The status for each submission can be one of the following:

In progress — your submission is still being judged. If this persists for significantly over two minutes, notify the organizers through the "Ask a question" form.
Correct — your solution was accepted. Congratulations!
Wrong answer — your solution finished successfully, but the output was not the correct answer.
Compilation error — your solution failed to compile. Make sure you have correctly included the input library of the problem and the message library.
Time limit exceeded — the execution of your solution exceeded the allocated time limit.
Output limit exceeded — your solution exceeded the 1MB output limit on one of the nodes.
Runtime error — for instance segmentation faults or Java exceptions.
Rule violation — your solution tried to call illegal instructions like spawning new threads or creating new files.
- Sometimes, exceeding the memory limit triggers language constructs that the sandboxer sees as evidence of an attempt to break out; this results in a "Rule violation" verdict.
- The "clock" call gets reported as "Rule violation" in C++.
- Some runtime errors in Java (most notably reading outside the bounds of a received message) can be judged as a Rule violation.
- If you encounter this error, you should continue to participate in the round; we are aware that there may be false positives. We will work to eliminate these for the 2017 contest.
Messages count limit exceeded — one of the nodes sent too many messages (remember, the default limit is 1000).
Messages size limit exceeded — one of the nodes tried to send too much data. The limit is a total of 8MB from each node (that is, the total size of all messages sent cannot exceed 8MB), unless stated otherwise in the problem statement. Regardless of the specified total limit, each individual message cannot exceed 8MB. Also, you cannot send a message from node A to node B if there is already a message from A to B that has not been read.

Small and large inputs

Similarly to the main track of Code Jam, Distributed Code Jam has small and large inputs, typically one of each for every problem. The small input is intended to have lower limits or be simpler in some other way.

You can submit solutions to the small input for a problem many times. Each solution you submit will get judged in approximately 2 minutes, and you will only be able to submit a new solution once the previous one is judged. Once a submission to the small input is judged correct, you will get points for this input, and you will get penalty time for every previous incorrect submission, as in the main Code Jam track.

You will still be allowed to submit solutions to the small input after one of your submissions is accepted, although the subsequent submissions will not affect your score or penalty time in any way. You can use this option to test the code you intend to submit for the large input. Note that this option is still subject to the two-minute cooldown period.

For the large input, your solution will get judged at the end of the contest. You can resubmit the large input, and only your last submission will be judged by our judging system. Even if you submit a correct solution, but then submit an incorrect one, you will not get points for the large input, so be careful! Similarly, the penalty time (if your submission is correct) will be calculated from the last submission you send.

As in the main track of Code Jam, the scoreboard will show "optimistic" results for all the large submissions.

How fast is the message transfer?

In the first approximation, around 5ms pass between a message being sent and being received, out of which around 3.7ms is spent "in the network" – that is, not blocking either the sender or the receiver. The details depend on the specifics of the traffic pattern (just as the details of the speed of processing depend on details like memory access patterns).

As examples, we provide a few simple benchmarks:

Technical information

The solutions will be judged on a 64-bit Linux system. On each node, the program will get limited time and memory (the limits will be specified in the problem statement), as well as have limited network bandwidth (up to 1000 sent messages and 8MB total size, unless specified otherwise in the problem statement).

Solutions in C++ are allowed to use STL. Every node is allowed to write up to 1MB to stderr, this will be ignored. If a solution exceeds 1MB of stderr or stdout output, it will be judged incorrect. If a solution writes to stdout on more than one node, it will be also judged incorrect.

The solutions should not:

spawn new threads or processes
execute other programs
use inlined assembler (in C/C++)
use network functions (like socket) – all network communication should go through the message library
open files (in particular, using temporary files is not allowed)
threaten the system security
wait for user interaction, in particular read from stdin

Any of the above can cause the solution to be judged incorrect with a "Rules Violation" verdict, and can result in disqualification if judged as malicious (based on 7.1C of the Terms and Conditions).

The size of the submitted code cannot exceed 100KB, and the size of the compiled code (for compiled languages) cannot exceed 10MB. The compilation time cannot exceed 30 seconds.

The solutions will be compiled as follows:

g++ -std=gnu++0x -O2 -static -lm, using gcc version 4.7.2 for C++
javac, using openjdk 2.2.5 for Java

Local testing tool

You can test your solution locally without submitting the code and receiving penalty time with our local testing tool. Simply download and unpack one of the packages available below. A few warnings:

The tool is provided on an as-is basis, and we can't guarantee it will work as intended on a particular platform.
Under the hood, the tool spawns a separate process for each node you require. If you have too many nodes, or they use too much CPU/memory, running the tool can freeze or crash your machine.
The runtimes with the tool will likely be different than the runtimes on our testing platform. The tool is intended mainly to allow you to estimate the correctness of your solutions, not their performance.

Usage is quite straightforward:

alias dcj='{unpack_directory}/dcj.sh'
dcj build --source=solution.c
dcj run --executable=./solution --nodes=100 --output=all

You can also run dcj test, which bundles the build and run commands together.

The packages are available here:

In order to use the tool, you first need to upack the files in a directory convenient for you. You may read the description of how to run the tool by running the following command: python path_of_directory/dcj.py --help.

In case you encounter any problems, or we have not published a package built for your operating system, we encourage you to read the README file from the package. You can build the tools yourself, as well as modifying the configuration (config.json) file.

To make Java work on Windows, you will have to modify paths and commands in build.py. We are working on a version which would work for common Windows configurations. One option is to use MinGW instead of cygwin. Make sure to add MinGW's environment path before cygwin's. Then, "python dcj.py test --source sandwich.cpp --nodes 10" works (because we can't use shell scripts in MinGW).

You should provide the input to your solution, in a similar format to what you can download from the webpage. More precisely:

For C++:
- You should have a problem_name.h (like the downloadable example inputs) available in the same directory as your solution file that contains the definitions of the input methods.
- Run dcj test --source your_source.c --nodes 10 to test your program.
For Java:
- Your solution file should be named Main.java.
- Put the problem_name.java file (like the downloadable example inputs) containing the class defining the input methods in the same directory as your solution.
- Run dcj test --source Main.java --library problem_name.java --nodes 10 to test your program.