Spring Cloud Data Flow

Spring Cloud Data Flow is the cloud-native redesign of Spring XD, a project that aimed to simplify the development of Big Data applications. The stream and batch modules from Spring XD have been refactored as Spring Boot-based stream and task/batch microservice applications, respectively. These applications are now autonomous deployment units that can run natively in modern runtimes such as Cloud Foundry, Apache YARN, Apache Mesos, and Kubernetes.

Spring Cloud Data Flow offers a collection of patterns and best practices for microservices-based distributed streaming and task/batch data pipelines.

Features

Develop using the DSL, the REST APIs, the Dashboard, and Flo, the drag-and-drop GUI
Create, unit-test, troubleshoot and manage microservice applications in isolation
Build data pipelines rapidly using the out-of-the-box stream and task/batch applications
Consume microservice applications as Maven or Docker artifacts
Scale data pipelines without interrupting data flows
Orchestrate data-centric applications on a variety of modern runtime platforms including Cloud Foundry, Apache YARN, Apache Mesos, and Kubernetes
Take advantage of metrics, health checks, and the remote management of each microservice application

Quick Start

Step 1 - Download the Spring Cloud Data Flow Local Server and Shell apps:

wget http://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-server-local/1.1.2.RELEASE/spring-cloud-dataflow-server-local-1.1.2.RELEASE.jar

wget http://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-shell/1.1.2.RELEASE/spring-cloud-dataflow-shell-1.1.2.RELEASE.jar
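Both download URLs in Step 1 follow the same pattern, so the two jars can be kept on the same version by building the URLs from a single variable. The following is a minimal sketch, not an official script: the helper name artifact_url and the variables SCDF_VERSION and REPO are made up for illustration, and the URL layout is simply copied from the two wget commands above.

```shell
# Version and repository base path, taken from the wget commands in Step 1.
SCDF_VERSION="1.1.2.RELEASE"
REPO="http://repo.spring.io/release/org/springframework/cloud"

# Hypothetical helper: prints the download URL for an artifact name ($1)
# and version ($2), following the layout <repo>/<name>/<version>/<name>-<version>.jar.
artifact_url() {
  printf '%s/%s/%s/%s-%s.jar\n' "$REPO" "$1" "$2" "$1" "$2"
}

# Print the two URLs used in Step 1 (nothing is downloaded here).
artifact_url spring-cloud-dataflow-server-local "$SCDF_VERSION"
artifact_url spring-cloud-dataflow-shell "$SCDF_VERSION"
```

To download, pass the result to wget, for example: wget "$(artifact_url spring-cloud-dataflow-shell "$SCDF_VERSION")"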

Step 2 - Download and start Kafka 0.10, which is used as the messaging middleware

Step 3 - Launch the Data Flow Local Server:

java -jar spring-cloud-dataflow-server-local-1.1.2.RELEASE.jar

Step 4 - Launch the Shell on the same machine where the Data Flow Local Server is running:

java -jar spring-cloud-dataflow-shell-1.1.2.RELEASE.jar

Step 5 - Import all the out-of-the-box application coordinates in bulk

dataflow:>app import --uri http://bit.ly/Avogadro-GA-stream-applications-kafka-10-maven

Step 6 - Create the ‘ticktock’ stream:

dataflow:>stream create ticktock --definition "time | log" --deploy

You'll notice the following in the Local Server console:

2016-07-18 22:08:24.777  INFO 73058 --- [nio-9393-exec-9] o.s.c.d.spi.local.LocalAppDeployer       : deploying app ticktock.log instance 0
   Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-dataflow-5011521526937452211/ticktock-1468904904769/ticktock.log
2016-07-18 22:08:25.081  INFO 73058 --- [nio-9393-exec-9] o.s.c.d.spi.local.LocalAppDeployer       : deploying app ticktock.time instance 0
   Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-dataflow-5011521526937452211/ticktock-1468904905074/ticktock.time
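The console output above shows that each application in the "time | log" definition is deployed under a name of the form <stream>.<app> (ticktock.log and ticktock.time). The sketch below illustrates that naming for simple definitions only. The function stream_app_names is hypothetical and is not part of Spring Cloud Data Flow; it takes the first word of each pipe-separated segment as the app name and ignores anything that follows it, so it does not cover the full stream DSL.

```shell
# Hypothetical helper: lists the per-app deployment names implied by a
# simple stream definition. $1 is the stream name, $2 is the definition.
stream_app_names() {
  stream=$1
  definition=$2
  # Turn each pipe into a newline so that every segment is on its own line.
  printf '%s\n' "$definition" | tr '|' '\n' | while read -r app rest; do
    # The first word of a segment is treated as the app name; the
    # remainder of the segment is ignored in this sketch.
    if [ -n "$app" ]; then
      printf '%s.%s\n' "$stream" "$app"
    fi
  done
}

# The stream created in Step 6.
stream_app_names ticktock "time | log"
```

The names are printed in definition order (source first), whereas the server log above happens to show the log app being deployed before the time app.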

Step 7 - Verify the ‘ticktocks’:

tail -f /var/folders/ ... /ticktock.log/stdout_0.log
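The log directory differs on every run, so it can be convenient to derive the file to tail from the "Logs will be in" lines that the Local Server prints. The following is a rough sketch under two assumptions: the console output has been captured (for example, copied into a file or piped in), and the stdout file for instance 0 is named stdout_0.log inside the reported directory, as in Step 7. The function name stdout_logs is made up for illustration.

```shell
# Hypothetical helper: reads Local Server console output on stdin and
# prints, for each "Logs will be in <dir>" line, the path <dir>/stdout_0.log.
stdout_logs() {
  sed -n 's|^[[:space:]]*Logs will be in \(.*\)$|\1/stdout_0.log|p'
}

# Example using the two console lines shown for the log app.
stdout_logs <<'EOF'
2016-07-18 22:08:24.777  INFO 73058 --- [nio-9393-exec-9] o.s.c.d.spi.local.LocalAppDeployer       : deploying app ticktock.log instance 0
   Logs will be in /var/folders/c3/ctx7_rns6x30tq7rb76wzqwr0000gp/T/spring-cloud-dataflow-5011521526937452211/ticktock-1468904904769/ticktock.log
EOF
```

The printed path can then be passed to tail -f.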

Step 8 - Launch Dashboard at: http://localhost:9393/dashboard

Spring Cloud Data Flow Implementations

Server Type	Stable Release	Milestone Release
Local Server	1.1.2.RELEASE [docs]	1.2.0.BUILD-SNAPSHOT [docs]
Cloud Foundry Server	1.1.0.RELEASE [docs]	1.2.0.BUILD-SNAPSHOT [docs]
Apache YARN Server	1.1.0.RELEASE [docs]	1.1.1.BUILD-SNAPSHOT [docs]
Kubernetes Server	1.1.1.RELEASE [docs]	1.1.2.BUILD-SNAPSHOT [docs]
Apache Mesos Server	1.0.0.RELEASE [docs]	1.1.0.BUILD-SNAPSHOT [docs]

Community Spring Cloud Data Flow Implementations

Spring Cloud Data Flow for HashiCorp Nomad

Spring Cloud Data Flow for Red Hat OpenShift

Building Blocks of Spring Cloud Data Flow

Spring Cloud Data Flow builds upon several projects, and the top-level building blocks of the ecosystem are listed in the following visual representation. Each project represents a core capability and evolves in isolation, with its own release cadence. Follow the links to find more details about each project.

Spring Cloud Data Flow Local Server

Spring Cloud Data Flow Cloud Foundry Server

Spring Cloud Data Flow Apache YARN Server

Spring Cloud Data Flow Kubernetes Server

Spring Cloud Data Flow Apache Mesos Server