STATUS
Overview
OpenStack is a free and open software suite for providing Infrastructure as a Service (IaaS), both for private clouds like the one we're implementing and for large public clouds like Rackspace. We are currently using the "Nova" compute service, "Keystone" identity service, "Horizon" web dashboard, "Cinder" volume service, "Glance" image management, "Neutron" networking service, and "Heat" orchestration service. We are currently running the "Juno" release (aka 2014.2, released October 2014) on Ubuntu 14.04 LTS "Trusty" with KVM as the virtualization layer.
NOTE: The Kilo upgrade (see OpenStackKiloUgrade) is scheduled for Monday, August 24, 2015.
As of December 2014 we have 76 physical nodes, with 1,000 physical cores (presenting as 2000 due to hyperthreading) and 6.75T RAM.
-- JonProulx - 12 Dec 2014
Caveats
Join the Openstack-Users email list! This is where announcements of potentially disruptive changes will go, and it is also the place to ask questions and suggest improvements to the system.
By default all VMs are ephemeral and their state is deleted on shutdown. Please read this documentation to find out how to snapshot your instances and how to create and attach persistent storage if you need to preserve local state.
By default all network access to instances is blocked; the Security Groups section below describes how to open up access to your systems.
Except for UPS instance types, the cloud systems do not and will not have battery or generator backup; if we lose power, that's too bad. Individual physical nodes are single points of failure for the virtual instances running on them. At this point it is left as an exercise for the user to implement HA clustering across multiple VMs, or to monitor their services and restart them if they fail. If your system is a single VM and is not fault tolerant, be sure to use one of the ups instance types.
Getting a CSAIL OpenStack Account
You can register for an OpenStack account at
https://cloud-registration.csail.mit.edu (requires a
CSAIL Web Certificate). Once you have an account, the same link can
be used to reset your password if you forget it.
On registration you will be given a personal 'usersandbox' project with a
small quota to play around with. If your group has an existing
OpenStack project (and it allows self registration, which is the
default) then you can also elect to join that project during the
registration process.
Users can belong to multiple projects, even though our web wrapper
only allows you to pick one. If you need to be in multiple groups,
email help@csail.mit.edu; for existing groups, have a current member or
sponsoring PI make the request for you so we know it's authorized.
If your group doesn't have an OpenStack project yet, email
help@csail.mit.edu with the name you'd like, the name of a CSAIL PI
(probably your supervisor), and ideally some description of how you
plan to use OpenStack so we can make a reasonable guess at an
initial quota. All work in OpenStack is done in a "project" (or
"tenant", depending on which documents you're reading; they are the
same thing): quotas are assigned by group, operating system images
are shared within groups, and so on.
When a new user account is created through the web interface, all
existing members get an email notifying them. This after-the-fact gating is the
only insurance against random people joining your project (within the
lab, not the whole Internet), so do keep an eye on it.
Most projects are comfortable with this rather loose gating,
since it allows self sign-up. Upon request we can create projects
that do not allow self sign-up; this is more secure, but it does mean all
new users will need to be manually created by TIG.
-- JonProulx - 27 Mar 2015
Who pays for this service?
As of December 2015, individual research groups are not charged for their usage. Some of the hardware that the service runs on is donated; all the remaining expenses come out of the CSAIL overhead budget. In the absence of a chargeback system, the quota system encourages users to consult TIG before placing large demands on the service and helps to avoid gross waste of resources. That said, if the OpenStack service meets your needs, you should not hesitate to use it rather than buying hardware just for yourself.
Quick Start
- Walk through launching an Ephemeral VM -- in progress 19 Mar 2014
- Walk through launching a Persistent VM -- in progress 19 Mar 2014
Logging into your instance
Once you've booted an instance as described above and the appropriate Security Group Rules have been applied, it can be accessed remotely via SSH. With the Ubuntu and CSAIL Ubuntu images we provide, you can log into the instance as the 'ubuntu' user, provided that you've associated an SSH key with the instance (see OpenStackSSHKey).
Logging in as the 'ubuntu' user is the only way to log into instances newly booted from the standard Ubuntu cloud image. You can access CSAIL Ubuntu images either via your SSH key and the 'ubuntu' user, or with your CSAIL Kerberos account. More information on the base images provided by TIG is available below.
Lastly, if you happen to have launched your instance using the Heat Orchestration service, you may need to log into your instance as the 'ec2_user' rather than the 'ubuntu' user.
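For example, assuming you associated your SSH key when booting and you know the instance's DNS name or IP address (the hostname below is just a placeholder), a typical login looks like:
ssh ubuntu@my-vm.csail.mit.edu
On the CSAIL Ubuntu images you can instead log in with your CSAIL account, e.g. ssh <your-csail-username>@my-vm.csail.mit.edu.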
Got Root?
For privileged access to Ubuntu cloud images, and the CSAIL images
based on them, you must provide an ssh public key when you boot your
instances and connect via ssh as the user "ubuntu". This user is
configured with passwordless sudo access, so you can run commands with
root privilege by prefixing them with 'sudo', for example:
ubuntu@my-vm:~$ sudo apt-get install mit-scheme
Details of how to
set up and use public keys with OpenStack are on the OpenStackSSHKey
wiki page.
Basic tools
The easiest way to interact with OpenStack is through the web
dashboard at https://horizon.csail.mit.edu. This provides the most common
features, though it does not expose all functionality. To get at advanced
functions it is sometimes necessary to use the command line or write
your own code to drive the API.
Command line access
Certain advanced features, such as affinity and anti-affinity groups
(to keep instances together for speed or apart for fault tolerance),
are only available through the CLI or direct API calls.
A public login server named ubuntu-login.csail.mit.edu running CSAIL/Ubuntu 14.04 is available with all the latest OpenStack related command line tools installed. This system is running on OpenStack and should be considered Beta in terms of stability; if you plan on regularly using the CLI it's best to install the tools on a group server or your workstation.
http://horizon.csail.mit.edu provides a basic WebUI to our OpenStack cluster and, under "Access and Security" -> "API Access", links to download credentials for use with command line tools.
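As a quick sketch of the workflow, you source the downloaded credentials file (it will prompt for your OpenStack password), list your instances, and boot a new one. The file, image, flavor, key, and instance names below are placeholders:
source my-project-openrc.sh
nova list
nova boot --flavor m1.2core --image CSAIL-Ubuntu-14.04 --key-name my-key --security-groups default my-first-vm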
OpenStack provides a native API client called nova which is available for Ubuntu 12.04 and newer systems in the python-novaclient package.
To install the command line tools on Mac OS X, first install pip (sudo easy_install pip), then the nova CLI (sudo pip install python-novaclient).
Full CLI docs are at http://docs.openstack.org/cli/quick-start/content/index.html. We use the "nova", "glance", "quantum" (in the process of being renamed to "neutron"), and "keystone" projects, but "nova" is the one you want to read about first.
It is also possible to use AWS EC2 compatible tools like the euca2ools package to interact with OpenStack. This covers most common actions, but only the intersection of OpenStack and EC2 features. It is convenient if you are using both our OpenStack and EC2, but if you are only using OpenStack the nova CLI is the most feature-complete tool.
Using templated orchestration
The Heat Orchestration system provides a
templated method for dealing with more complex sets of resources.
This is documented separately (and still a bit sparsely) on the
OpenStackTemplates page. It is analogous to Amazon's
"CloudFormation" product and provides some compatibility with the
Amazon template language.
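As a minimal sketch, assuming you already have a template file (the file and stack names here are placeholders), a stack can be managed from the CLI with:
heat stack-create -f my-template.yaml my-stack
heat stack-list
heat stack-delete my-stack
See the OpenStackTemplates page for what goes in the template itself.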
API access
Complete API reference is available at http://docs.openstack.org/
Your favorite programming language probably has a library or module for talking to OpenStack. Python users should look at Boto, which uses the Amazon EC2 compatibility API to talk to OpenStack. Ruby hackers can look at the Fog gem, which supports a variety of cloud APIs. If you do hack around with these or other programmatic interfaces, it would be great to create a wiki page describing your setup and experiences and link to it from here...
Stock Images
TIG provides a number of stock images. A full current listing can be found in the WebUI http://horizon.csail.mit.edu/ under "Images & Snapshots". Currently all stock images are Linux based, as that is what people seem to want; it is possible to run other operating systems such as FreeBSD or Windows (it is against the terms of licensing to run MacOS in a non-Apple virtual host).
While the versions may change from time to time we generally provide the following classes of image:
- CSAIL-Ubuntu-<version> - A 64-bit Ubuntu cloud image with CSAIL accounts, AFS, and configuration management
- CSAIL-Ubuntu-<version> + autofs - As above, but with access to CSAIL NFS storage via the automounter
- Ubuntu-<version>-amd64 - 64-bit Ubuntu cloud image as distributed by Canonical from http://cloud-images.ubuntu.com/
- Ubuntu-<version>-i386 - 32-bit version of the above
- Windows2012 - Windows Server 2012 image, not yet public but available on request for testing. See link for details.
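The same listing is also available from the command line:
nova image-list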
Custom Images
Note that you will get the best performance from "raw"
images. We are using Ceph as our
storage back end; this allows VMs to use copy-on-write clones of the
stored images for instant start-up and space efficiency, but only if
the image or snapshot is in "raw" format.
Image snapshots
The easiest way to create a custom image is to boot one of the provided generic images, make the changes you need, then take a snapshot with either the web or CLI tools (cli instructions). You can then use that snapshot as the basis for launching new instances.
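From the CLI, taking the snapshot looks something like this (instance and snapshot names are placeholders; --poll just waits and reports progress):
nova image-create --poll my-instance my-instance-snapshot
The new snapshot then appears alongside the other images in nova image-list.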
Note that creating a snapshot suspends the running
instance; this is required to create a consistent snapshot
image. How long the instance is unreachable is directly proportional
to the size of the root volume: it takes much longer to copy 32G than
2G. For this reason it is strongly recommended that you use the
for.snapshot instance type when working on new images in this
way. While it is usually a bit faster, you should plan on about 1-2 minutes of down time per gigabyte.
Custom images from scratch
For documentation on image creation from scratch see http://docs.openstack.org/trunk/openstack-image/content/
Essentially you make a KVM-based VM on your local system and then,
once you have it set to your liking, import the virtual disk using the
glance CLI. Remember to use a "raw" type virtual disk rather than the
default "qcow2" type.
Converting existing images to raw
As described above, storing your images in "raw" format is currently
the best choice. Prior to about Oct 6th 2014 we were storing images
differently and recommending "qcow2". Even if you uploaded
images in "raw" format prior to Oct 6th, they will be in the old image
store and will need to be downloaded and re-uploaded to move them to the
new Ceph storage (just skip the conversion step below).
The easiest thing is to just launch a new VM (to be sure it is Ceph
backed) and take a Snapshot. This will put
everything in the right place and format.
There is no particular advantage to manual conversion, but for the curious
this is the long-form process:
- Configure your CLI environment
- Download the existing image:
glance image-download --progress --file <image-name>.qcow2 <image-name>
- Convert using the qemu-img command from the qemu-utils package:
qemu-img convert -O raw <image-name>.qcow2 <image-name>.raw
- Upload new image:
glance image-create --disk-format raw --container-format bare \
--progress --name <image-name> --file <image-name>.raw
- If you are using a shared system like ubuntu-login.csail.mit.edu be sure you
rm <image-name>.raw
- Optional: go to https://horizon.csail.mit.edu and remove or rename the old qcow2 version of the image
Instance Types
OpenStack requires the use of predefined "Instance Types", also referred to as "Flavors". These define the virtual hardware, including the number of CPUs, memory size, size of the root disk, and optionally additional ephemeral disk space. We've defined instance types using the following scheme:
- s1.<N>core - N cores, N x 512m RAM, 10G root disk
- m1.<N>core - N cores, N x 1024m RAM, 16G root disk
- lg.<N>core - N cores, N x 2048m RAM, 32G root disk
- xl.<N>core - N cores, N x 4096m RAM, 64G root disk
These standard types run on cluster nodes without redundant components
and without UPS power and are best suited to tasks where the uptime of
an individual component is not critical such as compute nodes or
worker nodes behind a load balancer (see LBaaS section).
Note: For a fixed total amount of resources, a greater number of smaller instances (say a few cores and a few GB of RAM each) will be easier to pack onto the available host machines than a smaller number of larger instances (tens of cores or tens of GB of RAM).
There's one special flavor for instances launched specifically for
creating image snapshots:
- for.snapshot - 2 cores, 4096M RAM, 0G root disk
Obviously it's not really 0G; that is a magic value meaning "whatever the minimum
size of the base image is". This is ideal for making snapshots, as you
want them to be as small as possible.
We also provide a selection of instance types that run on a (much)
smaller pool of less performant but more redundant hardware (including
UPS- and generator-backed power, redundant power supplies, and mirrored
hard drives). This is the same configuration used by TIG for hosting
our virtualized services on OpenStack, such as
http://people.csail.mit.edu. These instance types all have 32G root
disks, with the number of cores (c) and gigabytes of
RAM (g) encoded in the name:
- ups.1c1g
- ups.1c2g
- ups.2c2g
- ups.2c4g
- ups.4c4g
- ups.4c8g
It is possible to create custom instance types for specific projects, so your project may have extra types available; ask someone in your group why that is and what they are for. If you need a custom size created, email help@csail.mit.edu.
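You can see every flavor visible to your project, including any custom ones, with:
nova flavor-list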
Network
Currently there is only one publicly available network defined in our OpenStack world, called "inet". This network puts your instance directly on a publicly accessible IPv4 network. Unlike our previous configuration, which used private IP space and NAT, the IP your operating system sees is the same as its public IP.
Security Groups
By default all network access to instances is blocked. More specifically,
outbound ("egress") and related inbound ("ingress")
traffic is allowed; all other traffic is blocked. This means your
instance will be able to run package updates, but you won't be able to
ping it or ssh to it.
Each project can edit its own default security group or create additional security groups to open up network access. Forgetting to assign the correct security group is one of the most common mistakes when starting instances and results in them being unreachable. It is possible to add or remove security groups from running instances using the "Actions" drop down menu on the "Instances" page of the web interface or the nova add-secgroup command.
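A minimal CLI sketch of creating a group, opening a couple of ports, and applying it to a running instance (the group, port, and instance names are placeholders):
nova secgroup-create my-web "allow inbound web and ssh"
nova secgroup-add-rule my-web tcp 80 80 0.0.0.0/0
nova secgroup-add-rule my-web tcp 22 22 0.0.0.0/0
nova add-secgroup my-instance my-web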
To define rules:
In the web interface select the Access & Security tab; there should be
a button on the right-hand side of the interface called "+ Create
Security Group".
Click that, and give the group a name and description. Once
it's in the list of security groups, click the "Manage Rules"
button.
By default, groups that you create will automatically allow all
outgoing (Egress) traffic to all ports and IPs. To set up a similar
incoming (Ingress) rule, click the "+ Add Rule" button in the rules
management interface.
There are a number of predefined rules available in the "Rules" drop-down.
Most likely you will find what you need there, either a specific
application such as "ssh" or a rule that opens all of a given protocol
("all tcp", "all udp", or "all icmp", for example). It is possible to add multiple
rules to a group and to apply multiple groups to an instance.
It is also possible to add specific custom ports or port ranges.
Since all egress traffic is allowed by default in a new rule set, you
will most likely want to set "Ingress" rules.
Note that all rules open new ports, so as you stack rules together they
always become more permissive. For example, if you want some instance
to have more restrictions than the 'default' group allows, you will need to
remove it from the list of security groups for that instance before
applying your more restrictive rules.
Using Fixed IP Addresses
Instances are dynamically assigned publicly routable IP addresses and this is sufficient in many cases.
If you need a fixed IP address, one may be reserved using WebDNS and selecting "OpenStack public network (128.52.128.0/18)" as the subnet. DO NOT override the address WebDNS assigns you; this will be your new permanent public IP address (once you click the "Commit changes" button). Take note of your new IP address.
Once your fixed IP is registered in WebDNS, you should assign it to your instance in the WebUI. When launching an instance, fill in your fixed IP in the eth0 Fixed IP field under the Details tab. Currently the web interface for this only works if you are booting a single instance with a single connected network. That is by far the most common case but if you need to specify a fixed IP and require multiple networks you will need to use the CLI.
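A sketch of the CLI form (the network UUID comes from neutron net-list; the IP, image, flavor, key, and instance names here are placeholders):
neutron net-list
nova boot --flavor m1.2core --image CSAIL-Ubuntu-14.04 --key-name my-key --nic net-id=<inet-net-uuid>,v4-fixed-ip=128.52.x.y my-fixed-vm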
Load Balancer as a Service (LBaaS)
LBaaS is a relatively new feature; as such it is not well known or tested in our environment yet. Feel free to try it out, and if you do, please share your experiences and expand this documentation.
To use LBaaS you will need a fixed IP for the load balancer pool's virtual IP; this will be the public address of your application (well, unless you put a reverse proxy in front of the load balancer, but let's not get that crazy just yet...). The instances that are balanced behind it can have dynamic IPs (though, dynamic or fixed, you need to manually add them to the pool) and may be dynamically added to and removed from the pool.
The basic steps to creating a load balancer pool are:
- Get IP and DNS as mentioned above
- Create the pool
- Add servers to the pool
- Create a healthmonitor & associate it with the pool to make sure the servers are up
- Create a virtual IP (vip) and associate it with the pool using the address obtained in step 1
http://docs.openstack.org/admin-guide-cloud/content/lbaas_workflow.html provides a very bare-bones walk-through of the CLI workflow. The Dashboard also provides an interface for setting up load balancer pools, which should be better documented here... but knowing the steps above you can probably find them.
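Roughly, the steps above map onto the LBaaS v1 neutron commands below (all names, UUIDs, ports, and addresses are placeholders):
neutron lb-pool-create --name my-pool --lb-method ROUND_ROBIN --protocol HTTP --subnet-id <subnet-uuid>
neutron lb-member-create --address <member-ip> --protocol-port 80 my-pool
neutron lb-healthmonitor-create --type HTTP --delay 5 --timeout 5 --max-retries 3
neutron lb-healthmonitor-associate <healthmonitor-uuid> my-pool
neutron lb-vip-create --name my-vip --protocol HTTP --protocol-port 80 --subnet-id <subnet-uuid> --address <your-fixed-ip> my-pool
Repeat the lb-member-create line once per back-end server.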
Defining your own networks
Quantum/Neutron makes it easy for us to allow groups to create their own private networks, so we did. There is no particular use case in mind, so if you have one, let us know and we can help configure the best solution. Here is the current state of things if you want to play with this part.
Right now you can definitely define your own networks that are private to your project. These are implemented as GRE tunnels, which you needn't think about except that it means they are isolated and you don't have to worry about coordinating IP addressing with other projects. If you have defined your own networks it is possible to add any or all of them to a given instance.
You should be able to use OpenStack's built-in DHCP service to serve dynamic addresses from a range of your own choosing to hosts on that network. This is the intended use case, though not well tested.
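A sketch of creating such a network (names and the address range are placeholders; DHCP is enabled by default on new subnets):
neutron net-create my-private-net
neutron subnet-create --name my-private-subnet my-private-net 10.10.0.0/24
Instances are then attached at boot time by adding --nic net-id=<my-private-net-uuid> to the nova boot command.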
You might be able to use OpenStack's "router" implementation to build routers between multiple private networks. I don't know why you would want to, and this is not at all tested, but the documentation suggests it is possible.
You cannot use OpenStack's "router" implementation to provide NAT and a connection to the public IP network; if you want this functionality you'd need to build an instance with multiple interfaces and configure that system manually to do routing and NAT. If you want to do this we might be able to configure things so it works, but there are a number of performance bottlenecks that make it seem like a bad idea.
Understanding Storage
What stays and what goes away
The root disk and ephemeral disk of an instance are ephemeral: when you shut down, they go away. Anything you want to save needs to be written either to a persistent volume or to network Storage.
Persistent volumes, as currently implemented, are iSCSI volumes on Ceph RBD storage (though they appear as local storage when attached to VMs). While highly redundant, this storage is not backed up. We only have 10T of storage in this pool for everyone to share, so it is best suited for relatively small volumes, such as bootable operating system images for persistent (rather than ephemeral) virtual machines. These volumes can also only be connected to one instance at a time; they are not shared storage. So if you need to share files among instances, or have larger data needs, TIG hosted NFS Storage, which is backed up and supports concurrent access from multiple clients, is the better choice.
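A sketch of creating a persistent volume and attaching it to a running instance (names and sizes are placeholders; with the current clients the name flag is --display-name):
cinder create --display-name my-data 20
cinder list
nova volume-attach my-instance <volume-uuid>
The volume then shows up inside the VM as a new block device (typically /dev/vdb) which you can partition, format, and mount as usual.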
Volume types
ALWAYS USE THE production VOLUME TYPE. This is the default, so you can also just skip specifying anything at all.
The production volume type is now backed by Ceph RBD storage as mentioned above.
This is a recent change, and most existing volumes are on the end_of_life volume type. These are stored on an EqualLogic SAN. As the name suggests, this equipment is end of life and we will stop paying support on it in July 2015. NO NEW VOLUMES should be created here.
Right now the only way to move your bits off the end_of_life volume type is to create a new production volume and copy your bits over from the old one; see MovingVolumeTypes for details.
Occasionally we may also have an experimental volume type; don't use this one either, except on advice from TIG. It is generally for experiments and may involve wiping all experimental volumes without notice. Sometimes, though, it is being staged for promotion to production: if you did create experimental volumes on the advice of TIG in early 2015, these have all been promoted to production along with the storage.
A note on snapshots
OpenStack allows you to snapshot your volumes, but no automated snapshots are taken. If you are about to make a potentially disruptive change it's probably a good idea to take a snapshot. If you want periodic snapshots that's also possible, but you will need to script it yourself. Snapshots are copy-on-write, so they don't consume any space until changes are made to the base volume. A 50% additional space allotment is automatically made for snapshots, so a 20G volume will actually reserve 30G of space on the storage server. Snapshots are not backups: when your snapshot space usage exceeds this reserve, the system automatically frees space by deleting the oldest snapshot(s). If you need backed-up storage please look at TIG hosted network Storage options.
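Taking a volume snapshot from the CLI looks roughly like this (the name is a placeholder; --force True is needed if the volume is currently attached to an instance). For periodic snapshots, wrap something like this in a cron job:
cinder snapshot-create --display-name my-data-snap --force True <volume-uuid>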
Preview/Testing Features
As we roll out new features on the production cloud they will first be
available through the alternative Web UI at
https://horizon-test.csail.mit.edu (using your usual credentials). Programmatic access via API and CLI access via standard clients for preview services are also available (provided the services are actually up).
This system is also used for UI testing so it may not always be
available.
It exposes all the standard features of the production interface plus
any additional features we are testing. While these run on the
production cloud and have access to production resources, they may
change or go away unexpectedly.
Do not rely on preview features for important work. If features you
see here are important to you, do let us know and provide feedback on
how they do or don't work for you as currently configured.
Current testing features, or those that are coming very soon, include:
Data Processing (Hadoop / Spark) as a Service
The Sahara project
provides a simple means to provision a data-intensive application
cluster (Hadoop or Spark) on top of OpenStack.
It is technically 'integrated' in upstream OpenStack but is still very
new, with no Ubuntu packaging and little upstream documentation.
There has been some local demand for this type of service, and TIG is
currently looking for testers, as we don't have enough internal
expertise in Hadoop or Spark to really do meaningful functional
testing.
Available upstream engines include:
- Vanilla Plugin - deploys Vanilla Apache Hadoop
- Hortonworks Data Platform Plugin - deploys Hortonworks Data Platform
- Spark Plugin - deploys Apache Spark with Cloudera HDFS
- MapR Distribution Plugin - deploys MapR plugin with MapR File System
- Cloudera Plugin - deploys Cloudera Hadoop
Currently implementing:
- Hortonworks Data Platform Plugin
- Spark Plugin
If you want others, speak up and volunteer to test...
Presentation Materials
- CSAIL OpenStack Beta Announcement (Video) Nov 14, 2012.
- CSAIL-Openstack-Beta.svg: Slides from the above presentation. These will render in your browser; use left and right arrows to navigate with animated transitions, or up and down arrows if the animations make you seasick... Not sure how helpful they are without the commentary, but people have asked for them.
Upstream Documentation
WTFM
Please help Write The Fine Manual... this is a wiki, so go for it: add use cases, command line examples, fix my spelling and grammar, have fun!
-- JonProulx - 28 Aug 2013
Topic revision: 17 May 2016, adamyala