IPython Cluster Plugin

Note

These docs are for IPython 0.13+ which is installed in the latest StarCluster 12.04 Ubuntu-based AMIs. See starcluster listpublic for a list of available AMIs.

To configure your cluster as an interactive IPython cluster, you must first define the ipcluster plugin in your config file:

[plugin ipcluster]
setup_class = starcluster.plugins.ipcluster.IPCluster

If you’d like to use the new IPython web notebook (highly recommended!), you’ll also want to add the following settings:

[plugin ipcluster]
setup_class = starcluster.plugins.ipcluster.IPCluster
enable_notebook = True
notebook_directory = notebooks
# set a password for the notebook for increased security
notebook_passwd = a-secret-password

After defining the plugin in your config, add the ipcluster plugin to the list of plugins in one of your cluster templates:

[cluster smallcluster]
plugins = ipcluster
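StarCluster's config file uses an INI-style layout, so one quick way to catch a mismatch between the plugin name and the cluster template is to parse both with Python's standard configparser module. The sketch below is not part of StarCluster: the helper check_plugins and the CONFIG_TEXT sample are hypothetical, it assumes the plugins setting is a comma-separated list of names, and StarCluster's own loader may apply rules this check ignores.

```python
import configparser

# Hypothetical sample that mirrors the plugin and cluster template above.
CONFIG_TEXT = """
[plugin ipcluster]
setup_class = starcluster.plugins.ipcluster.IPCluster
enable_notebook = True
notebook_directory = notebooks
# set a password for the notebook for increased security
notebook_passwd = a-secret-password

[cluster smallcluster]
plugins = ipcluster
"""


def check_plugins(config_text, cluster_name):
    """Return plugin names listed by a cluster template that have no matching
    [plugin <name>] section; an empty list means every plugin is defined."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser.get("cluster " + cluster_name, "plugins")
    # Assumption: plugins is a comma-separated list of plugin names.
    names = [name.strip() for name in raw.split(",") if name.strip()]
    return [name for name in names if not parser.has_section("plugin " + name)]


print(check_plugins(CONFIG_TEXT, "smallcluster"))  # prints []
```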

Using the IPython Cluster

To use your new IPython cluster, log in directly to the master node of the cluster as the CLUSTER_USER and create a parallel client:

$ starcluster sshmaster mycluster -u myuser
$ ipython
[~]> from IPython.parallel import Client
[~]> rc = Client()

Once the client has been started, create a ‘view’ over the entire cluster and begin running parallel tasks. Below is an example of performing a parallel map across all nodes in the cluster:

[~]> view = rc[:]
[~]> results = view.map_async(lambda x: x**30, range(8))
[~]> print results.get()
[0,
 1,
 1073741824,
 205891132094649L,
 1152921504606846976L,
 931322574615478515625L,
 221073919720733357899776L,
 22539340290692258087863249L]
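If you want to see the same asynchronous-map pattern without a cluster, Python's standard multiprocessing.pool.ThreadPool offers a map_async method whose result object also has a get method. This is only a local analogy, not IPython.parallel: here the work runs in threads on one machine, whereas view.map_async ships the function to the remote engines.

```python
from multiprocessing.pool import ThreadPool

# Local analogue of view.map_async(...).get(): submit the whole map
# asynchronously, then block on .get() until every result is ready.
# This is NOT IPython.parallel; it only illustrates the same pattern.
with ThreadPool(processes=4) as pool:
    async_result = pool.map_async(lambda x: x**30, range(8))
    results = async_result.get(timeout=60)  # blocks until all 8 results arrive

print(results[:4])  # prints [0, 1, 1073741824, 205891132094649]
```

Because results are plain Python 3 integers here, large values print without the L suffix seen in the Python 2 output above.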

See also

See the IPython parallel docs (0.13+) to learn more about the IPython parallel API.

Connecting from your Local IPython Installation

Note

You must have IPython 0.13+ installed to use this feature.

If you’d rather control the cluster from your local IPython installation, use the shell command and pass the --ipcluster option:

$ starcluster shell --ipcluster=mycluster

This will start StarCluster’s development shell and configure a remote parallel session for you automatically. StarCluster will create a parallel client in a variable named ipclient and a corresponding view of the entire cluster in a variable named ipview, which you can use to run parallel tasks on the remote cluster:

$ starcluster shell --ipcluster=mycluster
[~]> ipclient.ids
[0, 1, 2, 3]
[~]> res = ipview.map_async(lambda x: x**30, range(8))
[~]> print res.get()

Using IPython Parallel Scripts with StarCluster

If you wish to run parallel IPython scripts from your local machine on the remote cluster, you will need to use the following configuration when creating the parallel client in your code:

from IPython.parallel import Client
# replace <cluster> and <region> with your cluster's name and EC2 region
rc = Client('~/.starcluster/ipcluster/<cluster>-<region>.json',
            sshkey='/path/to/cluster/keypair.rsa')

For example, let’s say we started a cluster called ‘mycluster’ in region ‘us-east-1’ with keypair ‘mykey’ stored in /home/user/.ssh/mykey.rsa. In this case, the above code should be updated to:

from IPython.parallel import Client
rc = Client('/home/user/.starcluster/ipcluster/mycluster-us-east-1.json',
            sshkey='/home/user/.ssh/mykey.rsa')
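When scripting this, it can help to build the connector-file path from the cluster name and region instead of hard-coding it. The helper below is a hypothetical sketch that simply follows the ~/.starcluster/ipcluster/<cluster>-<region>.json pattern described above. Note that the starcluster start output shown later on this page names the file after the cluster's security group (for example SecurityGroup:@sc-iptest-us-east-1.json), so list ~/.starcluster/ipcluster/ to confirm the actual filename on your machine before relying on this pattern.

```python
import os


def ipcluster_connection_file(cluster, region):
    """Build the local path of the JSON connector file, following the
    ~/.starcluster/ipcluster/<cluster>-<region>.json pattern (an assumption;
    check the directory for the real filename)."""
    path = os.path.join("~", ".starcluster", "ipcluster", "%s-%s.json" % (cluster, region))
    # Expand "~" explicitly rather than relying on Client() to do it.
    return os.path.expanduser(path)


# Usage with the example values above (requires IPython 0.13+ locally):
#   from IPython.parallel import Client
#   rc = Client(ipcluster_connection_file("mycluster", "us-east-1"),
#               sshkey=os.path.expanduser("~/.ssh/mykey.rsa"))
```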

Note: it is possible to dynamically add new nodes to a pre-existing cluster with the starcluster addnode command. New IPython engines will automatically be started and connected to the controller process running on the master node. This means that existing Client and LoadBalancedView instances will automatically be able to leverage the new computing resources to speed up ongoing computations.

Configuring a custom packer

The default message packer for IPython.parallel is based on the JSON format, which is quite slow but works out of the box. It is possible to configure the faster 'pickle' packer instead:

[plugin ipcluster]
setup_class = starcluster.plugins.ipcluster.IPCluster
enable_notebook = True
notebook_directory = notebooks
# set a password for the notebook for increased security
notebook_passwd = a-secret-password
packer = pickle

When using IPython 0.13, you will also need to pass packer='pickle' to the Client constructor. For instance, if running the client directly from the master node:

$ starcluster sshmaster mycluster -u myuser
$ ipython
[~]> from IPython.parallel import Client
[~]> rc = Client(packer='pickle')
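To get a feel for the trade-off on your own machine, the sketch below round-trips a made-up message dictionary through the standard json and pickle modules and times both. It is not IPython's packer code and the payload is hypothetical; it simply illustrates one practical difference (JSON turns tuples into lists, while pickle preserves Python types) and lets you measure speed locally. Keep in mind that unpickling data can execute arbitrary code, so the pickle packer is only appropriate when every node and client is trusted.

```python
import json
import pickle
import timeit

# Hypothetical payload; real IPython messages have their own structure.
message = {"msg_id": "abc123", "content": {"args": (1, 2.5, "x"), "data": list(range(1000))}}

json_roundtrip = json.loads(json.dumps(message))
pickle_roundtrip = pickle.loads(pickle.dumps(message))

# JSON has no tuple type, so tuples come back as lists; pickle preserves them.
print(type(json_roundtrip["content"]["args"]).__name__)    # prints list
print(type(pickle_roundtrip["content"]["args"]).__name__)  # prints tuple

# Rough timing of 1000 serialize/deserialize cycles; numbers vary by machine.
for name, dumps, loads in [("json", json.dumps, json.loads),
                           ("pickle", pickle.dumps, pickle.loads)]:
    seconds = timeit.timeit(lambda: loads(dumps(message)), number=1000)
    print("%-6s %.4f s" % (name, seconds))
```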

If the msgpack-python package is installed on all the cluster nodes and on the client, it is possible to get even faster message serialization with:

[plugin ipcluster]
setup_class = starcluster.plugins.ipcluster.IPCluster
enable_notebook = True
notebook_directory = notebooks
# set a password for the notebook for increased security
notebook_passwd = a-secret-password
packer = msgpack

And then from the client:

$ starcluster sshmaster mycluster -u myuser
$ ipython
[~]> from IPython.parallel import Client
[~]> rc = Client(packer='msgpack.packb', unpacker='msgpack.unpackb')
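Note that packer and unpacker are given as dotted import paths ('msgpack.packb') rather than as function objects; IPython resolves these names itself. The sketch below illustrates that kind of lookup with the standard importlib module. The import_dotted helper is hypothetical and is not IPython's implementation, and json stands in for msgpack so the example runs even where msgpack-python is not installed. Calling import_dotted('msgpack.packb') on each node is one way to confirm the package is importable there.

```python
import importlib


def import_dotted(name):
    """Resolve a dotted path such as 'msgpack.packb' to the object it names."""
    module_name, _, attr = name.rpartition(".")
    if not module_name:
        raise ValueError("expected a dotted path like 'package.function', got %r" % name)
    # Raises ImportError if the module is missing, AttributeError if the name is.
    return getattr(importlib.import_module(module_name), attr)


# json stands in for msgpack here because it is always available.
pack = import_dotted("json.dumps")
unpack = import_dotted("json.loads")
print(unpack(pack({"a": [1, 2]})))  # prints {'a': [1, 2]}
```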

Note: from IPython 0.14 onward, the client will automatically fetch the packer configuration from the controller, so you no longer need to pass an additional constructor argument to the Client class.

Restarting All the Engines at Once

Sometimes IPython engine processes become unstable (for instance, because of a non-interruptible long-running computation or memory leaks in compiled extension code).

In such a case, it is possible to kill all running engine processes and start new ones automatically connected to the existing controller by adding some configuration for the IPClusterRestartEngines plugin in your .starcluster/config file:

[plugin ipclusterrestart]
setup_class = starcluster.plugins.ipcluster.IPClusterRestartEngines

You can then trigger the restart manually using:

$ starcluster runplugin ipclusterrestart iptest
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Running plugin ipclusterrestart
>>> Restarting 23 engines on 3 nodes
3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%

Using the IPython HTML Notebook

The IPython cluster plugin comes with support for the new IPython web notebook. As mentioned in the intro section, you will need to specify a few extra settings in the IPython cluster plugin’s config in order to use the web notebook:

[plugin ipcluster]
setup_class = starcluster.plugins.ipcluster.IPCluster
enable_notebook = True
notebook_directory = notebooks
# set a password for the notebook for increased security
notebook_passwd = a-secret-password

The notebook_passwd setting specifies the password to set on the remote IPython notebook server. If you do not specify the notebook_passwd setting, the plugin will randomly generate a password for you. You will be required to enter this password in order to log in and use the notebook server on the cluster. In addition to enforcing a notebook password, StarCluster also enables SSL in the notebook server in order to secure the transmission of your password when logging in.
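If you would rather choose the password yourself than rely on the randomly generated one, a strong value can be produced with Python 3's standard secrets module and pasted into the notebook_passwd setting. This is a minimal sketch for generating a password, not what the plugin does internally.

```python
import secrets

# 16 random bytes, base64url-encoded without padding: a 22-character string
# made of letters, digits, "-" and "_".
password = secrets.token_urlsafe(16)
print("notebook_passwd = %s" % password)
```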

The notebook_directory setting makes it possible to use a custom folder on the master node. The path can be relative to the user's home folder or absolute. If left blank, the notebooks are stored directly in the home folder. If notebook_directory does not exist, it is automatically created at cluster start-up time.
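The directory rules above can be sketched as a pair of small helpers. They are hypothetical and only mirror the documented behavior (blank means the home folder, relative paths are resolved under the home folder, absolute paths are used as-is, and a missing directory is created); the plugin's actual code may differ, for example in how it treats ~ or trailing slashes. posixpath is used because the master node runs Ubuntu Linux.

```python
import os
import posixpath


def resolve_notebook_dir(notebook_directory, home):
    """Return the notebook folder on the master node for a given setting."""
    if not notebook_directory:
        # Blank setting: notebooks live directly in the home folder.
        return home
    # posixpath.join keeps an absolute notebook_directory unchanged and
    # resolves a relative one under the home folder.
    return posixpath.join(home, notebook_directory)


def ensure_notebook_dir(path):
    """Create the folder (and any parents) if it does not exist yet."""
    os.makedirs(path, exist_ok=True)
    return path


print(resolve_notebook_dir("notebooks", "/home/myuser"))  # prints /home/myuser/notebooks
print(resolve_notebook_dir("/data/nb", "/home/myuser"))   # prints /data/nb
print(resolve_notebook_dir("", "/home/myuser"))           # prints /home/myuser
```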

Once you have these settings in the plugin’s config, simply start a cluster and let the plugin configure your IPython cluster:

$ starcluster start -s 3 iptest
StarCluster - (http://star.mit.edu/cluster)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

... (abbreviated output)
>>> Running plugin ipcluster
>>> Writing IPython cluster config files
>>> Starting the IPython controller and 7 engines on master
>>> Waiting for JSON connector file...
/home/user/.starcluster/ipcluster/SecurityGroup:@sc-iptest-us-east-1.json 100% || Time: 00:00:00  37.55 M/s
>>> Authorizing tcp ports [1000-65535] on 0.0.0.0/0 for: IPython controller
>>> Adding 16 engines on 2 nodes
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up IPython web notebook for user: myuser
>>> Creating SSL certificate for user myuser
>>> Authorizing tcp ports [8888-8888] on 0.0.0.0/0 for: notebook
>>> IPython notebook URL: https://ec2-184-72-131-236.compute-1.amazonaws.com:8888
>>> The notebook password is: XXXXXXXXXXX
*** WARNING - Please check your local firewall settings if you're having
*** WARNING - issues connecting to the IPython notebook
>>> IPCluster has been started on SecurityGroup:@sc-iptest for user 'myuser'
with 23 engines on 3 nodes.

To connect to cluster from your local machine use:

from IPython.parallel import Client
client = Client('/home/user/.starcluster/ipcluster/SecurityGroup:@sc-iptest-us-east-1.json', sshkey='/home/user/.ssh/mykey.rsa')

See the IPCluster plugin doc for usage details:
http://star.mit.edu/cluster/docs/latest/plugins/ipython.html
>>> IPCluster took 0.738 mins

Pay special attention to the following two lines as you’ll need them to login to the cluster’s IPython notebook server from your web browser:

>>> IPython notebook URL: https://ec2-XXXX.compute-1.amazonaws.com:8888
>>> The notebook password is: XXXXXXXXX

Navigate to the given https address and use the password to login:

[Figure: IPython notebook login page]

After you’ve logged in you should be looking at IPython’s dashboard page:

[Figure: IPython notebook dashboard]

Since this is a brand new cluster, there aren’t any existing IPython notebooks to play with. Click the New Notebook button to create a new IPython notebook:

[Figure: New Notebook button on the dashboard]

This will create a new blank IPython notebook. To begin using the notebook, click inside the first input cell and begin typing some Python code. You can enter multiple lines of code in one cell if you like. When you’re ready to execute your code, press shift-enter. This will execute the code in the current cell and show any output in a new output cell below.

You can modify existing cells simply by clicking in the cell, changing some text, and pressing shift-enter again to re-run the cell. While a cell is being executed you will notice that the IPython notebook goes into a busy mode:

[Figure: notebook in busy mode while a cell executes]

You can keep adding and executing cells while the notebook is in busy mode; however, the cells will run one after the other in the order they were executed. Only one cell can be running at a time.

Once you’ve finished adding content to your notebook, you can save your work to the cluster by pressing the save button. Since this is a new notebook, you should also change its name before saving, which will temporarily change the save button to rename:

[Figure: renaming a new notebook before saving]

This will save the notebook as <notebook title>.ipynb in the notebook folder on the master node (your CLUSTER_USER’s home folder, or the configured notebook_directory). If you’ve configured StarCluster to mount an EBS volume on /home and your notebooks are stored under /home, these notebook files will be kept on the EBS volume when the cluster shuts down. If this is not the case, you will want to download the notebook files before you terminate the cluster if you wish to keep them:

[Figure: downloading a notebook file]

Press ctrl-m h within the web notebook to see all available keyboard shortcuts and commands.

See also

See the official IPython notebook docs for more details on using the IPython notebook.

Using Parallel IPython in the IPython Notebook

It’s also very easy to combine the notebook with IPython’s parallel framework running on StarCluster to create an HPC-powered notebook. Simply use the same commands described in the Using the IPython Cluster section to set up a parallel client and view in the notebook:

[Figure: creating a parallel client and view inside the notebook]