StarCluster - Mailing List Archive

Fwd: StarCluster comments

From: Antonio González Peña <no email>
Date: Tue, 22 May 2012 09:10:03 -0600

Hi again,

Here is another bug report:
-------------------------------------------------------------
default_cluster_new (security group: @sc-default_cluster_new)
-------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/cli.py",
line 255, in main
    sc.execute(args)
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/commands/listclusters.py",
line 19, in execute
    show_ssh_status=self.opts.show_ssh_status)
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/cluster.py",
line 247, in list_clusters
    nodes = cl.nodes
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/cluster.py",
line 674, in nodes
    if n.is_master():
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/node.py",
line 764, in is_master
    return self.alias == "master"
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/node.py",
line 108, in alias
    user_data = self._get_user_data(tries=5)
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/node.py",
line 88, in _get_user_data
    user_data = self.ec2.get_instance_user_data(self.id)
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/awsutils.py",
line 450, in get_instance_user_data
    return base64.b64decode(user_data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/base64.py",
line 73, in b64decode
    return binascii.a2b_base64(s)
TypeError: a2b_base64() argument 1 must be string or read-only
character buffer, not None

!!! ERROR - Oops! Looks like you've found a bug in StarCluster
!!! ERROR - Crash report written to:
/Users/antoniog/.starcluster/logs/crash-report-41284.txt
!!! ERROR - Please remove any sensitive data from the crash report
!!! ERROR - and submit it to starcluster_at_mit.edu
Antonio-Gonzalez-Pena:~ antoniog$ more
/Users/antoniog/.starcluster/logs/crash-report-41284.txt
---------- CRASH DETAILS ----------
COMMAND: starcluster lc
2012-05-22 09:08:38,822 PID: 41284 config.py:551 - DEBUG - Loading config
2012-05-22 09:08:38,823 PID: 41284 config.py:118 - DEBUG - Loading
file: /Users/antoniog/.starcluster/config
2012-05-22 09:08:38,825 PID: 41284 awsutils.py:54 - DEBUG - creating
self._conn w/ connection_authenticator kwargs = {'proxy_user': None,
'proxy_pass': None, 'proxy_port': None, 'proxy': None, 'is_secure':
True, 'path': '/', 'region': None, 'port': None}
2012-05-22 09:08:39,746 PID: 41284 cluster.py:664 - DEBUG - existing nodes: {}
2012-05-22 09:08:39,746 PID: 41284 cluster.py:672 - DEBUG - adding
node i-19b1fa7f to self._nodes list
2012-05-22 09:08:39,746 PID: 41284 cluster.py:680 - DEBUG - returning
self._nodes = [<Node: master (i-19b1fa7f)>]
2012-05-22 09:08:40,405 PID: 41284 cluster.py:664 - DEBUG - existing nodes: {}
2012-05-22 09:08:40,405 PID: 41284 cluster.py:672 - DEBUG - adding
node i-2908654f to self._nodes list
2012-05-22 09:08:40,406 PID: 41284 cluster.py:672 - DEBUG - adding
node i-37086551 to self._nodes list
2012-05-22 09:08:40,406 PID: 41284 cluster.py:672 - DEBUG - adding
node i-35086553 to self._nodes list
2012-05-22 09:08:40,406 PID: 41284 cluster.py:672 - DEBUG - adding
node i-33086555 to self._nodes list
2012-05-22 09:08:40,406 PID: 41284 cluster.py:672 - DEBUG - adding
node i-31086557 to self._nodes list
2012-05-22 09:08:40,407 PID: 41284 cluster.py:672 - DEBUG - adding
node i-3f086559 to self._nodes list
2012-05-22 09:08:40,407 PID: 41284 cluster.py:672 - DEBUG - adding
node i-3d08655b to self._nodes list
2012-05-22 09:08:40,407 PID: 41284 cluster.py:672 - DEBUG - adding
node i-3b08655d to self._nodes list
2012-05-22 09:08:40,407 PID: 41284 cluster.py:672 - DEBUG - adding
node i-3908655f to self._nodes list
2012-05-22 09:08:40,408 PID: 41284 cluster.py:672 - DEBUG - adding
node i-07086561 to self._nodes list
2012-05-22 09:08:40,408 PID: 41284 cluster.py:672 - DEBUG - adding
node i-05086563 to self._nodes list
2012-05-22 09:08:40,408 PID: 41284 cluster.py:672 - DEBUG - adding
node i-03086565 to self._nodes list
2012-05-22 09:08:40,409 PID: 41284 cluster.py:672 - DEBUG - adding
node i-01086567 to self._nodes list
2012-05-22 09:08:40,409 PID: 41284 cluster.py:672 - DEBUG - adding
node i-0f086569 to self._nodes list
2012-05-22 09:08:40,409 PID: 41284 cluster.py:672 - DEBUG - adding
node i-0d08656b to self._nodes list
2012-05-22 09:08:40,409 PID: 41284 cluster.py:672 - DEBUG - adding
node i-0b08656d to self._nodes list
2012-05-22 09:08:40,410 PID: 41284 cluster.py:672 - DEBUG - adding
node i-0908656f to self._nodes list
2012-05-22 09:08:40,410 PID: 41284 cluster.py:672 - DEBUG - adding
node i-17086571 to self._nodes list
2012-05-22 09:08:40,410 PID: 41284 cluster.py:672 - DEBUG - adding
node i-15086573 to self._nodes list
2012-05-22 09:08:40,410 PID: 41284 cluster.py:672 - DEBUG - adding
node i-13086575 to self._nodes list
2012-05-22 09:08:40,411 PID: 41284 cluster.py:672 - DEBUG - adding
node i-11086577 to self._nodes list
2012-05-22 09:08:40,411 PID: 41284 cluster.py:672 - DEBUG - adding
node i-1f086579 to self._nodes list
2012-05-22 09:08:40,411 PID: 41284 cluster.py:672 - DEBUG - adding
node i-1d08657b to self._nodes list
2012-05-22 09:08:40,412 PID: 41284 cluster.py:672 - DEBUG - adding
node i-1b08657d to self._nodes list
2012-05-22 09:08:40,412 PID: 41284 cluster.py:672 - DEBUG - adding
node i-1908657f to self._nodes list
2012-05-22 09:08:40,412 PID: 41284 cluster.py:672 - DEBUG - adding
node i-e7086581 to self._nodes list
2012-05-22 09:08:40,412 PID: 41284 cluster.py:672 - DEBUG - adding
node i-e5086583 to self._nodes list
2012-05-22 09:08:40,413 PID: 41284 cluster.py:672 - DEBUG - adding
node i-e3086585 to self._nodes list
2012-05-22 09:08:40,413 PID: 41284 cluster.py:672 - DEBUG - adding
node i-e1086587 to self._nodes list
2012-05-22 09:08:40,413 PID: 41284 cluster.py:672 - DEBUG - adding
node i-ef086589 to self._nodes list
2012-05-22 09:08:40,413 PID: 41284 cluster.py:672 - DEBUG - adding
node i-abfdbccd to self._nodes list
2012-05-22 09:08:40,577 PID: 41284 cli.py:287 - DEBUG - Traceback
(most recent call last):
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/cli.py",
line 255, in main
    sc.execute(args)
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/commands/listclusters.py",
line 19, in execute
    show_ssh_status=self.opts.show_ssh_status)
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/cluster.py",
line 247, in list_clusters
    nodes = cl.nodes
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/cluster.py",
line 674, in nodes
    if n.is_master():
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/node.py",
line 764, in is_master
    return self.alias == "master"
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/node.py",
line 108, in alias
    user_data = self._get_user_data(tries=5)
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/node.py",
line 88, in _get_user_data
    user_data = self.ec2.get_instance_user_data(self.id)
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/awsutils.py",
line 450, in get_instance_user_data
    return base64.b64decode(user_data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/base64.py",
line 73, in b64decode
    return binascii.a2b_base64(s)
TypeError: a2b_base64() argument 1 must be string or read-only
character buffer, not None

---------- SYSTEM INFO ----------
StarCluster: 0.93.3
Python: 2.6.1 (r261:67515, Jun 24 2010, 21:47:49) [GCC 4.2.1 (Apple
Inc. build 5646)]
Platform: Darwin-10.8.0-i386-64bit
boto: 2.3.0
ssh: 1.7.13
Crypto: 2.5
jinja2: 2.6
decorator: 3.3.1
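
The traceback above ends with base64.b64decode() receiving None, which
suggests that no user data came back for one of the instances. Below is a
minimal sketch of a guard, assuming that missing user data should be
treated as "nothing to decode" rather than as a fatal error.
decode_user_data is a hypothetical helper for illustration, not
StarCluster's actual code:

```python
import base64


def decode_user_data(user_data):
    """Decode base64 user data; return None when there is nothing to decode.

    Hypothetical helper for illustration only, not part of StarCluster.
    """
    if user_data is None:
        # b64decode(None) raises TypeError, as in the traceback above.
        return None
    return base64.b64decode(user_data)


if __name__ == "__main__":
    print(decode_user_data(None))
    print(decode_user_data("bWFzdGVy"))
```

A caller would still have to decide what to do with a node that has no
user data, for example skip it while listing clusters.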



On Fri, May 18, 2012 at 11:16 AM, Antonio González Peña
<antgonza_at_gmail.com> wrote:
> Hi Justin,
>
> Since our last meeting, over a week ago, I have been working with
> StarCluster and I have to say that it is a really nice piece of
> software and a really cool and exciting application to work with.
> Anyway, in the last few days I have encountered some issues and bugs
> and come up with some suggestions. Sorry for the long list, but I would
> rather have everything in one single email.
>
> Suggestions/Comments:
> - Adding new nodes is really cool, but when you try to add 100 nodes
> it sometimes fails: it stays "looping" and never moves forward.
> Also, it is a linear process that depends on the client, so the
> client needs constant internet connectivity for it to succeed, which
> in some cases is difficult. One option would be to submit the request
> to the master and then let the master finish the process. What do you
> think?
> - Sometimes when adding nodes fails, the instances are created but the
> configuration fails, so I had to remove the nodes by hand (which was
> really painful, 100 instances by hand). Would it be possible to resume
> the addition of instances at any point?
> - Currently, once a cluster is created, it seems that changing the
> config will not change its configuration. For example, if I created a
> cluster with 2 medium instances and now want to add new xlarge
> instances, I cannot do that. Is this right?
> - Something really useful would be to have/manage different config
> files. This would allow us to use different credentials (we have
> different EC2 accounts/credit cards based on projects); having to
> move the config file is troublesome. Another possibility is that the
> config file could hold different accounts and you could set which
> account to use within each cluster.
> - Something else that would be nice to have is a "test" option for a
> cluster: basically a print statement that tells you what you are going
> to create with the command that you are running, to check that the
> configuration is fine before executing/creating.
> - I have tried several times to "extend" a configuration without
> success. Do you have documentation about this? Mainly, which
> constraints does it have?
> - One final suggestion: at the beginning I ran terminate on my cluster
> by mistake (I got confused by my bash_history) instead of removing a
> node, and lost all my work. Is there a way to make it more "friendly",
> maybe by asking the user to first terminate all the workers and then
> the master?
>
> Bugs:
> - Got: "!!! ERROR - InsufficientInstanceCapacity: Insufficient
> capacity." I have no idea what it means.
> - Got: "!!! ERROR - command 'mount /home/ubuntu/datasets' failed with
> status 32". This happens when trying to mount a disk that hasn't been
> mounted before, so it has no partition: "Disk identifier: 0x00000000
> Disk /dev/xvdz doesn't contain a valid partition table mount
> configuration". Is there a way to auto-detect this and create a
> partition for it? If this is not possible, what is the easiest way to
> do this? Do we have to start an instance to create the partition first
> and then use it with StarCluster?
> - I have attached some logs of these and other errors.
>
> Cheers,
>
> --
> Antonio González Peña
> Research Assistant, Knight Lab
> University of Colorado at Boulder
> https://chem.colorado.edu/knightgroup/
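
On the mount failure in the quoted message: whether a disk already has
partitions can be checked before trying to mount it. Below is a minimal
sketch, assuming a Linux node where partitions are listed in
/proc/partitions and are named as the device name plus a number (xvdz1,
xvdz2, ...). has_partitions is a hypothetical helper and the sample text
is made up for illustration:

```python
def has_partitions(proc_partitions_text, device):
    """Return True if `device` (e.g. "xvdz") has at least one partition.

    Parses text in the format of /proc/partitions; hypothetical helper.
    """
    for line in proc_partitions_text.splitlines():
        fields = line.split()
        # Data rows have four fields: major, minor, #blocks, name.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        name = fields[3]
        suffix = name[len(device):]
        # A partition is the device name followed by digits only.
        if name.startswith(device) and suffix.isdigit():
            return True
    return False


# Made-up sample: xvda has one partition, xvdz has none.
SAMPLE = """major minor  #blocks  name

 202        0    8388608 xvda
 202        1    8387584 xvda1
 202      400  104857600 xvdz
"""

if __name__ == "__main__":
    print(has_partitions(SAMPLE, "xvda"))
    print(has_partitions(SAMPLE, "xvdz"))
```

On a real node the text would come from reading /proc/partitions.
Whether to create a partition automatically when none is found is a
separate decision, since it would overwrite whatever is on the disk.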



-- 
Antonio González Peña
Research Assistant, Knight Lab
University of Colorado at Boulder
https://chem.colorado.edu/knightgroup/



Received on Tue May 22 2012 - 11:10:28 EDT