Cluster News

Announcements

10/24/17 – ACISS is being retired ahead of schedule due to the persistent storage system problem (see status below for details).  Please contact ACISS support at cas-aciss@ithelp.uoregon.edu if you have data to retrieve from ACISS.

10/25/17 – Numerous requests for file access are being received.  Please bear with us while we process these requests for data recovery.

11/3/17 – Due to the recent outage on Talapas, data transfer requests have been further delayed, but they should be fulfilled during the week of 11/6/17.

9/22/17 – ACISS is near the end of its life, and hardware (especially storage) is failing at an increasing rate.  No new accounts will be created and no new software requests will be accepted on ACISS.  All users are strongly encouraged to back up their data as soon as possible and begin transitioning to the new HPC cluster, Talapas.  You may request an account on Talapas here:

https://hpcrcf.atlassian.net/servicedesk/

You will need to be sponsored by a PI (i.e., belong to a Principal Investigator Research Group, or PIRG) to obtain an account on Talapas.

 

Current Status

9/22/17 – The ACISS job load is being restored to normal levels.  ACISS is sluggish at this time due to a high volume of file transfers.

10/5/17 – Part of the ACISS storage system went down this morning.  One of the seven controllers had a major failure, making some files and directories (roughly 1 in 7) unavailable and causing input/output errors and failed login attempts for some users.  The cluster has been taken offline for emergency storage system maintenance in an attempt to bring the controller back online.

10/6/17 – The ACISS storage system is being taken completely offline.  All jobs will be terminated and all connections to ACISS will be lost.  A full reboot was performed on the storage system, but the problem persists.  ACISS will remain offline until further notice.

10/9/17 – Continued attempts to revive the segments on the ailing storage controller have so far failed; ACISS remains offline.

10/13/17 – A filesystem check was started on Tuesday, Oct 10, on each of the 12 volumes impacted by the ailing controller.  The check is taking about one day per volume, so the filesystem is expected to remain offline until Monday, Oct 23.

10/23/17 – The filesystem check has completed, but the segments are still unavailable due to input/output errors between the storage nodes and the controller, which suggests the controller itself is not healthy.

 

Recent Changes

9/3/14 – Due to increased interest in fat nodes with more memory, the fat nodes were reconfigured as 8 nodes with 256GB RAM and 8 nodes with 512GB RAM.  If you need a node with 512GB RAM, please reserve one using the resource option “-l nodes=1:ppn=32:xmem”, as in the sketch below.
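A minimal job-script sketch, assuming the standard PBS/TORQUE qsub workflow used on ACISS (the job name, walltime, and program name below are illustrative placeholders, and you may also need to specify the appropriate fat-node queue with -q, which is not shown here):

#!/bin/bash
#PBS -N fat-node-job              # illustrative job name
#PBS -l nodes=1:ppn=32:xmem       # one 512GB fat node, all 32 cores
#PBS -l walltime=4:00:00          # illustrative 4-hour limit; adjust as needed
cd $PBS_O_WORKDIR                 # run from the directory the job was submitted from
./my_program                      # placeholder for your executable

Submit the script with qsub as usual.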

1/5/15 – The “xlong” queues have been disabled since they are no longer necessary – use the “long” queues instead.  To reserve more than the default number of hours on the long queues, use the “-l walltime=N:00:00” option, where N is the number of hours that you want.
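For example, to request 72 hours on one of the long queues at submission time (the script name and queue placeholder below are illustrative; substitute the long queue appropriate for your job):

qsub -q <long-queue-name> -l walltime=72:00:00 myscript.sh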

1/28/15 – Two more login nodes have been added to take load off the head node hn1.  To connect to them, please log in via “ssh username@login.aciss.uoregon.edu”.

9/23/15 – The head node hn1 is being reserved for administrative use, so sessions to hn1 are limited.  Please use login.aciss.uoregon.edu instead.

 

Software Updates

Software packages and modules are updated and tested regularly.  Stay tuned for more!

 

New software installs as of 1/29/16:

GCC 5.3

Python 2.7.11

Python 3.4.4

 

New software installs as of 12/29/15:

Mathematica 10.3.1

R 3.2.3

CUDA 7.5

 

New software installs as of 9/23/15 (with module names; see the load example after the list):

MATLAB 2015b:       matlab/r2015b

Boost 1.59.0:       boost/1.59

Mathematica 10.2:   Mathematica/10.2

GCC 5.2:            gcc/5.2

CMake 3.3.2:        cmake/3.3.2
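For reference, software on ACISS is made available through environment modules, so a package from the list above would typically be loaded by its module name before use, e.g. (version strings taken from the list above):

module load matlab/r2015b
module load gcc/5.2

Run “module avail” to see the full list of installed modules.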