- Error message when running jobs
- What happened to my job, why did it fail?
There are multiple possible reasons for a job to fail. To assist us with the investigation, please provide the jobid or node that the job crashed on, or at least the approximate time of failure so we can locate it and find out what happened. It would also be helpful if you could point us to the batch script you used for the failed job.
Also, make sure you are not running jobs on a login node, as that is likely to fail.
If your output gives error messages like “unable to copy” or “error from copy”, then there is a good chance there is something wrong with your ssh keys. To fix them, run the “regen_ssh_key” command from the head node (hn1). If this problem persists after running this command, let us know at firstname.lastname@example.org.
- I cannot get such-and-such program to work, or I am getting strange error messages when running such-and-such program on ACISS. Can you fix this please!
Sure, we’ll be happy to try to troubleshoot the problem, but we will likely need to reproduce it before we can fix it. The following information would be very helpful to us in diagnosing the problem:
1) your jobid and/or node you are running on
2) the list of modules you currently have loaded (do “module list”)
3) the sequence of commands you use to produce the error
4) the actual error message itself
You can copy and paste all of these into an email, or send us a screenshot.
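As a rough sketch, items 1 and 2 can be gathered into one text file before you write the email. This assumes the scheduler sets the PBS_JOBID variable inside a job (as Torque/PBS does) and that the "module" command is available in your shell. The function name write_report and the default file name are made up for illustration.

```shell
# write_report FILE: collect job and environment details into FILE.
# (Hypothetical helper, not an ACISS-provided command.)
write_report() {
  local report="${1:-aciss_report.txt}"   # hypothetical default file name
  {
    # PBS_JOBID is only set inside a running job (assumption: Torque/PBS)
    echo "jobid: ${PBS_JOBID:-not inside a job}"
    echo "node:  $(uname -n)"
    echo "--- loaded modules ---"
    # "module list" prints to stderr, so redirect it into the report
    if type module >/dev/null 2>&1; then
      module list 2>&1
    else
      echo "module command not available in this shell"
    fi
  } > "$report"
  echo "wrote $report"
}

# Example: write_report my_report.txt
```

The commands you ran and the exact error message (items 3 and 4) still need to be pasted in by hand.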
- How can I change my password on ACISS?
If you have an active duckid, change your password here: https://duckid.uoregon.edu/
If you are a collaborator and do not have a duckid, log into the ACISS head node (hn1) and change your password using the “passwd” command.
- I cannot log into ACISS anymore, what happened?
If you are receiving “permission denied” messages, then check that your UO account (duckid) is still active and that your duckid password has not expired. If your password has expired, please see above. If your account is active and your password is current, it is possible that there is a temporary hiccup in the authentication system. In this case, email us at email@example.com and we’ll get that fixed.
Also, ACISS blacklists and blocks login attempts after several repeated login failures. If you receive messages like “Operation timed out” or “Connection reset by peer” then this likely applies to you. Try connecting to the UO network using VPN, then try logging into ACISS again. Check out this web site for more information on VPN: https://it.uoregon.edu/vpn
An alternative to VPN is to email ACISS Support (firstname.lastname@example.org) and provide us with the public IP address you are trying to connect from. If you do not know how to determine this, visit one of the web sites below and send us the result:
- Oh darn, I just accidentally deleted some stuff from one of my directories. Can you please retrieve or recover it?
Sorry, no, we cannot recover anything that has been deleted. ACISS does not have any backup or snapshot mechanism implemented. This is something that we really want to change in the future. We STRONGLY recommend that you back up your data regularly.
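One possible way to make a regular backup is a dated, compressed archive of a directory, which you then copy to storage outside of ACISS. The sketch below uses only standard tar options. The function name backup_dir and the paths in the example are hypothetical.

```shell
# backup_dir SRC DEST: write a dated .tar.gz archive of directory SRC into DEST
# and print the archive's path. (Hypothetical helper, for illustration only.)
backup_dir() {
  local src="$1" dest="$2"
  mkdir -p "$dest" || return 1
  local archive="$dest/$(basename "$src")-$(date +%Y%m%d).tar.gz"
  # -C switches to the parent directory so the archive stores relative paths
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  echo "$archive"
}

# Example (placeholder paths):
#   backup_dir "$HOME/my_project" "$HOME/backups"
```

An archive that stays on ACISS is not protected if the file system itself is lost, so copy it to another machine (for example with scp) after creating it.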
- Can I get my job extended?
Yes, probably. Please provide the jobid of the job that you want to extend and how many additional hours you need. Also, be sure to allow sufficient time during regular business hours (M-F, 8am-5pm) to request an extension before your job is set to expire. Requests made after 5pm or on weekends will likely not be caught in time and may be missed altogether.
- Can I get software installed?
We are happy to install legally licensed software for our users; just let us know at email@example.com. To assist us, please provide the link to the software and the version that you want installed, or upload the source or binaries into your home directory. Also, let us know about any customization that you require.
The wait time for the software install depends on the complexity of the software (number of dependencies and ease of build) and how busy the staff is. The average wait time is a few days.
If you are proficient at installing software on a Linux system and would prefer to do some custom installs yourself (for general use) rather than waiting for the staff, let us know.
- Can I run graphical applications on ACISS?
Yes. You need to have an X11 server application running on your computer. If you are using a Mac, download the latest version of XQuartz from here: http://xquartz.macosforge.org/landing/
Here are some options if you are running Windows:
If you are using Linux, you should be set.
After launching your X11 server app, ssh to ACISS with X11 forwarding enabled. Then reserve a node with X11 forwarding enabled (qsub -IX).
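If a graphical program fails to open a window, a quick check is whether the DISPLAY variable is set, since ssh sets it when X11 forwarding is active. The sketch below assumes an OpenSSH client, where "ssh -X" requests forwarding. The function name check_x11 and the user/host placeholders are made up for illustration.

```shell
# check_x11: report whether X11 forwarding appears to be active in this shell.
# (Hypothetical helper; run it after logging in and again on the reserved node.)
check_x11() {
  if [ -n "${DISPLAY:-}" ]; then
    echo "X11 forwarding looks active (DISPLAY=$DISPLAY)"
  else
    echo "DISPLAY is not set; reconnect with: ssh -X <user>@<aciss-host>"
    return 1
  fi
}

# Typical sequence (placeholders in angle brackets):
#   ssh -X <user>@<aciss-host>    # on your own computer
#   check_x11                     # on the login node
#   qsub -IX                      # reserve a node with X11 forwarding
#   check_x11                     # on the compute node
```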
- Why are my jobs just sitting in the queue and not running?
There are many possible reasons for this:
1) The resources that you have requested are not currently available.
2) The combination of resources that you requested does not exist.
3) You have reached your limit on either the number of reserved cores or reserved walltime. For more information on limits, go here:
4) The queue may be stuck. There are many reasons why this can happen, but if you feel #1 and #2 do not apply to you and think one or more of your jobs are stuck, let us know and we will investigate.
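Before contacting us, it can help to look at what the scheduler has recorded for the job. The sketch below assumes a Torque/PBS-style "qstat -f" that prints "name = value" lines such as job_state and Resource_List entries. The function name summarize_job and the jobid 123456 are placeholders.

```shell
# summarize_job: read "qstat -f" output on stdin and keep only the job state,
# the requested resources, and any scheduler comment.
# (Hypothetical helper; assumes Torque/PBS-style "name = value" output.)
summarize_job() {
  grep -E '^[[:space:]]*(job_state|Resource_List\.[A-Za-z_]+|comment) = '
}

# Example on the cluster (123456 is a placeholder jobid):
#   qstat -f 123456 | summarize_job
```

Comparing the requested resources in that output against what the cluster actually offers is a quick way to rule out reasons 1 and 2.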
- I cannot delete my own job, what’s going on?
Most likely the compute node that your job ran on crashed, causing the Server to lose contact with that node. When this happens, there is nothing you can do except notify us at firstname.lastname@example.org and let us know your jobid so that we can fix the problem.