Please resubmit your jobs if you were affected.
There was a problem with one of the power distribution units. It caused an overload on the circuit, which brought down several nodes and the pigeon head node.
The batch queue has gained 1280 more CPU cores today. Enjoy!
We have upgraded our distributed filesystem, GPFS. The new version of GPFS no longer supports CNFS, which was used by the following services:
1. Biocluster .html directories (i.e. http://biocluster.ucr.edu/~username/)
2. Biocluster DBs (i.e. mysql -h bioclusterdb.bioinfo.ucr.edu)
3. Biocluster dashboard (i.e. https://dashboard.bioinfo.ucr.edu)
4. RStudio (https://rstudio.bioinfo.ucr.edu)
5. Illumina data download page (http://illumina.bioinfo.ucr.edu)
These services will be resurrected as soon as possible.
Thank you for your understanding.
All services have been restored.
The Biocluster requires a scheduled shutdown of ALL services: Torque/Maui (PBS queuing system), websites, virtual environments/machines, storage access, backup systems and network services. We ask that you please make sure that you do not have any jobs running in the queue, and that you completely log out of Biocluster (pigeon.bioinfo.ucr.edu) before the shutdown.
The scheduled time for this shutdown is as follows (48 hours):
Start - Friday, September 11th, 2015 @ 12:00am
End - Saturday, September 12th, 2015 @ 11:59pm
If you have any questions or concerns, please contact us at email@example.com
The AC units in our server room have both failed. We have executed an emergency shutdown of all our systems.
Physical Plant is scheduled to repair the units tomorrow (Saturday, August 30th), at which point we will restore our systems.
Shutdown of ALL services has started. This includes, but is not limited to: Torque/Maui (PBS queuing system), websites, virtual environments/machines, storage access, backup systems and network services.
The scheduled time for this shutdown will be:
12:00am, Fri, May 23, 2014
11:59pm, Sat, May 24, 2014
An additional post will be made here, and an email will be sent, when Biocluster is fully operational.
Thank you for your patience and understanding.
We are aware of the situation and are working diligently to resolve this issue.
At approximately 2:30pm today there were some storage connectivity issues.
These issues were resolved, and Biocluster is now operating normally.
We are currently experiencing intermittent login/connection issues with Biocluster.
We are working hard to resolve this issue.
Biocluster systems have been restored. We will continue to closely monitor the situation.
We have set the default memory reservation to 1GB for the batch queue (nodes 1-34). If you do not specify otherwise, the cluster will now assume that your job only needs 1GB of RAM. If your job needs more than 1GB of memory, you will need to make a specific reservation; otherwise the cluster will automatically kill the job for a resource violation. For information on making a reservation, please consult the cluster manual: http://manuals.bioinformatics.ucr.edu/home/hpc#TOC-Advanced-Usage.
These changes have increased the stability of the cluster by preventing nodes from being over-allocated. This greatly reduces the chances of a node crashing. It will also help prevent cases where jobs run too slowly or not at all because they do not have the resources they need. If you have any questions about this, please contact us at firstname.lastname@example.org.
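As a rough illustration of making a memory reservation, the Python sketch below assembles a qsub command line that requests more than the 1GB default. It assumes that Torque's standard `-l mem=` resource request is the right one for the batch queue; the cluster manual linked above remains the authoritative reference. The helper name `build_qsub_command` and the script name `align.sh` are hypothetical and used only for illustration.

```python
import shlex

DEFAULT_MEM_GB = 1  # default reservation stated in the announcement


def build_qsub_command(script_path, mem_gb=None):
    """Return a qsub argument list, adding a memory request when one is needed."""
    cmd = ["qsub"]
    if mem_gb is not None and mem_gb > DEFAULT_MEM_GB:
        # Assumption: Torque's "-l mem=<N>gb" resource request applies here;
        # confirm the exact resource name in the cluster manual.
        cmd += ["-l", f"mem={mem_gb}gb"]
    cmd.append(script_path)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works off-cluster.
    print(shlex.join(build_qsub_command("align.sh", mem_gb=8)))
    # prints: qsub -l mem=8gb align.sh
```

Jobs that fit within the 1GB default need no extra flag, so the helper leaves the command unchanged in that case.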