Thursday, December 3, 2015

Where stuff is on isbrae, and general guidelines

So, isbrae is our shared server.  Among other things, it has a large disk array on which we can store data, and it is backed up each night to an off-site machine called epica (over in the Math department).  If you're logged into isbrae, you probably have access to some of these shared files.

As a user of shared space, you should be aware of and adhere to a few guidelines on where things should go.

One principle I ask you to follow: do not mix data and code.  Data is large, but once we collect it, it does not change, so it takes no further person-time to generate.  Code, on the other hand, takes lots of person-time but is small, and is therefore easy to back up in multiple places (see my post on using unison to do backups- unison is awesome).

Another, possibly obvious, one: do not store intermediate files (i.e. partially processed data) unless you have to.  The way to avoid this is to do your data processing programmatically- I generally start with raw data and, in one or more matlab/python/shell scripts, get to publication-ready figures.  This not only avoids clutter and wasted disk space, but also puts the documentation of how you got from data to figure right in the code, where you can easily find it later.  In short: avoid data clutter.
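Here is a minimal sketch of that idea in Python.  The file names, the two-column (time, value) CSV layout, and the function names are all made up for illustration, and the "final product" here is a small CSV standing in for a figure.  The point is only the shape: raw data in, processing in memory, one final output, nothing in between.

```python
import csv
import statistics
from pathlib import Path


def load_raw(path):
    """Read (time, value) rows from a raw CSV file (hypothetical format)."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        return [(float(row["time"]), float(row["value"])) for row in reader]


def remove_mean(samples):
    """Example processing step, done in memory: subtract the mean value."""
    mean = statistics.fmean(v for _, v in samples)
    return [(t, v - mean) for t, v in samples]


def write_product(samples, out_path):
    """Write the final product (a stand-in for the publication-ready figure)."""
    out_path = Path(out_path)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["time", "anomaly"])
        writer.writerows(samples)
    return out_path


def run(raw_path, out_path):
    """Raw data in, final product out, with no intermediate files written."""
    return write_product(remove_mean(load_raw(raw_path)), out_path)
```

Because every step lives in the script, rerunning `run(...)` on the raw file regenerates the output from scratch, and the script itself is the record of how the output was made.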

If you want to hack around without worrying about cleaning up (i.e. you need some temporary space), use /bigtmp/.  It gets cleaned up automatically for you every weekend.  More on that in a minute.
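One way to use /bigtmp/ from a script is to make yourself a fresh working directory there each time.  This is a sketch only: the function name and the "scratch_" prefix are my own invention, and the fallback to the system temp directory is an assumption so that the same script still runs on a machine that has no /bigtmp/.

```python
import os
import tempfile


def make_scratch_dir(base="/bigtmp", prefix="scratch_"):
    """Create and return a fresh working directory under `base`.

    Falls back to the system temp directory when `base` does not exist,
    e.g. when the script runs on a machine other than isbrae.
    """
    if not os.path.isdir(base):
        base = tempfile.gettempdir()
    return tempfile.mkdtemp(prefix=prefix, dir=base)
```

Note that `tempfile.mkdtemp` creates the directory so that only you can read it, which suits scratch work but not handing files to another user.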

Ok- that's it for general guidelines- here's where to keep things:
1) your home directory: you can of course store things in your home directory, but by default it is not readable by anyone else.  I don't have any restrictions on what you put in there, i.e. you can keep your own personal stuff there, but please nothing illegal, and you should know that Ed and I can use administrative privileges to look anywhere.  Note that your home directory is on a disk with limited space, so if you have a lot of stuff, please don't store it there.

2) /bigtmp/: This is a space that is great for passing files back and forth between users (see my posts on moving things using the command line and using a GUI for this), or for using as scratch space- I often set my output to /bigtmp/ when I am developing a workflow, so that the stuff I produce won't hang around.  THIS GETS WIPED EVERY WEEKEND- that's the point of the 'tmp' part.

3) /code/: everyone can have space to store code in the /code directory (I may have to make you a directory there, just ask).  This is a great place to put code, especially code that you might want to share with others- just direct them to your code directory!

4) /data1, /data2, /data3, /data4: together these make up the main (really big) drive array, divided into 4 partitions.  If you want to put something here, talk to me about where it should go and I can give guidelines.  Right now, /data1 and /data2 are in active use for multiple projects and still have space available, /data3 is all SIRAL data and is completely full (a problem I need to solve), and /data4 is currently unused.

5) /doc: this is a place where you can put manuals, or other types of documentation that might be of interest to the general user base.  Let me know if you want to put something here.
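
If you want a quick look at how full the places listed above are before asking where something should go, a small script can report it.  This is just a sketch: the function name is made up, and the list of paths simply repeats the directories named in this post (paths that don't exist on the machine you run it on are skipped).

```python
import os
import shutil


def free_space_report(paths):
    """Return {path: (free_GB, total_GB)} for each path that exists."""
    report = {}
    for path in paths:
        try:
            usage = shutil.disk_usage(path)
        except OSError:
            continue  # path missing or not mounted on this machine
        report[path] = (usage.free / 1e9, usage.total / 1e9)
    return report


if __name__ == "__main__":
    locations = [os.path.expanduser("~"), "/bigtmp",
                 "/data1", "/data2", "/data3", "/data4"]
    for path, (free_gb, total_gb) in free_space_report(locations).items():
        print(f"{path}: {free_gb:.0f} GB free of {total_gb:.0f} GB")
```

The numbers are for the whole disk or partition that each path lives on, not for your own files.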