Tuesday, July 16, 2013

Digital lab book using pmwiki on isbrae

So, documentation is of course very important.  In our work, we don't always have a lab book, as much of our stuff happens at the computer....

But we can use that computer for logging what we do- it's really useful.  To that end, I've copied Dan Breton's workflow using pmwiki, an open source package for making wiki pages.  This makes it pretty straightforward to just use the web page for a notebook.  

To get yourself started with pmwiki on isbrae:

First you need an account.  Once you have one, you should have a public_html folder under your home directory:

[bo@isbrae public_html]$ pwd
[bo@isbrae public_html]$ 

If you haven't already got one, make that first.  Then, copy the wiki-skeleton set of files that Dan set up in his public_html directory:

[bo@isbrae ~]$ cp -r /home/dbreton/public_html/wiki_skeleton ~/public_html/hawley_wiki

(substitute your favorite name for 'hawley_wiki' above)

Now you need to briefly open up the permissions so that pmwiki can write to the directory, so execute

[bo@isbrae public_html]$ chmod 2777 hawley_wiki/

And then open your favorite browser on your desktop/laptop/whatever computer and navigate to:


(where you replace <username> and <wiki_name> with your username and the wikiname you chose earlier, respectively).

Now that pmwiki has been initialized, you can close up the permissions again:

[bo@isbrae public_html]$ chmod 755 hawley_wiki/
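If you're curious what those two chmod calls actually do, here is a rough sketch you can run against a throwaway directory instead of a real wiki.  The scratch directory and the name demo_wiki are made up for illustration; nothing here touches your public_html.

```shell
# Illustration only: a throwaway directory stands in for hawley_wiki.
scratch=$(mktemp -d)
mkdir "$scratch/demo_wiki"

# 2777 = setgid bit (the leading 2) plus read/write/execute for
# owner, group, and world (777).
chmod 2777 "$scratch/demo_wiki"
opened=$(ls -ld "$scratch/demo_wiki" | cut -c1-10)
echo "opened up:   $opened"

# 755 = owner rwx, group and world r-x: nobody but the owner can write.
chmod 755 "$scratch/demo_wiki"
closed=$(ls -ld "$scratch/demo_wiki" | cut -c1-10)
echo "closed down: $closed"

rm -rf "$scratch"
```

The thing to look for is that the world "w" is present in the first listing and gone in the second.  On GNU/Linux systems a plain numeric chmod usually leaves a directory's setgid bit in place, so the group field may read "r-s" rather than "r-x" afterwards; that is harmless here.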

Your wiki page is now active on the link to which you just navigated.  

And then you can start wiki'ing!  See Dan's wiki at http://isbrae.dartmouth.edu/pmwiki/pmwiki.php for more documentation on how to work with pmwiki and the special things Dan's set up to help make our lives easier.  Also see the pmwiki site and all the support pages.  It's not that hard once you get it up and running!

Hope this helps!

Monday, March 11, 2013

Backups, Backups! Using unison.

I've had a few conversations with folks about backups.  In particular, how you REALLY NEED TO HAVE THEM.  You know how in climbing, people say "never climb unroped higher than you are willing to fall"...  Well, the corollary in computing is "never work longer between backups than you'd want to repeat".  For me that's no more than a day, tops.  Sometimes an hour.

Even backups that are provided by your institution are not completely safe, and it's a good idea to roll your own; that way you will know they are working.  This is illustrated by a very short story from grad school...

We were all working on a Solaris cluster, with several of us having linux workstations as well.  All of our data was centralized on a RAID disk array.  It was RAID, so it was safe, right?  Anyway, the department had a tape backup that ran every night, so everything was cool.  Weeeellll.....

One morning we came in to find that a) the amazing RAID array had suffered a catastrophic failure, and b) the tape backup hadn't been working for months.  Our IT person resigned soon after this, but many people lost a great deal of work.  Fortunately for me, I'd been making backups of my stuff to multiple locations, none of which were affected by this double-failure.  And I'd convinced many of the ice group to do the same, so many of them were unaffected.

I currently use two tools for backups: rsync and unison.  In addition, I have several scripts configured to automate parts of the process.  All of this can be tunneled through SSH, so in order to automate it, you need to configure passwordless ssh.  But no need to go there yet.  We'll start with using unison, as that gives great bang for the buck.  Maybe if there's interest I'll post about rsync another time.

First, it's important to note that my computing environment is simultaneously more complex and simpler than that of most users.  It's more complex because I have several computers on which I actively work, but it's simpler because I have ALL THE SAME FILES on every one of them.  How?  By using unison.

So- I use a hub-and-spoke topology for my primary working computers, and use unison to synchronize files between them, and a tree topology for backing up.

Sorry for the long preamble.  The first thing to do is choose what you want to back up.  Do not waste your backup space (we're going to make multiple copies) filling it with your music collection, your movies, pictures of your dog, or other things that you can easily recover from elsewhere (this includes large datasets that you can re-download from online archives).  So that means don't just back up your home directory.  Choose.

What's most important are things that take your time, not computer time, so things like code, papers, presentations, but not things like model output that could be regenerated easily.  You will find that this makes your backup take a lot less space than you'd otherwise think.
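One hedged suggestion for the choosing step: check how big each candidate directory actually is before you commit to it.  The sketch below just loops over a list of directory names under your home folder and reports sizes with du.  The names are simply the ones from the profile shown later in this post, so swap in your own.

```shell
# Report the size of each directory you are thinking of backing up.
# The names are examples taken from the profile later in this post.
checked=0
for dir in local spri_local dartmouth_local STUFF Desktop bin; do
    if [ -d "$HOME/$dir" ]; then
        # -s: one summary line per directory, -h: human-readable sizes
        du -sh "$HOME/$dir"
    else
        echo "(no $HOME/$dir on this machine)"
    fi
    checked=$((checked + 1))
done
echo "checked $checked candidate directories"
```

Anything that shows up as tens of gigabytes and could be re-downloaded or regenerated is a good candidate to leave out.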

When you know what you want to back up, install unison on your machine, and start configuring.  Here's the text of my config file (stored in ~/.unison/hawleymbp.prf); and I'll annotate to show what are the important points:

# this is the contents of the file .unison/hawleymbp.prf (and
#  others that I have modeled after it):

# the text-only interface (old skool)


# trying to get rid of properties issues

  perms = 0
  rsrc = false
# where unison is on the remote server (it's in a nonstandard 
# location on firn; do not use this on isbrae)
#  servercmd=/opt/local/bin/unison

This is the most important piece of the configuration: where you are synchronizing!  The first root should be your local machine, and the second should be where you want to sync to; isbrae is a good choice.  Note the syntax of the second root; that is what you need to tunnel unison through ssh.

# Roots of the synchronization
   root = /Users/bo/
   root = ssh://bo@firn.dartmouth.edu//Users/bo/

Now the directories within the roots you've just listed, that you want to sync.  So these are all directories just below my home folder.  
# Paths to synchronize 
     path = local 
     path = spri_local 
     path = dartmouth_local 
     path = STUFF 
     path = Desktop 
     path = bin

If you have anything big and easily replaceable in your directories that you're interested in backing up, well, that's a tactical error in directory construction, but you can use the ignore directive to keep them from being synchronized (I also use the ignore to keep from sync'ing the auto-backup file that emacs creates)
# paths and names to ignore:

     ignore = Name *~
     ignore = Name .*~
     ignore = Name .DS_Store
#     ignore = Name SCENARIO
#     ignore = Name data/ICESAT
#     ignore = Name data/ASIRAS

Use these last options with caution, as they can get you in trouble by "automatically" resolving conflicting file changes (if you changed a file in both places).
# Stuff for speed of use.
#      prefer = ssh://bo@firn.dartmouth.edu//Users/bo/
#      batch = true
      auto = true

And that's that.  Place the file in a directory named ~/.unison/ (in your home folder- note that the leading dot makes the directory invisible unless you use "ls -a" to see it).  The file should have a name that you don't mind typing (in this example mine is hawleymbp.prf), and the extension ".prf".

Then, at the command line, type:

HawleyMBP:~ bo$ unison -ui text hawleymbp

The first time you run it, it will mention that it's got to check through everything to make an index file, and that's just fine.  After that, you'll see it scroll through a lot of things as it scans, and then it will present you with something that looks like:

Reconciling changes

local          firn.dart...       
deleted  ---->            Desktop/McMurdoStationGuide.pdf  
new file ---->            Desktop/hawleymbp.prf  
changed  ---->            Desktop/progect_files/GTD.org  
changed  ---->            dartmouth_local/papers/asiras_accum_svalbard/asiras_accum_svalbard.aux  
changed  ---->            dartmouth_local/papers/asiras_accum_svalbard/asiras_accum_svalbard.fff  
changed  ---->            dartmouth_local/papers/asiras_accum_svalbard/asiras_accum_svalbard.lof  
changed  ---->            dartmouth_local/papers/asiras_accum_svalbard/asiras_accum_svalbard.log  
changed  ---->            dartmouth_local/papers/asiras_accum_svalbard/asiras_accum_svalbard.pdf  
changed  ---->            dartmouth_local/papers/asiras_accum_svalbard/asiras_accum_svalbard.tex  

Proceed with propagating updates? [] 

We're seeing that things have only changed on the local machine (which is what I'd expect, as I've been working on this machine only recently).  If anything had changed on firn.dartmouth.edu, we would see arrows going the other direction.  If there was a conflict, arrows would go in both directions, and you would be prompted to choose the copy to propagate.  

At this point, you have the option of hitting ctrl-c and leaving everything as it was.  Or you can type "y" and the changes will be propagated.  Perhaps obviously it's a good idea to create a simple test directory with a couple of files to practice with, before you do this with your own stuff.  
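Here's a minimal sketch of such a practice setup, under a few assumptions of mine: both "machines" are just two directories on the same computer, everything lives under a scratch directory from mktemp, and the names side_a, side_b, unison_state, and practice.prf are made up.  It uses the UNISON environment variable, which tells unison where to keep its profiles and archives, so your real ~/.unison is left alone.

```shell
# Practice setup: everything lives under a throwaway directory.
scratch=$(mktemp -d)
mkdir -p "$scratch/side_a/notes" "$scratch/side_b" "$scratch/unison_state"
echo "first draft" > "$scratch/side_a/notes/todo.txt"

# A tiny profile with two local roots standing in for laptop and server.
cat > "$scratch/unison_state/practice.prf" <<EOF
root = $scratch/side_a
root = $scratch/side_b
path = notes
ignore = Name *~
EOF

# -batch skips the "Proceed?" question so this runs unattended;
# drop it when practicing by hand so you can see the prompt.
if command -v unison >/dev/null 2>&1; then
    UNISON="$scratch/unison_state" unison -ui text -batch practice </dev/null \
        || echo "unison reported a problem"
    ls -R "$scratch/side_b"
else
    echo "unison not installed; profile is in $scratch/unison_state/practice.prf"
fi
```

From there you can edit todo.txt on one side, or on both sides to provoke a conflict, and run the unison line again without -batch to see how it asks you to resolve things.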

Once you start the process of propagating the changes, unison will keep you posted as to what it's doing and how long it's likely to continue doing it:

[END] Updating file dartmouth_local/papers/asiras_accum_svalbard/asiras_accum_svalbard_preprint_style.pdf
 81%  00:59 ETA

And when it finishes, it'll let you know how things went (this way you don't have to sit there and watch it every time):

UNISON 2.27.57 finished propagating changes at 15:29:53 on 11 Mar 2013

Saving synchronizer state
Synchronization complete  (17 items transferred, 0 skipped, 0 failures)
HawleyMBP:~ bo$ 

So here we see that all 17 items transferred successfully, no failures.  It would skip a file if there were a conflict and you'd chosen the "batch" option (which doesn't ask you if you want to proceed).  Note that unison makes all file copies atomic, which means that it keeps the original file in a third place until the updated copy finishes transferring; if there's an error in the transfer, it puts the original back and notes that no copy happened at all, so the file will be updated the next time unison is run.

Ok, that's enough for now.  Hopefully that got you synchronized.  My advice is to play around with a directory that you create solely for learning how this works to start with, find out how you can trip yourself up by modifying a file in two different places at once, and how to get out of it.  And then get it working on your important files.  Note that all you have to do is get it up onto isbrae, and isbrae will propagate it to several other backup locations- and as a former boss once told me, you need 3 backups:  one because you're stupid, one because equipment gets #%$*&'d up, and one, well, just because.  

And if it seems like this whole process will take too long to set up, consider how much time I just spent typing all of this out!  And be assured that once you get it configured, your backups will be so fast and easy it'll just be a part of your daily workflow.

Happy sync'ing!

Wednesday, May 23, 2012

Permissions on Isbrae

So, another thing that always comes back to haunt people (myself included) is permissions.  You copied something to /bigtmp (see other posts on that topic), and told someone where to find it, but they came back and said they couldn't copy it, read it, or whatever.  The problem is probably permissions.

Unix (Linux and Mac OS X are Unix variants) file permissions work on three levels:  there are separate permissions for the owner of the file, the group assigned to the file (by default, the owner's primary group), and the entire world, which means everyone else on the system.  The owner of a file is the one who sets these permissions.

To see the permissions on a file or directory, you need to add the "-l" flag to the common "ls" command for listing files (for "long listing"):

[bo@isbrae ~]$ ls -l /bigtmp/
total 193860
-rw------- 1 bo    glaciology   2720463 May 23 13:26 katie_bos_radar.mat
-rw-r--r-- 1 bo    glaciology      1573 May 23 13:26 katie_density_core.txt
-rw------- 1 bo    glaciology      6385 May 23 13:14 summit_optimization.m

drwxr-xr-x 2 bo    glaciology      4096 May 23 14:30 temp_dir
-rwxrwxrwx 1 bo    glaciology         0 May 23 14:32 all_permissions_file

Here we see a few files I copied to /bigtmp for Laura to check out.  Each line is for a file.  The first thing you see in the line is 10 characters, which contain the file type and permissions.  The first character determines file type- it's a '-' for a regular file, and 'd' for a directory.  There are other characters you might see in that spot, but don't worry about that for now.  Next there are three sets of three characters, indicating the file permissions for, again, the owner, group, and world.  Directly after these permissions characters the owner is listed (in this case me, 'bo'), the group, the file size in bytes, date and time last modified, and finally the file name.
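To make the layout of those 10 characters concrete, here is a small sketch that slices one permissions field into its four parts.  The string is copied from the katie_density_core.txt line in the listing; the variable names are just my own labels.

```shell
# One permissions field copied from the listing (katie_density_core.txt).
perms="-rw-r--r--"

# Character 1 is the file type; characters 2-4, 5-7, and 8-10 are the
# owner, group, and world permission triplets.
ftype=$(printf '%s' "$perms" | cut -c1)
owner=$(printf '%s' "$perms" | cut -c2-4)
group=$(printf '%s' "$perms" | cut -c5-7)
world=$(printf '%s' "$perms" | cut -c8-10)

echo "type=$ftype owner=$owner group=$group world=$world"
# prints: type=- owner=rw- group=r-- world=r--
```

Within each triplet the order is always read, write, execute, and a '-' means that permission is switched off.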

So let's look at what's here.  I've created an empty file (file size 0 bytes) called "all_permissions_file" to illustrate the permissions possibilities.  Each of the three classes (owner, group, world) gets to read, write, or execute (rwx) this file.  In general, it's not a good idea to give anyone write permissions on a file unless you are sure you want to, because it'd make it easy for people to accidentally overwrite your work.  So note that for all the other files and directories in this example, the owner is the only one with write permission.

Now take a look at the directory "temp_dir".  All users have read and execute permission, but only the owner has write permission.  This means that only the owner can create, rename, or delete things inside this directory.  Execute permission is important for a directory: read permission alone only lets others list the names inside it, while execute permission is what lets them actually enter the directory and get at the files.

Next look at a file that is all set to be shared, "katie_density_core.txt".  This file has read permissions for everyone, and write permissions for the owner.  So now Laura can copy this file to her space, giving her ownership over it and thus control.  

There are two files, though, that will thwart any efforts to view, copy, move, delete, rename, or otherwise mess with them, "katie_bos_radar.mat" and "summit_optimization.m".  Only the owner has any permissions over them.  Note that as owner you can even withdraw these permissions from yourself, though it would make the file less useful (one exception is that if you want to keep yourself from accidentally deleting a file you could take away your own write permissions).  But now I, as owner, need to change the permissions so that Laura can copy these files too.  To do this you use the command "chmod" (think "change mode").  There's a lot to this command, and I'm only going to give you the simplest way to use it- give all users read permissions, using the "a+r" flag:

[bo@isbrae ~]$ chmod a+r /bigtmp/katie_bos_radar.mat 
[bo@isbrae ~]$ chmod a+r /bigtmp/summit_optimization.m 

Now look to see that everything is as it should be, 

[bo@isbrae ~]$ ls -l /bigtmp/
total 193860
-rw-r--r-- 1 bo    glaciology   2720463 May 23 13:26 katie_bos_radar.mat
-rw-r--r-- 1 bo    glaciology      1573 May 23 13:26 katie_density_core.txt
-rw-r--r-- 1 bo    glaciology      6385 May 23 13:14 summit_optimization.m

drwxr-xr-x 2 bo    glaciology      4096 May 23 14:30 temp_dir

-rwxrwxrwx 1 bo    glaciology         0 May 23 14:32 all_permissions_file

[bo@isbrae ~]$ 

And you can now safely tell your colleague that the file is available.  Note that if there is a problem with permissions, only the owner can fix it, although a system administrator, acting as the "root" user, can also step in and fix things.
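If you'd like to try chmod before touching real files, the following sketch repeats the same a+r change on an empty file in a scratch directory.  The file name private_file is made up; the starting mode of 600 is chosen to mimic the owner-only files in the listing.

```shell
scratch=$(mktemp -d)
touch "$scratch/private_file"

# Start from owner-only permissions, like katie_bos_radar.mat was.
chmod 600 "$scratch/private_file"
before=$(ls -l "$scratch/private_file" | cut -c1-10)

# a+r: for (a)ll three classes, add (+) (r)ead permission.
chmod a+r "$scratch/private_file"
after=$(ls -l "$scratch/private_file" | cut -c1-10)

echo "before: $before"   # before: -rw-------
echo "after:  $after"    # after:  -rw-r--r--
rm -rf "$scratch"
```

The same pattern works in reverse: "chmod go-r" takes read permission away from group and world again, and "chmod u-w" removes your own write permission.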

Hope this helps!

Wednesday, May 16, 2012

Logging into isbrae

So, someone's told you to find something on isbrae.  Huh?

isbrae is our group linux server- it's got many CPU cores, and more importantly, many TB of data storage space.  So it's a good place to store our data, and if you want to, you can do all of your computing there as well, since it runs matlab and python and has most of the popular compilers and many graphical visualization packages.

But first you have to get there.  From a mac, open the "terminal" application, which is under Applications -> Utilities.  From Windows, you can use a freeware program called "PuTTY".

Once you open the terminal, you can type your command to ssh or "secure shell" into isbrae:

HawleyMBP:~ bo$ ssh bo@isbrae.dartmouth.edu
Last login: Tue May 15 09:57:34 2012 from hawleymbp.kiewit.dartmouth.edu
[bo@isbrae ~]$

Note that in this case, my username on isbrae is 'bo', so you will need to replace this command with "ssh your-username@isbrae.dartmouth.edu".  You will be asked for your password, which you will have set when you got your account.

If you want to run programs on isbrae that contain graphical output (like matlab running in graphics mode, for example), you need to "pipe" the graphical display from isbrae to your local machine.  This can be done easily by using the "-X" switch on ssh:

HawleyMBP:~ bo$ ssh -X bo@isbrae.dartmouth.edu
Last login: Wed May 16 10:36:46 2012 from c-66-31-143-92.hsd1.nh.comcast.net
[bo@isbrae ~]$

It's got to be a capital X, not lowercase, because lowercase means explicitly "do _not_ forward graphics". 

Note that you need to have "X-windows" installed on your local machine.  For macs, this comes with the developer tools and extras and may not have been installed by default, but should be on your install CD.  For windows, you will need an "X emulator", and will need more help than this blog post can provide...
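If you log in a lot, you may get tired of typing the full hostname and the -X every time.  One possible shortcut (my suggestion, not something required by isbrae) is a host entry in your ssh configuration.  The sketch below writes the entry to a scratch file so nothing is changed by accident; "your-username" is a placeholder, and the alias "isbrae" is just a convenient name.

```shell
# Written to a scratch file here; once you are happy with it, the same
# lines can be added to ~/.ssh/config on your own machine.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
Host isbrae
    HostName isbrae.dartmouth.edu
    User your-username
    ForwardX11 yes
EOF
cat "$cfg"

# To try the scratch file without installing it:
#   ssh -F "$cfg" isbrae
```

With an entry like this in place, "ssh isbrae" should behave like "ssh -X your-username@isbrae.dartmouth.edu".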

Graphical SCP to and from /bigtmp/

So, although I prefer to use the command line, I realize that for some, pointing and clicking is an easier way to do things.  It is for these folks that this post is written.

SCP is simply a protocol for copying files securely from one networked machine to another.  The command line is to my mind the simplest way to execute this protocol, but there are other ways as well.

For those who wish to use a point-and-click interface, there's fugu for the mac:

or FileZilla for mac, windows, or linux:


Since I don't personally use either of these much (though I have used them occasionally in the past), I won't go into a tutorial here- but as they are both open source projects, google will surely help if you need guidance.  Happy clicking!

Tuesday, May 15, 2012

Moving things to and from /bigtmp/

/bigtmp/ is our temporary swap directory on Isbrae.  There are lots of ways to get things to and from /bigtmp, depending on what platform you use, where you are (whether you are hard-wired to the Fairchild subnet), and your personal preference.

First off, take a look at bigtmp by logging into isbrae, to see what's there:  Open a terminal (I'm on HawleyMBP, my macbook pro) and at your prompt:

HawleyMBP:~ bo$ ssh bo@isbrae
Last login: Mon May 14 17:38:03 2012 from hawleymbp.kiewit.dartmouth.edu
[bo@isbrae ~]$ ls /bigtmp/
[bo@isbrae ~]$

Looks like there's nothing there now- which is not surprising, since bigtmp is wiped clean every weekend (otherwise it wouldn't be very tmp, would it?).

So let's find a file we'd like to put up onto isbrae.  I'll use a matlab file I created for Blythe:

HawleyMBP:~ bo$ ls -lh for_blythe.mat
-rw-r--r--  1 bo  glaciology   458M May 15 09:53 for_blythe.mat
HawleyMBP:~ bo$

Note that this file would never be possible to email, as it's 458 MB!

So, to copy it up to isbrae, I use "secure copy" or scp:

HawleyMBP:~ bo$ scp for_blythe.mat bo@isbrae:/bigtmp/
for_blythe.mat                                100%  458MB  11.5MB/s   00:40   
HawleyMBP:~ bo$

And then log in and check that the permissions are correct:

HawleyMBP:~ bo$ ssh isbrae
Last login: Tue May 15 09:57:20 2012 from hawleymbp.kiewit.dartmouth.edu
[bo@isbrae ~]$ ls -lh /bigtmp/
total 459M
-rw-r--r-- 1 bo glaciology 459M May 15 09:55 for_blythe.mat
[bo@isbrae ~]$

And those last two 'r's in the permissions line show that both the workgroup (glaciology) and in fact any user on isbrae have read permission.  So you can send your colleague the path, and he or she should be able to pick it up!  Next time- doing it in another way!
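For the colleague on the receiving end, picking the file up is the same scp command pointed the other way.  Here is a small sketch wrapped in a shell function.  The function name pickup is my own invention, and the example call is commented out because it needs a real account on isbrae.

```shell
# Hypothetical helper (not part of any standard tool): fetch one file
# from /bigtmp/ on isbrae into the current directory.
#   $1 = your username on isbrae
#   $2 = name of the file inside /bigtmp/
pickup() {
    scp "$1@isbrae.dartmouth.edu:/bigtmp/$2" .
}

# Example call, left commented out because it needs a real login:
# pickup your-username for_blythe.mat
```

Remember that /bigtmp is wiped every weekend, so it pays to pick things up soon after you're told they're there.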