Sunday, June 22, 2008

The death of a supercomputer



In February of 2005, Purdue was given the hardware that used to be Blue Horizon, the San Diego Supercomputer Center's old IBM RS/6000 SP supercomputer, which was purchased through funding from NSF. When it was new in 2000, it placed 8th on the list of the top 500 supercomputers in the world. When Purdue acquired it, the system was well off the bottom of the chart. The system was a set of 144 eight-processor 375MHz POWER3 "SP high node" (9076-N81) systems with 4GB of RAM each.

The people in charge of my department, the Rosen Center for Advanced Computing, decided that the price of the system (free plus shipping) was good enough to send two of our hardware guys out to condense the system down, maxing out each node by going from two 4-processor modules to four modules (16 processors) and from 4GB to 16GB of memory per system. At the time, we had the free power and floorspace, so it seemed like a reasonable idea, and the systems were still computationally useful for a year or so after we got them. We condensed the system down from nearly 40 racks of machines to just 10, with each node somewhere around twice as fast as my dual-G5 at doing a Linux kernel compile (one of my standard metrics for testing speed; I did this under a Debian Linux install on both systems).

Unfortunately, the amount of time and effort necessary to set up an IBM SP and AIX to be a useful compute resource is non-trivial. Adding to the problem, we lost two of our senior systems administrators in the summer of 2005, one of whom was our AIX guru and had set up our existing IBM SP systems. I had played around a bit with our testing SP cluster, including reinstalling it, and discovered how much effort is required to make a useful system. Just the software necessary to do a base OS install on an SP has an install manual that's over an inch thick.

So, by the summer of 2006, our management finally decided that we should get rid of the system, which meant that some of it would be coming home with me. I purchased the system from Purdue's surplus store for somewhere around $500.

The first rack of nodes (there are four nodes per rack) went onto eBay, and sold to a researcher in China for about enough money to pay for my endeavor. I shipped one more rack to a computer collector in New Jersey, and on a Saturday evening after the annual Vintage Computer Festival/Midwest show that I ran, one more rack of systems was loaded into the back of a Toyota minivan and headed up with a lawnmower to Ontario, Canada. Later, a second rack would go to Canada, a few nodes would get scrapped for parts, some were stripped for RAM to sell to a reseller, another rack went to a company in Minnesota, and the rest sat around in my warehouse until they got scrapped or used for parts. All that remain are two of the original nodes, and a few boxes of boards, heat sinks, memory and other parts that need to go to a scrapper so that they can be recycled.

I'm still keeping the two nodes, and 8 or so SP thin nodes (9076-270), partly because the POWER architecture is neat, and partly because a machine with 16GB of RAM in it is still kinda pricey. Plus, in a bit more than a month, we'll be retiring our remaining SP system, which has memory modules that can push my two nodes to 64GB each. Sure, you can buy faster machines with the same amount of memory, which are smaller and use less power, from people like Sun, but I can also pay for a lot of power with the amount of money that one new machine would cost.

I've actually managed to get Debian Linux running on them; they don't work with many kernel versions, but a 2.6.8 kernel that came with a past Debian installer seemed to work OK with them. I'm trying to revive the nodes I have, but I don't seem to be able to acquire the bootable installer anymore from the usual Debian places, and I'm not sure if I have it archived off somewhere accessible. Still, I should be able to take an installed copy running on a different machine, clone it, rebuild and install the correct kernel version, and boot that on the system. I just haven't had enough time or round tuits to do that yet.

So it seems that even in the death of a supercomputer, the machine still lives on, dissected and disseminated to other countries, providing what help it can to further science, perhaps just becoming an interesting conversation piece, or even being recycled into parts for tomorrow's supercomputing hardware.
