During the last semester I wrote two papers for my Computer Architectures class. I spent quite a bit of time on them and have been thinking about posting them on my weblog for quite some time. I’m a bit worried about plagiarism, though, and I’m not sure what to do about it. I’m pretty sure that I can submit the paper to the plagiarism-detection service that my university subscribes to, and I’ll probably do that now that the paper is posted.
Secondly, I’m releasing this paper under the by-nc-sa (Attribution-NonCommercial-ShareAlike 2.0) license, so unless you can turn your paper in to your teacher with a by-nc-sa license displayed on it, you can’t include this work in your paper without proper citation.
PLEASE NOTE: If you are considering plagiarizing this, please don’t. If your teacher allows you to cite non-academic internet sources, then by all means borrow my ideas and cite me. What I would really suggest is taking a look at my primary sources and then heading to your university library or computer system to consult them yourself. All of the ACM journal sources that I cited are available online if your university subscribes to the ACM Portal. This paper was thoroughly researched, but there were some late nights involved in its production, so it is provided WITHOUT WARRANTY against correctness or anything like that. My RAID paper is probably a little better than this one.
This work is licensed under a Creative Commons License.
Open Source Meets
May 6, 2005
Before the mid-1980s, most supercomputers were large, monolithic machines. Over time, the Top 500 Supercomputers list has seen clusters go from nonexistent to being the dominant architecture; they currently hold 296 of the top 500 slots (almost 60%). Compared to monolithic supercomputers such as those from Cray Research, clusters are extremely cheap for the amount of performance realized. When that lower cost is combined with cheap off-the-shelf hardware and open source software platforms, clusters can't help but improve and gain popularity.
Different Tools for Different Jobs
The definition of the word “cluster” varies greatly depending on the context in which it is used. Clusters are commonly used in high-availability situations when, for example, equipment must fail over gracefully or requests must be divided among the available hardware. For clustered application servers, this can be accomplished by simple round-robin DNS entries or by more complex load-balancing hardware or software.
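The effect of round-robin distribution can be sketched in a few lines of Python (the addresses are hypothetical; real round-robin DNS achieves the same effect by rotating the order of the address records it returns to clients):

```python
from itertools import cycle

# Each "lookup" hands back the next address in the pool, so requests
# are spread evenly across the available servers -- the same idea
# behind round-robin DNS entries for a clustered application server.
pool = cycle(["10.0.0.1", "10.0.0.2", "10.0.0.3"])

assignments = [next(pool) for _ in range(5)]
print(assignments)
```

After the pool is exhausted once, assignment simply wraps around to the first server again.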
Clusters are also used when data needs to be stored on multiple machines for redundancy or performance. MySQL, an open source database program, can use a cluster of computers for both data replication and load balancing in several configurations.
This paper will focus on the most common and most popular form of clustering: clusters used for parallel computing, scientific computing, and simulation in educational, professional, and government organizations. More specifically, it will focus on open source software that is available to make the construction and administration of clusters easier and more powerful.
A Brief History
Cluster computing has its roots in the mid-1980s, when developers wanted to tie together multiple computers in order to harness their collective power. In 1985, Amnon Barak developed MOS, the first predecessor to Mosix, which ran on a cluster of four Digital Equipment Corporation (DEC) PDP/11 computers. In 1986, DEC decided to try clustering for itself with VAXCluster. At the time, VAXCluster was able to take advantage of a much higher data rate of 70 Mbit/sec, but because of the proprietary interconnect used, VAXCluster remained much more tightly coupled, while MOS and Mosix used token ring LAN technology. As Mosix was ported to other platforms and improved, it was also able to take advantage of advances in networking technology without its looser coupling being affected. Mosix relied on patches to the Unix kernel to allow processes to migrate among nodes in the cluster. Mosix was later ported to the Linux kernel by Moshe Bar, where it thrives as an open source project.
Beowulf came onto the scene in the early-to-mid 1990s with a huge splash. Beowulf allowed users to tie together large numbers of lower-cost desktop machines (486 DXes at the time) rather than the specialized hardware used by Mosix and VAXCluster. The project originated at NASA's Goddard Space Flight Center and quickly led to a successful open source project, as well as a successful business for some of the original developers called Scyld Software.
PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) are standards for performing parallel operations. Both frameworks have language bindings (for example FORTRAN, C, Perl, and other languages) that abstract the underlying standard into something easier to work with. The direction of MPI is steered by a group called the MPI Forum. Since the release of the initial specification, the MPI Forum has updated the specification to MPI 2.0, adding features and clarifying issues that were deemed important. Each cluster software, tool, or operating system vendor implements its own version of MPI, so cross-platform compatibility is not guaranteed, but porting between MPI implementations is quite possible.
In contrast to MPI, PVM software is usually provided at the PVM website in either source or binary form. From there, users can call the PVM library directly or through third-party bindings. PVM provides binaries for Windows, which can allow users to program parallel applications on a platform they may be more familiar with; however, most Beowulf clusters run standard Linux or some variant thereof. PVM also supports monolithic parallel computers such as Crays and other specialized machines. Further differences and similarities between MPI and PVM can be found in the paper Goals Guiding Design: PVM and MPI by William Gropp and Ewing Lusk.
In recent years, another tool called BProc (the Beowulf Distributed Process Space) has expanded the abilities of parallel processing and management of data between nodes. BProc allows a parallel job to be started on the main controlling node and parallel-capable processes are automatically migrated to child nodes. This paradigm is also used by Mosix and OpenMosix, which will be discussed later. BProc is an open source project available at bproc.sourceforge.net.
Parallel processes also need to take into consideration the amount of time that will be needed for preparation, cleanup, and merging of parallel data. Amdahl's law stipulates that the total execution time for a parallel program is equal to the parallel part of the problem divided by the number of nodes, plus the serial part of the program. Even if a cluster contains thousands of nodes, the amount of time it takes to execute the serial code remains constant.
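Amdahl's law is easy to demonstrate numerically; the 10-second serial and 90-second parallel figures below are invented purely for illustration:

```python
def parallel_time(t_serial, t_parallel, nodes):
    # Amdahl's law as stated above: the parallel portion shrinks as
    # nodes are added, but the serial portion is a fixed floor.
    return t_serial + t_parallel / nodes

# Invented example: 10 s of serial work, 90 s of parallelizable work.
for nodes in (1, 10, 100, 1000):
    print(nodes, round(parallel_time(10.0, 90.0, nodes), 2))
```

Even with a thousand nodes, total time never drops below the 10 seconds of serial work, which is why minimizing the serial portion matters more than adding hardware past a certain point.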
How to Build a Beowulf
Large Beowulf clusters run complex simulations and crunch teraflops of information per second. At the same time, small 4-16 node clusters are often used in educational settings to teach parallel processing design paradigms to Computer Science students as well as cluster design and implementation to Computer Engineering students.
College Beowulf clusters are often (but not always) comprised of outdated computers and hand-me-down hardware. While extremely fast speeds cannot be obtained with these antiquated clusters, they are valuable in teaching and observing the differences between a program or algorithm written for a single processor machine and the same program/algorithm written for and run on a cluster.
There are several tools available for deploying a Beowulf cluster, but almost all require a basic installation of a compatible Linux distribution on either the master node or on the master and all child nodes. Scyld Software makes what is widely considered the easiest-to-install Beowulf software. All that is needed is a $3 CD containing an unsupported Scyld distribution for the master and each child node. Official copies with commercial support are also available directly from Scyld. Once the CD is booted on the master node, a simple installation menu is presented. After installing and configuring Scyld on the master node, insert a Scyld CD in each child node; the child nodes automatically get their configuration information from the master node and can run directly from CD.
Another popular package that runs on top of many modern RPM-based Linux distributions is OSCAR, the Open Source Cluster Application Resources project. OSCAR offers a very simple user interface to install and configure cluster software on the master node. Once that is accomplished, client nodes can be Network booted and the client software automatically installed. OSCAR also supports other installation and boot methods.
While many colleges take the small cluster approach, Virginia Tech has taken advantage of the modern Macintosh platform and created a top 10 supercomputer for a fraction of traditional costs. Virginia Tech started out with desktop machines, but now maintains a cluster of 1100 Apple XServe 1U servers running Mac OS X server (based on an open source BSD-derived core called Darwin).
Another Approach to Clustering: OpenMosix
While most Beowulf clusters are dedicated to cluster-related tasks all the time, clustering does not have to be that way. OpenMosix is a set of patches to the Linux kernel, plus a few userland tools for monitoring where processes are running and how efficiently. OpenMosix is extremely flexible: nodes can join or leave a cluster whenever they wish. Many programs and algorithms can take advantage of clustering through the automatic node migration built into OpenMosix. Whenever a new process is spawned or forked (as is common in traditional Unix-like software design), OpenMosix may choose to execute that process locally or on another node.
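Because migration is transparent, a fork-heavy program needs no cluster-specific code at all to benefit. A minimal sketch (the workload itself is an arbitrary placeholder; requires a Unix-like system for os.fork):

```python
import os

def busy_work(n):
    # CPU-bound placeholder. On an OpenMosix cluster each forked
    # worker is just an ordinary Unix process, so the kernel may
    # silently migrate it to an idle node -- no special API needed.
    return sum(i * i for i in range(n))

children = []
for _ in range(4):
    pid = os.fork()
    if pid == 0:              # child: do the work and exit
        busy_work(100_000)
        os._exit(0)
    children.append(pid)      # parent: remember child PIDs

for pid in children:
    os.waitpid(pid, 0)        # reap each worker
print("all workers finished")
```

The same program runs identically on a single machine; OpenMosix simply finds more places to put the children.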
Many OpenMosix clusters are implemented in a head/client node configuration much like Beowulf clusters, but they are not limited to such configurations. Because OpenMosix is just a patch to the standard kernel, machines in a cluster can have multiple uses. They can run standard graphical window managers and be used as desktop machines while processes are migrated to them if they have computing cycles to spare. OpenMosix does an excellent job at making sure that client nodes still have enough resources to do whatever else they are doing in addition to cluster process execution.
In addition to the multi-use scenario, OpenMosix cluster nodes can run as true peers. For example, if there are 20 computers currently connected to a dynamic cluster and all but a few of them are idle, processes from the machines being actively used can be automatically migrated for execution throughout the cluster. Similarly, if all computers are heavily used, virtually no process migration will occur, since execution will be quicker on the local machine. Also, if a 400MHz desktop machine needs to do some complex calculations, as long as the program is written in a way that can take advantage of process migration, those calculations can run extremely quickly on an idle 3GHz machine. Many of the scenarios above are described in a Linux Journal article entitled Clusters for Nothing and Nodes for Free, but they also come from my experiences building and experimenting with a 2-3 node OpenMosix cluster a few years ago.
Recently the OpenMosix community has embraced “instant clusters,” the idea that any hardware with local network connections can become a cluster without interfering with its other uses. The OpenMosix website lists several open source “instant cluster” software projects. The most popular is ClusterKnoppix, a Linux distribution with OpenMosix installed that runs directly from CD-ROM. With a minimum of one CD burned on a master node, a 30-seat computer lab can instantly become a 30-node cluster without disturbing the operating system installed on the hard drives.
To share data among nodes, OpenMosix uses the Cluster File System, a concept originally developed for the Mosix project as the Mosix File System. The file system was renamed after the Mosix project closed its source code and Moshe Bar and others began working, between 2001 and 2002, on the GPL-licensed code that would become OpenMosix. This cluster file system, along with the ability to run a cluster as peer nodes, gives OpenMosix quite an advantage over traditional monolithic systems and clusters.
How Open Source Helps Clusters
While some computational clusters run on Windows, the vast majority run on top of an open source Linux distribution. The Linux kernel itself is open source and, depending on the Linux distribution, all, most, or at least some of the operating system is open source. Sometimes Linux distributions can be open source without being free of cost, such as Red Hat Enterprise Linux. There are many excellent free (open source and no-cost) Linux distributions on which to run Beowulf, OpenMosix, or any other type of clustering software.
There are many open source applications that help users install, configure, and maintain clusters; many have been mentioned already, including OSCAR, Beowulf and OpenMosix themselves, various PVM and MPI implementations, BProc, and more. In addition to the tools already mentioned, there is a suite of open source utilities for OpenMosix called openMosixView. The programs in the suite provide visualization and graphical management of the cluster, visual feedback on processes, process migration, and load per node, and also allow for logging and analysis of cluster performance.
There are many other interesting open source clustering projects that don't require a Beowulf or OpenMosix framework to run on. One of the most popular examples is distcc, a program that allows for distributed compilation of C or C++ code. Distcc is quite lightweight and does not require a shared filesystem; it just requires child nodes to be running distcc in daemon mode.
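A typical invocation looks something like the following usage sketch (the host names are hypothetical, and the distcc daemon must already be running on each child node):

```shell
# Tell distcc which machines may receive compile jobs
# (hypothetical host names; distccd must be running on each).
export DISTCC_HOSTS='localhost node1 node2'

# Run several compilations at once via make, handing the compiler
# role to distcc, which farms individual jobs out to the hosts.
make -j6 CC='distcc gcc'
```

Because each compilation unit is independent, builds parallelize almost perfectly, making distcc a good fit for otherwise idle lab machines.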
The Future of Clusters
While Robert Lucke considers openMosix the next generation of clustering software because of its flexibility, some of the most stunning advances are happening in the world of grid and distributed computing. Grid computing can mean different things to different people, but generally extends computing platforms beyond location and geography.
The SETI@home project has managed to create a very powerful supercomputer by utilizing the spare CPU cycles of thousands of desktop machines spread throughout the world. The program usually runs as a screen saver so that it does not consume computing resources while the machine is being actively utilized. SETI@home and other projects are pushing the envelope of using spare processor cycles to tackle a task that would otherwise require large dedicated clusters or supercomputers.
While grid and distributed computing may take away part of the supercomputing market share held by clusters (particularly those built on open source software and commodity hardware), I believe that clusters are here to stay. Individual component prices continue to drop, network throughput is improving, and cluster software continues to evolve. Expect to hear even more about clusters over the next several years.
Top500 Supercomputer Sites, “Charts for November 2004 – Clusters (NOW),” April 2005, http://top500.org/lists/2004/11/overtime.php?c=5.
Top500 Supercomputer Sites, “Highlights from Top500 List for November 2004,” April 2005, http://top500.org/lists/2004/11/trends.php.
J. Zawodny and D. Balling, High Performance MySQL, Sebastopol: O'Reilly and Associates, 2004, chaps. 7 and 8.
A. Barak et al., The Mosix Distributed Operating System: Load Balancing for Unix, Berlin: Springer-Verlag, 1993, pp. 1-18.
N. Kronenberg et al., “VAXcluster: a closely-coupled distributed system,” in ACM Transactions on Computer Systems (TOCS), 1986, pp. 130-146.
The openMosix Project, “openMosix, an Open Source Linux Cluster Project,” April 2005, http://openmosix.sourceforge.net/.
T. Sterling et al., How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters, Cambridge, Mass.: The MIT Press, 1999.
The Beowulf Project, “Beowulf.org: The Beowulf Cluster Site,” April 2005, http://www.beowulf.org/.
The MPI Forum, “Message Passing Interface,” April 2005, http://www-unix.mcs.anl.gov/mpi/.
Computer Science and Mathematics Division, Oak Ridge National Laboratory, “PVM: Parallel Virtual Machine,” April 2005, http://www.csm.ornl.gov/pvm/pvm_home.html.
W. Gropp and E. Lusk, “Goals Guiding Design: PVM and MPI,” in IEEE International Conference on Cluster Computing (CLUSTER'02), 2002, pp. 257-268.
E. Hendricks, “BProc: The Beowulf Distributed Process Space,” in Proceedings of the 16th International Conference on Supercomputing, 2002, pp. 129-136.
P. Prins, “Teaching Parallel Computing Using Beowulf Clusters: A Laboratory Approach,” in Journal of Computing Sciences in College, 2004, pp. 55-61.
Linux Central, “CDROM with Scyld Beowulf,” April 2005, http://linuxcentral.com/catalog/index.php3?prod_code=L000-089.
Open Source Cluster Application Resources, “OSCAR: Open Source Cluster Application Resources,” April 2005, http://oscar.openclustergroup.org/.
A. Perry et al., “Clusters for Nothing and Nodes for Free,” Linux Journal, Vol. 2004, Issue 123, July 2004.
Matt Croydon, “OpenMosix Success,” April 2005, http://www.postneo.com/2002/11/20/openmosix-success.
openMosix, “Instant openMosix, The Fast Path to an openMosix Cluster,” April 2005, http://openmosix.sourceforge.net/instant_openmosix_clusters.html.
ClusterKnoppix, “ClusterKnoppix: Main Page,” April 2005, http://bofh.be/clusterknoppix/.
Open Source Initiative, “Open Source Initiative – The GPL: Licensing,” April 2005, http://www.opensource.org/licenses/gpl-license.php.
openMosixView, “openMosixView: a cluster-management GUI,” April 2005, http://www.openmosixview.com/index.html.
Martin Pool, “distcc: a fast, free distributed C/C++ compiler,” April 2005,
R. Lucke, Building Clustered Linux Systems (Hewlett-Packard Professional Books), Upper Saddle River, New Jersey: Prentice Hall, 2004.
M. Holliday et al., “A Geographically-distributed, Assignment-structured, Undergraduate Grid Computing Course,” in Proceedings of the 36th SIGCSE Technical Symposium on Computer Science Education, 2005, pp. 206-210.
The SETI@home Project, “SETI@home: Search for Extraterrestrial Intelligence at Home,” April 2005, http://setiathome.ssl.berkeley.edu/.
G. Pfister, In Search of Clusters: The Coming Battle in Lowly Parallel Computing, Upper Saddle River, New Jersey: Prentice Hall, 1998, pp. 184-185.