Author: Matt Croydon

  • Clusters: Open Source Meets Commodity Hardware

    During the last semester I wrote two papers for my Computer Architectures class. I spent quite a bit of time on them and have been thinking about posting them on my weblog for quite some time. I’m a bit worried about plagiarism though, and I’m not sure what to do about it. I’m pretty sure that I can submit it to the auto-plagiarism-detector service that my university subscribes to, and I’m probably going to do that now that this paper is posted.

    Secondly, I’m releasing this paper under the by-nc-sa (Attribution NonCommercial ShareAlike 2.0) license, so unless you can turn in your paper to your teacher with a by-nc-sa license displayed on it, you can’t include it in your paper without proper citation.

    PLEASE NOTE: If you are considering plagiarizing this, please don’t. If your teacher allows you to cite non-academic internet sources, then by all means borrow my ideas and cite me. What I would really suggest doing is taking a look at my primary sources and then heading to your university library or computer system to consult them yourself. All of the ACM journal sources that I cited are available online if your university subscribes to the ACM Portal. This paper was thoroughly researched, but there were some late nights involved in its production, so it is provided WITHOUT WARRANTY against correctness or anything like that. My RAID paper is probably a little better than this one.

    Creative Commons License
    This work is licensed under a Creative Commons License.

    Clusters: Open Source Meets Commodity Hardware

    Matt Croydon

    CMSC 311

    May 6, 2005

    Before the mid-1980s, most supercomputers were large, monolithic machines. Over time the Top 500 Supercomputers list has seen clusters go from non-existent to being the dominant architecture[1], currently representing 296 of the top 500 slots (almost 60%)[2]. Compared to monolithic supercomputers such as those from Cray Research, clusters are extremely cheap for the amount of performance realized. When lower cost is combined with cheap off-the-shelf hardware and open source software platforms, clusters can’t help but improve and gain popularity.

    Different Tools for Different Jobs

    The definition of the word “cluster” varies greatly depending on the context in which it is used. A cluster is commonly used in high-availability situations when, for example, equipment must gracefully fail over or requests must be divided among the available hardware. For clustered application servers, this can be accomplished by simple round robin DNS entries or by more complex load balancing hardware or software.
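    Round robin DNS can be as simple as publishing multiple A records for one name; a typical name server then rotates the order in which the addresses are returned. A minimal BIND zone file excerpt (the addresses are placeholders from the documentation range):

    ; requests for www rotate among these hosts
    www   IN   A   192.0.2.10
    www   IN   A   192.0.2.11
    www   IN   A   192.0.2.12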

    Clusters are also used when data needs to be stored on multiple machines for redundancy or performance reasons. MySQL, an open source database server, can use a cluster of computers for both data replication and load balancing[3] in several configurations.

    This paper will focus on the most common and most popular form of clustering: clusters used for parallel computing, scientific computing, and simulation in educational, professional, and government organizations. More specifically, it will focus on open source software that is available to make the construction and administration of clusters easier and more powerful.

    A Brief History

    Cluster computing has its roots in the mid-1980s, when developers wanted to tie together multiple computers in order to harness their collective power. In 1985, Amnon Barak developed the first predecessor to Mosix, called MOS, which ran on a cluster of four Digital Equipment Corporation (DEC) PDP/11 computers[4]. In 1986, DEC decided to try clustering for itself with VAXCluster[5]. At the time, VAXCluster was able to take advantage of a much higher data rate of 70Mbit/sec[5], but because of the proprietary interconnect used, VAXCluster remained much more tightly coupled, while MOS and Mosix used token ring LAN technology[4]. As Mosix was ported to other platforms and improved, it was also able to take advantage of advances in networking technology without its looser coupling being affected. Mosix relied on patches to the Unix kernel in order to allow processes to migrate among nodes in the cluster. Mosix was later ported to the Linux kernel by Moshe Bar[6], where it thrives as an open source project.

    Beowulf came onto the scene in the early-to-mid 1990s with a huge splash. Beowulf allowed users to tie together large numbers of low-cost desktop machines (486 DXes at the time) rather than the specialized hardware used by Mosix and VAXCluster[7]. The project originated at NASA’s Goddard Space Flight Center and quickly led to a successful open source project[8] as well as a successful business, Scyld Software, founded by some of the original developers.

    Beowulf Internals

    PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) are standards for performing parallel operations. Both frameworks have language bindings (for example FORTRAN, C, Perl, and other languages) that abstract the underlying standard into something easier to work with. The direction of MPI is steered by a group called the MPI Forum. Since the release of the initial specification, the MPI Forum has updated the specification to MPI 2.0, adding features and clarifying issues that were deemed important[9]. Each cluster software, tool, or operating system vendor implements its own version of MPI, so cross-platform compatibility is not guaranteed, but porting between MPI implementations is quite possible.

    In contrast to MPI, PVM software is usually provided at the PVM website[10] in either source or binary form. From there users can call the PVM library directly or through third party bindings. PVM provides binaries for Windows, which allows users to write parallel applications on a platform that they may be more familiar with. However, most Beowulf clusters run standard Linux or some variant thereof. PVM also supports monolithic parallel computers such as Crays and other specialized machines. Further differences and similarities between MPI and PVM can be found in the paper Goals Guiding Design: PVM and MPI by William Gropp and Ewing Lusk[11].

    In recent years, another tool called BProc (the Beowulf Distributed Process Space) has expanded the abilities of parallel processing and management of data between nodes. BProc allows a parallel job to be started on the main controlling node and parallel-capable processes are automatically migrated to child nodes[12]. This paradigm is also used by Mosix and OpenMosix, which will be discussed later. BProc is an open source project available at bproc.sourceforge.net.

    Parallel processes also need to take into consideration the amount of time that will be needed for preparation, cleanup, and merging of parallel data. Amdahl’s law[26] stipulates that the total execution time for a parallel program is equal to the parallel part of the problem divided by the number of nodes, plus the serial part of the program. Even if a cluster contains thousands of nodes, the amount of time it takes to execute the serial code is going to remain constant.
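    In equation form (notation mine), with T_s the serial portion of the program, T_p the parallelizable portion, and n the number of nodes:

    T(n) = T_s + T_p / n

    As n grows, T(n) approaches T_s, so the serial portion ultimately limits how much adding nodes can help.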

    How to Build a Beowulf[7]

    Large Beowulf clusters run complex simulations and crunch data at rates measured in teraflops. At the same time, small 4-16 node clusters are often used in educational settings to teach parallel processing design paradigms to Computer Science students as well as cluster design and implementation to Computer Engineering students[13].

    College Beowulf clusters are often (but not always) composed of outdated computers and hand-me-down hardware. While extremely fast speeds cannot be obtained with these antiquated clusters, they are valuable for teaching and observing the differences between a program or algorithm written for a single processor machine and the same program or algorithm written for and run on a cluster.

    There are several tools available for deploying a Beowulf cluster, but almost all require a basic installation of a compatible Linux distribution on either the master node or on the master and all child nodes. Scyld Software makes what is widely considered the easiest Beowulf software to install. All that is needed is a $3 CD[14] containing an unsupported Scyld distribution for the master and each child node. Official copies with commercial support are also available directly from Scyld. Once the CD is booted on the master node, a simple installation menu is presented. After installing and configuring Scyld on the master node, insert a Scyld CD into each child node; the child nodes automatically get their configuration information from the master node and can run directly from CD.

    Another popular package that runs on top of many modern RPM-based Linux distributions is OSCAR, the Open Source Cluster Application Resources project[15]. OSCAR offers a very simple user interface to install and configure cluster software on the master node. Once that is accomplished, client nodes can be network booted and the client software automatically installed. OSCAR also supports other installation and boot methods.

    While many colleges take the small cluster approach, Virginia Tech has taken advantage of the modern Macintosh platform and created a top 10 supercomputer for a fraction of traditional costs. Virginia Tech started out with desktop machines but now maintains a cluster of 1100 Apple Xserve 1U servers running Mac OS X Server (based on an open source BSD-derived core called Darwin).

    Another Approach to Clustering: OpenMosix

    While most Beowulf clusters are dedicated to cluster-related tasks all the time, clustering does not have to be that way. OpenMosix is a set of patches to the Linux kernel plus a few userland tools for monitoring where processes are running and how efficiently. OpenMosix is extremely flexible: nodes can join or leave a cluster whenever they wish. Many programs and algorithms can take advantage of clustering with the automatic node migration built into OpenMosix. Whenever a new process is spawned or forked (as is common in traditional Unix-like software design), OpenMosix may choose to execute that process locally or on another node.

    Many OpenMosix clusters are implemented in a head/client node configuration much like Beowulf clusters, but they are not limited to such configurations. Because OpenMosix is just a patch to the standard kernel, machines in a cluster can have multiple uses. They can run standard graphical window managers and be used as desktop machines while processes are migrated to them whenever they have computing cycles to spare. OpenMosix does an excellent job of making sure that client nodes still have enough resources to do whatever else they are doing in addition to cluster process execution.

    In addition to the multi-use scenario, OpenMosix cluster nodes can run as true peers. For example, if there are 20 computers currently connected to a dynamic cluster and all but a few of them are idle, processes from the machines being actively used can be automatically migrated for execution throughout the cluster. Similarly, if all computers are heavily used, virtually no process migration will occur, since execution will be quicker on the local machine. Also, if a 400MHz desktop machine needs to do some complex calculations, those calculations could be run extremely quickly on an idle 3GHz machine, as long as the program is written in a way that can take advantage of process migration. Many of the scenarios above are described in a Linux Journal article entitled Clusters for Nothing and Nodes for Free[16], but they also come from my experiences building and experimenting with a 2-3 node OpenMosix cluster a few years ago[17].

    Recently the OpenMosix community has embraced “instant clusters,” the idea that any hardware with local network connections can become a cluster without interfering with its other uses. The OpenMosix website lists a page[18] with several open source “instant cluster” software projects. The most popular project is called ClusterKnoppix[19], a Linux distribution with OpenMosix installed on it that runs directly from CD-ROM. With as little as one burned CD to boot a master node, a 30 seat computer lab can instantly become a 30 node cluster without disturbing the operating systems installed on the hard drives.

    To share data among nodes, OpenMosix uses the Cluster File System, a concept originally developed for the Mosix project called the Mosix File System[4]. The file system was renamed after the Mosix project closed its source code and Moshe Bar and others began working on the GPL-licensed[20] code which would become OpenMosix between 2001 and 2002. This cluster file system along with the ability to run a cluster as peer-nodes gives OpenMosix quite an advantage over traditional monolithic and cluster systems.

    How Open Source Helps Clusters

    While some computational clusters run on Windows, the vast majority run on top of an open source Linux distribution. The Linux kernel itself is open source, and depending on the Linux distribution, all, most, or at least some of the operating system is open source. Linux distributions can also be open source without being free of cost, as with Red Hat Enterprise Linux. There are many excellent free (open source and no cost) Linux distributions to run Beowulf, OpenMosix, or any other type of clustering software on.

    There are many open source applications that help users install, configure, and maintain clusters; many have been mentioned already. These include OSCAR, Beowulf and OpenMosix themselves, various PVM and MPI implementations, BProc, and more. In addition to the tools already mentioned, there is a suite of open source utilities for OpenMosix called openMosixView[21]. The programs in the suite provide visualization and graphical management of the cluster: visual feedback on processes, process migration, and per-node load, along with logging and analysis of cluster performance.

    There are many other interesting open source clustering projects that don’t require a Beowulf or OpenMosix framework to run on. One of the most popular examples is distcc[22], a program that allows for distributed compilation of C or C++ code. Distcc is quite lightweight and does not require a shared filesystem; it only requires that child nodes be running distcc in daemon mode.
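    A typical distcc session looks something like the following sketch (the host names and subnet are placeholders):

    # on each child node, start the distcc daemon
    distccd --daemon --allow 192.168.1.0/24

    # on the master node, list the nodes and run a parallel build
    export DISTCC_HOSTS='localhost node1 node2 node3'
    make -j8 CC=distcc

    The -j flag lets make issue enough concurrent compile jobs to keep the remote nodes busy.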

    The Future of Clusters

    While Robert Lucke considers openMosix the next generation of clustering software because of its flexibility[23], some of the most stunning advances are happening in the world of grid and distributed computing[24]. Grid computing can mean different things to different people, but it generally means extending computing resources beyond a single location or organization.

    The SETI@home project[25] has managed to create a very powerful supercomputer by utilizing the spare CPU cycles of thousands of desktop machines spread throughout the world. The program usually runs as a screen saver so that it does not consume computing resources while the machine is being actively utilized. SETI@home and other projects are pushing the envelope of using spare processor cycles to tackle a task that would otherwise require large dedicated clusters or supercomputers.

    While grid and distributed computing may take away part of the supercomputing market share currently held by clusters (particularly those built on open source software and commodity hardware), I believe that clusters are here to stay. Individual component prices continue to drop, network throughput is improving, and cluster software continues to evolve. Expect to hear even more about clusters over the next several years.

    References

    [1] Top500 Supercomputer Sites, “Charts for November 2004 – Clusters (NOW),” April 2005, http://top500.org/lists/2004/11/overtime.php?c=5.

    [2] Top500 Supercomputer Sites, “Highlights from Top500 List for November 2004,” April 2005, http://top500.org/lists/2004/11/trends.php.

    [3] J. Zawodny and D. Balling, High Performance MySQL, Sebastopol: O'Reilly and Associates, 2004, chaps. 7 and 8.

    [4] A. Barak et al, The Mosix Distributed Operating System: Load Balancing for Unix, Berlin: Springer-Verlag, 1993, pp. 1-18.

    [5] N. Kronenberg et al, “VAXcluster: a closely-coupled distributed system,” in ACM Transactions on Computer Systems (TOCS), 1986, pp. 130-146.

    [6] The openMosix Project, “openMosix, an Open Source Linux Cluster Project,” April 2005, http://openmosix.sourceforge.net/.

    [7] T. Sterling et al, How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters, Cambridge, Mass: The MIT Press, 1999.

    [8] The Beowulf Project, “Beowulf.org: The Beowulf Cluster Site,” April 2005, http://www.beowulf.org/.

    [9] The MPI Forum, “Message Passing Interface,” April 2005, http://www-unix.mcs.anl.gov/mpi/.

    [10] Computer Science and Mathematics Division, Oak Ridge National Laboratory, “PVM: Parallel Virtual Machine,” April 2005, http://www.csm.ornl.gov/pvm/pvm_home.html.

    [11] W. Gropp and E. Lusk, “Goals Guiding Design: PVM and MPI,” in IEEE International Conference on Cluster Computing (CLUSTER'02), 2002, pp. 257-268.

    [12] E. Hendricks, “BProc: The Beowulf Distributed Process Space,” in Proceedings of the 16th International Conference on Supercomputing, 2002, pp. 129-136.

    [13] P. Prins, “Teaching Parallel Computing Using Beowulf Clusters: A Laboratory Approach,” in Journal of Computing Sciences in Colleges, 2004, pp. 55-61.

    [14] Linux Central, “CDROM with Scyld Beowulf,” April 2005, http://linuxcentral.com/catalog/index.php3?prod_code=L000-089.

    [15] Open Source Cluster Application Resources, “OSCAR: Open Source Cluster Application Resources”, April 2005, http://oscar.openclustergroup.org/.

    [16] A. Perry et al, “Clusters for Nothing and Nodes for Free,” Linux Journal, Vol 2004, Issue 123, July, 2004.

    [17] Matt Croydon, “OpenMosix Success,” April 2005, http://www.postneo.com/2002/11/20/openmosix-success.

    [18] openMosix, “Instant openMosix, The Fast Path to an openMosix Cluster,” April 2005, http://openmosix.sourceforge.net/instant_openmosix_clusters.html.

    [19] ClusterKnoppix, “ClusterKnoppix: Main Page,” April 2005, http://bofh.be/clusterknoppix/.

    [20] Open Source Initiative, “Open Source Initiative – The GPL: Licensing,” April 2005, http://www.opensource.org/licenses/gpl-license.php.

    [21] openMosixView, “openMosixView: a cluster-management GUI,” April 2005, http://www.openmosixview.com/index.html.

    [22] Martin Pool, “distcc: a fast, free distributed C/C++ compiler,” April 2005, http://distcc.samba.org/.

    [23] R. Lucke, Building Clustered Linux Systems (Hewlett-Packard Professional Books), Upper Saddle River, New Jersey: Prentice Hall, 2004.

    [24] M. Holliday et al, “A Geographically-distributed, Assignment-structured, Undergraduate Grid Computing Course” in Proceedings of the 36th SIGCSE technical symposium on Computer science education, 2005, pp 206-210.

    [25] The SETI@home Project, “SETI@home: Search for Extraterrestrial Intelligence at Home,” April 2005, http://setiathome.ssl.berkeley.edu/.

    [26] G. Pfister, In Search of Clusters: The Coming Battle in Lowly Parallel Computing. Upper Saddle River, New Jersey: Prentice Hall, 1998. pp. 184-185.

  • Marklar

    MacMerc:

    01:27 PM – HOT Here comes Intel: talking about processor transitions now – from 68k to PPC. Apple is switching to Intel from PPC. “Time for a brain transplant.” 2006-2007 cited.

  • PalmOne LifeDrive: Cool But not $500 Cool

    LifeDrive

    Last week I played with a PalmOne LifeDrive at CompUSA and I’ve gotta say that it’s a nice little device. The screen resolution is definitely a little nicer than budget devices, but at 320×480 it isn’t exactly bleeding edge. The design is quite pleasing too; it almost has the feel of a G5 desktop. I also like the ability to switch between portrait and landscape mode quickly. I can do that on my Dell Axim X30, but it’s definitely not as quick as on the LifeDrive. Landscape mode is also perfect for the included Blazer browser, which when coupled with Bluetooth or Wi-Fi makes for a good pocket-sized browser. Combine all that with a 4 gig Microdrive and you’ve got a pretty nice little platform.

    It’s a pretty nice little platform, but is it worth $500? I don’t think so. If you’re really worried about hauling around a bunch of data in your pocket, $500 can get you a 60 gig iPod photo with $50 to spare, or a 20 gig Archos AV420. Granted, neither of those offers PIM functions, but I’m not sure how compelling PIM + 4 gigs to spare is. I was also a little bummed at how pokey the responsiveness of the LifeDrive was. Click something, wait just a little bit, and there it is. I know that the OS doesn’t run from the Microdrive, but accessing photos and the like requires a bit of drive spinning. It’s not that the LifeDrive felt slow when accessing photos or media; it felt a bit slow in general, even when doing something that didn’t involve the Microdrive at all.

    Another thing that got me is that it looks like the battery is internal and not user replaceable. Having been screwed by a CompUSA warranty and the flaky battery on the Tungsten E, I don’t think I would ever consider buying a Palm device without a user-replaceable battery.

    I could be wrong, Palm could have a big hit on their hands with the LifeDrive, but my guess is that they’ll have to drop the price point a hundred bucks or so before they really start moving units.

  • Apple Going Intel?

    I’ve got to say that I won’t believe this one until the man in jeans and a black turtleneck says so. Can the Mac really survive another platform jump? Then again, I’d probably subscribe to the $129 yearly operating system plan if it meant I could run OS X on my x86 hardware.

    I’m inclined to believe it more now that Scoble says he got confirmation on the story. I can’t imagine the move being received very well by the developers paying top dollar to attend WWDC. It isn’t over till Sir Steve keynotes, but I’ll definitely be refreshing several Mac news pages like a madman.

    We shall see…

  • RAID: Redundant Array of [Independent|Inexpensive] Disks

    During the last semester I wrote two papers for my Computer Architectures class. I spent quite a bit of time on them and have been thinking about posting them on my weblog for quite some time. I’m a bit worried about plagiarism though, and I’m not sure what to do about it. I’m pretty sure that I can submit it to the auto-plagiarism-detector service that my university subscribes to, and I’m probably going to do that now that this paper is posted.

    Secondly, I’m releasing this paper under the by-nc-sa (Attribution NonCommercial ShareAlike 2.0) license, so unless you can turn in your paper to your teacher with a by-nc-sa license displayed on it, you can’t include it in your paper without proper citation.

    PLEASE NOTE: If you are considering plagiarizing this, please don’t. If your teacher allows you to cite non-academic internet sources, then by all means borrow my ideas and cite me. What I would really suggest doing is taking a look at my primary sources and then heading to your university library or computer system to consult them yourself. All of the ACM journal sources that I cited are available online if your university subscribes to the ACM Portal. This paper was thoroughly researched, but there were some late nights involved in its production, so it is provided WITHOUT WARRANTY against correctness or anything like that.

    Creative Commons License
    This work is licensed under a Creative Commons License.

    Matt Croydon

    CMSC 311

    March 9, 2005

    The term RAID originally stood for “Redundant Arrays of Inexpensive Disks” [1], although an effort has been made to replace Inexpensive with Independent [2] in order to deemphasize the importance of cost. In modern practice the words are used interchangeably, and in most computer-oriented contexts the meaning is commonly understood. RAID technology was developed to improve upon monolithic SLED (Single Large Expensive Disk) [1] devices. In addition to being large and expensive, these drives offered fixed levels of input and output performance and in the late ’80s and early ’90s were not keeping pace with the rest of semiconductor technology [2].

    There are several discrete configurations, or levels, of RAID, each with its advantages and disadvantages. The various levels are conceptual and not necessarily tied to a specific implementation. RAID can be accomplished at either the hardware or the software level. Hardware-based RAID tends to provide higher overall performance, while software-based RAID offers lower cost and greater flexibility.

    Redundancy is required because as more disks are added to an array, the MTTF (Mean Time To Failure) [1] of the array decreases sharply. The MTTF of an array is the MTTF of an individual drive divided by the number of drives. For example, if each individual drive is rated for 30,000 hours and there are 100 disks in the array, the MTTF of the array is 300 hours, a far cry from the 30,000 hours that each unit is rated for [2].
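    Worked out (notation mine):

    MTTF_array = MTTF_disk / N = 30,000 hours / 100 = 300 hours

    At 300 hours, a 100 disk array with no redundancy could expect to lose a disk roughly every two weeks.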

    RAID Level 0 and JBOD

    RAID 0 is not part of the original specification [3] and provides absolutely no redundancy; however, it does employ data striping, an important concept in other RAID configurations. RAID 0 is often implemented in hardware controllers that also support other levels of RAID. RAID 0 allows extremely fast write performance but does not significantly improve read access time [2].

    The other non-redundant RAID technology is JBOD, which stands for “Just a Bunch of Disks” [4]. JBOD uses either RAID hardware or software to combine multiple disks so that they appear as one logical device to the operating system. JBOD allows for easy storage capacity expansion and is in common usage on both Windows and Linux platforms among others.

    RAID Level 1

    RAID 1 uses mirroring in order to achieve redundancy [3]. For every disk of data, there is a mirrored disk that contains an exact copy of the original disk [5]. While every write to the array has to be performed twice (first on the original drive, then on the mirrored drive), read speeds can be improved. Because there are two copies of the data, the drive that can retrieve the data quickest can be used. Both drives may also serve read requests simultaneously, further increasing read speed. If one drive in a two drive array fails, the remaining drive can be used for reading and writing until the defective disk can be replaced. Once a new drive is placed in the array, data can be copied over and eventually mirroring once again takes place in real time.

    RAID Level 2

    RAID 2 uses the same ECC (Error Correcting Code) scheme as ECC memory [2]. In addition to the data disks, a number of check disks are used to store the ECC data. If Hamming ECC is used, an array of 10 data disks needs 4 check disks and an array of 25 data disks requires 5 check disks [1]. The extra disks are required in order to detect and repair an unrecoverable error. In RAID 2, data is striped bit by bit across the data disks while the ECC data is written to the check disks [1].
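    Those check disk counts follow from the Hamming bound (notation mine): c check disks can correct a single error across d data disks when 2^c ≥ d + c + 1. For d = 10, c = 4 suffices (16 ≥ 15); for d = 25, c = 5 suffices (32 ≥ 31).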

    RAID Level 3

    The next level of RAID assumes that most hardware or software RAID controllers will be able to detect an error. A single check disk is then enough to recover from an error: the job of error detection is left to the controller, eliminating all but one of the check disks required by RAID 2 [1]. This strategy cuts down on cost without sacrificing redundancy, as long as every bit on all of the other data disks and the check disk can be successfully read. The contents of the bad disk can be obtained by computing the parity of the disks that have not failed and comparing each bit to the parity of all of the disks as stored on the check disk. If the values are identical, the bad disk originally held a 0 in that position. If the values differ, it held a 1 [1].
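    A minimal sketch of that recovery logic in Ruby (the disk contents here are made-up byte arrays; a real implementation works on raw device blocks):

    # Recovering a failed disk from XOR parity: each byte of the lost disk
    # is the XOR of the corresponding bytes on the surviving data disks
    # and the check (parity) disk.
    surviving = [[0x0f, 0xa2], [0x33, 0x10]]  # data disks that can still be read
    parity    = [0x21, 0x99]                  # contents of the check disk
    recovered = parity.zip(*surviving).map { |bytes| bytes.reduce(:^) }
    # recovered equals the failed disk's bytes, assuming the check disk
    # was generated as the XOR of all the data disks

    The same XOR relationship is what builds the check disk in the first place: the parity disk is simply the XOR of all of the data disks.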

    RAID Level 4

    RAID 4 also uses only one check disk but stripes data across the data drives in chunks rather than bit by bit. The check disk stores the parity information for each chunk of data. RAID 4 is very efficient for workloads, such as transaction processing, that require many very small reads from the disk array. If the data requested is smaller than the storage chunk size, the array can furnish multiple requests simultaneously [6].

    RAID Level 5

    RAID 5 is the most commonly deployed configuration in commercial settings [7] and distributes the parity blocks evenly across all disks [2]. Because the data and parity are spread across all disks, RAID 5 excels at both small and large reads as well as large writes. RAID 5 requires a “read-modify-write” cycle to calculate and write parity information, so it is less than optimal when it comes to many small writes [2].
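    The small-write penalty comes from that parity update. Writing ⊕ for XOR (notation mine), the new parity for a stripe can be computed without reading the other data disks:

    P_new = P_old ⊕ D_old ⊕ D_new

    Each small write therefore costs two reads (old data and old parity) and two writes (new data and new parity), which is why RAID 5 suffers on small-write-heavy workloads.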

    Advanced RAID Configurations

    There are several hybrid RAID configurations that, while not in the original RAID specification, can improve reliability and redundancy in certain situations. RAID 6 employs two distinct parity calculations for each chunk of data stored [4]. RAID 6 appears to be more theoretical than practical, as there are no firm guidelines for implementing it. RAID 6 differs from most RAID configurations in that it can recover from two unrecoverable errors, as long as the rest of the data and parity information can be read successfully.

    While many combinations of RAID components are possible, only a few are common. These include RAID 10, RAID 50, and RAID 0+1. RAID 10 significantly improves reliability by providing “a stripe set across mirrored pairs” [7]. This means that RAID 10 can recover from two disk failures as long as the failures are on opposite sides of the mirror. Similarly, RAID 50 combines two RAID 5 arrays. RAID 50 is extremely redundant and not practical for most purposes. RAID 0+1 simply constructs a RAID 1 array out of several RAID 0 arrays. In RAID 0+1, one disk failure brings down that half of the mirror until the bad disk is replaced [7].

    Increasing RAID Throughput

    Many modern hardware RAID controllers contain onboard memory caches to speed up input and output; caching of data and parity blocks was shown to increase throughput in the early to mid ’90s [9]. The physical location of parity blocks in RAID 5 has also been shown to influence throughput [10]. In their study, Lee and Katz determined that the left-symmetric, extended-left-symmetric, and flat-left-symmetric parity placements were the best for general use [10]. The absolute best parity placement for a RAID 5 array depends on the size and number of both reads and writes.

    Strategies for Increased Reliability

    There are several ways to increase RAID reliability, even in simple arrays. Because the different RAID levels are merely suggestions for how to accomplish redundancy, specific implementations may vary. For a simple 2 disk RAID 1 array, you have the option of placing both disks on one hardware controller or (if supported) you may place each disk on its own controller and have the two controllers coordinate mirroring [7]. In this configuration, the failure of any one RAID controller does not bring down the entire array.

    Hybrid arrays (as discussed in the Advanced RAID Configurations section above) can also increase reliability by creating multi-tiered or multi-leveled arrays. Advanced configurations need to be used with caution, since an array’s MTTF drops quickly as the total number of disks increases.

    As per-disk capacity increases, it is possible to implement RAIDs with identical storage capacity while using fewer overall disks. If fewer disks are used, the MTTF increases. Unfortunately with increased storage capacity comes an increased need for storage, so decreasing the total number of disks in a RAID may not be possible.

    RAID Today

    In the early days of RAID research, SCSI was the only technology that easily allowed for RAID configurations. Today that is changing rapidly with the introduction of extremely large capacity IDE and Serial ATA drives as well as lower cost hardware controller cards for them. These lower costs of entry have allowed RAID to spread from university research labs and large corporations all the way down to home users seeking data protection. Many mid-range to high-end motherboards come with an IDE or Serial ATA RAID controller built in.

    RAID technology is also being used extensively in large server farms and storage facilities. Elaborate collections of RAID arrays are often combined with network technology such as SAN (storage area networks) and NAS (network attached storage) to meet the always-on accessible-anywhere needs of today’s customers.

    RAID has also become a built-in part of Microsoft’s Windows operating system and has been incorporated into the Linux kernel [11]. Software-based RAID further reduces entry costs, though generic IDE RAID controllers can be found in stores for well below $50. A more well-known hardware RAID controller from Adaptec or others can range from $100 for IDE to several hundred dollars for advanced Ultra160 SCSI controllers.

    Conclusion

    Using a RAID may lull users into a false sense of security. Most RAID configurations protect against only one unrecoverable error and usually require that every other bit be read successfully in order to recover the data. Just because a RAID is in use does not mean that its users are invincible. Rigorous, restorable backups should be implemented in addition to RAID technology.

    With that caution in mind, RAID can provide redundancy that would not otherwise be available. If a specific RAID configuration is tailored to a specific profile (many small writes, continuous large reads, etc) a significant increase in throughput can be realized.

    RAID, a technology that started out as graduate and Doctoral research projects, now powers a wide array of technology from home computers to large datacenters. RAID allows advanced research facilities and corporate databanks alike to achieve redundancy on collections of data that commonly reach terabytes and petabytes [12].

    References

    [1] D. Patterson, G. Gibson, and R. Katz, “A Case for Redundant Arrays of Inexpensive Disks (RAID),” in Proceedings of the 1988 ACM SIGMOD international conference on Management of data, 1988, pp. 109-116.

    [2] P. Chen et al, “RAID: High-Performance, Reliable Secondary Storage,” ACM Computing Surveys, Vol 26, pp. 145-185, June 1994.

    [3] M. Shnier, Ed., Dictionary of PC Hardware and Data Communications Terms, Sebastopol: O’Reilly and Associates, 1996, pp. 362-363.

    [4] M. Shooman, Reliability of Computer Systems and Networks, New York: John Wiley and Sons, 2002, pp. 119-126.

    [5] G. Gibson, Redundant Disk Arrays: Reliable, Parallel Secondary Storage, Cambridge: MIT Press, 1992.

    [6] R. Jain et al., Eds., Input/Output in Parallel and Distributed Computing Systems, Boston: Kluwer Academic Publishers, 1996, pp. 106-108.

    [7] C. Zacker and J. Rourke, PC Hardware: The Complete Reference, Berkeley: Osborne/McGraw Hill, 2001, pp. 606-613.

    [8] PC Guide, “Multiple (Nested) RAID Levels”, March 2005, http://www.pcguide.com/ref/hdd/perf/raid/levels/mult.htm.

    [9] J. Menon and J. Cortney, “The Architecture of a fault-tolerant cached RAID controller,” in Proceedings of the 20th annual international symposium on Computer architecture, 1993, pp. 76-87.

    [10] E. Lee and R. Katz, “Performance consequences of parity placement in disk arrays,” in Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, 1991, pp. 190-199.

    [11] I. Molnar, G. Oxman, and M. de Icaza, “Kernel Korner: The New Linux RAID Code,” Linux Journal, Vol 1997, Article No. 25, December, 1997.

    [12] Los Alamos National Laboratories Networked Systems Research Team, “Announcements,” March 2005, http://public.lanl.gov/netsys/.

  • DLP on the Big Screen

    Last weekend I saw Star Wars: Revenge of the Sith in one of the handful of theaters (I count 4) in the DC-Virginia-Frederick-Baltimore metro area equipped with DLP (Digital Light Processing) technology. Just like Mike Washlesky at The Mac Observer, I was blown away. I first noticed the crispness and clarity when the first preview splash screen came up, and the effects and their digital projection kept impressing me throughout the movie. The movie won’t top my greats list, but it was a lot of fun and great to see in digital.

    Further reading:

    • Episode III Digital Theater List: Make sure you find the one digital theater, buy your tickets online, and show up early for a good seat.
    • DLPMovies: An excellent place to find your local DLP theater (if there is one). DLPMovies found more theaters in my area than the ones showing Star Wars, including one that is currently showing Madagascar in DLP.
    • DLP.com: A lot of marketing, but it boils down to amazing picture quality and an insane contrast ratio.
    • DLP Wikipedia entry: Excellent information as always.
  • SDLQuake on Maemo

    SDLQuake on Maemo x86

    Yep, it had to be done. Above you can see SDLQuake running on Maemo x86. I haven’t tried it on the ARM target, but I’ve heard that it, or a port of it, should run just fine on ARM. Between various emulators and game engines, it shouldn’t be hard at all to amuse yourself with a Nokia 770.

    No changes were required for this x86 build: just ./configure, make, and run-standalone.sh ./sdlquake.

  • Python For Series 60: 1.1.0 Pre-Alpha

    There’s a new version out, 1.1.0 Pre-Alpha. Grab the .SIS installer for first edition devices (3650, N-Gage, etc.) or for second edition devices. Don’t forget to pick up the first edition or second edition SDK.

    I’ll read over the new API docs tonight and hope to find all kinds of juicy morsels.

    Update: Erik Smartt fills in some details on his weblog. Thanks again to the whole Python for Series 60 team for all the hard work.

  • Embedded D-BUS

    I’ve written about D-BUS before, but I just wanted to say that I love what I’ve seen with what Maemo does with D-BUS. All kinds of great stuff from application launching to state change notification is done with D-BUS. I strongly believe that D-BUS is going to rock both on the desktop and on mobile devices. D-BUS provides the infrastructure needed to build something like Growl for localhost and should allow apps to communicate with each other without having to worry about the fine details. I expect to see lots of advancements involving D-BUS in the next year and it will definitely improve the Linux/Gnome experience.

  • High Tech Baltimore

    Baltimore Emerging Technology Center

    A few weeks ago I saw a spot on TV about the Baltimore Emerging Technology Center, an early stage incubator for local high tech startups. They appear to house a wide range of high tech startups at 3 different locations in Baltimore. The current list of participants shows quite a bit of promise, from biotech to IT services.

    The ETC is funded by The Baltimore Development Corporation. Other cool stuff can be found at the Greater Baltimore Technology Council.

    Looking around these sites definitely gave me a feel for the state of the art (so to speak) in high tech startups in Baltimore. It didn’t get the press that Northern Virginia did back in the dot com days, but things are definitely happening up there.

  • More Maemo Madness

    Calcoo
    Calcoo, an RPN and algebraic calculator
    Gnuboy 3x zoom
    Gnuboy, zoomed 3x using xgnuboy.
    VTE
    VTE terminal emulator.

    More successful builds on Maemo x86 today. I’m still in the information gathering stage, trying to find projects that are worth spending more time on for “proper” hildonization. All of the above screenshots came from downloading a source tarball and running ./configure and make, nothing more. VTE was exciting because it didn’t fail out on dependencies that I can’t easily provide, and with run-standalone.sh the soft keyboard just popped up. Having a decent, usable terminal emulator is going to be a key item for a physical 770 device. The error-free build is encouraging.

    When I have some more time I would like to package up some of the apps I’ve been tinkering with as .debs for distribution, but I’d like to stress that everything I’ve posted in the last few days builds on x86 with little or no modification. These are far from well-integrated Maemo apps and I’ve only tested a handful on an ARM target (I’m waiting for the next scratchbox/qemu release to do any real testing), but it’s definitely a start.

  • More Maemo Success

    I managed to get a few more things compiled and running on Maemo (mostly on x86) over the weekend. Proper Maemo ports are also starting to come in from new sources. This CPU/Memory usage meter is hildonized and designed to fit in the top statusbar. Great little hack! There are a bunch of gnome-applet style things that would do great in that status bar.

    Two of the best places for fully-featured Maemo apps are INDT’s Maemo Apps page and the Kernel Concepts Maemo page.

    With that said, on to some more low-hanging fruit: apps that compiled with little or no modification (for most of these, ./configure and make “just worked”):

    MPlayer playing Rocketboom in a window
    MPlayer windowed playing the intro to Rocketboom
    Fullscreen MPlayer
    MPlayer fullscreen, zoomed 2x.
    GView
    GView: A very lightweight image viewer.
    Glock
    Glock: an analog clock.

    The smaller apps, GView and Glock could really rock if they were ported with a basic hildonized interface. Over the weekend I also got XChat built and running on x86 (here’s a screenshot) with everything but in-channel text input working perfectly. I still think that a properly ported Gaim would be the best graphical IRC interface for Maemo right now.

    The weekend is over but I hope to do some proper maemo hacking during my downtime this week.

    Update: MPlayer chewed up a lot of CPU, and as such is probably not going to run very well on the device itself, especially since so much time and effort has been put into tweaking GStreamer for the platform. I’ve grown accustomed to the “throw anything at MPlayer” approach to Linux multimedia, so I had to try…

  • Maemo Emulation

    For those of you looking to kick it oldskool when the Nokia 770 comes out, there might be a few open source projects that fit the bill. Frodo 4.1b, a Commodore 64 emulator, and Atari800 both built and ran without modification in the x86 Maemo development environment. Frodo chewed through almost all of the CPU on my Athlon XP 2500+, so getting it to run on the actual device might be a bit iffy. I saw some .asm files in the Atari800 source, which may be the downfall of Atari800 on ARM.

    We’ll see, though. I tried a few more modern emulators (Game Boy, NES, SNES, etc.) but nothing looked like it was going to be easy to get working at first glance. Here are a few screenshots of what I was able to get running this evening:

    Atari800 on Maemo x86
    Atari800 on Maemo x86
    Frodo C64 Emulator on Maemo x86
    Frodo (C64 emulator) on Maemo x86

  • GKrellM on Maemo!

    GKrellM running in Maemo

    This totally rocks! I’ve been tinkering around with Maemo this evening and hunting for apps to run on it. After a few misses I decided to try GKrellM. I was amazed when it “just worked,” compiling and running in my Maemo x86 scratchbox! If you’d like to play for yourself, grab the source (I used 2.2.7), unpack it, run make, then go to the src directory and run run-standalone.sh ./gkrellm.

    I haven’t set up an ARM environment using QEMU to test it there yet, but that’s next.

    Update: Here are some more screenshots showing how maemo handles the configuration dialogs.

    GKrellM Config: right click
    When you right click on the top of the GKrellM window.
    GKrellM Config: main config menu
    The main configuration menu.
    GKrellM Config: license dialog
    License dialog.

    Update: GKrellM isn’t quite as happy running on ARM under QEMU:

    GKrellM on Arm: not so happy
    Not so happy.
    GKrellM on Arm: menus
    No fonts in the menus.
    GKrellM on Arm: after restarting
    After running af-sb-init.sh restart.

    I’m told that changing to the ARM target while still running the X session is just asking for trouble. Oops! I’m not sure if the display issue is due to QEMU or if it will be a problem on an actual device. If anyone gets a chance to run it on a real live device I’d love to hear how it goes. Here’s the error I get over and over when running GKrellM on SDK_ARM using QEMU 0.6.1:

    (gkrellm:4068): Pango-CRITICAL **: pango_context_load_font: assertion `pango_font_description_get_size (desc) != 0' failed

  • Agile Web Development with Rails

    Loud Thinking:

    Dave has released the Agile Web Development with Rails for beta consumption. The demand within the first hour has already been huge, so if your download takes a little longer than the 5 minutes, that’s why. I’m really proud to see this happen. Many thanks to Dave and the people who’ve contributed. End of brief transmission from Brazil.

    My beta book/dead tree combo pack has been ordered and little elves are preparing my PDF as we speak. Beta books rock! I have a copy of Thinking in C# that I picked up when it was being offered as a $5 PDF download. Good thing too, since it never made it to print.

    I have no problem paying cover price for a book if I can get near-instant access to it in beta form. Sure I could have waited till July and snagged it for $35 from Amazon, but what’s the fun in that? Kudos to the PragProg folk and the authors of the book for getting this Beta Book thing together.

  • Hello, Maemo!

    Hello, Maemo!

    Woohoo! After some help from the fine folks in #maemo on irc.freenode.net (special thanks to czr), I’m up and running with the Maemo SDK. The biggest gotcha is that Maemo expects a 16 bit color depth and dies a horrible death if you try to run Xnest in 24 bit mode. This can be rectified by editing /etc/X11/xorg.conf (if you’re running X.Org) and changing DefaultDepth to 16 under the Screen section. If you’re feeling adventurous, Xephyr allows you to run a 16 bit window in a 24 bit environment.
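    The relevant section of xorg.conf ends up looking something like this (the identifiers vary from system to system; the DefaultDepth line is the only change that matters here):

    Section "Screen"
        Identifier   "Default Screen"
        Device       "Configured Video Device"
        Monitor      "Configured Monitor"
        DefaultDepth 16
    EndSection

    Restart X after the change so the new depth takes effect.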

    Maemo still complains a bit when I run af-sb-init.sh start, but clicking on the title bar actually does stuff rather than crash. Next step is to get one of the sample apps to compile. Thanks again to the crew in #maemo for helping me out!

  • The Nokia 770 and Maemo: Totally Amazed

    There’s been tons of coverage of the Nokia 770 and the open source Linux platform, Maemo, that goes with it. Everyone is excited, with reason, but I don’t think the significance of the 770 and its platform has sunk in yet.

    We’re talking about a Linux-based tablet with good resolution (800×480) running a custom GTK-based UI similar to Series 90, on a device with Wi-Fi and Bluetooth, 64 megs of RAM, 128 megs of flash memory, and about 3 hours of battery life. The size is right too. Nokia has also been working with open source developers to adapt GStreamer for its uses, and the development environment is right there for you to enjoy.

    I’m really excited by this page detailing porting Gaim to Maemo. The port isn’t trivial, but it’s not far from it; it looks like a lot of the effort went into making it look and behave better rather than getting it to actually run. Here’s a screenshot that just knocked my socks off:

    Gaim on Maemo

    I have been (sort of) following these instructions and now have a (sort of) working Maemo environment up and running. I’m psyched to see Python (2.3.3) as well as Perl (5.8.4) in the development environment. I hope that both of these end up on the device, or are at least installable as an option. I’d also really like to see Mono or Java running on this little thing.

    I’m going to go tinker a bit more with the development environment (which uses scratchbox) and see if I can’t figure out what I’ve done wrong on the X side of things. It also looks like you can use QEMU to run apps compiled for ARM.


    I get some nasty messages when I run af-sb-init.sh start and end up with the following almost (but not quite) working environment:

    Almost Maemo

    I used install.sh and then followed the installation instructions from there. I used this tarball as the rootstrap since I couldn’t find the file mentioned in the docs. I’ve got a shell, which is better than nothing, but I’m still working on a working X session…

  • Tab Clutter del.icio.us Dump (Using Ruby)

    In honor of cleaning out the tab clutter brought on by Crash Recovery, here is a del.icio.us dump of a bunch of tabs that I posted and cleared out this evening:

    I used Rubilicious to download and display the links. I’m a Pythonic kinda guy, so Ruby syntax still throws me, but I’m excited that I can throw together a list of links in just a few lines of (ugly) Ruby:


    #!/usr/bin/ruby
    # a hack that loads the gem path stuff correctly on my Ubuntu box
    require 'rubygems'
    require 'rubilicious'

    r = Rubilicious.new('your_username', 'your_password')

    # get the 40 most recent posts and write them as an unordered list
    puts '<ul>'
    r.recent(nil, 40).each do |post|
      puts '<li><a href="' + post['href'] + '">' + post['description'] +
           '</a>: ' + post['extended'] + '</li>'
    end
    puts '</ul>'

    I still have a lot of Ruby syntax to learn, but aside from syntax errors in my head, Ruby has been awesome. I’ve still only done some basic tinkering with Ruby and Rails, but I hope to have time to do more in the future.

  • Things I can’t Live Without: Firefox Crash Recovery

    I really don’t know what I did before I discovered the Firefox extension that changed my life: Crash Recovery. I started using it after mplayer-plugin made Firefox on Ubuntu Hoary a bit, er, unstable. Shortly after installing it I discovered that it is far more useful than its intended use (you know, recovering from crashes and all).

    I quite literally haven’t quit Firefox properly in weeks. Crash Recovery allows me to save my browser state in a way that I’ve never been able to before. On Linux I’ll just issue a shutdown command, which kills Firefox in the process. When I boot up and start Firefox again, I’m exactly where I was when I shut down, after Firefox eats my processor and sucks bandwidth for a minute or two. If I need to restart in a hurry (perhaps after installing an extension), I can just find the PID and kill it from the command line.

    I also have Crash Recovery running on my desktop at work so I can get a jump on whatever was in my browser while I download email and fire up a bunch of shells. Same thing goes there: I have Windows kill it at the end of the day, or if I need it dead quick I find the “End Process” button.

    Of course, after a few weeks of this I tend to end up with tab clutter that I just can’t seem to get rid of. The first half of my tabs are things that I’ve been meaning to read for a few days or more but just haven’t had time to check out. I’m hoping that this del.icio.us extension might make bookmarking stuff a little easier, as the process of using the experimental bookmarklet just takes more time than I’d like when I have tabs running a foot to the right of the edge of my screen.

    Tab clutter aside, this plugin has really changed the way I do things across platforms. I would strongly suggest that anyone interested in a) recovering from crashes or b) saving browser state in a wicked way check this puppy out.

  • Gratuitous Use of Technology

    John Resig:

    If all you have is a hammer, everything looks like a nail.

    That’s exactly what I’ve been thinking for a day or two since I read Rael’s radar post. I’ve been screaming to myself “there has to be a better way!” While the end product is pretty and polished, the public data isn’t easily human readable and isn’t easily machine readable. It’s sort of nothing at all really.

    At the same time, I’m not sure what the best solution is. Would this be a good use of RDF? Would one construct a DOAB (Description of a Book) file or possibly add DOAB information to a FOAF file? Would the information best be stored in an XML document with a custom schema? RSS? Atom? Something completely different? What’s the best way to attack this? My spidey sense tells me that the answer is not a Backpack list with a particular syntax, but I’m clueless as to what the right answer is.