Thursday | 8 January, 2009
Australian Biotechnology News
Explainer: Clusters and candy wrappers
Mark Hall (PC World) 21/08/2002 14:25:53

In a bold declaration to the US Congress on May 25, 1961, President John F Kennedy stated, "I believe that this nation should commit itself to achieving the goal, before this decade is out, of landing a man on the moon and returning him safely to the Earth." Eight years later, following an unprecedented technology explosion, the Apollo 11 lunar module set astronauts Neil Armstrong and Buzz Aldrin onto the lunar surface. Along the way, we gained myriad spin-off technologies that were not even dreamed of before the race to the moon began.

Our lives are shaped by those byproducts of the space race, from CAT scans and kidney dialysis, to satellite communications, advanced weather forecasting, and fuel cells. The same challenge that inspired stirring feats in rocketry and space flight is responsible for candy wrappers and cordless power tools, not to mention the dubious cook/chill concept for serving delicious aeroplane food.

Our generation is challenged with a mission no less ambitious (or outrageous) than reaching the moon: to leverage the full potential of genomic knowledge to revolutionise how we cure disease. There is every reason to believe that we, too, will meet our goal and benefit from the scientific and technical innovations that must come of such an endeavour.

The IT industry will clearly be one of the primary beneficiaries of this race to exploit the genome. The emerging computation requirements and data complexity related to R&D are pushing the current boundaries of many different technologies within the IT industry. Following are illustrations of just some of the technology areas in which IDC expects to see accelerated growth.

Large shared-memory server architectures: Much of genomic and proteomic computing makes fewer demands on raw processor speed than it does on the input/output required to move very large amounts of data (terabytes and more) from one place to the next. As soon as a researcher goes beyond microarray analysis and into metabolic and signal transduction pathway simulations, for example, the complexity of the modelling task increases precipitously.

Many researchers say that a new class of large shared-memory supercomputers is critical for these types of genomic comparison and assembly research applications, casting doubt on the premise that "commodity clusters" will suffice for all future biological approaches. IDC believes that both computational environments will likely play a role in bioscience workloads. The real question is whether the current economics will allow high-end niche supercomputer suppliers to invest in new classes of systems.

Clusters and grid computing: That said, a significant portion of life science computational workloads remains "embarrassingly parallel," i.e., they tend to consist of running a series of mostly independent jobs with little or no internal communication requirements. The modest node-to-node communication needed for protein threading and microarray analysis, for instance, qualifies them as embarrassingly parallel and as such makes them particularly suitable for distributed computing environments. Consequently, we see increasingly large installations of massively parallel processing (MPP) systems to solve advanced biological and chemical problems.
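As a rough illustration of what "embarrassingly parallel" means in practice, the Python sketch below scores a batch of sequences as fully independent jobs. Everything in it is an assumption made for illustration: the `gc_content` function is a toy stand-in for a real analysis step, `run_independent_jobs` is a hypothetical helper, and a local thread pool stands in for the cluster nodes or grid scheduler that a real installation would use.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in one sequence (a toy stand-in job)."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def run_independent_jobs(sequences, workers=4):
    """Score every sequence on its own; no job reads another job's result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, even if jobs finish out of order.
        return list(pool.map(gc_content, sequences))


if __name__ == "__main__":
    print(run_independent_jobs(["GGCC", "ATAT", "GATC"]))
```

Because the jobs share no state, the same pattern scales by adding workers or nodes without changing the per-job code, which is the property that makes such workloads a natural fit for commodity clusters.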

At the far extreme, IBM's Blue Gene supercomputer -- a 64,000-processor, petaflop (quadrillion floating-point operations per second) behemoth -- will aid the study of protein folding. MPP computers of this size and scale give systems designers an opportunity to work on the core problems of large-scale systems design, such as the use of cellular designs for massively parallel systems, integrated processor-memory logic, error recovery, algorithms, and new programming models and tools.

Heterogeneous data integration and query: The heterogeneity and changeable nature of the data intrinsic to modern research has led to extreme integration and query challenges. In response, many IT vendors are creating integrated analysis, data-mining and data-integration tools, ranging from adaptive, warehouse-based data query schemas, to platforms capable of parsing hundreds of public and private databases.

Unlike in high-performance computing, where bigger is always better, it is difficult to predict which technologies will emerge from this exercise of turning raw data into knowledge. One possible scenario is a computing utility services model in which the full complement of Web services converges with large-scale grid installations, resulting in a highly specialised Internet search engine surpassing anything we know today. We are likely years away from this type of infrastructure. Much can happen in the meantime.

These are just some of the IT areas likely to benefit from the converging life and information sciences. Other benefits will emerge that are not technological. Much has been made of the economic upside to the bio-IT industry from this convergence. Unlike the space race, however, there is no great US governmental patron funding the enterprise. Sadly, our "patron" can rightly be seen as the worsening economics of drug R&D. The drug industry knows that its need for IT requires immediate attention if the industry is to survive.

Kennedy continued his 1961 speech: "No single ... project in this period will be more exciting, or more impressive to mankind, or more important ... and none will be so difficult or expensive to accomplish."
