Volunteering for the Cure:
Finding Cancer-Fighting Drugs Through Massively Distributed Virtual
Screening
According to Davin Potts, Chief Scientist
at United Devices, Inc. (UD) in Austin, Texas, the majority of personal
computers are idle about 95% of the time and they can be harnessed and
aggregated to perform useful work as distributed virtual supercomputers.
To that end, United Devices, a provider of Internet and intranet
distributed software and services, develops and manages infrastructure
required to aggregate idle computation, storage and bandwidth resources
on the Internet and corporate intranets.
The company's Global MetaProcessor
platform is an Internet-based grid, which can be used by organizations
to create a computing backbone by harnessing and aggregating idle cycle
times of networked PCs, servers and workstations. It is analogous to an
electrical grid of high-tension cables through which electrical power is
distributed regionally. This public grid, which resides at Grid.org,
allows database files to be accessed, applications to be distributed and
shared, and individuals and organizations to collaborate on a massive
scale, with computing power comparable to a supercomputer. Reasons for
deploying a grid architecture include the speed of parallel processing
over serial computing; time and cost savings for drug development or for
large, computing-intensive projects that would otherwise be infeasible;
and a better return on investment from existing assets, including human
capital and computing equipment.
For example, the not-for-profit Cancer
Research Project, launched in 2001, uses United Devices' Global
MetaProcessor platform to perform massive scale research and analysis.
This large scale, distributed public grid project is the brainchild of
Dr. Graham Richards, Chair of the University of Oxford Chemistry
Department.
Volunteers can download United Devices'
free software program and run it following the installation
instructions. Once installed, the two-megabyte program runs
unobtrusively in the background. The program works on small parts of the
overall problem that have been divided and distributed to various
devices across the public grid. Project participants' machines are sent
a "unit of molecules" to analyze. Each downloaded unit typically
contains 100 molecules. Analysis is done using THINK (To Have
Information aNd Knowledge) software. The software attempts to generate
100 suitable derivatives of each molecule by making small changes to it,
yielding a maximum of 10,100 molecules per work unit (the 100 originals
plus up to 10,000 derivatives). Thus, starting with a database of 35
million molecules means that about 3.5 billion molecules can be analyzed.
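The work-unit figures quoted above can be checked with a few lines of Python. This is only an arithmetic sketch: the constant and function names are illustrative and are not part of the THINK software, and it assumes every molecule yields the full 100 derivatives.

```python
# Illustrative arithmetic only; names are hypothetical, figures come from the article.
MOLECULES_PER_UNIT = 100        # molecules downloaded in each work unit
DERIVATIVES_PER_MOLECULE = 100  # derivatives the software attempts per molecule


def max_structures_per_unit(molecules=MOLECULES_PER_UNIT,
                            derivatives=DERIVATIVES_PER_MOLECULE):
    """Upper bound on structures in one work unit: originals plus derivatives."""
    return molecules + molecules * derivatives


def derivatives_from_database(database_size,
                              derivatives=DERIVATIVES_PER_MOLECULE):
    """Derivatives generated if every database molecule yields the full count."""
    return database_size * derivatives


print(max_structures_per_unit())              # 10100
print(derivatives_from_database(35_000_000))  # 3500000000
```

The first figure counts the 100 original molecules together with their derivatives; the second counts derivatives only, which is where the 3.5 billion figure comes from.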
The software, developed by Keith Davies
of Treweren Consultants and the University of Oxford, analyzes molecular
data by creating a three-dimensional model, which changes its shape or
conformation as it attempts to dock into a protein binding site. When a
conformation docks successfully, it triggers an interaction with the
protein and registers as a "hit." The Cancer Project depends
upon these hits, since any one hit may lead to a cure. All hits are
recorded, ranked by strength of conformation, and filed for the
project's next stage. Each resultant data set contains the
three-dimensional molecular structures and their corresponding scores
generated during screening. These data are important for the
post-processing phase of the project.
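Recording and ranking hits, as described above, might look something like the following sketch. The record layout, the field names, and the assumption that a higher score means a stronger hit are all hypothetical; the article does not describe the project's actual data format or scoring scale.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    molecule_id: str  # hypothetical identifier for the docked molecule
    target: str       # hypothetical protein target label
    score: float      # docking score; assumed here that higher is stronger


def rank_hits(hits):
    """Return hits ordered from strongest to weakest score."""
    return sorted(hits, key=lambda h: h.score, reverse=True)


hits = [
    Hit("mol-A", "target-1", 0.42),
    Hit("mol-B", "target-1", 0.91),
    Hit("mol-C", "target-1", 0.67),
]
ranked = rank_hits(hits)
print([h.molecule_id for h in ranked])  # ['mol-B', 'mol-C', 'mol-A']
```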
Once processing is complete, which takes
about a day, the program sends results back to a server and requests a
new data packet. If the participant is not online when the processing is
done, the computer will wait to send and receive data packets until the
next time it connects to the Internet. The program only runs when computing
resources are idle. If another application needs computing power, the
research program backs down, so computing performance is unimpeded.
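The behavior described above (process a unit, hold results while offline, back off when the machine is busy) can be pictured as a small state machine. The sketch below is a toy model under stated assumptions: the class, its fields, and the one-unit-per-tick simplification are invented for illustration and do not reflect how the United Devices software is actually built.

```python
from collections import deque


class AgentSketch:
    """Toy model of a volunteer client; not the real agent's design."""

    def __init__(self, work_queue):
        self.work_queue = deque(work_queue)  # stands in for the project server
        self.current = None                  # work unit being processed
        self.pending_results = []            # finished results awaiting upload
        self.uploaded = []                   # results the "server" has received

    def step(self, machine_idle, online):
        """Advance one tick and return a label describing what happened."""
        if online:
            # Send any results that finished while the machine was offline.
            self.uploaded.extend(self.pending_results)
            self.pending_results.clear()
            # Request a new packet only when there is nothing to work on.
            if self.current is None and self.work_queue:
                self.current = self.work_queue.popleft()
        if self.current is None:
            return "waiting"
        if not machine_idle:
            return "backed off"  # another application needs the CPU
        # Process the unit (trivially, in this sketch) and queue the result.
        self.pending_results.append(f"result:{self.current}")
        self.current = None
        return "processed"


agent = AgentSketch(["unit-1", "unit-2"])
print(agent.step(machine_idle=True, online=True))   # processed
print(agent.step(machine_idle=True, online=False))  # waiting
print(agent.step(machine_idle=False, online=True))  # backed off
print(agent.step(machine_idle=True, online=True))   # processed
```

In the second tick the finished result cannot be sent and no new unit can be fetched, so the client simply waits; the upload happens on the next tick in which the machine is online.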
"When the Cancer Research project
was originally planned, its goals were purposely kept flexible. The
number of compounds to be screened could be ratcheted up or down
depending upon projected volunteer participation. The project has
produced results far beyond our expectations," said Potts.
The initial scope of the project was to use
two protein targets related to different forms of cancer and to screen
them against a library of 200 million drug compounds. These compounds
were previously synthesized and evaluated for drug-like physical
characteristics, such as whether they are likely to be soluble, reactive
or easily metabolized.
Project goals changed over time to keep
pace with volunteer participation. Now 12 protein targets have been
screened against 3.5 billion molecules. Even at its original scope, it
was the largest computational chemistry project undertaken
to that point, and as Potts notes, "a real world 'torture test' of the
software." On United Devices' end, the project is managed
internally by one full-time equivalent: a database
administrator and a systems administrator each split project
responsibilities with their regular duties.
The project is now moving into the second
phase, in which the "hits" (the molecules) from phase one are put through
another virtual screening process. For the second phase, LigandFit, a
drug discovery software program designed by Accelrys Software, refines
these data to produce a more manageable list of promising drug candidates
for synthesis and testing. LigandFit helps researchers characterize
therapeutic targets and identify and assess drug candidates by
performing automated docking of flexible ligands to a protein's binding
site. This application runs on project participants' computer screens as
it evaluates the potential of a ligand library to interact with one of
the protein targets.
This second phase of the Cancer Project
is being run in parallel with the Smallpox Project. Volunteers can opt
in to either project or both to run on their PCs.
"These large Internet 'public grid' projects differ
managerially from an intranet project managed internally at an
enterprise," said Potts. In the latter case, the IT department
would control what runs on the grid and who can access the data. IT
would also work with internal management on project prioritization and
resource allocation.
For the Cancer Project, it is common for
volunteers who have a family member who is or was battling cancer to
form a group and track their group's aggregate computing power over
time. "The Internet has allowed any person who wants to make a
genuine contribution to scientific research to do so, and the size and
scope of participation has been a welcome surprise to us," states Potts.
Though massive, not-for-profit Internet
projects like the Cancer Research Project and the recently launched
(February 5, 2003) Smallpox Project, sponsored by the Department of
Defense and IBM, are very important, they do not make money for the company.
United Devices generates revenue by selling its enterprise software, which
enables grid computing behind the corporate firewalls of life science firms,
one of the company's primary verticals. Drug development offers many
types of problems to solve, including three-dimensional predictive protein
folding and structure determination, virtual screening, and toxicity
property prediction. For example, Novartis, an enterprise customer of
United Devices, employs its idle cycle time to research computational
structure requirements, and the magnitude of its distributed computing
power rivals some of the world's largest supercomputers.
To measure and record computing power,
several mechanisms are in place ranging from measuring cycle time on
individual PCs, to measuring the aggregate power of groups of PCs for
statistical tracking. Measures such as ligands processed and their
structures, and the number of leads identified, can be monitored and
tracked for groups on the grid, and statistics can be measured in
different ways. Beyond tracking cycle time, data mining tools can be used
to customize reports.
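Rolling per-machine measurements up into group totals, as described above, can be sketched in a few lines. The record layout and the field names (`cpu_hours`, `ligands`) are assumptions made for illustration; the article does not specify how the grid's statistics are actually stored.

```python
from collections import defaultdict


def aggregate_by_group(records):
    """Sum CPU hours and ligands processed for each group.

    Each record is a (group, cpu_hours, ligands_processed) tuple,
    a hypothetical layout chosen for this sketch.
    """
    totals = defaultdict(lambda: {"cpu_hours": 0.0, "ligands": 0})
    for group, cpu_hours, ligands in records:
        totals[group]["cpu_hours"] += cpu_hours
        totals[group]["ligands"] += ligands
    return dict(totals)


records = [
    ("team-a", 10.0, 1000),
    ("team-b", 4.5, 300),
    ("team-a", 2.5, 200),
]
totals = aggregate_by_group(records)
print(totals["team-a"])  # {'cpu_hours': 12.5, 'ligands': 1200}
```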
Creating a grid by aggregating disparate
PCs scattered across the world raises legitimate questions about
protecting IT assets, intellectual property, and individual privacy.
Thus, security audits of United Devices by potential sponsors and
volunteering businesses and individuals are justifiably rigorous. Some
security precautions used by United Devices include scanning all of its
build environments for viruses and digitally signing information sent to
the UD Agent. Files stored locally and files sent to the UD Agent are
encrypted, and there is biometric access control to the UD servers. No
personally identifiable information is required to run the UD Agent,
though some location information is required to have points and CPU time
included on some statistics pages.
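To illustrate the general idea of checking that received data has not been altered, the sketch below uses a shared-key message authentication code (HMAC-SHA256) from Python's standard library. This is a deliberate substitution: a true digital signature uses a public/private key pair, and the article does not say which scheme United Devices uses. The key and payload here are made up.

```python
import hashlib
import hmac


def sign(payload: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag for the payload (stand-in for a signature)."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, tag: str, key: bytes) -> bool:
    """Check the tag using a constant-time comparison."""
    return hmac.compare_digest(sign(payload, key), tag)


key = b"example-shared-key"       # hypothetical key, for illustration only
payload = b"work unit contents"   # hypothetical payload
tag = sign(payload, key)

print(verify(payload, tag, key))               # True
print(verify(b"tampered contents", tag, key))  # False
```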
Participants are told at every instance
where their e-mail addresses might be used and can allow or forbid
specific e-mail uses. Volunteers also decide what news and information
they want to receive from United Devices and they can view and change
their preferences any time. The UD Agent itself does not read
information beyond its specific directory, except for occasional use of
the Windows temporary directory during processing of work units. Beyond
this, the only information taken from a volunteer's computer by the UD
Agent is the system information required to determine the individual
computer's contribution. All transactions involving this information
exchange go through secure servers to protect the data.
Notes Potts, "As part of a security
audit by Intel we set up a conference call and found out during the call
that the number of security experts from Intel on the line asking
questions outnumbered all of the employees of United Devices." "We
passed their audit," said Potts, "and both Intel and IBM are
encouraging employees to download our software to run inside their
firewalls for the Smallpox Project."