 Systems Biology: Toward System-level Understanding of Biological Systems
 Hiroaki Kitano
 Systems biology is a new field in biology that aims at system-level understanding
 of biological systems. While molecular biology has led to remarkable
 progress in our understanding of biological systems, the current
 focus is mainly on identification of genes and functions of their products
 which are components of the system. The next major challenge is to
 understand at the system level biological systems that are composed of
 components revealed by molecular biology. This is not the first attempt at
 system-level understanding, since it is a recurrent theme in the scientific
 community. Nevertheless, it is the first time that we may be able to understand
 biological systems grounded in the molecular level as a consistent
 framework of knowledge. Now is a golden opportunity to uncover the essential
 principles of biological systems and applications backed up by in-depth
 understanding of system behaviors. In order to grasp this opportunity,
 it is essential to establish methodologies and techniques to enable us
 to understand biological systems in their entirety by investigating: (1) the
 structure of the systems, such as genes, metabolism, and signal transduction
 networks and physical structures, (2) the dynamics of such systems,
 (3) methods to control systems, and (4) methods to design and modify systems
 for desired properties. This chapter gives an overview of the field of
 systems biology that will provide a system-level understanding of life.
 INTRODUCTION
 The ultimate goal of biology is to understand every detail and principle of
 biological systems. Almost fifty years ago, Watson and Crick identified the
 structure of DNA (Watson and Crick, 1953), thus revolutionizing the way
 biology is pursued. The beauty of their work was that they grounded biological
 phenomena on a molecular basis. This made it possible to describe
 every aspect of biology, such as heredity, development, disease, and evolution,
 on a solid theoretical ground. Biology became part of a consistent
 framework of knowledge based on fundamental laws of physics.
 Since then, the field of molecular biology has emerged and enormous
 progress has been made. Molecular biology enables us to understand biological
 systems as molecular machines. Today, we have in-depth understanding
 of elementary processes behind heredity, evolution, development,
 and disease. Such mechanisms include replication, transcription,
 translation, and so forth.
 Large numbers of genes and the functions of their transcriptional
 products have been identified, with the symbolic accomplishment of the
 complete sequencing of DNA. DNA sequences have been fully identified
 for various organisms such as mycoplasma, Escherichia coli (E. coli),
 Caenorhabditis elegans (C. elegans), Drosophila melanogaster, and Homo sapiens.
 Methods to obtain extensive gene expression profiles are now available
 that provide comprehensive measurement at the mRNA level. Measurement
 of protein levels and protein interactions is also making progress
 (Ito et al., 2000; Schwikowski et al., 2000). In parallel with such efforts,
 various methods have been invented to disrupt the transcription of genes,
 such as loss-of-function knockout of specific genes and RNA interference
 (RNAi) that is particularly effective for C. elegans and is now being applied
 for other species.
 There is no doubt that our understanding of the molecular-level mechanisms
 of biological systems will accelerate. Nevertheless, such knowledge
 does not provide us with an understanding of biological systems as
 systems. Genes and proteins are components of the system. While an understanding
 of what constitutes the system is necessary for understanding
 the system, it is not sufficient.
 Systems biology is a new field of biology that aims to develop a
 system-level understanding of biological systems (Kitano, 2000). System-level
 understanding requires a set of principles and methodologies that
 links the behaviors of molecules to system characteristics and functions.
 Ultimately, cells, organisms, and human beings will be described and
 understood at the system level grounded on a consistent framework of
 knowledge that is underpinned by the basic principles of physics.
 It is not the first time that system-level understanding of biological
 systems has been pursued; it is a recurrent theme in the scientific community.
 Norbert Wiener was one of the early proponents of system-level understanding
 that led to the birth of cybernetics, or biological cybernetics
 (Wiener, 1948). Ludwig von Bertalanffy proposed general system theory
 (von Bertalanffy, 1968) in 1968 in an attempt to establish a general theory
 of the system, but the theory was too abstract to be well grounded. A precursor
 to such work can be found in the work of Cannon, who proposed
 the concept of “homeostasis” (Cannon, 1933). With the limited availability
 of knowledge from molecular biology, most such attempts have focused
 on the description and analysis of biological systems at the physiological
 level. The unique feature of systems biology that distinguishes it from
 past attempts is that there are opportunities to ground system-level understanding
 directly on the molecular level such as genes and proteins,
 whereas past attempts have not been able to sufficiently connect system-level
 description to molecular-level knowledge. Thus, although it is not
 the first time that system-level understanding has been pursued, it is the
 first time to have an opportunity to understand biological systems within
 the consistent framework of knowledge built up from the molecular level
 to the system level.
 The scope of systems biology is potentially very broad and different
 sets of techniques may be deployed for each research target. It requires
 collective efforts from multiple research areas, such as molecular biology,
 high-precision measurement, computer science, control theory, and
 other scientific and engineering fields. Research needs to be carried out
 in four key areas: (1) genomics and other molecular biology research, (2)
 computational studies, such as simulation, bioinformatics, and software
 tools, (3) analysis of dynamics of the system, and (4) technologies for high-precision,
 comprehensive measurements.
 This constitutes a major multi-disciplinary research effort that will
 enable us to understand biological systems as systems. But what does
 this mean? “System” is an abstract concept in itself. It is basically an
 assembly of components in a particular formation, yet it is more than a
 mere collection of components. To understand the system, it is essential
 not only to describe it in detail, but also to comprehend what
 happens when certain stimuli or disruptions occur. Ultimately, we should
 be able to design the system to meet specific functional properties. It takes
 more than a simple in-depth description; it requires more active synthesis
 to ensure that we have fully understood it.
 To be more specific, in order to understand biological systems as systems,
 we must accomplish the following.
 System Structure Identification: First of all, the structures of the system
 need to be identified, such as the regulatory relationships of
 genes and the interactions of proteins that provide signal transduction and
 metabolism pathways, as well as the physical structures of organisms,
 cells, organelles, chromatin, and other components.
 Both the topological relationship of the network of components and the
 parameters for each relation need to be identified. The use of high-throughput
 DNA microarrays, protein chips, RT-PCR, and other methods
 to monitor biological processes in bulk is critical. Nevertheless, methods
 to identify genes and metabolism networks from these data have yet to be
 established.
 Identification of gene regulatory networks [1] for multicellular organisms is
 even more complex as it involves extensive cell-cell communication and
 physical configuration in three-dimensional space. Structure identification
 for multicellular organisms inevitably involves not only identifying the
 structure of gene regulatory networks and metabolism networks, but also
 understanding the physical structures of whole animals precisely at the
 cellular level. Obviously, new instrumentation systems need to be developed
 to collect necessary data.
 [1] In this article, the term “gene regulatory networks” is used to represent networks of gene
 regulations, metabolic pathways, and signal transduction cascades.
 System Behavior Analysis: Once a system structure is identified to a certain
 degree, its behavior needs to be understood. Various analysis methods
 can be used. For example, one may wish to know the sensitivity of
 certain behaviors against external perturbations, and how quickly the system
 returns to its normal state after the stimuli. Such an analysis not only
 reveals system-level characteristics, but also provides important insights
 for medical treatments by discovering cell response to certain chemicals
 so that the effects can be maximized while lowering possible side effects.
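The kind of analysis described above can be illustrated with a deliberately small sketch. It assumes a hypothetical one-variable system, dx/dt = k_syn - k_deg * x, which is not a model taken from this chapter, and measures how long the system needs to return to its steady state after a perturbation. The function name, the 5% tolerance, and the Euler step size are all illustrative choices.

```python
import math


def recovery_time(k_syn, k_deg, perturbation, tolerance=0.05, dt=0.001, t_max=100.0):
    """Time until x is back within `tolerance` of the initial displacement.

    Hypothetical model: dx/dt = k_syn - k_deg * x, integrated with
    forward Euler steps of size dt.
    """
    steady = k_syn / k_deg          # steady state of the model
    x = steady + perturbation       # state right after the stimulus
    steps = 0
    while abs(x - steady) > tolerance * abs(perturbation):
        x += dt * (k_syn - k_deg * x)
        steps += 1
        if steps * dt > t_max:
            raise RuntimeError("system did not recover within t_max")
    return steps * dt


# For this linear model the displacement decays as exp(-k_deg * t),
# so the recovery time should be close to ln(1 / tolerance) / k_deg.
slow = recovery_time(k_syn=1.0, k_deg=1.0, perturbation=0.5)
fast = recovery_time(k_syn=1.0, k_deg=2.0, perturbation=0.5)
print(slow, fast, math.log(20.0))
```

In this toy case a faster degradation rate shortens the recovery time, which is the sort of system-level characteristic the analysis is meant to expose.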
 System Control: In order to apply the insights obtained by system structure
 and behavior understanding, research into establishing a method to
 control the state of biological systems is needed. How can we transform
 cells that are malfunctioning into healthy cells? How can we control cancer
 cells to turn them into normal cells or cause apoptosis? Can we control
 the differentiation status of a specific cell into a stem cell, and control it to
 differentiate into the desired cell type? Technologies to accomplish such
 control would enormously benefit human health.
 System Design: Ultimately, we would like to establish technologies that
 allow us to design biological systems with the aim of providing cures for
 diseases. One futuristic example would be to actually design and grow
 organs from the patient’s own tissue. Such an organ cloning technique
 would be enormously useful for the treatment of diseases that require
 organ transplants. There may be some engineering applications by using
 biological materials for robotics or computation. By using materials that
 have self-repair and self-sustaining capability, industrial systems will be
 revolutionized.
 This chapter discusses scientific and engineering issues to accomplish
 in-depth understanding of the system.
 MEASUREMENT TECHNOLOGIES AND EXPERIMENTAL METHODS
 Toward Comprehensive Measurements
 A comprehensive data set needs to be produced to grasp an entire picture
 of the organism of interest. For example, the entire sequence has been deduced
 for yeast, and a microarray that can measure the expression level
 of all known genes is readily available. In addition, extensive studies of
 protein-protein interactions using the two-hybrid method are being carried
 out (Ito et al., 2000; Schwikowski et al., 2000). Efforts to obtain high-resolution
 spatiotemporal localization data for proteins are underway.
 C. elegans is an example of an intensively measured multi-cellular
 organism. A complete cell lineage has already been identified (Sulston et
 al., 1983; Sulston and Horvitz, 1977), the topology of the neural system
 has been fully described (White et al., 1986), the DNA sequence has been
 fully identified (The C. elegans Sequencing Consortium, 1998), a project
 for full description of gene expression patterns during development using
 whole-mount in situ hybridization (Tabara et al., 1996) is underway, and
 the construction of a systematic and exhaustive library of mutants has
 begun. In addition, a series of new projects has started for measuring
 neural activity in vivo, and for automatic construction of cell lineage in real
 time using advanced image processing combined with special microscopy
 (Yasuda et al., 1999; Onami et al., 2001a).
 While yeast and C. elegans are examples of comprehensive and exhaustive
 understanding of biological systems, similar efforts are now being
 planned for a range of biological systems. Although these studies are currently
 limited to understanding the components of the system and their
 local relationship with other components, the combination of such exhaustive
 experimental work and computational and theoretical research would
 provide a viable foundation for systems biology.
 Measurement for Systems Biology
 Although efforts to systematically obtain comprehensive and accurate
 data sets are underway, systems biology is much more demanding for
 experimental biologists than the current practice of biology. It requires a
 comprehensive body of data and control of the quality of data produced
 so that it can be used as a reference point of simulation, modeling, and
 system identification. Eventually, many of the current experimental procedures
 must be automated to enable high-throughput experiments to be
 carried out with precise control of quality. Needless to say, not all biological
 experiments will be carried out in such an automated fashion, for
 important contributions will be made by small-scale experiments. Nevertheless,
 large-scale experiments will lay the foundation for system-level
 understanding.
 High-throughput, comprehensive, and accurate measurement is the
 most essential part of biological science. While expectations are high for
 a computational approach to overcome limitations in the traditional approach
 in biology, it will never generate serious results without experimental
 data upon which computational studies can be grounded. For the
 computational and systems approach to be successful, measurement has
 to be (1) comprehensive, (2) quantitatively accurate, and (3) systematic.
 While the requirement for quantitative accuracy is obvious, the other
 two criteria need further clarification. Comprehensiveness can be further
 classified into three types:
 Factor comprehensiveness: Comprehensiveness in terms of target factors
 that are being measured, such as numbers of genes and proteins. It is important
 that measurement is carried out intensively for the factors (genes
 and proteins) that are related to the central genes and proteins of interest.
 Unless all genes and proteins are measured, how effectively the measurement
 covers the factors of interest matters more than the sheer number
 of factors measured.
 Time-series comprehensiveness: In modeling and analysis of a dynamical
 system, it is important to capture its behavior with fine-grain time series.
 Traditional biological experiments tend to measure only the change
 before and after a certain event. For computational analysis, data measured
 at a constant time interval are essential in addition to traditional
 sampling points.
 Item comprehensiveness: There are cases where several features, such as
 transcription level, protein interaction, phosphorylation, localization, and
 other features, have to be measured intensively for the specific target.
 “Systematic” means that measurement is performed in such a way that
 obtained data can be consistently integrated. The ideal systematic measurement
 is simultaneous measurement of multiple features for a single
 sample. It is not sufficient to develop a sophisticated model and perform
 analysis using only the mRNA or protein level. Multiple data need to be
 integrated. Then, each data point has to be obtained using samples that are
 consistent across various measurements. If samples are prepared in substantially
 different ways, two data points cannot be integrated. Although
 this requirement sounds obvious, very few data sets meet these criteria
 today.
 These criteria are elucidated in the scenario below with some examples
 of requirements for experimental data.
 For example, to infer genetic regulatory networks from an expression
 profile, comprehensive measurement of the gene expression profile needs
 to be carried out. Expression data in which only the wild-type is measured
 is generally unusable for this purpose. The data should have a comprehensive
 set of deletion mutants and overexpression of each gene. A desirable
 data set knocks out every gene that is measured in the microarray. If only
 a limited number of genes can be knocked out due to cost and time constraints,
 it is critical that genes that are expected to be tightly coupled are
 intensively knocked out rather than knocking out genes sparsely over the
 whole possible regulatory network. This is due to computational characteristics
 of the reverse engineering algorithm that constructs the gene
 regulatory network from profile data. With such algorithms, sparse data
 points leave almost unlimited ambiguities on possible network structures.
 Even with the same number of data points, the algorithm produces much
 more reliable network hypotheses if measured genes are closely related.
 This is what is meant by factor comprehensiveness.
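A minimal sketch of why deletion-mutant data matter is given below. It is not the reverse engineering algorithm referred to above; it is a naive comparison of each knockout profile against the wild type, and the function name, the threshold, and the three-gene data set are hypothetical. In the toy data, gene a activates b and b represses c.

```python
def infer_influences(wild_type, knockouts, threshold=0.5):
    """Return (regulator, target, sign) for every target whose expression
    changes by at least `threshold` when the regulator is knocked out.

    The influence may be direct or indirect; this naive comparison
    cannot tell the two apart.
    """
    edges = set()
    for knocked, profile in knockouts.items():
        for gene, level in profile.items():
            if gene == knocked:
                continue
            change = level - wild_type[gene]
            if abs(change) >= threshold:
                # Target drops without the regulator: activating influence.
                sign = "+" if change < 0 else "-"
                edges.add((knocked, gene, sign))
    return edges


# Hypothetical expression levels (arbitrary units).
wild_type = {"a": 1.0, "b": 1.0, "c": 0.2}
knockouts = {
    "a": {"a": 0.0, "b": 0.1, "c": 1.0},
    "b": {"a": 1.0, "b": 0.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "c": 0.0},
}
print(sorted(infer_influences(wild_type, knockouts)))
```

Note that the sketch reports an influence of a on c even though, in the toy network, that effect passes through b. Separating direct from indirect effects is one reason densely covered knockout data for closely related genes are more informative than sparse coverage, and a gene without a knockout contributes no outgoing edges at all.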
 Time-series comprehensiveness is required for phenomena that are
 time aligned. Time-series profile data need to be prepared with particular
 caution in terms of time synchronization of samples to be measured.
 It is often the case in traditional experiments that only two measurement
 points are set: one before the event and one after the event. For example,
 many studies in cellular aging research measured the expression level of
 aging-related genes for young cells, aged cells, and immortalized cells,
 without measuring changes of expression level on fine-grain time series.
 In some cases, time-series changes of expression level can be important
 information to create candidate hypotheses or eliminate possible mechanisms.
 In addition to measurements before and after a biologically interesting
 event, measurement should be carried out at a constant time interval.
 Expression profile data that has reliable sample time synchrony and constant
 time interval is most useful to enable the computational algorithm
 to reliably fit models and parameters to experimental data.
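The difference between before-and-after sampling and constant-interval sampling can be shown with a small sketch. The two expression profiles below are hypothetical: one stays flat, the other shows a transient pulse that has decayed by the time of the second measurement.

```python
import math


def sustained(t):
    """Hypothetical profile: expression stays at its baseline level."""
    return 1.0


def transient_pulse(t):
    """Hypothetical profile: a pulse centered at t = 5 on the same baseline."""
    return 1.0 + 2.0 * math.exp(-((t - 5.0) ** 2))


def sample(profile, times):
    """Measure a profile at the given time points (rounded to 3 decimals)."""
    return [round(profile(t), 3) for t in times]


before_after = [0, 10]             # traditional two-point design
constant_interval = range(0, 11)   # one measurement per time unit

print(sample(sustained, before_after), sample(transient_pulse, before_after))
print(sample(transient_pulse, constant_interval))
```

With only two time points the two profiles are indistinguishable, whereas the evenly spaced series exposes the pulse. A model fitted to the two-point data could not eliminate either mechanism.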
 Additional information from protein-protein interactions, such as
 from yeast two-hybrid experiments, is very useful to infer protein-level
 interactions that fill the gap between regulation of genes. Both protein interactions
 and expression profiles should be measured on samples that are
 prepared identically. This systematic measurement requirement is rather
 hard to meet currently, because not many research groups are proficient
 in multiple measurement techniques.
 After obtaining gene regulatory networks, one needs to find out specific
 parameters used in the network. To understand dynamics, it is essential
 that each parameter regarding the network is obtained, so that various
 numerical simulations and analyses can be performed. Such parameters
 are binding constant, transcription rate, translation rate, chemical reaction
 rate, degradation rate, diffusion rate, speed of active transport, etc. Except
 for special cases, such as red blood cells, these constants are not readily
 available. Measurement using extracts provides certain information, but
 often these rate constants vary drastically in vivo. Ideally, comprehensive
 measurement of major parameters would be performed in vivo, but any
 measurement that gives reasonable estimates would be of great help. In
 addition to parameter measurement, it is critically important to measure
 the phosphorylation state at high resolution.
 While accuracy is important, the level of accuracy required may vary
 depending on which part of the system is to be measured. In some parts of
 the network, the system behavior is sensitive to specific parameter values,
 and thus has to be measured with high accuracy. In other parts of the
 system, the system may be robust against fluctuations of large magnitude.
 In such a case, it may often suffice to confirm that the parameter values fall
 within the range of stability, instead of obtaining highly accurate figures.
 The point is that not all parts of the system need to be tuned with the
 same precision. For example, components for jet engines may have to
 be produced with high precision, but seat belts do not have to achieve
 the same precision as jet engine components. In future, the type and
 accuracy requirements for experiments may be determined by theoretical
 requirements.
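One way to make this point concrete is a relative sensitivity estimate, sketched below under stated assumptions. The Hill-type response function and its parameter values are hypothetical and are not taken from this chapter; the sensitivity is approximated by a simple finite difference.

```python
def hill_response(x, K, n):
    """Hypothetical saturating response: x^n / (K^n + x^n)."""
    return x ** n / (K ** n + x ** n)


def relative_sensitivity(f, params, name, rel_step=1e-4):
    """Approximate d ln(f) / d ln(p) for parameter `name` by a finite difference."""
    base = f(**params)
    bumped = dict(params)
    bumped[name] = params[name] * (1.0 + rel_step)
    return ((f(**bumped) - base) / base) / rel_step


# Near the threshold (x == K) the output reacts strongly to K ...
near_threshold = relative_sensitivity(hill_response, {"x": 1.0, "K": 1.0, "n": 4}, "K")
# ... but deep in saturation (x >> K) the same parameter hardly matters.
saturated = relative_sensitivity(hill_response, {"x": 10.0, "K": 1.0, "n": 4}, "K")
print(near_threshold, saturated)
```

In this toy case the same parameter K would need to be measured accurately in one operating regime, while a rough estimate would suffice in the other.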
 The examples given so far have focused on the process of identification
 of network structure and parameters that enable simulation and analysis
 of biochemical networks under the simplified assumption that all materials
 are distributed homogeneously in the environment. Unfortunately, this
 is not the case in biological systems. There are subcellular structures and
 localization of transcription products that cause major diversion from a
 naive model. Multi-cellular systems require measurement of cell-cell contact,
 diffusion, cell lineage, gene expression during development, etc. For
 accurate simulation and analysis, these features have to be measured in
 a comprehensive, accurate, and systematic manner. We have not developed
 devices to obtain high-throughput measurements for any of these
 features. This is a serious issue that has to be addressed.
 Next-generation Experimental Systems
 To cope with increasing demands for comprehensive and accurate measurement,
 a set of new technologies and instruments needs to be developed
 that offers a higher level of automation and high-precision measurement.
 First, dramatic progress in the level of automation of experimental
 procedures for routine experiments is required in order to keep up
 with increasing demands for modeling and system-level analysis. High-throughput
 experiments may turn into a labor-intensive nightmare unless
 the level of automation is drastically improved. Further automation
 of experimental procedures would greatly benefit the reliability of experiments,
 throughput, and total cost of the whole operation in the long run.
 Second, cutting-edge technologies such as micro-fluid systems, nanotechnology
 and femto-chemistry may need to be introduced to design and
 build next-generation experimental devices. The use of such technologies
 will enable us to measure and observe the activities of genes and proteins
 in a way that is not possible today. It may also drastically improve the
 speed and accuracy of measurement for existing devices.
 In those fields where there are obvious needs, such as sequencing
 and proteomics, the above goals are already pursued. Beyond the development
 of high-throughput sequencers using high-density capillary array
 electrophoresis, efforts are being made to develop integrated microfabricated
 devices that enable PCR and capillary electrophoresis in a single
 micro device (Lagally et al., 1999; Simpson et al., 1998). Such devices
 not only enable miniaturization and precision measurements, but will also
 significantly increase the level of automation.
 In the developmental biology of C. elegans, identification of cell lineage
 is one of the major issues that needs to be accomplished to assist
 analysis of the gene regulatory network for differentiation. The first attempt
 to identify cell lineage was carried out entirely manually (Sulston
 et al., 1983; Sulston and Horvitz, 1977), and it took several years to identify
 the lineage of the wild type. Four-dimensional microscopy allowed
 us to collect multi-layer confocal images at a constant time interval, but
 lineage identification is not automatic. With the availability of exhaustive
 RNAi knockout for C. elegans, high-throughput cell lineage identification
 is essential to explore the utility of the exhaustive RNAi. Efforts are
 underway to fully automate cell lineage identification, as well as three-dimensional
 nuclei position data acquisition (Onami et al., 2001a), fully
 utilizing advanced image processing algorithms and massively parallel
 supercomputers. Such devices meet some of the criteria presented earlier,
 and provide comprehensive measurement of cell positions with high
 accuracy. With automation, high-throughput data acquisition can be expected.
 If the project succeeds, it can be used to automatically identify the
 cell lineage of all RNAi knockout for early embryogenesis. The technology
 may be extended, though only with major effort, to automatically detect cell-cell
 contact, protein localization, etc.
 Combined with whole mount in situ hybridization and possible future
 single-cell expression profiling, complete identification of the gene regulatory
 network for C. elegans may be possible in the near future.
 SYSTEM STRUCTURE IDENTIFICATION
 There are various system structures that need to be identified, such as
 the structural relationship among cells in the developmental process, detailed
 cell-cell contact configuration, membrane, intra-cellular structures,
 and gene regulatory networks. While each of these has significance in corresponding
 research in systems biology, this section focuses on how the
 structure of gene regulatory networks can be identified, primarily because
 it is a subject of growing interest due to the rapid uncovering of genomic
 information, and it is the control center of various cellular phenomena.
 In order to understand a biological system, we must first identify
 the structure of the system. For example, to identify a gene regulatory
 network, one must identify all components of the network, the function of
 each component, interactions, and all associated parameters. All possible
 experimental data must be used to accomplish this non-trivial task. At
 the same time, inference results from existing experiments should enable
 the prediction of unknown genes and interactions, which can then be
 experimentally verified.
 The difficulty is that such a network cannot be automatically inferred
 from experimental data based on some principles or universal rules, because
 biological systems evolve through stochastic processes and are not
 necessarily optimal. Also, there are multiple networks and parameter values
 that behave quite similarly to the target network. One must identify the
 true network out of multiple candidates.
 This process can be divided into two major tasks: (1) network structure
 identification, and (2) parameter identification.
 Network Structure Identification
 Several attempts have already been made to identify gene regulatory
 networks from experimental data. They can be classified into two approaches.
 BOTTOM-UP APPROACH
 The bottom-up approach tries to construct a gene regulatory network
 based on the compilation of independent experimental data, mostly
 through literature searches and some specific experiments to obtain data
 of very specific aspects of the network of interest. Some of the early attempts
 of this approach are seen in the lambda phage decision circuit
 (McAdams and Shapiro, 1995), early embryogenesis of Drosophila (Reinitz
 et al., 1995; Hamahashi and Kitano, 1998; Kitano et al., 1997), leg formation
 (Kyoda and Kitano, 1999a), wing formation (Kyoda and Kitano, 1999b),
 eye formation on ommatidia clusters and R-cell differentiation (Morohashi
 and Kitano, 1998), and a reaction-diffusion based eye formation
 model (Ueda and Kitano, 1998). This approach is suitable when most of
 the genes and their regulatory relationship are relatively well understood.
 This approach is particularly suitable for the end-game scenario where
 most of the pieces are known and one is trying to find the last few pieces.
 In some cases, biochemical constants can be measured so that very precise
 simulation can be performed. When most parameters are available, the
 main purpose of the research is to build a precise simulation model which
 can be used to analyze the dynamic properties of the system by changing
 parameters in ways that are not possible in the actual system, and to confirm
 that available knowledge generates simulation results that are consistent
 with available experimental data.
 There are efforts to create databases that describe gene and metabolic
 pathways from the literature. KEGG (Kanehisa and Goto, 2000) and EcoCyc
 (Karp et al., 1999) are typical examples. Such databases are enormously
 useful for modeling and simulation, but they must be accurate
 and represented in such a way that simulation and analysis can be done
 smoothly.
 There have been some preliminary attempts to predict unknown genes
 and their interactions (Morohashi and Kitano, 1998; Kyoda and Kitano,
 1999a,b). These attempts manually searched possible unknown interactions
 to obtain simulation results consistent with experimental data, and
 did not perform exhaustive searches of all possible spaces of network
 structures.
 TOP-DOWN APPROACH
 The top-down approach tries to make use of high-throughput data using
 DNA microarray and other new measurement technologies. Already,
 there have been some attempts to infer groups of genes that have a tight
 relationship based on DNA microarray data using clustering techniques
 for the yeast cell cycle (Brown and Botstein, 1999; DeRisi et al., 1997;
 Spellman et al., 1998) and development of mouse central neural systems
 (D’haeseleer et al., 1999). Clustering methods are suitable for handling
 large-scale profile data, but do not directly deduce the network structures.
 Such methods only provide clusters of genes that are co-expressed in similar
 temporal patterns. Often, easy-to-understand visualization is required
 (Michaels et al., 1998).
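As a rough illustration of what such clustering produces, the sketch below groups genes whose temporal profiles have a Pearson correlation above a threshold. This is a simplified stand-in, not the method of the cited studies; the function names, the threshold, and the four profiles are hypothetical, and profiles with no variation are assumed not to occur (they would cause a division by zero).

```python
import math


def pearson(u, v):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def coexpression_clusters(profiles, threshold=0.9):
    """Group genes linked by correlation >= threshold (single linkage)."""
    genes = sorted(profiles)
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the cluster representative.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(clusters.values())


# Hypothetical time-series profiles for four genes.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "g3": [5.0, 4.0, 3.0, 2.0, 1.0],
    "g4": [5.1, 3.9, 3.2, 2.0, 0.9],
}
print(coexpression_clusters(profiles))
```

The output is a list of gene groups only. Nothing in it says which gene regulates which, which is the limitation noted above.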
 Some heuristics must be imposed if we are to infer networks from such
 methods. Alternative methods are now being developed to directly infer
 network structures from expression profiles (Morohashi and Kitano, 1999;
 Liang et al., 1999) and extensive gene disruption data (Akutsu et al., 1999;
 Ideker et al., 2000). Most of the methods developed in the past translate
 expression data into binary values, so that the computing cost can be
 reduced. However, such methods seriously suffer from information loss
 in the binary translation process, and cannot obtain the accurate network
 structure. A method that can directly handle continuous-value expression
 data was proposed (Kyoda et al., 2000b; Onami et al., 2001b) and reported
 accurate performance without a serious increase in computational costs.
 An extension of this method seems to be very promising for any serious
 research on inference of gene regulatory networks.
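The information loss caused by binary translation can be seen in a very small sketch. The threshold and the two expression profiles are hypothetical; the point is only that quantitatively different profiles can collapse to the same binary pattern.

```python
def binarize(profile, threshold):
    """Translate continuous expression levels into on/off values."""
    return [1 if level >= threshold else 0 for level in profile]


# Two hypothetical profiles with very different induction strength.
weak_induction = [0.1, 0.6, 0.7, 0.2]
strong_induction = [0.4, 5.0, 9.0, 0.3]

print(binarize(weak_induction, 0.5))
print(binarize(strong_induction, 0.5))
```

After binarization an inference method can no longer distinguish the weak response from the strong one, so any regulatory relationship that depends on expression magnitude is lost.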
 Genetic programming has been applied to automatically reconstruct
 pathways and parameters that fit experimental data (Koza et al., 2001).
 The approach requires extensive computing power, for example a
 1,000-CPU Beowulf-class cluster supercomputer, but the approach
 has the potential to be practical given the expected speed up of
 processor chips.
 Such extensions include the development of a hybrid method that
 combines the bottom-up and the top-down approach. It is unlikely that
 no knowledge is available before applying any inference methods; in
 practical cases, it can be assumed that various genes and their interactions
 are partially understood, and that it is necessary to identify the rest of the
 network. By using knowledge that is sufficiently accurate, the possible
 space of network structures is significantly reduced.
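The size of this reduction can be estimated with simple counting, sketched below. The counting scheme is an assumption made for illustration and does not come from this chapter: each ordered pair of genes, including a gene paired with itself, is taken to be in one of three states (activation, repression, or no interaction), and each known interaction fixes one pair.

```python
def candidate_networks(n_genes, known_interactions=0, states_per_pair=3):
    """Number of candidate network structures under the assumed counting scheme."""
    free_pairs = n_genes * n_genes - known_interactions
    if free_pairs < 0:
        raise ValueError("more known interactions than gene pairs")
    return states_per_pair ** free_pairs


without_knowledge = candidate_networks(5)
with_knowledge = candidate_networks(5, known_interactions=15)
print(without_knowledge, with_knowledge)
```

Even for five genes, fixing 15 of the 25 pairs shrinks the space from 3^25 candidates to 3^10, which is why partial but accurate prior knowledge makes the search far more tractable.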
 One major problem is that such methods cannot directly infer possible
 modifications and translational control. Future research needs to address
 integration of the data of the expression profile, protein-protein interactions,
 and other experimental data.
 Parameter Identification
 It is important to identify not only the structure of the network, but also a set of
 parameters, because all computational results have to be matched and
 tested against actual experimental results. In addition, the identified network
 will be used for simulation and quantitative analysis of the system’s
 response and behavioral profile.
 In most cases, the parameter set has to be estimated based on experimental
 data. Various parameter optimization methods, such as genetic
 algorithms and simulated annealing, are used to find a set of parameters
 that can generate simulation results consistent with experimental data
 (Hamahashi and Kitano, 1999). In finding a parameter set, it must be noted
 that there may be multiple parameter sets which generate simulation results
 equally fitted to experimental data. An important feature of parameter
 optimization algorithms used for this purpose is the capability to find
 as many local minima (including the global minimum) as possible, rather than
 a single global minimum. This needs to be combined with a method
 that proposes specific experiments to identify which of these parameter
 sets is the correct one.
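A minimal sketch of the "collect many minima" requirement is given below. It uses multi-start gradient descent on a toy one-parameter misfit function as a stand-in for genetic algorithms or simulated annealing. The misfit function, step size, and starting points are assumptions chosen for illustration.

```python
def misfit(k):
    """Toy misfit with two equally good parameter values (k = 1 and k = 4)."""
    return (k - 1.0) ** 2 * (k - 4.0) ** 2

def misfit_gradient(k):
    """Analytic derivative of the toy misfit."""
    return 2.0 * (k - 1.0) * (k - 4.0) * (2.0 * k - 5.0)

def descend(k, step=0.01, iterations=5000):
    """Plain gradient descent from a single starting value."""
    for _ in range(iterations):
        k -= step * misfit_gradient(k)
    return k

def find_minima(starts, tolerance=1e-3):
    """Run a local search from every start and keep each distinct minimum."""
    minima = []
    for start in starts:
        k = descend(start)
        if all(abs(k - m) > tolerance for m in minima):
            minima.append(k)
    return sorted(minima)

starts = [0.3 + 0.5 * i for i in range(10)]  # 0.3, 0.8, ..., 4.8
minima = find_minima(starts)
print("distinct parameter values with equal fit:", [round(m, 3) for m in minima])
```

Both parameter values fit the toy data equally well, so the search alone cannot decide between them; an additional experiment is needed to discriminate, as the text argues.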
 There are several methods to find optimal parameter sets, such as
 brute-force exhaustive search, genetic algorithms, and simulated annealing.
 Most of them are computationally expensive, and have not been considered
 viable options in the past. Growing computing power has changed this
 situation, and will continue to do so.
 Although it is important to accurately measure and estimate the genuine
 parameter values, in some cases parameters are not that critical. For
 example, it was shown through an extensive simulation that the segment
 polarity network in Drosophila exhibits a high level of robustness against
 parameter change (von Dassow et al., 2000). Networks that are
 essential for survival need to be robust against various
 changes in parameters to cope with genetic variations and external
 disturbances. For this kind of network, the essence is embedded in the
 structure of the network, rather than specific parameters of the network.
 This is particularly the case when feedback control is used to obtain robustness
 of the circuits, as seen in bacterial chemotaxis (Yi et al., 2000).
 Thus, parameter estimation and measurement may need to be combined
 with theoretical analysis on sensitivity of certain parameters to
 maintain functionalities of the circuit.
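One simple form of such a sensitivity analysis is the logarithmic sensitivity of an output to a parameter, estimated by finite differences. The sketch below compares a toy open-loop gain with the same gain placed inside a negative feedback loop. Both models and all numbers are assumptions for illustration.

```python
import math

def log_sensitivity(output, k, rel_step=1e-4):
    """Central-difference estimate of d ln(output) / d ln(k)."""
    up, down = k * (1.0 + rel_step), k * (1.0 - rel_step)
    d_ln_output = math.log(output(up)) - math.log(output(down))
    return d_ln_output / (math.log(up) - math.log(down))

def open_loop(k, signal=1.0):
    """Output is directly proportional to the parameter k."""
    return k * signal

def closed_loop(k, signal=1.0, feedback=1.0):
    """Steady state of y = k * (signal - feedback * y)."""
    return k * signal / (1.0 + k * feedback)

k = 100.0  # hypothetical nominal parameter value
s_open = log_sensitivity(open_loop, k)
s_closed = log_sensitivity(closed_loop, k)
print("open-loop sensitivity:   %.4f" % s_open)
print("closed-loop sensitivity: %.4f" % s_closed)
```

A sensitivity near 1 means a 1% parameter change shifts the output by about 1%; with feedback the same change is attenuated by roughly a factor of 1 + k, so this parameter need not be measured precisely.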
 SYSTEM BEHAVIOR ANALYSIS
 Once we understand the structures of the system, research will focus on
 dynamic behaviors of the system. How does it adapt to changes in the
 environment, such as nutrition, and various stimuli? How does it maintain
 robustness against various potential damage to the system, such as
 DNA damage and mutation? How do specific circuits exhibit functions
 observed? To attain system-level understanding, it is essential to understand
 the mechanisms behind (1) the robustness and stability of the system,
 and (2) functionalities of the circuits.
 It is not a trivial task to understand the behaviors of complex biological
 networks. Computer simulation and a set of theoretical analyses are
 essential to provide in-depth understanding on the mechanisms behind
 the circuits.
 Simulation
 Simulation of the behavior of gene and metabolism networks plays an important
 role in systems biology research, and there are several ongoing
 efforts on simulator development (Mendes and Kell, 1998; Tomita et al.,
 1999; Kyoda et al., 2000a; Nagasaki et al., 1999). Due to the complexity of
 the network behavior and large number of components involved, it is almost
 impossible to intuitively understand the behaviors of such networks.
 In addition, accurate simulation models are a prerequisite for analyzing the
 dynamics of the system by changing the parameters and structure of the
 gene and metabolism networks. Although such analysis is necessary for
 understanding the dynamics, these operations are not possible with actual
 biological systems. Simulation is an essential tool not only for understanding
 the behavior, but also for the design process. In the design of complex
 engineering systems, various forms of simulation are used. It is unthinkable
 today that any serious engineering systems could be designed and
 built without simulation. VLSI design requires major design simulation,
 thus creating one of the major markets for supercomputers. Commercial
 aviation is another example. The Boeing 777 was designed based almost
 entirely on simulation and digital prefabrication. Once we enter that stage
 of designing and actively controlling biological systems, simulation will
 be the core of the design process.
 For simulation to be a viable methodology for the study of biological
 systems, highly functional, accurate, and user-friendly simulator systems
 need to be developed. Simulators and associated software systems often
 require extensive computing power such that the system must run on
 highly parallel cluster machines, such as the Beowulf PC cluster (Okuno
 et al., 1999). Although there are some simulators, there is no system that
 sufficiently covers the needs of a broad range of biology research. Such
 simulators must be able to simulate gene expression, metabolism, and
 signal transduction for single and multiple cells. They must be able to
 simulate both high concentration of proteins that can be described by
 differential equations, and low concentration of proteins that need to be
 handled by stochastic process simulation. Some efforts on simulating a
 stochastic process (McAdams and Arkin, 1998) and integrating it with
 high concentration level simulation are underway.
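The difference between the two regimes can be sketched with a single protein that is synthesized at a constant rate and degraded in proportion to its copy number. The deterministic description gives one steady-state value, while a stochastic simulation (here Gillespie's direct method) produces integer copy numbers that fluctuate around it. The rate constants and the random seed are assumptions.

```python
import random

def ode_steady_state(k_syn, k_deg):
    """Steady state of dn/dt = k_syn - k_deg * n."""
    return k_syn / k_deg

def gillespie_birth_death(k_syn, k_deg, n0, t_end, rng):
    """Direct-method stochastic simulation; returns (time average, min, max)."""
    t, n = 0.0, n0
    weighted_sum = 0.0
    n_min = n_max = n0
    while True:
        a_syn = k_syn            # propensity of synthesis
        a_deg = k_deg * n        # propensity of degradation
        a_total = a_syn + a_deg
        tau = rng.expovariate(a_total)   # waiting time to the next event
        if t + tau > t_end:
            weighted_sum += n * (t_end - t)
            break
        weighted_sum += n * tau
        t += tau
        if rng.random() * a_total < a_syn:
            n += 1
        else:
            n -= 1
        n_min, n_max = min(n_min, n), max(n_max, n)
    return weighted_sum / t_end, n_min, n_max

k_syn, k_deg = 10.0, 1.0         # hypothetical rates: about 10 molecules on average
rng = random.Random(0)           # fixed seed so the run is repeatable
mean_n, n_min, n_max = gillespie_birth_death(k_syn, k_deg, 10, 500.0, rng)
print("ODE steady state:", ode_steady_state(k_syn, k_deg))
print("stochastic time average: %.2f (range %d to %d)" % (mean_n, n_min, n_max))
```

At this low copy number the range of the stochastic trajectory is comparable to its mean, which is the situation in which differential equations alone become a poor description.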
 In some cases, the model requires not only gene regulatory networks
 and metabolic networks, but also high-level structures of chromosomes,
 such as heterochromatin structures. In the model of aging, some attempts
 are being made to model heterochromatin dynamics (Kitano and Imai,
 1998; Imai and Kitano, 1998). Nevertheless, how to model such dynamics
 and how to estimate the structure from sparse data and the current level
 of understanding are major challenges.
 The simulator needs to be coupled with parameter optimization tools,
 a hypothesis generator, and a group of analysis tools. Nevertheless, algorithms
 behind these software systems need to be designed precisely for
 biological research. One example that has already been mentioned is that
 the parameter optimizer needs to find as many local minima (including
 global minima) as possible, because there are multiple possible solutions
 of which only one is actually used. The assumption that the optimal
 solution is used in an actual system does not hold true in biological
 systems. Most parameter optimization methods are designed to find the
 global optima for engineering design and problem solving. While existing
 algorithms provide a solid starting point, they must be modified to suit
 biological research. Similar arguments apply to other software tools, too.
 A set of software systems needs to be developed and integrated to
 assist systems biology research. Such software includes:
 • a database for storing experimental data,
 • a cell and tissue simulator,
 • parameter optimization software,
 • bifurcation and systems analysis software,
 • hypotheses generator and experiment planning advisor software, and
 • data visualization software.
 How these modules are related and used in an actual work flow is
 illustrated in Figure 1.1. While many independent efforts are being made
 on some of this software, so far only limited efforts have been made to
 create a common platform that integrates these modules. Recently, a group
 of researchers initiated a study to define a software platform for systems
 biology. Although various issues need to be addressed for such a software
 platform, the rest of this section describes some illustrative issues.
 Efforts are being made to provide a common and versatile software
 platform for systems biology research. The Systems Biology Workbench
 project aims to provide a common middleware so that plug-in modules
 can be added to form a uniform software environment.
 Besides the software modules themselves, the exchange of data and the interfaces
 between software modules are critical issues in data-driven research
 tools. Systems Biology Markup Language (SBML) is a versatile and common
 open standard that enables the exchange of data and modeling information
 among a wide variety of software systems (Hucka et al., 2000,
 2001). It is based on XML, and is expected to become the industrial
 and academic standard format for data and model exchange.
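Because SBML is XML, any general XML parser can read the basic contents of a model. The sketch below parses a small hand-written fragment with Python's standard library. The fragment uses SBML Level 2 style element names as an assumption about the format version, has not been validated against the specification, and describes a made-up one-reaction model.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative fragment; not a validated SBML document.
SBML_TEXT = """
<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cell"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="cell" initialConcentration="1.0"/>
      <species id="B" compartment="cell" initialConcentration="0.0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="A_to_B">
        <listOfReactants><speciesReference species="A"/></listOfReactants>
        <listOfProducts><speciesReference species="B"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""

NS = {"sbml": "http://www.sbml.org/sbml/level2"}
root = ET.fromstring(SBML_TEXT.strip())

# Species identifiers and their initial concentrations.
species = {
    s.get("id"): float(s.get("initialConcentration"))
    for s in root.findall(".//sbml:species", NS)
}

# Each reaction as (id, reactant ids, product ids).
reactions = []
for r in root.findall(".//sbml:reaction", NS):
    reactants = [ref.get("species") for ref in
                 r.findall("sbml:listOfReactants/sbml:speciesReference", NS)]
    products = [ref.get("species") for ref in
                r.findall("sbml:listOfProducts/sbml:speciesReference", NS)]
    reactions.append((r.get("id"), reactants, products))

print(species)
print(reactions)
```

A simulator, an optimizer, and a visualization tool that all read the same file in this way can exchange models without custom converters, which is the role the text assigns to SBML.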
 Ultimately, a group of software tools needs to be used for disease modeling
 and simulation of organ growth and control; this requires a comprehensive
 and highly integrated simulation and analysis environment.
 Figure 1.1 Software tools for systems biology and their workflow
 (A) Relationship among Software Tools. Diagram labels: Experimental Data
 Database, Experimental Data Interface, Measurement Systems, Genome/Proteome
 Database, Simulator, System Analysis Module (Bifurcation analysis, Flux
 Balance Analysis, etc.), System Profile Database, Hypotheses Generation
 Experiment Planning Module, Visualization Module, Parameter Optimization
 Module, System Structure Database.
 (B) Workflow and software tools. Diagram labels: Simulator; Expression
 profile data, Two-hybrid data, RT-PCR data, etc.; Parameter optimizer;
 Hypotheses generator; Gene regulation network, Metabolic cascade network,
 Signal transduction network; A set of plausible hypotheses, Predictions of
 genes and interactions; Experiment design assistance system; Experiment
 plans; Biological experiments; Dynamic systems analysis (Robustness,
 stability, bifurcation, etc.); Design pattern analysis, Design pattern
 extraction.
 Analysis Methods
 There have been several attempts to understand the dynamic properties
 of systems using bifurcation analysis, metabolic control analysis, and sensitivity
 analysis. For example, bifurcation analysis has been used to understand
 the Xenopus cell cycle (Borisuk and Tyson, 1998). The analysis
 creates a phase portrait based on a set of equations describing the essential
 process of the Xenopus cell cycle. A phase portrait illustrates in which
 operation point the system is acting, and how it changes behavior if some
 of the system parameters are varied. By looking at the landscape of the
 phase portrait, a crude analysis of the robustness of the system can be
 made.
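The idea of tracking how steady states appear and disappear as a parameter is varied can be sketched with a one-variable toy system. The cubic rate law below is a generic textbook-style example chosen as an assumption. It is not the Xenopus cell cycle model of the cited study.

```python
def rate(x, r):
    """Toy one-variable dynamics dx/dt = r + x - x**3."""
    return r + x - x ** 3

def steady_states(r, x_min=-3.0, x_max=3.0, n_grid=6000):
    """Locate steady states as sign changes of the rate on a grid."""
    states = []
    width = (x_max - x_min) / n_grid
    for i in range(n_grid):
        left = x_min + i * width
        right = x_min + (i + 1) * width
        if rate(left, r) * rate(right, r) < 0.0:
            x = 0.5 * (left + right)
            stable = (1.0 - 3.0 * x ** 2) < 0.0  # sign of d(rate)/dx at x
            states.append((round(x, 2), stable))
    return states

control_values = [-0.6, -0.3, 0.1, 0.3, 0.6]  # hypothetical parameter scan
counts = [len(steady_states(r)) for r in control_values]

for r, n in zip(control_values, counts):
    print("r = %+.1f: %d steady state(s) %s" % (r, n, steady_states(r)))
```

The scan shows a window of the parameter in which two stable states coexist with one unstable state, bounded on both sides by values at which only a single state remains. The width of such a window gives a crude indication of how much parameter change the behavior tolerates.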
 A group of analysis methods such as flux balance analysis (FBA)
 (Varma and Palsson, 1994; Edward and Palsson, 1999) and metabolic control
 analysis (MCA) (Kacser and Burns, 1973; Heinrich and Rapoport,
 1974; Fell, 1996) provides a useful method to understand system-level behaviors
 of metabolic circuits under various environments and internal disruptions.
 It has been demonstrated that such an analysis method can provide
 knowledge on the capabilities of metabolic pathways that are consistent
 with experimental data (Edward et al., 2001). While such methods are
 currently aimed at analysis of steady-state behaviors with linear approximation,
 extension to dynamic and nonlinear analysis would certainly
 provide a powerful tool for system-level analysis of metabolic circuits.
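As a small worked instance of metabolic control analysis, the sketch below computes flux control coefficients for a toy two-step pathway and checks that they sum to one, the summation property known from MCA. The pathway, rate laws, and numerical values are assumptions for illustration.

```python
import math

def steady_state_flux(e1, e2, substrate=1.0, k_eq=4.0):
    """Flux of the toy pathway S -> X -> P at steady state.

    Assumed rate laws: v1 = e1 * (substrate - x / k_eq), v2 = e2 * x.
    Setting v1 = v2 gives the intermediate concentration x.
    """
    x = e1 * substrate / (e1 / k_eq + e2)
    return e2 * x

def flux_control_coefficient(index, enzymes, rel_step=1e-5):
    """Central-difference estimate of d ln(flux) / d ln(enzyme level)."""
    up, down = list(enzymes), list(enzymes)
    up[index] *= 1.0 + rel_step
    down[index] *= 1.0 - rel_step
    d_ln_flux = math.log(steady_state_flux(*up)) - math.log(steady_state_flux(*down))
    d_ln_enzyme = math.log(up[index]) - math.log(down[index])
    return d_ln_flux / d_ln_enzyme

enzymes = [2.0, 1.0]  # hypothetical enzyme levels e1, e2
coefficients = [flux_control_coefficient(i, enzymes) for i in range(2)]
print("flux control coefficients:", [round(c, 4) for c in coefficients])
print("sum:", round(sum(coefficients), 6))
```

In this example control of the flux is shared between the two steps rather than residing in a single rate-limiting enzyme, which is the system-level view that MCA provides.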
 Several other analysis methods have already been developed for complex
 engineering systems, particularly in the area of control of dynamic systems.
 One of the major challenges is to describe biological systems in the
 language of control theory, so that we can abstract essential parts of the
 system within the common language of biology and engineering.
 ROBUSTNESS OF BIOLOGICAL SYSTEMS
 Robustness is one of the essential features of biological systems. Understanding
 the mechanism behind robustness is particularly important because
 it provides in-depth understanding on how the system maintains its
 functional properties against various disturbances. Specifically, we should
 be able to understand how organisms respond to (1) changes in environment
 (deprived nutrition level, chemical attractant, exposure to various
 chemical agents that bind to receptors, temperature) and (2) internal failures
 (DNA damage, genetic malfunctions in metabolic pathways). Obviously,
 it is critically important to understand the intrinsic functions of the
 system, if we are eventually to find cures for diseases.
 Lessons from Complex Engineering Systems
 There are interesting analogies between biological systems and engineering
 systems. Both systems are designed incrementally through some sort
 of evolutionary processes, and are generally suboptimal for the given task.
 They also exhibit increased complexity to attain a higher level of robustness
 and stability.
 Consider an airplane as an example. If the atmospheric air flow is
 stable and the airplane does not need to change course, altitude, or weight
 balance, and does not need to take off and land, the airplane can be built
 using only a handful of components. The first airplane built by the Wright
 brothers consisted of only a hundred or so components. The modern jet,
 such as the Boeing 747, consists of millions of components. One of the
 major reasons for the increased complexity is to improve stability and
 robustness. Is this also the case in biological systems?
 Mycoplasma is the smallest self-sustaining organism and has only
 about 400 genes. It can only live under specific conditions, and is very
 vulnerable to environmental fluctuations. E. coli, on the other hand, has
 over 4,000 genes and can live under varying environments. As E. coli
 evolved it acquired genetic and biochemical circuits for various stress
 responses and basic behavioral strategies such as chemotaxis (Alon et
 al., 1999; Barkai and Leibler, 1997). These response circuits form a class
 of negative feedback loops. Similar mechanisms exist even in eukaryotic
 cells (see footnote 2).
 A crude speculation is that further increases in complexity in multicellular
 systems toward Homo sapiens may add functionalities that can cope
 with various situations in their respective ecological niche.
 In engineering systems, robustness and stability are achieved by the
 use of (1) system control, (2) redundancy, (3) modular design, and (4)
 structural stability. The hypothesis is that the use of such an approach
 is an intrinsic feature of complex systems, be they artificial or natural.
 System Control: Various control schemes used in complex engineering
 systems are also found in various aspects of biological systems. Feedforward
 control and feedback control are two major control schemes, both
 of which are found almost ubiquitously in biological systems. Feedforward
 control is an open-loop control in which a set of predefined reaction
 sequences is triggered by a certain stimulus. Feedback is a sophisticated
 control system that closes the loop of the signal circuits to attain the desired
 control of the system. A negative feedback system detects the difference
 between desired output and actual output and compensates for such
 difference by modulating the input. Compared with feedforward control,
 feedback control is more sophisticated and ensures proper control
 of the system, and the two can be used together. It is one
 of the most widely used methods in engineering systems to increase the
 stability and robustness of the system.
 Redundancy: Redundancy is a widely used method to improve the system’s
 robustness against damage to its components by using multiple
 pathways to accomplish the function. Duplicated genes and genes with
 similar functions are basic examples of redundancy. There is also circuit-level
 redundancy, such as multiple pathways of signal transduction and
 metabolic circuits that can be functionally complementary under different
 conditions.
 Modular Design: Modular design prevents damage from spreading limitlessly,
 and also improves ease of evolutionary upgrading of some of the
 components. At the same time, a multi-functional module can help overcome
 system failure in a critical part by using modules in other less critical
 parts. Cellular systems are typical examples of modular systems.
 Footnote 2: Discussion of similarity between complexity of engineering and
 biological systems as described in this section was first made, as far as
 the author is aware, by John Doyle at Caltech.
 Figure 1.2 Feedforward control and feedback control
 (diagram labels in both panels: input, Controller, Effector, output; the
 feedback panel is marked with a minus sign)
 Structural Stability: Some gene regulatory circuits are built to be stable
 for a broad range of parameter variations and genetic polymorphisms.
 Such circuits often incorporate multiple attractors, each of which corresponds
 to a functional state of the circuit; thus its functions are maintained
 against change in parameters and genetic polymorphisms.
 It is not clear whether such engineering wisdom is also the case in biological
 systems. However, the hypothesis is that such features are somewhat
 universal in all complex systems. It is conceivable that there are certain
 differences due to the nature of the system it is built upon, as well as
 the difference between engineering systems that are designed to exhibit
 certain functions and natural systems that have reproduction as a single
 goal where all functions are only evaluated in an integrated effect. Nevertheless,
 it is worth investigating the universality of these principles. And, if there
 are differences, what are they?
 The rest of the section focuses on how these principles of robustness
 exist also in biological systems. Of course, not all biological systems are
 robust, and it is important to know which parts of the systems are not
 robust and why. However, for this particular chapter, we will focus on
 robustness of biological systems, because it is one of the most interesting
 issues that we wish to understand.
 Control
 The use of an explicit control scheme is an effective approach to improving
 robustness. Feedforward control and feedback control are two major
 methods of system control (Figure 1.2).
 Feedforward control is an open-loop control in which a sequence of
 predefined actions is triggered by a certain stimulus. This control method
 is the simplest method that works when possible situations and countermeasures
 are highly predictable.
 Feedback control, such as negative feedback, is a sophisticated control
 method widely used in engineering. It feeds back the sign-inverted error
 between the desired value and the actual value to the input, and the input
 signal is then modulated in proportion to the amount of error. In its basic form,
 it acts to minimize the output error value.
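The contrast between the two schemes can be sketched on a toy first-order plant subjected to a constant disturbance that the feedforward controller was not designed for. The plant, the gain, and the disturbance size are assumptions chosen for illustration.

```python
def simulate(controller, disturbance, target=1.0, dt=0.001, steps=10000):
    """Euler integration of the toy plant dy/dt = -y + u + disturbance."""
    y = 0.0
    for _ in range(steps):
        u = controller(target, y)
        y += dt * (-y + u + disturbance)
    return y

def feedforward(target, y):
    """Predefined action: ignores the actual output y."""
    return target

def make_feedback(gain):
    """Proportional negative feedback: input follows the error."""
    def controller(target, y):
        return gain * (target - y)
    return controller

disturbance = 0.5  # hypothetical unanticipated perturbation
y_ff = simulate(feedforward, disturbance)
y_fb = simulate(make_feedback(20.0), disturbance)

print("target 1.0, feedforward output: %.3f" % y_ff)
print("target 1.0, feedback output:    %.3f" % y_fb)
```

The feedforward controller passes the full disturbance to the output, whereas proportional feedback suppresses most of it. A small residual error remains with proportional feedback alone; removing it completely requires integrating the error over time.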
 Feedback plays a major role in various aspects of biological processes,
 such as E. coli chemotaxis and heat shock response, circadian rhythms, cell
 cycle, and various aspects of development.
 The most typical example is the integral feedback circuit involved in
 bacterial chemotaxis. Bacteria demonstrate robust adaptation to a broad
 range of chemical attractant concentrations, and so can always sense
 changes in chemical concentration to determine their behavior. This is accomplished
 by a closed-loop feedback circuit (Alon
 et al., 1999; Barkai and Leibler, 1997). As shown in Figure 1.3, ligands that
 are involved in chemotaxis bind to a specific receptor MCP that forms a
 stable complex with CheA and CheW. CheA phosphorylates CheB and
 CheY. Phosphorylated CheB demethylates the MCP complex, and phosphorylated
 CheY triggers tumbling behavior. It was shown through experiments
 and simulation studies that this forms a feedback circuit which
 enables adaptation to changes in ligand concentration. Specifically, for
 any sudden change in the ligand concentration, the average activity level
 that is characterized by the tumbling frequency quickly converges to the
 steady-state value. This means that the system only detects acute changes
 of the ligand concentration that can be exploited to determine tumbling
 frequency, but is insensitive to the absolute value of ligand concentration.
 Therefore, the system can detect and control its behavior to move to a
 high attractant concentration area in the field regardless of the absolute
 concentration level without saturating its sensory system. Detailed analysis
 revealed that this circuit functions as an integral feedback (Yi et al.,
 2000), the most typical automatic control strategy.
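The adaptation property of integral feedback can be sketched with a generic linear loop in which an internal variable integrates the deviation of the activity from its set point. This is an abstract illustration under assumed variable names and constants, not the biochemical chemotaxis model.

```python
def simulate_adaptation(u_before=1.0, u_after=3.0, t_step=5.0,
                        gain=1.0, set_point=0.5, dt=0.001, steps=30000):
    """Euler simulation of activity = gain * u - x, dx/dt = activity - set_point.

    u is the stimulus level and x is the integrator state. x starts at the
    steady state for u_before, and u jumps to u_after at time t_step.
    """
    x = gain * u_before - set_point
    trace = []
    for i in range(steps):
        u = u_before if i * dt < t_step else u_after
        activity = gain * u - x
        trace.append(activity)
        x += dt * (activity - set_point)  # integrate the error
    return trace

trace = simulate_adaptation()
initial, peak, final = trace[0], max(trace), trace[-1]
print("activity before the step: %.3f" % initial)
print("peak activity after step: %.3f" % peak)
print("activity long after step: %.3f" % final)
```

The activity responds transiently to the change in stimulus and then returns to the same set point whatever the new stimulus level is, so the loop reports changes rather than absolute levels.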
 In bacteria, there are many examples of sophisticated control embedded
 in the system. The circuit that copes with heat shock, for example, is
 a beautiful example of the combined use of feedforward control and feedback
 control (Figure 1.4). Upon heat shock, proteins in E. coli can no longer
 maintain their normal folding structures. The goal of the control system is
 to repair misfolded proteins by activating heat shock proteins (hsp), or
 to degrade misfolded proteins by protease. As soon as heat shock is imposed,
 a quick translational modulation facilitates the production of σ32
 factor by affecting the three-dimensional structure of rpoH mRNA that
 encodes σ32. This leads to the formation of σ32-RNAP holo-enzyme that
 activates hsp that repair misfolded proteins. This process is feedforward
 control that pre-encodes the relationship between heat shock and proper
 course of reactions. In this process, there is no detection of misfolded
 proteins to adjust the translational activity of σ32. Independently, DnaK and
 DnaJ detect misfolded proteins and release the σ32 factor that had been bound
 to them. Free σ32 activates transcription of hsp, so that misfolded
 proteins are repaired. This process is negative feedback control,
 because the level of misfolded proteins is monitored and controls the
 activity of the σ32 factor.
 Figure 1.3 Bacterial chemotaxis related feedback loop
 (diagram labels: MCP, CheW, CheA, CheR, m, CheB, CheY, CheZ, CheB P, CheY P)
 Another example demonstrating the critical role of the feedback system
 is seen in growth control of human cells. Growth control is one of the
 most critical parts of cellular functions. The feedback circuit involved in
 p53 presents a clear example of how feedback is used (Figure 1.5). When
 DNA is damaged, DNA-dependent kinase DNA-PK is activated. Also,
 ATM is phosphorylated, which puts ATM itself into an active state and
 promotes phosphorylation of a specific locus of the p53 protein. When
 this locus is phosphorylated, p53 no longer forms a complex with MDM2,
 and escapes degradation. The phosphorylation locus depends on
 what kind of stress is imposed on DNA. Under a certain stress, phosphorylation
 takes place at the Ser15 site of p53, and promotes transcription of
 p21, which eventually causes G1 arrest. In other cases, it promotes activation
 of apoptosis-inducing genes, such as pig-3, and results in apoptosis. For
 those cells that entered G1 arrest, DNA-PK and ATM activity are lost as
 soon as DNA is repaired. The loss of DNA-PK and ATM activity decreases
 phosphorylation of p53, so p53 will bind with MDM2 and be degraded.
 Without phosphorylation, the p53 protein promotes mdm-2 transcription.
 It is interesting to know that mdm-2 protein forms a complex to deactivate
 the p53 protein. This is another negative feedback loop embedded
 in this system.
 Figure 1.4 Heat shock response with feedforward and feedback control
 (diagram labels: Normal Protein, Misfolded Protein, Heat Shock, rpoH, hsp,
 dnaK, dnaJ, grpE, GroES, GroEL, σ70, σE, σ32)
 Redundancy
 Redundancy also plays an important role in attaining robustness of the
 system, and is critical for coping with accidental damage to components of
 the system. For example, the four independent hydraulic control systems
 in a Boeing 747 keep the aircraft functioning normally even if one or
 two of them are damaged. In aircraft, control systems and engines are
 designed to have a high level of redundancy. In a cellular system, signal
 transduction and cell cycle are equivalent to control systems and engines.
 A typical signal transduction pathway is the MAP kinase cascade.
 The MAP kinase cascade involves extensive cross talk among collateral
 pathways. Even if one of these pathways is disabled due to mutation or
 other causes, the function of the MAP kinase pathway can be maintained
 because other pathways still transduce the signal (Figure 1.6).
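The benefit of redundancy can be quantified with a simple k-out-of-n reliability calculation. The sketch assumes that components fail independently and all work with the same probability, which is a strong simplification for both aircraft and cells. The probability value is hypothetical.

```python
from math import comb

def k_of_n_reliability(k, n, p):
    """Probability that at least k of n independent components work,
    each working with probability p (binomial model)."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))

p = 0.9  # hypothetical probability that a single pathway or subsystem works

single = k_of_n_reliability(1, 1, p)       # no redundancy
two_of_four = k_of_n_reliability(2, 4, p)  # tolerates loss of up to two of four
one_of_three = k_of_n_reliability(1, 3, p) # any one of three suffices

print("single component:   %.4f" % single)
print("at least 2 of 4:    %.4f" % two_of_four)
print("at least 1 of 3:    %.4f" % one_of_three)
```

Under these assumptions, a function that survives the loss of two out of four subsystems fails far less often than one that depends on a single component. Correlated failures, such as damage that hits several pathways at once, would reduce the benefit.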
 Cell cycle is the essential process of cellular activity. For example, in
 the yeast cell cycle, the Cln and Clb families play a dominant role in the
 progress of the cell cycle. They bind with Cdc28 kinase to form Cdk complex.
 Cln is redundant because knock-out of up to two of three Cln (Cln1,
 Cln2, Cln3) does not affect the cell cycle; all three Cln have to be knocked
 out to stop the cell cycle. Six Clb have very similar features, and may have
 originated in gene duplication. No single loss-of-function mutant of any
 of the six Clb affects growth of the yeast cell. The double mutants of CLB1
 and CLB2, as well as CLB2 and CLB3, are lethal, but other double mutant
 combinations do not affect phenotype. It is reasonable that the basic mechanism
 of the cell cycle has evolved to be redundant, thus robust against
 various perturbations.
 Figure 1.5 p53 related feedback loop
 (diagram labels: DNA damage, DNA-PK, ATM, ATM p, p53, p53 p, MDM2, mdm2,
 p21, G1 arrest, DNA repair during G1 arrest, pig-3, etc., Apoptosis)
 Redundancy can be exploited to cope with uncertainty involved in
 stochastic processes. McAdams and Arkin argued that duplication of
 genes and the existence of homologous genes improve reliability so that
 transcription of genes can be carried out even when only a small number
 of transcription factors are available (McAdams and Arkin, 1999). The use
 of a positive feedback loop to autoregulate a gene to maintain its own expression
 level is an effective means of ensuring the trigger is not lost in the
 noise.
 Although its functional implication has not been sufficiently investigated,
 an analysis of MAP kinase cascade revealed that it utilizes nonlinear
 properties intrinsic in each step of the cascade and positive feedback
 to constitute a stable all-or-none switch (Ferrell and Machleder, 1998).
 In the broader sense, the existence of metabolic pathways that can alternatively
 function to sustain cellular growth in a changing environment
 can be viewed as redundancy. Bacteria are known to switch metabolic pathways
 if deprived of one type of nutrition, and to use other types of nutrition
 that are available. Theoretical analysis combined with experimental
 data indicates that different pathways are used to attain essentially the
 same objective function (Edward et al., 2001).
 Figure 1.6 Redundancy in MAP kinase cascade
   Raf, Mos          MEKK1, MLK3       ASK1, TAK1
   MEK1,2/MKK1,2     SEK1,2/MKK4,7     MKK3,6
   MAPK/ERK          SAPK/JNK          p38
   Transcription
 Once we understand the stability and robustness of the system, we
 should be able to understand how to control and transform cells. We will
 then be ready to address such questions as how to transform cells that are
 malfunctioning into normal cells, how to predict disease risk, and how to
 preemptively treat potential diseases.
 Modular Design
 Modular design is a critical aspect of the robustness: it ensures that damage
 in one part of the system does not spread to the entire system. It may
 also ensure efficient reconfiguration throughout the evolutionary process
 to acquire new features.
 The cellular structure of the multicellular organism is a clear example.
 It physically partitions the structure so that the entire system does not
 collapse due to local damage.
 Gene regulatory circuits are considered to entail a certain level of modularity.
 Even if part of the circuit is disrupted due to mutation or injection
 of chemicals, it does not necessarily affect other parts of the circuit. For example,
 a mutation in p53 may destroy the cell cycle checkpoint system, which
 can lead to cancer. However, it does not destroy metabolic pathways, so the
 cells continue to proliferate. How and why such modularity is maintained
 is not well understood at present.
 Modularity reflects hierarchical organization of the system that can be
 viewed as follows:
 Component: An elementary unit of the system. In electronics, transistors,
 capacitors, and resistors are components. In biological systems, genes and
 proteins, which are transcriptional products, are components.
 Device: A minimum unit of the functional assembly. NAND gates and
 flip-flops are examples of devices (see footnote 3). In biological systems, transcription complexes and replication
 complexes are examples of devices. Some signal transduction circuits
 may be considered as devices.
 Module: A large cluster of devices. CPU, memory, and amplifiers are
 modules. In biological systems, organelles and gene regulatory circuits for
 the cell cycle are examples of modules.
 System: A top-level assembly of modules. Depending on the viewpoint,
 a cell or entire animal can be considered as a system.
 In engineering wisdom, each low-level module should be sufficiently
 self-contained and encapsulated so that changes in higher-level structure
 do not affect internal dynamics of the lower-level module. Whether this is
 also the case for biological systems and how it can be accomplished are of
 major interest from a system perspective.
 Structural Stability
 Some circuits may, after various disturbances to the state of the system,
 settle back into one of multiple attractors (point or periodic). Often, feedback
 loops play a major role in making this possible. However, feedback does
 not explicitly control the state of the circuit in tracking or adapting to
 stimuli. Rather, dynamics of the circuit exhibit certain functions that are
 used in the larger sub-systems.
 The most well understood example is seen in one of the simplest
 organisms, lambda phage (McAdams and Shapiro, 1995). Lambda phage
 exploits the feedback mechanism to stabilize the committed state and
 to enable switching of its pathways. When lambda phage infects E. coli,
 it chooses one of two pathways: lysogeny and lysis. While a stochastic
 process is involved in the early stage of commitment, two positive and
 negative feedback loops involving CI and Cro play a critical role in stable
 maintenance of the committed decision. In this case, whether to maintain
 feedback or not is determined by the amount of activator binding to the
 OR region, and the activator itself cuts off feedback if the amount exceeds
 a certain level. This is an interesting molecular switch that is not found
 elsewhere. Overall, the concentration of Cro is maintained
 at a certain level using positive feedback and negative feedback. It was
 reported that the fundamental properties of the lambda phage switch
 circuit are not affected even if the sequence of OR binding sites is altered
 (Little et al., 1999). This indicates that properties of the lambda phage
 decision circuit are intrinsic to the multiple feedback circuit, not specific
 parametric features of the elements, such as binding sites.
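The claim that switch behavior can belong to the circuit structure rather than to particular parameter values can be illustrated with a generic two-gene mutual-repression circuit. This is a textbook-style toggle model under assumed equations and constants. It is not a model of the CI and Cro circuit itself.

```python
def settle(u, v, strength, hill=2, dt=0.01, steps=5000):
    """Euler integration of a symmetric mutual-repression circuit:
    du/dt = strength / (1 + v**hill) - u
    dv/dt = strength / (1 + u**hill) - v
    """
    for _ in range(steps):
        du = strength / (1.0 + v ** hill) - u
        dv = strength / (1.0 + u ** hill) - v
        u, v = u + dt * du, v + dt * dv
    return u, v

results = {}
for strength in (3.0, 4.0, 6.0):           # hypothetical parameter variation
    state_a = settle(2.0, 0.1, strength)   # start biased toward u
    state_b = settle(0.1, 2.0, strength)   # start biased toward v
    results[strength] = (state_a, state_b)
    print("strength %.1f: u-high state (%.3f, %.3f), v-high state (%.3f, %.3f)"
          % (strength, state_a[0], state_a[1], state_b[0], state_b[1]))
```

Across a twofold change in the expression strength the circuit keeps two distinct committed states and only their exact levels shift. The existence of the switch follows from the feedback structure, whereas the numerical levels depend on the parameters.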
 3 In electronics, “device” means transistors and other materials mentioned in “components.”
 NAND gates and flip-flops are recognized as minimum units of the circuit.
 Relative independence from specific parameters is an important feature
 of a robust system. Recent computational studies report that circuits
 that are robust against a broad range of parameter variations are found
 in the Xenopus cell cycle (Morohashi et al., unpublished) and body segment
 formation (von Dassow et al., 2000). In simulations of parasegment
 formation in Drosophila, the circuit accountable for pattern formation
 was found to tolerate major variations in some of its parameters.
 This strongly suggests that the structure of the circuit, rather than
 specific parameter values, is dominantly responsible for pattern
 formation (von Dassow et al., 2000).
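
 A parameter sweep of the kind used in such robustness studies can be sketched
 as follows. This is a toy illustration, not the segment polarity model of von
 Dassow et al.: the two-gene mutual-repression circuit, the parameter alpha,
 the Hill exponent n, and the threshold of 0.1 used to call two end states
 distinct are all assumptions made for the example.

```python
def settle(x, y, alpha, n=2, dt=0.01, steps=20000):
    """Euler-integrate a hypothetical mutual-repression circuit to rest."""
    for _ in range(steps):
        dx = alpha / (1.0 + y ** n) - x
        dy = alpha / (1.0 + x ** n) - y
        x, y = x + dt * dx, y + dt * dy
    return x, y


def is_bistable(alpha, n=2):
    """True if opposite starting points end in clearly different states."""
    x_hi = settle(alpha, 0.0, alpha, n)  # start with only x present
    y_hi = settle(0.0, alpha, alpha, n)  # start with only y present
    return (x_hi[0] - x_hi[1] > 0.1) and (y_hi[1] - y_hi[0] > 0.1)


# Sweep the production strength over more than an order of magnitude.
alphas = [0.5, 1.0, 1.5, 3.0, 4.0, 6.0, 10.0, 20.0]
results = {alpha: is_bistable(alpha) for alpha in alphas}

for alpha, bistable in results.items():
    print(f"alpha = {alpha:5.1f}  switch behavior: {bistable}")
```

 In this toy circuit the switch behavior disappears only when alpha is small
 (for n = 2, below a threshold of 2) and otherwise persists across the whole
 sweep. That illustrates, under these assumptions, how a qualitative function
 can depend on circuit structure more than on any single parameter value.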
 Such circuit features of structural stability also play important roles
 in development. A recent review article (Freeman, 2000) elucidates some
 interesting cases of feedback circuits that play a dominant role in the development
 process. Such cases include temporal arrangement of signaling
 in the JAK/STAT signaling pathway, pattern formation in Drosophila
 involving Ubx and Dpp, maintenance of patterns of expression for sonic
 hedgehog (Shh), which forms the ZPA, and Fgf, which forms the AER, in limb
 development, etc. In these examples, the structure of the circuits plays the
 dominant role, rather than a specific set of parameters.
 THE SYSTEOME PROJECT
 In order to promote scientific research of systems biology, it is critically
 important to create a comprehensive data resource that describes systems’
 features, much as the Human Genome Project does for the genome. This is an enormous
 challenge, and it requires significant efforts far beyond the capability of
 any single research group. Therefore, the author proposes “The Systeome
 Project” as a grand challenge in the area of systems biology.
 Systeome is an assembly of system profiles for all genetic variations
 and environmental stimuli responses. A system profile comprises a set of
 information on the properties of the system that includes the structure of
 the system and its behaviors, and analysis results such as phase portraits and
 bifurcation diagrams. The structure of the system includes the structure
 of gene and metabolic networks and their associated constants, and physical
 structures and their properties.
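
 One way to picture a system profile is as a record keyed by genetic variation
 and environmental stimulus. The sketch below is only an illustration of that
 reading: every class name, field name, and example value is hypothetical, and
 the chapter does not prescribe any particular data format.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkStructure:
    """Structure of gene and metabolic networks with associated constants."""
    interactions: list  # (source, target, kind) tuples; placeholder format
    constants: dict     # constants keyed by name; values are placeholders


@dataclass
class SystemProfile:
    """Properties of one system under one genotype and one stimulus."""
    genotype: str
    stimulus: str
    structure: NetworkStructure
    behaviors: dict = field(default_factory=dict)  # e.g. simulated time courses
    analyses: dict = field(default_factory=dict)   # e.g. bifurcation diagrams


@dataclass
class Systeome:
    """Assembly of profiles over genetic variations and stimuli."""
    profiles: dict = field(default_factory=dict)

    def add(self, profile):
        self.profiles[(profile.genotype, profile.stimulus)] = profile

    def lookup(self, genotype, stimulus):
        return self.profiles.get((genotype, stimulus))


# Hypothetical entry; the names and numbers carry no biological meaning.
structure = NetworkStructure(
    interactions=[("gene_a", "gene_b", "represses")],
    constants={"k_bind": 1.0},
)
profile = SystemProfile("wild_type", "no_stimulus", structure)
profile.analyses["bifurcation_diagram"] = "placeholder"

systeome = Systeome()
systeome.add(profile)
print(systeome.lookup("wild_type", "no_stimulus"))
```

 Keying profiles by the pair of genotype and stimulus reflects the description
 of the Systeome as covering all genetic variations and environmental stimuli
 responses; a real resource would need a far richer schema.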
 Systeome is different from a simple cascade map, because it assumes
 active and dynamic simulations and profiling of various system statuses,
 not a
 | 
 |