What are complex systems? Large collections of interconnected components whose interactions lead to macroscopic behaviors in:
- Biological systems (e.g., slime molds, ant colonies, embryos)
- Physical systems (e.g., earthquakes, avalanches, forest fires)
- Social systems (e.g., transportation networks, cities, economies)
- Information systems (e.g., Internet and compute clouds)
What is the problem? No one understands how to measure, predict or control macroscopic behavior in complex information systems. This gap (1) threatens our nation's security and (2) costs billions of dollars.
"[Despite] society's profound dependence on networks, fundamental knowledge about them is primitive. [G]lobal communication ... networks have quite advanced technological implementations but their behavior under stress still cannot be predicted reliably.... There is no science today that offers the fundamental knowledge necessary to design large complex networks [so] that their behaviors can be predicted prior to building them."
(Quote from Network Science, 2006, a National Research Council report.)
What is the new idea? Leverage models and mathematics from the physical sciences to define a systematic method to measure, understand, predict and control macroscopic behavior in the Internet and distributed software systems built on the Internet.
What are the technical objectives? Establish models and analysis methods that (1) are computationally tractable, (2) reveal macroscopic behavior and (3) establish causality. Characterize distributed control techniques, including: (1) economic mechanisms to elicit desired behaviors and (2) biological mechanisms to organize components.
Why is this hard? Valid computationally tractable models that exhibit macroscopic behavior and reveal causality are difficult to devise. Phase-transitions are difficult to predict and control.
Who would care? All designers and users of networks and distributed systems, which have a 25-year history of unexpected failures:
- ARPAnet congestion collapse of 1980
- Internet congestion collapse of Oct 1986
- Cascading failure of AT&T long-distance network in Jan 1990
- Collapse of AT&T frame-relay network in April 1998 ...
Businesses and customers who rely on today's information systems:
- "Cost of eBay's 22-Hour Outage Put At $2 Million", Ecommerce, Jun 1999
- "Last Week's Internet Outages Cost $1.2 Billion", Dave Murphy, Yankee Group, Feb 2000
- "...the Internet "basically collapsed" Monday", Samuel Kessler, Symantec, Oct 2003
- "Network crashes...cost medium-sized businesses a full 1% of annual revenues", Technology News, Mar 2006
- "costs to the U.S. economy...range...from $65.6 M for a 10-day [Internet] outage at an automobile parts plant to $404.76 M for ... failure ...at an oil refinery", Dartmouth study, Jun 2006
Designers and users of tomorrow's information systems that will adopt dynamic adaptation as a design principle:
- DoD to spend $13 B over the next 5 yrs on Net-Centric Enterprise Services initiative, Government Computer News, 2005
- Market derived from Web services to reach $34 billion by 2010, IDC
- Grid computing market to exceed $12 billion in revenue by 2007, IDC
- Market for wireless sensor networks to reach $5.3 billion in 2010, ONWorld
- Revenue in mobile networks market will grow to $28 billion in 2011, Global Information, Inc.
- Market for service robots to reach $24 billion by 2010, International Federation of Robotics
Hard Issues & Plausible Approaches
|Hard Issues|Plausible Approaches|
|H1. Model scale|A1. Scale-reduction techniques|
|H2. Model validation|A2. Sensitivity analysis & key comparisons|
|H3. Tractable analysis|A3. Cluster analysis and statistical analyses|
|H4. Causal analysis|A4. Evaluate analysis techniques|
Model scale – Systems of interest (e.g., the Internet and compute grids) span large spatiotemporal extents, have global reach, consist of millions of components, and interact through many adaptive mechanisms over various timescales. Scale-reduction techniques must be employed. Which computational models can achieve sufficient spatiotemporal scaling properties? Micro-scale models are not computable at large spatiotemporal scale. Macro-scale models are computable and might exhibit global behavior, but can they reveal causality? Meso-scale models might exhibit global behavior and reveal causality, but are they computable? One plausible approach is to investigate abstract models from the physical sciences, e.g., fluid flows (from hydrodynamics), lattice automata (from gas chemistry), Boolean networks (from biology) and agent automata (from geography). We can apply parallel computing to scale to millions of components and days of simulated time. Scale reduction may also be achieved by adopting n-level experiments coupled with orthogonal fractional factorial (OFF) experiment designs.
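To illustrate how an orthogonal fractional factorial design reduces the number of experiment runs, here is a minimal sketch. It assumes two levels per factor (n = 2), four hypothetical factors named A to D, and the generator D = ABC. None of these choices come from the project itself.

```python
from itertools import product

def fractional_factorial(base_factors, generators):
    """Build a two-level fractional factorial design (levels -1 and +1).

    base_factors: factors varied as a full factorial.
    generators:   maps each extra factor to the base factors whose product
                  sets its level, e.g. {"D": ("A", "B", "C")} means D = ABC.
    """
    runs = []
    for levels in product((-1, 1), repeat=len(base_factors)):
        run = dict(zip(base_factors, levels))
        for name, parents in generators.items():
            level = 1
            for parent in parents:
                level *= run[parent]
            run[name] = level
        runs.append(run)
    return runs

def is_orthogonal(runs):
    """True if every pair of factor columns has a zero dot product."""
    names = list(runs[0])
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if sum(run[a] * run[b] for run in runs) != 0:
                return False
    return True

# Hypothetical factors A..D; a full factorial would need 2**4 = 16 runs.
design = fractional_factorial(("A", "B", "C"), {"D": ("A", "B", "C")})
print(len(design), "runs instead of", 2 ** 4)
print("orthogonal:", is_orthogonal(design))
```

The half-fraction keeps the factor columns mutually orthogonal, so main effects can still be estimated separately, at the price of confounding some higher-order interactions.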
Model validation – Scalable models from the physical sciences (e.g., differential equations, cellular automata, nk-Boolean nets) tend to be highly abstract. Can sufficient fidelity be obtained to convince domain experts of the value of insights gained from such abstract models? We can conduct sensitivity analyses to ensure the model exhibits relationships that match known relationships from other accepted models and empirical measurements. Sensitivity analysis also enables us to understand relationships between model parameters and responses. We can also conduct key comparisons along three complementary paths: (1) comparing model data against existing traffic data and analyses, (2) comparing results from subsets of macro/meso-scale models against micro-scale models and (3) comparing simulations of distributed control regimes against results from implementations in test facilities, such as the Global Environment for Network Innovations.
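One simple form of sensitivity analysis estimates the main effect of each parameter on a response. The sketch below assumes a two-level full factorial and a made-up response function, `toy_model`, with hypothetical parameters `load`, `buffer_size` and `timeout`. It stands in for a real simulation and is not a network model.

```python
from itertools import product

def toy_model(load, buffer_size, timeout):
    """Hypothetical response (e.g., mean delay); purely illustrative."""
    return 10.0 + 4.0 * load - 1.5 * buffer_size + 0.5 * load * timeout

def main_effects(model, factor_names):
    """Estimate main effects from a two-level full factorial.

    A factor's main effect is the mean response at its high level (+1)
    minus the mean response at its low level (-1).
    """
    runs = list(product((-1, 1), repeat=len(factor_names)))
    responses = [model(*run) for run in runs]
    effects = {}
    for i, name in enumerate(factor_names):
        high = [y for run, y in zip(runs, responses) if run[i] == 1]
        low = [y for run, y in zip(runs, responses) if run[i] == -1]
        effects[name] = sum(high) / len(high) - sum(low) / len(low)
    return effects

effects = main_effects(toy_model, ("load", "buffer_size", "timeout"))
# Rank parameters by the magnitude of their effect on the response.
for name, value in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:12s} {value:+.2f}")
```

Effects estimated this way can be compared with relationships known from accepted models or measurements. In this toy case `timeout` has no main effect because it acts only through an interaction with `load`, which a main-effects ranking alone would miss.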
Tractable analysis – The scale of potential measurement data is expected to be very large – O(10**15) – with millions of elements, tens of variables, and millions of seconds of simulated time. How can measurement data be analyzed tractably? We could use homogeneous models, which allow one (or a few) elements to be sampled as representative of all. This reduces data volume to 10**6 – 10**7, which is amenable to statistical analyses (e.g., power-spectral density, wavelets, entropy, Kolmogorov complexity) and to visualization. Where homogeneous models are inappropriate, we can use clustering analysis to view relationships among groups of responses. We can also exploit correlation analysis and principal components analysis to identify and exclude redundant responses from collected data. Finally, we can construct combinations of statistical tests and multidimensional data visualization techniques tailored to specific experiments and data of interest.
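As a sketch of how correlation analysis might identify redundant responses, the following greedy filter keeps a response only if it is not strongly correlated with one already kept. The response names, the sample values and the 0.95 threshold are illustrative assumptions, not project data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient (assumes neither series is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def drop_redundant(responses, threshold=0.95):
    """Keep a response only if |r| with every kept response is below threshold."""
    kept = []
    for name, series in responses.items():
        if all(abs(pearson(series, responses[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical measurements; goodput is throughput shifted by a constant.
responses = {
    "throughput":   [10, 12, 15, 11, 18, 20, 14, 16],
    "goodput":      [9, 11, 14, 10, 17, 19, 13, 15],
    "queue_length": [5, 3, 8, 2, 7, 1, 6, 4],
}
print(drop_redundant(responses))
```

The result depends on the order in which responses are examined, so in practice the analyst would choose which member of a correlated group to retain.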
Causal analysis – Tractable analysis strategies yield coarse data with limited granularity of timescales, variables and spatial extents. Coarse data may reveal macroscopic behavior that cannot be explained from the data alone. For example, an unexpected collapse in the probability density function of job completion times in a computing grid was unexplainable without more detailed data and analysis. Several analysis techniques merit evaluation. Multidimensional analysis can represent system state as a multidimensional space and depict system dynamics through various projections (e.g., slicing, aggregation, scaling). State-space dynamics can segment system dynamics into an attractor-basin field and then monitor trajectories. Markov models, which provide compact, computationally efficient representations of system behavior, can be subjected to perturbation analyses to identify potential failure modes and their causes.
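A perturbation analysis of a Markov model could look roughly like the sketch below. The three states and all transition probabilities are invented for illustration. The stationary distribution is approximated by repeated multiplication, and each transition is nudged by a small assumed `delta` to see how the long-run probability of the failed state responds.

```python
STATES = ("normal", "congested", "failed")

# Hypothetical per-step transition probabilities; each row sums to 1.
P = [
    [0.90, 0.09, 0.01],  # from normal
    [0.30, 0.60, 0.10],  # from congested
    [0.50, 0.00, 0.50],  # from failed (repair returns to normal)
]

def stationary(matrix, steps=2000):
    """Approximate the stationary distribution by power iteration."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(steps):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist

def perturb(matrix, src, dst, delta):
    """Copy the matrix, raise P[src][dst] by delta, renormalize that row."""
    new = [row[:] for row in matrix]
    new[src][dst] += delta
    total = sum(new[src])
    new[src] = [p / total for p in new[src]]
    return new

def failure_sensitivities(matrix, delta=0.01):
    """Change in long-run failure probability per perturbed transition."""
    failed = STATES.index("failed")
    base = stationary(matrix)[failed]
    changes = {}
    for i in range(len(matrix)):
        for j in range(len(matrix)):
            if i != j:
                shifted = stationary(perturb(matrix, i, j, delta))[failed]
                changes[(STATES[i], STATES[j])] = shifted - base
    return base, changes

base, changes = failure_sensitivities(P)
print(f"baseline failure probability: {base:.4f}")
for (src, dst), change in sorted(changes.items(), key=lambda kv: -abs(kv[1])):
    print(f"{src:>9s} -> {dst:<9s} {change:+.5f}")
```

Transitions whose perturbation moves the failure probability most are candidates for closer causal investigation.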
Controlling Behavior – Large distributed systems and networks cannot be subjected to centralized control regimes because such systems consist of too many elements, too many parameters, too much change, and too many policies. Can models and analysis methods be used to determine how well decentralized control regimes stimulate desirable system-wide behaviors? Two families of mechanisms are candidates. Economic mechanisms use price feedback (e.g., auctions, present-value analysis or commodity markets) to modulate supply and demand for resources or services. Biological mechanisms use biological processes to differentiate function based on environmental feedback, e.g., morphogen gradients, chemotaxis, local and lateral inhibition, polarity inversion, quorum sensing, energy exchange and reinforcement.
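As a minimal sketch of price feedback, a resource owner could raise its price when demand exceeds capacity and lower it otherwise. The demand curve (each user spends its whole budget), the budgets, the capacity and the gain are all illustrative assumptions, not mechanisms specified by the project.

```python
def total_demand(price, budgets):
    """Hypothetical demand curve: each user spends its entire budget."""
    return sum(budget / price for budget in budgets)

def find_clearing_price(budgets, capacity, price=1.0, gain=0.5,
                        tolerance=1e-9, max_rounds=1000):
    """Adjust the price in proportion to relative excess demand.

    Only local information (observed demand and own capacity) is used,
    so each resource owner could run this loop independently.
    """
    for _ in range(max_rounds):
        excess = total_demand(price, budgets) - capacity
        if abs(excess) < tolerance:
            break
        price *= 1.0 + gain * excess / capacity
    return price

budgets = [40.0, 25.0, 35.0]   # hypothetical user budgets
capacity = 50.0                # hypothetical units of resource available
price = find_clearing_price(budgets, capacity)
print(f"clearing price: {price:.4f}")
print(f"demand at that price: {total_demand(price, budgets):.4f}")
```

With this particular demand curve the loop converges for any gain between 0 and 1. Other demand curves or larger gains may oscillate, which is the kind of system-wide behavior that models and analysis methods would need to reveal.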