On Simulations and Analysis

Simulations are programs that emulate the behavior of the finished system, allowing certain aspects of its performance to be studied, optimized, and evaluated.

Early in my career, when I was a green systems engineer, I used to throw myself into huge end-to-end simulations that tried to mimic every aspect of the system. At some point, I wrote a TCP stack simulator and incorporated it into an end-to-end 3G cellular system simulation. We were studying resource allocation at the base station level and wanted to understand the impact across the stacks for HTTP and email use cases. I did obtain good results, but it was a painstaking effort. Writing a large, detailed simulation is like developing a system as complex as the real one. The real system will have dozens or hundreds of engineers working on it and will still be full of bugs, so this endeavor should not be taken lightly. Developing a large simulation is doable in a large company that decides to support such a system simulator for the benefit of many evaluations. The resources are then committed to maintain the simulator, and branches can be created for specific evaluations. Still, a friend of mine used to say that the real proof that you understand a large simulation is finding a bug in it.

As I became more experienced, I drifted toward simpler simulations written in high-level mathematical languages such as Matlab. And, before writing any code, I always spend time identifying the key mechanisms and determining the minimum level of complexity required to mimic the behavior under study. As part of this pre-study, one may find closed-form solutions that can supplement the simulation results and provide insight into the tradeoffs of the parameter settings. It is surprising how many systems and development engineers are reluctant to apply simple math to analyze the work they are doing. Once an x or a y appears in a presentation, people feel overwhelmed. So always explain your work as simply as possible, with plenty of figures and detailed steps. You may be asked to skip the details, but the details will be there for the curious.

So, to summarize, the best approaches to simulation and analysis are:

  1. Whittle down the mechanism you are evaluating to its simplest expression.
  2. Try to find a mathematical model that fits the mechanism.
  3. Simulate using a simple and powerful language. This step may be applied even if a mathematical model is found in step 2.
  4. If the above does not work and a large simulator has already been developed, use that simulator after modifying it for your purposes.
  5. If all else fails, ask for resources to develop the large simulator described above. Proceed with care.

The two principles guiding this approach are ones that I learned from an old systems engineer:

  1. The KISS principle: Keep It Simple, Stupid!
  2. The SWP principle: Steal With Pride, i.e., reuse existing work whenever possible.

Initially, I scoffed at these principles. After all, the golden rule is that there is no golden rule. However, these two principles can be applied to all areas of systems engineering and especially, in my case, to simulations and performance analysis.

I will give one example of a simple derivation that has served me well for studying the contention between two periodic procedures. Consider a primary process running every 470.69 ms for 20 ms, and a secondary process running every 320 ms for 80 ms. The primary process has priority and will thus preempt the secondary process. We wanted to consider the impact of raising the priority of the secondary process, so that the secondary process would now preempt the primary one. We ran some simulations and determined that the primary process would not execute 23% of the time. However, I found a simple derivation that provides more insight into the error rate and the pattern of errors.
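
A simulation like this can be only a dozen lines in a high-level language. Below is a minimal Python sketch under the parameters quoted above; the variable names, the overlap test, and the roughly 15-second horizon are my own choices for illustration. It reproduces the 23% figure.

    # Count the secondary cycles during which the (now higher-priority)
    # secondary activity overlaps a primary activity.
    T_PRI, D_PRI = 470.69, 20.0   # primary period and activity length (ms)
    T_SEC, D_SEC = 320.0, 80.0    # secondary period and activity length (ms)
    HORIZON_MS = 15_040.0         # evaluation timescale (~15 s)

    def overlaps(a, la, b, lb):
        # The activity windows [a, a+la] and [b, b+lb] intersect or touch.
        return a <= b + lb and b <= a + la

    cycles = int(HORIZON_MS / T_SEC)
    preempted = 0
    for n in range(cycles):
        x = n * T_SEC                 # start of the n-th secondary activity
        m = round(x / T_PRI)          # index of the nearest primary activity
        if any(overlaps(k * T_PRI, D_PRI, x, D_SEC) for k in (m - 1, m, m + 1)):
            preempted += 1

    print(f"{preempted}/{cycles} = {preempted / cycles:.0%}")   # 11/47 = 23%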

For simplicity, assume that the primary process occurs every 470 ms. As will be seen below, the timescale of evaluation was 15 seconds, and the drift of 0.69 ms every 470 ms has minimal impact.

Consider, without loss of generality, that the primary process starts at times M*470 ms and runs for 20 ms, where M is an integer, and that the secondary process starts at times N*320 ms and runs for 80 ms, where N is also an integer.

For a given N, the primary process is preempted if there exists an integer M such that the two processes overlap. The existence of such an M means that an occurrence of the primary process overlaps with the secondary occurrence at N. In formula form:

An integer M exists such that N*320 - 20 <= M*470 <= N*320 + 80,

or, dividing through by 470, N*32/47 - 2/47 <= M <= N*32/47 + 8/47.

Consider the fractional part of N*32/47; it is equal to (N*32 mod 47)/47. Since 32 and 47 are coprime, N*32 mod 47 runs through all the integers 0..46 within a cycle of 47 consecutive values of N (i.e., within 47 occurrences of the secondary process), so the fractional part takes on the values 0, 1/47, 2/47, .., 46/47. An integer M exists exactly when the fractional part is 39/47, 40/47, .., 46/47, 0/47, 1/47, or 2/47: up to 2/47 the interval still contains the integer part of N*32/47, and from 39/47 upward it reaches the next integer (this can be verified by plugging the values into the inequality). Thus, within 47 consecutive values of N, the number of preemptions will be 11, i.e., 23% of the time. The pattern repeats every 47 secondary cycles, which takes 15.04 seconds.
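
The counting argument is easy to check numerically. In Python, for instance, using the simplified 470 ms period:

    # A preemption at secondary occurrence N requires N*32 mod 47 to fall
    # in {39, ..., 46} or {0, 1, 2}; count the hits over one 47-cycle.
    hits = sum(1 for n in range(47) if (n * 32) % 47 >= 39 or (n * 32) % 47 <= 2)
    print(hits, hits / 47)   # -> 11 0.2340...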

Thus, we can easily extend the analysis to any two periodicities or activity lengths without the need for additional simulations. Furthermore, the simple math showed us the repetition period of the error pattern, allowing us to set testing durations, develop detection algorithms, and more.
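
As an illustration of that extension, the whole argument can be wrapped into a small helper that counts overlaps over one repetition of the pattern for arbitrary integer periods. The function below is my own sketch, not part of the original study:

    from math import gcd

    def preemption_fraction(p, dp, s, ds):
        # Fraction of secondary occurrences (period s, activity length ds)
        # overlapped by a primary occurrence (period p, activity length dp).
        # With integer periods, the pattern repeats every p // gcd(p, s)
        # secondary cycles (47 cycles in the example above).
        cycles = p // gcd(p, s)
        hits = 0
        for n in range(cycles):
            x = n * s                  # start of the n-th secondary activity
            m = round(x / p)           # index of the nearest primary activity
            hits += any(k * p <= x + ds and x <= k * p + dp
                        for k in (m - 1, m, m + 1))
        return hits / cycles

    print(preemption_fraction(470, 20, 320, 80))   # -> 0.2340... (11/47)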

The above little problem now serves as one of my favorite interview problems. Multiple birds, one stone!
