Goals
- Benchmark
- Scalability
- Stability
- Resilience
- Hunting bottlenecks
Questions
- Do we have specs? WRITE DOWN acceptance criteria BEFORE running the test
- Are there bottlenecks outside the SUT (e.g. network, load generator)?
- Has it passed functional testing yet?
Proposed simple flow __/^^^\__ (see the sketch after this list)
- pre-flight
- heating up
- cruising
- cooling down
- post-load observations
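A minimal sketch of that profile as a rate schedule (phase durations, target RPS, and the fire_request callback are illustrative assumptions, not prescribed values):

    import time

    # Illustrative phase plan: (name, duration_s, start_rps, end_rps).
    # Rates ramp linearly within a phase, giving the __/^^^\__ shape.
    PHASES = [
        ("pre-flight",    10,   1,   1),  # sanity traffic only
        ("heating up",    60,   1, 100),  # ramp up to target load
        ("cruising",     300, 100, 100),  # steady plateau, main measurement window
        ("cooling down",  60, 100,   1),  # ramp back down
    ]

    def target_rps(elapsed, duration, start_rps, end_rps):
        """Linearly interpolate the target request rate within a phase."""
        frac = min(elapsed / duration, 1.0)
        return start_rps + (end_rps - start_rps) * frac

    def run_profile(fire_request):
        for name, duration, start, end in PHASES:
            print(f"=== phase: {name} ===")
            t0 = time.monotonic()
            while (elapsed := time.monotonic() - t0) < duration:
                fire_request()
                time.sleep(1.0 / max(target_rps(elapsed, duration, start, end), 0.1))
        print("=== post-load observations: keep probes running ===")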
2 kinds of metrics
- black box (external)
- telemetry (internal, from SUT itself or probe)
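For instance, a black-box sample sees only what an external client sees; a minimal sketch against a placeholder HTTP endpoint (the URL is an assumption):

    import time
    import urllib.request

    def blackbox_sample(url="http://localhost:8080/health"):  # placeholder endpoint
        """External view only: wall-clock latency and status, nothing from inside the SUT."""
        t0 = time.monotonic()
        with urllib.request.urlopen(url, timeout=5) as resp:
            status = resp.status
        return {"ts": time.time(), "latency_s": time.monotonic() - t0, "status": status}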
Telemetry reporting
- there is always a probe (even internal telemetry observes through something)
- account for the probe's footprint (CPU, memory, I/O it consumes)
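One way to keep the probe honest, sketched below: have it account for its own CPU cost next to the metrics it collects (sample_fn is a hypothetical collector callback):

    import time

    class Probe:
        """Internal telemetry probe that tracks its own footprint."""
        def __init__(self, sample_fn):
            self.sample_fn = sample_fn  # callable returning a dict of SUT metrics
            self.self_cpu_s = 0.0       # CPU time the probe itself has burned

        def sample(self):
            t0 = time.process_time()
            record = {"ts": time.time(), **self.sample_fn()}
            self.self_cpu_s += time.process_time() - t0  # observer effect, measured
            return record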
Fighting noise and distortions (keep the load engine itself clean and useful)
- reproducibility, comparability
- extra traffic, CPU, memory, I/O, cache, IPC
- spikes
- measure the host/system overall vs. the exact process under test
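A sketch of that distinction using the third-party psutil package (pip install psutil); pid defaults to the current process just for demonstration:

    import os
    import psutil  # third-party: pip install psutil

    def host_vs_process(pid=None, interval=1.0):
        """Contrast whole-host load with one exact process."""
        proc = psutil.Process(pid or os.getpid())
        proc.cpu_percent(None)                   # prime the per-process counter
        host_cpu = psutil.cpu_percent(interval)  # blocks for `interval`, host-wide
        return {
            "host_cpu_pct": host_cpu,
            "proc_cpu_pct": proc.cpu_percent(None),  # usage since the priming call
            "proc_rss_mb": proc.memory_info().rss / 2**20,
        }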
Metrics representation
- relative vs cumulative(!)
Cumulative doesn't always work (e.g. median calculation needs the raw samples)
Some relative metrics can be surprisingly easy to derive (see the sketch after this list):
curr_conn_count = total_conn - total_disconn
- latency vs throughput
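A sketch of deriving relative metrics from two cumulative snapshots (the field names are illustrative):

    def relative_from_cumulative(prev, curr):
        """Derive point-in-time metrics from two cumulative snapshots.

        Each snapshot is a dict like:
        {"ts": ..., "total_conn": ..., "total_disconn": ..., "total_bytes": ...}
        """
        dt = curr["ts"] - prev["ts"]
        return {
            "curr_conn_count": curr["total_conn"] - curr["total_disconn"],
            "conn_rate_per_s": (curr["total_conn"] - prev["total_conn"]) / dt,
            "throughput_Bps":  (curr["total_bytes"] - prev["total_bytes"]) / dt,
        }

Note the limit from above: a median cannot be recovered this way, because cumulative counters discard the individual samples.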
Metrics normalization
- bind everything to timestamps (to mitigate delays, batched reporting, etc.)
- make sure clocks are in sync (out-of-sync measurements have caused many issues historically, e.g. the Ampere induction failure story)
- if clocks can't be synced, at least note network delays
- reconciling/syncing diverse data sources
- enforce and ensure identical measurement units
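A minimal sketch of binding two sources to one timeline by nearest timestamp (max_skew is an illustrative tolerance; timestamps are assumed to be in seconds and already in the same clock domain):

    import bisect

    def align_to_reference(ref_ts, samples, max_skew=0.5):
        """Map each reference timestamp to the nearest sample within max_skew seconds.

        samples: list of (ts, value) pairs sorted by ts.
        """
        ts_list = [t for t, _ in samples]
        aligned = []
        for t in ref_ts:
            i = bisect.bisect_left(ts_list, t)
            candidates = [j for j in (i - 1, i) if 0 <= j < len(samples)]
            best = min(candidates, key=lambda j: abs(ts_list[j] - t), default=None)
            if best is not None and abs(ts_list[best] - t) <= max_skew:
                aligned.append((t, samples[best][1]))
            else:
                aligned.append((t, None))  # gap: no sample close enough
        return aligned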
Alerts
- coredumps
- process restarts (PID change)
- system health (e.g. SSD I/O degrades when free space runs low)
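A sketch of one watchdog pass covering these three alerts (the pidfile location, crash directory, and threshold are illustrative assumptions; the /proc check is Linux-specific):

    import os
    import shutil

    def check_alerts(prev_pid, pidfile="/var/run/sut.pid",
                     core_dir="/var/crash", data_mount="/", min_free_frac=0.10):
        """One watchdog pass; returns the current PID and a list of alerts."""
        alerts = []
        pid = int(open(pidfile).read())
        if prev_pid is not None and pid != prev_pid:
            alerts.append(f"process restarted: PID {prev_pid} -> {pid}")
        if not os.path.exists(f"/proc/{pid}"):  # Linux-specific liveness check
            alerts.append(f"process {pid} is gone")
        cores = [f for f in os.listdir(core_dir) if f.startswith("core")]
        if cores:
            alerts.append(f"coredumps present: {cores}")
        usage = shutil.disk_usage(data_mount)
        if usage.free / usage.total < min_free_frac:
            alerts.append(f"free space below {min_free_frac:.0%} on {data_mount}")
        return pid, alerts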
Post-processing and analysis
- the average is deceptive (use a range of statistical analysis techniques, or hire someone who knows them)
but useful as a noise-removing, trend-exposing tool
- derivatives (partial, 1st, 2nd)
- peak vs worst vs percentiles
- correlations and inter-relations (something per something)
- cause and effect (hint: look at which comes first; the cause is sometimes observed earlier)
- extrapolations and assumptions (it’s all about models, beware of broken ones)
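A stdlib sketch contrasting the deceptive mean with percentiles and the worst case, plus the average in its legitimate trend-exposing role:

    import statistics

    def summarize_latencies(samples_ms):
        """The mean alone hides the tail; report percentiles and the worst case too."""
        q = statistics.quantiles(samples_ms, n=100)  # 99 cut points: q[49]=p50, q[94]=p95
        return {
            "mean":  statistics.fmean(samples_ms),
            "p50":   q[49],
            "p95":   q[94],
            "p99":   q[98],
            "worst": max(samples_ms),
        }

    def moving_average(samples, window=10):
        """The average used properly: smoothing noise to expose the trend."""
        return [statistics.fmean(samples[i:i + window])
                for i in range(len(samples) - window + 1)]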
Cherry-picking (bonus)
- put a unique marker on each single use-case instance/client/dataflow, so that a “single” chain can easily be followed (sketch below)
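A minimal sketch of such a marker (chain_id is an assumed field name):

    import uuid

    def new_chain_marker():
        """Unique marker stamped on one client/dataflow instance at creation."""
        return uuid.uuid4().hex

    def tag(record, marker):
        """Attach the marker to every log/metric record the chain produces."""
        return {**record, "chain_id": marker}

    # Usage: filter all records with the same chain_id to follow one flow end-to-end.
    m = new_chain_marker()
    print(tag({"event": "request_sent"}, m))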
S.M.A.R.T. acronym for load testing
- Specific (target single volatile component, prepare hypothesis)
- Measurable (record input and SUT reaction metrics, keep them aligned)
- Attainable (easy to deploy and run, reach stress level w/o bottlenecks)
- Reproducible (repeatable, no noise)
- Timely (immediate visual results)