How to Use the Lync Server 2013 Stress and Performance Tool for Realistic Load Testing

Troubleshooting and Interpreting Results

1) Quick checklist before you run tests

  • Validate topology: ensure the Front-End, Edge, and Mediation servers and PSTN gateways match your production design.
  • Clock sync: all test machines, servers, and gateways use NTP and are within 1–2 seconds.
  • Certificates & DNS: service certificates valid; internal/external DNS records resolvable by test clients.
  • Resources: CPU, memory, disk I/O and NIC interrupts on servers and generators are not saturated.
  • Network: verify MTU, QoS, and sufficient bandwidth between load generators and target servers.
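Before a run, the DNS portion of the checklist can be automated from a load generator. The following is a minimal sketch; the record names shown are placeholders, so substitute the FQDNs from your own topology:

```python
# Sketch: verify that the DNS records test clients need (pool FQDNs, SIP
# access, autodiscover) resolve before starting a run. The names below are
# hypothetical examples -- replace them with your topology's actual FQDNs.
import socket

RECORDS = [
    "pool01.contoso.com",                # Front-End pool FQDN (example)
    "sip.contoso.com",                   # SIP access record (example)
    "lyncdiscoverinternal.contoso.com",  # autodiscover record (example)
]

def check_dns(names):
    """Return a dict mapping each name to its sorted resolved IPs, or None on failure."""
    results = {}
    for name in names:
        try:
            infos = socket.getaddrinfo(name, None)
            results[name] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            results[name] = None
    return results

if __name__ == "__main__":
    for name, ips in check_dns(RECORDS).items():
        print(f"{name}: {'FAIL' if ips is None else ', '.join(ips)}")
```

Run this from each generator machine, not just one: a record that resolves on the admin workstation can still be missing from the generators' DNS view.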

2) Common problems and fixes

  • High SIP error rates (4xx/5xx)

    • Cause: misrouted requests, authentication problems, insufficient server capacity, invalid SIP URIs.
    • Fixes: check topology and routing, confirm service account credentials, increase Front-End capacity or reduce simulated user rate, inspect Snooper/centralized logs for exact SIP responses.
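When SIP error rates climb, it helps to see which response codes dominate before digging into individual dialogs. A small sketch that tallies status lines from a plain-text trace export (the status-line pattern is an assumption; real Snooper/CLS export formats vary):

```python
# Sketch: tally SIP response classes from a plain-text trace export.
# The "SIP/2.0 <code> <reason>" line pattern is an assumption; adjust the
# regex to match your actual export format.
import re
from collections import Counter

STATUS = re.compile(r"^SIP/2\.0 (\d{3}) (.+)$", re.MULTILINE)

def tally_responses(trace_text):
    """Count SIP responses grouped by (status code, reason phrase)."""
    return Counter(STATUS.findall(trace_text))

sample = """SIP/2.0 100 Trying
SIP/2.0 401 Unauthorized
SIP/2.0 401 Unauthorized
SIP/2.0 503 Service Unavailable
"""
for (code, reason), n in tally_responses(sample).most_common():
    print(f"{n:4}  {code} {reason}")
```

A wall of 401s points at credentials or authentication configuration; 503s point at capacity or a service that is down, which matches the fixes above.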
  • Call setup failures or one-way audio

    • Cause: NAT/firewall blocking RTP, incorrect media ports, codec mismatches, missing SRTP keys.
    • Fixes: open required RTP ports on firewall, validate media bypass and SRTP settings, confirm codecs negotiated in SIP SDP, capture media with Wireshark/Snooper.
  • High latency or jitter for media

    • Cause: network congestion, insufficient CPU on media path, virtualization host contention.
    • Fixes: measure path latency and packet loss, enable QoS, move media processors to dedicated hardware or adjust VM resources.
  • Address Book, ABS or UC services failing

    • Cause: incorrect ABS web services URLs, auth failures, expired tokens.
    • Fixes: test with Test-CsAddressBookWebQuery, examine Front-End and IIS logs for HTTP 4xx errors (for example, 404), fix certificates and URLs.
  • Load generator instability

    • Cause: insufficient generator resources, improper provisioning, DNS/certificate issues for test accounts.
    • Fixes: scale out generators, re-run provisioning with UserProvisioningTool, verify generator machine time and network access.

3) Key logs and tools to use

  • Centralized Logging + Snooper: primary for SIP dialog analysis and call-flow diagrams.
  • LyncPerfTool logs (consolidated.csv, scenario logs): use for aggregated metrics and error counts.
  • Windows Performance Monitor (PerfMon): CPU, Memory, Disk Queue Length, and Network Interface counters on Front-End, Mediation, and Edge servers.
  • Wireshark: packet-level RTP/SIP troubleshooting, measure jitter/packet loss.
  • IIS and Event Viewer: service-level errors, certificate problems, and event IDs.

4) Metrics to inspect and pass/fail guidance

  • Success rate: target ≥ 99% for call establishment and IM delivery, adjusted to your SLA.
  • Average call setup time: baseline from production — typical target < 500–1000 ms for SIP INVITE→200 OK in same LAN.
  • CPU utilization: keep < 70–80% on Front-End during steady-state.
  • Memory & handle usage: no steady growth (memory leak) across long runs.
  • RTP packet loss/jitter: packet loss < 1–2%, jitter < 30 ms for acceptable voice quality.
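These thresholds can be turned into an automated gate that runs after each test. A sketch, using the guideline values from this section as defaults (tune them to your own SLA and capacity plan):

```python
# Hypothetical pass/fail gate for the metrics above. Threshold values are the
# guideline numbers from this section; adjust them to your own SLA.
THRESHOLDS = {
    "success_rate_pct": ("min", 99.0),    # call/IM success rate
    "setup_time_ms":    ("max", 1000.0),  # SIP INVITE -> 200 OK
    "cpu_pct":          ("max", 80.0),    # Front-End steady-state CPU
    "packet_loss_pct":  ("max", 2.0),     # RTP packet loss
    "jitter_ms":        ("max", 30.0),    # RTP jitter
}

def evaluate(metrics):
    """Return a list of (name, value, verdict) for each supplied metric."""
    report = []
    for name, value in metrics.items():
        kind, limit = THRESHOLDS[name]
        ok = value >= limit if kind == "min" else value <= limit
        report.append((name, value, "PASS" if ok else "FAIL"))
    return report

run = {"success_rate_pct": 99.4, "setup_time_ms": 620.0,
       "cpu_pct": 85.0, "packet_loss_pct": 0.8, "jitter_ms": 12.0}
for name, value, verdict in evaluate(run):
    print(f"{name:18} {value:8.1f}  {verdict}")
```

In the sample run only CPU fails, which maps directly to the "increase Front-End capacity or reduce simulated user rate" fix in section 2.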

5) Interpreting common Stress and Performance Tool outputs

  • consolidated.csv: aggregated transaction counts, success/failure counts — sort by failure reason to find hotspots.
  • Scenario-level reports: compare different workload mixes (IM vs AV vs conference) to see which workload triggers failures.
  • SIP trace call-flow diagrams: follow failing dialog path; identify where 4xx/5xx originate.
  • PerfMon timelines vs test timeline: correlate spikes in CPU, disk, or NIC drops with increases in error rates.
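The "sort by failure reason" step is easy to script. A sketch using the standard csv module; the column names ("Scenario", "Result", "FailureReason") are assumptions, so match them to the header row of your actual consolidated.csv:

```python
# Sketch: aggregate failures by (scenario, reason) from a consolidated.csv-
# style export. Column names are assumptions -- adjust to your file's header.
import csv
from collections import Counter

def failure_hotspots(lines):
    """Count failures per (scenario, reason), most frequent first.

    `lines` is any iterable of CSV lines, e.g. an open file object.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        if row.get("Result", "").lower() != "success":
            counts[(row.get("Scenario", "?"), row.get("FailureReason", "?"))] += 1
    return counts.most_common()

# Inline sample; for a real run use: failure_hotspots(open("consolidated.csv", newline=""))
rows = ["Scenario,Result,FailureReason",
        "IM,Success,",
        "AV,Failure,MediaTimeout",
        "AV,Failure,MediaTimeout",
        "Conf,Failure,JoinError"]
for (scenario, reason), n in failure_hotspots(rows):
    print(f"{n:6}  {scenario}  {reason}")
```

The top of this list tells you which workload mix to re-run in isolation during triage.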

6) Triage workflow (fast)

  1. Reproduce the failing scenario with a small set of users.
  2. Collect centralized logs + Snooper for the failing time window.
  3. Correlate LyncPerfTool failure timestamps with PerfMon and network captures.
  4. Identify component returning error (Edge, FE, Mediation, Gateway).
  5. Apply targeted fix (routing, ports, resources, certificates) and re-run.
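Step 3 of this workflow can be sketched as a simple window join between failure timestamps and counter samples. The data shapes here are assumptions: failures as epoch seconds, PerfMon samples as (epoch seconds, counter value) tuples exported from a relog/CSV dump:

```python
# Sketch of triage step 3: for each failure timestamp, look for PerfMon
# samples within a +/- window and flag counters above a spike threshold.
# Data shapes and the 80% threshold are illustrative assumptions.
def correlate(failures, samples, window=30.0, threshold=80.0):
    """Return (failure_time, peak_counter_value) for failures near a spike."""
    hits = []
    for t in failures:
        spiking = [v for (ts, v) in samples
                   if abs(ts - t) <= window and v >= threshold]
        if spiking:
            hits.append((t, max(spiking)))
    return hits

failures = [100.0, 400.0]
cpu = [(90.0, 85.0), (110.0, 92.0), (395.0, 40.0), (405.0, 55.0)]
print(correlate(failures, cpu))  # -> [(100.0, 92.0)]
```

A failure that correlates with a CPU spike points at resources; one with no nearby spike points back at routing, ports, or certificates.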

7) Post-test validation

  • Run steady-state tests for several hours to spot leaks.
  • Compare results to capacity plan and adjust server sizing or QoS as needed.
  • Document failing scenarios, root cause, fix applied, and re-test.
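Spotting a leak over a multi-hour run amounts to checking whether memory trends upward. A sketch that fits a least-squares line to periodic memory samples; the sample data and the 5 MB/hour flag are illustrative assumptions:

```python
# Sketch for leak detection over a long steady-state run: fit an ordinary
# least-squares line to periodic memory samples and flag a persistent
# upward slope. Sample data and the 5 MB/hour threshold are illustrative.
def slope_mb_per_hour(times_h, mem_mb):
    """OLS slope of memory (MB) against time (hours)."""
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_m = sum(mem_mb) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in zip(times_h, mem_mb))
    den = sum((t - mean_t) ** 2 for t in times_h)
    return num / den

hours = [0, 1, 2, 3, 4, 5, 6, 7]
mem = [2048, 2050, 2061, 2070, 2082, 2090, 2103, 2110]  # creeping upward
s = slope_mb_per_hour(hours, mem)
print(f"trend: {s:.1f} MB/hour -> {'possible leak' if s > 5 else 'stable'}")
```

Fitting a trend line rather than comparing first and last samples avoids false alarms from normal cache warm-up at the start of a run.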
