7.2. ABC Monitor (Optional)

The ABC Monitor provides administrators with an aggregated view of user activity based on usage reporting data collected from the ABC SBC/WebRTC gateways. This highly interactive, near real-time view can be used for trending, analysis of both short-term and long-term usage patterns, troubleshooting, auditing server policies and identifying misbehaving users. The reporting data is delivered as events generated inside the SBCs. This “insider view” allows ABC Monitor administrators to inspect SIP traffic that is encrypted on its way to and from the SBCs, correlate calls “separated” by topology hiding, and report internal ABC SBC context such as traffic shaping decisions.

If there are multiple SBCs organized in a hot-standby pair or a cloud, the ABC Monitor collects data from all of them, and its centralized nature provides a global view of the whole system. An ABC SBC may also send its data to two Monitors in parallel. This is useful for organizations with multiple isolated teams, easy-to-start virtualized trials and migration scenarios.

The ABC Monitor user interface is organized in several dashboards. The opening Home Dashboard shows the most important data on a single comprehensible page, as shown in Figure Screenshot: ABC Monitor Home Dashboard. All of the data relates to the period of time chosen in the top right corner. This page can also be sent to an administrator by email on a daily basis to report on the previous 24 hours.

The data in the home dashboard is structured in several rows. The first row shows various call metrics, such as the number of completed and attempted calls, the total number of minutes, etc. The second row shows how frequently events of the various types occurred in the observed period of time. Dark fields represent many events of a kind. In this example we see that greylisting events dominated one time slot, a situation that often occurs when a SIP scan is launched against a public SIP service.

The next two rows show the history of the number of parallel calls and registrations, compared with data from the previous day, which is shown as a thin line. The boxes on the right-hand side show the current numbers.

The last row shows the number of security-related events. The timeline is divided into buckets, and the number of events is counted per bucket. The bucket length grows proportionally with the time window. The number on the right-hand side shows the number of security events in the most recent bucket. Its background color changes with the value: green when it is less than or equal to five, orange when it is below ten, and red above that.
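
The color rule for the most recent bucket can be sketched as a small function. This is only an illustration of the thresholds described above, not the ABC Monitor implementation; the function name is hypothetical, and treating a count of exactly ten as red is an assumption, since the text does not state it.

```python
def security_badge_color(event_count: int) -> str:
    """Return the background color for the event count of the latest bucket."""
    if event_count <= 5:
        return "green"
    if event_count < 10:
        return "orange"
    # Assumption: a count of exactly ten is already shown in red.
    return "red"


for count in (0, 5, 6, 9, 10, 42):
    print(count, security_badge_color(count))
```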

_images/mon_home.png

Figure 1: Screenshot: ABC Monitor Home Dashboard

A snapshot of the home dashboard can be produced and sent as a PDF attachment by email every morning if a recipient email address is configured under “Settings ‣ Reports: e-mail address for everyday reports”.

All other dashboards are organized similarly. While they cover different aspects of SIP operation and therefore show different data, the visual structure follows the same pattern. At the very top, there are filters that limit the events based on various criteria. Using the filters is described later in Chapter Using Filters. In the middle part, there are various graphical elements showing either aggregated values or their history over time. At the very bottom, there is a list of the actual events. An event can be clicked to unfold it and view all details.

The most frequently used dashboards are the following:

  • “Call Dashboard” shows the history of calls, analysis of failures and QoS reports. This is helpful for identifying call trends, volumes and reasons for call failures, and for troubleshooting specific calls. It provides both an aggregate view of the situation and the possibility to review call details. More details are shown in Chapter Calls Dashboard.
  • “Security” shows all Security events, also organized by the most active IP addresses and geo-locations.
  • “Toplists” shows most active users by various metrics: call attempts, call minutes, number of short calls, etc.
  • “Exceeded Limits” helps to alert on abnormal situations, such as excessively many phone calls from a single IP address. See Chapter Exceeded Limits Dashboards for more details.
  • “Overview” holds all events relating to a user, which can be a large number. This is mostly useful when an administrator begins to suspect a problem and wants to see a user’s full history. Often administrators are alerted to a user when reviewing the security dashboard, toplists, or exceeded limits, then set up a filter for that user and inspect the user’s full event history. Additional details are shown in Chapter Overview Dashboard.
  • “Connectivity CA” operates at the topology layer and shows how well calls from one Call Agent to another Call Agent succeed. This helps to identify Call Agents with poor performance, and/or Call Agents that have trouble making calls with each other. See Chapter Connectivity CA Dashboard.
  • “Registration” shows registrations: new, expired and deleted, which transport is being used, which part of the world the registrations are coming from, and what SIP equipment is being used. This dashboard is useful for troubleshooting a SIP user’s connectivity. More information can be found in Chapter Registration Dashboard.

There are further dashboards that provide less aggregated data, which is useful when trying to understand a low-level problem.

  • “Diagnostics” collects troubleshooting information. If an SBC administrator chose to record PCAP files or WAV files or to produce custom events, or if an unusual OS situation is reported, it appears here. See Section Diagnostics Dashboard for more information. Diagnostic events relating to layer 3 and layer 4 are separated into the “Transport” dashboard and provide useful information for detecting frequent retransmissions or various TLS handshake failures.
  • “Connectivity” is a dashboard operating at URI level that shows the most active caller-callee pairs.
  • “Network Statistics” shows low-level data such as the number of parallel calls, active registrations, bytes sent and received, etc. See Section Network and Statistics Dashboard. “Realms Stats” is a subset of the network statistics, broken down by realms.
  • “Systems” shows how SBCs are doing in terms of memory and CPU. Useful to identify overload situations. See Section System Dashboard.

7.2.1. Events (optional)

Note that producing events is an optional feature that requires an additional licensing option and the ABC Monitor software.

The events collect a detailed history of user activities. With this data, it is possible to review in detail the history of a specific user as well as of the whole SIP service. The events are produced by the ABC SBC in the course of processing SIP and RTP traffic whenever some relevant action occurs: a call was attempted, established or terminated, a bandwidth threshold was applied, etc. Administrators may also choose to generate their own custom events.

The events are keyed by SIP address and IP address so that history of specific users can be easily established.

All events include three types of fields: mandatory, type-specific and call variables.

Every event includes mandatory fields identifying which SBC reported on which activity and when (“timestamp”, “event-type” and “sbc” ). Events relating to a SIP message include SIP addresses (“attrs.from”, “attrs.to”) and source IP address (“attrs.source”). Call-end events include “attrs.duration” expressing call length in seconds, whereas reg-new events show the address of a newly registered SIP device in “attrs.contact”.

Last but not least, script processing variables can be passed along with the events – this can be for example useful to “label” the events with “tags” assigned to a call during call processing, such as “domestic”, “long-distance” or “emergency”. Some visualizations in ABC Monitor specifically require that an administrator sets well-known variables, as shown in Figure Screenshot: ABC Rule for Setting a Destination Country Code by Request URI Prefix and Screenshot: Setting Minute Counter Call Variable.

The ABC Monitor also enhances the events by additional fields used for security level assessment, geographical location, QoS information, and other data used for further analysis.

The following table shows the content of a call-start event, which is always produced when a successful INVITE transaction sets up a call. Internal fields whose names begin with an underscore are not shown.

Field               Value                     Comment
@timestamp          2016-03-31T12:17:02.000Z  GMT event timestamp
@version            1                         internal version number
type                call-start                event type
attrs.call-id       65601f8e625e0fb6484 ...   SIP Call-ID of the call
attrs.dst_ca_name   pstn_gateway              name of the Call Agent to which the call is forwarded
attrs.dst_rlm_name  sipgate                   name of the destination realm
attrs.sbc           3e440ca4-00ee             ID of the reporting SBC
attrs.src_ca_name   users                     Call Agent from which the request came
attrs.src_rlm_name  public                    realm from which the request came
attrs.from          sip:0000@172.27.10.114    SIP From URI
attrs.from-ua       VQM 0.4                   User Agent Client type
attrs.method        INVITE                    SIP request method – always INVITE for call-start
attrs.r-uri         sip:echo@free.tel         SIP Request URI
attrs.sip-code      200                       numerical code of the SIP reply – always 200 for call-start
attrs.sip-reason    OK                        human-readable reason phrase in the SIP reply
attrs.receiver_ip   192.168.0.111             SBC’s IP address at which it received the INVITE
attrs.ruriip        free.tel                  host part of the request URI
attrs.scenario      call                      scripting variable chosen to be passed along with the event
attrs.source        10.0.0.10                 source IP address of the request
attrs.src-port      1085                      source port number of the request
attrs.to            sip:echo@free.tel         To URI as in the INVITE request
attrs.to-ua         F-PBX 2.3                 signature of the called party’s UA Server
attrs.transport     udp                       transport protocol used for signaling
id                  483B139F-56FD153E...      internal ID, useful for correlating multiple related events
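
To make the event structure more tangible, the following sketch represents a few fields of the call-start event from the table as a Python dictionary and indexes events by SIP address and source IP address, in the spirit of the keying described earlier. The nested `attrs` layout and the helper name `index_by_user` are assumptions made for illustration; the actual storage format used by the ABC Monitor may differ.

```python
from collections import defaultdict

# A subset of the call-start event from the table above.
# Assumption: dotted field names such as "attrs.from" map to a nested dict.
call_start = {
    "@timestamp": "2016-03-31T12:17:02.000Z",
    "type": "call-start",
    "attrs": {
        "sbc": "3e440ca4-00ee",
        "from": "sip:0000@172.27.10.114",
        "to": "sip:echo@free.tel",
        "source": "10.0.0.10",
        "method": "INVITE",
        "sip-code": 200,
        "scenario": "call",  # call variable passed along with the event
    },
}


def index_by_user(events):
    """Group events under both the caller's SIP address and the source IP."""
    history = defaultdict(list)
    for event in events:
        attrs = event.get("attrs", {})
        for key in (attrs.get("from"), attrs.get("source")):
            if key:
                history[key].append(event)
    return history


history = index_by_user([call_start])
print(sorted(history))
```

With such an index, the history of a specific user is a single lookup by SIP address or by IP address.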

The rest of this Section is structured by the event types that the ABC SBC produces:

  • Call Processing Events – these are events that describe SIP calls and are mostly used to observe user habits, reasons for call failures, and QoS
  • Registration Events – these are events that describe how SIP devices register with the SIP service. The events show the reachability of the SIP users in time.
  • Diagnostics Events – these events help to identify unusual traffic patterns, misconfiguration of the service and other irregular situations.
  • Security Events – these events report on SIP traffic which may possibly indicate attempts to compromise security of some SIP users or the SIP service as whole

7.2.1.1. Call Processing Events

These events are generated automatically at different stages of the SIP call establishment process, see Fig. SIP call processing events.

  • call-start: generated after a successful call establishment. The method is always INVITE and sip-code 200.
  • call-attempt: generated after an unsuccessful attempt to establish a call due to caller canceling the call, callee declining it, or a timeout. Failed authentication attempts are reported on in separate events. The events always include the SIP code with which the call attempt was rejected.
  • call-end: generated after an established call is terminated. The event includes a full report on how the call completed. The From and To event fields take the same values as in the call-start event – they signify who initiated the call (and not who initiated the call termination). The event-specific fields include:
    • The field “originator” specifies who caused the call termination and can take the following values:
      • “caller-terminated”
      • “callee-terminated”
      • “call-length-terminated”: SBC terminated upon exceeding the maximum call length limit
      • “no-ack”: SBC terminated due to a missing ACK
      • “rtp-timer-terminated”: SBC terminated upon RTP inactivity
      • “session-timer-terminated”: SBC terminated upon session timer expiration
      • “admin-control-terminated”: administratively terminated from the GUI
      • “internal-disconnect”: call terminated due to an internally transferred call
      • “reply”: negative response received on an established dialog (404, 408, 410, 416, 480, 482, 483, 484, 485, 502, 604)
      • “server-shutdown”: server process terminated due to a SIGUSR1 or SIGUSR2 signal
      • “srtp-failure”: SRTP key negotiation failure
      • “internal-error”: internal error
    • The field “duration” specifies the length of the call in seconds.
    • The fields “rtp-stats-a” and “rtp-stats-b” represent the RTP statistics for the media streams on each call leg. Each call leg contains one or more media streams, each of which offers incoming and outgoing information.

The following table shows the information for the incoming media streams.

Field                               Value          Comment
ssrc                                413934793      incoming SSRC value
src_ip                              192.168.0.155  source IP address
src_port                            37454          source port
dst_ip                              192.168.0.155  destination IP address
dst_port                            46920          destination port
payload                             PCMU/8000      media payload
packets                             52832          received packets
bytes                               9087104        received bytes
last_seq_nr                         54428          last received sequence number
max_delta                           3              maximum delta between two packets
max_delta_seq                       21397          sequence number of the packet with the maximum delta
max_burst                           69             maximum number of packets per second
lost_percentage                     0              lost percentage
jitter                              6              jitter
dropped                             0              packets dropped
seconds_since_last_received_packet  0              seconds since last received packet
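
As a quick plausibility check, the counters of an incoming stream can be combined into derived values. The sketch below computes the mean packet size from the “packets” and “bytes” values in the table above. The helper name is hypothetical. The result would be consistent with 160 bytes of PCMU audio (20 ms at 8000 Hz) plus a 12-byte RTP header, but whether the “bytes” counter includes the RTP header is an assumption that the table does not confirm.

```python
# Values taken from the incoming media stream table above.
incoming = {
    "payload": "PCMU/8000",
    "packets": 52832,
    "bytes": 9087104,
    "lost_percentage": 0,
}


def mean_packet_size(stream):
    """Average number of bytes per packet; 0.0 for a stream without packets."""
    if stream["packets"] == 0:
        return 0.0
    return stream["bytes"] / stream["packets"]


print(mean_packet_size(incoming))  # 172.0
```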

The following table shows the information for the outgoing media streams.

Field                               Value          Comment
ssrc                                473964392      outgoing SSRC value
src_ip                              192.168.0.149  source IP address
src_port                            27054          source port
dst_ip                              192.168.0.155  destination IP address
dst_port                            26120          destination port
payload                             PCMU/8000      media payload
packets                             52832          sent packets
bytes                               8710432        sent bytes
last_seq_nr                         24326          last sent sequence number
lost_percentage                     0              lost percentage
rtt_min                             5              minimum round trip time
rtt_max                             172            maximum round trip time
rtt_avg                             26             average round trip time
jitter                              6              jitter
seconds_since_last_sent_packet      0              seconds since last sent packet

The call-start and call-end events of a call carry the same ID, which can be used for correlation. In practice, the ID is more often used to correlate a call with other events, such as recording events, because call-start contains no data that is not also present in call-end.
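
Correlation by ID can be sketched as a simple grouping step. The sample events and ID values below are hypothetical, and the helper name `correlate` is not part of the product; the sketch only assumes that related events share the same “id” field, as described above.

```python
from collections import defaultdict

# Hypothetical sample events; only the fields needed for correlation are shown.
events = [
    {"id": "A1", "type": "call-start"},
    {"id": "A1", "type": "recording"},
    {"id": "A1", "type": "call-end", "attrs": {"duration": 42}},
    {"id": "B2", "type": "call-attempt"},
]


def correlate(events):
    """Map every event ID to the list of event types reported under it."""
    by_id = defaultdict(list)
    for event in events:
        by_id[event["id"]].append(event["type"])
    return dict(by_id)


print(correlate(events))
```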

_images/10000000000003B8000000B40A21BA92.png

Figure 2: SIP call processing events

7.2.1.2. Registration Events

The registration events are generated automatically at different stages of processing SIP REGISTER requests when register caching is enabled (see Section Registration Caching and Handling).

  • reg-new. This event is produced when a SIP User Agent registers a new contact through the ABC SBC.
  • reg-del. This event is generated when a SIP User Agent deregisters a previously registered contact using the RFC 3261 procedures. This is typically the case when a softphone is shut down and unregisters gracefully. Some clients are also implemented in such a way that they unregister and register anew instead of periodically renewing one registered binding. See an example in the ABC Monitor snapshot in Figure Event Timeline for a SIP Device Registering and Unregistering Periodically – unregistered bindings are immediately followed by new registrations.
_images/fritzbox.png

Figure 3: Event Timeline for a SIP Device Registering and Unregistering Periodically

  • reg-expired: Indicates an expired registration binding. This happens if an upstream SIP client fails to renew its contact within the re-registration window agreed upon between the ABC SBC and the downstream registrar. When this timer expires, the binding is deleted and no incoming requests can be forwarded using that binding. This often happens with clients that do not comply with RFC 3261 because they do not respect the server-imposed registration renewal interval or vary their contacts inadequately. An example timeline for such a device is shown in Figure Event Timeline for a SIP Device Failing to Re-register Timely. Such a device remains unreachable in the periods of time between the orange expiration and green re-registration bars.
_images/communigate.png

Figure 4: Event Timeline for a SIP Device Failing to Re-register Timely
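
The unreachable periods visible in such a timeline can also be derived from the registration events themselves. The following sketch pairs every reg-expired event with the next reg-new event of the same contact. The sample timestamps and the function name are hypothetical, and the input is assumed to be already sorted by time and limited to a single contact.

```python
from datetime import datetime


def unreachable_periods(timeline):
    """Return (expired_at, re_registered_at) pairs for one contact.

    timeline: list of (timestamp, event_type) tuples sorted by timestamp.
    """
    gaps = []
    expired_at = None
    for timestamp, event_type in timeline:
        if event_type == "reg-expired":
            expired_at = timestamp
        elif event_type == "reg-new" and expired_at is not None:
            gaps.append((expired_at, timestamp))
            expired_at = None
    return gaps


# Hypothetical event history of a device that re-registers too late.
timeline = [
    (datetime(2016, 3, 31, 12, 0), "reg-new"),
    (datetime(2016, 3, 31, 12, 30), "reg-expired"),
    (datetime(2016, 3, 31, 12, 45), "reg-new"),
    (datetime(2016, 3, 31, 13, 15), "reg-expired"),
    (datetime(2016, 3, 31, 13, 20), "reg-new"),
]

gaps = unreachable_periods(timeline)
total_seconds = sum((end - start).total_seconds() for start, end in gaps)
print(len(gaps), total_seconds)
```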

7.2.1.3. Diagnostics Events

The diagnostic events provide additional diagnostic information and sometimes alert on conditions that an administrator should check to confirm they are normal. These events are triggered by minor errors, completed playing or recording of WAV/PCAP files, changes in the status of transport blacklisting, and custom events.

  • Custom events (action-log): This is one of the most important diagnostic events available in the system. It is triggered from the ABC SBC rule-base using the “Log Event” rule action. Typically it is used when an administrator wants to see if a specific rule is indeed invoked and how often. Administrators also use custom events when they begin to suspect some undesirable traffic but do not want to drop it yet. This action allows them to observe the suspicious traffic before taking further action. The conditions can be any that the rule definitions allow and often include tests of whether a SIP device is registered, shows a suspicious User Agent type, tries to call a premium phone number or otherwise falls into the “suspicious category”. For example, an administrator may choose to observe all SIP messages that come to the ABC SBC without a username in To header field URI. An example of such a “Log event” rule is shown below in Figure Rule for Reporting on SIP Requests with Empty Username in From.

    _images/log-3-event-rule.png

    Figure 5: Rule for Reporting on SIP Requests with Empty Username in From

    The action includes a parameter where the administrator can specify additional text describing the event. The parameter can include replacement expressions providing additional information about the processed SIP message. These should be used only if necessary – when varying elements are present in the event description, the ABC Monitor software cannot group the events by the same description.

  • recording events (recording): These events are generated if voice recording was enabled for a particular call. The events include an HTTP reference to the recorded WAV file. See also Section Audio Recording.

    _images/recording_event.png

    Figure 6: Recording Event

  • SIP traffic logging events (message-log): These events are generated only if a rule was set up to record SIP/RTP traffic using the Log received traffic action (see Section Diagnostics Dashboard). The event includes references to the recorded PCAP file and a ladder chart displaying the recorded traffic. Sometimes it may take extra time until the links point to a completed file because PCAP processing runs in background on low priority. Figure Screenshot: Ladder Diagram for the Suspicious User shows an example of such ladder-chart rendered by the ABC Monitor and showing both sides of a SIP message – incoming and outgoing.

  • Error/Alert (error): These events are always produced when no route matches a SIP request, a TLS connection is refused or terminated, or another unspecified error occurs. The ABC SBC reports these because TLS credential management is often misconfigured and needs to be fixed for TLS clients to be able to connect. Alerts may also appear if the system is underdimensioned, with messages like “/data disk usage above 80%”, or if a misconfiguration in the rule-base was detected at run-time (for example: routing failed: can’t parse outbound proxy URI: 192.168.0.85).

  • Notice (notice): These events are produced when some layer-3 or layer-4 conditions change. This is currently the case when a TLS connection opens or closes successfully, or when the health status of a Call Agent changes so that it is either removed from or added to a transport blacklist (see Section IP Blacklisting: Adaptive Availability Management). A “Too many SIP retransmissions” notice event is generated if a SIP transaction reaches the defined number of retransmissions. The retransmission number triggering the event is globally configured under “Config ‣ Global Config ‣ Events ‣ Generate an event if a SIP transaction reaches ..”. The default value is 0, which disables the event notification.

  • Prompt: These events are always produced when a caller’s attempt is handled using local audio announcements (see Section Playing Audio Announcements).

  • dest_monit: The ABC SBC reports these events when availability monitoring is enabled for a Call Agent (see Section IP Blacklisting: Adaptive Availability Management). In ABC Monitor the events are visualized in the CA Availability diagram in the “Connectivity CA” dashboard as shown in Figure Screenshot: Call Agent Availability Lanes.
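
The advice above to keep varying elements out of custom event descriptions can be illustrated with a small grouping experiment. The description texts below are hypothetical; the sketch only shows why identical descriptions aggregate into one group, while descriptions that embed per-message values fall apart into many groups of one.

```python
from collections import Counter

# Hypothetical descriptions of three "Log Event" hits with a fixed text ...
fixed = ["empty username"] * 3

# ... and with a replacement expression that inserts the source IP address.
varying = [
    "empty username from 10.0.0.1",
    "empty username from 10.0.0.2",
    "empty username from 10.0.0.3",
]

print(Counter(fixed).most_common())    # one group with three events
print(Counter(varying).most_common())  # three groups with one event each
```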

7.2.1.4. Security Events

This Section discusses events that are relevant to the security of a SIP service. These security events are generated when messages are dropped because they fail to comply with a security policy. This can be because the traffic has exceeded traffic limits, a drop action has been applied, authentication failed or an unfavorable SIP answer came from downstream in response to a SIP request.

Counter-measures to fend off security attacks are discussed in a separate Section Securing SIP Networks using ABC SBC and ABC Monitor (optional). The event types are the following:

  • limit: These events are generated if some of the traffic constraints (see Section Traffic Limiting and Shaping) have been exceeded. For example, an administrator may choose to ban signaling traffic from an IP address if it sends more than 10 requests per minute. See Figure Limit Events for an example of limit events reporting on traffic shaping in effect. The limit event type is also generated when the current traffic volume exceeds limits set by the ABC SBC software license.

    _images/limit_events.png

    Figure 7: Limit Events

  • message-dropped. These events are generated if a message was silently discarded using the drop action in A or C rules. The sender of the discarded traffic will not see any answer to the request. Note that a sender probing the service using TCP will still be able to find out whether there is a running service. (Section Manual SIP Traffic Blocking provides more details on blocking traffic using the drop action).

  • auth-failed. This event is triggered whenever a SIP request authentication fails. Note that the way the SIP protocol works, an initial request is always challenged by the server using the 401/407 replies. This initial challenge does NOT trigger the event. Only when the subsequently re-submitted request with credentials fails to authenticate and yields a 401/407 answer is the auth-failed event generated. There may be several reasons why a request fails to authenticate, and they need closer examination. A SIP phone user may fail to configure the device with the proper SIP URI and/or password. It may be an administrative mistake on the server side, such as deactivating a user account. However, it may also be a password-guessing attack, such as shown in Figure ABC Monitor Displaying How a Brute-Force Password Guessing Attack Ramps Up. The sudden increase in the number of authentication failures clearly indicates an attempt to breach the security of the SIP service.

    _images/auth-attack-rampup.png

    Figure 8: ABC Monitor Displaying How a Brute-Force Password Guessing Attack Ramps Up

  • log-reply. The log-reply event allows an administrator of the ABC SBC to identify traffic that apparently irritates downstream SIP equipment. If for example a downstream server chooses to send a 604 for requests that use non-existent SIP URIs, the ABC SBC may be configured to report on such using the log replies action as shown in Figure Rule for Reporting All 604-replied Responses. Similarly the events can be generated on receipt of any other specific reply codes, such as 403 (Forbidden).

    _images/log_replies_action.png

    Figure 9: Rule for Reporting All 604-replied Responses

    An example of events captured during a scanning attack is shown in Figure Events Produced During a Scanning Attack. In the “To” column you can see that the attacker was trying to register under different numbers beginning with 12: 122667, 12554, 122562, etc., about every two seconds. When he hit a non-existing account (12554 for example), a 604 came back and triggered a “log-reply” event. However, when he tried 12667, a 401 came back, revealing to him that he had “pinged” an existing account, just without proper credentials.

    _images/scanning_attack.png

    Figure 10: Events Produced During a Scanning Attack

  • firewall-blacklist. These events identify blocked IP addresses. They are generated when an ABC SBC chooses to drop traffic from an offending IP address. See Section Automatic IP Address Blocking for configuring criteria for automated IP address blocking in an ABC SBC.

  • firewall-greylist. These events are generated when an ABC SBC chooses to drop traffic from an IP address that sent the ABC SBC some initial traffic but has not managed to establish trust. See Section Automatic Proactive Blocking: Greylisting for configuring greylisting in an ABC SBC.
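
The kind of analysis that reveals a password-guessing attack from auth-failed events can be sketched as a count per source IP address. The sample events, the threshold value and the function name are hypothetical; in the ABC Monitor the same picture is obtained interactively from the dashboards.

```python
from collections import Counter


def suspicious_sources(events, threshold):
    """Return source IPs with at least `threshold` auth-failed events."""
    failures = Counter(
        event["attrs"]["source"]
        for event in events
        if event["type"] == "auth-failed"
    )
    return {ip: count for ip, count in failures.items() if count >= threshold}


# Hypothetical events using documentation-only IP addresses.
events = [
    {"type": "auth-failed", "attrs": {"source": "203.0.113.5"}},
    {"type": "auth-failed", "attrs": {"source": "203.0.113.5"}},
    {"type": "auth-failed", "attrs": {"source": "203.0.113.5"}},
    {"type": "auth-failed", "attrs": {"source": "198.51.100.7"}},
    {"type": "call-start", "attrs": {"source": "203.0.113.5"}},
]

print(suspicious_sources(events, threshold=3))
```

A single failure, as for the second address in the sample, is more likely a mistyped password than an attack, which is why a threshold is applied.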

7.2.2. HOWTO Find a Needle in the Haystack: Iterative Event Filtering

The ABC Monitor combines an aggregated view of event data with the actual data details. This is instrumental in finding problems quickly: the aggregated view helps to detect a trend or anomaly which would be hard to find in the vast amount of SIP data. Once a situation worth further investigation is detected, the administrator can apply different filters consecutively until the root cause is identified using event details. The details can go as low as bits of SIP messages.

In this chapter, we will show an example how to iteratively proceed from detecting a high-level problem to finding the low-level bits triggering it. The examples are taken from a real operation and therefore many of the elements in the screenshots are shaded.

For example, an administrator may find in the Call Dashboard that an average call failure ratio of over 50% is too high, see that the most frequently occurring call failure reason is 480 (“Temporarily Unavailable” in the SIP specification, typically meaning that the user is offline), and start narrowing the problem down by applying event filters.

This is the situation shown in Figure Screenshot: Example of a High Call Failure Rate Situation. The average failure rate is at 68%, and the most frequent error code is 480.

_images/mon_call_failures.png

Figure 11: Screenshot: Example of a High Call Failure Rate Situation

What the administrator typically does in such a situation is to spot the unusual trend and inspect its details. Here the abnormality is the unusual number of 480s; the typical top error codes are Busy (486) and Canceled (487). Therefore the administrator will limit the events to those with the SIP code 480. He does so by clicking the “plus magnifying glass” icon underneath the 480 code, and “pinning” the filter using the pin icon in the top bar so that the filter can be transferred to a dashboard like “TopLists”. The filter looks like the one in Figure Screenshot: Filter for Further Inspection of too Many 480s. The statistics have changed because the filter limited the inspected events to call attempts failing with the 480 code, and also because the time window has slightly advanced in the meantime.

_images/mon_480s.png

Figure 12: Screenshot: Filter for Further Inspection of too Many 480s

When we now switch to the Toplist, we find that the vast majority of the 480-terminated call attempts come from a single domain, and that inside this domain an anonymous user dominates (Figure Screenshot: The Top 480-er). While we do not know whether this is a user desperately trying to reach an offline called party or someone scanning calls, we can pin a filter by caller, deactivate the filter by error code and see the full history of the suspicious user in Overview.

_images/top-480er.png

Figure 13: Screenshot: The Top 480-er

The Overview gives us a picture of a suspicious user who keeps making call attempts at a high rate without success and also without attempt to register. This is seen in Screenshot: Complete Event History of a Suspicious User.

_images/mon-the-480er.png

Figure 14: Screenshot: Complete Event History of a Suspicious User

It pays off therefore to scroll down and look at event details for this particular user. Not only is there a detailed report on the call attempt, but under “View Messages” there is a link to ladder diagram showing the actual SIP message exchange.

_images/mon-call-attempt-details.png

Figure 15: Screenshot: Complete Event Details

_images/mon-ladder.png

Figure 16: Screenshot: Ladder Diagram for the Suspicious User

The ladder diagram shows the message flow and its timing, as well as details of SIP message, including From and To identities, type of SIP User Agent, and SDP media negotiation payload. The capability to view internal perspective of traffic is very valuable – if traffic is encrypted, it is difficult for a troubleshooter to inspect its content. Also if the SIP traffic is obfuscated by use of topology hiding (see Topology Hiding), it would be difficult to relate incoming to outgoing SIP dialogs without the internal perspective.

In summary, we have shown in this example how to detect unusual situations using aggregated views (too many 480s), select the events specific to the situation, find the user that has caused most of them and inspect in detail his gap-free history and even SIP message details. This iterative process gives every administrator powerful tools to find out what is going on in a SIP service, and good information to decide whether he is dealing with an abnormally active user, a malicious attack or a network misconfiguration.
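
The same iterative procedure can be expressed in a few lines of code, which may help to see the logic behind the clicks. The sample events and helper names below are hypothetical; the three steps mirror the walkthrough: restrict to 480s, find the top caller, then drop the error-code restriction and view that caller's full history.

```python
from collections import Counter

# Hypothetical call attempts; field names follow the event tables above.
events = [
    {"type": "call-attempt", "attrs": {"from": "sip:anonymous@example.net", "sip-code": 480}},
    {"type": "call-attempt", "attrs": {"from": "sip:anonymous@example.net", "sip-code": 480}},
    {"type": "call-attempt", "attrs": {"from": "sip:alice@example.net", "sip-code": 486}},
    {"type": "call-attempt", "attrs": {"from": "sip:bob@example.net", "sip-code": 480}},
    {"type": "call-attempt", "attrs": {"from": "sip:anonymous@example.net", "sip-code": 487}},
]


def with_sip_code(events, code):
    """Step 1: keep only events with the given SIP reply code."""
    return [e for e in events if e["attrs"].get("sip-code") == code]


def top_callers(events, n=1):
    """Step 2: rank callers by number of events, like a toplist."""
    return Counter(e["attrs"]["from"] for e in events).most_common(n)


def history_of(events, caller):
    """Step 3: full history of one caller, regardless of reply code."""
    return [e for e in events if e["attrs"]["from"] == caller]


failed = with_sip_code(events, 480)
top_caller, count = top_callers(failed)[0]
history = history_of(events, top_caller)
print(top_caller, count, len(history))
```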

7.2.3. Using Filters

As shown in the previous chapter, filters are the essential instrument for finding out what is going on. In this chapter we describe all of the filter types available in ABC Monitor: data filters, type filters, the time filter, and full-text filters. Multiple data filters can be combined, in which case only events that match ALL of them are shown. All of the filters appear in the uppermost part of the dashboards. For example, events can be restricted to all but registration events, as long as they relate to an IP address, lead to a “480” SIP failure code and are for a given SIP user, as shown in Figure ABC Monitor screenshot: Example of a Combined Filter.

_images/mon-example-filter.png

Figure 17: ABC Monitor screenshot: Example of a Combined Filter

Data filters are used to select all events with the same data field values. They can be created from many elements shown in the dashboards. When the user hovers over most of the data elements shown in the dashboards, two magnifying glass icons with a plus and a minus symbol appear. Clicking either icon creates a filter that restricts events to those that either have (plus icon) or do NOT have (minus icon) the same value.
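
The semantics of combined data filters can be sketched with simple predicates: a “plus” filter keeps events that have a value, a “minus” filter keeps events that do not, and an event is shown only if it passes ALL active filters. The function names, the flat dotted field names and the sample events are assumptions made for illustration, not the product's internal representation.

```python
def has(field, value):
    """'Plus' filter: keep events whose field equals the value."""
    return lambda event: event.get(field) == value


def has_not(field, value):
    """'Minus' filter: keep events whose field differs from the value."""
    return lambda event: event.get(field) != value


def apply_filters(events, filters):
    """Keep only events that match ALL of the given filters."""
    return [event for event in events if all(f(event) for f in filters)]


# Hypothetical events with flat, dotted field names.
events = [
    {"type": "call-attempt", "attrs.source": "192.168.0.85", "attrs.sip-code": 480},
    {"type": "call-attempt", "attrs.source": "192.168.0.85", "attrs.sip-code": 486},
    {"type": "reg-new", "attrs.source": "192.168.0.85"},
    {"type": "call-attempt", "attrs.source": "10.0.0.10", "attrs.sip-code": 480},
]

combined = [
    has_not("type", "reg-new"),
    has("attrs.source", "192.168.0.85"),
    has("attrs.sip-code", 480),
]
print(apply_filters(events, combined))
```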

For example, one can visit the Call Dashboard, review the most frequent SIP error codes, and restrict the view to call attempts answered with 403 (Forbidden), as shown in Figure ABC Monitor filtering out 403-ed call attempts in Call Dashboard:

_images/mon-magnifying-glass.png

Figure 18: ABC Monitor filtering out 403-ed call attempts in Call Dashboard

Every data filter can be deleted, deactivated, and importantly pinned – as shown in the example in the previous chapter, a pinned filter can be transferred to another dashboard where some specific aspect of the filtered data is easier to find. To pin a filter, hover over it and click on the pin icon. Unpinning is done the same way.

Other possibilities to adjust filters include temporary deactivation using the checkbox icon (the filter then appears dimmed), permanent deletion using the trash bin icon, and filter negation using the magnifying glass icon (negated filters appear in red).

_images/mon_filter_types.png

Figure 19: Filter Alternations: pinned, pinned and negated, deactivated

Type filter checkboxes are shown in the top bar and make it easy to restrict events by their respective types. This is particularly useful in dashboards with many event types, such as Overview, when an administrator wishes to hide events unrelated to the case at hand.

The time filter sets the inspected time window either absolutely or relative to the current time. Of course, it can only cover the time period for which events are stored, as configured during the ABC Monitor installation.

Last but not least, the top search box allows adding a full-text filter that looks for a pattern in multiple fields of the available events. By default, the full value, such as CallID, must be included. Special terms can be used as follows:

  • *, asterisk, stands for a wildcard and can substitute for any number of any characters. Use of wildcards slows down the search.
  • \, backslash, means that the subsequent character will be interpreted literally
  • a colon-separated <name>:<value> pair means that the value is searched for only in the field of the given name
  • combinations of the terms are possible: AND introduces multiple conditions, all of which must be met; OR introduces multiple conditions, any of which must be met; NOT negates a condition
  • also, the syntax “attrs.source:[from_ip TO to_ip]” allows matching the event source IP address against an IP address range

Therefore if there is a user Wesley making calls using his SIP address sip:wesley@frafos.net to reach the SIP address sip:123456@example.net and he makes the calls from an IP address 192.168.0.85 belonging to the Call Agent “wesley-net”, the following search expressions will match:

  • sip\:wesley@frafos.net will match all calls from/to Wesley; note that the colon must be preceded by a backslash, otherwise the ABC Monitor would attempt to search through a field named sip
  • *wesley* will match all previous records, and probably some more as well, such as wesley.home and wesley.office. It will also match all calls from and to the Call Agent “wesley-net”.
  • attrs.dst_ca_name:wesley-net will match all calls towards the Call Agent wesley-net
  • 192.168.0.85 will match all events relating to that IP address.
  • attrs.source:[192.168.0.0 TO 192.168.0.255] will also match because Wesley’s IP address is inside the range
  • 487 will match all call attempts that failed with 487 SIP code
  • attrs.sip-code:487 OR attrs.sip-code:486 will match all call attempts that failed because of 487 (cancelled) or 486 (busy)

The following search expressions will not match:

  • wesley will not match, because full-match is attempted without wildcards
  • sip:wesley@frafos.net will not match because of the colon
  • NOT attrs.source:[192.168.0.0 TO 192.168.0.255] will certainly match many events but not Wesley’s as his IP address is in the negated IP range
  • attrs.duration:[500 TO *] will filter out all calls exceeding 500 seconds in duration
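The escaping and combination rules above can be sketched in a few lines of Python. This is only an illustration of how such search strings could be assembled by a script; the helper names (escape_term, field_equals, ip_range, any_of) are hypothetical and not part of ABC Monitor, and the sketch assumes that backslash, colon and asterisk are the only characters that need escaping.

```python
def escape_term(term: str) -> str:
    """Backslash-escape characters that the search box treats specially.

    Assumption: only backslash, colon and asterisk need escaping.
    """
    out = []
    for ch in term:
        if ch in "\\:*":
            out.append("\\")
        out.append(ch)
    return "".join(out)


def field_equals(field: str, value: str) -> str:
    """Build a <name>:<value> term that searches a single field."""
    return f"{field}:{escape_term(value)}"


def ip_range(field: str, first_ip: str, last_ip: str) -> str:
    """Build a range term such as attrs.source:[a TO b]."""
    return f"{field}:[{first_ip} TO {last_ip}]"


def any_of(*terms: str) -> str:
    """Combine terms so that any one of them may match."""
    return " OR ".join(terms)


print(escape_term("sip:wesley@frafos.net"))
print(any_of(field_equals("attrs.sip-code", "487"),
             field_equals("attrs.sip-code", "486")))
print(ip_range("attrs.source", "192.168.0.0", "192.168.0.255"))
```

The printed strings correspond to the expressions listed above and can be pasted into the search box.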

7.2.4. Overview Dashboard

The Overview Dashboard displays events of all types. It is often used to inspect a gap-free history of a specific IP address or of a user identified by a URI.

In the following example, we look at traffic generated in the frafos.net domain. We let a user register using a wrong password (failed-authentication event), retry using the correct password (register-new), and make a call to an announcement (prompt, and also message-log because the administrator chose to store all SIP traffic on this SBC, and action-log because the administrator chose to issue a custom event for calls to a specific destination).

_images/mon-gapfree.png

Figure 20: Example: gap-free history of all events in a domain

Another interesting chart in the Overview dashboard depicts the total number of events over time. A disproportionately high number of a specific event type in particular indicates an unusual situation. For example, a high number of greylisting events, as shown in Figure Total Events with Disproportionally High Number of Greylisted IPs, typically signifies a security attack.

_images/mon-manygl.png

Figure 21: Total Events with Disproportionally High Number of Greylisted IPs

7.2.5. Calls Dashboard

The Calls Dashboard analyzes call-related events (see Section Call Processing Events) to summarize processed SIP calls and their quality.

A screenshot of the top part of the dashboard has already been shown in Figure Screenshot: Example of a High Call Failure Rate Situation. From top to bottom, there are call statistics, a call events timeline, and breakdowns of successful calls by terminating party and final status.

The pie chart breakdowns help to identify in detail why calls are being terminated. The left-hand side pie chart shows who terminated established calls. The normal termination types are “caller-terminated” and “callee-terminated”. However, calls may also have been terminated by the ABC SBC for a variety of reasons. These include “no-ack” when a caller failed to deliver the SIP ACK request, and “rtp-timer-termianated” when RTP media stopped flowing without clean SIP session termination. See the Section Call Processing Events for the full list. The right-hand side pie chart shows both successfully established calls and failed call attempts and structures them by status code in the outer ring. The status codes are categorized in the inner circle into three groups: success (200-answered INVITEs), userfailure (486 Busy and 487 Canceled) and network failure (everything else). Clicking on a pie chart segment introduces a filter for events of the same kind.

_images/mon_call_breakdowns.png

Figure 22: Screenshot: Call Completion Status Breakdown
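The grouping of status codes used in the inner circle can be expressed as a small classification function. The following Python sketch only mirrors the three groups described above; the function name and the sample list of final status codes are invented for illustration and do not come from ABC Monitor.

```python
from collections import Counter

# Codes that the text attributes to user behaviour: 486 Busy, 487 Canceled.
USER_FAILURE_CODES = {486, 487}


def classify_status(code: int) -> str:
    """Map a final SIP status code to one of the three inner-circle groups."""
    if code == 200:
        return "success"
    if code in USER_FAILURE_CODES:
        return "userfailure"
    return "network failure"


# Invented sample of final status codes for a handful of call attempts.
final_codes = [200, 200, 486, 487, 403, 500, 200]
breakdown = Counter(classify_status(code) for code in final_codes)
print(breakdown)
```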

The lower part of the dashboard is shown in Figure Screenshot: Lower Part of the Call Dashboard and includes call durations, a breakdown of calls by country, and finally quality details of calls with QoS troubles and the call event details.

_images/mon_calls_low.png

Figure 23: Screenshot: Lower Part of the Call Dashboard

Note that the breakdown of calls by country is calculated differently for source and destination. The source country is determined using the request source address, whereas the destination country is determined from the request URI using knowledge of the SIP service’s dialing plan. To accomplish the latter, an administrator must tag the calls with a country tag in the ABC SBC. To do so, he must know the dialing plans in use and set the call variable “dst_cc” in ABC SBC rules to the proper country codes using the “Set Call Variable” Action, see Figure Screenshot: ABC Rule for Setting a Destination Country Code by Request URI Prefix.

_images/mon-dst-cc.png

Figure 24: Screenshot: ABC Rule for Setting a Destination Country Code by Request URI Prefix
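The logic behind such rules is a prefix lookup on the dialed number. The following Python sketch illustrates the idea with a longest-prefix match; it is not ABC SBC rule syntax, and the dialing plan table, the function name and the two-letter country values are assumptions made for this example.

```python
from typing import Dict, Optional

# Hypothetical dialing plan: request-URI user-part prefix -> value for dst_cc.
DIAL_PLAN: Dict[str, str] = {
    "+1": "US",
    "+44": "GB",
    "+49": "DE",
    "+420": "CZ",
}


def destination_country(user_part: str,
                        plan: Dict[str, str] = DIAL_PLAN) -> Optional[str]:
    """Return the country for the longest matching prefix, or None."""
    # Longer prefixes are tried first so that a more specific entry
    # always wins over a shorter one that shares its leading digits.
    for prefix in sorted(plan, key=len, reverse=True):
        if user_part.startswith(prefix):
            return plan[prefix]
    return None


print(destination_country("+4203123456"))
print(destination_country("+4930123456"))
print(destination_country("0049123"))
```

Calls whose request URI matches no prefix stay untagged in this sketch, which is why the last lookup yields no country.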

The next section highlights calls with suboptimal VoIP quality. Of particular importance are calls with the attribute “attrs.rtp-direction” set to “oneway”. That means that for that call, media has been received only in one direction. This is an irritating VoIP phenomenon which typically occurs when there are NAT connectivity problems for the affected user.

Last but not least in this dashboard, there is the list of call details. If PCAP and/or WAV recording was enabled for the respective calls in ABC SBC rules, the files or ladder diagrams (see example in Figure Screenshot: Ladder Diagram for the Suspicious User) can be downloaded from the unfolded event details. QoS reports are included in the call-stop events in JSON format. The reports include two parts, one for the media streams from and to the caller, and another one for the streams from and to the called party. The values have the following meaning:

  • max_delta stands for the maximum interarrival gap between received packets. Values above 120 ms for audio are already high and indicate a gap which could have occurred due to muting or voice inactivity detection without marking it as such.
  • loss percentage shows the relative number of packets lost [2]. Values above 1% indicate a lossy network; values below 5% can often be tolerated by listeners.
  • jitter [3] shows variation in packet transit delay. A high value above 120 ms typically indicates network congestion and results in dropping of late-arrived packets.

The following example shows such a QoS report. The first bracket pair encloses records about packet streams from caller as seen by ABC SBC and to caller as reported by caller’s RTCP reports. The second bracket pair reports quality on streams from and to callee both of them with perfect QoS:

[{"dir":"in", "ssrc":"1864183198", "src_ip":"192.168.0.155", "src_port":"20518", "dst_ip":"192.168.0.155", "dst_port":"17342", "payload":"PCMU/8000", "packets":"15670", "expected":"16204", "bytes":"2695240", "last_seq_nr":"42433", "max_delta":"14851", "max_delta_seq":"41514", "gaps":"8", "lost_percentage":"3.295482596889657", "jitter":"13", "dropped":"0", "seconds_since_last_received_packet":"0", "MOScqex":"3.420"},

{"dir":"out", "ssrc":"1618416588", "src_ip":"192.168.0.155", "src_port":"17342", "dst_ip":"192.168.0.155", "dst_port":"20518", "packets":"16271", "bytes":"2798612", "last_seq_nr":"16280", "lost_percentage":"6", "jitter":"39", "rtt_min":"114", "rtt_max":"1150", "rtt_avg":"164", "seconds_since_last_sent_packet":"0"}]

[{"dir":"in", "ssrc":"1618416588", "src_ip":"77.178.115.37", "src_port":"7078", "dst_ip":"12.89.111.155", "dst_port":"24540", "payload":"PCMU/8000", "packets":"16270", "expected":"16279", "bytes":"2798440", "last_seq_nr":"16280", "max_delta":"80", "max_delta_seq":"6453", "gaps":"9", "lost_percentage":"0.05528595122550525", "jitter":"0", "dropped":"0", "seconds_since_last_received_packet":"0", "MOScqex":"4.40"},

{"dir":"out", "ssrc":"1864183198", "src_ip":"12.89.111.155", "src_port":"24540", "dst_ip":"77.178.115.37", "dst_port":"7078", "packets":"15671", "bytes":"2695412", "last_seq_nr":"42433", "lost_percentage":"0", "jitter":"163", "rtt_min":"25", "rtt_max":"28", "rtt_avg":"25", "seconds_since_last_sent_packet":"0"}

]
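A report like the one above can be checked against the thresholds given earlier with a short script. The Python sketch below is an illustration only: it assumes that max_delta and jitter are reported in milliseconds, uses a shortened copy of the caller-side records from the example, and the function name qos_flags is hypothetical.

```python
import json

# Thresholds taken from the description of the QoS values above.
MAX_DELTA_MS = 120.0
LOSS_PERCENT = 1.0
JITTER_MS = 120.0


def qos_flags(stream: dict) -> list:
    """Return the names of the QoS values that exceed their threshold."""
    flags = []
    # All values arrive as strings; "out" records carry no max_delta field.
    if float(stream.get("max_delta", 0)) > MAX_DELTA_MS:
        flags.append("max_delta")
    if float(stream.get("lost_percentage", 0)) > LOSS_PERCENT:
        flags.append("lost_percentage")
    if float(stream.get("jitter", 0)) > JITTER_MS:
        flags.append("jitter")
    return flags


# Shortened copy of the caller-side records from the example report.
report = json.loads("""[
  {"dir": "in", "max_delta": "14851",
   "lost_percentage": "3.295482596889657", "jitter": "13"},
  {"dir": "out", "lost_percentage": "6", "jitter": "39"}
]""")

for stream in report:
    print(stream["dir"], qos_flags(stream))
```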

7.2.6. Registration Dashboard

The registration dashboard helps to figure out if registration works for SIP users and also identify where they are coming from by analyzing registration events (see Section Registration Events). Events reporting on expired registrations are of particular concern because often they mean a user cannot be reached by signaling messages. Most often this is caused by broken home routers, corporate firewalls with a too strict policy, or imperfect SIP client implementations that ignore some important nuances of the SIP RFC3261 contact registration handshake.

The dashboard is structured in several parts shown in Figure Screenshot: Registration Dashboard. Below the statistics there is also a list of the actual registration details (not shown in the Figure).

The first row shows a timeline of the registration events. Our screenshot shows a usual situation in which in every time bucket the number of new registrations is about the same as the number of deleted and expired registrations. Unusual situations that can be captured here are connectivity outages, which show as an increase in expired registrations. Note that SIP devices that register properly and keep re-registering do not produce events as they cause neither a new registration, nor a deleted/expired one.

The second row shows a geographic map of registration events. This gives a rough idea of how the users are distributed in the world, even though it is not perfect. That’s because the map really shows the events. As mentioned in the previous paragraph, not every registered user necessarily produces registration events in the examined period of time.

The next row shows the use of transport protocols as reported in the registration events. These may include UDP, TCP, TLS and Websockets.

Finally we see the breakdown of SIP User-Agents, here FritzBox being the device producing the most registration events, and the user accounts that expire most often, probably as a result of some NAT traversal difficulties.

_images/mon-registrations.png

Figure 25: Screenshot: Registration Dashboard

7.2.7. Connectivity CA Dashboard

This dashboard, which came with the 4.0 release, focuses on topology and visualizes statistics for calls between Call Agents. This helps to discover situations such as a destination Call Agent failing abnormally often to complete calls, or SIP compatibility issues on a link from one CA to another. The numbers visualized in this dashboard refer to the currently chosen time window, as is the case with all other dashboards. The screenshots shown here visualize a situation with five Call Agents.

There are two graphs in the top row that provide a quick glance at the situation. The first chart is a directed cyclic relationship graph showing how events, typically call-start, call-attempt and call-end, flow between Call Agents. The stronger a line, the more traffic the events represent on this route.

The table on the right-hand side shows statistics for calls by destination Call Agent. This table makes it possible to quickly assess the signaling performance of the CA.

_images/ma_ca_connnectivity.png

Figure 26: Screenshot: Top Part of the Connectivity CA Dashboard

In the next rows there are several CAxCA matrixes that visualize the following characteristics of calls between source CA (Y axis) and destination CA (X axis):

  • number of call attempts
  • connection failure ratio, i.e. number of call attempts divided by sum of call-attempts and call-starts
  • duration of calls between CAs
  • number of completed calls

Darker colors represent higher numbers; hovering with the mouse over a field shows the actual numbers. In the example screenshot, the darkest failure ratio of 78.9% is shown for the pair proxy->users_WebRTC.
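The failure ratio defined in the list above is a simple quotient. The following Python sketch computes it for one source/destination pair; the event counts are invented and merely chosen so that the result resembles the value in the screenshot, and returning 0.0 for a pair without any events is an assumption of this example.

```python
def failure_ratio(call_attempts: int, call_starts: int) -> float:
    """Connection failure ratio: attempts / (attempts + starts)."""
    total = call_attempts + call_starts
    if total == 0:
        # No events for this CA pair; treat as "no failures" (assumption).
        return 0.0
    return call_attempts / total


# Invented counts: 15 call-attempt events and 4 call-start events.
print(f"{failure_ratio(15, 4):.1%}")
```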

_images/mon_dst_cas.png

Figure 27: Screenshot: Bottom Part of the Connectivity CA Dashboard

The CA Connectivity Dashboard can be used for traffic analysis the same way as laid out in the Section HOWTO Find a Needle in the Haystack: Iterative Event Filtering. The administrator starts by finding some aggregated value which appears worth investigating. It can be, for example, an unusually high failure rate for a SIP connection from Call Agent “proxy” to Call Agent “users_webRTC” as shown in Figure Screenshot: a CA-CA connection Matrix with High Failure Rate. This connection shows a 78.9% failure rate. It therefore pays off to investigate it in detail.

_images/mon-high-ca-failure.png

Figure 28: Screenshot: a CA-CA connection Matrix with High Failure Rate

Narrowing events down to those concerning this connection is as easy as clicking on that particular field in the connection matrix and confirming the resulting filter as shown in Figure Screenshot: Applying a CA-CA filter.

_images/mon-ca2ca-filter.png

Figure 29: Screenshot: Applying a CA-CA filter

After applying and pinning the filters, one can switch to the Call Dashboard and inspect the failures for this particular connection in detail. Here, one can find that 500 SIP responses dominate and inspect the details of the respective events.

_images/mon-ca2ca-500.png

Figure 30: Screenshot: Finding out the Root Cause of High CA-CA Failure rate

The bottom-most Connectivity Dashboard lane shows availability of the monitored Call Agents. Only Call Agents for which monitoring has been enabled are shown (see Section IP Blacklisting: Adaptive Availability Management). The 0 status represents a Call Agent that is reachable; all other values represent some kind of connectivity issue (unreachable, DNS-unresolvable, overloaded or returning a negative response).

_images/mon_availability_lanes.png

Figure 31: Screenshot: Call Agent Availability Lanes

7.2.8. Security Dashboard

The Security Dashboard is perhaps one of the most important ones as it tracks events relating to security as explained in Section Security Events and attempts to answer the question of where the attacks are coming from. Occurrence of such events may indicate an attack that can compromise the security of a SIP user or of the whole service. A detailed discussion of security techniques recommended to fortify a SIP service against attacks is provided in Section Securing SIP Networks using ABC SBC and ABC Monitor (optional).

The security dashboard comes with three important charts in the Toplist section: the most frequent offenders by originating IP address, /24 subnet and geographic region.

Unless an attacker is mounting a sophisticated distributed attack, the top-list shows which IP address is causing most of the offending traffic. It takes a single click to limit all events to those caused by the offending IP address, inspect these and undertake appropriate security measures, typically blocking the IP address. Even if more sophisticated attackers send small batches of traffic from multiple IP addresses in the same subnet, they will appear on the /24 subnet toplist.

The geographic map is also very important from the security point of view. Even though most attackers do not have many IP addresses at their disposal, they sometimes do use more than one subnet to stay under the radar. As long as they do not use VPNs, these can be tied down by their geographic region.
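The reason why the /24 toplist catches traffic spread over neighbouring addresses can be shown with a short aggregation. This Python sketch uses only the standard library; the list of offending source addresses is invented (taken from the address ranges reserved for documentation) and the helper name subnet24 is hypothetical.

```python
import ipaddress
from collections import Counter


def subnet24(ip: str) -> str:
    """Return the /24 network that contains the given IPv4 address."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


# Invented source addresses of security events.
offenders = ["203.0.113.5", "203.0.113.77", "203.0.113.200", "198.51.100.9"]

by_ip = Counter(offenders)
by_subnet = Counter(subnet24(ip) for ip in offenders)

# Each address appears only once, yet one subnet clearly stands out.
print(by_ip.most_common(1))
print(by_subnet.most_common(1))
```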

An example of a situation in a public SIP service is shown in Figure Example of the Security Dashboard. It shows the most active IP addresses violating the SIP site’s policies, and also their break-down by subnet and country, China being the most active source of offending traffic.

_images/mon_sec_dashboard_40.png

Figure 32: Example of the Security Dashboard

7.2.9. Exceeded Limits Dashboards

Trying to find unusual patterns can sometimes be a repetitive task. Therefore it is possible to raise alerts when some abnormal conditions repeat too often. The ABC Monitor allows administrators to configure such alerts under “Settings” and inspect the alerts in the Exceeded Limits Dashboard. There are several types of alerts, which are described in the subsequent subsections.

The dashboard is only of advisory nature: it highlights excessive traffic but does not take any further action. The administrator must act if he chooses to. The following example in Figure Exceeded Limits Dashboard shows various such alerts as they occur over time. The donut chart breaks down the number of alerts by their type; the most offending URIs are shown in the top-chart on the right-hand side.

_images/toolongcalls.png

Figure 33: Exceeded Limits Dashboard

Thresholds for the respective alert types can be configured from the Settings Menu.

_images/mon_settings.png

Figure 34: Alert Threshold Settings

7.2.9.1. Maximum Call Duration (max_duration)

This alert is raised when a call is completed that exceeded a maximum call length threshold. The threshold value is configurable under “Settings”. Default value is 10800 seconds (3 hours).

7.2.9.2. Too Frequent Call Attempts from a URI (call_start)

This alert is raised when a user identified by his URI makes too many call attempts. This can be caused for example by a SIP scanner. The number of attempts and the time-span are configurable under “Settings” and default to 10 attempts in the previous 10 minutes.
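Several of the alerts in this section follow the same pattern: count events per key within a sliding time window and compare the count with a limit. The Python sketch below illustrates that pattern with the default values of this alert. It is not the ABC Monitor implementation; the class name is hypothetical, timestamps are assumed to arrive in increasing order, and it is assumed that the alert fires as soon as the count reaches the limit.

```python
from collections import defaultdict, deque


class AttemptWindow:
    """Sliding-window counter of call attempts per URI."""

    def __init__(self, max_attempts: int = 10, window_seconds: int = 600):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # URI -> timestamps of attempts

    def record(self, uri: str, timestamp: float) -> bool:
        """Record one attempt; return True if the limit is reached."""
        attempts = self._seen[uri]
        attempts.append(timestamp)
        # Forget attempts that have fallen out of the time window.
        while attempts and attempts[0] <= timestamp - self.window_seconds:
            attempts.popleft()
        return len(attempts) >= self.max_attempts


window = AttemptWindow()
# Ten attempts from the same (invented) URI, ten seconds apart.
results = [window.record("sip:scanner@example.net", t)
           for t in range(0, 100, 10)]
print(results)
```

Only the tenth attempt in the example reaches the limit. A later attempt outside the ten-minute window starts counting from one again.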

7.2.9.3. Too Frequent Call Attempts or Short Calls from a URI (scanners)

This is similar to the previous alert except that very short calls below 0.5 seconds count towards the limit as well.

7.2.9.4. Repeated Traffic Shaping Violations from an IP (limit)

This alert is raised when too many limit events originate from a single IP address over a period of time. By default, 10 such limit event occurrences over the past 10 minutes will raise the alert.

7.2.9.5. Repeated Drop for an IP Address (message-drop)

This alert is raised when the rule action drop in an SBC drops an incoming SIP request from an IP address too often, by default 10 times in the past 10 minutes.

7.2.9.6. Too Many Authentication Failures from an IP Address (auth_failed)

This alert is raised when an authentication fails too many times from a single IP address. By default, 10 attempts in past 10 minutes from the same IP address will raise the alert.

7.2.9.7. Too Many Authentication Failures from a URI (auth_failed)

This alert is raised when an authentication fails too many times for a single URI. By default, 10 attempts in the past 10 minutes from the same URI will raise the alert.

7.2.9.8. Rapid Growth in Number of Security Events (security_metrics)

This alert is raised when the number of security events (drop, limit, auth-failed, log-reply) begins to grow too quickly.

7.2.9.9. A URI Active from behind too many IP addresses (many_IPs)

This alert detects situations in which a user, as identified by his From URI, is spotted at too many IP addresses. This may be caused by both legitimate and illegitimate behaviour. Sometimes users like to be reachable under the same URI at multiple destinations (office, home, second home), or multiple call agents may be registered under the same call center’s SIP AoR. However, it may also be a case of identity theft. The alert includes the number of IPs found, and the actual IP addresses if there are fewer than five of them. The alert doesn’t repeat until the next day.

7.2.9.10. Too Many Users behind a single IP Address (many_URIs)

This alert is triggered when events from too many users appear to come from a single IP address. This may often be legitimate when there are multiple users behind a home NAT, carrier NAT, or a PBX. The URI count is shown in the alert (countURI field) and so are the actual URIs if there are fewer than five of them (URIs field). The alert doesn’t repeat until the next day.

7.2.9.11. Changed Country Alert (diff_country)

This alert is raised when a new registration, call attempt or call with the same From URI comes from a different country than previously in the past 24 hours. The alert may identify both legitimate cases (users or call-centers with presence in multiple countries) as well as identity theft. The field firstCountry shows the country that was encountered previously; geoip.country_name shows the current event’s country name.
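The check behind this alert can be pictured as remembering the last country seen for each From URI. The Python sketch below is a simplified illustration under that assumption; the class and method names are hypothetical, and the value it returns plays the role of the firstCountry field.

```python
from typing import Dict, Optional, Tuple

DAY_SECONDS = 24 * 3600


class CountryTracker:
    """Remember the last country seen for each From URI."""

    def __init__(self) -> None:
        self._last: Dict[str, Tuple[str, float]] = {}

    def observe(self, uri: str, country: str,
                timestamp: float) -> Optional[str]:
        """Return the previous country if it differs and is recent."""
        previous = self._last.get(uri)
        self._last[uri] = (country, timestamp)
        if previous is None:
            return None
        prev_country, prev_time = previous
        # Only a change within the past 24 hours counts as suspicious.
        if prev_country != country and timestamp - prev_time <= DAY_SECONDS:
            return prev_country
        return None


tracker = CountryTracker()
print(tracker.observe("sip:wesley@frafos.net", "Germany", 0))
print(tracker.observe("sip:wesley@frafos.net", "Brazil", 3600))
```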

7.2.9.12. Too Many Minutes from a User (too_many_minutes)

This alert is raised for tagged calls when a user identified by his From URI address accumulates too many call minutes in the observed period of time. By default, 7260 minutes in the past two hours will trigger the alert. The field “durationSum” shows the offending number of seconds.

To tag calls to count against the many-minutes alert, set the call variable minute_counter to the value enabled. This can be particularly useful in topologies when a call on a way to and from a PBX passes the SBC more than once, see Figure Screenshot: Setting Minute Counter Call Variable.

_images/sbc_minute_counter.png

Figure 35: Screenshot: Setting Minute Counter Call Variable

7.2.9.13. Underperforming Destination Call Agent (poor_failure_ratio_ca)

This alert is raised when the number of call attempts is high relative to the number of successful calls. This may often be the case when a destination Call Agent begins to be overloaded. The alert is raised when the failure ratio exceeds 90%.

7.2.10. System Dashboard

The system dashboard shows utilization of the ABC SBC Linux operating system: system load, memory and CPU.

The example screenshot shown in Figure System Dashboard shows utilization of an SBC. The situation here is normal as all the values keep oscillating within a fixed range.

_images/mon-system-40.png

Figure 36: System Dashboard

7.2.11. Network and Statistics Dashboard

The network statistics dashboard shows the amount of traffic processed by all of the managed SBCs, both at high level (number of calls and registrations) and low level (number of bytes and packets). It also shows statistics of automated blacklisting.

The example in the Figure Network Statistics Dashboard Capturing a Failover Situation shows a typical situation on a public SIP service. The number of registrations remains fixed over time at about 3 thousand. Parallel calls peak at 8 PM, and a moderate number of auto-blacklisted IP addresses reaches slightly above one hundred. The number of greylisted IP addresses is quite high though: at 70,000 and constantly increasing. That is clear evidence that the public SIP service is continuously subject to SIP scanning. The number of IP addresses that have passed greylisting is slightly higher than the number of registrations: obviously some registrations and re-registrations occur in about the same quantity, leaving the number of current registrations constant and increasing the number of IP addresses that have been accepted over time.

_images/mon-networksstats-40.png

Figure 37: Network Statistics Dashboard Capturing a Failover Situation

The number of calls and registrations is particularly important: a dip almost always indicates some abnormal network conditions. For example, if the SIP service loses its IP connectivity, SIP re-registrations will not reach it and subsequently the number of current registrations drops.

7.2.12. Diagnostics Dashboard

The diagnostics dashboard collects details that help with troubleshooting of low-level problems. These may include SIP and SDP interoperability problems, or QoS problems. The dashboard shows events described in Section Diagnostics Events. ABC Monitor visualizes these events in several dashboards: Diagnostics, Transport and Connectivity CA.

Most of these events appear only when activated by the administrator in the ABC SBC rules. The key diagnostics feature is the ability to store PCAP files of SIP/RTP and WAV audio files. Being able to inspect these retrospectively allows administrators to find a problem which appears only transiently and is hard to reproduce.

In order for these files to appear in the dashboard, the ABC SBC must be configured to produce them. Once configured for selected calls, as soon as they complete, ABC SBC uploads the resulting WAV and PCAP files to ABC Monitor. The administrator can then download them from the diagnostics dashboard by clicking on the respective event details. It is also possible to see the SIP traffic in the form of a ladder chart as shown in Figure Screenshot: Ladder Diagram for the Suspicious User.

It is worth noting that this ABC SBC capability to report on the traffic as seen “from inside” is superior to the capability of external snooping-based monitoring equipment. The “insider view” makes it possible to analyze such SIP traffic even when it is encrypted on the net, or obfuscated using Topology Hiding (see Section Topology Hiding).

To activate recording of the SIP traffic one must use the action “Log received traffic”. When this action is called for a SIP call, the SIP signaling is recorded in a PCAP file, optionally including RTP traffic.

_images/10000000000003C5000000676B8F84EF.png

Figure 38: Configuring traffic capturing

When recording completes, a “message-log” event is produced that includes a reference to the stored PCAP file.

The parameter “PCAP file name” allows administrators to define their own filename for the PCAP files. Using Replacement expressions (see Using Replacements in Rules) one can include SIP message elements in the filename that may make identifying and sorting the recorded files easier. If no filename is chosen, the ABC SBC chooses its own ephemeral filenames. A fixed name may result in mixed PCAPs for different transactions. If a custom filename is used, it is recommended not to use a fixed filename but to include some date or time variable replacement in the filename to make it unique. In any case, the filename is relative to the path /data/traffic_log to avoid conflicts with the filesystem. Use a filename with the .pcap extension.

Note that this action cannot be used multiple times for the same call meaningfully. In such a case an error is reported in the SBC process log and only the first used logging action takes effect. Also, due to the “inside view” nature of how the packets are captured, they may display some minor differences from the actual traffic as seen on the net. Specifically, TCP headers are not shown for SIP traffic sent using TCP.

Custom events also appear in this dashboard and also require a proper configuration on the SBC side as shown in the Section Diagnostics Events.

7.2.13. Monitor Troubleshooting

Should the ABC Monitor itself become a bottleneck in a network, it is a good idea to check its status. To see the status page, open its URL with the path “/status” as shown in Figure Configuring traffic capturing. If the status shown on the page is not “green”, collect the statistics and contact frafos support.

_images/kibana_status.png

Figure 39: Configuring traffic capturing

Footnotes

[1]For in-depth discussion of packet bursts we recommend the following article: http://www.voiptroubleshooter.com/indepth/burstloss.html
[2]For in-depth discussion of packet loss we recommend the following article: http://www.voiptroubleshooter.com/problems/packetloss.html
[3]For in-depth discussion of jitter and its sources we recommend the following article: http://www.voiptroubleshooter.com/indepth/jittersources.html
