Overview of SignalFx Microservices APM

SignalFx’s Microservices APM (µAPM) offers unique insights into distributed applications to enable application performance monitoring, troubleshooting, and root-cause analysis through end-to-end distributed tracing. Leveraging SignalFx’s advanced metrics and analytics capabilities, SignalFx Microservices APM automatically identifies the right traces to retain for problem isolation and data exploration, while also capturing 100% of metrics for all spans and traces.

SignalFx µAPM architecture

SignalFx µAPM begins with application instrumentation to capture end-to-end transaction traces from your distributed applications. Trace spans from an application are first sent to the local SignalFx Smart Agent, then to the SignalFx Smart Gateway. The Smart Gateway runs in your environment and acts as a central aggregation point for all trace spans sent from applications that are instrumented for µAPM (see illustration below). In addition to implementing SignalFx’s proprietary tail-based sampling of traces, the Smart Gateway is responsible for generating trace-related metrics and metrics metadata.

../../_images/arch-overview-1.png

The Smart Gateway can also be clustered to scale up for greater span-processing capacity, as shown below.

../../_images/arch-overview-2.png

The tracing metrics and the traces selected for retention are then sent to the SignalFx SaaS platform where they can be explored, visualized and analyzed.
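
As a rough illustration of the two responsibilities described above (metrics derived from every span, and retention of only the interesting traces), the following Python sketch applies a simple tail-based rule to Zipkin-style span dictionaries. It is not SignalFx’s proprietary sampling algorithm: the function names, the error-tag check, and the `latency_factor` threshold are assumptions made up for this example.

```python
from collections import defaultdict
from statistics import median


def group_by_trace(spans):
    """Group spans by trace ID so that each trace can be judged as a whole."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["traceId"]].append(span)
    return dict(traces)


def trace_duration(trace):
    """End-to-end duration in microseconds: latest span end minus earliest start."""
    start = min(s["timestamp"] for s in trace)
    end = max(s["timestamp"] + s["duration"] for s in trace)
    return end - start


def has_error(trace):
    """Assumption: a span signals an error by carrying an "error" tag."""
    return any("error" in s.get("tags", {}) for s in trace)


def select_traces(spans, latency_factor=3.0):
    """Return the sorted IDs of traces worth keeping.

    The decision is made only after all spans of a trace are known (the
    "tail" of tail-based sampling). A trace is kept if it contains an error
    or if it is much slower than the median trace. The factor of 3 is an
    arbitrary illustrative threshold.
    """
    traces = group_by_trace(spans)
    if not traces:
        return []
    durations = {tid: trace_duration(t) for tid, t in traces.items()}
    typical = median(durations.values())
    return sorted(
        tid
        for tid, trace in traces.items()
        if has_error(trace) or durations[tid] > latency_factor * typical
    )


def span_metrics(spans):
    """Count and total duration per (service, operation), over every span.

    Metrics are computed from all spans, including those that belong to
    traces that are not retained.
    """
    metrics = defaultdict(lambda: {"count": 0, "total_duration_us": 0})
    for s in spans:
        key = (s["localEndpoint"]["serviceName"], s["name"])
        metrics[key]["count"] += 1
        metrics[key]["total_duration_us"] += s["duration"]
    return dict(metrics)
```

The point of the sketch is the ordering: metrics see every span, while the keep-or-drop decision for a trace is deferred until the whole trace has been observed.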

Next steps

Review the rest of this section to become familiar with the basic concepts of distributed tracing and with the features and capabilities of SignalFx Microservices APM, then follow the µAPM Quick Start Guide to learn how to start sending traces to SignalFx.

Key elements of SignalFx µAPM

The following elements make up SignalFx’s Microservices APM:

  • Instrumentation

    SignalFx is instrumentation-agnostic: customers can send spans and traces instrumented via Zipkin, OpenTracing, or OpenCensus libraries, as long as they are sent using Zipkin’s v1 or v2 JSON wire format or Jaeger’s Thrift wire format. SignalFx can also ingest data from a variety of service meshes such as Istio (via SignalFx’s mixer adapter), or directly from data planes such as Envoy or Linkerd. In addition, SignalFx provides out-of-the-box instrumentation for popular frameworks and libraries in languages such as Java, Python, and Ruby.

  • NoSample™ architecture

    SignalFx implements tail-based sampling for distributed applications through its NoSample™ architecture. The SignalFx Smart Gateway observes every transaction across distributed services and prioritizes interesting outlier traces, so that traces with higher latency, errors, or rare execution paths are retained. Randomly selecting traces to keep, by contrast, often misses these important traces.

    If you would like a deep dive into our NoSample architecture, see this blog post.

  • Real-time streaming of traces and spans

    The SignalFx Smart Gateway generates metrics for each unique span and trace path. These metrics help establish what “normal” looks like for any span or trace, and how far from normal the trace or set of traces you are looking at is. The metrics are streamed in real time through SignalFx’s streaming architecture and can use all the analytics and detector capabilities provided with the metrics product.

  • Application/infrastructure correlation

    Host metrics for each service, along with host information for each span, let you determine if the underlying infrastructure is contributing to your application’s performance degradation, helping you quickly isolate problems.

  • Service map-enabled visualization and dashboards

    SignalFx’s Smart Gateway captures 100% of metrics for each unique span/operation and trace execution path (endpoint), and provides out-of-the-box service and endpoint dashboards containing relevant health metrics contextualized with end-to-end service maps, along with the associated infrastructure host metrics for those services and endpoints. Service dependencies are automatically discovered and service maps are dynamically generated from recorded trace data.

  • Analytics and alerting

    All metrics generated by SignalFx’s Smart Gateway stream through SignalFx’s advanced metrics architecture, so analytics and advanced alerting can be applied to them in real time, just as with all other metrics.

  • Flexible trace search and directed troubleshooting

    SignalFx’s flexible trace search lets you search traces by any combination of service name, operation name, tags, duration, or time window. Results are shown with a dynamic service map, a trace volume chart, and a latency histogram. This flexibility makes finding the relevant set of transactions quick and easy. Additionally, the Outlier Analyzer provides a prescriptive approach to troubleshooting by surfacing hidden patterns in long-tail transactions at an aggregate level.

  • Contextual alerting

    SignalFx’s built-in dynamic detector templates make monitoring services and endpoints a breeze.

  • Seamless metrics-to-trace workflow

    Users can navigate to the right set of traces from any chart or alert, carrying over the time context and, in the near future, other dimensions.
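
The automatic discovery of service dependencies mentioned under service maps above can be pictured with a small Python sketch. It assumes Zipkin-style span dictionaries in which every span has an `id` that is unique within its trace and child spans reference their parent through `parentId`. The function name `service_edges` is hypothetical, and this is only an illustration of the idea, not SignalFx’s implementation.

```python
from collections import Counter


def service_edges(spans):
    """Derive caller -> callee service edges from parent/child span links.

    Returns a dict mapping (caller_service, callee_service) to the number
    of calls observed. Spans whose parent is in the same service are
    treated as internal work and do not produce an edge.
    """
    # Index spans by (trace ID, span ID) so parents can be looked up.
    by_id = {(s["traceId"], s["id"]): s for s in spans}
    edges = Counter()
    for span in spans:
        parent = by_id.get((span["traceId"], span.get("parentId")))
        if parent is None:
            continue  # root span, or parent not received
        caller = parent["localEndpoint"]["serviceName"]
        callee = span["localEndpoint"]["serviceName"]
        if caller != callee:
            edges[(caller, callee)] += 1
    return dict(edges)
```

Each resulting edge corresponds to one arrow on a service map, and the counts give a sense of call volume between services.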

Assumptions and requirements

Smart Agent and Smart Gateway

SignalFx Microservices APM requires the deployment of the SignalFx Smart Agent on each instance that hosts an instrumented application, as well as the deployment of at least one instance of the SignalFx Smart Gateway in your environment. Applications should be configured to report all trace spans to their local Smart Agent, which then forwards them to the deployed Smart Gateway. The Smart Gateway then reports metrics from the processed traces, and the traces selected by NoSample™, to SignalFx.

The Smart Agent must be able to communicate with the Smart Gateway over HTTP on the port defined in the Smart Gateway’s listener configuration (typically port 8080). The Smart Gateway must be able to communicate with SignalFx via HTTPS (port 443) over the public internet.
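
Before deploying, it can help to confirm that these network paths are open. The following Python sketch checks plain TCP reachability only; it does not validate HTTP or TLS behavior. The function name `can_connect` is hypothetical, and any host names you pass to it are specific to your environment.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, you might run `can_connect(gateway_host, 8080)` from a host running the Smart Agent, and a check against port 443 of your SignalFx endpoint from the Smart Gateway host, where `gateway_host` stands for whatever address your Smart Gateway listens on.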

Both the Smart Agent and the Smart Gateway are available as statically linked x86_64 ELF (Linux) binaries and therefore have no library dependencies. They are also available as Docker container images for ease of deployment and orchestration.

Supported span formats

Applications are expected to send trace spans over the wire using either Zipkin’s JSON format or Jaeger’s Thrift format, while respecting the OpenTracing specification and semantics. Support for OpenCensus’s wire format will be added in the future; in the meantime, OpenCensus libraries can be configured to send trace spans using Zipkin’s or Jaeger’s wire format. If you use any of SignalFx’s automatic instrumentation agents or libraries, your applications will be automatically configured to comply with these requirements.
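
To make the Zipkin option concrete, here is a minimal Python sketch that builds a span in the Zipkin v2 JSON shape and serializes a batch of spans as a JSON array. The function names `make_span` and `encode_spans`, and the sample service and operation names, are hypothetical. The endpoint to which the payload would be posted depends on your Smart Agent configuration and is deliberately not shown.

```python
import json
import secrets
import time


def make_span(service, operation, duration_us, trace_id=None,
              parent_id=None, kind="SERVER", tags=None):
    """Build one span as a dict following the Zipkin v2 JSON field names."""
    span = {
        # 128-bit trace ID as 32 lowercase hex characters
        "traceId": trace_id or secrets.token_hex(16),
        # 64-bit span ID as 16 lowercase hex characters
        "id": secrets.token_hex(8),
        "name": operation,
        "kind": kind,  # CLIENT, SERVER, PRODUCER, or CONSUMER
        # Start time in microseconds since the Unix epoch
        "timestamp": int(time.time() * 1_000_000),
        "duration": duration_us,  # microseconds
        "localEndpoint": {"serviceName": service},
        # Zipkin v2 tags map strings to strings
        "tags": {str(k): str(v) for k, v in (tags or {}).items()},
    }
    if parent_id is not None:
        span["parentId"] = parent_id
    return span


def encode_spans(spans):
    """Serialize a list of spans as the JSON array Zipkin v2 expects."""
    return json.dumps(spans).encode("utf-8")


# Example: a root span and one child span in the same trace.
root = make_span("checkout", "GET /cart", 1500, tags={"http.status_code": 200})
child = make_span("inventory", "lookup", 400, trace_id=root["traceId"],
                  parent_id=root["id"], kind="CLIENT")
payload = encode_spans([root, child])
```

In practice, a tracing library or one of SignalFx’s instrumentation agents produces this payload for you; the sketch only shows what travels over the wire.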

Getting around the µAPM UI

SignalFx µAPM provides three main pages for viewing your services, operations, traces, and spans. These pages are all available by hovering over µAPM on the navigation bar.

../../_images/apm-dropdown.png
  • The Services page provides a “big picture” overview of your environment.

    ../../_images/services-page.png
  • The Traces page lets you “slice and dice” your trace data to narrow in on traces that can help you find and diagnose problems in your code.

    ../../_images/traces-page.png
  • The View a trace page lets you navigate through spans in an individual trace to find the precise area where there may be problems in your code.

    ../../_images/view-trace-no-legend.png

Additionally, SignalFx provides a built-in dashboard group with dashboards that display information about specified services and endpoints, and another dashboard group with dashboards that display information about the SignalFx Smart Gateway.

../../_images/service-dashboard.png


../../_images/clusters-dashboard.png

For an overview of how all these pages work together to help you quickly locate problem areas in your code, see Example: Finding the Root Cause of a Problem.