Observability

This lab explores one of the main strengths of Istio: observability.

The services in our mesh are automatically made observable, without placing any additional burden on DevOps teams.

Deploy the Addons

The Istio distribution provides addons for a number of systems that together provide observability for the service mesh:

  • Zipkin or Jaeger for distributed tracing
  • Prometheus for metrics collection
  • Grafana for monitoring dashboards, using Prometheus as the data source
  • Kiali for visualizing the mesh

These addons are located in the samples/addons/ folder of the distribution.

  1. Navigate to the addons directory:

    cd ~/istio-1.22.0/samples/addons
    
  2. Deploy each addon:

    kubectl apply -f prometheus.yaml
    
    kubectl apply -f grafana.yaml
    
    kubectl apply -f extras/zipkin.yaml
    
    kubectl apply -f kiali.yaml
    
  3. To enable distributed tracing, we must explicitly define a tracing provider and enable it in the mesh. The following IstioOperator configuration defines Zipkin as the provider and sets the trace sampling rate to 100% of requests (convenient for a lab; production deployments typically sample far fewer):

    ---
    apiVersion: install.istio.io/v1alpha1
    kind: IstioOperator
    spec:
      profile: default
      meshConfig:
        enableTracing: true
        defaultConfig:
          tracing:
            sampling: 100.0
        extensionProviders:
        - name: zipkin
          zipkin:
            service: zipkin.istio-system.svc.cluster.local
            port: 9411
    
    Apply this configuration, which is saved as observability/trace-config.yaml:

    istioctl install -f observability/trace-config.yaml
    

    Then, enable tracing in the mesh:

    kubectl apply -f observability/enable-tracing.yaml
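    The contents of observability/enable-tracing.yaml are not shown here; with the zipkin extension provider defined above, mesh-wide tracing is typically enabled with a Telemetry resource along these lines (a sketch; the actual file in the lab materials may differ):

    ```yaml
    apiVersion: telemetry.istio.io/v1alpha1
    kind: Telemetry
    metadata:
      name: mesh-default
      namespace: istio-system   # placing it in istio-system scopes it to the whole mesh
    spec:
      tracing:
      - providers:
        - name: zipkin          # must match the extensionProviders entry defined above
    ```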
    
  4. Verify that the istio-system namespace is now running additional workloads for each of the addons.

    kubectl get pod -n istio-system
    

The istioctl CLI provides convenience commands for accessing the web UIs for each dashboard.

Take a moment to review the help information for the istioctl dashboard command:

istioctl dashboard --help

Generate a load

In order to have something to observe, we need to generate a load on our system.

Use a simple bash while loop to make repeated curl requests to the app:

while true; do curl -I http://$GATEWAY_IP; sleep 0.5; done

The curl requests will run in the foreground. It may be simplest to obtain a new shell prompt by opening a second, separate terminal.

Kiali

Launch the Kiali dashboard:

istioctl dashboard kiali

Warning

If the dashboard page fails to open, just click on the hyperlink in the console output.

Note

The istioctl dashboard command also blocks. Leave it running until you're finished using the dashboard, at which time press Ctrl+C to interrupt the process and get back to the terminal prompt.

The Kiali dashboard displays.

Customize the view as follows:

  1. Select the Graph section from the sidebar.
  2. Under Select Namespaces (at the top of the page), select the default namespace, the location where the application's pods are running.
  3. From the third drop-down menu, select App graph.
  4. From the Display drop-down, toggle on Traffic Animation and Security.
  5. From the footer, toggle the legend so that it is visible. Take a moment to familiarize yourself with the legend.

Observe the visualization and note the following:

  • We can see traffic coming in through the ingress gateway to the web-frontend, and the subsequent calls from the web-frontend to the customers service.
  • The lines connecting the services are green, indicating healthy requests.
  • The small lock icon on each edge in the graph indicates that the traffic is secured with mutual TLS.

Such visualizations are helpful for understanding the flow of requests in the mesh, and for diagnosing problems.

Feel free to spend more time exploring Kiali.

We will revisit Kiali in a later lab to visualize traffic shifting such as when performing a blue-green or canary deployment.

Kiali Cleanup

Close the Kiali dashboard. Interrupt the istioctl dashboard kiali command by pressing Ctrl+C.

Zipkin

Launch the Zipkin dashboard:

istioctl dashboard zipkin

The Zipkin dashboard displays.

  • Click on the red '+' button and select serviceName.
  • Select the service named web-frontend.default and click the light-blue Run Query button on the right.

A number of query results will display. Each row is expandable and will display more detail in terms of the services participating in that particular trace.

  • Click the Show button to the right of one of the traces having four (4) spans.

The resulting view shows spans that are part of the trace, and more importantly how much time was spent within each span. Such information can help diagnose slow requests and pin-point where the latency lies.

Distributed tracing also helps us make sense of the flow of requests in a microservice architecture.

Zipkin Cleanup

Close the Zipkin dashboard. Interrupt the istioctl dashboard zipkin command with Ctrl+C.

Prometheus

Prometheus works by periodically calling a metrics endpoint on each running service (this endpoint is termed the "scrape" endpoint). Developers normally have to instrument their applications to expose such an endpoint and to return metrics in the format that Prometheus expects.

With Istio, this is done automatically by the Envoy sidecar.
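Concretely, the sidecar injector annotates each pod so that Prometheus can discover the sidecar's merged metrics endpoint. The annotations typically look like the following (exact values can vary by Istio version and configuration):

```yaml
# Pod annotations added by Istio's sidecar injection (illustrative):
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "15020"               # the pilot-agent's merged-metrics port
    prometheus.io/path: "/stats/prometheus"   # the scrape endpoint path
```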

Observe how Envoy exposes a Prometheus scrape endpoint

  1. Run the following command:

    kubectl exec svc/customers -- curl -s localhost:15020/stats/prometheus \
      | grep istio_requests
    

    Why port 15020?

    See Ports used by Istio sidecar proxy.

    The list of metrics returned by the endpoint is rather lengthy, so we peek at just the istio_requests metric; the full response contains many more metrics.
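    For reference, each line returned by the scrape endpoint is a single labeled sample. An istio_requests_total line looks roughly like the following (labels abridged and the counter value purely illustrative):

    ```
    istio_requests_total{reporter="destination",source_workload="web-frontend",destination_workload="customers",response_code="200"} 27
    ```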

Access the dashboard

  1. Start the Prometheus dashboard:

    istioctl dashboard prometheus
    
  2. In the search field enter the metric named istio_requests_total, and click the Execute button (on the right).

  3. Select the tab named Graph to obtain a graphical representation of this metric over time.

    Note that you are looking at requests across the entire mesh, i.e. this includes both requests to web-frontend and to customers.

  4. As an example of Prometheus' dimensional metrics capability, we can ask for total requests having a response code of 200:

    istio_requests_total{response_code="200"}
    
  5. With respect to requests, it's more interesting to look at the rate of incoming requests over a time window. Try:

    rate(istio_requests_total[5m])
    

There's much more to the Prometheus query language, PromQL; the official Prometheus querying documentation is a good place to start.
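For example, the request rate can be broken down by destination service, or restricted to errors. These queries assume the standard Istio metric labels (destination_service, reporter, response_code):

```
# Per-second request rate per destination service, over a 5-minute window:
sum(rate(istio_requests_total{reporter="destination"}[5m])) by (destination_service)

# Rate of server-error (5xx) responses only:
sum(rate(istio_requests_total{response_code=~"5.."}[5m]))
```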

Grafana consumes these metrics to produce graphs on our behalf.

Close the Prometheus dashboard and terminate the corresponding istioctl dashboard command.

Grafana

  1. Launch the Grafana dashboard:

    istioctl dashboard grafana
    
  2. From the sidebar, select Dashboards.

  3. Click on the folder named Istio to reveal pre-designed Istio-specific Grafana dashboards.
  4. Explore the Istio Mesh Dashboard. Note the Global Request Volume and Global Success Rate.
  5. Explore the Istio Service Dashboard. First select the service web-frontend and inspect its metrics, then switch to the customers service and review its dashboard.
  6. Explore the Istio Workload Dashboard. Select the web-frontend workload. Look at Outbound Services and note the outbound requests to the customers service. Select the customers workload and note that it makes no Outbound Services calls.

Feel free to further explore these dashboards.

Cleanup

  1. Terminate the istioctl dashboard command (Ctrl+C)
  2. Likewise, terminate the bash while loop.

Next

We turn our attention next to security features of a service mesh.