Observability
This lab explores one of the main strengths of Istio: observability.
The services in our mesh are made observable automatically, without placing any extra burden on devops teams.
Deploy the Addons
The Istio distribution includes addons for several systems that together provide observability for the service mesh:
- Zipkin or Jaeger for distributed tracing
- Prometheus for metrics collection
- Grafana for monitoring dashboards, using Prometheus as the data source
- Kiali for visualizing the mesh
These addons are located in the samples/addons/ folder of the distribution.
Navigate to the addons directory:
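A sketch of this step, assuming the distribution was extracted into a single istio-<version> directory under your current working directory (adjust the path to wherever you downloaded it):

```shell
# Assumption: exactly one extracted Istio distribution (istio-<version>) lives here.
cd istio-*/samples/addons
ls    # list the addon manifests available in this release
```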
Deploy each addon:
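One way to deploy the addons used in this lab, assuming a recent Istio release in which the Zipkin manifest lives under extras/ (file names and locations can differ between versions, so check the directory listing first):

```shell
kubectl apply -f extras/zipkin.yaml   # distributed tracing (jaeger.yaml is the alternative)
kubectl apply -f prometheus.yaml      # metrics collection
kubectl apply -f grafana.yaml         # dashboards backed by Prometheus
kubectl apply -f kiali.yaml           # mesh visualization
```

If a command reports errors the first time (for example, because a custom resource definition is not registered yet), re-running it usually resolves the timing issue.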
To enable distributed tracing, we must explicitly define a tracing provider and enable it in the mesh, as follows:
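A minimal sketch of the provider definition, assuming Istio was installed with istioctl and that the Zipkin addon exposes a Service named zipkin in istio-system on port 9411. The file name trace-config.yaml and the provider name zipkin are illustrative choices, not names required by Istio:

```shell
# Hypothetical file name; the provider name "zipkin" is what tracing gets enabled with.
cat > trace-config.yaml <<'EOF'
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    enableTracing: true
    extensionProviders:
    - name: zipkin
      zipkin:
        service: zipkin.istio-system.svc.cluster.local
        port: 9411
EOF

# istioctl install applies the complete configuration it is given, so merge in
# any other settings (such as a profile) that your original installation used.
istioctl install -y -f trace-config.yaml
```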
Then activate the provider for the mesh:
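A sketch of enabling tracing mesh-wide with Istio's Telemetry API, assuming the mesh configuration defines an extension provider named zipkin. A Telemetry resource in the root namespace (istio-system by default) applies to the whole mesh. The 100% sampling rate is an assumption that suits a lab, where every request should be traced; production meshes typically sample far less:

```shell
cat <<'EOF' | kubectl apply -f -
apiVersion: telemetry.istio.io/v1alpha1   # newer releases also accept telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  tracing:
  - providers:
    - name: zipkin                        # must match an extensionProviders entry in meshConfig
    randomSamplingPercentage: 100.0
EOF
```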
Verify that the istio-system namespace is now running additional workloads for each of the addons.
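For example (exact pod names and counts depend on your Istio version and the addons you applied):

```shell
kubectl get pod -n istio-system
```

Alongside istiod and, typically, the ingress gateway, look for pods whose names start with grafana, kiali, prometheus, and zipkin, all in the Running state.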
The istioctl CLI provides convenience commands for accessing the web UI of each addon.
Take a moment to review the help information for the istioctl dashboard command:
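The help is available through the standard --help flag:

```shell
istioctl dashboard --help
```

The output lists one subcommand per supported UI (kiali, zipkin, prometheus, and grafana among them); each sets up a port-forward to the corresponding service and opens it in a browser.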
Generate a load
In order to have something to observe, we need to generate a load on our system.
Use a simple bash while loop to make repeated curl requests to the app:
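A sketch of such a loop, assuming the ingress gateway is the Service istio-ingressgateway in istio-system with an external load-balancer IP, and that the application is served at the gateway's root path. GATEWAY_IP is a variable name chosen for this example:

```shell
# Assumption: the gateway Service exposes an external IP (e.g. a cloud LoadBalancer).
export GATEWAY_IP=$(kubectl get svc istio-ingressgateway -n istio-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Send a request roughly every half second and print only the HTTP status code.
while true; do
  curl -s -o /dev/null -w "%{http_code}\n" "http://$GATEWAY_IP/"
  sleep 0.5
done
```

Printing only the status code keeps the output readable while still confirming that requests succeed.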
The loop runs in the foreground and occupies the terminal, so it may be simplest to open a second, separate terminal for the remaining steps.
Kiali
Launch the Kiali dashboard:
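The command port-forwards to the Kiali service and opens the UI in your default browser:

```shell
istioctl dashboard kiali
```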
Warning
If the dashboard page fails to open, just click on the hyperlink in the console output.
Note
The istioctl dashboard command also blocks.
Leave it running until you're finished with the dashboard, then
press Ctrl+C to interrupt the process and return to the terminal prompt.
The Kiali dashboard displays.
Customize the view as follows:
- Select the Traffic Graph section from the sidebar.
- Under Select Namespaces (at the top of the page), select the default namespace, where the application's pods are running.
- From the third "pulldown" menu, select App graph.
- From the Display "pulldown", toggle on Traffic Animation and Security.
- From the footer, toggle the legend so that it is visible. Take a moment to familiarize yourself with the legend.
Observe the visualization and note the following:
- We can see traffic coming in through the ingress gateway to the web-frontend, and the subsequent calls from the web-frontend to the customers service.
- The lines connecting the services are green, indicating healthy requests.
- The small lock icon on each edge in the graph indicates that the traffic is secured with mutual TLS.
Such visualizations help us understand the flow of requests in the mesh and diagnose problems.
Feel free to spend more time exploring Kiali.
We will revisit Kiali in a later lab to visualize traffic shifting, such as during a blue-green or canary deployment.
Kiali Cleanup
Close the Kiali dashboard. Interrupt the istioctl dashboard kiali command by pressing Ctrl+C.
Zipkin
Launch the Zipkin dashboard:
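As with Kiali, the command port-forwards to the service and blocks until interrupted:

```shell
istioctl dashboard zipkin
```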
The Zipkin dashboard displays.
- Click on the red '+' button and select serviceName.
- Select the service named web-frontend.default and click the Run Query button (light blue) on the right.
Several query results display. Each row expands to show more detail about the services participating in that particular trace.
- Click the Show button to the right of one of the traces having four (4) spans.
The resulting view shows the spans that make up the trace and, more importantly, how much time was spent within each span. Such information can help diagnose slow requests and pinpoint where the latency lies.
Distributed tracing also helps us make sense of the flow of requests in a microservice architecture.
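The same trace data can also be fetched from Zipkin's v2 HTTP API, which is handy for scripting. A sketch, assuming the istioctl dashboard zipkin port-forward is still running and listening on localhost:9411 (check the URL it printed), run from another terminal:

```shell
# Fetch the most recent trace involving web-frontend.default; each span in the
# JSON response carries a "duration" field, expressed in microseconds.
curl -s "http://localhost:9411/api/v2/traces?serviceName=web-frontend.default&limit=1"
```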
Zipkin Cleanup
Close the Zipkin dashboard. Interrupt the istioctl dashboard zipkin command with Ctrl+C.
Prometheus
Prometheus works by periodically scraping a metrics endpoint exposed by each running service (this endpoint is termed the "scrape" endpoint). Developers normally have to instrument their applications to expose such an endpoint and return metrics in the format that Prometheus expects.
With Istio, this is done automatically by the Envoy sidecar.
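One way to see how Prometheus finds the sidecar's endpoint, assuming the customers pods carry the label app=customers: with Istio's metrics merging (enabled by default), the sidecar injector annotates each pod with prometheus.io/scrape, prometheus.io/port, and prometheus.io/path, directing Prometheus to the sidecar rather than to the application.

```shell
# Print the annotations of the first customers pod (app=customers is an assumed label).
# With metrics merging enabled, look for prometheus.io/port and prometheus.io/path entries.
kubectl get pod -l app=customers -o jsonpath='{.items[0].metadata.annotations}'
```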
Observe how Envoy exposes a Prometheus scrape endpoint
Run the following command:
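A sketch, assuming the customers pods are labeled app=customers and that the sidecar image includes curl (the default proxyv2 image does; distroless variants do not):

```shell
# Pick the first customers pod (app=customers is an assumed label).
CUSTOMERS_POD=$(kubectl get pod -l app=customers -o jsonpath='{.items[0].metadata.name}')

# Query the merged metrics endpoint from inside the istio-proxy container,
# then filter locally for the istio_requests lines.
kubectl exec "$CUSTOMERS_POD" -c istio-proxy -- \
  curl -s localhost:15020/stats/prometheus | grep istio_requests
```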
Why port 15020?
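Port 15020 serves the merged Prometheus telemetry from the Istio agent, Envoy, and the application. See Ports used by Istio sidecar proxy.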
The endpoint returns a lengthy list of metrics, so we only peek at the istio_requests metric; the full response contains many more.
Access the dashboard
Start the Prometheus dashboard:
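As with the other dashboards, the command blocks until interrupted:

```shell
istioctl dashboard prometheus
```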
In the search field, enter the metric named istio_requests_total and click the Execute button (on the right).
Select the tab named Graph to obtain a graphical representation of this metric over time.
Note that you are looking at requests across the entire mesh, i.e. this includes requests to both web-frontend and customers.
As an example of Prometheus' dimensional metrics capability, we can ask for total requests having a response code of 200:
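The PromQL expression for this is istio_requests_total{response_code="200"}; enter it in the search field as before. The same query can be sent to the Prometheus HTTP API from a second terminal, assuming the istioctl port-forward listens on localhost:9090 (check the URL it printed):

```shell
# -G sends the url-encoded query as a GET parameter to the /api/v1/query endpoint.
curl -s -G http://localhost:9090/api/v1/query \
  --data-urlencode 'query=istio_requests_total{response_code="200"}'
```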
With respect to requests, it's more interesting to look at the rate of incoming requests over a time window. Try:
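A query along these lines, where the two-minute window is an arbitrary choice for this example. In the UI, paste rate(istio_requests_total[2m]) into the search field and view it on the Graph tab; from a terminal, assuming the port-forward listens on localhost:9090:

```shell
# PromQL: per-second request rate for each series, averaged over the last 2 minutes.
#   rate(istio_requests_total[2m])
curl -s -G http://localhost:9090/api/v1/query \
  --data-urlencode 'query=rate(istio_requests_total[2m])'
```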
There's much more to the Prometheus query language (this may be a good place to start).
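For intuition about what rate() reports, the following bash sketch works through the arithmetic on two hypothetical samples of a counter such as istio_requests_total. It is a deliberate simplification, not Prometheus's implementation: real rate() considers every sample in the window, extrapolates toward the window boundaries, and corrects for resets between each pair of samples. The function name naive_rate is made up for this example.

```shell
#!/usr/bin/env bash
# naive_rate (hypothetical helper): per-second increase of a counter between the
# first and last sample of a window, a simplified model of PromQL's rate().
naive_rate() {
  local first=$1 last=$2 window_seconds=$3
  awk -v a="$first" -v b="$last" -v s="$window_seconds" 'BEGIN {
    increase = b - a
    # Counters only decrease when the process exposing them restarts, so treat a
    # drop as a reset to zero: everything counted since the reset is the increase.
    if (increase < 0) increase = b
    printf "%.2f\n", increase / s
  }'
}

# Hypothetical samples: the counter reads 1200, then 1440 two minutes later.
naive_rate 1200 1440 120   # prints 2.00 (requests per second)
# A restart reset the counter, which then climbed to 90 within the window.
naive_rate 1200 90 120     # prints 0.75
```

Because a counter only ever grows (apart from resets), its raw value mostly reflects how long the process has been running; the per-second rate is what tracks current load.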
Grafana consumes these metrics to produce graphs on our behalf.
Close the Prometheus dashboard and terminate the corresponding istioctl dashboard command.
Grafana
Launch the Grafana dashboard:
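Like the other dashboard commands, this one blocks until you press Ctrl+C:

```shell
istioctl dashboard grafana
```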
- From the sidebar, select Dashboards.
- Click on the folder named Istio to reveal pre-designed Istio-specific Grafana dashboards
- Explore the Istio Mesh Dashboard. Note the Global Request Volume and Global Success Rate.
- Explore the Istio Service Dashboard. First select the service web-frontend and inspect its metrics, then switch to the customers service and review its dashboard.
- Explore the Istio Workload Dashboard. Select the web-frontend workload. Look at Outbound Services and note the outbound requests to the customers service. Select the customers workload and note that it makes no Outbound Services calls.
Feel free to further explore these dashboards.
Cleanup
- Terminate the istioctl dashboard command (Ctrl+C).
- Likewise, terminate the bash while loop.
Next
We turn our attention next to security features of a service mesh.