Observability¶
This lab explores one of the main strengths of Istio: observability.
The services in our mesh are automatically made observable, without adding any burden on devops teams.
Deploy the Addons¶
The Istio distribution provides addons for a number of systems that together provide observability for the service mesh:
- Zipkin or Jaeger for distributed tracing
- Prometheus for metrics collection
- Grafana provides dashboards for monitoring, using Prometheus as the data source
- Kiali allows us to visualize the mesh
These addons are located in the samples/addons/ folder of the distribution.
- Navigate to the addons directory
- Deploy each addon:
- To enable distributed tracing, we must explicitly define a provider, and enable it in the mesh, as follows:
  Then:
- Verify that the istio-system namespace is now running additional workloads for each of the addons.
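The steps above can be sketched as a handful of shell helpers. This is a minimal sketch, not the lab's exact commands: the function names are hypothetical, the selection of addons (Prometheus, Grafana, Kiali, and Zipkin from the extras/ subfolder) and the provider name zipkin are assumptions, and the Zipkin address presumes the addon's service runs in istio-system on port 9411.

```shell
# Hypothetical helpers; run them from the root of the Istio distribution.

# Apply the addon manifests (an assumed selection) from the given directory.
deploy_addons() {
  local addons_dir="${1:-samples/addons}"
  local manifest
  for manifest in prometheus.yaml grafana.yaml kiali.yaml extras/zipkin.yaml; do
    kubectl apply -f "${addons_dir}/${manifest}" || return 1
  done
}

# Print an IstioOperator overlay that defines a tracing provider named "zipkin".
tracing_provider_manifest() {
  cat <<'EOF'
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    extensionProviders:
    - name: zipkin
      zipkin:
        service: zipkin.istio-system.svc.cluster.local
        port: 9411
EOF
}

# Print a mesh-wide Telemetry resource that enables that provider.
tracing_telemetry_manifest() {
  cat <<'EOF'
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  tracing:
  - providers:
    - name: zipkin
    randomSamplingPercentage: 100
EOF
}

# List the workloads now running in istio-system.
verify_addons() {
  kubectl get pods -n istio-system
}
```

One way to use these: call deploy_addons, save the output of tracing_provider_manifest to a file and pass it to istioctl install -f together with whatever options the mesh was originally installed with, pipe tracing_telemetry_manifest into kubectl apply -f -, and finish with verify_addons. Sampling every request suits a lab with light traffic; it is not meant as a production setting.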
The istioctl CLI provides convenience commands for accessing the web UIs for each dashboard.
Take a moment to review the help information for the istioctl dashboard command:
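The help text can be printed with istioctl dashboard --help. Since the rest of this lab opens four of those dashboards in turn, a small wrapper may be convenient. The sketch below is only an illustration: open_dashboard is a hypothetical name, and restricting it to these four subcommands is an assumption based on the addons this lab uses.

```shell
# Hypothetical wrapper around "istioctl dashboard" for the addons used in this lab.
open_dashboard() {
  local name="$1"
  case "$name" in
    kiali|zipkin|prometheus|grafana)
      # Blocks until interrupted with Ctrl+C.
      istioctl dashboard "$name"
      ;;
    *)
      echo "usage: open_dashboard kiali|zipkin|prometheus|grafana" >&2
      return 2
      ;;
  esac
}
```

For example, open_dashboard kiali is equivalent to running istioctl dashboard kiali directly.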
Generate a load¶
In order to have something to observe, we need to generate a load on our system.
Use a simple bash while loop to make repeated curl requests to the app:
The curl requests will run in the foreground. It may be simplest to obtain a new shell prompt by opening a second, separate terminal.
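The loop described above might look like the following sketch. The function name generate_load, the optional request limit, and the half-second default delay are assumptions added for illustration; the URL argument stands for wherever the ingress gateway exposes the app in your environment.

```shell
# Hypothetical helper: send repeated requests to the app.
#   $1  URL of the app (for example, the ingress gateway address)
#   $2  number of requests to send; 0 (the default) loops until Ctrl+C
#   $3  delay between requests in seconds (default 0.5)
generate_load() {
  local url="$1"
  local max_requests="${2:-0}"
  local delay="${3:-0.5}"
  local sent=0
  while [ "$max_requests" -eq 0 ] || [ "$sent" -lt "$max_requests" ]; do
    # Discard the response body; only the traffic itself matters here.
    curl -s -o /dev/null "$url"
    sent=$((sent + 1))
    sleep "$delay"
  done
}
```

Called with only a URL, the function runs until interrupted, which matches the foreground behavior noted above.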
Kiali¶
Launch the Kiali dashboard:
Warning
If the dashboard page fails to open, just click on the hyperlink in the console output.
Note
The istioctl dashboard command also blocks. Leave it running until you're finished using the dashboard, at which time press Ctrl+C to interrupt the process and get back to the terminal prompt.
The Kiali dashboard displays.
Customize the view as follows:
- Select the Traffic Graph section from the sidebar.
- Under Select Namespaces (at the top of the page), select the default namespace, the location where the application's pods are running.
- From the third "pulldown" menu, select App graph.
- From the Display "pulldown", toggle on Traffic Animation and Security.
- From the footer, toggle the legend so that it is visible. Take a moment to familiarize yourself with the legend.
Observe the visualization and note the following:
- We can see traffic coming in through the ingress gateway to the web-frontend, and the subsequent calls from the web-frontend to the customers service.
- The lines connecting the services are green, indicating healthy requests.
- The small lock icon on each edge in the graph indicates that the traffic is secured with mutual TLS.
Such visualizations help us understand the flow of requests in the mesh and diagnose problems.
Feel free to spend more time exploring Kiali.
We will revisit Kiali in a later lab to visualize traffic shifting such as when performing a blue-green or canary deployment.
Kiali Cleanup¶
Close the Kiali dashboard. Interrupt the istioctl dashboard kiali command by pressing Ctrl+C.
Zipkin¶
Launch the Zipkin dashboard:
The Zipkin dashboard displays.
- Click on the red '+' button and select serviceName.
- Select the service named web-frontend.default and click on the Run Query button (light blue) on the right.
A number of query results will display. Each row is expandable and will display more detail in terms of the services participating in that particular trace.
- Click the Show button to the right of one of the traces having four (4) spans.
The resulting view shows the spans that are part of the trace and, more importantly, how much time was spent within each span. Such information can help diagnose slow requests and pinpoint where the latency lies.
Distributed tracing also helps us make sense of the flow of requests in a microservice architecture.
Zipkin Cleanup¶
Close the Zipkin dashboard. Interrupt the istioctl dashboard zipkin command with Ctrl+C.
Prometheus¶
Prometheus works by periodically calling a metrics endpoint on each running service (this endpoint is termed the "scrape" endpoint). Developers normally have to instrument their applications to expose such an endpoint and return metrics information in the format that Prometheus expects.
With Istio, this is done automatically by the Envoy sidecar.
Observe how Envoy exposes a Prometheus scrape endpoint¶
- Run the following command:
Why port 15020?
See Ports used by Istio sidecar proxy.
The list of metrics returned by the endpoint is rather lengthy, so we just peek at the "istio_requests" metrics. The full response contains many more metrics.
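A command of the kind described above could be sketched as follows. The function name is hypothetical, the default target svc/customers is an assumption, and the sketch presumes that curl is available inside the application container; /stats/prometheus on port 15020 is the sidecar's merged metrics endpoint.

```shell
# Hypothetical helper: query the sidecar's scrape endpoint from inside a pod
# and keep only the istio_requests metrics.
#   $1  kubectl exec target (assumed default: svc/customers)
scrape_istio_requests() {
  local target="${1:-svc/customers}"
  kubectl exec "$target" -- curl -s localhost:15020/stats/prometheus \
    | grep istio_requests
}
```

Dropping the grep stage shows the full response, including the Envoy-level metrics that are filtered out here.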
Access the dashboard¶
- Start the Prometheus dashboard
- In the search field enter the metric named istio_requests_total, and click the Execute button (on the right).
- Select the tab named Graph to obtain a graphical representation of this metric over time.
  Note that you are looking at requests across the entire mesh, i.e. this includes both requests to web-frontend and to customers.
- As an example of Prometheus' dimensional metrics capability, we can ask for total requests having a response code of 200:
- With respect to requests, it's more interesting to look at the rate of incoming requests over a time window. Try:
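The two queries from the last steps might be written as shown in this sketch. The five-minute window is an assumption chosen for illustration, as are the helper name prom_query and the variable names; the helper also assumes that istioctl dashboard prometheus is forwarding Prometheus to localhost:9090, and it uses the Prometheus HTTP API instead of the search field.

```shell
# Hypothetical helper: run a PromQL query against the Prometheus HTTP API.
#   $1  PromQL expression
#   $2  base URL of Prometheus (assumed default: http://localhost:9090)
prom_query() {
  local query="$1"
  local base_url="${2:-http://localhost:9090}"
  curl -s -G --data-urlencode "query=${query}" "${base_url}/api/v1/query"
}

# Total requests that returned HTTP 200, selected by label.
QUERY_OK='istio_requests_total{response_code="200"}'

# Per-second request rate, averaged over the last five minutes (assumed window).
QUERY_RATE='rate(istio_requests_total[5m])'
```

Either expression can be pasted into the dashboard's search field as-is, or run from a second terminal with prom_query "$QUERY_RATE".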
There's much more to the Prometheus query language (this may be a good place to start).
Grafana consumes these metrics to produce graphs on our behalf.
Close the Prometheus dashboard and terminate the corresponding istioctl dashboard command.
Grafana¶
- Launch the Grafana dashboard
- From the sidebar, select Dashboards
- Click on the folder named Istio to reveal pre-designed Istio-specific Grafana dashboards
- Explore the Istio Mesh Dashboard. Note the Global Request Volume and Global Success Rate.
- Explore the Istio Service Dashboard. First select the service web-frontend and inspect its metrics, then switch to the customers service and review its dashboard.
- Explore the Istio Workload Dashboard. Select the web-frontend workload. Look at Outbound Services and note the outbound requests to the customers service. Select the customers workload and note that it makes no Outbound Services calls.
Feel free to further explore these dashboards.
Cleanup¶
- Terminate the istioctl dashboard command (Ctrl+C)
- Likewise, terminate the bash while loop.
Next¶
We turn our attention next to security features of a service mesh.