Operational Metrics

How to interpret metrics from requests to Solr on the index metrics dashboard.

Troubleshooting issues with your index starts with metrics. Read on to learn how to read, interpret, and act on your Solr metrics charts.

# Navigating to Metrics

The Metrics dashboard is located in each index dashboard. Log into Websolr, click on your index, and click on the Metrics tab.

# Metrics Utilities

## Time Window Selector
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/ff56da4-d3057ca-Screen_Shot_2017-11-02_at_1.47.21_PM.png",
        "d3057ca-Screen_Shot_2017-11-02_at_1.47.21_PM.png",
        858,
        186,
        "#3c3d44"
      ],
      "caption": ""
    }
  ]
}
[/block]
Use this selector to choose between 4 window sizes:
1. The last 1 hour (1h)
2. The last 24 hours (1d)
3. The last 7 days (7d)
4. The last 28 days (28d)

## Time Scrubber

Click on the arrows to move backward or forward in time while keeping the same time window size.

## UTC and Local Timezone Toggle

Click on the timezone to toggle between displaying the graph timestamps in UTC or in your local timezone.

## Highlighting
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/ff3193b-b404854-metrics_-_highlight.gif",
        "b404854-metrics_-_highlight.gif",
        1592,
        746,
        "#284b91"
      ]
    }
  ]
}
[/block]
You can drill down to smaller windows of time on any graph by clicking and dragging to select a time range.

# Metrics Overview

More information doesn’t necessarily mean more clarity. When something unexpected happens to your Solr traffic, it’s important to know how to read your metrics and draw conclusions from them.

Below we'll cover what each graph displays, with examples of what the graphs look like for certain use cases, such as high traffic, or for indices in different states (normal, or experiencing downtime). We’ll start with the most information-dense graph: the request heatmap.

# Performance Heatmap

This graph reveals how fast requests are. Each column in the graph represents a “slice” of time. Each row, or “bucket”, in the slice represents a range of request durations. The ‘hotter’ a bucket is colored, the more requests there are in that bucket.
To help visualize the difference in the number of requests in each bucket, every slice of time can be viewed as a histogram on hover.

**Example 1**
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/d08fa74-a87cbef-Screen_Shot_2017-11-02_at_2.29.05_PM.png",
        "a87cbef-Screen_Shot_2017-11-02_at_2.29.05_PM.png",
        3272,
        794,
        "#451965"
      ]
    }
  ]
}
[/block]
This heatmap shows a healthy index with a lot of traffic. Some requests are slow (toward the top), but the majority complete in under 40ms.


**Example 2**
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/e4deb6b-90d1a1e-Screen_Shot_2017-11-02_at_2.29.37_PM.png",
        "90d1a1e-Screen_Shot_2017-11-02_at_2.29.37_PM.png",
        3288,
        812,
        "#d7d7d6"
      ]
    }
  ]
}
[/block]
Here we have an index with very little traffic. Note that the ‘heat’ color of every bucket is determined relative to the other data in the same graph, so comparing two request heatmaps side by side by color alone won't be accurate.

# Request Counts

This graph is straightforward: it shows the number of requests handled by the index at a given time.

[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/3e832e8-e0832cc-Screen_Shot_2017-11-02_at_2.30.15_PM.png",
        "e0832cc-Screen_Shot_2017-11-02_at_2.30.15_PM.png",
        1622,
        472,
        "#6cccad"
      ]
    }
  ]
}
[/block]
# Request Duration Percentiles

The request duration graph, like the request heatmap, shows the distribution of request durations, summarized by three percentiles of the requests in each time slice: 50%, 95%, and 99%. This helps determine where the bulk of your requests sit in terms of speed, and how slow the outliers are.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/d48e357-f51c034-Screen_Shot_2017-11-02_at_3.29.49_PM.png",
        "f51c034-Screen_Shot_2017-11-02_at_3.29.49_PM.png",
        1612,
        484,
        "#64baad"
      ]
    }
  ]
}
[/block]
# Queue Time

Queue time is the total amount of time requests were “queued” or paused at our load balancing layer.
Ideally, the queue time is 0, but if you send many requests in parallel, our load balancer will queue requests while waiting for executing requests to finish. This is part of our Quality of Service layer.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/f95c01b-be463cd-Screen_Shot_2017-11-02_at_2.49.12_PM.png",
        "be463cd-Screen_Shot_2017-11-02_at_2.49.12_PM.png",
        1624,
        462,
        "#98abb0"
      ]
    }
  ]
}
[/block]
# Concurrency

Concurrency shows the number of requests being processed at the same time. Since indices are limited on concurrency, this is an important graph to keep an eye on. When you reach your plan’s max concurrency, you will notice queue time start to consistently increase.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/6500bcb-a8d5c2f-Screen_Shot_2017-11-02_at_2.49.37_PM.png",
        "a8d5c2f-Screen_Shot_2017-11-02_at_2.49.37_PM.png",
        1622,
        466,
        "#46b3c3"
      ]
    }
  ]
}
[/block]
# Bandwidth

This graph shows the amount of data crossing the network: going into the index (‘From Client’, shown in green), and coming from the index (‘To Client’, in blue).

We expect most bandwidth graphs to look something like the graph below: a relatively higher amount of ‘From Client’ data compared to ‘To Client’ data. These bars show the relationship between write and read data: the ‘From Client’ data comes from write (indexing) requests, and the ‘To Client’ data is the result of read requests.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/fcb6893-0a82cc8-Screen_Shot_2017-11-02_at_2.49.55_PM.png",
        "0a82cc8-Screen_Shot_2017-11-02_at_2.49.55_PM.png",
        1658,
        492,
        "#548ab3"
      ],
      "caption": ""
    }
  ]
}
[/block]
The relationship between the green and blue bars in this graph depends on your use case. A staging index, for example, might see a larger ratio of Write:Read data.
It’s important to note that this graph deals exclusively in data volume: a high-traffic index will probably see a lot of data coming “From” the index, but a low-traffic index with very complicated queries and large request bodies will also show more “From Client” data than would otherwise be expected. It therefore helps to compare this graph against request counts to get a feel for the average ‘size’ of a request.

# Response Codes

This graph can do two things:

  * It will confirm that responses are successful: 2xx responses. This means that everything is moving along well and requests are formed correctly.
  * In the less positive case, it can be a debugging tool that helps figure out where any buggy behavior is coming from. In general, 4xx responses are the result of a malformed query from your app or some other client, while 5xx responses indicate a problem on our end.
[block:image]
{
  "images": [
    {
      "image": [
        "https://files.readme.io/eb9184e-55fc596-Screen_Shot_2017-11-03_at_5.36.17_PM.png",
        "55fc596-Screen_Shot_2017-11-03_at_5.36.17_PM.png",
        1608,
        380,
        "#8acaae"
      ]
    }
  ]
}
[/block]
When reading this graph, keep in mind that 5xx responses don’t necessarily mean that your index is down. A common situation on shared-tier architecture is an index that’s getting throttled by a noisy neighbor who’s taking up a lot of resources on the server. This can interrupt some (but not all) normal behavior on your index, resulting in a mix of 2xx and 5xx responses.

A few 5xx responses every now and then should be expected with any cloud service. We’re committed to getting all production indices 99.99% uptime (i.e. an expected 0.01% of responses being 5xx), and we often have a track record of 4 9’s and higher. See our uptime history here:

* [Websolr Status](http://status.websolr.com)

Some customers are very sensitive to 5xx responses. In these cases, it’s usually best to be on a higher plan or a dedicated setup.
Reach out to us at [support@websolr.com](mailto:support@websolr.com) if this is something your team needs.
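
As a rough illustration of the arithmetic behind an uptime target, the sketch below groups response counts by status class and compares the share of 5xx responses with the budget a 99.99% target leaves (100% - 99.99% = 0.01%). The function name, the counts, and the idea of exporting counts from your own client logs are assumptions made for this example; it is not part of the Websolr dashboard or API.

```python
from collections import Counter


def error_budget_report(status_counts, target=0.9999):
    """Summarize the 5xx share of responses against an availability target.

    status_counts maps HTTP status codes (ints) to response counts.
    target is the availability goal as a fraction (0.9999 means 99.99%).
    """
    total = sum(status_counts.values())
    if total == 0:
        raise ValueError("no responses to analyze")

    # Group individual codes into classes such as "2xx", "4xx", "5xx".
    by_class = Counter()
    for status, count in status_counts.items():
        by_class[f"{status // 100}xx"] += count

    rate_5xx = by_class["5xx"] / total  # Counter returns 0 for a missing class
    budget = 1.0 - target               # 0.0001, i.e. 0.01%, for 99.99%
    return {
        "total": total,
        "by_class": dict(by_class),
        "rate_5xx": rate_5xx,
        "budget": budget,
        "within_budget": rate_5xx <= budget,
    }


if __name__ == "__main__":
    # Hypothetical counts for one time window, for illustration only.
    counts = {200: 99_950, 400: 30, 404: 12, 500: 3, 503: 5}
    report = error_budget_report(counts)
    print(f"5xx share: {report['rate_5xx']:.4%} (budget {report['budget']:.4%})")
    print("within budget" if report["within_budget"] else "over budget")
```

With these made-up numbers, 8 of 100,000 responses are 5xx (0.008%), which sits inside a 0.01% budget. A window with a sustained share above the budget is the kind of pattern worth raising with support.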