{"id":530,"date":"2019-01-22T01:37:18","date_gmt":"2019-01-22T01:37:18","guid":{"rendered":"https:\/\/uptimerobot.com\/blog\/?p=530"},"modified":"2025-11-13T14:58:32","modified_gmt":"2025-11-13T14:58:32","slug":"a-downtime-what-happened-and-very-sorry","status":"publish","type":"post","link":"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/","title":{"rendered":"A Downtime, What Happened and&#8230; Very Sorry."},"content":{"rendered":"<p>Last Thursday (10 Jan 2019), starting at 02:30, we experienced an issue that caused a full downtime of ~12 hours and intermittent issues for even longer afterwards.<\/p>\n<p>First of all, we are so, so sorry about this. And, in summary, it was totally our fault.<\/p>\n<p><strong>Uptime Robot<\/strong> has been available since Jan 2010 and it is the first time we have had such a major\u00a0problem.<\/p>\n<p><strong>We would like to share what happened<\/strong> and what we&#8217;ll be doing to prevent it from repeating:<\/p>\n<ul>\n<li>Our main DB server became unreachable. We first thought it was a network issue, then discovered that it wasn&#8217;t able to boot, and later confirmed that the hard disk had problems.<\/li>\n<li>We were OK as we had the replica DB server, and we decided to make it the master DB server. We couldn&#8217;t connect to this server at first, did a power reboot, then connected and <strong>made a huge personal mistake<\/strong> here. Before starting the (MySQL) DB server after the reboot, we had\u00a0to change several of its settings so that it was ready for the live load. Besides a few my.ini changes, we removed the InnoDB logs so that they would be re-created with the right settings. We started the server, all looked good.. and then it stopped by itself. We checked the MySQL error logs and saw that there were sync problems, with MySQL&#8217;s log sequence number being in the future. 
The problem is, with the power reboot, <strong>the DB server was shut down unexpectedly, and we should have started it with the original settings, stopped it normally, and only then made the changes<\/strong>. A simple yet huge mistake.<\/li>\n<li>After lots of retries with different options (including forcing InnoDB recovery), some major\u00a0tables didn&#8217;t recover.<\/li>\n<li>And, we decided to make a full restore from the backups. We take very regular backups. We have 2 types of data:\n<ul>\n<li>the account settings, monitors, alert contacts.. (backups taken directly to the backup server every 1 hour)<\/li>\n<li>and the logs (this data is pretty huge, so backups are taken\u00a0every day to the local server at first so that it is faster, then automatically zipped and moved to the backup server afterwards)\n<ul>\n<li>The latest backup was taken ~23 minutes before the incident. We restored it.<\/li>\n<li>The latest logs backup was taken ~7 hours before the incident. Yet, the zip file was corrupt. So were several of the latest backup files. The latest healthy logs backup was taken 7 days earlier.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>We tried to reach the contents of the corrupted backup files with several methods\/tools but failed (this process took most of the time, as we wanted to re-enable the DB with the latest log file backup). And, we restored the backup taken 7 days earlier (since that day, we have tried many more tools, suggestions, etc.. yet we are convinced that those files are corrupt at their cores).<\/li>\n<li>We made the site live after the restore process but realized that there were many inconsistencies due to the date differences of the backup files used. 
We worked on creating a tool to remove those inconsistencies, paused the system for another 3 hours the next day, ran this tool to fix all the inconsistencies and made the system live again.<\/li>\n<li>After the event, when looking at it calmly, the most logical explanation is that the hard disk\u00a0had\u00a0an issue for several days before going down completely, corrupting\u00a0the local backups we had taken on it (which we then moved to the backup server already corrupted).<\/li>\n<li>And, we couldn&#8217;t restore the log (up-down) data from 03 Jan to 10 Jan.<\/li>\n<\/ul>\n<p>This is actually a short summary of the issue we experienced. <strong>We made various mistakes<\/strong>:<\/p>\n<ul>\n<li>Not using RAID (this was due to a negative experience we had with RAID in the past but, on second thought, it would still have been better than having a single corrupted hard disk).<\/li>\n<li>Handling the promotion of the replica to master badly. We should have had more detailed internal documentation about this process.<\/li>\n<li>Taking larger backups locally and then moving them to the\u00a0backup server.<\/li>\n<li>Also, we didn&#8217;t have a communication tool in place for when the system was fully down and user data was unreachable.. 
which is so wrong.<\/li>\n<\/ul>\n<p><strong>We are taking several actions<\/strong> to make sure that such a downtime never repeats and that any such issue is handled much better:<\/p>\n<ul>\n<li>The backup procedures have already been changed to include verification of each backup file.<\/li>\n<li>We are getting ready to move all critical servers to RAID setups (we will announce a scheduled maintenance for it soon).<\/li>\n<li>We have already updated our recovery documentation\u00a0accordingly and will be documenting such cases in more detail from now on.<\/li>\n<li>We are working on creating a better communication channel that is not tied to our infrastructure.<\/li>\n<\/ul>\n<p>Very sorry for the trouble again. We <strong>learned a lot from it\u00a0<\/strong>and we can&#8217;t thank all <strong>Uptime Robot<\/strong> users enough for supporting and helping us during the issue.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last Thursday (10 Jan 2019), starting at 02:30, we experienced an issue that caused a full downtime of ~12 hours and intermittent issues for even longer afterwards. First of all, we are so, so sorry about this. And, in summary, it was totally our fault. Uptime Robot has been available since Jan 2010 and it is the [&hellip;]<\/p>\n","protected":false},"author":16,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"_is_featured_guide":false,"_post_views":31,"_reading_completions":69,"footnotes":""},"categories":[2],"tags":[],"class_list":["post-530","post","type-post","status-publish","format-standard","hentry","category-announcements"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>A Downtime, What Happened and... Very Sorry. 
| UptimeRobot Blog<\/title>\n<meta name=\"description\" content=\"Read about the 12-hour outage at UptimeRobot, what caused it, how we responded, and our plan to prevent it happening again.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"A Downtime, What Happened and... Very Sorry. | UptimeRobot Blog\" \/>\n<meta property=\"og:description\" content=\"Read about the 12-hour outage at UptimeRobot, what caused it, how we responded, and our plan to prevent it happening again.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/\" \/>\n<meta property=\"og:site_name\" content=\"UptimeRobot Blog\" \/>\n<meta property=\"article:published_time\" content=\"2019-01-22T01:37:18+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-11-13T14:58:32+00:00\" \/>\n<meta name=\"author\" content=\"Tomas Koprusak\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Tomas Koprusak\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/\"},\"author\":{\"name\":\"Tomas Koprusak\",\"@id\":\"https:\/\/uptimerobot.com\/blog\/#\/schema\/person\/4e351b3eb3c7a5987a73b413d8354468\"},\"headline\":\"A Downtime, What Happened and&#8230; Very Sorry.\",\"datePublished\":\"2019-01-22T01:37:18+00:00\",\"dateModified\":\"2025-11-13T14:58:32+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/\"},\"wordCount\":792,\"commentCount\":31,\"articleSection\":[\"Announcements\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/\",\"url\":\"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/\",\"name\":\"A Downtime, What Happened and... Very Sorry. 
| UptimeRobot Blog\",\"isPartOf\":{\"@id\":\"https:\/\/uptimerobot.com\/blog\/#website\"},\"datePublished\":\"2019-01-22T01:37:18+00:00\",\"dateModified\":\"2025-11-13T14:58:32+00:00\",\"author\":{\"@id\":\"https:\/\/uptimerobot.com\/blog\/#\/schema\/person\/4e351b3eb3c7a5987a73b413d8354468\"},\"description\":\"Read about the 12-hour outage at UptimeRobot, what caused it, how we responded, and our plan to prevent it happening again.\",\"breadcrumb\":{\"@id\":\"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/uptimerobot.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Announcements\",\"item\":\"https:\/\/uptimerobot.com\/blog\/category\/announcements\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"A Downtime, What Happened and&#8230; Very Sorry.\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/uptimerobot.com\/blog\/#website\",\"url\":\"https:\/\/uptimerobot.com\/blog\/\",\"name\":\"UptimeRobot Blog\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/uptimerobot.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/uptimerobot.com\/blog\/#\/schema\/person\/4e351b3eb3c7a5987a73b413d8354468\",\"name\":\"Tomas 
Koprusak\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/uptimerobot.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/42e6751dc39e91f1c7ab4926189550054308e366428ceb70e9621d680b843032?s=96&d=retro&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/42e6751dc39e91f1c7ab4926189550054308e366428ceb70e9621d680b843032?s=96&d=retro&r=g\",\"caption\":\"Tomas Koprusak\"},\"description\":\"He has worked for Sygic as a marketer and co-led the implementation and development of a product acquired from a competitor. He has also worked as a freelance developer, helping clients from various areas. Tomas brings a wealth of industry experience to our team. He spent a few years in the blockchain industry, leading projects and marketing teams at multiple blockchain-based companies. He has presented products and managed deals in more than 10 countries around the world, managed the ICO, and built a successful marketing team at Fuergy that continues to thrive. Tomas managed a product team for the biggest job site in Slovakia, covering development and transformation to a new B2B app. Not only is Tomas skilled at web development, but he also has a deep understanding of SaaS businesses, which makes him an invaluable asset in shaping and leading various projects at UptimeRobot. His focus is always on the continual improvement of our service and user experience. In addition to his professional achievements, Tomas is a devoted father. His personal interests include cycling (he traveled around the whole country of Slovakia), playing guitar (he even played in a band), servicing bikes, music, and enjoying good beer.\",\"sameAs\":[\"https:\/\/www.linkedin.com\/in\/tomas-koprusak\"],\"url\":\"https:\/\/uptimerobot.com\/blog\/author\/tomas\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"A Downtime, What Happened and... Very Sorry. 
| UptimeRobot Blog","description":"Read about the 12-hour outage at UptimeRobot, what caused it, how we responded, and our plan to prevent it happening again.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/","og_locale":"en_US","og_type":"article","og_title":"A Downtime, What Happened and... Very Sorry. | UptimeRobot Blog","og_description":"Read about the 12-hour outage at UptimeRobot, what caused it, how we responded, and our plan to prevent it happening again.","og_url":"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/","og_site_name":"UptimeRobot Blog","article_published_time":"2019-01-22T01:37:18+00:00","article_modified_time":"2025-11-13T14:58:32+00:00","author":"Tomas Koprusak","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Tomas Koprusak","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/#article","isPartOf":{"@id":"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/"},"author":{"name":"Tomas Koprusak","@id":"https:\/\/uptimerobot.com\/blog\/#\/schema\/person\/4e351b3eb3c7a5987a73b413d8354468"},"headline":"A Downtime, What Happened and&#8230; Very Sorry.","datePublished":"2019-01-22T01:37:18+00:00","dateModified":"2025-11-13T14:58:32+00:00","mainEntityOfPage":{"@id":"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/"},"wordCount":792,"commentCount":31,"articleSection":["Announcements"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/","url":"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/","name":"A Downtime, What Happened and... Very Sorry. 
| UptimeRobot Blog","isPartOf":{"@id":"https:\/\/uptimerobot.com\/blog\/#website"},"datePublished":"2019-01-22T01:37:18+00:00","dateModified":"2025-11-13T14:58:32+00:00","author":{"@id":"https:\/\/uptimerobot.com\/blog\/#\/schema\/person\/4e351b3eb3c7a5987a73b413d8354468"},"description":"Read about the 12-hour outage at UptimeRobot, what caused it, how we responded, and our plan to prevent it happening again.","breadcrumb":{"@id":"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/uptimerobot.com\/blog\/a-downtime-what-happened-and-very-sorry\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/uptimerobot.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Announcements","item":"https:\/\/uptimerobot.com\/blog\/category\/announcements\/"},{"@type":"ListItem","position":3,"name":"A Downtime, What Happened and&#8230; Very Sorry."}]},{"@type":"WebSite","@id":"https:\/\/uptimerobot.com\/blog\/#website","url":"https:\/\/uptimerobot.com\/blog\/","name":"UptimeRobot Blog","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/uptimerobot.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/uptimerobot.com\/blog\/#\/schema\/person\/4e351b3eb3c7a5987a73b413d8354468","name":"Tomas 
Koprusak","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/uptimerobot.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/42e6751dc39e91f1c7ab4926189550054308e366428ceb70e9621d680b843032?s=96&d=retro&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/42e6751dc39e91f1c7ab4926189550054308e366428ceb70e9621d680b843032?s=96&d=retro&r=g","caption":"Tomas Koprusak"},"description":"He has worked for Sygic as a marketer and co-led the implementation and development of a product acquired from a competitor. He has also worked as a freelance developer, helping clients from various areas. Tomas brings a wealth of industry experience to our team. He spent a few years in the blockchain industry, leading projects and marketing teams at multiple blockchain-based companies. He has presented products and managed deals in more than 10 countries around the world, managed the ICO, and built a successful marketing team at Fuergy that continues to thrive. Tomas managed a product team for the biggest job site in Slovakia, covering development and transformation to a new B2B app. Not only is Tomas skilled at web development, but he also has a deep understanding of SaaS businesses, which makes him an invaluable asset in shaping and leading various projects at UptimeRobot. His focus is always on the continual improvement of our service and user experience. In addition to his professional achievements, Tomas is a devoted father. 
His personal interests include cycling (he traveled around the whole country of Slovakia), playing guitar (he even played in a band), servicing bikes, music, and enjoying good beer.","sameAs":["https:\/\/www.linkedin.com\/in\/tomas-koprusak"],"url":"https:\/\/uptimerobot.com\/blog\/author\/tomas\/"}]}},"_links":{"self":[{"href":"https:\/\/uptimerobot.com\/blog\/wp-json\/wp\/v2\/posts\/530","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/uptimerobot.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/uptimerobot.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/uptimerobot.com\/blog\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/uptimerobot.com\/blog\/wp-json\/wp\/v2\/comments?post=530"}],"version-history":[{"count":0,"href":"https:\/\/uptimerobot.com\/blog\/wp-json\/wp\/v2\/posts\/530\/revisions"}],"wp:attachment":[{"href":"https:\/\/uptimerobot.com\/blog\/wp-json\/wp\/v2\/media?parent=530"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/uptimerobot.com\/blog\/wp-json\/wp\/v2\/categories?post=530"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/uptimerobot.com\/blog\/wp-json\/wp\/v2\/tags?post=530"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}