
New Feature: Heartbeat Monitoring (Pro Plan)


    Uptime Robot can already check the status of servers/devices that have public IPs with its ping and port monitoring features.

    Yet, there are many other servers/computers/devices that are inside an intranet (but connected to the internet) and need to be monitored.

    Heartbeat monitoring

    It is now possible to monitor such endpoints using heartbeat monitoring.

    The feature works the opposite way from the other monitoring types.

    Uptime Robot provides a unique URL for each heartbeat monitor created and expects the monitored item to send regular requests to this URL.

    If an expected request doesn’t arrive on time, the monitor is marked as down.
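
    This check behaves like a dead man's switch. A minimal sketch of the logic follows, assuming a hypothetical `HeartbeatMonitor` class. It only illustrates the idea and is not Uptime Robot's actual implementation:

```python
from datetime import datetime, timedelta


class HeartbeatMonitor:
    """Hypothetical model of a heartbeat monitor; names are illustrative."""

    def __init__(self, interval: timedelta, created_at: datetime):
        self.interval = interval
        # Until the first heartbeat arrives, count from the creation time.
        self.last_seen = created_at

    def record_heartbeat(self, at: datetime) -> None:
        """Called whenever a request arrives on the monitor's unique URL."""
        self.last_seen = at

    def is_down(self, now: datetime) -> bool:
        """Down when no request has arrived within the expected interval."""
        return now - self.last_seen > self.interval


start = datetime(2024, 1, 1, 0, 0)
monitor = HeartbeatMonitor(interval=timedelta(minutes=5), created_at=start)
monitor.record_heartbeat(start + timedelta(minutes=4))
print(monitor.is_down(start + timedelta(minutes=8)))   # False: last heartbeat 4 minutes ago
print(monitor.is_down(start + timedelta(minutes=10)))  # True: last heartbeat 6 minutes ago
```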


    Heartbeat monitoring is not only ideal for monitoring servers/computers inside an intranet but also a great fit for monitoring the health of the regular/cron jobs your website/app may be performing.

    As an example, if your app runs a cron job that deletes old logs every 10 minutes, you can update the code to send an HTTP request to the heartbeat monitor’s URL once that cron job has run. If the heartbeat monitor is down, you know that the cron job may be having problems.
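
    A minimal sketch of that pattern in Python, assuming a hypothetical `delete_old_logs` job and a placeholder heartbeat URL (replace it with the URL shown for your own monitor). The heartbeat is sent only when the job finishes without raising, so a failing job shows up as a missed heartbeat:

```python
import urllib.request

# Placeholder; use the unique URL shown for your own heartbeat monitor.
HEARTBEAT_URL = "https://heartbeat.uptimerobot.com/mMonitorGUID"


def send_heartbeat(url: str = HEARTBEAT_URL, timeout: float = 10.0) -> bool:
    """Send one GET request to the heartbeat URL; return True on HTTP 2xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers URLError, HTTPError and socket timeouts; a monitoring
        # failure should not crash the job itself.
        return False


def run_with_heartbeat(job, notify=send_heartbeat):
    """Run job(); call notify() only if the job completed without error."""
    result = job()  # an exception here propagates and skips the heartbeat
    notify()
    return result


def delete_old_logs():
    """Hypothetical stand-in for the real cron job."""
    return "logs deleted"
```

    Calling `run_with_heartbeat(delete_old_logs)` from the script that cron executes would run the job and then send the heartbeat.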

    How to use the feature?

    Heartbeat monitoring is available in the Pro Plan and works in these steps:

    • Create a new heartbeat monitor using the Add New Monitor dialog
    • Get the URL of the heartbeat monitor created in the same dialog
    • Set up a cron job (or a scheduled task in Windows) that sends an HTTP request to this heartbeat URL every x minutes (where x is the interval selected for the monitor)
    • That is it.
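
    For the cron step, the schedule has to match the monitor's interval. Below is a small sketch of how an interval of x minutes maps to a standard five-field cron schedule. The helper name `cron_schedule` is hypothetical, and it assumes a cron implementation that supports the `*/n` step syntax. Intervals that do not divide an hour or a day evenly would fire at uneven gaps with a single step expression, so the helper rejects them:

```python
def cron_schedule(interval_minutes: int) -> str:
    """Return a five-field cron schedule that fires every interval_minutes.

    Hypothetical helper for illustration; supports intervals up to 24 hours.
    """
    if interval_minutes < 1 or interval_minutes > 24 * 60:
        raise ValueError("interval must be between 1 minute and 24 hours")
    if interval_minutes < 60:
        if 60 % interval_minutes != 0:
            raise ValueError("interval must divide 60 evenly")
        if interval_minutes == 1:
            return "* * * * *"
        return f"*/{interval_minutes} * * * *"
    if interval_minutes % 60 != 0:
        raise ValueError("intervals of an hour or more must be whole hours")
    hours = interval_minutes // 60
    if 24 % hours != 0:
        raise ValueError("hour interval must divide 24 evenly")
    if hours == 1:
        return "0 * * * *"
    if hours == 24:
        return "0 0 * * *"
    return f"0 */{hours} * * *"


# Example crontab line for a 5-minute monitor; the curl command and URL
# are placeholders for whatever sends the request on your system.
print(f"{cron_schedule(5)} curl -fsS https://heartbeat.uptimerobot.com/mMonitorGUID")
```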

    If you need help with the scheduling step, please check the docs for creating cron jobs in Unix/Linux and scheduled tasks in Windows.

    P.S. The feature is in beta and we look forward to any feedback/suggestions.

    77 replies on “New Feature: Heartbeat Monitoring (Pro Plan)”

    This sounds great!

    I think it’d be nice if
    1) longer intervals were allowed. 24 hours is not enough for weekly tasks, and problematic for daily tasks — just because Monday’s task finished at 04:00 doesn’t mean that Tuesday’s task will also finish by 04:00; it might take minutes or hours longer.
    2) the heartbeat URLs included human-readable text, to make it easier to verify that a task was configured with the correct URL. Instead of “https://heartbeat.uptimerobot.com/mMonitorGUID” something like “https://heartbeat.uptimerobot.com/Initial-Monitor-Name/mMonitorGUID” would be nice.

    Great addition! However, we have many cron jobs running only once a week. Would be great if the monitoring interval could be extended to that time range.

    Great addition!

    I have processes that have to run during a specific window of time, rather than at a specific interval. It would be useful to add a ‘must check-in during this period:’ condition. Also, a combination of this plus interval, such as ‘must check-in every 4hrs between 0 and 15m past the hour’.

    Thx.

    T.

    Would be nice if this could also be used for monitoring backups: if an e-mail is received and the subject is “backup for XXX is successful”, make it green, otherwise red.

    Make this a +1 for longer intervals (and then a way to type them in rather than using the slider). We pay for a service to do exactly this, and they can’t handle our one job that is supposed to run once every 45 days, because it isn’t a cron expression. If you avoid that limitation you will be far ahead of the current services out there.

    That’s awesome. Nice addition.
    +1 for Peter’s suggestion. We have daily cron jobs that depend on external services and execution time may wildly fluctuate.

    I’ll add that for cron jobs it would be great to be able to say when we expect a heartbeat. Given a cron that runs every day at 0200, if there is no heartbeat by 0400 I should get an alert. Setting the interval to 26 hours would do that. Now imagine that my cron has a nasty bug and takes longer to run every day. The “window” will slide a little bit further every day and basically allow 26 hours from the last execution *end* to the next end. This may result in a cron that takes 10 hours to run and finishes at 1200, with no alerts, since the last execution was also very slow and within the 26 hours.
    A monitoring definition like a cron and a delay would make it easy to detect performance problems in crons. Granted, this is way more complex than what is currently in place. Just throwing the idea here 🙂
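
    The drift described in this comment can be reproduced with a short simulation. This is a sketch under assumed numbers (a job starting at 02:00 that takes one hour longer each day), not a description of how Uptime Robot evaluates monitors. It compares an interval measured from the previous heartbeat with a fixed deadline tied to the schedule:

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(hours=26)             # heartbeat interval from the comment
DEADLINE_AFTER_START = timedelta(hours=2)  # "no heartbeat by 0400" for a 0200 job


def simulate(days: int):
    """Return (interval_alert_days, deadline_alert_days) for a job that
    starts at 02:00 every day and takes one hour longer each day."""
    interval_alerts, deadline_alerts = [], []
    previous_finish = None
    for day in range(days):
        started = datetime(2024, 1, 1, 2, 0) + timedelta(days=day)
        finished = started + timedelta(hours=1 + day)
        # Check 1: time elapsed since the previous heartbeat.
        if previous_finish is not None and finished - previous_finish > INTERVAL:
            interval_alerts.append(day)
        # Check 2: fixed deadline relative to the scheduled start time.
        if finished > started + DEADLINE_AFTER_START:
            deadline_alerts.append(day)
        previous_finish = finished
    return interval_alerts, deadline_alerts


interval_alerts, deadline_alerts = simulate(days=10)
print(interval_alerts)  # []
print(deadline_alerts)  # [2, 3, 4, 5, 6, 7, 8, 9]
```

    Consecutive finishes are only 25 hours apart, so the interval check never fires even though the last run ends at 12:00, while the schedule-based deadline flags the job from the third day on.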

    Anyway, nice job!

    Heartbeat tests sound great. Sort of the thin end of passive monitoring.

    I concur with the others that heartbeat tests should be able to be long intervals, e.g. monthly or quarterly password rotations. I would like to be able to specify everything up to ~370 days.

    It would be convenient to be able to specify two times: the ‘expected frequency’, as now, e.g. 1 hour, and an ‘allowed tolerance’, e.g. 10 minutes. So an hourly task heartbeat would be ‘Down’ if an hour and 10 minutes had elapsed since the last heartbeat.

    Otherwise, if there is only one time setting as now, the time should be treated as the time when the heartbeat response is late. E.g. a weekly backup is late when 8 days have elapsed, not 7 days and 1 minute.
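
    The proposed pair of settings can be sketched as a single comparison. The function and parameter names below are hypothetical and describe the commenter's proposal, not existing options in the product:

```python
from datetime import datetime, timedelta


def heartbeat_status(last_heartbeat: datetime, now: datetime,
                     expected_every: timedelta,
                     tolerance: timedelta = timedelta(0)) -> str:
    """Return "Down" once expected_every + tolerance has elapsed since the
    last heartbeat, otherwise "Up". Both settings are hypothetical."""
    if now - last_heartbeat > expected_every + tolerance:
        return "Down"
    return "Up"


last = datetime(2024, 1, 1, 12, 0)
hourly = dict(expected_every=timedelta(hours=1), tolerance=timedelta(minutes=10))
print(heartbeat_status(last, last + timedelta(minutes=65), **hourly))  # Up
print(heartbeat_status(last, last + timedelta(minutes=71), **hourly))  # Down
```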

    Given a heartbeat is basically a passive check, it would be a nice extension to be able to make a negative heartbeat, ie. to actually declare a heartbeat failure.

    e.g. my suggestion, something like this for a normal heartbeat:

    https://heartbeat.uptimerobot.com/Current-Monitor-Name-But-Ignored/mMonitorGUID/UP

    and an optional, affirmatively negative heartbeat:

    https://heartbeat.uptimerobot.com/Current-Monitor-Name-But-Ignored/mMonitorGUID/DOWN

    This would give a lot of flexibility to the heartbeat system. It would enable/encourage using longer heartbeat intervals, because we would have the option to come in early with a ‘DOWN’. So one could check every 10 seconds at the source end, but only send a heartbeat to uptimerobot every 5 minutes, with the option to notify DOWN immediately for a known failure.

    Agreed with all the above. HealthchecksIO allows for /start and /fail suffixes to denote when it starts and when it explicitly fails.

    That’s a great feature, it will allow us to reduce the risk surface (open ports and ICMP) for monitoring purposes.
    Nevertheless, we’ve tested it and no alerts are being sent out, either via email or SMS.
    Care to check that?
    Thank you
    Rui Meleiro

    It seems you have fixed the issue before my question even passed moderation (probably due to the additional queries on the same issue, I guess)

    Nice idea and it will probably convince me to move over to Pro once it is out of beta. How about moving it to the free plan until it comes out of beta? You might get more useful comments if users don’t have to pay to test your new features.

    I have tried to set up a scheduled task in Windows, but it does not work for me. I can see the next task will run after 5 min, but this time only changes every 5 min. If I run the task manually it works and the monitor is UP, but after 5 min the monitor is down again.
    I have set up the scheduled task as written on the help site for creating a scheduled task.
    What is wrong?

    +1 for: longer time intervals (yes up to a year), some tolerance on the check (e.g. job takes 5 mins longer one day), and also time windows.

    Basically would agree with all of above … longer time frames being probably the first most useful change.

    Good though.

    It does not appear email notifications are working for the heartbeat monitoring. Can somebody confirm? Thanks, Jim

    This is fantastic! I am a Pro user. I’m with everyone else. More control over the interval. I too want to use this for backup monitoring.

    Keep up the great work!!

    This is great, however is there a known issue with the alerts? I set a few heartbeat monitors up shortly after this post, I am not getting status alerts for the heartbeat monitors

    Hi there,
    Great feature, thank you! We do use the API a lot and we’d like to use the API for this feature as well. Is this already done? If not, when can we expect it?
    (we’d like to add as well the URLs through the API, so we don’t have to leave our dashboard)

    Hi Stephan,

    thank you! I am happy to tell you that the API for this feature is available as well: https://uptimerobot.com/api You can also add a URL or IP to HTTP, Keyword, Ping or Port monitor types using the URL parameter in the API.

    Have a great day!

    I’ve started using this to monitor services running within a Kubernetes cluster. As they are not accessible from outside, I’ve built a monitor proxy to do the checking and then fire a heartbeat when successful. There is a sample Kubernetes config to use, and it’s available as a Docker container too. Issues and requests on GitHub.

    https://github.com/stewartmckee/heartbeat_monitor

    Stewart.

    Also, a quick question: how often can we ping the heartbeat URL? I get 429s if I send requests too often. I’ve got the monitor set to 1 min in uptimerobot and am sending the request every 60 seconds, but that could lead to a false positive in certain scenarios. I’ve checked the FAQ and docs but they don’t mention it. I’d like to call every 50-55 seconds to cover any network delays. Can you let us know how often you would advise calling the heartbeat to avoid false positives?

    Thanks!

    Hi Stewart,

    you can send a maximum of 2 requests in 30 seconds, so we recommend working with that 🙂

    Have a great day!
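
    One way to stay within that limit is a client-side guard in front of the heartbeat call. The sketch below assumes a sliding 30-second window and a hypothetical `HeartbeatThrottle` class. How the server actually counts the window is not stated in this thread, so treat the numbers as the only confirmed part:

```python
import time
from collections import deque


class HeartbeatThrottle:
    """Client-side guard: allow at most max_requests per window seconds."""

    def __init__(self, max_requests: int = 2, window: float = 30.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of recently allowed requests

    def allow(self) -> bool:
        """Return True if a request may be sent now, and record it."""
        now = self.clock()
        # Forget requests that have left the sliding window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            self.sent.append(now)
            return True
        return False


# Demonstration with a fake clock so the example runs instantly.
fake_now = [0.0]
throttle = HeartbeatThrottle(clock=lambda: fake_now[0])
results = []
for t in (0, 10, 20, 30, 40):
    fake_now[0] = float(t)
    results.append(throttle.allow())
print(results)  # [True, True, False, True, True]
```

    A sender would call `throttle.allow()` before each request and skip the request when it returns False.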

    Hey, when it responds “fail” for ratelimit, can I just keep sending requests? I want to send a request every 15 seconds to make sure the heartbeat goes through. Is this a good idea or will my requests get missed?

    Hey Graham,

    you can send a maximum of 2 requests in 30 seconds. Keep this in mind and you should be good 🙂

    Same here. Set up the tasks, tested them, they show as working. I even hit the URLs manually from a browser and got a good response but the alerts are not updating.

    I use this for my site; however I put it on a 1 minute interval; sent a test via my browser and did not send again for 20 minutes; I was not notified and the monitor reports that it is at 0%.

    Also, the “Rate limit exceed.” error is unclear (and there’s a typo in it).

    Also +1 for higher limits; I have a cronjob that runs every day, but as long as it goes every 28 hours or so I’m happy

    Wondering whether a heartbeat ‘pair’ may help solve some other suggestions where jobs run at odd times (or take a while to complete).

    A ‘ping’ to say the job has started and another to say it’s complete, with incomplete ‘pairs’ highlighted as having problems?
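
    A rough sketch of the pairing idea, assuming each run sends a hypothetical (run id, kind) event where kind is "start" or "complete". Nothing here reflects an existing Uptime Robot feature:

```python
from typing import Iterable, List, Tuple

Event = Tuple[str, str]  # (job_run_id, kind) where kind is "start" or "complete"


def incomplete_runs(events: Iterable[Event]) -> List[str]:
    """Return the ids of runs that sent "start" but never "complete"."""
    open_runs = {}  # dict keeps insertion order, so results are stable
    for run_id, kind in events:
        if kind == "start":
            open_runs[run_id] = True
        elif kind == "complete":
            open_runs.pop(run_id, None)
    return list(open_runs)


events = [
    ("monday", "start"), ("monday", "complete"),
    ("tuesday", "start"),  # never completed
    ("wednesday", "start"), ("wednesday", "complete"),
]
print(incomplete_runs(events))  # ['tuesday']
```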

    Some webhooks (e.g. from prometheus alertmanager) are POSTed by default. Could you accept POST requests to the heartbeat url too? Thx

    Nice use of the current app! Does the heartbeat honor the maintenance windows? I just set one up and it’s reporting down 30 mins into a maintenance period. Thanks

    Hi – I like the information in the android app that shows access times (for relevant monitors).  This information does not seem to be available on the web version.  Can you make that information available?
    Thanks!
    ( I recognize that this does not relate specifically to the heart beat function, but I can’t find a place to make general suggestions publicly)

    Hi Michael, thank you for reaching out to us! You can also always contact our support (support@uptimerobot.com). Do you mean the graph with response times or something else? We are currently working on upgrading our service, so I will be happy to suggest this.

    Hi there,

    thank you for reaching out to us. Have you tried contacting our support too? Usually, it means we are not receiving any requests. It’s hard to investigate such issues since it depends on the requests received, but if you are still having any issues please send us all the details at support@uptimerobot.com.

    Kristian,

    it looks like heartbeat monitors are not mentioned yet in the api docs – https://uptimerobot.com/api parameters lists:

    “””
    monitor>type
    1 – HTTP(s)
    2 – Keyword
    3 – Ping
    4 – Port
    “””

    I am able to create one using the gui. I have not yet attempted to try using the API assuming that eventually a 5th type will be added?

    “””
    monitor>type
    1 – HTTP(s)
    2 – Keyword
    3 – Ping
    4 – Port
    5 – Heartbeat
    “””

    Looking forward to using this feature but want to do so using the API, we do not maintain our account directly via the ui.

    Thanks,

    I love the concept of heartbeats but it seems at night-time it doesn’t work too well. A lot of the time I keep getting monitor down issues.

    Hi Daimian,

    thank you for reaching out to us, are you still having any issues? Please contact our support (support@uptimerobot.com) and we will be happy to take a look at it.

    Could the API doc be updated with the type for heartbeats? It currently only has the below:

    monitor>type
    1 – HTTP(s)
    2 – Keyword
    3 – Ping
    4 – Port

    Hi Ryan,

    thank you for bringing this to our attention, we will definitely take a look at it!

    Need an email heartbeat (deadman) monitor. Where I send an email every and I get notified if the email doesn’t show up. This lets me know there is an issue with my email sending process or its SMTP service. I also like the idea of being able to set up a rule like someone else said so you can see the status of the email (backup or other success type) and flag or notify accordingly.

    +1 for > 24 hour time on heartbeat monitor. Turning off mine for now on my once a day jobs; it is not useful since a job sometimes runs a few minutes longer than the previous day. I am not a fan of false alarms.

    I love this feature but it would be great if you supported HTTP POST as well. Tools like Alert Manager for Prometheus have a webhook ability that only supports POST.

    fetch "https://heartbeat.uptimerobot.com/Your URL" >/dev/null 2>&1
    On pfSense you need to use fetch to make it work without any changes.

    It’s January 2021, what is the current status of the feature?

    – does it have an API?
    – is the interval still limited to 24h?
    – is it still in beta?

    Not more than 24h? That’s a pity. Now I get many false alarms when big once-every-day jobs take a few more moments than the previous day. Even just 25h would be much more usable.

    Oof – I agree with everyone here, I’ve got a daily task and it can take 2-3 hours to complete, so I need 27 hour window otherwise I’ll get false positives at the 24h limit. Please consider extending to say 36 hours for most use-cases?

    Everyone has been asking for longer check intervals for almost 2 years now, but this seems to be completely ignored by the developers. Too bad. 🙁

    This makes this unusable for long running tasks, like backups.

    Hi, thank you for your feedback, we are working on new features based on the feedback from our community, we’ll consider this in the future 🙂

    It might already be in the comments, but didn’t see it.

    1) Could you consider email-based heartbeats, rather than POST/GET? Some (old…!?) systems might not easily be able to send custom POST requests, but almost everything (old or basic) has an email notification system. So configure that system to send a “job/task complete” email to monitorGUID@heartbeat.uptimerobot.com, and when no email arrives in the defined window, trigger DOWN. Currently the support team has to manually observe that the regular daily email didn’t come in from System X.

    I know that email is not the “correct” communication protocol for this, but in the real world it is still used!

    2) Ability to configure day of week alerts. Eg a job / task that is scheduled to run daily at 9pm Monday to Friday only. Getting an alert on Saturday at 9.10pm because the job didn’t run is not helpful. But by 9.10pm on Monday if the job hasn’t run, then we need to know.

    A workaround is to make 5 weekly jobs – “Monday Job X Heartbeat”, “Tuesday Job X heartbeat” and so on, but…
