Wednesday, January 9, 2013

SharePoint Scheduled Alerts Deconstructed

This post contains:

  • Links To Other Alert Troubleshooting Articles
  • Why Scheduled Alerts Mysteriously Stop Working
  • What The Immediate Alerts Timer Job Does (Five Stored Procedures)
  • How To Find Alerts That Didn't Fire
  • The Moral Of The Story
I've spent the past two weeks scouring the internet, tracing SQL, extracting table DDL and stored procedures from SharePoint content databases to finally crack the scheduled alert code. I've generated some SQL queries you can use to find failing alerts, force alerts to re-fire, and determine the type of each alert.

The SharePoint scheduled alert design hasn't changed much from WSS 3.0. This analysis applies to WSS 3.0, MOSS 2007, SharePoint 2010 and I suspect SharePoint 2013 but haven't verified it.   All my research and data analysis have lead to four words:

Synchronize your server times.

If your database server and WFEs are not sync'd to the time server, your scheduled alerts will fail. No setting changes, property changes, timer restarts, moving DB processing to another WFE... or whatever will fix your problem. And the next time you do a reboot, time will sync up and things will start working again. Maddening.

 

Links To Other Alert Troubleshooting Articles


I'm going to start with credit to some of the posts that helped me along before I get into my stuff.
I can sum most of this up fairly quickly.
  • Scheduled alerts are run via the Immediate Alerts timer job.
  • The Timerlock table in each database tells you which WFE runs timer jobs for it.
  • The property "alerts-enabled" must be set to true for each URL.
  • The property "job-immediate-alerts" tells you when the job runs, typically "every 5 minutes between 0 and 59".
  • Daily and weekly alert subscriptions are both stored in the SchedSubscriptions table - NotifyFreq column value of 1 = daily, 2 = weekly.
  • All activity on the site is stored in the EventCache table.
  • When a subscription is set on an EventCache row, an EventLog entry is created the next time the "Immediate Alerts" job runs from the timer.
  • Since immediate alerts fire immediately (hence the name) there is no need to store up alerts for a daily or weekly summary.

 

Why Scheduled Alerts Mysteriously Stop Working 


This is where we kept getting lost. There was nothing wrong with any part of the email or event processing architecture. Immediate alerts worked fine, scheduled alerts didn't go out. We needed a deeper understanding of what was really happening.
  • Scheduled alert subscriptions require an intersection entity (many to many relationship) and use the EventSubsMatches table to join on the EventCache Event ID and the SchedSubscriptions Subscription ID.
  • When the "Immediate Alerts" timer runs and the SchedSubscription NotifyTime has been past by the SQL server current time, all alerts joined to the subscription EventsSubsMatches table are summarized and sent out to the subscriber.
  • The processed rows are removed from EventSubsMatches.
  • The subscriptions where the NotifyTime has been passed by the WFE time are incremented by one unit (day, week).
Notice that the alerts are processed using the SQL server time and the subscriptions are incremented by the WFE time? If your SQL server time is running behind the WFE, it will fail to find any alerts to process (less than 24 hours) and then another stored procedure comes along and pushes the subscription forward like it was processed. This is why my subscriptions weren't going out.

If you turn on verbose for the Timer ULS log, you will see that the process is running fine but not finding anything to send out.  You watch your logs and you capture:
Begin invoke timer job Immediate Alerts, id {81E07A90-327B-471A-A3A6-7A7672F1D905}, DB {7ABFF5BD-7E23-43FE-B399-F6980AA3E954}
It's working!  You say to yourself...
AlertsJob loaded 0 of 0 event data records
AlertsJob loaded 0 of 0 subscription records
Alertsjob results for immediate delivery: 0 prematches, 0 passed filtering, 0 of 0 passed security trimming, 0 final after rollup
Alertsjob results for scheduled delivery: 0 prematches, 0 passed filtering, 0 of 0 passed security trimming, 0 final after rollup
AlertsJob processed 0 daily notifications in 0 digests, sent 0 emails, failed to send 0 emails
AlertsJob processed 0 weekly notifications in 0 digests, sent 0 emails, failed to send 0 emails
Wait, I know I subscribed to alerts and I SHOULD have seen some!  Why does it say "0" everywhere?

 

What The Immediate Alerts Timer Job Does


Time to fire up the SQL Profiler and run a trace.  I've captured the stored procedures that the timer job uses.  Now we can really see what goes on.

proc_GetEventDataAndSubscriptionFilters
This takes entries from EventCache table, processes immediate alerts
Updates EventLog
Updates EventBatch with timestamp and last processed EventCache Item ID
Nulls out the EventData and ACL columns in EventCache
proc_EnumSubscribedSites
Pulls a list of Site GUIDs that have subscriptions
proc_MatchSchedSubscriptions
For each site with subscriptions, pull the list of scheduled subscriptions that should fire
This one has a where clause that blocks my failed subscriptions
EventLog.EventTime  >= @EventTime - CASE
                    WHEN (SchedSubscriptions.NotifyFreq = 1)
                    THEN 1.0
                    ELSE 7.0
                    END
proc_DeleteEventLog
For each site collection, delete from EventSubsMatchs
Where EventLog.EventTime < current time
Delete from EventLog
proc_UpdateSchedSubscriptionTimes
Adds either a day or a week to NotifyTime and NotifyTimeUTC

As you can see, I tested all the where clauses for proc_MatchSchedSubscriptions and found that this particular one was selecting against my scheduled alerts.

Then, proc_UpdateSchedSubscriptionTimes procedure runs after everything - there is no two phased commit, no logic to ensure it didn't miss anything - just an increment by one to get ready for the next summary.
This is probably a good time to say:

 "Microsoft doesn't support any manual manipulation of the SharePoint content databases and recommends only using the API."  

So, proceed at your own risk :D.

 

How To Find Alerts That Didn't Fire


Open up your SQL Management Studio, connect to your data server, and run this query

Look at your results tab and you will see all the databases where scheduled alerts that should have been sent - weren't.  You will also see another curiosity - alerts with no e-mail addresses.  Why would those be there?  You can't send an alert unless you have an e-mail address right?  This cracks me up.  You can create an alert in the GUI even if your site collection user profile doesn't have an e-mail address.  However, the stored procedure where clause in proc_MatchSchedSubscriptions blocks those entries.  The user THINKS they are going to get alerts, nothing tells them they won't on the front end, and SharePoint just ignores them on the back end.  Isn't that nice?


Ok, you now have database names with failing alerts, now open a new query window and run all these queries at once against the DB in question. This is an older screen shot:
 
Hmmm, you say. What next? Well, I found out that if you push the Schedule back one unit (day, week), it will fire again the next time "Immediate Alerts" timer runs. Also, if there is no e-mail address, you can blow them away (UNSUPPORTED!!).  If you want to try your luck, you can use these queries.  I found this would fire off my alerts 90% of the time but they would fail again on the next rotation - because the server times were off.

The moral of this story:  

 

"Make sure all your system times are in sync and you can use the summary queries I provided to watch your alerts magically correct themselves as they cross the immediate alert timer boundary."

5 comments:

  1. THANK YOU!!!! you sir are a genious! After weeks of pulling my hair out with this issue it's finally resolved...At least until the time stops syncing again.

    ReplyDelete
  2. Deleted user alerts where it will store and how to find the who deleted the user alerts

    ReplyDelete
  3. Deleted user alerts where it will store and how to find the who deleted the user alerts

    ReplyDelete
  4. Hi Tom,
    Thanks for your article, I have just two questions hopefully you will be able to help me out with:
    1-EventLog Table not found on my content database.
    2- when different users creates weekly alerts on the same list but on different times,
    so the first one will receive his weekly alert for all modifications and the timer job will marks the EventCache
    entry as processed and updates the last processed event details in EventBatches table(id=x for example),
    then the second user will receive his weekly alert for items where id is greater than x but not others,
    could you please tell me if there is a solution for this issue.
    Thanks.

    Regards

    ReplyDelete
    Replies
    1. Elaakil, what version of SharePoint are you running? This analysis was performed on WSS 3.0 and MOSS 2007. I haven't worked with the newer versions so I'm uncertian if they changed the content database structure (you'll have to look in the content databases and you may have to script out the stored procedures to dig into it yourself). I don't have a solution for your second issue.

      Delete