Wednesday, January 9, 2013

SharePoint Scheduled Alerts Deconstructed

This post contains:

  • Links To Other Alert Troubleshooting Articles
  • Why Scheduled Alerts Mysteriously Stop Working
  • What The Immediate Alerts Timer Job Does (Five Stored Procedures)
  • How To Find Alerts That Didn't Fire
  • The Moral Of The Story
I've spent the past two weeks scouring the internet, tracing SQL, extracting table DDL and stored procedures from SharePoint content databases to finally crack the scheduled alert code. I've generated some SQL queries you can use to find failing alerts, force alerts to re-fire, and determine the type of each alert.

The SharePoint scheduled alert design hasn't changed much from WSS 3.0. This analysis applies to WSS 3.0, MOSS 2007, SharePoint 2010 and I suspect SharePoint 2013 but haven't verified it.   All my research and data analysis have lead to four words:

Synchronize your server times.

If your database server and WFEs are not sync'd to the time server, your scheduled alerts will fail. No setting changes, property changes, timer restarts, moving DB processing to another WFE... or whatever will fix your problem. And the next time you do a reboot, time will sync up and things will start working again. Maddening.

 

Links To Other Alert Troubleshooting Articles


I'm going to start with credit to some of the posts that helped me along before I get into my stuff.
I can sum most of this up fairly quickly.
  • Scheduled alerts are run via the Immediate Alerts timer job.
  • The Timerlock table in each database tells you which WFE runs timer jobs for it.
  • The property "alerts-enabled" must be set to true for each URL.
  • The property "job-immediate-alerts" tells you when the job runs, typically "every 5 minutes between 0 and 59".
  • Daily and weekly alert subscriptions are both stored in the SchedSubscriptions table - NotifyFreq column value of 1 = daily, 2 = weekly.
  • All activity on the site is stored in the EventCache table.
  • When a subscription is set on an EventCache row, an EventLog entry is created the next time the "Immediate Alerts" job runs from the timer.
  • Since immediate alerts fire immediately (hence the name) there is no need to store up alerts for a daily or weekly summary.

 

Why Scheduled Alerts Mysteriously Stop Working 


This is where we kept getting lost. There was nothing wrong with any part of the email or event processing architecture. Immediate alerts worked fine, scheduled alerts didn't go out. We needed a deeper understanding of what was really happening.
  • Scheduled alert subscriptions require an intersection entity (many to many relationship) and use the EventSubsMatches table to join on the EventCache Event ID and the SchedSubscriptions Subscription ID.
  • When the "Immediate Alerts" timer runs and the SchedSubscription NotifyTime has been past by the SQL server current time, all alerts joined to the subscription EventsSubsMatches table are summarized and sent out to the subscriber.
  • The processed rows are removed from EventSubsMatches.
  • The subscriptions where the NotifyTime has been passed by the WFE time are incremented by one unit (day, week).
Notice that the alerts are processed using the SQL server time and the subscriptions are incremented by the WFE time? If your SQL server time is running behind the WFE, it will fail to find any alerts to process (less than 24 hours) and then another stored procedure comes along and pushes the subscription forward like it was processed. This is why my subscriptions weren't going out.

If you turn on verbose for the Timer ULS log, you will see that the process is running fine but not finding anything to send out.  You watch your logs and you capture:
Begin invoke timer job Immediate Alerts, id {81E07A90-327B-471A-A3A6-7A7672F1D905}, DB {7ABFF5BD-7E23-43FE-B399-F6980AA3E954}
It's working!  You say to yourself...
AlertsJob loaded 0 of 0 event data records
AlertsJob loaded 0 of 0 subscription records
Alertsjob results for immediate delivery: 0 prematches, 0 passed filtering, 0 of 0 passed security trimming, 0 final after rollup
Alertsjob results for scheduled delivery: 0 prematches, 0 passed filtering, 0 of 0 passed security trimming, 0 final after rollup
AlertsJob processed 0 daily notifications in 0 digests, sent 0 emails, failed to send 0 emails
AlertsJob processed 0 weekly notifications in 0 digests, sent 0 emails, failed to send 0 emails
Wait, I know I subscribed to alerts and I SHOULD have seen some!  Why does it say "0" everywhere?

 

What The Immediate Alerts Timer Job Does


Time to fire up the SQL Profiler and run a trace.  I've captured the stored procedures that the timer job uses.  Now we can really see what goes on.

proc_GetEventDataAndSubscriptionFilters
This takes entries from EventCache table, processes immediate alerts
Updates EventLog
Updates EventBatch with timestamp and last processed EventCache Item ID
Nulls out the EventData and ACL columns in EventCache
proc_EnumSubscribedSites
Pulls a list of Site GUIDs that have subscriptions
proc_MatchSchedSubscriptions
For each site with subscriptions, pull the list of scheduled subscriptions that should fire
This one has a where clause that blocks my failed subscriptions
EventLog.EventTime  >= @EventTime - CASE
                    WHEN (SchedSubscriptions.NotifyFreq = 1)
                    THEN 1.0
                    ELSE 7.0
                    END
proc_DeleteEventLog
For each site collection, delete from EventSubsMatchs
Where EventLog.EventTime < current time
Delete from EventLog
proc_UpdateSchedSubscriptionTimes
Adds either a day or a week to NotifyTime and NotifyTimeUTC

As you can see, I tested all the where clauses for proc_MatchSchedSubscriptions and found that this particular one was selecting against my scheduled alerts.

Then, proc_UpdateSchedSubscriptionTimes procedure runs after everything - there is no two phased commit, no logic to ensure it didn't miss anything - just an increment by one to get ready for the next summary.
This is probably a good time to say:

 "Microsoft doesn't support any manual manipulation of the SharePoint content databases and recommends only using the API."  

So, proceed at your own risk :D.

 

How To Find Alerts That Didn't Fire


Open up your SQL Management Studio, connect to your data server, and run this query

Look at your results tab and you will see all the databases where scheduled alerts that should have been sent - weren't.  You will also see another curiosity - alerts with no e-mail addresses.  Why would those be there?  You can't send an alert unless you have an e-mail address right?  This cracks me up.  You can create an alert in the GUI even if your site collection user profile doesn't have an e-mail address.  However, the stored procedure where clause in proc_MatchSchedSubscriptions blocks those entries.  The user THINKS they are going to get alerts, nothing tells them they won't on the front end, and SharePoint just ignores them on the back end.  Isn't that nice?


Ok, you now have database names with failing alerts, now open a new query window and run all these queries at once against the DB in question. This is an older screen shot:
 
Hmmm, you say. What next? Well, I found out that if you push the Schedule back one unit (day, week), it will fire again the next time "Immediate Alerts" timer runs. Also, if there is no e-mail address, you can blow them away (UNSUPPORTED!!).  If you want to try your luck, you can use these queries.  I found this would fire off my alerts 90% of the time but they would fail again on the next rotation - because the server times were off.

The moral of this story:  

 

"Make sure all your system times are in sync and you can use the summary queries I provided to watch your alerts magically correct themselves as they cross the immediate alert timer boundary."

SharePoint Alerts - Force Scheduled Alerts, Remove Bad Alerts

--Push All Old Subscriptions
update SchedSubscriptions
  SET
        NotifyTime    = CASE WHEN (NotifyTime    IS NOT NULL) THEN NotifyTime    ELSE  @Now END - CASE WHEN (NotifyFreq = 1) THEN 1.0 ELSE 7.0 END,
        NotifyTimeUTC = CASE WHEN (NotifyTimeUTC IS NOT NULL) THEN NotifyTimeUTC ELSE  @Now END - CASE WHEN (NotifyFreq = 1) THEN 1.0 ELSE 7.0 END
  where Id in (select ss.Id
            FROM dbo.EventSubsMatches AS esm LEFT OUTER JOIN
            dbo.EventCache AS ec ON esm.EventId = ec.Id LEFT OUTER JOIN
            dbo.SchedSubscriptions AS ss ON esm.SubId = ss.Id LEFT OUTER JOIN
            dbo.EventLog AS el ON esm.EventId = el.Id
            WHERE (DATEDIFF(Hour,EventLog.eventtime,SchedSubscriptions.NotifyTimeUTC) > (24 * (CASE SchedSubscriptions.notifyfreq when 1 then 1.0 else 7.0 END))));
           
--Delete subscriptions with missing e-mail addresses
SET NOCOUNT ON;
DECLARE @sub uniqueidentifier;
DECLARE sub_cursor CURSOR FOR
    select ss.Id
    from UserInfo as ui inner join
    SchedSubscriptions as ss on ui.tp_ID = ss.UserId
    where ss.UserEmail = N'';
OPEN sub_cursor

FETCH NEXT FROM sub_cursor
INTO @sub
WHILE @@FETCH_STATUS = 0
BEGIN
    delete from SchedSubscriptions    where Id = @sub
    delete from EventSubsMatches    where SubId = @sub
    PRINT 'Deleting subscription with missing e-mail address ' + CAST(@sub as varchar(40))
    FETCH NEXT FROM sub_cursor
    INTO @sub
END
CLOSE sub_cursor;
DEALLOCATE sub_cursor;

SharePoint Alerts - Failing Alerts, All Databases Summary Query

exec sp_msforeachdb @command1 = 'print "?"; use [?] select db_name() as dbname,
  dbo.Webs.FullUrl
        ,sum(CASE dbo.SchedSubscriptions.NotifyFreq
                WHEN 1 THEN 1
            END) as Daily
        ,sum(CASE dbo.SchedSubscriptions.NotifyFreq
                WHEN 2 THEN 1
            END) as Weekly
        ,SUM(CASE dbo.SchedSubscriptions.UserEmail
                WHEN N'''' THEN 1
                ELSE 0
                END) MissingEmail
        ,MAX(DATEDIFF(hour,EventLog.EventTime,SchedSubscriptions.NotifyTimeUTC)) as OldestHours
FROM         dbo.EventCache with (nolock) INNER JOIN
                      dbo.EventLog with (nolock) ON dbo.EventCache.Id = dbo.EventLog.Id INNER JOIN
                      dbo.EventSubsMatches with (nolock) ON dbo.EventCache.Id = dbo.EventSubsMatches.EventId INNER JOIN
                      dbo.SchedSubscriptions with (nolock) ON dbo.EventSubsMatches.SubId = dbo.SchedSubscriptions.Id INNER JOIN
                      dbo.Webs with (nolock) ON dbo.EventCache.WebId = dbo.Webs.Id
WHERE (DATEDIFF(Hour,EventLog.eventtime,SchedSubscriptions.NotifyTimeUTC) > (24 * (CASE SchedSubscriptions.notifyfreq when 1 then 1.0 else 7.0 END)))
GROUP BY dbo.SchedSubscriptions.NotifyFreq, dbo.Webs.FullUrl
ORDER BY dbo.SchedSubscriptions.NotifyFreq, dbo.Webs.FullUrl '

SharePoint Alerts - Failing Alerts Single DB Query

--Scheduled Alerts Query
-- Tom Sheffrey 1/3/2013
-- Find alerts that should have fired
-- Check for missing Join values

-- DB and Server Info
select DB_NAME() as DatabaseName
    ,getdate() as ServerTimeStamp
    ,LockedBy
    ,dateadd(hour, -6,LockedTime) as LockedTimeCST from timerlock;

-- Failing alert summary
Select dbo.Webs.FullUrl
        ,count(CASE dbo.SchedSubscriptions.NotifyFreq
                WHEN 1 THEN 1
            END) as Daily
        ,count(CASE dbo.SchedSubscriptions.NotifyFreq
                WHEN 2 THEN 2
            END) as Weekly
        ,SUM(CASE dbo.SchedSubscriptions.UserEmail
                WHEN N'' THEN 1
                ELSE 0
                END) MissingEmail
        ,MAX(DATEDIFF(hour,EventLog.TimeLastModified,SchedSubscriptions.NotifyTimeUTC)) as OldestHours
FROM         dbo.EventCache with (nolock) INNER JOIN
                      dbo.EventLog with (nolock) ON dbo.EventCache.Id = dbo.EventLog.Id INNER JOIN
                      dbo.EventSubsMatches with (nolock) ON dbo.EventCache.Id = dbo.EventSubsMatches.EventId INNER JOIN
                      dbo.SchedSubscriptions with (nolock) ON dbo.EventSubsMatches.SubId = dbo.SchedSubscriptions.Id INNER JOIN
                      dbo.Webs with (nolock) ON dbo.EventCache.WebId = dbo.Webs.Id
WHERE (DATEDIFF(Hour,EventLog.eventtime,SchedSubscriptions.NotifyTimeUTC) > (24 * (CASE SchedSubscriptions.notifyfreq when 1 then 1 else 7 END)))
GROUP BY dbo.SchedSubscriptions.NotifyFreq, dbo.Webs.FullUrl
ORDER BY dbo.SchedSubscriptions.NotifyFreq, dbo.Webs.FullUrl;

-- Entries with NULL subscriptions
select COUNT(eventsubsmatches.eventid) as NullMatches
            from EventSubsMatches left outer join
                SchedSubscriptions on EventSubsMatches.SubId = SchedSubscriptions.Id
            where SchedSubscriptions.Id is NULL;
-- Detail records with EventType and Filter analysis
select w.FullUrl as WebUrl
    ,ec.SiteId
    ,esm.eventid
    ,esm.subid as SubscriptionID
    ,ec.ItemName
    ,ec.ItemFullUrl
    ,el.eventtime as EventTime
    ,ss.NotifyTime as SubscriptionNotifyTime
    ,w.TimeZone
    ,DATEDIFF(hour,ss.NotifyTime,ss.NotifyTimeUTC) as OffsetHour
    ,DATEDIFF(hour,el.eventtime,ss.NotifyTimeUTC) as Hours
    ,CASE ss.NotifyFreq
        when 1 THEN 'Daily'
        when 2 THEN 'Weekly'
    END as Frequency
    ,ss.UserEmail
    ,CASE ss.AlertType
        when 0 then 'List'
        when 1 then 'Item'
        else CAST(ss.AlertType as varchar(8))
    END as AlertType
    ,CASE ss.EventType
        when -1 THEN 'All Changes'
        when 1 then 'New items are added'
        when 2 then 'Existing items are modified'
        when 4 then 'Items are deleted'
        when 4080 then 'Web discussion updates'
        else 'Unknown'
    END as 'Change Type'
    ,CASE ss.Filter
        when N'' then 'Anything Changes'
        when '%Editor/New%' then 'Someone else changes a document'
        when '%Author%' then 'Someone else changes a document owned by me'
        when '%Editor/Old%' then 'Someone else changes a document modified by me'
        else 'Unknown'
    END as 'Send alerts for these changes'
FROM dbo.EventSubsMatches AS esm LEFT OUTER JOIN
     dbo.EventCache AS ec ON esm.EventId = ec.Id LEFT OUTER JOIN
     dbo.SchedSubscriptions AS ss ON esm.SubId = ss.Id LEFT OUTER JOIN
     dbo.EventLog AS el ON esm.EventId = el.Id LEFT OUTER JOIN
     dbo.Webs AS w ON ec.WebId = w.Id
WHERE (DATEDIFF(hour,el.eventtime,ss.NotifyTimeUTC) > (24 * (CASE ss.notifyfreq when 1 then 1.0 else 7.0 END)))
    or WebUrl is NULL
order by hours desc;