Thursday, February 14, 2013

User Profile Sync Issues - MOSS 2007, SharePoint

Errors had been popping up in our event log for some time and no resolution had been found by opening tickets with Microsoft so I embarked on another epic analysis and you get the cream off the top.

The user does not exist or is not unique

We were seeing errors in the event viewer that looked like:
failure trying to synch site <GUID> for ContentDB <GUID> WebApp <GUID>.  Exception message was The user does not exist or is not unique

Now right off the bat, I should let you know that we cheat a little and have developed some SQL Server Integration Services (SSIS) scripts that gather all these super helpful guid numbers and link them to fairly useless discripters like database names, web app URLs, site URLs...  So I just take the GUID and dump it in my search routine and voila - I know which site this little timer job is harping about.  You may need to query (never access the SharePoint database directly... blah blah blah) the objects table in the config database then query the sites table in the content database to find your site.

Analysis:

This was likely caused by user IDs that were formatted with an @ symbol instead of the common AD format with the \.  IE:  userid@domain (wrong) vs. domain\userid (right).  The assumption is that these IDs were migrated in from WSS 2.0.
Neither of the supported methods for removing the malformed ID worked:
·         STSADM -o deleteuser –url <siteurl> -userlogin <userid>
·         Using the API: SPUserCollection.Remove(<userid>)
The user ID had to be transformed to a supported format before it could be deleted properly from SharePoint using the API.  The user ID is stored in two places – the UserInfo table and the AllUserData table.  Site collection users are also members of a list.  Both locations must be updated with the corrected user ID.

Solution: Transform IDs to Supported Format

The following query will transform all malformed IDs to the supported format in all content databases on a database server (test, test, test, test... I recommend testing this in non-prod first.  test, test, test):
-- These series of statements will transform IDs with @ symbols in them to
-- the standard format ms\userid
drop table #UserIdsSel
create table #UserIdsSel (u_db varchar(128), u_siteid uniqueidentifier, id bigint, t_login varchar(255), u_type varchar(12))
exec sp_msforeachdb @command1 = 'print "?"; use [?]
create table #UserIds (u_db varchar(128), u_siteid uniqueidentifier, id bigint, t_login varchar(255), u_type varchar(12))
insert #UserIds
select db_name() as dbname
  ,tp_siteid
  ,tp_ID
  ,''<YOURDOMAIN>\'' + parsename(replace(tp_login,''@'',''.''),2 )as NewLogin
  ,''@ Removed''
from userinfo
where tp_ExternalTokenLastUpdated is NULL
 and tp_deleted = 0
 and tp_domaingroup = 0
 and tp_Login like (''%@%'')

 update UserInfo set tp_Login = t_login
 from #UserIds u
 where tp_SiteID = u.u_siteid
      and tp_ID = id

update AllUserData set nvarchar3 = t_login
 from #UserIds u
 where u.u_siteid = tp_SiteId
      and u.id = tp_ID
      and tp_contenttype = ''Person''

insert #UserIdsSel
select * from #UserIds

drop table #UserIds'

select * from #UserIdsSel
Once the IDs have been transformed to a supported format, they can be deleted without error.  Since they were malformed in the first place, they could not have been used to access SharePoint.

Duplicate User IDs in the Site Collection

We discovered that there were some site collections with duplicate user ID values.  The API should prevent this from happening but this is likely due to a migration.  We seem to be moving corruption from place to place instead of cleaning it up as we go.

Analysis:

The STSADM command and the API both operate against the first ID.  The code assumes the user ID is unique so the second value is missed.  The duplicate ID needs to be converted to a new value so the API can “see” it and handle it correctly.

Solution: Transform IDs to Unique IDs by Adding ‘zzz’

The duplicate ID needs to be transformed to a new value.  The following query will process any non-deleted, inactive duplicate ID in all content databases on the database server - you may need to go back and reread the assumptions I made if the query doesn't work for you:
--Transform duplicate ids to new values
drop table #UserIdsSel
create table #UserIdsSel (u_db varchar(128), u_siteid uniqueidentifier, id bigint, t_login varchar(255), u_type varchar(12))
exec sp_msforeachdb @command1 = 'print "?"; use [?]
create table #UserIds (u_db varchar(128), u_siteid uniqueidentifier, id bigint, t_login varchar(255), t_type varchar(12))
insert #UserIds
select db_name() as dbname
      ,u1.tp_siteid
      ,u1.tp_ID 
      ,u1.tp_Login + ''zzz''
      ,''Altered''
from userinfo u1 with (nolock) inner join
userinfo u2 with (nolock) on u1.tp_siteid = u2.tp_siteid
      and u1.tp_login = u2.tp_login
      and u1.tp_id <> u2.tp_id
where u1.tp_ExternalTokenLastUpdated is NULL
 and u1.tp_deleted = 0
 and u1.tp_domaingroup = 0

update UserInfo set tp_Login = t_login
 from #UserIds u
 where tp_SiteID = u.u_siteid
      and tp_ID = id

update AllUserData set nvarchar3 = t_login
 from #UserIds u
 where u.u_siteid = tp_SiteId
      and u.id = tp_ID
      and tp_contenttype = ''Person''

insert into #UserIdsSel
select * from #UserIds

drop table #UserIds'

Duplicate Key Value

Message: Cannot insert duplicate key row in object 'dbo.UserMemberships' with unique index 'CX_UserMemberships_RecordId_MemberGroupId_SID'. The duplicate key value is (<RecordID>, <MemberGroupID>, <SID>).
This error occurred on the Web Front End but actually refers to the SSP database.  While performing a User Profile Sync, a recursive membership stored procedure is fired off.  The query used to determine the uniqueness of the new row is different than the unique clustered index resulting in a failure to insert the row.
My analysis below returned 10,912 unique RecordIDs out of 175,602 in the UserMemberships table with 120,851 missing a valid recursive membership.  All have an SID.  These 120,851 are cases where we would get the error from the Event Viewer.  The record count in UserMemberships is 3,583,335.

Analysis:

I'm guessing this is due to some sort of corruption in the SSP.  I removed one of the failing entries from the UserMemberships table only to have it fail on another entry.  I pulled the info from the SSP Database stored procedures and I believe I have found where it occurs:
Stored Procedure:  membership_updateRecursiveMemberships
BEGIN
INSERT INTO UserMemberships (RecordId, SID, MemberGroupId, GroupType, GroupTitle, PolicyId, ItemSecurity)
SELECT IM.MasterRecordId, NULL, IM.MemberGroupId, @GroupType_DistributionList, N'', @DLSourceId, @DefaultItemPrivacy
FROM #MembershipsFromImport IM
LEFT JOIN UserMemberships UM ON UM.RecordId = IM.MasterRecordID AND UM.MemberGroupId = IM.MemberGroupId
WHERE UM.MemberGroupId IS NULL
SET @ErrorCode = @@Error
END
This query defines what to insert into the table:
SELECT DISTINCT U.MasterRecordId, G2.[Id], G2.SourceReference, G2.cs_SourceReference
FROM #UsersToConsider U
INNER JOIN UserMemberOf V WITH (NOLOCK) ON U.RecordId = V.RecordId
INNER JOIN MemberGroup G WITH (NOLOCK, INDEX=IX_MemberGroup_Source_SourceReference) ON V.cs_SourceReference = G.cs_SourceReference AND V.SourceReference = G.SourceReference AND G.Source = @DLSourceId
INNER JOIN MembershipRecursive R WITH (NOLOCK) ON G.[Id] = R.GroupId
INNER JOIN MemberGroup G2 WITH (NOLOCK) ON R.ParentGroupId = G2.[Id]

So I did some research on how I might be able to find some corruption.  I pulled the values above and did a union all on UserMemberships to see the differences:
SELECT DISTINCT U.MasterRecordId, G2.[Id], G2.SourceReference, G2.cs_SourceReference
FROM UserProfile_Full U
INNER JOIN UserMemberOf V WITH (NOLOCK) ON U.RecordId = V.RecordId
INNER JOIN MemberGroup G WITH (NOLOCK, INDEX=IX_MemberGroup_Source_SourceReference) ON V.cs_SourceReference = G.cs_SourceReference AND V.SourceReference = G.SourceReference
INNER JOIN MembershipRecursive R WITH (NOLOCK) ON G.[Id] = R.GroupId
INNER JOIN MemberGroup G2 WITH (NOLOCK) ON R.ParentGroupId = G2.[Id]
where U.RecordID = 321984
union all
select UM.RecordId
      ,UM.MemberGroupId as id
      ,NULL
      ,NULL
from UserMemberships UM
where UM.RecordId = 321984
order by id asc

When I performed this query, I found that the "valid" user memberships had a record in both cases.  However, the "invalid" memberships were missing a row from the first case.
So I wrote another query to find all "invalid" memberships in the SSP and I was blown away by how many hits I had:
drop table #groupMembership
create table #groupMembership (
      record bigint,
      mgroup bigint
      )
insert into #groupMembership
SELECT DISTINCT U.MasterRecordId, G2.[Id]
FROM UserProfile_Full U
      INNER JOIN UserMemberOf V WITH (NOLOCK) ON U.RecordId = V.RecordId
      INNER JOIN MemberGroup G WITH (NOLOCK, INDEX=IX_MemberGroup_Source_SourceReference) ON V.cs_SourceReference = G.cs_SourceReference AND V.SourceReference = G.SourceReference
      INNER JOIN MembershipRecursive R WITH (NOLOCK) ON G.[Id] = R.GroupId
      INNER JOIN MemberGroup G2 WITH (NOLOCK) ON R.ParentGroupId = G2.[Id]

select um.RecordId
         ,sum(case when SID IS NULL then 1 else 0 end) NULLSid
         ,sum(case when SID IS null then 0 else 1 end) WithSid
from UserMemberships  UM with (nolock)
where um.id not in
      (select um.id
      from UserMemberships UM with (nolock) inner join
            #groupMembership gm on UM.RecordId = gm.record and UM.MemberGroupId = gm.mgroup)
and UM.RecordId <> -1
group by UM.RecordId
order by UM.RecordId

Likely Solution:  Delete Database Sync Information

All memberships that seemed to be causing failures were manually deleted with another series of SQL queries but the correct method is likely the stsadm sync command.  Microsoft Premier support claims that the sync command will clear the SSP of these relationships and cause them to be recreated on the next firing of the “Profile Synchronization” timer job which by default runs every hour.  The sync timer frequency can be changed using the -SyncTiming option - M:5 would set it to run every 5 minutes.

stsadm -o sync -listolddatabases 0

Shared Service Provider SharedServices1
ID: e51f0d3e-85cc-498a-b330-04a2ccfc1086  Synchronized: 2/5/2013 8:55:06 AM
ID: 401a8778-32de-4836-82db-085191159ce7  Synchronized: 2/5/2013 8:55:07 AM
ID: ea06156e-27a5-4bb5-a3c5-0ae046083930  Synchronized: 2/5/2013 8:55:06 AM
ID: d82336e9-71c1-40eb-8c41-10793f7100db  Synchronized: 1/22/2013 8:50:07 AM
ID: 5f93b1cd-0510……

stsadm -o sync -deleteolddatabases 0

Deleted sync information for DB e51f0d3e-85cc-498a-b330-04a2ccfc1086
Deleted sync information for DB 401a8778-32de-4836-82db-085191159ce7
Deleted sync information…..