Replication always brings up new surprises, and the most recent surprise I encountered was this error message on every single publication from one database:
“could not execute sp_replcmds on server [servername]”
Clearly this problem was with the log reader agent as it was effecting all of the publications, and reading on from the error message it detailed the cause of the problem:
“could not locate text information records for the column [column], id [id] during command construction.”
So a text column was causing the replication issues. According to Microsoft best practices with replication, it is not recommended to use the “text” data type. Instead, NVARCHAR(MAX) is recommended. But at any rate, save for a snapshot scheduled for 2AM to add a table with a single column of DATETIME2(3), nothing else had been added to this publication for some time. So the column causing issues had always been in replication. And if you do not have the ability to change the data type of the column, you’ve got to get it fixed. First thing to do is find the table the text column is on, so run the following SQL on the database being published:
SELECT C.id, C.name, T.name FROM sys.columns C INNER JOIN sys.tables T ON T.object_id = C.object_id WHERE C.Id = [id] AND C.name = [column]
Unless you are very unfortunate, the above SQL will return only one table name, which is your table causing issues. So now you can find the publication the table is in.
On the Distribution database for the published database, run the following SQL:
SELECT a.article, p.publication FROM MSArticles a INNER JOIN MSPublications p on p.publication_id = a.publication_id where a.article = [tablename]
This will give you all of the publications that the table with the column causing replication issues. At this point the fastest way to fix this with the least disruption is to take this table out of replication. So go to the publisher/s and remove this article. You may find that removing this table results in the same error above but for a different table… it did for me. So I re-ran the process and removed the 2nd table from replication.
Eventually the log reader for the publisher database went green, and replication was able to continue. Now you need to re-add the articles back into replication and run the snapshot agent to push the tables out again. There will only be a lock on the tables that have been removed from replication whilst the snapshot is taken, so the impact is mitigated. Once the tables are re-added then replication will continue as normal.
The advantage of removing/adding and running snapshots over reinitialization is that it is faster and will lock less tables.
Can’t wait to see what happens with replication next time! To be sure I’ll blog about it!Follow @rPh0enix