Define operational standards for managing SQL Server transactional replication environments. Covers topology roles, monitoring, safe maintenance windows, subscription management, and troubleshooting. Snapshot and merge replication are not addressed here.
| Role | Description |
|---|---|
| Publisher | Source instance. Holds the publication database. Marked articles are tracked by the Log Reader Agent. |
| Distributor | Intermediary. Holds the distribution database. Stores transactions between publisher and subscriber. May be co-located on the publisher or on a dedicated instance. |
| Subscriber | Destination instance. Receives replicated transactions via the Distribution Agent. |
A dedicated distributor instance is preferred for production environments with high transaction volume — co-locating the distributor on the publisher adds contention on the publisher's log reader.
Before configuring replication:
- Publication database is in FULL recovery model — Simple recovery is not compatible with transactional replication
- Publisher and subscriber share compatible collation (
SQL_Latin1_General_CP1_CI_AS) - Snapshot share is accessible by both publisher and distributor service accounts (read/write)
- Distribution database is sized appropriately — undersizing causes latency accumulation under load
- SQL Server Agent is running on all three roles
- Service accounts have
db_owneron the distribution database and appropriate access to publication databases
# Verify recovery model on publication candidates
$splatRec = @{
SqlInstance = $publisher
EnableException = $true
}
Get-DbaDatabase @splatRec |
Where-Object RecoveryModel -ne 'Full' |
Select-Object SqlInstance, Name, RecoveryModelReplication latency is the delay between a transaction committing on the publisher and arriving at the subscriber. Monitor via the distribution database:
SET NOCOUNT ON;
SELECT
msp.[publisher_db],
mss.[subscriber_db],
mss.[last_delivered_time],
mss.[latency] AS latency_sec,
mss.[delivery_rate] AS rows_per_sec,
mss.[delivery_latency] AS delivery_latency_ms
FROM [distribution].[dbo].[MSsubscriptions] AS mss
JOIN [distribution].[dbo].[MSpublisher_databases] AS msp
ON mss.[publisher_id] = msp.[publisher_id]
AND mss.[publisher_db_id] = msp.[id]
ORDER BY mss.[latency] DESC;# dbatools — publication status across all publications
$splatPub = @{
SqlInstance = $distributor
EnableException = $true
}
Get-DbaReplPublication @splatPub |
Select-Object SqlInstance, PublicationName, PublicationType, StatusLatency thresholds:
| Threshold | Action |
|---|---|
| < 60 seconds | Normal |
| 60 – 300 seconds | Investigate — check Log Reader and Distribution Agent job history |
| > 300 seconds | Alert — identify root cause before latency compounds |
| Growing unbounded | Agent likely stopped or erroring — check agent jobs immediately |
Replication relies on three SQL Agent jobs per publication/subscription: Log Reader Agent, Snapshot Agent, and Distribution Agent. Monitor their status:
SET NOCOUNT ON;
SELECT
j.[name] AS AgentJob,
j.[enabled],
h.[run_date],
h.[run_time],
CASE h.[run_status]
WHEN 0 THEN 'Failed'
WHEN 1 THEN 'Succeeded'
WHEN 2 THEN 'Retry'
WHEN 3 THEN 'Cancelled'
WHEN 4 THEN 'In Progress'
END AS LastStatus,
h.[message]
FROM [msdb].[dbo].[sysjobs] AS j
JOIN [msdb].[dbo].[sysjobhistory] AS h
ON j.[job_id] = h.[job_id]
WHERE j.[name] LIKE '%REPL%'
OR j.[category_id] IN (
SELECT [category_id]
FROM [msdb].[dbo].[syscategories]
WHERE [name] LIKE 'REPL%'
)
ORDER BY h.[run_date] DESC, h.[run_time] DESC;Certain DBA operations require coordination before being performed on published databases.
Index rebuilds on published tables are logged operations and generate a large volume of log records that the Log Reader Agent must process. This can spike latency significantly.
- Schedule index rebuilds on publisher databases off-hours only.
- Monitor latency during and after the rebuild window until it returns to baseline.
- Do not rebuild multiple published tables simultaneously.
Transaction log backups on the publisher truncate the log only after the Log Reader Agent has read and forwarded all transactions to the distributor. If the Log Reader Agent is stopped or lagging, log backups will not shrink the active log — the log grows until the agent catches up.
If log growth is unexpectedly high:
- Check Log Reader Agent status and latency.
- Resolve the agent issue before taking corrective action on the log file.
- Do not set recovery model to SIMPLE as a workaround — this drops the publication.
Schema changes on published articles must be replicated to subscribers. SQL Server handles DDL replication automatically for supported changes, but some changes require manual intervention or subscription reinitialization.
Before making schema changes to published tables:
- Confirm the change is in the list of DDL changes supported by transactional replication.
- Test the change on the test replication environment first.
- Coordinate timing with a low-latency window.
- Monitor agent jobs immediately after the change for errors.
Reinitialize when a subscription is out of sync and latency cannot be resolved by letting the distribution agent catch up.
- Subscription is marked as inactive or unsubscribed due to extended latency
- Log Reader Agent shows unresolvable errors related to missing log records
- Schema mismatch between publisher and subscriber article tables
# Generate a new snapshot and reinitialize
$splatReinit = @{
SqlInstance = $publisher
Database = $publicationDb
PublicationName = $publicationName
EnableException = $true
}
Invoke-DbaReplPublicationRebuild @splatReinitFor large publications, snapshot generation can take significant time and will increase publisher disk and I/O load. Schedule during a low-activity window.
Use only if the existing snapshot is recent and subscriber schema matches:
-- On the distributor
EXEC [distribution].[dbo].[sp_reinitsubscription]
@publication = N'PublicationName',
@subscriber = N'<subscriber-instance>',
@destination_db = N'<subscriber-database>',
@invalidate_snapshot = 0; -- 0 = use existing snapshot| Error | Likely Cause | Resolution |
|---|---|---|
| Log Reader Agent: "Could not find log record" | Log backup ran before Log Reader processed records | Check Log Reader latency; if log was backed up with COPY_ONLY this should not occur — verify backup type |
| Distribution Agent: deadlock on subscriber | Concurrent writes on subscriber conflicting with replication | Identify conflicting process; replication agents retry deadlocks automatically up to the configured retry count |
| Snapshot Agent failure | Snapshot share permissions or disk space | Verify service account access to snapshot share; check free disk space |
| Subscription marked inactive | Latency exceeded the max_distrib_retention threshold |
Reinitialize the subscription |
| "The process could not execute 'sp_replcmds'" | Log Reader Agent lacks permission on publication DB | Verify Log Reader Agent account has db_owner or explicit EXECUTE on sp_replcmds |
Replication Monitor (SSMS > Replication > Launch Replication Monitor) provides a graphical view of all publications, latency history, and agent job logs. Use it for initial diagnosis before querying distribution tables directly.
SET NOCOUNT ON;
-- Last 50 error messages from all replication agents
SELECT TOP 50
[time],
[error_code],
[error_text],
[xact_seqno]
FROM [distribution].[dbo].[MSrepl_errors]
ORDER BY [time] DESC;- Monitoring — general job monitoring procedures
- SQL Agent Job Standards — replication agent job naming and configuration
- Log Shipping Setup — alternative DR approach for non-published databases