Salesforce Sync at Enterprise Scale: Backfill, Throttling & Dedup (2026)

May 4, 2026

Salesforce Sync at Enterprise Scale: Backfill Strategy, API Throttling, and Account Deduplication for Millions of Records

How to scale Salesforce integrations to millions of records with backfills, throttling strategies, deduplication logic, and schema evolution

Chris Lopez

Founding GTM

The first Salesforce integration most product teams ship is for a 50,000-record customer. The second is for a 500,000-record customer. The third is for a five-million-record customer with a custom data model, a deduplication layer, and an opinion about which API tier you are allowed to use during their business day. The architecture that worked for the first two breaks on the third, and almost always in ways that surface as a 3am page during the customer's first backfill.

We have spent a meaningful amount of time advising engineering teams running into this exact transition. A Series B SaaS company sells into a Fortune 500 customer. The customer's Salesforce org has eight million Account records, twelve million Contact records, custom field schemas inherited from a 2014 migration, three duplicate-management layers, and a peak API throughput envelope shaped by Marketing Cloud and a half-dozen point integrations. The Series B's product team, who up until that moment thought of Salesforce as "the CRM with the REST API," is now expected to backfill, sync, and reconcile against an environment that will cheerfully throw 503s at them if they push too hard.

This post is for the engineering leaders about to make that transition. It walks through the four problems that show up at enterprise scale (backfill volume, API throttling, account deduplication, and schema evolution), the architecture that works, and the specific mechanisms (throttle ceilings, webhook throughput envelopes, programmatic field-add) that separate integrations that survive from integrations that get rolled back.

The four problems at enterprise scale

A Salesforce integration that handled 50,000 records cleanly does not automatically scale to eight million. Four specific problems compound.

The first is the initial backfill. Pulling eight million Account records out of Salesforce requires either the Bulk API 2.0 (with its job-based query model and result polling) or paginated REST queries (with their limits on page size and request count). Either way, the backfill is not a one-API-call event. It is a multi-hour, often multi-day, sustained workload that has to coexist with the customer's other Salesforce traffic. Naive backfills saturate the customer's API quota, get throttled, and either fail outright or take so long that the customer's IT team intervenes.

The second is API throttling. Salesforce enforces a rolling 24-hour API request limit per org, scaled by edition and license count. An Enterprise Edition org typically starts from a base allocation of around 100,000 requests per 24 hours and gains an additional allotment for each user license (on the order of 1,000 requests per license), so the total envelope depends on how many seats the customer has bought. Marketing Cloud, customer-built point integrations, and Salesforce-internal jobs (workflow rules, flow executions, asynchronous Apex) all draw against the same pool. A new integration vendor that consumes 30% of the customer's daily quota during business hours will be the thing the customer's IT team blames the next time their reports are slow, even if the actual cause is unrelated. Best practice is to cap the integration's quota usage at a configurable percentage (a default of 80% is reasonable for backfills, with the option to drop to 30% or 40% during the customer's business hours), and to expose that ceiling per-customer.

The third is account deduplication. Eight million Account records do not arrive deduplicated. The customer typically has multiple records for the same logical account (legacy migrations, partner-routing duplicates, multi-touch attribution overlays). The integration vendor's product cannot treat each Salesforce Account ID as a distinct customer. It has to absorb the customer's deduplication signal, which might come from a custom "GoldenAccountID" field, from Salesforce's own duplicate-management rules, from a third-party MDM tool, or from a manual mapping the customer's RevOps team maintains. This is not the integration platform's job to solve, but it is the integration platform's job to expose the right fields and join keys so the product team can solve it.

The fourth is schema evolution. The customer's Salesforce schema is not static. New custom fields get added by the customer's admin every quarter. Legacy fields get deprecated. Custom objects get added, and picklist values change. An integration that hardcodes the schema at install time will break the first time the customer's admin adds a new field the product team needs. The integration has to support schema discovery, a programmatic field-add mechanism, and a process for surfacing schema changes to the integration vendor's product team.
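
As a rough illustration of the schema-discovery step, the sketch below compares two snapshots of one object's fields and reports what was added, removed, or retyped. It assumes the snapshots have already been fetched (for example from Salesforce's describe metadata) and flattened into a simple name-to-type mapping; the `SchemaDiff` and `diff_schema` names and the sample field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaDiff:
    added: dict[str, str]                 # field name -> type, new since the last snapshot
    removed: dict[str, str]               # field name -> type, no longer present
    retyped: dict[str, tuple[str, str]]   # field name -> (old type, new type)


def diff_schema(previous: dict[str, str], current: dict[str, str]) -> SchemaDiff:
    """Compare two {field_name: field_type} snapshots of a single object."""
    added = {name: t for name, t in current.items() if name not in previous}
    removed = {name: t for name, t in previous.items() if name not in current}
    retyped = {
        name: (previous[name], current[name])
        for name in previous.keys() & current.keys()
        if previous[name] != current[name]
    }
    return SchemaDiff(added, removed, retyped)


# Hypothetical snapshots of the Account object taken a week apart.
last_week = {"Name": "string", "Industry": "picklist", "Legacy_Code__c": "string"}
today = {"Name": "string", "Industry": "string", "Health_Score__c": "double"}
diff = diff_schema(last_week, today)
print(diff)
```

A non-empty diff is the signal to surface to the product team; whether a removed field should pause the sync or only raise an alert is a policy decision this sketch leaves open.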

These four problems are not exotic. They are the median enterprise Salesforce integration story. The architectural answer is the same in every case: treat the integration as a managed product surface, not a one-off engineering project, and build the throttling, backfill, dedup-aware mapping, and schema discovery into infrastructure.

Backfill strategy: time, throughput, and quota interaction

The first decision in any enterprise Salesforce backfill is which entities to sync, in what order, and at what throughput. The instinct most teams have is to "sync everything in parallel for speed." This is wrong, because Salesforce's API quota is shared across all queries. A parallel backfill of Accounts, Contacts, and Opportunities triples the rate of quota draw without delivering a threefold wall-clock speedup, because all three query streams compete for the same rate-limited backend.

The pattern that works is to sync entities sequentially, prioritized by the customer's most urgent need. For most B2B SaaS customers, that means Accounts first (because everything else joins on them), then the user records (so you have ownership context), then Contacts and Leads. Opportunities, Tasks, Activities, and custom objects come after.
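
A minimal sketch of that ordering, assuming the backfill is driven by a list of requested object names. The priority tuple mirrors the order described above; `plan_backfill`, `run_backfill`, and the `sync_entity` callback are hypothetical names, and anything not in the priority list (custom objects, for example) is scheduled last in the order it was requested.

```python
from typing import Callable, Iterable

# Priority order from the text: Accounts, then users, then Contacts and Leads,
# with Opportunities and Tasks afterwards.
DEFAULT_PRIORITY: tuple[str, ...] = (
    "Account", "User", "Contact", "Lead", "Opportunity", "Task",
)


def plan_backfill(
    requested: Iterable[str],
    priority: tuple[str, ...] = DEFAULT_PRIORITY,
) -> list[str]:
    """Return the requested entities, de-duplicated, in sync order."""
    rank = {name: i for i, name in enumerate(priority)}
    unique = dict.fromkeys(requested)  # drops repeats, keeps first-seen order
    # sorted() is stable, so unknown entities keep their requested order at the end.
    return sorted(unique, key=lambda name: rank.get(name, len(priority)))


def run_backfill(plan: list[str], sync_entity: Callable[[str], None]) -> None:
    """Sync one entity at a time so only one query stream draws on the quota."""
    for entity in plan:
        sync_entity(entity)  # assumed to block until that entity is fully synced


plan = plan_backfill(["Invoice__c", "Contact", "Account", "Opportunity", "User", "Contact"])
run_backfill(plan, sync_entity=lambda name: print(f"syncing {name}"))
```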

Throughput within an entity should be configured against a percentage of the customer's daily quota. The default we recommend is 80%, with the explicit option to drop to 30% or 40% during the customer's business hours and ramp to 90% during the overnight window. The configuration should be per-customer (because customer A has different business hours than customer B and a different total quota envelope) and per-project (because a one-time backfill can run hotter than an ongoing incremental sync).
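
One way to express that configuration is a small policy object that resolves to a ceiling for a given moment, plus a helper that turns the ceiling into a request budget. This is a sketch under stated assumptions: `ThrottlePolicy`, `effective_ceiling`, and `remaining_budget` are hypothetical names, weekends are treated as off-hours, and the demo uses a fixed UTC offset where a real deployment would more likely use `zoneinfo.ZoneInfo` so that daylight saving time is followed.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone, tzinfo


@dataclass(frozen=True)
class ThrottlePolicy:
    """Hypothetical per-customer (or per-project) throttle settings."""
    tz: tzinfo                          # customer's primary timezone
    business_start: time = time(9, 0)
    business_end: time = time(18, 0)
    business_pct: int = 30              # ceiling during business hours
    off_hours_pct: int = 80             # ceiling overnight and on weekends


def effective_ceiling(policy: ThrottlePolicy, now: datetime) -> int:
    """Return the quota ceiling (percent) in force at `now`, which must be tz-aware."""
    local = now.astimezone(policy.tz)
    is_weekday = local.weekday() < 5    # Monday=0 ... Friday=4
    in_hours = policy.business_start <= local.time() < policy.business_end
    return policy.business_pct if (is_weekday and in_hours) else policy.off_hours_pct


def remaining_budget(daily_limit: int, ceiling_pct: int, used: int) -> int:
    """Requests the integration may still send under the current ceiling."""
    return max(0, daily_limit * ceiling_pct // 100 - used)


# Demo values only: a fixed UTC-5 offset and an illustrative 1,000,000-request daily limit.
policy = ThrottlePolicy(tz=timezone(timedelta(hours=-5)))
now = datetime(2026, 5, 4, 15, 0, tzinfo=timezone.utc)   # Monday, 10:00 local
pct = effective_ceiling(policy, now)
print(pct, remaining_budget(1_000_000, pct, used=120_000))
```

A one-time backfill project can carry its own `ThrottlePolicy` instance with hotter numbers than the ongoing incremental sync for the same customer.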

For an eight-million-record backfill at typical Bulk API 2.0 throughput, with 80% quota utilization, expect 2 to 3 days of wall-clock time. Some teams find this surprising. It is not. The bottleneck is not the integration vendor's compute. It is the customer's API quota, and the customer is paying their Salesforce license for a fixed envelope. The right answer is to set expectations at the start of the integration, not to try to outrun the limit.

Field selection is the other lever. Backfilling all 200 fields on the Account object is slower than backfilling the 30 fields the product actually uses. The customer's typical objection ("but we might want field X later") should be answered by the integration's ability to add fields programmatically without a redeploy. We expose this through an update installation API that lets the integration vendor add objects or fields to a running installation, with the change taking effect on the next sync window. This means field selection at backfill time can be conservative without becoming a future blocker.
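
To make the field-selection lever concrete, the sketch below builds a SOQL `SELECT` from a per-customer field list instead of pulling every field on the object. The `build_soql` helper is hypothetical; it always includes `Id`, removes repeated names, and rejects anything that does not look like a plain field or relationship path so that configuration values cannot inject arbitrary query text.

```python
import re

# Plain identifiers, optionally dotted for relationship paths such as Owner.Email.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*(\.[A-Za-z][A-Za-z0-9_]*)*$")


def build_soql(sobject: str, fields: list[str]) -> str:
    """Build a SELECT over only the configured fields of one object."""
    names = list(dict.fromkeys(["Id", *fields]))  # Id first, repeats removed
    for name in [sobject, *names]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"not a valid SOQL identifier: {name!r}")
    return f"SELECT {', '.join(names)} FROM {sobject}"


# Hypothetical field list for one customer's installation.
query = build_soql("Account", ["Name", "Industry", "Name", "Owner.Email"])
print(query)
```

Adding a field later is then a change to the list, not to the code that builds the query.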

API throttling: configurable ceilings per customer

The integration platform's role in throttling is to expose a configurable ceiling, default it intelligently, and respect it without exception. Four mechanisms make that work.

A throttle ceiling, expressed as a percentage of the customer's daily API quota, configurable per integration installation. The default of 80% leaves headroom for the customer's other traffic. Customers who run hot Marketing Cloud workloads might want 50%. Customers who use Salesforce only for the integration might want 95%.

A schedule-aware throttle adjustment. The integration drops to a lower ceiling during the customer's business hours (say, 30% from 9am to 6pm in the customer's primary timezone) and ramps to the configured ceiling overnight. This is non-trivial to implement correctly because business hours vary per customer, and many enterprises operate across multiple timezones.

Adaptive backoff on 429 and 503 responses. When Salesforce returns a quota-exceeded or service-unavailable response, the integration backs off exponentially, retries with jitter, and reports the throttle event to operators. The integration should never silently drop a message because it got a 429.

Per-customer telemetry. The dashboard should show, per customer, current quota utilization, peak utilization in the last 24 hours, and any throttle events. This is what allows the integration vendor's support engineer to answer the customer's "is your integration eating my quota?" question with data rather than apology. We have written about how auth and token management is its own product surface, and quota management belongs to the same family of "things you assumed were free that turn out to be load-bearing infrastructure."
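
The adaptive backoff described above can be sketched as a retry loop with capped exponential delays and full jitter. This is an illustration rather than a reference implementation: `call_with_backoff`, `backoff_delay`, and `ThrottledError` are hypothetical names, `send` stands in for whatever function performs the HTTP request and returns a `(status, body)` pair, and the base delay, cap, and attempt count are placeholder values.

```python
import random
import time
from typing import Callable

RETRYABLE = {429, 503}  # the throttle and unavailable statuses named in the text


class ThrottledError(Exception):
    """Raised when retries are exhausted, so the caller can requeue the message."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_backoff(send: Callable[[], tuple[int, str]],
                      max_attempts: int = 6,
                      sleep: Callable[[float], None] = time.sleep,
                      on_throttle: Callable[[str], None] = print) -> tuple[int, str]:
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body
        # Report every throttle event to operators, including the final one.
        on_throttle(f"throttled: status={status} attempt={attempt + 1}/{max_attempts}")
        if attempt + 1 < max_attempts:
            sleep(backoff_delay(attempt))
    # Never drop silently: surface the failure so the message can be retried later.
    raise ThrottledError(f"still throttled after {max_attempts} attempts")
```

Raising once retries are exhausted, rather than returning quietly, is what keeps a throttled message from being lost; the caller decides whether to requeue it or alert.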

Account deduplication: exposing the right fields, not solving the problem

A common architectural mistake is for the integration platform to attempt deduplication on its own. Do not do this. The customer's deduplication is the customer's source of truth, often layered through third-party MDM tools (Reltio, Informatica, Oracle Customer Hub) or custom Salesforce rules. The integration platform's job is to expose the deduplication signal, not to invent it.

Concretely, the integration should support fetching a configurable set of deduplication-relevant fields per Account: the Salesforce Account ID, any custom "GoldenAccountID" or "MasterRecordID" field, the duplicate-rule outcomes from Salesforce's native duplicate management, and any external-system IDs the customer maps in (NetSuite Customer ID, ZoomInfo Company ID, Clearbit Company Domain). The product layer consuming the integration's output then performs the dedup logic against its own data model, using these fields as join keys.

The mechanism for fetching this is custom-query support against the customer's tenant. SOQL is Salesforce's query language, and a configurable SOQL query per customer (or per integration installation) lets the customer's RevOps team specify exactly which fields drive the dedup logic. The same architectural argument applies across other systems of record: each customer has their own dedup logic, and the integration platform's job is to expose the query interface in the customer's native query language, not to abstract it away. We have written elsewhere about how field mapping is how AI agents learn enterprise reality, and the deduplication problem is one of the cleanest cases for why per-customer mapping infrastructure beats hardcoded assumptions.
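
On the product side, consuming that signal can be as simple as grouping the fetched records by the customer's golden-record key. The sketch below assumes each record arrives as a dictionary and that the customer's key lives in a custom field; `GoldenAccountID__c` is a hypothetical API name for the "GoldenAccountID" field mentioned above, and `group_by_golden_id` is an illustrative helper, not part of any platform API.

```python
from collections import defaultdict


def group_by_golden_id(
    accounts: list[dict],
    golden_field: str = "GoldenAccountID__c",  # hypothetical custom field name
) -> dict[str, list[str]]:
    """Map each logical account to the Salesforce Account Ids that belong to it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for record in accounts:
        # Records with no golden ID are treated as their own logical account.
        key = record.get(golden_field) or record["Id"]
        groups[key].append(record["Id"])
    return dict(groups)


# Hypothetical rows, as they might come back from the per-customer SOQL query.
rows = [
    {"Id": "001A", "Name": "Acme Corp", "GoldenAccountID__c": "G-1"},
    {"Id": "001B", "Name": "ACME Corporation", "GoldenAccountID__c": "G-1"},
    {"Id": "001C", "Name": "Globex", "GoldenAccountID__c": None},
]
logical_accounts = group_by_golden_id(rows)
print(logical_accounts)
```

The join key is a parameter because each customer names and populates this field differently; that choice belongs in per-customer configuration.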

Associations between objects (account owner email, contact-to-account joins, opportunity-line-item rollups) are typically supported via manual setup at integration time, with the integration vendor's CS engineer working with the customer's RevOps team to identify the right join keys. Once those joins are configured, they propagate through the integration's reads automatically.

Schema evolution: the programmatic field-add API

The single feature that separates an integration that survives a year in production from one that gets rolled back at the six-month mark is the ability to add fields and objects programmatically.

The customer's Salesforce admin will add a new field. The integration vendor's product will need that field. Without a programmatic add, the only path is to deploy a new version of the integration code, with all the testing and release-cycle overhead that implies. With a programmatic add, the integration vendor's CS engineer makes a single API call (or edits a YAML config that is then synced) and the new field is part of the next sync.

This is the pattern we expose through Ampersand's update installation API. The integration vendor calls it from their own application code (typically from an admin UI their CS team uses) and the new objects or fields land in the running installation without any further customer involvement.
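
To illustrate the shape of such a change (not the actual request format of the update installation API, which this sketch does not attempt to reproduce), the example below models an installation's read configuration as a plain mapping of object names to field lists and merges additions into it. The `add_to_installation` function and the configuration shape are assumptions made for illustration only.

```python
from copy import deepcopy


def add_to_installation(
    config: dict[str, list[str]],
    additions: dict[str, list[str]],
) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Merge new objects and fields into a config; return (updated, delta).

    `delta` holds only what is genuinely new, which is what the next sync
    would need to start reading.
    """
    updated = deepcopy(config)  # leave the caller's running config untouched
    delta: dict[str, list[str]] = {}
    for obj, fields in additions.items():
        existing = updated.setdefault(obj, [])
        new_fields = [f for f in dict.fromkeys(fields) if f not in existing]
        existing.extend(new_fields)
        if new_fields:
            delta[obj] = new_fields
    return updated, delta


# Hypothetical running config and a request to add one field and one custom object.
running = {"Account": ["Id", "Name"]}
updated, delta = add_to_installation(
    running,
    {"Account": ["Name", "Health_Score__c"], "Invoice__c": ["Id", "Amount__c"]},
)
print(updated)
print(delta)
```

Keeping the merge idempotent (re-adding an existing field is a no-op) means an admin UI can safely resubmit the same request.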

Webhook throughput is the other dimension that matters at enterprise scale. The customer's Salesforce org generates change events through Platform Events, the Streaming API, or Change Data Capture. At eight million records with normal daily churn, the change-event volume can hit tens of thousands of messages per hour, and bursts run far above that average. The integration platform's webhook ingestion has to absorb this. We support 1,000 messages per second per customer with 300KB payload caps, and a retry mechanism for failed deliveries. Most home-built integrations underspec this and end up with backlog spirals during high-churn windows (year-end forecast adjustments, big migration events, RevOps cleanup projects).
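
A common way to enforce limits like these at the ingestion edge is a per-customer token bucket combined with a payload-size check. The sketch below is one such approach and is not a description of Ampersand's internals: `TokenBucket` and `admit` are hypothetical names, the 300KB cap is assumed to mean 300 * 1024 bytes, and the clock is injectable so the behavior can be tested without waiting.

```python
import time
from typing import Callable

MAX_PAYLOAD_BYTES = 300 * 1024  # 300KB cap from the text, assuming 1KB = 1024 bytes


class TokenBucket:
    """Allows `rate_per_sec` messages per second with bursts up to `capacity`."""

    def __init__(self, rate_per_sec: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.updated = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def admit(payload: bytes, bucket: TokenBucket) -> str:
    """Decide what to do with one incoming webhook message."""
    if len(payload) > MAX_PAYLOAD_BYTES:
        return "reject_too_large"   # retrying an oversized payload will not help
    if not bucket.try_acquire():
        return "retry_later"        # over the rate: sender should redeliver
    return "accept"


bucket = TokenBucket(rate_per_sec=1000, capacity=1000)  # one bucket per customer
print(admit(b'{"Id": "001A"}', bucket))
```

Distinguishing "too large" from "retry later" matters for the retry mechanism: only the second outcome should lead to a redelivery attempt.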

For enterprise customers, the practical recommendation is to use an admin user (or a dedicated integration user with admin permissions) for initial setup and the first backfill, then switch to an API-only user for ongoing operation. The admin user has the visibility needed to debug the initial setup. The API-only user is the right security posture for production. Both should be supported and documented.

Industry context: enterprise Salesforce integration is the procurement gate

Salesforce remains the dominant CRM at the enterprise tier, and its share is, if anything, expanding: Microsoft Dynamics and HubSpot are pushing upmarket but have not yet displaced Salesforce in Fortune 500 accounts. For any product team selling into mid-market or enterprise, Salesforce integration is the procurement gate. RFP questions like "What's your max throughput against our org? What's your throttle behavior? How do you handle our duplicate management rules?" are now standard.

The depth of the integration is the depth of the product. We have written about building multi-tenant CRM integrations at scale and why traditional platforms fall short. The short version: a Salesforce integration that handles one customer well is not the same as a Salesforce integration that handles a thousand customers, each with their own quota envelope, schema, and dedup logic. Multi-tenancy is the actual problem. Single-tenant integrations are easy. Multi-tenant integrations are where most engineering teams hit the wall.

Industry analysts have started to surface this as a category. Gartner's 2026 CRM integration platform analysis notes that "embedded integration capabilities" (their term for what we call native product integrations) are now a procurement criterion for B2B SaaS vendors selling into Salesforce-heavy customer bases. The buyer expectation is no longer "you have an integration." It is "your integration handles our specific Salesforce environment without us having to think about it."

Comparison: home-built, embedded iPaaS, and Ampersand for enterprise Salesforce sync

| Dimension | Home-built with Salesforce SDKs | Embedded iPaaS | Ampersand |
| --- | --- | --- | --- |
| Backfill of 8M+ records | 4 to 8 weeks of engineering | Often hits scale walls | 2 to 3 days of wall-clock, configured |
| Per-customer throttle ceiling | Build and maintain | Inconsistent | Built-in, default 80% configurable |
| Business-hours-aware throttling | Build and maintain | Limited | Configurable per customer |
| Account deduplication support | Custom code | Recipe-level | Native SOQL query support per customer |
| Schema evolution / field add | Code change, redeploy | Recipe edit, often manual | Update installation API, no redeploy |
| Webhook throughput at scale | Custom infra | Often capped low | 1,000 msg/sec, 300KB cap, retries |
| Admin vs API user lifecycle | Custom workflow | Limited | First-class, with switch flow |
| Per-customer observability | Build dashboards | Generic | Per-customer logs, alerts, quota telemetry |
| Engineering FTE per year | 2 to 3 | 0.5 to 1 plus iPaaS | 0.25 to 0.5 |

The "embedded iPaaS" column is worth a final note. Embedded iPaaS products were designed for customer-facing automation surfaces, but they were not designed for the ongoing operational reality of enterprise Salesforce integrations. The throttling story, the dedup story, the schema-evolution story, and the per-customer observability story all break down at the seams of an iPaaS product. We have written about why migrating from embedded iPaaS to native product integrations reduces engineering overhead for exactly this reason.

How Ampersand handles enterprise Salesforce sync

Ampersand is a deep integration platform built for product developers shipping Salesforce, HubSpot, and other CRM integrations to their own customers. For enterprise Salesforce specifically, the load-bearing capabilities are these.

Configurable throttle ceilings per integration installation, defaulting to 80%, with business-hours-aware adjustments. Adaptive backoff on 429 and 503 responses. Per-customer quota telemetry on the dashboard.

Bulk API 2.0 backfill orchestration with sequential entity ordering, paginated checkpointing, and resumable jobs. A backfill that gets interrupted at hour 18 picks up at hour 18, not hour zero.

Custom SOQL query support per installation, scoped to the customer's tenant, configurable through declarative YAML. This is the mechanism that lets each customer's dedup logic drive the integration's reads.

The update installation API, which lets the integration vendor add objects or fields to a running installation programmatically. No redeploy, no customer intervention.

Webhook ingestion sized for enterprise volumes: 1,000 messages per second per customer, 300KB payload cap, retry with exponential backoff on delivery failures.

Per-customer dashboards with logs, alerts, error rates, throttle events, and quota usage. The integration vendor's support engineer can answer the customer's "is your integration the bottleneck?" question with data.
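
The resumable-backfill behavior in the list above comes down to persisting a cursor after every page. The sketch below shows one minimal way to do that with a JSON checkpoint file; it is an assumption-laden illustration, not Ampersand's implementation. `fetch_page` stands in for whatever call returns one page of records plus the cursor for the next page (with `None` meaning no more pages), and `handle` is the downstream write.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def load_state(path: Path) -> dict:
    """Read the last checkpoint, or start from the beginning if none exists."""
    if path.exists():
        return json.loads(path.read_text())
    return {"cursor": None, "done": False}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash cannot leave a half-written checkpoint."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)


def run_resumable_backfill(
    fetch_page: Callable[[Optional[str]], tuple[list, Optional[str]]],
    handle: Callable[[list], None],
    checkpoint: Path,
) -> int:
    """Process pages from the last checkpoint onward; return records handled this run."""
    state = load_state(checkpoint)
    processed = 0
    while not state["done"]:
        records, next_cursor = fetch_page(state["cursor"])
        handle(records)  # may run twice for one page after a crash, so keep it idempotent
        processed += len(records)
        state = {"cursor": next_cursor, "done": next_cursor is None}
        save_state(checkpoint, state)  # checkpoint only after the page is handled
    return processed
```

Because the checkpoint is written after the page is handled, delivery is at-least-once: an interruption between the two steps replays one page, which is why the handler should upsert rather than insert.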

The Ampersand sell

Enterprise Salesforce integration is the operational story most product teams underestimate. The backfill is harder than expected. The throttling is more sensitive than expected. The dedup is messier than expected. The schema evolves faster than expected. The webhook volume is bigger than expected. Every one of those compounds, and the cost shows up as an integration that gets rolled back at the six-month mark or as an engineering team that becomes a Salesforce-maintenance shop.

Ampersand handles the full lifecycle. Configurable throttling. Resumable backfills. Custom SOQL queries per customer. Programmatic field add. Enterprise-grade webhook ingestion. Per-customer observability. We support thousands of customer installations across CRMs in production, and we have heard from product leaders, including Hatch CTO John Pena ("Ampersand lets our team focus on building product instead of maintaining integrations"), that the operational benefit shows up almost immediately.

The Ampersand documentation walks through the throttle configuration, the SOQL query model, the update installation API, and the webhook ingestion architecture. The how-it-works page shows the platform end to end. If you want to talk through the specifics of your enterprise Salesforce environment with an engineer who has shipped this exact pattern, that conversation is one click away on the main site.

FAQ

How does Ampersand handle a multi-million record backfill?

Bulk API 2.0 with sequential entity ordering and paginated checkpointing. The backfill is configured per installation, runs against a configurable percentage of the customer's daily quota (default 80%), and resumes from its last checkpoint if interrupted. For 8 million Account records, expect 2 to 3 days of wall-clock time at default settings.

What's the throttle behavior during the customer's business hours?

The default is to drop to a lower ceiling (typically 30% to 40%) during business hours and ramp to the full configured ceiling overnight. Business hours are configurable per customer and respect the customer's timezone.

How do I add new fields to a running integration without a redeploy?

The update installation API lets you programmatically add objects or fields to an active installation. Your CS engineer (or your customer-facing UI) calls it directly. The change takes effect on the next sync.

Can the integration query my customer's tenant with custom SOQL?

Yes. Custom SOQL queries per installation are first-class. The query lives in declarative, version-controlled configuration and is editable per customer.

How does Ampersand handle account deduplication?

We expose the dedup-relevant fields (Master Record ID, custom golden-record fields, third-party MDM IDs) through configurable SOQL queries. The deduplication logic lives in your product layer, where it should. We provide the join-key infrastructure.

What's the recommended user posture for enterprise customers?

Connect with an admin user for initial setup and backfill, then switch to an API-only user for ongoing operation. Both are supported.

What about HubSpot, Microsoft Dynamics, and Zendesk at the same scale?

All supported, with the same architectural model. Throttle ceilings, custom queries, programmatic field-add, and per-customer observability extend across CRMs.

Conclusion

Enterprise Salesforce integration is the operational reality most product teams discover only after they have signed their first multi-million-record customer. The backfill, throttling, deduplication, schema evolution, and webhook throughput stories all compound. Building this in-house is possible, but the cost is years of engineering investment and a permanent maintenance tax. The architecture that scales is configurable throttling, resumable backfills, custom queries per customer, programmatic schema evolution, and per-customer observability, all delivered as managed integration infrastructure.

Ampersand is built for this. If you are sitting on an enterprise Salesforce blocker right now, or if you are about to scope your next backfill, the right path is to ship on infrastructure that already handles the operational complexity. Learn more at withampersand.com.
