Plaid's Bank Integration API: A System Design Study

TL;DR

Plaid is an integration abstraction layer over more than 12,000 financial institutions, exposed as a single REST and webhook API. The architecture hides two fundamentally different backends: OAuth-based open-banking APIs for institutions that expose them, and screen-scraping adapters for the long tail that still does not. Plaid's token design (short-lived Link and public tokens on the client, exchanged for a long-lived access token on the server) lets integrators reason about one interface while Plaid absorbs the heterogeneity.

Plaid is a case study in building a product that is, from the integrator's perspective, boring. A developer calling /transactions/sync gets transactions. A developer calling /accounts/balance/get gets balances. The developer does not think about whether the data originated from a Chase Open Banking endpoint, an OFX feed from a regional credit union, or a browser automation farm scraping a 2005-era bank portal. That invisibility is the product.

This piece is a system design study of how Plaid achieves the abstraction: token flow, webhook architecture, rate limiting, and the long, uneven transition from screen scraping to open banking.

The surface area: what Plaid exposes

Plaid's public API is organized around a small set of product endpoints — Transactions, Auth, Balance, Identity, Assets, Investments, Liabilities — each of which returns normalized JSON regardless of institution. Every integration with Plaid starts with a common flow:

Server creates a Link token by calling /link/token/create with the integrator's client_id, secret, and the list of products they want access to.
Client SDK (Plaid Link) renders the bank-selection and authentication UI, using the Link token as a session credential.
User authenticates with their bank. Plaid returns a public_token to the client.
Server exchanges the public_token for a long-lived access_token via /item/public_token/exchange.
Server uses access_token for all subsequent data pulls.

The design is a three-party OAuth variant in which Plaid is the authorization broker. The Link token is short-lived (four hours by default, per Plaid's documentation), single-use, and scoped to a specific integrator. The public_token is shorter-lived still (30 minutes). The access_token is long-lived, bound to a Plaid "Item" (a user's connection to a single institution), and is the only credential the integrator stores.

// TypeScript sketch: Plaid token exchange on a Node backend.
import {
  Configuration,
  CountryCode,
  PlaidApi,
  PlaidEnvironments,
  Products,
} from 'plaid';

// client_id and secret are sent as headers on every request.
const client = new PlaidApi(new Configuration({
  basePath: PlaidEnvironments.production,
  baseOptions: {
    headers: {
      'PLAID-CLIENT-ID': process.env.PLAID_CLIENT_ID!,
      'PLAID-SECRET': process.env.PLAID_SECRET!,
    },
  },
}));

// 1. Create a Link token for the client SDK.
export async function createLinkToken(userId: string) {
  const res = await client.linkTokenCreate({
    user: { client_user_id: userId },
    client_name: 'Example App',
    // The SDK types these fields as enums, not plain strings.
    products: [Products.Transactions],
    country_codes: [CountryCode.Us],
    language: 'en',
    webhook: 'https://api.example.com/plaid/webhook',
  });
  return res.data.link_token;
}

// 2. Exchange the short-lived public_token for a long-lived access_token.
export async function exchangePublicToken(publicToken: string) {
  const res = await client.itemPublicTokenExchange({
    public_token: publicToken,
  });
  return {
    accessToken: res.data.access_token, // store encrypted
    itemId: res.data.item_id,
  };
}

Every integrator ends up writing approximately this code. Plaid has deliberately kept the surface small.

Webhooks vs. polling

For transactional products, keeping the integrator's data in sync with the bank is the hard part. A naive integrator would poll /transactions/get every few minutes, which would overwhelm banks (many of which cap Plaid at a few concurrent connections) and be wasteful for dormant accounts. Plaid's answer is webhook-driven pull.

The pattern for transactions specifically is:

Plaid pulls from the bank on its own schedule (typically every few hours for most institutions; real-time via webhook where the bank supports it).
When new data arrives, Plaid fires a webhook to the integrator's registered URL — for example, SYNC_UPDATES_AVAILABLE.
The integrator responds by calling /transactions/sync with the cursor they last saw. Plaid returns added, modified, and removed transactions since that cursor.
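The webhook step in the pattern above can be sketched as a small dispatcher that maps an incoming payload to the action the integrator should take. This is a minimal sketch, not part of Plaid's SDK: the payload fields (webhook_type, webhook_code, item_id, error.error_code) follow Plaid's documented webhook bodies, while the Action type and the routeWebhook function are hypothetical names introduced here. Webhook signature verification is deliberately left out.

```typescript
// Subset of the fields Plaid includes in a webhook body.
interface PlaidWebhook {
  webhook_type: string;
  webhook_code: string;
  item_id: string;
  error?: { error_code: string } | null;
}

// Hypothetical: what the integrator's backend should do next.
type Action =
  | { kind: 'sync'; itemId: string }    // call /transactions/sync
  | { kind: 'relink'; itemId: string }  // send the user through Link update mode
  | { kind: 'ignore' };

function routeWebhook(hook: PlaidWebhook): Action {
  // New transaction data is available for this Item.
  if (
    hook.webhook_type === 'TRANSACTIONS' &&
    hook.webhook_code === 'SYNC_UPDATES_AVAILABLE'
  ) {
    return { kind: 'sync', itemId: hook.item_id };
  }
  // The Item needs the user to re-authenticate.
  if (
    hook.webhook_type === 'ITEM' &&
    hook.webhook_code === 'ERROR' &&
    hook.error?.error_code === 'ITEM_LOGIN_REQUIRED'
  ) {
    return { kind: 'relink', itemId: hook.item_id };
  }
  // Anything this sketch does not handle is acknowledged and dropped.
  return { kind: 'ignore' };
}
```

Keeping the routing pure makes it easy to acknowledge the HTTP request quickly and hand the resulting action to a queue, which is a common choice when webhook deliveries may be retried.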

Cursor-based pagination matters here: transaction records can change after they first appear (pending → posted, merchant name enrichment, category re-classification). Plaid exposes those mutations as modified or removed entries, which is why Plaid introduced /transactions/sync in 2022 as the recommended successor to the older /transactions/get.
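To make the cursor mechanics concrete, here is a minimal sketch of an integrator-side sync loop. The response fields (added, modified, removed, next_cursor, has_more) match the /transactions/sync response; everything else is an assumption for illustration: the in-memory Map standing in for a database, the trimmed-down Txn type, and the FetchPage function that would wrap the real API call.

```typescript
// Trimmed-down transaction; real records carry many more fields.
interface Txn {
  transaction_id: string;
  amount: number;
  pending: boolean;
}

// Shape of one /transactions/sync page (subset of fields).
interface SyncPage {
  added: Txn[];
  modified: Txn[];
  removed: { transaction_id: string }[];
  next_cursor: string;
  has_more: boolean;
}

// Hypothetical wrapper around the API call; undefined means "from the start".
type FetchPage = (cursor: string | undefined) => Promise<SyncPage>;

// Apply one page of changes to the local store.
function applyPage(store: Map<string, Txn>, page: SyncPage): void {
  for (const t of page.added) store.set(t.transaction_id, t);
  for (const t of page.modified) store.set(t.transaction_id, t);
  for (const r of page.removed) store.delete(r.transaction_id);
}

// Drain all pages since `cursor` and return the cursor to persist.
async function syncItem(
  store: Map<string, Txn>,
  cursor: string | undefined,
  fetchPage: FetchPage,
): Promise<string | undefined> {
  let next = cursor;
  let page: SyncPage;
  do {
    page = await fetchPage(next);
    applyPage(store, page);
    next = page.next_cursor;
  } while (page.has_more);
  return next;
}
```

A production version would probably buffer the pages and commit the changes together with the new cursor in one database transaction, so that a failure partway through pagination can restart cleanly from the old cursor.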

The screen-scraping backbone

A substantial fraction of US banks — especially regional credit unions and smaller community banks — have no open-banking API. For these institutions, Plaid ran (and still runs, for the long tail) a fleet of headless browser workers that log in on behalf of users and parse HTML.

This is not a secret. Plaid's public documentation distinguishes between "OAuth institutions" and "credential-based institutions," and the behavior differs: for credential-based, the user types their username and password into Plaid Link; for OAuth, the user is redirected to their bank's login page and returns with a grant. In the former case, Plaid's backend stores encrypted credentials because it must re-login to refresh data. In the latter, Plaid stores only refresh tokens.

Operationally, the scraping fleet is a reliability nightmare. Banks change their HTML without notice. CAPTCHAs appear. New MFA flows break adapters overnight. Plaid maintains per-institution "integration health" scores and a team whose full-time job is adapter maintenance. This is the iceberg under the clean API.

Rate limiting: three layers

Rate limits in Plaid exist at three layers:

Per-integrator quotas. Plaid assigns request-per-minute ceilings to each client_id, typically generous.
Per-institution concurrency. Banks themselves cap how many simultaneous Plaid sessions they will accept. This is the binding constraint for large integrators during peak hours.
Per-user caching. Plaid caches recent pulls and returns cached data if a fresh pull would violate an institution limit. Integrators get a freshness timestamp in the response.

A consequence: if an integrator tries to force-refresh a million users at 9:00 AM, the institution-level limits will queue the majority of the requests. Correctly architected integrators spread refreshes across the day and rely on webhooks to surface fresh data as Plaid pulls it.
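One way to spread refreshes across the day is to derive a stable offset for each Item from a hash of its ID, so that the same Item always lands in the same slot and the population as a whole is smeared over the window. The sketch below assumes an FNV-1a hash and a 24-hour window; both choices, and the function names, are illustrative and not something Plaid prescribes.

```typescript
// 32-bit FNV-1a hash of a string, returned as an unsigned integer.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Stable offset, in minutes, of an Item's refresh slot within the window.
function refreshOffsetMinutes(itemId: string, windowMinutes = 24 * 60): number {
  return fnv1a(itemId) % windowMinutes;
}

// Epoch milliseconds of the Item's slot, given the start of the window.
function refreshTimeMs(
  itemId: string,
  windowStartMs: number,
  windowMinutes = 24 * 60,
): number {
  return windowStartMs + refreshOffsetMinutes(itemId, windowMinutes) * 60_000;
}
```

Because the offset is deterministic, no scheduling state needs to be stored per Item; a worker can simply ask, each minute, which Items hash to the current slot.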

                        PLAID ARCHITECTURE (SIMPLIFIED)

  ┌──────────────┐     Link token     ┌────────────────┐
  │  Integrator  │ ──────────────────>│  Plaid API     │
  │   Backend    │ <──────────────────│ (REST/webhook) │
  └──────┬───────┘     access_token   └───┬────────┬───┘
         │                                │        │
         │ webhooks (transactions,        │        │
         │ ITEM_LOGIN_REQUIRED, etc.)     ▼        ▼
         ◄───────────────────────  ┌─────────┐ ┌────────────┐
                                   │ OAuth   │ │ Scraper    │
                                   │ adapter │ │ fleet      │
                                   │ (FDX,   │ │ (headless  │
                                   │ UK PSD2)│ │ browsers)  │
                                   └────┬────┘ └─────┬──────┘
                                        │            │
                                        ▼            ▼
                                    12,000+  financial institutions

Figure 1. Two very different backends behind one API. The integrator sees none of it.
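The shape in Figure 1 can be illustrated with a small adapter pattern: one interface, two implementations, and a single entry point that returns the same normalized records either way. This is a hypothetical sketch of the general technique, not Plaid's internal code. The adapter classes, the institution IDs, and the canned API response and HTML snippet are all invented for illustration; a real credential-based adapter would drive a headless browser rather than parse a fixed string.

```typescript
// Normalized shape returned to integrators, whatever the backend.
interface Balance {
  accountId: string;
  current: number;
  currency: string;
}

// One interface for both kinds of backend.
interface InstitutionAdapter {
  readonly kind: 'oauth' | 'credential';
  fetchBalances(secret: string): Promise<Balance[]>;
}

// Stand-in for an adapter that calls a bank's token-protected API.
class OAuthAdapter implements InstitutionAdapter {
  readonly kind = 'oauth' as const;
  async fetchBalances(refreshToken: string): Promise<Balance[]> {
    if (!refreshToken) throw new Error('missing refresh token');
    // Canned response in place of a real HTTP call.
    const apiResponse = [{ id: 'chk-1', available: '1520.10', iso: 'USD' }];
    return apiResponse.map((a) => ({
      accountId: a.id,
      current: Number(a.available),
      currency: a.iso,
    }));
  }
}

// Stand-in for an adapter that logs in and parses the bank's HTML.
class CredentialAdapter implements InstitutionAdapter {
  readonly kind = 'credential' as const;
  async fetchBalances(credentials: string): Promise<Balance[]> {
    if (!credentials) throw new Error('missing credentials');
    // Canned page fragment in place of a real browser session.
    const html = '<tr><td class="acct">sav-9</td><td class="bal">$2,300.00</td></tr>';
    const re = /<td class="acct">([^<]+)<\/td><td class="bal">\$([\d,.]+)<\/td>/g;
    const rows: Balance[] = [];
    let m: RegExpExecArray | null;
    while ((m = re.exec(html)) !== null) {
      rows.push({
        accountId: m[1],
        current: Number(m[2].replace(/,/g, '')),
        currency: 'USD',
      });
    }
    return rows;
  }
}

// Hypothetical registry keyed by institution ID.
const adapters: Record<string, InstitutionAdapter> = {
  ins_big: new OAuthAdapter(),
  ins_small: new CredentialAdapter(),
};

// Single entry point: callers never learn which backend served them.
async function getBalances(institutionId: string, secret: string): Promise<Balance[]> {
  const adapter = adapters[institutionId];
  if (!adapter) throw new Error('unknown institution: ' + institutionId);
  return adapter.fetchBalances(secret);
}
```

The normalization happens inside each adapter, so a breakage in one bank's HTML stays contained in that adapter and never changes the shape integrators see.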

The open banking transition

The long-term trajectory is open-banking APIs everywhere, scraping nowhere. In the UK, PSD2 and the Open Banking Implementation Entity forced most retail banks to expose standardized APIs; Plaid switched those integrations years ago. In the US, FDX (Financial Data Exchange) has been consolidating a standard, and the CFPB's Section 1033 rulemaking (finalized in 2024) formally codifies consumer data rights. Most large US banks — Chase, Bank of America, Wells Fargo, Citi, Capital One — now run production FDX-style APIs that Plaid consumes via OAuth.

But the tail is long. The US has roughly 10,000 banks and credit unions, and the smallest few thousand will not have APIs for years. Plaid's architecture has to bridge the gap indefinitely.

What the open-banking shift actually changes for integrators

From the integrator's perspective: almost nothing. The same access_token, the same /transactions/sync, the same webhooks. Plaid absorbs the difference. The change is visible in Link's UI (OAuth institutions redirect to the bank's login page; credential-based institutions collect the password in Plaid's UI) and in behavioral reliability: OAuth integrations break less often because banks are accountable for their own APIs.

Lessons for system designers

Abstraction layers are a real product. Plaid's moat is the adapter fleet and the institution relationships, not the API shape. The API shape is deliberately simple.
Tokens should match trust boundaries. Short-lived tokens for untrusted clients; long-lived tokens for servers that must be explicitly authorized. Plaid's Link/access split is a textbook implementation.
Webhook-driven pull beats polling for sync. It respects upstream rate limits, cuts wasted calls from integrators, and gives you a single code path for consistency.
Plan for heterogeneity. Any system that aggregates across institutions will have two classes of backend: the modern one and the legacy one. Design for both.

Updated 2026: CFPB 1033 and the scraping sunset

The CFPB's Section 1033 final rule, issued in October 2024, took effect on a staggered timeline, with the largest banks required to comply first. Plaid publicly committed to moving all covered US integrations to OAuth-based open-banking flows by mid-2026. As of this update, Plaid still maintains credential-based adapters for smaller institutions outside the 1033 compliance cohort, so the dual-backend architecture described above continues to apply, just with a shrinking tail. Integrators should expect continued ITEM_LOGIN_REQUIRED volume on credential-based items; OAuth items are noticeably more stable.

Frequently asked questions

How many banks does Plaid integrate with?

Plaid has publicly reported coverage of more than 12,000 financial institutions across the United States, Canada, and Europe, combining open-banking APIs where available with screen-scraping adapters for institutions that still lack developer APIs.

What is the difference between a Link token and an access token?

A Link token is a short-lived, single-use credential used to initialize Plaid Link on the client. After the user authenticates with their bank, Plaid returns a public_token that the server exchanges for a long-lived access_token, which is then used for all subsequent data-fetching API calls.

Does Plaid use webhooks or polling?

Plaid uses webhooks to notify integrators of data changes (new transactions, balance updates, account disconnections). Polling is available for bootstrapping historical data, but for ongoing sync, webhook-driven pulls are the recommended pattern.

Is Plaid replacing screen scraping with open banking?

Partially. Where banks expose open-banking APIs (notably the UK, EU, and large US institutions participating in FDX), Plaid uses them preferentially. For banks without such APIs, Plaid still runs scrapers. The mix shifts toward open-banking APIs every year, but the long tail remains scraped.

How does Plaid rate-limit requests?

Plaid rate-limits at several layers: per-integrator API quotas, per-institution concurrency limits (a single bank may only allow N connections at a time), and per-user caching windows. The institution-level limits are the binding constraint for most large integrators.

What happens when a user changes their bank password?

The Item enters an error state rather than the token being revoked: API calls with the access_token fail with ITEM_LOGIN_REQUIRED, Plaid sends a webhook reporting that error, and the integrator must send the user back through Plaid Link in update mode to re-authenticate. The same access_token works again after a successful update, so application-side records remain stable.

Does Plaid store bank credentials?

For scraped integrations, Plaid historically stored user credentials in encrypted form because it needed them to re-login on behalf of the user. Under open-banking flows with OAuth, Plaid stores only refresh tokens, not credentials. Plaid's FAQ documents this distinction.

What US regulation affects Plaid architecture?

Section 1033 of the Dodd-Frank Act and CFPB rulemaking around personal financial data rights directly shape how Plaid fetches, stores, and shares data. FDX (Financial Data Exchange) standards also inform Plaid's API contracts with partner institutions.

Sources: Plaid's public API documentation, the FDX specification, CFPB Section 1033 rulemaking, and published interviews with Plaid engineering leadership. This analysis was performed independently by ML Systems Review.