MERGE allows users to perform multiple data modifications (INSERT, UPDATE, DELETE) within a single transaction. MERGE is useful for common database tasks like removing duplicates, syncing data between tables (upsert), and cleaning out old records.
MERGE internally performs a join between a target table and a data source, and then for each matching or non-matching row, you can decide whether to add it, update it, or delete it from the target table.
All rows in the join result fall into one of three categories: MATCHED, NOT MATCHED BY SOURCE, or NOT MATCHED [BY TARGET], meaning the row was either found in both target and source datasets, only in the target dataset, or only in the source dataset. In branches for the MATCHED category, columns from both target and source datasets will be available for use in expressions and conditions. Given a branch in one of the NOT MATCHED categories, only columns from the dataset where the row was found will be available.
There can be multiple branches defined for rows within each category. However at most one branch’s action will be executed per row, similar to a CASE statement. WHEN clauses are evaluated in order within each category, and the chosen action for each row will come from whichever branch condition matches first (or from the first branch without any condition). If none of the conditions match, no action is performed for that row.
When DO NOTHING is specified as the merge action, and the branch condition is met, the joined row is explicitly skipped. Since actions are evaluated in their specified order, DO NOTHING can be handy to skip non-interesting source or target rows before more fine-grained handling.
Any aggregated indexes on the target table will be updated in the same transaction.
Insert On Conflict
Upsert functionality is also available directly via theINSERT statement. The optional ON CONFLICT clause provides a way to reconcile or merge new data with existing records by either skipping or overwriting rows that already exist. However, it is available only for a narrow subset of INSERT statements. See limitations here.
Permissions
To have permission to runMERGE, you must have the respective INSERT, UPDATE, or DELETE permissions on the target table. You must also have SELECT permissions on the data source.
Syntax
Parameters
| Parameter | Description |
|---|---|
<target_table_name> | The table to apply changes to. |
<target_alias> | Optional alias for the target table. |
<data_source> | A table, subquery, or table-valued function (e.g., read_parquet). |
<source_alias> | Optional alias for the data source. Table-valued functions will default to their own name as their alias. |
<join_condition> | Condition to match source and target rows. |
<when_clause> | A branch to match rows against, with an optional condition and an action (UPDATE, DELETE, INSERT, DO NOTHING) to perform. Multiple clauses can repeat the same match category or same action. |
MATCHED ... [AND <condition>] | Extra conditions per when_clause are optional. To be considered met, they must evaluate to TRUE rather than FALSE or NULL. An unconditional when_clause must not precede a conditional one of the same category, as the latter would be unreachable. |
<settings> | List of query-specific settings overrides. |
Examples
Example 1: ETL Synchronization
When keeping a production table in sync with a staging table, it may be natural toINSERT brand new entries, UPDATE rows for which entries already existed, and DELETE stale entries that are no longer in the source.
UPSERT (INSERT UPDATE)
INSERT IGNORE
If you would like to preserve pre-existing entries rather than overwriting them, replace theUPDATE clause with a DO NOTHING clause, or omit the WHEN MATCHED branch entirely for the same final effect. This would mean INSERT brand new entries and do nothing for rows for which entries already existed.
Example 2: Data Deduplication
The data source can be any subquery, not just a static table. In this example, the subquery is identifyingDISTINCT rows from the table raw_table, in order to submit deduplicated results to the destination deduped_table.
USING examples with subqueries:
Example 3: Conditional Data Migration
You may want to define subsets of the source data to be treated differently during migration, depending on some subset of attributes. Use the attributes to define if-then-else conditions, and then define different insertion behavior for each subset. In this example, entries withattr in range (10, 100] should not be inserted. Using the DO NOTHING clause makes this contract clear, even though the conditions can be rewritten to omit the DO NOTHING branch by being more restrictive. Note also the ‘catch-all’ clause at the end which does not have a condition. This is generally not necessary - all clauses could have conditions, and not all rows must be caught by one of the branches.
Example 4: Merge with External Files (or any TVF)
The data source can be any Table Valued Function (TVF), not just a subquery or static table. This is particularly useful for merging with external files, though you can also reference an external table object as the source (e.g.USING my_external_table). Note that without defining an alias, the data source can be referenced later in the query by using the name of the TVF (e.g. read_parquet).
Example 5: Wildcard Syntax
The clausesINSERT * and UPDATE SET * can be used to ingest or overwrite rows without any data transformation. Note that these replacements are allowed only when the source and target tables have precisely matching column names.
This example also shows the strategy to avoid redundant UPDATE’s. Since comparisons between source and target columns are only evaluated on explicitly listed join or action clause conditionals, a full row re-write will still occur even if the other unnamed values haven’t changed. Utilize target.col <> source.col conditionals to catch these cases.
Limitations
WITHclause is not supported, for specifying one or more subqueries (CTE) that can be referenced from inside theMERGEquery.