Unique
Remove duplicate rows
The Unique tool enables you select one or several columns that should be unique whereby all the duplicate rows are removed.
The Unique tool keeps the first rows that is unique and discards all the following duplicates
When you select multiple columns, you create column pairs behind the scenes - almost like combining the two (or more) columns into a new column. If you dataset looks like this:
amount | text |
---|---|
100 | Hello |
200 | World |
300 | World |
200 | Hello World |
100 | hello world |
There are then three possible options and output:
- amount should be unique which outputs the first three rows
- text should be unique which outputs the first two rows and last two rows
- amount and text should be unique which outputs all five rows
Configuration
The Unique tool only has one required input.
Select unique columns
Select the column(s) you would like to be unique. You can select one or many.
The order in which you select multiple columns does not matter for the Unique tool
The Unique tool is case sensitive meaning This and this are two different and unique values
When To Use
The Unique tool in itself is relatively simple, but its use cases diverse:
- You want to remove any duplicate rows, for example duplicate companies from a CRM system
- It can be helpful to ensure that you don’t output duplicate records - both for performance and for later use of the data. Use the Unique tool before the Output tool to ensure that there are no duplicate records
- It typically a best practice to ensure that you’re joining with at least one unique table in the Combine tool - that Unique tool can help ensure that
- Use the Unique tool in combination with the Sort tool to get the maximum value for each of your rows. Sort by some relevant ID (ascending) and the value/date field (descending). Add a Unique tool afterwards and make the dataset unique by the relevant ID you used before. Now you have the largest value or most recent date for each of your rows