8 ways to remove duplicates from Array in TypeScript

eye-catchJavaScript/TypeScript
Sponsored links

How can we code when we want to get unique values from an array? Which one is the fastest? I will show the performance result at the end.

Sponsored links

Distinct an array of primitive values

The arrays that we will use are the following. Number and string array. We need to handle an object array differently, so let’s start with a simple one.

const primitiveNumbers = [1, 2, 3, 2, 4, 5, 6, 3, 1];
// unique values -> [ 1, 2, 3, 4, 5, 6 ]
const primitiveStrings = [
    "aaa", // duplicated 1
    "bbb",
    "ddd", // duplicated 2
    "aaa", // duplicated 1
    "eee",
    "ccc",
    "ddd", // duplicated 2
];
// unique values -> [ 'aaa', 'bbb', 'ddd', 'eee', 'ccc' ]

We need to execute several functions to see if those results are the same. I prepare the generic function to make it easy.

type UniqFunc<T> = (array: T[]) => T[];
function test<T>(array: T[], funcs: UniqFunc<T>[]) {
    for (const func of funcs) {
        const start = performance.now();
        const result = func(array);
        const elapsedTime = performance.now() - start;
        console.log(`${func.name}, time: ${Math.round(elapsedTime)}`);
        console.log(result);
    }
}

const funcsForPrimitive = [
    uniqByForOf,
    uniqByForEach,
    uniqByReduce,
    uniqByFilter,
    uniqBySetWithArrayFrom,
    uniqBySetWithSpread,
];
test(primitiveNumbers, funcsForPrimitive);
test(primitiveStrings, funcsForPrimitive);

All functions are defined as UniqFunc type that requires array variable. We can specify anything as far as it’s an array but I will use only number/string/object in this post.
test function measure how long it takes to complete the function’s work. funcsForPrimitive stores functions that I will show in this post.

Comparing values one by one

The first idea is to loop the array and check whether the item has already been shown or not. Add the value if it has not been in result variable.

function uniqByObject<T>(array: T[]) {
    const result: T[] = [];
    for (const item of array) {
        if (!result.includes(item)) {
            result.push(item);
        }
    }
    return result;
}

If you prefer forEach you can write it like this below.

function uniqByForEach<T>(array: T[]) {
    const result: T[] = [];
    array.forEach((item) => {
        if (!result.includes(item)) {
            result.push(item);
        }
    })
    return result;
}

Using Array.prototype.reduce

There are other ways if you don’t want to define a middle-man that stores results.

function uniqByReduce<T>(array: T[]): T[] {
    return array.reduce((acc: T[], cur: T) => {
        if (!acc.includes(cur)) {
            acc.push(cur);
        }
        return acc;
    }, [])
}

The first arg acc of reduce function callback is an accumulator that will be the return value. The second arg cur is the current value. What this does is basically the same as previous examples.

Using Array.prototype.filer

It can be written with one line if using the filter function.

function uniqByFilter<T>(array: T[]) {
    return array.filter((value, index) => array.indexOf(value) === index);
}

The behavior of indexOf is following.

The indexOf() method returns the first index at which a given element can be found in the array, or -1 if it is not present.

This table shows the relationships.

indexvalueindexOfresult
010true
121true
232true
321false
444true
555true
666true
732false
810false

Since indexOf function returns the first index the result of indexOf is different from the current index of the array.

Using Map

Map offers key-value object. If it receives the same key name it updates the value. It means that we don’t have to check the values in there if using the value as a key.

function uniqByMap<T>(array: T[]): T[] {
    const map = new Map();
    for (const item of array) {
        map.set(item, item);
    }
    return Array.from(map.values());
}

map.values() returns IterableIterator type but not array. It should be converted to Array type. That’s why Array.from() is used here.

Using Set object

I think it’s the easiest way to do it. What we have to do is just put the array into the constructor. Set object stores unique values. It removes the duplicates.

function uniqBySetWithArrayFrom<T>(array: T[]): T[] {
    return Array.from(new Set(array));
}
function uniqBySetWithSpread<T>(array: T[]): T[] {
    return [...new Set(array)];
}

Unfortunately, it can’t be used for an object because it refers to a reference but not the values in the object.

Sponsored links

Distinct an array of key-value object

The examples that I showed above can’t be used for an object. An object isn’t as simple as a primitive value because it refers to a reference. The following two objects have the same values but the result is false.

const obj1 = {foo: 1, hoge: 2};
const obj2 = {foo: 1, hoge: 2};
console.log(obj1 === obj2);
// false

If the same reference is assigned to another variable the result becomes true.

const obj1 = {foo: 1, hoge: 2};
const obj3 = obj1
console.log(obj1 === obj3);
// true

The objects are defined somewhere in the address space. The reference value of the object is the address. obj1 and obj2 are respectively stored in different addresses obj1 === obj2 becomes false.

To compare objects we need to check the values one by one. We will use the following object.

const objects = [
    { name: "aaa", price: 50 },
    { name: "bbb", price: 30 }, // duplicated 1
    { name: "ccc", price: 80 }, // duplicated 2
    { name: "aaa", price: 15 },
    { name: "bbb", price: 30 }, // duplicated 1
    { name: "ccc", price: 80 }, // duplicated 2
];
// Unique values
// [
//   { name: 'aaa', price: 50 },
//   { name: 'bbb', price: 30 },
//   { name: 'ccc', price: 80 },
//   { name: 'aaa', price: 15 }
// ]

Using lodash.isEqual

The way for object is basically the same as the one for primitive values. Loop the array and check the value one by one. The difference is to use lodash.isEqual in order to compare all primitive values in the object.

function uniqForObject<T>(array: T[]): T[] {
    const result: T[] = [];
    for (const item of array) {
        const found = result.some((value) => isEqual(value, item));
        if (!found) {
            result.push(item);
        }
    }
    return result;
}

This function can be used for primitive values as well.

Performance comparison

Let’s compare the functions’ performance! I used Node.js version 14.13.0 for the experiment. This is the code for it.

type UniqFunc<T> = (array: T[]) => T[];
function test<T>(array: T[], funcs: UniqFunc<T>[], loopCount = 1) {
    const times: number[] = [];
    for (const func of funcs) {
        const start = performance.now();
        for (let i = 0; i < loopCount; i++) {
            const result = func(array);
        }
        const elapsedTime = performance.now() - start;
        times.push(Math.round(elapsedTime));
    }
    return times;
}

function generatePrimitiveArray(length: number, range: number): number[] {
    const result = [];
    for (let i = 0; i < length; i++) {
        const value = Math.floor(Math.random() * range);
        result.push(value);
    }
    return result;
}

function measure() {
    const settings = [
        { arrayLen: 10, loopRangeS: 10000, loopRangeE: 200000, step: 5000 },
        { arrayLen: 10, loopRangeS: 100000, loopRangeE: 500000, step: 50000 },
        { arrayLen: 100, loopRangeS: 100000, loopRangeE: 500000, step: 50000 },
        { arrayLen: 1000, loopRangeS: 100000, loopRangeE: 500000, step: 50000 },
    ];
    for (const setting of settings) {
        console.log(`length: ${setting.arrayLen}`);
        const array = generatePrimitiveArray(setting.arrayLen, 10);

        const titles = funcsForPrimitive.map(x => x.name).join(",");
        console.log(`loop count,${titles}`);
        for (let i = setting.loopRangeS; i <= setting.loopRangeE; i += setting.step) {
            const times = test(array, funcsForPrimitive, i);
            const data = times.join(", ");
            console.log(`${i}, ${data}`);
        }
        console.log();
    }
}

measure();

The value range is set to 0 – 9. The same array is used in the loop for all functions to compare the performance precisely. If you want to try it yourself you can clone my repository or just copy and paste the code. You can change the range if you change the second argument value of generatePrimitiveArray function.

BlogPost/distinct.ts at master · yuto-yuto/BlogPost
Contribute to yuto-yuto/BlogPost development by creating an account on GitHub.

I tried the following code in order to make it faster.

async function testAsync<T>(array: T[], funcs: UniqFunc<T>[], loopCount = 1) {
    const promises = funcs.map((func) => execute(array, func, loopCount));
    const times = await Promise.all(promises);
    return times.map(x => Math.round(x));
}

async function execute<T>(array: T[], func: UniqFunc<T>, loopCount: number) {
    const start = performance.now();
    for (let i = 0; i < loopCount; i++) {
        const result = func(array);
    }
    return performance.now() - start;
}

However, it didn’t improve it because promise works on a single thread. It can improve the performance for I/O-related work for example but not for CPU-intensive work. To improve the performance for CPU-intensive work we need another thread. We can make it faster by worker thread but I don’t do it here.

You might want to check this post if you are not familiar with Promise.

Array Length 10

loop countForOfForEachReduceFilterMapSetWith
ArrayFrom
SetWith
Spread
ForObject
1000004025243274375992
1500004745485488564957
2000003635273982676875
250000404030391078593114
30000058504555123104102120
35000064453955149129124138
40000073665967171151140164
45000075686582189192155176
50000083777794191196172190
Array Length 10

The vertical line shows time (ms). uniqByReduce is top and second is uniqByForEach.

Array Length 10 loop 10000 to 200000

The result for the loop count of less than 200000 looks not good. I tried it again with a smaller loop count increment.

loop countForOfForEachReduceFilterMapSetWith
ArrayFrom
SetWith
Spread
ForObject
1000091710599414
1500022227778
200003235137712
2500044351191213
30000765513111015
35000656814121218
40000769827151320
450001086719161929
50000118111020171725
5500087131525201831
60000141191024212039
65000129101127232238
700001210131327292441
750001511101529272440
800001311141432272641
850001612121740312948
900001817131635312954
950001814151838323249
1000001617161743353358
1050001916151744383569
1100001815161845383765
1150002016182346403860
1200002218162247434072
1250002323172048454769
1300002320232357454377
1350002219192655474477
1400002519192255504695
1450002521242461524876
1500002720213566515082
1550003022252564545395
16000032222826695652102
16500028252735715862109
1700003325252773595697
1750002926402970645791
18000032262638756459113
18500035272830816563119
19000031252635766664122
19500036323239776963105
20000034273039837373133
Array Length 10

The difference is small. Let’s see the result with the log version.

Top is uniqByForEach but 2nd to 4th are similar. They are uniqByReduce, uniqByForOf and uniqByFilter.

Array Length 100

loop countForOfForEachReduceFilterMapSetWith
ArrayFrom
SetWith
Spread
ForObject
100000258219165256197205200838
1500002712272444192903032861235
2000003623053145633973883881553
2500004443913986804804854942095
3000005224544948205516005742418
3500006135405679386446646742770
40000071464162810737347877723291
45000079767675812078558768603586
50000084976881713199019659834085
Array Length 100

uniqForObject is too slow to compare. Top is uniqForEach and second is uniqByReduce.

Array Length 1000

loop countForOfForEachReduceFilterMapSetWith
ArrayFrom
SetWith
Spread
1000002772280627833552205019911985
1500004190402237855137311829642968
2000005423518049886853413939573945
2500006874651464398829519549444939
30000080467905782210299641459405916
35000096199038885812164733769606907
40000011228105401005113789866479177897
45000012106118521140315548941488878894
500000136231350912710171541040498939902
Array Length 1000

Wow, uniqBySetWithSpread or uniqBySetWithArrayFrom is top here! There are almost the same. Next is uniqByMap

Conclusion

We learned many ways to remove duplicated values from an array. We need to use a library such as lodash to compare two objects deeply but we can easily implement primitive values without a library.

uniqForEach, uniqByReduce, uniqByFilter or uniqByForOf is good enough for most cases. Using Set object is the best if you need to work with a big array and need to consider the performance.

Comments

Copied title and URL