Python join function that works for list of list with non-string values

Calling the join function with any of the following values raises an error.

# TypeError: sequence item 0: expected str instance, list found

"".join([["first"], ["second"]])
"".join([["1", 2, 3], [4, "5", 6]])
"".join([[[11], [21, 22]], [[31, 32], [41, 42]]])

This article solves the problem.

How join function works

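I expected join to be called on a list with the separator as its argument, as in TypeScript/JavaScript.

However, in Python it is the other way around: join is a string method that is called on the separator and takes the list as its argument.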

In TypeScript/JavaScript

["one", "two", "three"].join(",");

In Python

",".join(["one", "two", "three"]) # one,two,three

Let’s check the behavior first with some examples.

print("--- ".join("a"))     # a
print("--- ".join("abc"))   # a--- b--- c
print("--- ".join(["a", "b", "c"]))     # a--- b--- c
print("--- ".join(["ab", "cd", "ef"]))  # ab--- cd--- ef

The string on the left side is used as a separator. If the right side is a string, it is treated as a sequence of single characters.
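As a small additional sketch (not part of the original examples), join accepts any iterable of strings, so a tuple or a generator expression behaves the same way as a list, and a plain string is simply iterated character by character.

```python
sep = "--- "

# A string is iterated character by character, so these two calls are equivalent.
from_string = sep.join("abc")
from_chars = sep.join(["a", "b", "c"])

# Any iterable of strings works, not only a list.
from_tuple = sep.join(("ab", "cd", "ef"))
from_generator = sep.join(s.upper() for s in ["ab", "cd"])

print(from_string)     # a--- b--- c
print(from_chars)      # a--- b--- c
print(from_tuple)      # ab--- cd--- ef
print(from_generator)  # AB--- CD
```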

Error cases that join cannot handle

List of lists: sequence item 0: expected str instance, list found

If the list consists of lists, an error occurs.

try:
    print("--- ".join([["first"], ["second"]]))
except TypeError as e:
    # sequence item 0: expected str instance, list found
    print(e)

List containing non-string values (mixed data types)

If the list contains non-string values, which are int here, Pylance reports a type error and join raises a TypeError at runtime.

# TypeError: sequence item 0: expected str instance, int found
",".join([1,2,3])

# Argument of type "list[str | int]" cannot be assigned to parameter "__iterable" of type "Iterable[str]" in function "join"
#   "Literal[2]" is incompatible with "str" Pylance(reportGeneralTypeIssues)
# TypeError: sequence item 1: expected str instance, int found
",".join(["1",2,3])

Solutions

I will define several functions and compare them. To check the behavior, I use the following dataset. The expected output is written as a comment.

TEST_DATASET = [
    [1, 2, 3],              # "123"
    ["1", "2", "3"],        # "123"
    ["1", 2, 3],            # "123"
    [["first"], ["second"]],# firstsecond
    [["1", "2"], ["3"]],    # "123"
    [[1, 2, 3], [4, 5, 6]], # "123456"
    [["1", 2, 3], [4, "5", 6]], # "123456"
    [[["fir"], ["st"]], [["se", "co"], ["nd"]]], # firstsecond
    [[[11], [21, 22]], [[31, 32], [41, 42]]],    # "11212231324142"
]

The function to run the test is the following.

def run_test(callback, values):
    try:
        intermediate, result = callback(values)
        print(f"intermediate: {intermediate}")
        print(f"RESULT: {values} -> {result}")
    except Exception as err:
        print(f"ERROR: {values}, {format(err)}")
    finally:
        print()

Solution 1 for-in loop only for string list

The first solution is to use a for-in loop (a list comprehension) that joins each element first and then joins the results.

def solution1(values):
    intermediate = ["".join(element) for element in values]
    result = "".join(intermediate)
    return intermediate, result


[run_test(solution1, values) for values in TEST_DATASET]
# ERROR: [1, 2, 3], can only join an iterable       

# intermediate: ['1', '2', '3']
# RESULT: ['1', '2', '3'] -> 123

# ERROR: ['1', 2, 3], can only join an iterable     

# intermediate: ['first', 'second']
# RESULT: [['first'], ['second']] -> firstsecond    

# intermediate: ['12', '3']
# RESULT: [['1', '2'], ['3']] -> 123

# ERROR: [[1, 2, 3], [4, 5, 6]], sequence item 0: expected str instance, int found

# ERROR: [['1', 2, 3], [4, '5', 6]], sequence item 1: expected str instance, int found

# ERROR: [[['fir'], ['st']], [['se', 'co'], ['nd']]], sequence item 0: expected str instance, list found

# ERROR: [[[11], [21, 22]], [[31, 32], [41, 42]]], sequence item 0: expected str instance, list found

This solution works well if the list contains only strings, or only lists of strings nested one level deep.

Solution 2 cast to str in for-in for mixed data type

If you need to process a list that has mixed data types, you need to convert each value to a string with str.

def solution2(values):
    intermediate = ["".join(str(element)) for element in values]
    result = "".join(intermediate)
    return intermediate, result


[run_test(solution2, values) for values in TEST_DATASET]
# intermediate: ['1', '2', '3']
# RESULT: [1, 2, 3] -> 123

# intermediate: ['1', '2', '3']
# RESULT: ['1', '2', '3'] -> 123

# intermediate: ['1', '2', '3']
# RESULT: ['1', 2, 3] -> 123

# intermediate: ["['first']", "['second']"]
# RESULT: [['first'], ['second']] -> ['first']['second']

# intermediate: ["['1', '2']", "['3']"]
# RESULT: [['1', '2'], ['3']] -> ['1', '2']['3']

# intermediate: ['[1, 2, 3]', '[4, 5, 6]']
# RESULT: [[1, 2, 3], [4, 5, 6]] -> [1, 2, 3][4, 5, 6]

# intermediate: ["['1', 2, 3]", "[4, '5', 6]"]
# RESULT: [['1', 2, 3], [4, '5', 6]] -> ['1', 2, 3][4, '5', 6]

# intermediate: ["[['fir'], ['st']]", "[['se', 'co'], ['nd']]"]
# RESULT: [[['fir'], ['st']], [['se', 'co'], ['nd']]] -> [['fir'], ['st']][['se', 'co'], ['nd']]

# intermediate: ['[[11], [21, 22]]', '[[31, 32], [41, 42]]']
# RESULT: [[[11], [21, 22]], [[31, 32], [41, 42]]] -> [[11], [21, 22]][[31, 32], [41, 42]]

This solution works for a flat list that has int values, mixed data types, or string values, but not for a list of lists, because str converts each inner list to its printed form, brackets included.
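The reason nested lists fail here can be seen in a short sketch: str() applied to a list returns its printed representation, and wrapping that in "".join() changes nothing, because joining the characters of a string with an empty separator gives the same string back.

```python
nested = [[1, 2, 3], [4, 5, 6]]

# str() on a list returns its printed form, brackets included.
as_text = str(nested[0])
print(as_text)  # [1, 2, 3]

# Joining the characters of a string with "" returns the same string,
# so the inner join in solution2 is effectively a no-op.
print("".join(as_text) == as_text)  # True

# That is why the brackets end up in the final result.
result = "".join(str(element) for element in nested)
print(result)  # [1, 2, 3][4, 5, 6]
```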

Solution 3 using map with cast

Instead of casting each element as a whole, this solution uses map to apply str to every item inside the element.

def solution3(values):
    intermediate = ["".join(map(str, element)) for element in values]
    result = "".join(intermediate)
    return intermediate, result


[run_test(solution3, values) for values in TEST_DATASET]
# ERROR: [1, 2, 3], 'int' object is not iterable

# intermediate: ['1', '2', '3']
# RESULT: ['1', '2', '3'] -> 123

# ERROR: ['1', 2, 3], 'int' object is not iterable

# intermediate: ['first', 'second']
# RESULT: [['first'], ['second']] -> firstsecond

# intermediate: ['12', '3']
# RESULT: [['1', '2'], ['3']] -> 123

# intermediate: ['123', '456']
# RESULT: [[1, 2, 3], [4, 5, 6]] -> 123456

# intermediate: ['123', '456']
# RESULT: [['1', 2, 3], [4, '5', 6]] -> 123456

# intermediate: ["['fir']['st']", "['se', 'co']['nd']"]
# RESULT: [[['fir'], ['st']], [['se', 'co'], ['nd']]] -> ['fir']['st']['se', 'co']['nd']

# intermediate: ['[11][21, 22]', '[31, 32][41, 42]']
# RESULT: [[[11], [21, 22]], [[31, 32], [41, 42]]] -> [11][21, 22][31, 32][41, 42]

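This solution solves a couple of problems that solution 2 can’t, but it doesn’t work for a flat list that contains int values, because map needs each element to be iterable. Lists nested more than one level deep still end up with brackets in the result.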
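The failure on flat lists can be reproduced in isolation. The sketch below pulls the inner expression of solution3 into a helper (the name inner_join is only for illustration) and shows that a list and a string are both iterable, while an int is not.

```python
def inner_join(element):
    # Same inner expression as solution3: element must be iterable.
    return "".join(map(str, element))


print(inner_join([1, 2, 3]))  # 123
print(inner_join("1"))        # 1

try:
    inner_join(2)
except TypeError as err:
    print(err)  # 'int' object is not iterable
```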

Solution 4 flatten the list first (perfect solution)

If a list has another list, casting to str results in something like "[1, 2, 3]". To solve this problem, what we can do is to flatten the list before processing.

def flat(element) -> list:
    has_list = any([isinstance(x, list) for x in element])
    if not has_list:
        return element

    flatten_list = []
    for x in element:
        if isinstance(x, list):
            val = flat(x)
            flatten_list.extend(val)
        else:
            flatten_list.append(x)

    return flatten_list


[print(flat(values)) for values in TEST_DATASET]
# [1, 2, 3]
# ['1', '2', '3']
# ['1', 2, 3]
# ['first', 'second']
# ['1', '2', '3']
# [1, 2, 3, 4, 5, 6]
# ['1', 2, 3, 4, '5', 6]
# ['fir', 'st', 'se', 'co', 'nd']
# [11, 21, 22, 31, 32, 41, 42]

Check the following article if you want to know other ways to flatten a list.

Python Three ways to flatten a list

All lists are flattened as expected. With this function, join works for every list in the dataset.

def solution4(values):
    flatten_list = flat(values)

    intermediate = ["".join(str(element)) for element in flatten_list]
    result = "".join(intermediate)
    return intermediate, result


[run_test(solution4, values) for values in TEST_DATASET]
# intermediate: ['1', '2', '3']
# RESULT: [1, 2, 3] -> 123

# intermediate: ['1', '2', '3']
# RESULT: ['1', '2', '3'] -> 123

# intermediate: ['1', '2', '3']
# RESULT: ['1', 2, 3] -> 123

# intermediate: ['first', 'second']
# RESULT: [['first'], ['second']] -> firstsecond

# intermediate: ['1', '2', '3']
# RESULT: [['1', '2'], ['3']] -> 123

# intermediate: ['1', '2', '3', '4', '5', '6']
# RESULT: [[1, 2, 3], [4, 5, 6]] -> 123456

# intermediate: ['1', '2', '3', '4', '5', '6']
# RESULT: [['1', 2, 3], [4, '5', 6]] -> 123456

# intermediate: ['fir', 'st', 'se', 'co', 'nd']
# RESULT: [[['fir'], ['st']], [['se', 'co'], ['nd']]] -> firstsecond

# intermediate: ['11', '21', '22', '31', '32', '41', '42']
# RESULT: [[[11], [21, 22]], [[31, 32], [41, 42]]] -> 11212231324142
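As an alternative sketch of the same idea, flattening and joining can be combined with a recursive generator, which avoids building the intermediate list. The names iter_flat and join_nested are hypothetical, and treating tuples as nested containers in addition to lists is an assumption that goes beyond the flat function above.

```python
from typing import Any, Iterator


def iter_flat(values: Any) -> Iterator[Any]:
    """Yield the leaf items of arbitrarily nested lists or tuples."""
    for item in values:
        if isinstance(item, (list, tuple)):
            # Recurse into nested containers.
            yield from iter_flat(item)
        else:
            yield item


def join_nested(values: Any, sep: str = "") -> str:
    """Flatten values, cast every leaf to str, then join with sep."""
    return sep.join(str(item) for item in iter_flat(values))


print(join_nested([["1", 2, 3], [4, "5", 6]]))                   # 123456
print(join_nested([[[11], [21, 22]], [[31, 32], [41, 42]]]))     # 11212231324142
print(join_nested([[["fir"], ["st"]], [["se", "co"], ["nd"]]]))  # firstsecond
print(join_nested([[1, 2], [3]], sep=","))                       # 1,2,3
```

The optional sep parameter keeps the separator configurable, so the same helper covers both the empty-separator cases in the dataset and a comma-separated output.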

Overview

  • String list
  • List containing string list
"".join(["".join(element) for element in values])

# Available for the following lists
# ['1', '2', '3'] -> 123
# [['first'], ['second']] -> firstsecond
# [['1', '2'], ['3']] -> 123
  • List containing non-string data type
  • Does not work for a list containing lists
"".join(["".join(str(element)) for element in values])

# Available for the following lists
# [1, 2, 3] -> 123
# ['1', '2', '3'] -> 123
# ['1', 2, 3] -> 123
  • List containing list with non-string data type
  • Does not work for a flat list with non-string data types
"".join(["".join(map(str, element)) for element in values])

# ['1', '2', '3'] -> 123
# [['first'], ['second']] -> firstsecond
# [['1', '2'], ['3']] -> 123
# [[1, 2, 3], [4, 5, 6]] -> 123456
# [['1', 2, 3], [4, '5', 6]] -> 123456
  • Can be used for all list types
"".join(["".join(str(element)) for element in flat(values)])
# [1, 2, 3] -> 123
# ['1', '2', '3'] -> 123
# ['1', 2, 3] -> 123
# [['first'], ['second']] -> firstsecond
# [['1', '2'], ['3']] -> 123
# [[1, 2, 3], [4, 5, 6]] -> 123456
# [['1', 2, 3], [4, '5', 6]] -> 123456
# [[['fir'], ['st']], [['se', 'co'], ['nd']]] -> firstsecond
# [[[11], [21, 22]], [[31, 32], [41, 42]]] -> 11212231324142
