
I wonder whether there is a shortcut to make a simple list out of a list of lists in Python.

I can do that in a for loop, but maybe there is some cool "one-liner"? I tried it with reduce, but I get an error.

Code

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
reduce(lambda x, y: x.extend(y), l)

Error message

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'extend'
There's an in-depth discussion of this here: rightfootin.blogspot.com/2006/09/more-on-python-flatten.html, discussing several methods of flattening arbitrarily nested lists of lists. An interesting read! – RichieHindle Jun 4 '09 at 20:41
Some other answers are better but the reason yours fails is that the 'extend' method always returns None. For a list with length 2, it will work but return None. For a longer list, it will consume the first 2 args, which returns None. It then continues with None.extend(<third arg>), which causes this error – mehtunguh Jun 11 '13 at 21:48
agree with @PeerStritzinger and Izkata, this is the right place to give/find the answer to this question. It bothers me because I have another solution to provide... – Juh_ Dec 6 '13 at 15:11
@shawn-chin's solution is the more Pythonic here, but if you need to preserve the sequence type, say you have a tuple of tuples rather than a list of lists, then you should use reduce(operator.concat, tuple_of_tuples). Using operator.concat with tuples seems to perform faster than chain.from_iterable with lists. – Meitham Oct 6 '14 at 21:46

17 Answers

Accepted answer (1760 votes)
[item for sublist in l for item in sublist]

is faster than the shortcuts posted so far. (l is the list to flatten.)

Here is the corresponding function:

flatten = lambda l: [item for sublist in l for item in sublist]

For evidence, as always, you can use the timeit module in the standard library:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the shortcuts based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So (for simplicity and without actual loss of generality) say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.
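Read left to right, the comprehension's two for clauses are exactly the equivalent nested loops; a loop version (the name flatten_loop is illustrative) for comparison:

```python
def flatten_loop(list_of_lists):
    # same iteration order as [item for sublist in l for item in sublist]
    result = []
    for sublist in list_of_lists:   # outer clause of the comprehension
        for item in sublist:        # inner clause
            result.append(item)
    return result

flatten_loop([[1, 2, 3], [4, 5, 6], [7], [8, 9]])
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```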

I tried a test with the same data, using itertools.chain.from_iterable : $ python -mtimeit -s'from itertools import chain; l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'list(chain.from_iterable(l))'. It runs a bit more than twice as fast as the nested list comprehension that's the fastest of the alternatives shown here. – intuited Oct 15 '10 at 1:21
I found the syntax hard to understand until I realized you can think of it exactly like nested for loops. for sublist in l: for item in sublist: yield item – Rob Crowell Jul 27 '11 at 16:43
@BorisChervenkov: Notice that I wrapped the call in list() to realize the iterator into a list. – intuited May 20 '12 at 22:56
[leaf for tree in forest for leaf in tree] might be easier to comprehend and apply. – John Mee Aug 29 '13 at 1:38
@Joel, actually nowadays list(itertools.chain.from_iterable(l)) is best -- as noticed in other comments and Shawn's answer. – Alex Martelli Jan 4 '15 at 15:40

You can use itertools.chain():

>>> import itertools
>>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
>>> merged = list(itertools.chain(*list2d))

or, on Python >=2.6, use itertools.chain.from_iterable() which doesn't require unpacking the list:

>>> import itertools
>>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
>>> merged = list(itertools.chain.from_iterable(list2d))

This approach is arguably more readable than [item for sublist in l for item in sublist] and appears to be faster too:

[me@home]$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99;import itertools' 'list(itertools.chain.from_iterable(l))'
10000 loops, best of 3: 24.2 usec per loop
[me@home]$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 45.2 usec per loop
[me@home]$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 488 usec per loop
[me@home]$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 522 usec per loop
[me@home]$ python --version
Python 2.7.3
And if someone wants to know why other solutions based on + operator (concatenation) are inefficient, see How not to Flatten a List of Lists – Mathieu Larose Jun 26 '13 at 18:26
@ShawnChin BTW, what a piece of hardware you had when answering this question; my current workstation is half as fast, and it's been 4 years. – Manuel Gutierrez Sep 24 '13 at 15:18
@alexandre see docs.python.org/2/tutorial/… – Shawn Chin Jul 23 '14 at 20:15
The * is the tricky thing that makes chain less straightforward than the list comprehension. You have to know that chain only joins together the iterables passed as parameters, and the * causes the top-level list to be expanded into parameters, so chain joins together all those iterables, but doesn't descend further. I think this makes the comprehension more readable than the use of chain in this case. – Tim Dierks Sep 3 '14 at 14:13
@TimDierks: I'm not sure "this requires you to understand Python syntax" is an argument against using a given technique in Python. Sure, complex usage could confuse, but the "splat" operator is generally useful in many circumstances, and this isn't using it in a particularly obscure way; rejecting all language features that aren't necessarily obvious to beginning users means you're tying one hand behind your back. May as well throw out list comprehensions too while you're at it; users from other backgrounds would find a for loop that repeatedly appends more obvious. – ShadowRanger Nov 12 '15 at 20:26
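To make the comments above concrete, here is a small sketch of the two spellings: chain(*l) unpacks the outer list eagerly into arguments, while chain.from_iterable(l) consumes it lazily, one sublist at a time.

```python
from itertools import chain

l = [[1, 2], [3, 4], [5]]
eager = list(chain(*l))               # l is unpacked into arguments up front
lazy = list(chain.from_iterable(l))   # sublists are fetched one at a time
assert eager == lazy == [1, 2, 3, 4, 5]
```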
>>> sum(l, [])
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Note that this only works on lists of lists. For lists of lists of lists, you'll need another solution.
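A quick sketch of that limitation: sum concatenates only the top level, so deeper nesting survives.

```python
l = [[1, 2], [3, [4, 5]]]
sum(l, [])
# [1, 2, 3, [4, 5]] -- the inner [4, 5] is left nested
```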

What a cleverly implicit use of the overloaded (+) operator! – Nick Retallack Jun 4 '09 at 20:39
that's pretty neat and clever but I wouldn't use it because it's confusing to read. – andrewrk Jun 15 '10 at 18:55
@curious: This just sums the elements of the iterable passed in the first argument, treating the second argument as the initial value of the sum (if not given, 0 is used instead, and that case will give you an error). Because you are summing nested lists, you actually get [1,3]+[2,4] as a result of sum([[1,3],[2,4]], []), which is equal to [1,3,2,4]. – Tadeck Mar 23 '12 at 18:21
This is a Shlemiel the painter's algorithm joelonsoftware.com/articles/fog0000000319.html -- unnecessarily inefficient as well as unnecessarily ugly. – Mike Graham Apr 25 '12 at 18:24
The append operation on lists forms a Monoid, which is one of the most convenient abstractions for thinking of a + operation in a general sense (not limited to numbers only). So this answer deserves a +1 from me for (correct) treatment of lists as a monoid. The performance is concerning though... – ulidtko Dec 3 '14 at 10:35

@Nadia: You have to use much longer lists. Then you see the difference quite strikingly! My results for len(l) = 1600

A took 14.323 ms
B took 13.437 ms
C took 1.135 ms

where:

A = reduce(lambda x,y: x+y,l)
B = sum(l, [])
C = [item for sublist in l for item in sublist]
It only gets worse and worse, as the algorithmic complexity of A and B are different (worse) than that of C. – Mike Graham Apr 25 '12 at 18:27
>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(lambda x,y: x+y,l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

The extend() method in your example modifies x instead of returning a useful value (which reduce() expects).

A faster way to do the reduce version would be

>>> import operator
>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(operator.add, l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
reduce(operator.add, l) would be the correct way to do the reduce version. Built-ins are faster than lambdas. – agf Sep 24 '11 at 10:04
@agf here is how: * timeit.timeit('reduce(operator.add, l)', 'import operator; l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]', number=10000) 0.017956018447875977 * timeit.timeit('reduce(lambda x, y: x+y, l)', 'import operator; l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]', number=10000) 0.025218963623046875 – lukmdo Mar 20 '12 at 22:13
This is a Shlemiel the painter's algorithm joelonsoftware.com/articles/fog0000000319.html – Mike Graham Apr 25 '12 at 18:26
this can be used only for integers. But what if the list contains strings? – Freddy Sep 11 '15 at 7:16
@Freddy: The operator.add function works equally well for both lists of integers and lists of strings. – Greg Hewgill Sep 11 '15 at 7:38

I take my statement back. sum is not the winner. It is faster when the list is small, but the performance degrades significantly with larger lists.

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10000'
    ).timeit(100)
2.0440959930419922

The sum version has been running for more than a minute and it still hasn't finished processing!

For medium lists:

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
20.126545906066895
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
22.242258071899414
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
16.449732065200806

Using small lists and timeit: number=1000000

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
2.4598159790039062
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.5289170742034912
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.0598428249359131
for a truly minuscule list, e.g. one with 3 sublists, maybe -- but since sum's performance goes with O(N**2) while the list comprehension's goes with O(N), just growing the input list a little will reverse things -- indeed the LC will be "infinitely faster" than sum at the limit as N grows. I was responsible for designing sum and doing its first implementation in the Python runtime, and I still wish I had found a way to effectively restrict it to summing numbers (what it's really good at) and block the "attractive nuisance" it offers to people who want to "sum" lists;-). – Alex Martelli Jun 4 '09 at 21:07

Why do you use extend?

reduce(lambda x, y: x+y, l)

This should work fine.

This probably creates many, many, intermediate lists. – Reut Sharabani May 23 '16 at 19:33
For Python 3: from functools import reduce – andorov Jan 19 at 18:15

The reason your function didn't work: extend extends the list in place and doesn't return it. You can still return x from the lambda, using a small trick:

reduce(lambda x,y: x.extend(y) or x, l)

Note: extend is more efficient than + on lists.

extend is better used as newlist = [], extend = newlist.extend, for sublist in l: extend(sublist) as it avoids the (rather large) overhead of the lambda, the attribute lookup on x, and the or. – agf Sep 24 '11 at 10:12
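The pattern from the comment above, spelled out (variable names are illustrative):

```python
l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]

result = []
extend = result.extend      # bind the bound method once, outside the loop
for sublist in l:
    extend(sublist)         # in-place extend: no lambda, no intermediate lists

result
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```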

There seems to be some confusion about operator.add! When you add two lists together, the correct term for that is concat, not add. operator.concat is what you need to use.

If you're thinking functional, it is as easy as this:

>>> list2d = ((1,2,3),(4,5,6), (7,), (8,9))
>>> reduce(operator.concat, list2d)
(1, 2, 3, 4, 5, 6, 7, 8, 9)

You see, reduce respects the sequence type, so when you supply a tuple, you get back a tuple. Let's try with a list:

>>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(operator.concat, list2d)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Aha, you get back a list.

How about performance:

>>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
>>> %timeit list(itertools.chain.from_iterable(list2d))
1000000 loops, best of 3: 1.36 µs per loop

from_iterable is pretty fast! But it's no match for reduce with concat.

>>> list2d = ((1,2,3),(4,5,6), (7,), (8,9))
>>> %timeit reduce(operator.concat, list2d)
1000000 loops, best of 3: 492 ns per loop

One can also use NumPy's flat:

import numpy as np
list(np.array(l).flat)

Edit 11/02/2016: Only works when sublists have identical dimensions.
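A sketch of that caveat (exact behaviour depends on the NumPy version: recent versions raise ValueError for ragged input, older ones silently build an object array of the sublists instead of flattening):

```python
import numpy as np

regular = [[1, 2], [3, 4]]
assert list(np.array(regular).flat) == [1, 2, 3, 4]   # identical lengths: OK

ragged = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]          # unequal lengths
try:
    flat = list(np.array(ragged).flat)  # may raise, or yield the sublists whole
except ValueError:
    flat = [item for sub in ragged for item in sub]   # fall back to pure Python
```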

Would that be the optimal solution? – RetroCode Sep 22 '16 at 20:08
@Sean well you learned something, didn't you ? :) – Salem Jan 13 at 9:36
Thanks for the edition. :) – Sean Jan 14 at 15:58

Fastest solution I have found (for large lists anyway):

import numpy as np
# turn the list into an array and flatten it
np.array(l).flatten()

Done! You can of course turn the result back into a list, e.g. with np.array(l).flatten().tolist()

def flatten(l, a):
    for i in l:
        if isinstance(i, list):
            flatten(i, a)
        else:
            a.append(i)
    return a

print(flatten([[[1, [1,1, [3, [4,5,]]]], 2, 3], [4, 5],6], []))

# [1, 1, 1, 3, 4, 5, 2, 3, 4, 5, 6]
Doesn't work for me unless I manually specify a=[]: >>> flatten([[1,2,3],[4,5,6]]) [1, 2, 3, 4, 5, 6] >>> flatten([[1,2,3],[4,5,6]]) [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6] – Jeff Nov 6 '16 at 22:52
@Jeff My answer was edited by @deleet ... Check my original answer and it works... – Anil Nov 8 '16 at 3:04
nice, thanks. I rolled back @deleet's edit – Jeff Nov 8 '16 at 18:21
Yes, it was buggy as I found out later! I did test it, but the bug only happens after the first run. The reason is that the default argument [] gets treated as a persistent object in Python. So next time you run the function, it begins with the list you used last time! Very nasty bug, hard to figure out. In R (which I mostly use) this would have worked due to copy semantics. Does anyone know a Python solution? Having to manually specify an empty list every time is not a good design. I need this function for my own project, so I hope someone knows. :) – Deleet Nov 9 '16 at 3:27
I posted a fixed version now. – Deleet Nov 11 '16 at 11:53
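The bug described in the comments can be reproduced minimally: the default list is created once, at function definition time, and then shared across calls.

```python
def buggy_flatten(l, a=[]):   # a single shared default list -- the pitfall
    for i in l:
        a.append(i)
    return a

buggy_flatten([1, 2])   # [1, 2]
buggy_flatten([3, 4])   # [1, 2, 3, 4] -- items from the first call leak in
```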

If you are willing to give up a tiny amount of speed for a cleaner look, then you could use numpy.concatenate().tolist() or numpy.concatenate().ravel().tolist():

import numpy

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] * 99

%timeit numpy.concatenate(l).ravel().tolist()
1000 loops, best of 3: 313 µs per loop

%timeit numpy.concatenate(l).tolist()
1000 loops, best of 3: 312 µs per loop

%timeit [item for sublist in l for item in sublist]
1000 loops, best of 3: 31.5 µs per loop

You can find out more in the docs for numpy.concatenate and numpy.ravel.


A bad feature of Anil's function above is that it requires the user to always manually specify the second argument as an empty list []. This should instead be a default. Due to the way mutable Python objects work, the default should be set inside the function, not in the argument list.

Here's a working function:

def list_flatten(l, a=None):
    #check a
    if a is None:
        #initialize with empty list
        a = []

    for i in l:
        if isinstance(i, list):
            list_flatten(i, a)
        else:
            a.append(i)
    return a

Testing:

In [2]: lst = [1, 2, [3], [[4]],[5,[6]]]

In [3]: lst
Out[3]: [1, 2, [3], [[4]], [5, [6]]]

In [11]: list_flatten(lst)
Out[11]: [1, 2, 3, 4, 5, 6]

Consider installing the more_itertools package, which ships with an implementation for flatten:

import more_itertools

# Using flatten()
l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(more_itertools.flatten(l))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

As of version 2.4, you can flatten more complicated nested iterables with more_itertools.collapse (contributed by abarnert).

# Using collapse()
l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]         # given example 
list(more_itertools.collapse(l)) 
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

l = [[1, 2, 3], [[4, 5, 6]], [[[7]]], 8, 9]     # complex nesting
list(more_itertools.collapse(l))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

If you want to flatten a data structure where you don't know how deeply it's nested, you could use iteration_utilities.deepflatten (note: I'm the author of that external library):

>>> from iteration_utilities import deepflatten

>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(deepflatten(l, depth=1))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> l = [[1, 2, 3], [4, [5, 6]], 7, [8, 9]]
>>> list(deepflatten(l, depth=1))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

It's a generator so you need to cast the result to a list or explicitly iterate over it.


To flatten only one level and if each of the items is itself iterable you can also use iteration_utilities.flatten which itself is just a thin wrapper around itertools.chain.from_iterable:

>>> from iteration_utilities import flatten
>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(flatten(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Here is a general approach that applies to lists of lists, numbers, and other mixed container types.

from collections.abc import Iterable  # the plain 'collections' import was removed in Python 3.10

def flatten(items):
    """Yield items from any nested iterable; see REF."""
    for x in items:
        if isinstance(x, Iterable):
            yield from flatten(x)
        else:
            yield x

list(flatten(l))                               # list of lists
#[1, 2, 3, 4, 5, 6, 7, 8, 9]

items = [[1, [2]], (3, 4, {5, 6}, 7), 8, 9]    # numbers & mixed containers
list(flatten(items))
#[1, 2, 3, 4, 5, 6, 7, 8, 9]

This solution employs Python 3's powerful yield from expression, which delegates to sub-generators. Note, this solution does not apply to strings: a string is an iterable of strings, so it would recurse forever.

REF: solution modified from Beazley, D. and B. Jones. Recipe 4.14, Python Cookbook 3rd Ed., O'Reilly Media Inc. Sebastopol, CA: 2013.
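A common tweak (not part of the original recipe) handles strings by treating str and bytes as atomic:

```python
from collections.abc import Iterable

def flatten_str_safe(items):
    """Yield items from any nested iterable, keeping strings whole."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten_str_safe(x)
        else:
            yield x

list(flatten_str_safe([['ab', 1], ('cd', [2, 3])]))
# ['ab', 1, 'cd', 2, 3]
```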

