Memory Usage, Filling Pandas DataFrame using Dict vs using key and value Lists

I am making a package that reads a binary file and returns data that can be used to initialize a DataFrame. I am now wondering whether it is better to return a dict or two lists (one holding the keys and one holding the values).

The package is not supposed to rely entirely on a DataFrame object, which is why it currently outputs the data as a dict (for easy access). If returning separate key and value lists would save memory and time (both are paramount for my application, as I am dealing with millions of data points), I would like to output the two lists instead. These iterables would then be used to initialize a DataFrame.

Here is a simple example:

In [1]: d = {(1,1,1): '111',
   ...: (2,2,2): '222',
   ...: (3,3,3): '333',
   ...: (4,4,4): '444'}

In [2]: keystup=[(1,1,1),(2,2,2),(3,3,3),(4,4,4)]

In [3]: valstup=['111','222','333','444']

In [4]: import pandas as pd

In [5]: dfdict=pd.DataFrame(d.values(),  index=pd.MultiIndex.from_tuples(d.keys(), names=['a','b','c']))

In [6]: dfdict
Out[6]: 
         0
a b c     
3 3 3  333
2 2 2  222
1 1 1  111
4 4 4  444

In [7]: dfpair=pd.DataFrame(valstup,  index=pd.MultiIndex.from_tuples(keystup, names=['a','b','c']))

In [8]: dfpair
Out[8]: 
         0
a b c     
1 1 1  111
2 2 2  222
3 3 3  333
4 4 4  444

My understanding is that d.values() and d.keys() each create a new copy of the data. Setting aside the fact that a dict takes more memory than a list, does using d.values() and d.keys() lead to more memory usage than the list-pair implementation?
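
One way to check the copying assumption directly is sketched below. It assumes Python 3, where d.keys() and d.values() return view objects rather than lists (on Python 2 they return new lists). The names keys_view, vals_view, keys_list, vals_list, shares_keys, shares_vals and container_sizes are illustrative only. Note that sys.getsizeof reports just the container's own overhead and does not follow references to the elements.

```python
import sys

# Same toy data as in the example above.
d = {(1, 1, 1): '111', (2, 2, 2): '222', (3, 3, 3): '333', (4, 4, 4): '444'}

# On Python 3 these are lightweight views onto the dict, not lists.
keys_view = d.keys()
vals_view = d.values()

# Materializing the views builds new list containers...
keys_list = list(keys_view)
vals_list = list(vals_view)

# ...but the elements are the very same objects the dict already holds,
# so the tuples and strings themselves are not duplicated.
shares_keys = all(k is orig for k, orig in zip(keys_list, d))
shares_vals = all(v is orig for v, orig in zip(vals_list, d.values()))

# Container overhead only (sys.getsizeof does not follow references).
container_sizes = {
    'dict': sys.getsizeof(d),
    'key list + value list': sys.getsizeof(keys_list) + sys.getsizeof(vals_list),
}

print(shares_keys, shares_vals)
print(container_sizes)
```

If both flags come out True, any extra memory from going through the dict is limited to the containers themselves, not the key tuples or value strings. The exact byte counts depend on the Python version and platform, so they are best measured on the target system with realistic data sizes.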

asked 1 hour ago by snowleopard, edited 1 hour ago

Why not use numpy arrays instead? They have a much lower memory footprint than both lists and dictionaries – keiv.fly 1 hour ago

I am not using numpy because I do not know the size of the data in advance, so I have to populate a list or a dict first and then initialize a numpy array or pandas DataFrame. – snowleopard 1 hour ago

I will write a benchmark of memory usage of lists vs dicts – keiv.fly 1 hour ago

Doesn't this also depend on the data types (str, int, and float)? – Merlin 1 hour ago

You can directly convert your dict to a DataFrame with dfdict = pd.DataFrame.from_dict(d, orient='index') – jonnat 1 hour ago


asked	today
viewed	22 times
