Speed-up Parquet I/O of Pandas by 5x

Dataframes are often stored in parquet files and read using Pandas' 𝐫𝐞𝐚𝐝_𝐩𝐚𝐫𝐪𝐮𝐞𝐭() method.

Rather than using Pandas, which relies on a single-core, use fastparquet. It offers immense speedups for I/O on parquet files using parallel processing.

Find more info here: Docs.

Share this post on LinkedIn: Post Link.

Find the code for my tips here: GitHub.

I like to explore, experiment and write about data science concepts and tools. You can read my articles on Medium. Also, you can connect with me on LinkedIn.

Reply

or to participate.