Because of its stunning speed, Polars is an extremely popular open-source library for DataFrame operations in Python. Kinda unreal to have Ritchie Vink, Polars' creator, as today's guest!
Ritchie:
• Is CEO and Co-Founder of Polars, Inc., a startup that has raised $4m in seed funding to support his Polars open-source project.
• Previously worked as an ML Engineer, Data Scientist and Data Engineer at companies like adidas and KLM Royal Dutch Airlines.
• Holds a Master’s in Structural Engineering and worked as a civil engineer prior to catching the data-science bug.
Today’s episode will appeal most to hands-on practitioners like data scientists and ML engineers. In it, Ritchie details:
• How Polars regularly achieves 5-20x (sometimes 100x!) speed improvements over Pandas for most DataFrame operations.
• The Eager and Lazy execution APIs Polars offers and when you should use one or the other.
• Ritchie's vision for scaling Polars to handle massive distributed datasets.
• How we can continue to make data-processing efficiency gains even as Moore's Law slows down.
The SuperDataScience podcast is available on all major podcasting platforms, YouTube, and at SuperDataScience.com.