3. What is Pandas?#

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

Pandas capabilities

  • A fast and efficient DataFrame object for data manipulation with integrated indexing;

  • Tools for reading and writing data between in-memory data structures and different formats: CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format;

  • Intelligent data alignment and integrated handling of missing data: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form;

  • Flexible reshaping and pivoting of data sets;

  • Intelligent label-based slicing, fancy indexing, and subsetting of large data sets;

  • Columns can be inserted and deleted from data structures for size mutability;

  • Aggregating or transforming data with a powerful group by engine allowing split-apply-combine operations on data sets;

  • High performance merging and joining of data sets;

  • Hierarchical axis indexing provides an intuitive way of working with high-dimensional data in a lower-dimensional data structure;

  • Time series-functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging. Even create domain-specific time offsets and join time series without losing data;

  • Highly optimized for performance, with critical code paths written in Cython or C.

  • Python with pandas is in use in a wide variety of academic and commercial domains, including Finance, Neuroscience, Economics, Statistics, Advertising, Web Analytics, and more.