An overview of Python data science libraries

Numpy

Numpy is a library for working with multi-dimensional arrays, which are common when dealing with larger data sets. Beyond the arrays themselves, it provides a large number of functions to operate on those arrays efficiently. Because fast operations matter so much with large data sets, the bulk of the code is written in highly optimized C. As the name implies, the arrays we deal with are often numeric in nature.
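
To make that concrete, here is a minimal sketch of array operations. The `readings` data is made up for illustration; the point is that each operation applies to the whole array at once instead of looping in Python.

```python
import numpy as np

# Hypothetical data: a 2-D array with 3 rows and 4 columns.
readings = np.arange(12, dtype=float).reshape(3, 4)

# Whole-array operations run in compiled code, not a Python-level loop.
doubled = readings * 2
column_means = readings.mean(axis=0)  # one mean per column
row_totals = readings.sum(axis=1)     # one total per row

print(readings.shape)
print(column_means)
print(row_totals)
```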

Scipy

Scipy is a package of algorithms to operate on numpy objects. Of the functions it provides, clustering and statistical packages are the most interesting to me. This package assumes a fair amount of understanding of the mathematical concepts that underpin it.
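
As a rough illustration of the clustering and statistics side, the sketch below groups a handful of made-up 2-D points with hierarchical clustering and computes a Pearson correlation. The data and the choice of `method="average"` are assumptions for the example, not recommendations.

```python
import numpy as np
from scipy import stats
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical points forming two well-separated groups.
points = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1],
])

# Build a hierarchical clustering tree, then cut it into two clusters.
tree = linkage(points, method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)

# Pearson correlation between two numeric arrays.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0
r, p_value = stats.pearsonr(x, y)
print(r)
```

Both functions take plain numpy arrays as input, which is what "operating on numpy objects" looks like in practice.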

Pandas

Pandas operates on (usually one- or two-dimensional) arrays. Unlike many other Python data structures, pandas is column oriented. If we were to parse a pile of JSON, each property would be its own column (i.e. a numpy array), and each index would correspond to one JSON object. So to see the nth object’s properties again, you’d look at the nth entry of each column.
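
A small sketch of that column orientation, assuming a list of already-parsed JSON objects (the `records` below are invented for the example):

```python
import pandas as pd

# Hypothetical parsed JSON: a list of objects with the same properties.
records = [
    {"name": "a", "size": 1},
    {"name": "b", "size": 2},
    {"name": "c", "size": 3},
]

frame = pd.DataFrame(records)

# Each property became a column; a column can be viewed as a numpy array.
sizes = frame["size"].to_numpy()

# The nth object is the nth entry of every column.
second = frame.iloc[1]

print(list(frame.columns))
print(sizes)
print(second["name"], second["size"])
```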

In practice, it provides efficient mechanisms for loading data files and subsequently slicing and dicing them, such as converting time-series data from one time box to another (e.g. per-second to 5-minute intervals).
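
The time-box conversion might look something like this. The series is synthetic (ten minutes of per-second values, all equal to 1), and summing is just one possible way to aggregate each 5-minute bucket.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 600 per-second samples starting at midnight.
index = pd.date_range(
    "2024-01-01 00:00:00", periods=600, freq=pd.Timedelta(seconds=1)
)
per_second = pd.Series(np.ones(600), index=index)

# Regroup into 5-minute buckets and sum the values in each one.
per_five_minutes = per_second.resample("5min").sum()
print(per_five_minutes)
```

Swapping `.sum()` for `.mean()` or `.max()` changes how each bucket is summarized without changing the bucketing itself.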

Matplotlib

Matplotlib is a low-ish level graphing toolkit. It knows about colors, axes, and the composition of figures. Users can change colors and formatting in many ways to make a plot look nice. It expects the data to already be arranged into series that are ready to plot.
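
Here is a minimal sketch of what that looks like, with made-up x and y series. The `Agg` backend is selected only so the example runs without a display, and the PNG is written to an in-memory buffer rather than a file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

# Hypothetical series, already in the order they should be plotted.
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

fig, ax = plt.subplots()
ax.plot(x, y, color="tab:blue", linestyle="--", marker="o", label="squares")
ax.set_xlabel("x")
ax.set_ylabel("x squared")
ax.legend()

# Render the figure as PNG bytes.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

Passing a filename to `fig.savefig` instead of the buffer writes the image to disk.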

Seaborn

Seaborn is a higher-level API on top of matplotlib. It is primarily focused on statistical graphing and has default themes for matplotlib which are aesthetically pleasing and modern. Unlike matplotlib, seaborn can operate on dataframes, and it better understands that the series in a dataframe are related to one another.
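
A short sketch of the dataframe-aware style, using an invented dataframe. Columns are referred to by name, and the `hue` column is used to color the points by group.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import pandas as pd
import seaborn as sns

# Hypothetical measurements with a grouping column.
frame = pd.DataFrame({
    "width": [1.0, 1.5, 2.0, 3.0, 3.5, 4.0],
    "height": [2.0, 2.4, 3.1, 5.0, 5.6, 6.2],
    "kind": ["small", "small", "small", "large", "large", "large"],
})

sns.set_theme()  # apply seaborn's default styling to matplotlib

# Seaborn picks the columns out of the dataframe by name.
ax = sns.scatterplot(data=frame, x="width", y="height", hue="kind")
```

The returned object is an ordinary matplotlib `Axes`, so the lower-level matplotlib calls still apply to it.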

re

re is the regular expressions module for Python. There’s nothing particularly special about it on that front, though it does have a nicely expressive syntax.
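
For a taste of the syntax, here is a small sketch that pulls fields out of a made-up log line using named groups. The log format is an assumption for the example.

```python
import re

# Hypothetical log line.
log_line = "2024-01-15 ERROR disk usage at 91%"

# Named groups make the extracted fields self-describing.
pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) (?P<message>.+)"
)

match = pattern.match(log_line)
fields = match.groupdict()
print(fields)

# findall returns every captured group that matches in the string.
percentages = re.findall(r"(\d+)%", log_line)
print(percentages)
```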

Beautiful Soup

Beautiful Soup is a library to help with web scraping and parsing. I’ve primarily used it in the past for pulling data from HTML webpages, but it can be used to look through plain XML as well. Instead of knowing the specific path you want to traverse through an XML tree, you can search through the tree to find structures that match some sort of predicate.
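
The predicate-style search can be sketched as follows. The HTML snippet is invented, and the parser used is Python's built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment.
html = """
<html><body><div>
  <p>Intro <a href="https://example.com/a">A</a></p>
  <ul>
    <li><a href="/local">Local</a></li>
    <li><a href="https://example.com/b">B</a></li>
    <li><a name="anchor">No href</a></li>
  </ul>
</div></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Find every <a> tag that has an href, wherever it sits in the tree.
links = soup.find_all("a", href=True)

# A function works as a predicate: keep only absolute http(s) links.
def is_external_link(tag):
    return tag.name == "a" and tag.get("href", "").startswith("http")

external = [tag["href"] for tag in soup.find_all(is_external_link)]
print(len(links))
print(external)
```

Neither search needs to know that the links live inside a paragraph or a list; the tree is searched for whatever matches.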

The notes of Justin Abrahms
