I ran into some weird behavior in my sklearn pipelines yesterday, caused by the strange way pandas sometimes copies data and sometimes hands back references (apparently even when I explicitly tell it to deep copy). So, to clarify things a bit, I decided to take a short detour into pandas internals.
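One documented case where a deep copy still shares state is a Series holding Python objects: the array of references is copied, but the objects themselves are not copied recursively. I can't say this is what bit me in the pipeline, so treat the snippet as a sketch of that one case with made-up data (the variable names are just for illustration).

```python
import pandas as pd

# A Series of Python lists has object dtype: each cell holds a reference.
original = pd.Series([[1, 2], [3, 4]])
deep = original.copy(deep=True)

# Mutating the list in place goes through the shared reference,
# so the change is visible from the "original" as well.
deep.iloc[0].append(99)
shared = original.iloc[0] is deep.iloc[0]
print(shared)
print(original.iloc[0])

# With plain numeric data the deep copy is independent of the original.
nums = pd.Series([1, 2, 3])
nums_copy = nums.copy(deep=True)
nums_copy.iloc[0] = 100
print(nums.iloc[0])
```

If the cells really are mutable Python objects, something like `copy.deepcopy` from the standard library on those objects is probably the safer route.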
A Series is a one-dimensional array-like object containing a sequence of values (of similar types to NumPy types) and an associated array of data labels, called its index.
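A minimal sketch of that definition, with made-up values and labels: the values live in a NumPy-style array and the labels live in the index.

```python
import pandas as pd

# Four values, each paired with a label in the index.
s = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

values = s.to_numpy()   # the underlying data as a NumPy array
labels = list(s.index)  # the associated labels

print(values)
print(labels)

# Labels can be used to select single values.
print(s["a"])

# NumPy-style boolean filtering keeps the index-value link intact.
positive = s[s > 0]
print(positive)
```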
A Series can also be thought of as a fixed-length, ordered dict, as it is a mapping of index values to data values. It can be used in many contexts where you might use a dict.
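To make the dict analogy concrete, here is a small sketch with a made-up `prices` dict. Membership tests and `get` behave much like they do on a plain dict, and both work on the index labels.

```python
import pandas as pd

# Hypothetical data: the dict keys become the index.
prices = {"apple": 1.2, "banana": 0.5, "cherry": 3.0}
s = pd.Series(prices)

# `in` checks the index labels, like checking the keys of a dict.
has_banana = "banana" in s
has_kiwi = "kiwi" in s
print(has_banana, has_kiwi)

# `get` returns a default for a missing label instead of raising.
fallback = s.get("kiwi", 0.0)
print(fallback)

# And the mapping can be turned back into a plain dict.
as_dict = s.to_dict()
print(as_dict)
```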
Both the Series object itself and its index have a name attribute, which integrates with other key areas of pandas functionality.
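As one example of that integration (data and names here are made up), the two name attributes turn into column labels when the Series is converted to a DataFrame:

```python
import pandas as pd

s = pd.Series([35000, 71000, 16000], index=["Ohio", "Texas", "Oregon"])

# Name the values and the index separately.
s.name = "population"
s.index.name = "state"

# reset_index moves the index into a column; both names are reused
# as the column labels of the resulting DataFrame.
frame = s.reset_index()
columns = list(frame.columns)
print(columns)
print(frame)
```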
The DataFrame has both a row and column index; it can be thought of as a dict of Series all sharing the same index.
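A quick sketch of the dict-of-Series picture, using two made-up columns `x` and `y` built on the same index:

```python
import pandas as pd

index = ["a", "b", "c"]

# Build the frame from a dict of Series that share one index.
frame = pd.DataFrame({
    "x": pd.Series([1, 2, 3], index=index),
    "y": pd.Series([1.5, 2.5, 3.5], index=index),
})

# Selecting a column gives back a Series carrying the row index.
col = frame["x"]
same_index = col.index.equals(frame.index)
print(type(col).__name__, same_index)

# Selecting a row gives a Series labeled by the column index.
row = frame.loc["b"]
print(list(row.index))
print(row["y"])
```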
Group operations follow a split-apply-combine pattern. In the first stage of the process, data contained in a pandas object, whether a Series, DataFrame, or otherwise, is split into groups based on one or more keys that you provide. The splitting is performed on a particular axis of an object. For example, a DataFrame can be grouped on its rows (axis=0) or its columns (axis=1); note that recent pandas releases deprecate axis=1 grouping in favor of grouping the transposed frame. Once the split is done, a function is applied to each group, producing a new value. Finally, the results of all those function applications are combined into a result object. The form of the resulting object will usually depend on what's being done to the data.
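The three stages can be sketched on a tiny made-up frame (the column names `key` and `value` are just placeholders):

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a"],
    "value": [1, 2, 3, 4, 5],
})

# Split: rows are grouped by the values in the "key" column.
grouped = df.groupby("key")

# Iterating shows the pieces produced by the split.
sizes = {key: len(group) for key, group in grouped}
print(sizes)

# Apply + combine: sum each group's values, then collect the results
# into one Series indexed by the group keys.
totals = grouped["value"].sum()
print(totals)
```

Here the result is a Series indexed by the group keys; a different function (or several at once) would give a differently shaped result.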