pyspark.pandas.DataFrame.drop
- DataFrame.drop(labels=None, axis=0, index=None, columns=None)
Drop specified labels from rows or columns.
Remove rows and/or columns by specifying label names and the corresponding axis, or by specifying index and/or column names directly. Dropping rows of a MultiIndex DataFrame is not supported yet.
- Parameters
- labels : single label or list-like
Index or column labels to drop, depending on axis.
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Changed in version 3.3: Dropping by index is now the default.
- index : single label or list-like
Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).
Changed in version 3.3: Added dropping rows by ‘index’.
- columns : single label or list-like
Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
- Returns
- dropped : DataFrame
Notes
Dropping rows of a MultiIndex DataFrame is not supported yet.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame(np.arange(12).reshape(3, 4), columns=['A', 'B', 'C', 'D'])
>>> df
   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
Drop columns
>>> df.drop(['B', 'C'], axis=1)
   A   D
0  0   3
1  4   7
2  8  11

>>> df.drop(columns=['B', 'C'])
   A   D
0  0   3
1  4   7
2  8  11
Drop rows by index
>>> df.drop([0, 1])
   A  B   C   D
2  8  9  10  11

>>> df.drop(index=[0, 1], columns='A')
   B   C   D
2  9  10  11
Dropping columns is also supported for MultiIndex columns
>>> df = ps.DataFrame({'x': [1, 2], 'y': [3, 4], 'z': [5, 6], 'w': [7, 8]},
...                   columns=['x', 'y', 'z', 'w'])
>>> columns = [('a', 'x'), ('a', 'y'), ('b', 'z'), ('b', 'w')]
>>> df.columns = pd.MultiIndex.from_tuples(columns)
>>> df
   a     b
   x  y  z  w
0  1  3  5  7
1  2  4  6  8
>>> df.drop(labels='a', axis=1)
   b
   z  w
0  5  7
1  6  8
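How the arguments relate

The Parameters section describes labels with axis as an alternative spelling of index or columns. The pure-Python sketch below shows one way that equivalence could be resolved. It is an illustration only, not the pyspark.pandas implementation: the names resolve_drop_targets and _as_list are hypothetical, and the error raised when labels is combined with index or columns is an assumption modeled on pandas behavior.

```python
from typing import Any, List, Optional, Tuple


def _as_list(labels: Any) -> Optional[List[Any]]:
    """Normalize a single label or a list-like into a list (hypothetical helper)."""
    if labels is None:
        return None
    # Strings and tuples count as one label; a tuple can name a MultiIndex column.
    if isinstance(labels, (str, bytes, tuple)) or not hasattr(labels, "__iter__"):
        return [labels]
    return list(labels)


def resolve_drop_targets(
    labels: Any = None,
    axis: Any = 0,
    index: Any = None,
    columns: Any = None,
) -> Tuple[Optional[List[Any]], Optional[List[Any]]]:
    """Return (index_labels, column_labels) implied by drop-style arguments.

    Hypothetical helper that mirrors the documented equivalences:
    labels with axis=0 acts like index=labels, and
    labels with axis=1 acts like columns=labels.
    """
    if labels is not None:
        # Assumption: mixing the two spellings is rejected, as in pandas.
        if index is not None or columns is not None:
            raise ValueError("Cannot specify both 'labels' and 'index'/'columns'")
        if axis in (0, "index"):
            index = labels
        elif axis in (1, "columns"):
            columns = labels
        else:
            raise ValueError(f"Unsupported axis: {axis!r}")
    elif index is None and columns is None:
        raise ValueError("Specify at least one of 'labels', 'index' or 'columns'")
    return _as_list(index), _as_list(columns)


# Each pair of calls below resolves to the same targets.
print(resolve_drop_targets(["B", "C"], axis=1))          # (None, ['B', 'C'])
print(resolve_drop_targets(columns=["B", "C"]))          # (None, ['B', 'C'])
print(resolve_drop_targets([0, 1]))                      # ([0, 1], None)
print(resolve_drop_targets(index=[0, 1]))                # ([0, 1], None)
print(resolve_drop_targets(index=[0, 1], columns="A"))   # ([0, 1], ['A'])
```

The sketch returns both targets at once because index and columns can be passed together, as in the df.drop(index=[0, 1], columns='A') example above.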