pyspark.pandas.Series.spark.apply

spark.apply(func: Callable[[pyspark.sql.column.Column], pyspark.sql.column.Column]) → ps.Series

Applies a function that takes and returns a Spark column. This lets you apply Spark functions and Column APIs directly to the Spark column that a Series or Index uses internally.

Note

This method discards the index, so the result uses the default index. Prefer Series.spark.transform(), or DataFrame.spark.apply() with index_col specified.

Note

The output does not need to have the same length as the input. However, the method creates a new DataFrame internally, so combining the result with other objects requires enabling compute.ops_on_diff_frames, even when they come from the same original DataFrame, and such operations are expensive. Series.spark.transform() does not have this requirement.

Parameters

func : function
    Function to apply to the data, using Spark columns.

Returns

Series

Raises

ValueError
    If the output of the function is not a Spark column.

Examples

>>> from pyspark import pandas as ps
>>> from pyspark.sql.functions import count, lit
>>> df = ps.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, columns=["a", "b"])
>>> df
   a  b
0  1  4
1  2  5
2  3  6

>>> df.a.spark.apply(lambda c: count(c))
0    3
Name: a, dtype: int64

>>> df.a.spark.apply(lambda c: c + df.b.spark.column)
0    5
1    7
2    9
Name: a, dtype: int64
