pyspark.sql.functions.count_if
- pyspark.sql.functions.count_if(col)
Aggregate function: returns the number of rows for which col evaluates to TRUE. Rows where col is FALSE or NULL are not counted.
New in version 3.5.0.
- Parameters
- col
Column or str
target column to work on.
- Returns
Column
the number of TRUE values for the col.
Examples
Example 1: Counting the number of even numbers in a numeric column
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("a", 1), ("a", 2), ("a", 3), ("b", 8), ("b", 2)], ["c1", "c2"])
>>> df.select(sf.count_if(sf.col('c2') % 2 == 0)).show()
+------------------------+
|count_if(((c2 % 2) = 0))|
+------------------------+
|                       3|
+------------------------+
Example 2: Counting the number of rows where a string column starts with a certain letter
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame(
...     [("apple",), ("banana",), ("cherry",), ("apple",), ("banana",)], ["fruit"])
>>> df.select(sf.count_if(sf.col('fruit').startswith('a'))).show()
+------------------------------+
|count_if(startswith(fruit, a))|
+------------------------------+
|                             2|
+------------------------------+
Example 3: Counting the number of rows where a numeric column is greater than a certain value
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1,), (2,), (3,), (4,), (5,)], ["num"])
>>> df.select(sf.count_if(sf.col('num') > 3)).show()
+-------------------+
|count_if((num > 3))|
+-------------------+
|                  2|
+-------------------+
Example 4: Counting the number of rows where a boolean column is True
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(True,), (False,), (True,), (False,), (True,)], ["bool"])
>>> df.select(sf.count_if(sf.col('bool'))).show()
+--------------+
|count_if(bool)|
+--------------+
|             3|
+--------------+
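The counting rule used in the examples above can be illustrated without a Spark session. The following is a minimal plain-Python sketch of the semantics only, not Spark's implementation: `count_if_model` is a hypothetical helper name, and `None` is assumed to stand in for SQL NULL, which is not counted as TRUE.

```python
from typing import Iterable, Optional


def count_if_model(values: Iterable[Optional[bool]]) -> int:
    """Plain-Python model of count_if (hypothetical helper, not a Spark API).

    Counts entries that are exactly True. None stands in for SQL NULL and,
    like False, is not counted.
    """
    return sum(1 for v in values if v is True)


# Mirrors Example 1: count the even values of c2.
c2 = [1, 2, 3, 8, 2]
print(count_if_model(x % 2 == 0 for x in c2))  # 3

# A NULL (None) predicate result is skipped rather than counted.
print(count_if_model([True, None, False, True]))  # 2
```

The model takes already-evaluated predicate results, which matches how the examples pass a boolean expression such as `sf.col('c2') % 2 == 0` to `count_if` rather than a raw column of numbers.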