org.apache.spark.sql

object functions

Functions available for DataFrame operations.

Annotations
@Stable()
Source
functions.scala
Since

1.3.0

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def abs(e: Column): Column

    Computes the absolute value.

    Since

    1.3.0

  5. def acos(columnName: String): Column

    Computes the cosine inverse of the given column; the returned angle is in the range 0.0 through pi.

    Since

    1.4.0

  6. def acos(e: Column): Column

    Computes the cosine inverse of the given value; the returned angle is in the range 0.0 through pi.

    Since

    1.4.0

  7. def add_months(startDate: Column, numMonths: Int): Column

    Returns the date that is numMonths after startDate.

    Since

    1.5.0

  8. def approx_count_distinct(columnName: String, rsd: Double): Column

    Aggregate function: returns the approximate number of distinct items in a group.

    rsd

    maximum estimation error allowed (default = 0.05)

    Since

    2.1.0

  9. def approx_count_distinct(e: Column, rsd: Double): Column

    Aggregate function: returns the approximate number of distinct items in a group.

    rsd

    maximum estimation error allowed (default = 0.05)

    Since

    2.1.0

  10. def approx_count_distinct(columnName: String): Column

    Aggregate function: returns the approximate number of distinct items in a group.

    Since

    2.1.0

  11. def approx_count_distinct(e: Column): Column

    Aggregate function: returns the approximate number of distinct items in a group.

    Since

    2.1.0

  12. def array(colName: String, colNames: String*): Column

    Creates a new array column. The input columns must all have the same data type.

    Annotations
    @varargs()
    Since

    1.4.0

  13. def array(cols: Column*): Column

    Creates a new array column. The input columns must all have the same data type.

    Annotations
    @varargs()
    Since

    1.4.0

  14. def array_contains(column: Column, value: Any): Column

    Returns null if the array is null, true if the array contains value, and false otherwise.

    Since

    1.5.0
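
    The three-way result described for array_contains (null, true, false) can be modeled in plain Scala. This is only a sketch of the documented behavior, not Spark code: the object and method names are hypothetical, and None is assumed to stand in for a SQL null array.

```scala
object ArrayContainsModel {
  // None models a null array; Some(seq) models a non-null array.
  // Result: None for a null array, otherwise Some(true) or Some(false).
  def arrayContains[A](array: Option[Seq[A]], value: A): Option[Boolean] =
    array.map(_.contains(value))
}
```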

  15. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  16. def asc(columnName: String): Column

    Returns a sort expression based on ascending order of the column.

    df.sort(asc("dept"), desc("age"))
    Since

    1.3.0

  17. def asc_nulls_first(columnName: String): Column

    Returns a sort expression based on ascending order of the column, and null values appear before non-null values.

    df.sort(asc_nulls_first("dept"), desc("age"))
    Since

    2.1.0

  18. def asc_nulls_last(columnName: String): Column

    Returns a sort expression based on ascending order of the column, and null values appear after non-null values.

    df.sort(asc_nulls_last("dept"), desc("age"))
    Since

    2.1.0

  19. def ascii(e: Column): Column

    Computes the numeric value of the first character of the string column, and returns the result as an int column.

    Since

    1.5.0

  20. def asin(columnName: String): Column

    Computes the sine inverse of the given column; the returned angle is in the range -pi/2 through pi/2.

    Since

    1.4.0

  21. def asin(e: Column): Column

    Computes the sine inverse of the given value; the returned angle is in the range -pi/2 through pi/2.

    Since

    1.4.0

  22. def atan(columnName: String): Column

    Computes the tangent inverse of the given column.

    Since

    1.4.0

  23. def atan(e: Column): Column

    Computes the tangent inverse of the given value.

    Since

    1.4.0

  24. def atan2(l: Double, rightName: String): Column

    Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).

    Since

    1.4.0

  25. def atan2(l: Double, r: Column): Column

    Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).

    Since

    1.4.0

  26. def atan2(leftName: String, r: Double): Column

    Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).

    Since

    1.4.0

  27. def atan2(l: Column, r: Double): Column

    Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).

    Since

    1.4.0

  28. def atan2(leftName: String, rightName: String): Column

    Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).

    Since

    1.4.0

  29. def atan2(leftName: String, r: Column): Column

    Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).

    Since

    1.4.0

  30. def atan2(l: Column, rightName: String): Column

    Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).

    Since

    1.4.0

  31. def atan2(l: Column, r: Column): Column

    Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta).

    Since

    1.4.0

  32. def avg(columnName: String): Column

    Aggregate function: returns the average of the values in a group.

    Since

    1.3.0

  33. def avg(e: Column): Column

    Aggregate function: returns the average of the values in a group.

    Since

    1.3.0

  34. def base64(e: Column): Column

    Computes the BASE64 encoding of a binary column and returns it as a string column. This is the reverse of unbase64.

    Since

    1.5.0

  35. def bin(columnName: String): Column

    An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".

    Since

    1.5.0

  36. def bin(e: Column): Column

    An expression that returns the string representation of the binary value of the given long column. For example, bin("12") returns "1100".

    Since

    1.5.0
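
    The bin example above (12 becomes "1100") can be reproduced outside Spark with the JDK's binary formatter. This is a minimal sketch of the documented result for a non-negative long, not Spark's implementation; BinModel is a hypothetical name.

```scala
object BinModel {
  // Binary string representation of a long value, without leading zeros.
  def bin(value: Long): String = java.lang.Long.toBinaryString(value)
}
```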

  37. def bitwiseNOT(e: Column): Column

    Computes bitwise NOT.

    Since

    1.4.0

  38. def broadcast[T](df: Dataset[T]): Dataset[T]

    Marks a DataFrame as small enough for use in broadcast joins.

    The following example marks the right DataFrame for broadcast hash join using joinKey.

    // left and right are DataFrames
    left.join(broadcast(right), "joinKey")
    Since

    1.5.0

  39. def bround(e: Column, scale: Int): Column

    Rounds the value of e to scale decimal places with HALF_EVEN round mode if scale is greater than or equal to 0, or at the integral part when scale is less than 0.

    Since

    2.0.0

  40. def bround(e: Column): Column

    Returns the value of the column e rounded to 0 decimal places with HALF_EVEN round mode.

    Since

    2.0.0
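
    HALF_EVEN rounding, as named in both bround overloads, sends ties to the nearest even digit. The sketch below models that rule with scala.math.BigDecimal rather than Spark columns; BroundModel is a hypothetical name, and reading a negative scale as "round to the left of the decimal point" is an assumption taken from the description above.

```scala
import scala.math.BigDecimal.RoundingMode

object BroundModel {
  // Round to `scale` decimal places, sending ties to the even neighbor.
  // A negative scale rounds within the integral part (tens, hundreds, ...).
  def bround(value: BigDecimal, scale: Int = 0): BigDecimal =
    value.setScale(scale, RoundingMode.HALF_EVEN)
}
```

    With this model, 2.5 rounds to 2 and 3.5 rounds to 4, whereas HALF_UP rounding would give 3 and 4.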

  41. def callUDF(udfName: String, cols: Column*): Column

    Call a user-defined function. Example:

    import org.apache.spark.sql._
    
    val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
    val spark = df.sparkSession
    spark.udf.register("simpleUDF", (v: Int) => v * v)
    df.select($"id", callUDF("simpleUDF", $"value"))
    Annotations
    @varargs()
    Since

    1.5.0

  42. def cbrt(columnName: String): Column

    Computes the cube-root of the given column.

    Since

    1.4.0

  43. def cbrt(e: Column): Column

    Computes the cube-root of the given value.

    Since

    1.4.0

  44. def ceil(columnName: String): Column

    Computes the ceiling of the given column.

    Since

    1.4.0

  45. def ceil(e: Column): Column

    Computes the ceiling of the given value.

    Since

    1.4.0

  46. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  47. def coalesce(e: Column*): Column

    Returns the first column that is not null, or null if all inputs are null.

    For example, coalesce(a, b, c) will return a if a is not null, or b if a is null and b is not null, or c if both a and b are null but c is not null.

    Annotations
    @varargs()
    Since

    1.3.0
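
    The coalesce(a, b, c) rule above amounts to "take the first non-null input". A plain Scala sketch of that rule follows; it is not Spark code, CoalesceModel is a hypothetical name, and None is assumed to stand in for null.

```scala
object CoalesceModel {
  // Returns the first defined input, or None when every input is None.
  def coalesce[A](values: Option[A]*): Option[A] =
    values.collectFirst { case Some(v) => v }
}
```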

  48. def col(colName: String): Column

    Returns a Column based on the given column name.

    Since

    1.3.0

  49. def collect_list(columnName: String): Column

    Aggregate function: returns a list of objects with duplicates.

    Since

    1.6.0

  50. def collect_list(e: Column): Column

    Aggregate function: returns a list of objects with duplicates.

    Since

    1.6.0

  51. def collect_set(columnName: String): Column

    Aggregate function: returns a set of objects with duplicate elements eliminated.

    Since

    1.6.0

  52. def collect_set(e: Column): Column

    Aggregate function: returns a set of objects with duplicate elements eliminated.

    Since

    1.6.0

  53. def column(colName: String): Column

    Returns a Column based on the given column name. Alias of col.

    Since

    1.3.0

  54. def concat(exprs: Column*): Column

    Concatenates multiple input string columns together into a single string column.

    Annotations
    @varargs()
    Since

    1.5.0

  55. def concat_ws(sep: String, exprs: Column*): Column

    Concatenates multiple input string columns together into a single string column, using the given separator.

    Annotations
    @varargs()
    Since

    1.5.0

  56. def conv(num: Column, fromBase: Int, toBase: Int): Column

    Convert a number in a string column from one base to another.

    Since

    1.5.0

  57. def corr(columnName1: String, columnName2: String): Column

    Aggregate function: returns the Pearson Correlation Coefficient for two columns.

    Since

    1.6.0

  58. def corr(column1: Column, column2: Column): Column

    Aggregate function: returns the Pearson Correlation Coefficient for two columns.

    Since

    1.6.0

  59. def cos(columnName: String): Column

    Computes the cosine of the given column.

    Since

    1.4.0

  60. def cos(e: Column): Column

    Computes the cosine of the given value.

    Since

    1.4.0

  61. def cosh(columnName: String): Column

    Computes the hyperbolic cosine of the given column.

    Since

    1.4.0

  62. def cosh(e: Column): Column

    Computes the hyperbolic cosine of the given value.

    Since

    1.4.0

  63. def count(columnName: String): TypedColumn[Any, Long]

    Aggregate function: returns the number of items in a group.

    Since

    1.3.0

  64. def count(e: Column): Column

    Aggregate function: returns the number of items in a group.

    Since

    1.3.0

  65. def countDistinct(columnName: String, columnNames: String*): Column

    Aggregate function: returns the number of distinct items in a group.

    Annotations
    @varargs()
    Since

    1.3.0

  66. def countDistinct(expr: Column, exprs: Column*): Column

    Aggregate function: returns the number of distinct items in a group.

    Annotations
    @varargs()
    Since

    1.3.0

  67. def covar_pop(columnName1: String, columnName2: String): Column

    Aggregate function: returns the population covariance for two columns.

    Since

    2.0.0

  68. def covar_pop(column1: Column, column2: Column): Column

    Aggregate function: returns the population covariance for two columns.

    Since

    2.0.0

  69. def covar_samp(columnName1: String, columnName2: String): Column

    Aggregate function: returns the sample covariance for two columns.

    Since

    2.0.0

  70. def covar_samp(column1: Column, column2: Column): Column

    Aggregate function: returns the sample covariance for two columns.

    Since

    2.0.0

  71. def crc32(e: Column): Column

    Calculates the cyclic redundancy check value (CRC32) of a binary column and returns the value as a bigint.

    Since

    1.5.0
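
    For readers who want to cross-check a CRC32 value outside Spark, the JDK ships a CRC32 implementation. The sketch below is a plain Scala model, not Spark's code path; Crc32Model is a hypothetical name, and the result is returned as a Long to mirror the bigint described above.

```scala
import java.util.zip.CRC32

object Crc32Model {
  // CRC32 checksum of a byte array, returned as a non-negative Long.
  def crc32(bytes: Array[Byte]): Long = {
    val checksum = new CRC32()
    checksum.update(bytes)
    checksum.getValue
  }
}
```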

  72. def cume_dist(): Column

    Window function: returns the cumulative distribution of values within a window partition, i.e. the fraction of rows that are at or below the current row.

    N = total number of rows in the partition
    cumeDist(x) = number of values before (and including) x / N
    Since

    1.6.0
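
    The cumeDist formula above can be worked through on a small partition. The following is a plain Scala sketch of that formula, assuming ascending order within the partition; CumeDistModel is a hypothetical name and this is not how Spark evaluates the window function.

```scala
object CumeDistModel {
  // For each x: (number of values before and including x) / N,
  // where N is the total number of rows in the partition.
  def cumeDist[A](partition: Seq[A])(implicit ord: Ordering[A]): Seq[(A, Double)] = {
    val n = partition.size.toDouble
    partition.map { x =>
      val atOrBefore = partition.count(v => ord.lteq(v, x))
      (x, atOrBefore / n)
    }
  }
}
```

    For the partition 10, 20, 20, 30 the model yields 0.25, 0.75, 0.75 and 1.0: tied rows share the same value.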

  73. def current_date(): Column

    Returns the current date as a date column.

    Since

    1.5.0

  74. def current_timestamp(): Column

    Returns the current timestamp as a timestamp column.

    Since

    1.5.0

  75. def date_add(start: Column, days: Int): Column

    Returns the date that is days days after start.

    Since

    1.5.0

  76. def date_format(dateExpr: Column, format: String): Column

    Converts a date/timestamp/string to a string value in the format specified by the second argument.

    A pattern could be for instance dd.MM.yyyy and could return a string like '18.03.1993'. All pattern letters of java.text.SimpleDateFormat can be used.

    Since

    1.5.0

    Note

    Use specialized functions like year whenever possible. These benefit from a specialized implementation.
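
    The dd.MM.yyyy pattern mentioned above can be tried directly against java.text.SimpleDateFormat, whose pattern letters date_format accepts. This is a plain Scala sketch, not Spark code; DateFormatModel is a hypothetical name, and the input is assumed to be a yyyy-MM-dd string.

```scala
import java.text.SimpleDateFormat
import java.util.Locale

object DateFormatModel {
  // Parse an ISO-style date string, then re-format it with the given pattern.
  def dateFormat(isoDate: String, pattern: String): String = {
    val parsed = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT).parse(isoDate)
    new SimpleDateFormat(pattern, Locale.ROOT).format(parsed)
  }
}
```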

  77. def date_sub(start: Column, days: Int): Column

    Returns the date that is days days before start.

    Since

    1.5.0

  78. def datediff(end: Column, start: Column): Column

    Returns the number of days from start to end.

    Since

    1.5.0
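
    Note the argument order of datediff: end comes first, start second. A plain Scala sketch of "days from start to end" using java.time follows; DateDiffModel is a hypothetical name, and the negative result for an end before start is a property of this model, stated here as an assumption about the Spark function.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

object DateDiffModel {
  // Number of days from `start` to `end`; negative if `end` precedes `start`.
  def dateDiff(end: LocalDate, start: LocalDate): Long =
    ChronoUnit.DAYS.between(start, end)
}
```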

  79. def dayofmonth(e: Column): Column

    Extracts the day of the month as an integer from a given date/timestamp/string.

    Since

    1.5.0

  80. def dayofyear(e: Column): Column

    Extracts the day of the year as an integer from a given date/timestamp/string.

    Since

    1.5.0

  81. def decode(value: Column, charset: String): Column

    Decodes the first argument from a binary into a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null.

    Since

    1.5.0

  82. def degrees(columnName: String): Column

    Converts an angle measured in radians to an approximately equivalent angle measured in degrees.

    Since

    2.1.0

  83. def degrees(e: Column): Column

    Converts an angle measured in radians to an approximately equivalent angle measured in degrees.

    Since

    2.1.0

  84. def dense_rank(): Column

    Window function: returns the rank of rows within a window partition, without any gaps.

    The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. rank, by contrast, leaves gaps after ties, so the person who came next after the three tied for second place would register as coming in fifth.

    This is equivalent to the DENSE_RANK function in SQL.

    Since

    1.6.0
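
    The competition example above (one winner, three tied for second, then one more) can be checked with a small plain Scala model. It is a sketch only, with hypothetical names, assuming higher scores rank first; Spark computes these as window functions over an ordered partition.

```scala
object RankModel {
  // rank: 1 + number of rows with a strictly better score (gaps after ties).
  def rank(scores: Seq[Int]): Seq[Int] =
    scores.map(s => 1 + scores.count(_ > s))

  // denseRank: 1 + number of distinct strictly better scores (no gaps).
  def denseRank(scores: Seq[Int]): Seq[Int] = {
    val distinctScores = scores.distinct
    scores.map(s => 1 + distinctScores.count(_ > s))
  }
}
```

    For scores 100, 90, 90, 90, 80 the model gives rank 1, 2, 2, 2, 5 and dense rank 1, 2, 2, 2, 3.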

  85. def desc(columnName: String): Column

    Returns a sort expression based on the descending order of the column.

    df.sort(asc("dept"), desc("age"))
    Since

    1.3.0

  86. def desc_nulls_first(columnName: String): Column

    Returns a sort expression based on the descending order of the column, and null values appear before non-null values.

    df.sort(asc("dept"), desc_nulls_first("age"))
    Since

    2.1.0

  87. def desc_nulls_last(columnName: String): Column

    Returns a sort expression based on the descending order of the column, and null values appear after non-null values.

    df.sort(asc("dept"), desc_nulls_last("age"))
    Since

    2.1.0

  88. def encode(value: Column, charset: String): Column

    Encodes the first argument from a string into a binary using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null.

    Since

    1.5.0

  89. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  90. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  91. def exp(columnName: String): Column

    Computes the exponential of the given column.

    Since

    1.4.0

  92. def exp(e: Column): Column

    Computes the exponential of the given value.

    Since

    1.4.0

  93. def explode(e: Column): Column

    Creates a new row for each element in the given array or map column.

    Since

    1.3.0

  94. def explode_outer(e: Column): Column

    Creates a new row for each element in the given array or map column. Unlike explode, if the array/map is null or empty then null is produced.

    Since

    2.2.0
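
    The difference between explode and explode_outer shows up only for null or empty collections. The sketch below models both on plain Scala sequences of (key, array) rows; the names are hypothetical, None is assumed to stand in for null, and only the array case is covered, not maps.

```scala
object ExplodeModel {
  // explode: one output row per element; null or empty arrays yield no rows.
  def explode[K, A](rows: Seq[(K, Option[Seq[A]])]): Seq[(K, A)] =
    rows.flatMap { case (key, maybeItems) =>
      maybeItems.getOrElse(Seq.empty[A]).map(item => (key, item))
    }

  // explodeOuter: same, but a null or empty array yields one row with None.
  def explodeOuter[K, A](rows: Seq[(K, Option[Seq[A]])]): Seq[(K, Option[A])] =
    rows.flatMap { case (key, maybeItems) =>
      val items = maybeItems.getOrElse(Seq.empty[A])
      if (items.isEmpty) Seq((key, Option.empty[A]))
      else items.map(item => (key, Some(item): Option[A]))
    }
}
```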

  95. def expm1(columnName: String): Column

    Computes the exponential of the given column minus one.

    Since

    1.4.0

  96. def expm1(e: Column): Column

    Computes the exponential of the given value minus one.

    Since

    1.4.0

  97. def expr(expr: String): Column

    Parses the expression string into the column that it represents, similar to DataFrame.selectExpr.

    // get the number of words of each length
    df.groupBy(expr("length(word)")).count()
  98. def factorial(e: Column): Column

    Computes the factorial of the given value.

    Since

    1.5.0

  99. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  100. def first(columnName: String): Column

    Aggregate function: returns the first value of a column in a group.

    The function by default returns the first values it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Since

    1.3.0

  101. def first(e: Column): Column

    Aggregate function: returns the first value in a group.

    The function by default returns the first values it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Since

    1.3.0

  102. def first(columnName: String, ignoreNulls: Boolean): Column

    Aggregate function: returns the first value of a column in a group.

    The function by default returns the first values it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Since

    2.0.0

  103. def first(e: Column, ignoreNulls: Boolean): Column

    Aggregate function: returns the first value in a group.

    The function by default returns the first values it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Since

    2.0.0
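
    The effect of ignoreNulls on first can be modeled on a plain Scala sequence. This is a sketch of the behavior described in the four overloads above, not Spark code; FirstModel is a hypothetical name, None is assumed to stand in for null, and the group is taken in the order given.

```scala
object FirstModel {
  // Default: return the first value seen, even if it is null (None).
  // With ignoreNulls = true: return the first non-null value, or None
  // when every value in the group is null.
  def first[A](group: Seq[Option[A]], ignoreNulls: Boolean = false): Option[A] =
    if (ignoreNulls) group.flatten.headOption
    else group.headOption.flatten
}
```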

  104. def floor(columnName: String): Column

    Computes the floor of the given column.

    Since

    1.4.0

  105. def floor(e: Column): Column

    Computes the floor of the given value.

    Since

    1.4.0

  106. def format_number(x: Column, d: Int): Column

    Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places with HALF_EVEN round mode, and returns the result as a string column.

    If d is 0, the result has no decimal point or fractional part. If d is less than 0, the result will be null.

    Since

    1.5.0

  107. def format_string(format: String, arguments: Column*): Column

    Formats the arguments in printf-style and returns the result as a string column.

    Annotations
    @varargs()
    Since

    1.5.0

  108. def from_json(e: Column, schema: String, options: Map[String, String]): Column

    Parses a column containing a JSON string into a StructType or ArrayType of StructTypes with the specified schema. Returns null, in the case of an unparseable string.

    e

    a string column containing JSON data.

    schema

    the schema to use when parsing the JSON string, given as a string. In Spark 2.1, the user-provided schema has to be in JSON format. Since Spark 2.2, the DDL format is also supported for the schema.

    Since

    2.1.0

  109. def from_json(e: Column, schema: DataType): Column

    Parses a column containing a JSON string into a StructType or ArrayType of StructTypes with the specified schema. Returns null, in the case of an unparseable string.

    e

    a string column containing JSON data.

    schema

    the schema to use when parsing the json string

    Since

    2.2.0

  110. def from_json(e: Column, schema: StructType): Column

    Parses a column containing a JSON string into a StructType with the specified schema. Returns null, in the case of an unparseable string.

    e

    a string column containing JSON data.

    schema

    the schema to use when parsing the json string

    Since

    2.1.0

  111. def from_json(e: Column, schema: DataType, options: Map[String, String]): Column

    (Java-specific) Parses a column containing a JSON string into a StructType or ArrayType of StructTypes with the specified schema. Returns null, in the case of an unparseable string.

    e

    a string column containing JSON data.

    schema

    the schema to use when parsing the json string

    options

    options to control how the json is parsed. Accepts the same options as the json data source.

    Since

    2.2.0

  112. def from_json(e: Column, schema: StructType, options: Map[String, String]): Column

    (Java-specific) Parses a column containing a JSON string into a StructType with the specified schema. Returns null, in the case of an unparseable string.

    e

    a string column containing JSON data.

    schema

    the schema to use when parsing the json string

    options

    options to control how the json is parsed. Accepts the same options as the json data source.

    Since

    2.1.0

  113. def from_json(e: Column, schema: DataType, options: Map[String, String]): Column

    Permalink

    (Scala-specific) Parses a column containing a JSON string into a StructType or ArrayType of StructTypes with the specified schema.

    (Scala-specific) Parses a column containing a JSON string into a StructType or ArrayType of StructTypes with the specified schema. Returns null, in the case of an unparseable string.

    e

    a string column containing JSON data.

    schema

    the schema to use when parsing the json string

    options

    options to control how the json is parsed. Accepts the same options as the json data source.

    Since

    2.2.0

  114. def from_json(e: Column, schema: StructType, options: Map[String, String]): Column

    Permalink

    (Scala-specific) Parses a column containing a JSON string into a StructType with the specified schema.

    (Scala-specific) Parses a column containing a JSON string into a StructType with the specified schema. Returns null, in the case of an unparseable string.

    e

    a string column containing JSON data.

    schema

    the schema to use when parsing the json string

    options

    options to control how the json is parsed. Accepts the same options as the json data source.

    Since

    2.1.0
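
    A minimal usage sketch for the StructType overload, assuming Spark SQL is on the classpath. The column name raw, the two-field schema, and the local master setting are hypothetical choices made for illustration only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object FromJsonSketch {
  // Hypothetical schema for records such as {"name":"Ann","age":30}.
  val schema: StructType = StructType(Seq(
    StructField("name", StringType),
    StructField("age", IntegerType)
  ))

  // Parses the string column `raw` into a struct, then flattens its fields.
  def parse(df: DataFrame): DataFrame =
    df.select(from_json(col("raw"), schema).as("parsed"))
      .select(col("parsed.name"), col("parsed.age"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration; a real job would reuse its own session.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("from_json-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("""{"name":"Ann","age":30}""").toDF("raw")
    parse(df).show()
    spark.stop()
  }
}
```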

  115. def from_unixtime(ut: Column, f: String): Column

    Permalink

    Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.

    Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.

    Since

    1.5.0

  116. def from_unixtime(ut: Column): Column

    Permalink

    Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the default format, yyyy-MM-dd HH:mm:ss.

    Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the default format, yyyy-MM-dd HH:mm:ss.

    Since

    1.5.0

  117. def from_utc_timestamp(ts: Column, tz: String): Column

    Permalink

    Given a timestamp, which corresponds to a certain time of day in UTC, returns another timestamp that corresponds to the same time of day in the given timezone.

    Given a timestamp, which corresponds to a certain time of day in UTC, returns another timestamp that corresponds to the same time of day in the given timezone.

    Since

    1.5.0

  118. final def getClass(): Class[_]

    Permalink
    Definition Classes
    AnyRef → Any
  119. def get_json_object(e: Column, path: String): Column

    Permalink

    Extracts a JSON object from a JSON string based on the specified JSON path, and returns the extracted object as a JSON string.

    Extracts a JSON object from a JSON string based on the specified JSON path, and returns the extracted object as a JSON string. It will return null if the input JSON string is invalid.

    Since

    1.6.0

  120. def greatest(columnName: String, columnNames: String*): Column

    Permalink

    Returns the greatest value of the list of column names, skipping null values.

    Returns the greatest value of the list of column names, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.

    Annotations
    @varargs()
    Since

    1.5.0

  121. def greatest(exprs: Column*): Column

    Permalink

    Returns the greatest value of the list of values, skipping null values.

    Returns the greatest value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.

    Annotations
    @varargs()
    Since

    1.5.0
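
    The null-skipping rule can be modelled in plain Scala, without Spark. In this sketch None stands in for a SQL null and Int stands in for any ordered column type; the object and method names are illustrative and not part of the Spark API.

```scala
object GreatestLeastSketch {
  // Largest non-null value; None only when every input is None.
  def greatest(values: Option[Int]*): Option[Int] = {
    val present = values.flatten
    if (present.isEmpty) None else Some(present.max)
  }

  // Smallest non-null value; None only when every input is None.
  def least(values: Option[Int]*): Option[Int] = {
    val present = values.flatten
    if (present.isEmpty) None else Some(present.min)
  }
}
```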

  122. def grouping(columnName: String): Column

    Permalink

    Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.

    Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.

    Since

    2.0.0

  123. def grouping(e: Column): Column

    Permalink

    Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.

    Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.

    Since

    2.0.0

  124. def grouping_id(colName: String, colNames: String*): Column

    Permalink

    Aggregate function: returns the level of grouping, equal to

    (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)
    Since

    2.0.0

    Note

    The list of columns should match with grouping columns exactly.

  125. def grouping_id(cols: Column*): Column

    Permalink

    Aggregate function: returns the level of grouping, equal to

    (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)
    Since

    2.0.0

    Note

    The list of columns should match with grouping columns exactly, or empty (means all the grouping columns).
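
    The grouping_id formula packs the per-column grouping flags into one integer, with the first column in the most significant bit. A plain-Scala sketch of that arithmetic follows; the helper name groupingId and its Seq[Int] input are hypothetical, and the code models the formula only, not Spark's implementation.

```scala
object GroupingIdSketch {
  // flags(i) is the grouping(...) result for the (i+1)-th grouping column:
  // 1 if that column is aggregated, 0 if it is not.
  def groupingId(flags: Seq[Int]): Long =
    flags.foldLeft(0L)((acc, bit) => (acc << 1) + bit)
}
```

    For three grouping columns with flags 1, 0, 1 this gives (1 << 2) + (0 << 1) + 1.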

  126. def hash(cols: Column*): Column

    Permalink

    Calculates the hash code of given columns, and returns the result as an int column.

    Calculates the hash code of given columns, and returns the result as an int column.

    Annotations
    @varargs()
    Since

    2.0.0

  127. def hashCode(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  128. def hex(column: Column): Column

    Permalink

    Computes hex value of the given column.

    Computes hex value of the given column.

    Since

    1.5.0

  129. def hour(e: Column): Column

    Permalink

    Extracts the hours as an integer from a given date/timestamp/string.

    Extracts the hours as an integer from a given date/timestamp/string.

    Since

    1.5.0

  130. def hypot(l: Double, rightName: String): Column

    Permalink

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Since

    1.4.0

  131. def hypot(l: Double, r: Column): Column

    Permalink

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Since

    1.4.0

  132. def hypot(leftName: String, r: Double): Column

    Permalink

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Since

    1.4.0

  133. def hypot(l: Column, r: Double): Column

    Permalink

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Since

    1.4.0

  134. def hypot(leftName: String, rightName: String): Column

    Permalink

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Since

    1.4.0

  135. def hypot(leftName: String, r: Column): Column

    Permalink

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Since

    1.4.0

  136. def hypot(l: Column, rightName: String): Column

    Permalink

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Since

    1.4.0

  137. def hypot(l: Column, r: Column): Column

    Permalink

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Computes sqrt(a^2 + b^2) without intermediate overflow or underflow.

    Since

    1.4.0
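
    To see why "without intermediate overflow" matters, the plain-Scala sketch below compares the naive formula with scala.math.hypot on large inputs. It illustrates the numerical idea only; the object and method names are hypothetical.

```scala
object HypotSketch {
  // Naive formula: squaring first can overflow to Infinity for large inputs.
  def naive(a: Double, b: Double): Double = math.sqrt(a * a + b * b)

  // scala.math.hypot delegates to java.lang.Math.hypot, which avoids
  // the intermediate overflow.
  def safe(a: Double, b: Double): Double = math.hypot(a, b)
}
```

    With a = 3e200 and b = 4e200, the squares exceed the range of Double, so naive returns Infinity while safe stays finite and close to 5e200.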

  138. def initcap(e: Column): Column

    Permalink

    Returns a new string column by converting the first letter of each word to uppercase.

    Returns a new string column by converting the first letter of each word to uppercase. Words are delimited by whitespace.

    For example, "hello world" will become "Hello World".

    Since

    1.5.0

  139. def input_file_name(): Column

    Permalink

    Creates a string column for the file name of the current Spark task.

    Creates a string column for the file name of the current Spark task.

    Since

    1.6.0

  140. def instr(str: Column, substring: String): Column

    Permalink

    Locate the position of the first occurrence of substring in the given string column.

    Locate the position of the first occurrence of substring in the given string column. Returns null if either of the arguments is null.

    Since

    1.5.0

    Note

    The position is a 1-based index, not zero-based. Returns 0 if substring could not be found in str.
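
    The 1-based convention can be sketched in plain Scala on top of String.indexOf, which is zero-based and returns -1 when nothing is found. The helper below is hypothetical and ignores null handling.

```scala
object InstrSketch {
  // indexOf is zero-based and yields -1 for "not found";
  // adding 1 gives a 1-based position and 0 for "not found".
  def instr(str: String, substring: String): Int =
    str.indexOf(substring) + 1
}
```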

  141. final def isInstanceOf[T0]: Boolean

    Permalink
    Definition Classes
    Any
  142. def isnan(e: Column): Column

    Permalink

    Return true iff the column is NaN.

    Return true iff the column is NaN.

    Since

    1.6.0

  143. def isnull(e: Column): Column

    Permalink

    Return true iff the column is null.

    Return true iff the column is null.

    Since

    1.6.0

  144. def json_tuple(json: Column, fields: String*): Column

    Permalink

    Creates a new row for a json column according to the given field names.

    Creates a new row for a json column according to the given field names.

    Annotations
    @varargs()
    Since

    1.6.0

  145. def kurtosis(columnName: String): Column

    Permalink

    Aggregate function: returns the kurtosis of the values in a group.

    Aggregate function: returns the kurtosis of the values in a group.

    Since

    1.6.0

  146. def kurtosis(e: Column): Column

    Permalink

    Aggregate function: returns the kurtosis of the values in a group.

    Aggregate function: returns the kurtosis of the values in a group.

    Since

    1.6.0

  147. def lag(e: Column, offset: Int, defaultValue: Any): Column

    Permalink

    Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row.

    Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.

    This is equivalent to the LAG function in SQL.

    Since

    1.4.0

  148. def lag(columnName: String, offset: Int, defaultValue: Any): Column

    Permalink

    Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row.

    Window function: returns the value that is offset rows before the current row, and defaultValue if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.

    This is equivalent to the LAG function in SQL.

    Since

    1.4.0

  149. def lag(columnName: String, offset: Int): Column

    Permalink

    Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row.

    Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.

    This is equivalent to the LAG function in SQL.

    Since

    1.4.0

  150. def lag(e: Column, offset: Int): Column

    Permalink

    Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row.

    Window function: returns the value that is offset rows before the current row, and null if there are fewer than offset rows before the current row. For example, an offset of one will return the previous row at any given point in the window partition.

    This is equivalent to the LAG function in SQL.

    Since

    1.4.0

  151. def last(columnName: String): Column

    Permalink

    Aggregate function: returns the last value of the column in a group.

    Aggregate function: returns the last value of the column in a group.

    By default, the function returns the last value it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Since

    1.3.0

  152. def last(e: Column): Column

    Permalink

    Aggregate function: returns the last value in a group.

    Aggregate function: returns the last value in a group.

    By default, the function returns the last value it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Since

    1.3.0

  153. def last(columnName: String, ignoreNulls: Boolean): Column

    Permalink

    Aggregate function: returns the last value of the column in a group.

    Aggregate function: returns the last value of the column in a group.

    By default, the function returns the last value it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Since

    2.0.0

  154. def last(e: Column, ignoreNulls: Boolean): Column

    Permalink

    Aggregate function: returns the last value in a group.

    Aggregate function: returns the last value in a group.

    By default, the function returns the last value it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.

    Since

    2.0.0

  155. def last_day(e: Column): Column

    Permalink

    Given a date column, returns the last day of the month which the given date belongs to.

    Given a date column, returns the last day of the month which the given date belongs to. For example, input "2015-07-27" returns "2015-07-31" since July 31 is the last day of the month in July 2015.

    Since

    1.5.0

  156. def lead(e: Column, offset: Int, defaultValue: Any): Column

    Permalink

    Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row.

    Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.

    This is equivalent to the LEAD function in SQL.

    Since

    1.4.0

  157. def lead(columnName: String, offset: Int, defaultValue: Any): Column

    Permalink

    Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row.

    Window function: returns the value that is offset rows after the current row, and defaultValue if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.

    This is equivalent to the LEAD function in SQL.

    Since

    1.4.0

  158. def lead(e: Column, offset: Int): Column

    Permalink

    Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row.

    Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.

    This is equivalent to the LEAD function in SQL.

    Since

    1.4.0

  159. def lead(columnName: String, offset: Int): Column

    Permalink

    Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row.

    Window function: returns the value that is offset rows after the current row, and null if there are fewer than offset rows after the current row. For example, an offset of one will return the next row at any given point in the window partition.

    This is equivalent to the LEAD function in SQL.

    Since

    1.4.0
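
    The lag and lead semantics can be modelled on an ordinary Scala sequence that plays the role of one ordered window partition. In this sketch None stands in for null, and the object and method names are hypothetical; it does not use the Spark API.

```scala
object LagLeadSketch {
  // Value `offset` rows before each row, or `default` when no such row exists.
  def lag[A](rows: Seq[A], offset: Int, default: Option[A] = None): Seq[Option[A]] =
    rows.indices.map { i =>
      val j = i - offset
      if (j >= 0 && j < rows.length) Some(rows(j)) else default
    }

  // lead looks forward instead of backward, i.e. lag with a negated offset.
  def lead[A](rows: Seq[A], offset: Int, default: Option[A] = None): Seq[Option[A]] =
    lag(rows, -offset, default)
}
```

    For the partition 10, 20, 30 and an offset of one, lag yields None, 10, 20 and lead yields 20, 30, None.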

  160. def least(columnName: String, columnNames: String*): Column

    Permalink

    Returns the least value of the list of column names, skipping null values.

    Returns the least value of the list of column names, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.

    Annotations
    @varargs()
    Since

    1.5.0

  161. def least(exprs: Column*): Column

    Permalink

    Returns the least value of the list of values, skipping null values.

    Returns the least value of the list of values, skipping null values. This function takes at least 2 parameters. It will return null iff all parameters are null.

    Annotations
    @varargs()
    Since

    1.5.0

  162. def length(e: Column): Column

    Permalink

    Computes the length of a given string or binary column.

    Computes the length of a given string or binary column.

    Since

    1.5.0

  163. def levenshtein(l: Column, r: Column): Column

    Permalink

    Computes the Levenshtein distance of the two given string columns.

    Computes the Levenshtein distance of the two given string columns.

    Since

    1.5.0

  164. def lit(literal: Any): Column

    Permalink

    Creates a Column of literal value.

    Creates a Column of literal value.

    The passed in object is returned directly if it is already a Column. If the object is a Scala Symbol, it is converted into a Column also. Otherwise, a new Column is created to represent the literal value.

    Since

    1.3.0

  165. def locate(substr: String, str: Column, pos: Int): Column

    Permalink

    Locate the position of the first occurrence of substr in a string column, after position pos.

    Locate the position of the first occurrence of substr in a string column, after position pos.

    Since

    1.5.0

    Note

    The position is a 1-based index, not zero-based. Returns 0 if substr could not be found in str.

  166. def locate(substr: String, str: Column): Column

    Permalink

    Locate the position of the first occurrence of substr.

    Locate the position of the first occurrence of substr.

    Since

    1.5.0

    Note

    The position is a 1-based index, not zero-based. Returns 0 if substr could not be found in str.

  167. def log(base: Double, columnName: String): Column

    Permalink

    Returns the first argument-base logarithm of the second argument.

    Returns the first argument-base logarithm of the second argument.

    Since

    1.4.0

  168. def log(base: Double, a: Column): Column

    Permalink

    Returns the first argument-base logarithm of the second argument.

    Returns the first argument-base logarithm of the second argument.

    Since

    1.4.0

  169. def log(columnName: String): Column

    Permalink

    Computes the natural logarithm of the given column.

    Computes the natural logarithm of the given column.

    Since

    1.4.0

  170. def log(e: Column): Column

    Permalink

    Computes the natural logarithm of the given value.

    Computes the natural logarithm of the given value.

    Since

    1.4.0

  171. def log10(columnName: String): Column

    Permalink

    Computes the logarithm of the given value in base 10.

    Computes the logarithm of the given value in base 10.

    Since

    1.4.0

  172. def log10(e: Column): Column

    Permalink

    Computes the logarithm of the given value in base 10.

    Computes the logarithm of the given value in base 10.

    Since

    1.4.0

  173. def log1p(columnName: String): Column

    Permalink

    Computes the natural logarithm of the given column plus one.

    Computes the natural logarithm of the given column plus one.

    Since

    1.4.0

  174. def log1p(e: Column): Column

    Permalink

    Computes the natural logarithm of the given value plus one.

    Computes the natural logarithm of the given value plus one.

    Since

    1.4.0

  175. def log2(columnName: String): Column

    Permalink

    Computes the logarithm of the given value in base 2.

    Computes the logarithm of the given value in base 2.

    Since

    1.5.0

  176. def log2(expr: Column): Column

    Permalink

    Computes the logarithm of the given column in base 2.

    Computes the logarithm of the given column in base 2.

    Since

    1.5.0

  177. def lower(e: Column): Column

    Permalink

    Converts a string column to lower case.

    Converts a string column to lower case.

    Since

    1.3.0

  178. def lpad(str: Column, len: Int, pad: String): Column

    Permalink

    Left-pad the string column with pad to a length of len.

    Left-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.
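
    The padding and truncation rules can be sketched on plain strings. The helper below is hypothetical; in particular, returning the input unchanged for an empty pad string is an assumption of this sketch, not documented behaviour.

```scala
object LpadSketch {
  def lpad(s: String, len: Int, pad: String): String =
    if (s.length >= len) s.take(len)   // longer input is shortened to len characters
    else if (pad.isEmpty) s            // assumption: nothing to pad with
    else {
      val missing = len - s.length
      // Repeat pad often enough, cut to the missing width, then prepend.
      (pad * (missing / pad.length + 1)).take(missing) + s
    }
}
```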

    Since

    1.5.0

  179. def ltrim(e: Column): Column

    Permalink

    Trim the spaces from left end for the specified string value.

    Trim the spaces from left end for the specified string value.

    Since

    1.5.0

  180. def map(cols: Column*): Column

    Permalink

    Creates a new map column.

    Creates a new map column. The input columns must be grouped as key-value pairs, e.g. (key1, value1, key2, value2, ...). The key columns must all have the same data type, and can't be null. The value columns must all have the same data type.

    Annotations
    @varargs()
    Since

    2.0.0

  181. def max(columnName: String): Column

    Permalink

    Aggregate function: returns the maximum value of the column in a group.

    Aggregate function: returns the maximum value of the column in a group.

    Since

    1.3.0

  182. def max(e: Column): Column

    Permalink

    Aggregate function: returns the maximum value of the expression in a group.

    Aggregate function: returns the maximum value of the expression in a group.

    Since

    1.3.0

  183. def md5(e: Column): Column

    Permalink

    Calculates the MD5 digest of a binary column and returns the value as a 32 character hex string.

    Calculates the MD5 digest of a binary column and returns the value as a 32 character hex string.

    Since

    1.5.0

  184. def mean(columnName: String): Column

    Permalink

    Aggregate function: returns the average of the values in a group.

    Aggregate function: returns the average of the values in a group. Alias for avg.

    Since

    1.4.0

  185. def mean(e: Column): Column

    Permalink

    Aggregate function: returns the average of the values in a group.

    Aggregate function: returns the average of the values in a group. Alias for avg.

    Since

    1.4.0

  186. def min(columnName: String): Column

    Permalink

    Aggregate function: returns the minimum value of the column in a group.

    Aggregate function: returns the minimum value of the column in a group.

    Since

    1.3.0

  187. def min(e: Column): Column

    Permalink

    Aggregate function: returns the minimum value of the expression in a group.

    Aggregate function: returns the minimum value of the expression in a group.

    Since

    1.3.0

  188. def minute(e: Column): Column

    Permalink

    Extracts the minutes as an integer from a given date/timestamp/string.

    Extracts the minutes as an integer from a given date/timestamp/string.

    Since

    1.5.0

  189. def monotonically_increasing_id(): Column

    Permalink

    A column expression that generates monotonically increasing 64-bit integers.

    A column expression that generates monotonically increasing 64-bit integers.

    The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has fewer than 1 billion partitions, and each partition has fewer than 8 billion records.

    As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs:

    0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
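
    The bit layout described above can be reproduced with a few lines of plain Scala. This is a sketch of the arithmetic only, with hypothetical helper names, and not the Spark implementation itself.

```scala
object MonotonicIdSketch {
  // Partition ID in the upper bits, record number in the lower 33 bits.
  def id(partitionId: Int, recordNumber: Long): Long =
    (partitionId.toLong << 33) + recordNumber

  // IDs for a data set whose partitions have the given sizes, in partition order.
  def ids(partitionSizes: Seq[Int]): Seq[Long] =
    for {
      (size, partitionId) <- partitionSizes.zipWithIndex
      record <- 0 until size
    } yield id(partitionId, record.toLong)
}
```

    Calling ids(Seq(3, 3)) reproduces the six IDs listed in the example above.
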
    Since

    1.6.0

  190. def month(e: Column): Column

    Permalink

    Extracts the month as an integer from a given date/timestamp/string.

    Extracts the month as an integer from a given date/timestamp/string.

    Since

    1.5.0

  191. def months_between(date1: Column, date2: Column): Column

    Permalink

    Returns number of months between dates date1 and date2.

    Returns number of months between dates date1 and date2.

    Since

    1.5.0

  192. def nanvl(col1: Column, col2: Column): Column

    Permalink

    Returns col1 if it is not NaN, or col2 if col1 is NaN.

    Returns col1 if it is not NaN, or col2 if col1 is NaN.

    Both inputs should be floating point columns (DoubleType or FloatType).

    Since

    1.5.0

  193. final def ne(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  194. def negate(e: Column): Column

    Permalink

    Unary minus, i.e. negate the expression.

    // Select the amount column and negate all values.
    // Scala:
    df.select( -df("amount") )
    
    // Java:
    df.select( negate(df.col("amount")) );
    Since

    1.3.0

  195. def next_day(date: Column, dayOfWeek: String): Column

    Permalink

    Given a date column, returns the first date which is later than the value of the date column that is on the specified day of the week.

    Given a date column, returns the first date which is later than the value of the date column that is on the specified day of the week.

    For example, next_day('2015-07-27', "Sunday") returns 2015-08-02 because that is the first Sunday after 2015-07-27.

    Day of the week parameter is case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun".
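
    The "first date strictly later than the input" rule matches java.time's TemporalAdjusters.next, so the example date above can be checked in plain Scala. The wrapper name nextDay is hypothetical and the sketch does not use Spark.

```scala
import java.time.{DayOfWeek, LocalDate}
import java.time.temporal.TemporalAdjusters

object NextDaySketch {
  // TemporalAdjusters.next never returns the input date itself,
  // even when the input already falls on the requested day of the week.
  def nextDay(date: LocalDate, dayOfWeek: DayOfWeek): LocalDate =
    date.`with`(TemporalAdjusters.next(dayOfWeek))
}
```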

    Since

    1.5.0

  196. def not(e: Column): Column

    Permalink

    Inversion of boolean expression, i.e. NOT.

    // Scala: select rows that are not active (isActive === false)
    df.filter( !df("isActive") )
    
    // Java:
    df.filter( not(df.col("isActive")) );
    Since

    1.3.0

  197. final def notify(): Unit

    Permalink
    Definition Classes
    AnyRef
  198. final def notifyAll(): Unit

    Permalink
    Definition Classes
    AnyRef
  199. def ntile(n: Int): Column

    Permalink

    Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window partition.

    Window function: returns the ntile group id (from 1 to n inclusive) in an ordered window partition. For example, if n is 4, the first quarter of the rows will get value 1, the second quarter will get 2, the third quarter will get 3, and the last quarter will get 4.

    This is equivalent to the NTILE function in SQL.
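
    A plain-Scala sketch of how rows in one ordered partition map to ntile groups. The helper is hypothetical; placing any remainder rows in the lowest-numbered groups follows the usual SQL NTILE convention and is an assumption here, since the text above only covers the evenly divisible case.

```scala
object NtileSketch {
  // Group id (1 to n) for each of totalRows rows, in window order.
  def ntile(n: Int, totalRows: Int): Seq[Int] = {
    val base = totalRows / n    // minimum rows per group
    val extra = totalRows % n   // the first `extra` groups get one more row
    (1 to n).flatMap { group =>
      val size = base + (if (group <= extra) 1 else 0)
      Seq.fill(size)(group)
    }
  }
}
```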

    Since

    1.4.0

  200. def percent_rank(): Column

    Permalink

    Window function: returns the relative rank (i.e. percentile) of rows within a window partition.

    This is computed by:

    (rank of row in its partition - 1) / (number of rows in the partition - 1)

    This is equivalent to the PERCENT_RANK function in SQL.
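
    The formula translates directly into plain Scala. The helper name is hypothetical, and returning 0.0 for a single-row partition (where the denominator would be zero) is an assumption of this sketch.

```scala
object PercentRankSketch {
  // (rank of row in its partition - 1) / (number of rows in the partition - 1)
  def percentRank(rank: Int, rowsInPartition: Int): Double =
    if (rowsInPartition <= 1) 0.0 // assumption: avoid dividing by zero
    else (rank - 1).toDouble / (rowsInPartition - 1)
}
```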

    Since

    1.6.0

  201. def pmod(dividend: Column, divisor: Column): Column

    Permalink

    Returns the positive value of dividend mod divisor.

    Returns the positive value of dividend mod divisor.
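
    Scala's % operator keeps the sign of the dividend, so -7 % 3 is -1. A positive modulus for a positive divisor can be sketched as follows; the helper is hypothetical, uses Int only, and models the idea rather than Spark's code.

```scala
object PmodSketch {
  def pmod(dividend: Int, divisor: Int): Int = {
    val r = dividend % divisor
    // Shift a negative remainder back into the range of the divisor.
    if (r < 0) (r + divisor) % divisor else r
  }
}
```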

    Since

    1.5.0

  202. def posexplode(e: Column): Column

    Permalink

    Creates a new row for each element with position in the given array or map column.

    Creates a new row for each element with position in the given array or map column.

    Since

    2.1.0

  203. def posexplode_outer(e: Column): Column

    Permalink

    Creates a new row for each element with position in the given array or map column.

    Creates a new row for each element with position in the given array or map column. Unlike posexplode, if the array/map is null or empty then the row (null, null) is produced.

    Since

    2.2.0

  204. def pow(l: Double, rightName: String): Column

    Permalink

    Returns the value of the first argument raised to the power of the second argument.

    Returns the value of the first argument raised to the power of the second argument.

    Since

    1.4.0

  205. def pow(l: Double, r: Column): Column

    Permalink

    Returns the value of the first argument raised to the power of the second argument.

    Returns the value of the first argument raised to the power of the second argument.

    Since

    1.4.0

  206. def pow(leftName: String, r: Double): Column

    Permalink

    Returns the value of the first argument raised to the power of the second argument.

    Returns the value of the first argument raised to the power of the second argument.

    Since

    1.4.0

  207. def pow(l: Column, r: Double): Column

    Permalink

    Returns the value of the first argument raised to the power of the second argument.

    Returns the value of the first argument raised to the power of the second argument.

    Since

    1.4.0

  208. def pow(leftName: String, rightName: String): Column

    Permalink

    Returns the value of the first argument raised to the power of the second argument.

    Returns the value of the first argument raised to the power of the second argument.

    Since

    1.4.0

  209. def pow(leftName: String, r: Column): Column

    Permalink

    Returns the value of the first argument raised to the power of the second argument.

    Returns the value of the first argument raised to the power of the second argument.

    Since

    1.4.0

  210. def pow(l: Column, rightName: String): Column

    Permalink

    Returns the value of the first argument raised to the power of the second argument.

    Returns the value of the first argument raised to the power of the second argument.

    Since

    1.4.0

  211. def pow(l: Column, r: Column): Column

    Permalink

    Returns the value of the first argument raised to the power of the second argument.

    Returns the value of the first argument raised to the power of the second argument.

    Since

    1.4.0

  212. def quarter(e: Column): Column

    Permalink

    Extracts the quarter as an integer from a given date/timestamp/string.

    Extracts the quarter as an integer from a given date/timestamp/string.

    Since

    1.5.0

  213. def radians(columnName: String): Column

    Permalink

    Converts an angle measured in degrees to an approximately equivalent angle measured in radians.

    Converts an angle measured in degrees to an approximately equivalent angle measured in radians.

    Since

    2.1.0

  214. def radians(e: Column): Column

    Permalink

    Converts an angle measured in degrees to an approximately equivalent angle measured in radians.

    Converts an angle measured in degrees to an approximately equivalent angle measured in radians.

    Since

    2.1.0

  215. def rand(): Column

    Generate a random column with independent and identically distributed (i.i.d.) samples from U[0.0, 1.0].

    Since

    1.4.0

  216. def rand(seed: Long): Column

    Generate a random column with independent and identically distributed (i.i.d.) samples from U[0.0, 1.0].

    Since

    1.4.0

    Note

    This is non-deterministic when data partitions are not fixed.

  217. def randn(): Column

    Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.

    Since

    1.4.0

  218. def randn(seed: Long): Column

    Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.

    Since

    1.4.0

    Note

    This is non-deterministic when data partitions are not fixed.

  219. def rank(): Column

    Window function: returns the rank of rows within a window partition.

    The difference between rank and dense_rank is that dense_rank leaves no gaps in the ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. rank, by contrast, leaves gaps after ties, so the person who came next after the three-way tie would register as coming in fifth.

    This is equivalent to the RANK function in SQL.
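
    The tie-handling difference can be sketched as follows. This is a minimal illustration, assuming a local SparkSession; the data and the names results, byScore and ranked are hypothetical.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().master("local[1]").appName("rank-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical competition results: three people tie for second place.
    val results = Seq(("Ann", 100), ("Bob", 90), ("Cid", 90), ("Dee", 90), ("Eve", 80))
      .toDF("name", "score")

    // A single unpartitioned window ordered by score, highest first.
    val byScore = Window.orderBy($"score".desc)

    val ranked = results
      .withColumn("rank", rank().over(byScore))
      .withColumn("dense_rank", dense_rank().over(byScore))

    // Eve follows the three-way tie: rank is 5, dense_rank is 3.
    val eve = ranked.filter($"name" === "Eve").select("rank", "dense_rank").head()
    ```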

    Since

    1.4.0

  220. def regexp_extract(e: Column, exp: String, groupIdx: Int): Column

    Extract a specific group matched by a Java regex, from the specified string column. If the regex did not match, or the specified group did not match, an empty string is returned.
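
    A short sketch of both outcomes (a match and a non-match), assuming a local SparkSession; the column name line and the sample strings are hypothetical.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().master("local[1]").appName("regexp-sketch").getOrCreate()
    import spark.implicits._

    val logs = Seq("user=alice id=42", "no match here").toDF("line")

    // Group 1 captures the digits that follow "id=".
    val ids = logs.select(regexp_extract($"line", "id=(\\d+)", 1).as("id")).collect()

    // First row yields "42"; the second row does not match, so it yields "".
    val matched = ids(0).getString(0)
    val unmatched = ids(1).getString(0)
    ```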

    Since

    1.5.0

  221. def regexp_replace(e: Column, pattern: Column, replacement: Column): Column

    Replace all substrings of the specified string value that match pattern with replacement.

    Since

    2.1.0

  222. def regexp_replace(e: Column, pattern: String, replacement: String): Column

    Replace all substrings of the specified string value that match pattern with replacement.

    Since

    1.5.0

  223. def repeat(str: Column, n: Int): Column

    Repeats a string column n times, and returns it as a new string column.

    Since

    1.5.0

  224. def reverse(str: Column): Column

    Reverses the string column and returns it as a new string column.

    Since

    1.5.0

  225. def rint(columnName: String): Column

    Returns the double value that is closest in value to the argument and is equal to a mathematical integer.

    Since

    1.4.0

  226. def rint(e: Column): Column

    Returns the double value that is closest in value to the argument and is equal to a mathematical integer.

    Since

    1.4.0

  227. def round(e: Column, scale: Int): Column

    Rounds the value of e to scale decimal places using HALF_UP rounding mode if scale is greater than or equal to 0, or at the integral part when scale is less than 0.
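
    The effect of positive, zero and negative scale values can be sketched as below, assuming a local SparkSession; the column name x and the sample values are hypothetical.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().master("local[1]").appName("round-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(2.5, -2.5, 1234.5678).toDF("x")

    val rounded = df.select(
      round($"x").as("r0"),         // 0 decimal places
      round($"x", 2).as("r2"),      // 2 decimal places
      round($"x", -2).as("rNeg2")   // rounds at the hundreds position
    ).collect()

    // HALF_UP rounds ties away from zero: 2.5 becomes 3.0 and -2.5 becomes -3.0.
    val halfUpPositive = rounded(0).getDouble(0)
    val halfUpNegative = rounded(1).getDouble(0)
    // 1234.5678 with scale -2 becomes 1200.0.
    val hundreds = rounded(2).getDouble(2)
    ```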

    Since

    1.5.0

  228. def round(e: Column): Column

    Returns the value of the column e rounded to 0 decimal places with HALF_UP round mode.

    Since

    1.5.0

  229. def row_number(): Column

    Window function: returns a sequential number starting at 1 within a window partition.

    Since

    1.6.0

  230. def rpad(str: Column, len: Int, pad: String): Column

    Right-pad the string column with pad to a length of len. If the string column is longer than len, the return value is shortened to len characters.
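
    Both the padding and the shortening case are shown in this small sketch, assuming a local SparkSession; the column name s is hypothetical.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().master("local[1]").appName("rpad-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq("spark").toDF("s")

    val padded = df.select(
      rpad($"s", 8, "*").as("longer"),  // input is shorter than len, so it is padded
      rpad($"s", 3, "*").as("shorter")  // input is longer than len, so it is cut to 3 characters
    ).head()

    val longer = padded.getString(0)   // "spark***"
    val shorter = padded.getString(1)  // "spa"
    ```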

    Since

    1.5.0

  231. def rtrim(e: Column): Column

    Trim the spaces from right end for the specified string value.

    Since

    1.5.0

  232. def second(e: Column): Column

    Extracts the seconds as an integer from a given date/timestamp/string.

    Since

    1.5.0

  233. def sha1(e: Column): Column

    Calculates the SHA-1 digest of a binary column and returns the value as a 40 character hex string.

    Since

    1.5.0

  234. def sha2(e: Column, numBits: Int): Column

    Calculates the SHA-2 family of hash functions of a binary column and returns the value as a hex string.

    e

    column to compute SHA-2 on.

    numBits

    one of 224, 256, 384, or 512.

    Since

    1.5.0

  235. def shiftLeft(e: Column, numBits: Int): Column

    Shift the given value numBits left. If the given value is a long value, this function returns a long value; otherwise it returns an integer value.

    Since

    1.5.0

  236. def shiftRight(e: Column, numBits: Int): Column

    (Signed) shift the given value numBits right. If the given value is a long value, this function returns a long value; otherwise it returns an integer value.

    Since

    1.5.0

  237. def shiftRightUnsigned(e: Column, numBits: Int): Column

    Unsigned shift the given value numBits right. If the given value is a long value, this function returns a long value; otherwise it returns an integer value.
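
    The three shift functions differ most visibly on negative input. A minimal sketch on an integer column, assuming a local SparkSession; the column name n is hypothetical.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().master("local[1]").appName("shift-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(-8).toDF("n") // an Int column, so every result is an Int

    val shifted = df.select(
      shiftLeft($"n", 1),          // same as -8 << 1
      shiftRight($"n", 1),         // same as -8 >> 1, sign bit preserved
      shiftRightUnsigned($"n", 1)  // same as -8 >>> 1, zero-filled from the left
    ).head()

    val left = shifted.getInt(0)
    val signedRight = shifted.getInt(1)
    val unsignedRight = shifted.getInt(2)
    ```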

    Since

    1.5.0

  238. def signum(columnName: String): Column

    Computes the signum of the given column.

    Since

    1.4.0

  239. def signum(e: Column): Column

    Computes the signum of the given value.

    Since

    1.4.0

  240. def sin(columnName: String): Column

    Computes the sine of the given column.

    Since

    1.4.0

  241. def sin(e: Column): Column

    Computes the sine of the given value.

    Since

    1.4.0

  242. def sinh(columnName: String): Column

    Computes the hyperbolic sine of the given column.

    Since

    1.4.0

  243. def sinh(e: Column): Column

    Computes the hyperbolic sine of the given value.

    Since

    1.4.0

  244. def size(e: Column): Column

    Returns length of array or map.

    Since

    1.5.0

  245. def skewness(columnName: String): Column

    Aggregate function: returns the skewness of the values in a group.

    Since

    1.6.0

  246. def skewness(e: Column): Column

    Aggregate function: returns the skewness of the values in a group.

    Since

    1.6.0

  247. def sort_array(e: Column, asc: Boolean): Column

    Sorts the input array for the given column in ascending or descending order, according to the natural ordering of the array elements.
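
    A brief sketch of both sort directions, assuming a local SparkSession; the column name xs is hypothetical.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().master("local[1]").appName("sort-array-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(Seq(3, 1, 2)).toDF("xs")

    val sorted = df.select(
      sort_array($"xs", asc = true),
      sort_array($"xs", asc = false)
    ).head()

    val ascending = sorted.getSeq[Int](0)   // 1, 2, 3
    val descending = sorted.getSeq[Int](1)  // 3, 2, 1
    ```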

    Since

    1.5.0

  248. def sort_array(e: Column): Column

    Sorts the input array for the given column in ascending order, according to the natural ordering of the array elements.

    Since

    1.5.0

  249. def soundex(e: Column): Column

    Returns the soundex code for the specified expression.

    Since

    1.5.0

  250. def spark_partition_id(): Column

    Partition ID.

    Since

    1.6.0

    Note

    This is non-deterministic because it depends on data partitioning and task scheduling.

  251. def split(str: Column, pattern: String): Column

    Splits str around pattern (pattern is a regular expression).
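
    Because pattern is a regular expression, a single call can split on variable-length separators. A minimal sketch, assuming a local SparkSession; the column name s is hypothetical.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().master("local[1]").appName("split-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq("a1b22c").toDF("s")

    // Split on runs of one or more digits.
    val parts = df.select(split($"s", "[0-9]+")).head().getSeq[String](0)
    // parts contains "a", "b", "c"
    ```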

    Since

    1.5.0

    Note

    Pattern is a string representation of the regular expression.

  252. def sqrt(colName: String): Column

    Computes the square root of the specified float value.

    Since

    1.5.0

  253. def sqrt(e: Column): Column

    Computes the square root of the specified float value.

    Since

    1.3.0

  254. def stddev(columnName: String): Column

    Aggregate function: alias for stddev_samp.

    Since

    1.6.0

  255. def stddev(e: Column): Column

    Aggregate function: alias for stddev_samp.

    Since

    1.6.0

  256. def stddev_pop(columnName: String): Column

    Aggregate function: returns the population standard deviation of the expression in a group.

    Since

    1.6.0

  257. def stddev_pop(e: Column): Column

    Aggregate function: returns the population standard deviation of the expression in a group.

    Since

    1.6.0

  258. def stddev_samp(columnName: String): Column

    Aggregate function: returns the sample standard deviation of the expression in a group.

    Since

    1.6.0

  259. def stddev_samp(e: Column): Column

    Aggregate function: returns the sample standard deviation of the expression in a group.

    Since

    1.6.0

  260. def struct(colName: String, colNames: String*): Column

    Creates a new struct column that composes multiple input columns.

    Annotations
    @varargs()
    Since

    1.4.0

  261. def struct(cols: Column*): Column

    Creates a new struct column. If an input column is a column in a DataFrame, or a derived column expression that is named (i.e. aliased), its name is kept as the StructField's name; otherwise, the newly generated StructField's name is auto-generated as col with a suffix index + 1, i.e. col1, col2, col3, ...
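
    The naming rule can be checked by inspecting the resulting schema. A sketch under the assumption of a local SparkSession; the column names id, label and next are hypothetical.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types.StructType

    val spark = SparkSession.builder().master("local[1]").appName("struct-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a")).toDF("id", "label")

    // A DataFrame column, an aliased expression, and an unnamed literal.
    val s = df.select(struct($"id", ($"id" + 1).as("next"), lit("x")).as("s"))

    // The first two fields keep their names; the literal sits at index 2,
    // so its generated name is col3.
    val fieldNames = s.schema("s").dataType.asInstanceOf[StructType].fieldNames
    ```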

    Annotations
    @varargs()
    Since

    1.4.0

  262. def substring(str: Column, pos: Int, len: Int): Column

    Returns the substring that starts at pos and is of length len when str is String type, or the slice of the byte array that starts at pos (in bytes) and is of length len when str is Binary type.

    Since

    1.5.0

  263. def substring_index(str: Column, delim: String, count: Int): Column

    Returns the substring from string str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. substring_index performs a case-sensitive match when searching for delim.
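
    Positive and negative counts can be compared side by side. A minimal sketch, assuming a local SparkSession; the column name host is hypothetical.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().master("local[1]").appName("substring-index-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq("www.apache.org").toDF("host")

    val parts = df.select(
      substring_index($"host", ".", 2),   // everything left of the 2nd "." from the left
      substring_index($"host", ".", -2)   // everything right of the 2nd "." from the right
    ).head()

    val fromLeft = parts.getString(0)   // "www.apache"
    val fromRight = parts.getString(1)  // "apache.org"
    ```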

  264. def sum(columnName: String): Column

    Aggregate function: returns the sum of all values in the given column.

    Since

    1.3.0

  265. def sum(e: Column): Column

    Aggregate function: returns the sum of all values in the expression.

    Since

    1.3.0

  266. def sumDistinct(columnName: String): Column

    Aggregate function: returns the sum of distinct values in the expression.

    Since

    1.3.0

  267. def sumDistinct(e: Column): Column

    Aggregate function: returns the sum of distinct values in the expression.

    Since

    1.3.0

  268. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  269. def tan(columnName: String): Column

    Computes the tangent of the given column.

    Since

    1.4.0

  270. def tan(e: Column): Column

    Computes the tangent of the given value.

    Since

    1.4.0

  271. def tanh(columnName: String): Column

    Computes the hyperbolic tangent of the given column.

    Since

    1.4.0

  272. def tanh(e: Column): Column

    Computes the hyperbolic tangent of the given value.

    Since

    1.4.0

  273. def toString(): String

    Definition Classes
    AnyRef → Any
  274. def to_date(e: Column, fmt: String): Column

    Converts the column into a DateType with a specified format (see [http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html]); returns null on failure.

    Since

    2.2.0

  275. def to_date(e: Column): Column

    Converts the column into DateType.

    Since

    1.5.0

  276. def to_json(e: Column): Column

    Converts a column containing a StructType or ArrayType of StructTypes into a JSON string with the specified schema. Throws an exception, in the case of an unsupported type.
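
    A minimal sketch of serializing a struct column, assuming a local SparkSession; the column names id and label are hypothetical.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().master("local[1]").appName("to-json-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a")).toDF("id", "label")

    // Pack both columns into a struct, then render that struct as a JSON string.
    val json = df.select(to_json(struct($"id", $"label")).as("json")).head().getString(0)
    // json is {"id":1,"label":"a"}
    ```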

    e

    a column containing a struct or array of the structs.

    Since

    2.1.0

  277. def to_json(e: Column, options: Map[String, String]): Column

    (Java-specific) Converts a column containing a StructType or ArrayType of StructTypes into a JSON string with the specified schema. Throws an exception, in the case of an unsupported type.

    e

    a column containing a struct or array of the structs.

    options

    options to control how the struct column is converted into a JSON string. Accepts the same options as the JSON data source.

    Since

    2.1.0

  278. def to_json(e: Column, options: Map[String, String]): Column

    (Scala-specific) Converts a column containing a StructType or ArrayType of StructTypes into a JSON string with the specified schema. Throws an exception, in the case of an unsupported type.

    e

    a column containing a struct or array of the structs.

    options

    options to control how the struct column is converted into a JSON string. Accepts the same options as the JSON data source.

    Since

    2.1.0

  279. def to_timestamp(s: Column, fmt: String): Column

    Converts a time string with a specified format (see [http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html]) to a timestamp; returns null on failure.

    Since

    2.2.0

  280. def to_timestamp(s: Column): Column

    Converts a time string to a timestamp. Uses the pattern "yyyy-MM-dd HH:mm:ss" and returns null on failure.

    Since

    2.2.0

  281. def to_utc_timestamp(ts: Column, tz: String): Column

    Given a timestamp, which corresponds to a certain time of day in the given timezone, returns another timestamp that corresponds to the same time of day in UTC.

    Since

    1.5.0

  282. def translate(src: Column, matchingString: String, replaceString: String): Column

    Translate any character in the src by a character in replaceString. The characters in replaceString correspond to the characters in matchingString. The translate will happen when any character in the string matches the character in the matchingString.
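
    The character-by-character mapping can be sketched as follows, assuming a local SparkSession; the column name s is hypothetical.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().master("local[1]").appName("translate-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq("AaBbCc").toDF("s")

    // Maps a -> 1, b -> 2, c -> 3; upper-case letters are not in matchingString
    // and are left untouched.
    val translated = df.select(translate($"s", "abc", "123")).head().getString(0)
    // translated is "A1B2C3"
    ```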

    Since

    1.5.0

  283. def trim(e: Column): Column

    Trim the spaces from both ends for the specified string column.

    Since

    1.5.0

  284. def trunc(date: Column, format: String): Column

    Returns date truncated to the unit specified by the format.

    Since

    1.5.0

  285. def typedLit[T](literal: T)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[T]): Column

    Creates a Column of literal value.

    The passed-in object is returned directly if it is already a Column. If the object is a Scala Symbol, it is also converted into a Column. Otherwise, a new Column is created to represent the literal value. The difference between this function and lit is that this function can handle parameterized Scala types, e.g. List, Seq and Map.
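
    A small sketch of literals built from parameterized Scala types, assuming a local SparkSession; the column names seq and map are hypothetical.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().master("local[1]").appName("typedlit-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(1).toDF("id")

    // The element types are recovered from the TypeTag, giving an array column
    // and a map column.
    val row = df.select(
      typedLit(Seq(1, 2, 3)).as("seq"),
      typedLit(Map("a" -> 1)).as("map")
    ).head()

    val seqValue = row.getSeq[Int](0)
    val mapValue = row.getMap[String, Int](1)
    ```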

    Since

    2.2.0

  286. def udf(f: AnyRef, dataType: DataType): UserDefinedFunction

    Defines a user-defined function (UDF) using a Scala closure. For this variant, the caller must specify the output data type, and there is no automatic input type coercion.

    f

    A closure in Scala

    dataType

    The output data type of the UDF

    Since

    2.0.0

  287. def udf[RT, A1, A2, A3, A4, A5, A6, A7, A8, A9, A10](f: (A1, A2, A3, A4, A5, A6, A7, A8, A9, A10) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2], arg3: scala.reflect.api.JavaUniverse.TypeTag[A3], arg4: scala.reflect.api.JavaUniverse.TypeTag[A4], arg5: scala.reflect.api.JavaUniverse.TypeTag[A5], arg6: scala.reflect.api.JavaUniverse.TypeTag[A6], arg7: scala.reflect.api.JavaUniverse.TypeTag[A7], arg8: scala.reflect.api.JavaUniverse.TypeTag[A8], arg9: scala.reflect.api.JavaUniverse.TypeTag[A9], arg10: scala.reflect.api.JavaUniverse.TypeTag[A10]): UserDefinedFunction

    Defines a user-defined function (UDF) of 10 arguments. The data types are automatically inferred based on the function's signature.

    Since

    1.3.0

  288. def udf[RT, A1, A2, A3, A4, A5, A6, A7, A8, A9](f: (A1, A2, A3, A4, A5, A6, A7, A8, A9) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2], arg3: scala.reflect.api.JavaUniverse.TypeTag[A3], arg4: scala.reflect.api.JavaUniverse.TypeTag[A4], arg5: scala.reflect.api.JavaUniverse.TypeTag[A5], arg6: scala.reflect.api.JavaUniverse.TypeTag[A6], arg7: scala.reflect.api.JavaUniverse.TypeTag[A7], arg8: scala.reflect.api.JavaUniverse.TypeTag[A8], arg9: scala.reflect.api.JavaUniverse.TypeTag[A9]): UserDefinedFunction

    Defines a user-defined function (UDF) of 9 arguments. The data types are automatically inferred based on the function's signature.

    Since

    1.3.0

  289. def udf[RT, A1, A2, A3, A4, A5, A6, A7, A8](f: (A1, A2, A3, A4, A5, A6, A7, A8) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2], arg3: scala.reflect.api.JavaUniverse.TypeTag[A3], arg4: scala.reflect.api.JavaUniverse.TypeTag[A4], arg5: scala.reflect.api.JavaUniverse.TypeTag[A5], arg6: scala.reflect.api.JavaUniverse.TypeTag[A6], arg7: scala.reflect.api.JavaUniverse.TypeTag[A7], arg8: scala.reflect.api.JavaUniverse.TypeTag[A8]): UserDefinedFunction

    Defines a user-defined function (UDF) of 8 arguments. The data types are automatically inferred based on the function's signature.

    Since

    1.3.0

  290. def udf[RT, A1, A2, A3, A4, A5, A6, A7](f: (A1, A2, A3, A4, A5, A6, A7) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2], arg3: scala.reflect.api.JavaUniverse.TypeTag[A3], arg4: scala.reflect.api.JavaUniverse.TypeTag[A4], arg5: scala.reflect.api.JavaUniverse.TypeTag[A5], arg6: scala.reflect.api.JavaUniverse.TypeTag[A6], arg7: scala.reflect.api.JavaUniverse.TypeTag[A7]): UserDefinedFunction

    Defines a user-defined function (UDF) of 7 arguments. The data types are automatically inferred based on the function's signature.

    Since

    1.3.0

  291. def udf[RT, A1, A2, A3, A4, A5, A6](f: (A1, A2, A3, A4, A5, A6) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2], arg3: scala.reflect.api.JavaUniverse.TypeTag[A3], arg4: scala.reflect.api.JavaUniverse.TypeTag[A4], arg5: scala.reflect.api.JavaUniverse.TypeTag[A5], arg6: scala.reflect.api.JavaUniverse.TypeTag[A6]): UserDefinedFunction

    Defines a user-defined function (UDF) of 6 arguments. The data types are automatically inferred based on the function's signature.

    Since

    1.3.0

  292. def udf[RT, A1, A2, A3, A4, A5](f: (A1, A2, A3, A4, A5) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2], arg3: scala.reflect.api.JavaUniverse.TypeTag[A3], arg4: scala.reflect.api.JavaUniverse.TypeTag[A4], arg5: scala.reflect.api.JavaUniverse.TypeTag[A5]): UserDefinedFunction

    Defines a user-defined function (UDF) of 5 arguments. The data types are automatically inferred based on the function's signature.

    Since

    1.3.0

  293. def udf[RT, A1, A2, A3, A4](f: (A1, A2, A3, A4) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2], arg3: scala.reflect.api.JavaUniverse.TypeTag[A3], arg4: scala.reflect.api.JavaUniverse.TypeTag[A4]): UserDefinedFunction

    Defines a user-defined function (UDF) of 4 arguments. The data types are automatically inferred based on the function's signature.

    Since

    1.3.0

  294. def udf[RT, A1, A2, A3](f: (A1, A2, A3) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2], arg3: scala.reflect.api.JavaUniverse.TypeTag[A3]): UserDefinedFunction

    Defines a user-defined function (UDF) of 3 arguments. The data types are automatically inferred based on the function's signature.

    Since

    1.3.0

  295. def udf[RT, A1, A2](f: (A1, A2) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1], arg2: scala.reflect.api.JavaUniverse.TypeTag[A2]): UserDefinedFunction

    Defines a user-defined function (UDF) of 2 arguments. The data types are automatically inferred based on the function's signature.

    Since

    1.3.0

  296. def udf[RT, A1](f: (A1) ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT], arg1: scala.reflect.api.JavaUniverse.TypeTag[A1]): UserDefinedFunction

    Defines a user-defined function (UDF) of 1 argument. The data types are automatically inferred based on the function's signature.
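
    A minimal sketch of wrapping a one-argument Scala closure and applying it to a column, assuming a local SparkSession; the names plusOne and n are hypothetical.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().master("local[1]").appName("udf-sketch").getOrCreate()
    import spark.implicits._

    // Input and output types (Int => Int) are inferred from the closure's signature.
    val plusOne = udf((x: Int) => x + 1)

    val df = Seq(1).toDF("n")
    val result = df.select(plusOne($"n")).head().getInt(0)
    // result is 2
    ```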

    Since

    1.3.0

  297. def udf[RT](f: () ⇒ RT)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[RT]): UserDefinedFunction

    Defines a user-defined function (UDF) of 0 arguments. The data types are automatically inferred based on the function's signature.

    Since

    1.3.0

  298. def unbase64(e: Column): Column

    Decodes a BASE64 encoded string column and returns it as a binary column. This is the reverse of base64.

    Since

    1.5.0

  299. def unhex(column: Column): Column

    Inverse of hex. Interprets each pair of characters as a hexadecimal number and converts to the byte representation of number.

    Since

    1.5.0

  300. def unix_timestamp(s: Column, p: String): Column

    Converts a time string with the given pattern (see [http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html]) to a Unix timestamp (in seconds); returns null on failure.
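
    A sketch of parsing with a custom pattern. It assumes a local SparkSession, a Spark version that supports the spark.sql.session.timeZone setting (pinned to UTC here so the result does not depend on the machine), and that ANSI mode is off so an unparseable string yields null; the column name d is hypothetical.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder().master("local[1]").appName("unix-timestamp-sketch").getOrCreate()
    import spark.implicits._

    // Assumption: pin the session time zone so the epoch offset is reproducible.
    spark.conf.set("spark.sql.session.timeZone", "UTC")

    val df = Seq("02/01/1970", "not a date").toDF("d")
    val secs = df.select(unix_timestamp($"d", "dd/MM/yyyy")).collect()

    // 2 January 1970 00:00:00 UTC is one day (86400 seconds) after the epoch.
    val oneDay = secs(0).getLong(0)
    // The second string does not match the pattern.
    val failedToParse = secs(1).isNullAt(0)
    ```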

    Since

    1.5.0

  301. def unix_timestamp(s: Column): Column

    Converts a time string in the format yyyy-MM-dd HH:mm:ss to a Unix timestamp (in seconds), using the default timezone and the default locale; returns null on failure.

    Since

    1.5.0

  302. def unix_timestamp(): Column

    Gets current Unix timestamp in seconds.

    Since

    1.5.0

  303. def upper(e: Column): Column

    Converts a string column to upper case.

    Since

    1.3.0

  304. def var_pop(columnName: String): Column

    Aggregate function: returns the population variance of the values in a group.

    Since

    1.6.0

  305. def var_pop(e: Column): Column

    Aggregate function: returns the population variance of the values in a group.

    Since

    1.6.0

  306. def var_samp(columnName: String): Column

    Aggregate function: returns the unbiased variance of the values in a group.

    Since

    1.6.0

  307. def var_samp(e: Column): Column

    Aggregate function: returns the unbiased variance of the values in a group.

    Since

    1.6.0

  308. def variance(columnName: String): Column

    Aggregate function: alias for var_samp.

    Since

    1.6.0

  309. def variance(e: Column): Column

    Aggregate function: alias for var_samp.

    Since

    1.6.0

  310. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  311. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  312. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  313. def weekofyear(e: Column): Column

    Extracts the week number as an integer from a given date/timestamp/string.

    Since

    1.5.0

  314. def when(condition: Column, value: Any): Column

    Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.

    // Example: encoding gender string column into integer.
    
    // Scala:
    people.select(when(people("gender") === "male", 0)
      .when(people("gender") === "female", 1)
      .otherwise(2))
    
    // Java:
    people.select(when(col("gender").equalTo("male"), 0)
      .when(col("gender").equalTo("female"), 1)
      .otherwise(2))
    Since

    1.4.0
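
    The first-match semantics described above, including the null result when otherwise is omitted, can be modelled in plain Scala. This is only a sketch of the evaluation order, not Spark's implementation: WhenSketch and encodeGender are hypothetical names, and None stands in for SQL null.

```scala
object WhenSketch {
  // Branches are tried in order; the first condition that holds decides the result.
  private val branches: Seq[(String => Boolean, Int)] = Seq(
    ((g: String) => g == "male", 0),
    ((g: String) => g == "female", 1)
  )

  // `otherwise` is the fallback value; passing None models a chain
  // that was built without a trailing .otherwise(...).
  def encodeGender(gender: String, otherwise: Option[Int]): Option[Int] =
    branches
      .collectFirst { case (cond, value) if cond(gender) => value }
      .orElse(otherwise)

  def main(args: Array[String]): Unit = {
    println(encodeGender("female", Some(2)))  // a branch matches
    println(encodeGender("unknown", Some(2))) // falls through to otherwise
    println(encodeGender("unknown", None))    // no otherwise: models null
  }
}
```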

  315. def window(timeColumn: Column, windowDuration: String): Column

    Generates tumbling time windows given a column that specifies the timestamp. Window starts are inclusive and window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows on the order of months are not supported. Window boundaries are aligned to 1970-01-01 00:00:00 UTC. The following example takes the average stock price for a one minute tumbling window:

    val df = ... // schema => time: TimestampType, stockId: StringType, price: DoubleType
    df.groupBy(window($"time", "1 minute"), $"stockId")
      .agg(mean("price"))

    The windows will look like:

    09:00:00-09:01:00
    09:01:00-09:02:00
    09:02:00-09:03:00 ...

    For a streaming query, you may use the function current_timestamp to generate windows on processing time.

    timeColumn

    The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType.

    windowDuration

    A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers.

    Since

    2.0.0
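
    The inclusive-start, exclusive-end rule and the alignment to the epoch can be illustrated with a small plain-Scala sketch. It works on epoch seconds rather than Spark columns, and TumblingWindowSketch and tumblingWindow are hypothetical names used only here; it is not how Spark itself evaluates window.

```scala
object TumblingWindowSketch {
  // Returns the half-open window [start, end), in epoch seconds, that contains ts.
  // Window boundaries are multiples of windowSeconds counted from the epoch.
  def tumblingWindow(ts: Long, windowSeconds: Long): (Long, Long) = {
    val start = ts - java.lang.Math.floorMod(ts, windowSeconds)
    (start, start + windowSeconds)
  }

  def main(args: Array[String]): Unit = {
    val fiveMinutes = 300L
    val t1205 = 12 * 3600L + 5 * 60L // 12:05:00 on 1970-01-01 UTC
    println(tumblingWindow(t1205, fiveMinutes))     // starts at 12:05
    println(tumblingWindow(t1205 - 1, fiveMinutes)) // 12:04:59 is in the 12:00 window
  }
}
```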

  316. def window(timeColumn: Column, windowDuration: String, slideDuration: String): Column

    Bucketizes rows into one or more time windows given a column that specifies the timestamp. Window starts are inclusive and window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows on the order of months are not supported. Window boundaries are aligned to 1970-01-01 00:00:00 UTC. The following example takes the average stock price for a one minute window every 10 seconds:

    val df = ... // schema => time: TimestampType, stockId: StringType, price: DoubleType
    df.groupBy(window($"time", "1 minute", "10 seconds"), $"stockId")
      .agg(mean("price"))

    The windows will look like:

    09:00:00-09:01:00
    09:00:10-09:01:10
    09:00:20-09:01:20 ...

    For a streaming query, you may use the function current_timestamp to generate windows on processing time.

    timeColumn

    The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType.

    windowDuration

    A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. Note that the duration is a fixed length of time, and does not vary over time according to a calendar. For example, 1 day always means 86,400,000 milliseconds, not a calendar day.

    slideDuration

    A string specifying the sliding interval of the window, e.g. 1 minute. A new window will be generated every slideDuration. Must be less than or equal to the windowDuration. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. This duration is likewise absolute, and does not vary according to a calendar.

    Since

    2.0.0
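
    Because slideDuration may be shorter than windowDuration, a single row can fall into several overlapping windows. The plain-Scala sketch below enumerates them for one timestamp in epoch seconds. SlidingWindowSketch and slidingWindows are hypothetical names, and the code only models the bucketing arithmetic described above, not Spark's implementation.

```scala
object SlidingWindowSketch {
  // Returns every half-open window [start, end) that contains ts, oldest first.
  // Window starts are multiples of slideSeconds counted from the epoch.
  def slidingWindows(ts: Long, windowSeconds: Long, slideSeconds: Long): List[(Long, Long)] = {
    require(slideSeconds > 0 && slideSeconds <= windowSeconds,
      "slide must be positive and no longer than the window")
    // The latest window start that is not after ts.
    val lastStart = ts - java.lang.Math.floorMod(ts, slideSeconds)
    Iterator.iterate(lastStart)(_ - slideSeconds)
      .takeWhile(start => start + windowSeconds > ts) // the exclusive end must lie after ts
      .map(start => (start, start + windowSeconds))
      .toList
      .reverse
  }

  def main(args: Array[String]): Unit = {
    val ts = 9 * 3600L + 25L // 09:00:25 on 1970-01-01 UTC
    slidingWindows(ts, 60L, 10L).foreach(println)
  }
}
```

    With a one minute window sliding every 10 seconds, each timestamp belongs to windowDuration / slideDuration = 6 windows.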

  317. def window(timeColumn: Column, windowDuration: String, slideDuration: String, startTime: String): Column

    Bucketizes rows into one or more time windows given a column that specifies the timestamp. Window starts are inclusive and window ends are exclusive, e.g. 12:05 will be in the window [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows on the order of months are not supported. The following example takes the average stock price for a one minute window every 10 seconds, starting 5 seconds after the hour:

    val df = ... // schema => time: TimestampType, stockId: StringType, price: DoubleType
    df.groupBy(window($"time", "1 minute", "10 seconds", "5 seconds"), $"stockId")
      .agg(mean("price"))

    The windows will look like:

    09:00:05-09:01:05
    09:00:15-09:01:15
    09:00:25-09:01:25 ...

    For a streaming query, you may use the function current_timestamp to generate windows on processing time.

    timeColumn

    The column or the expression to use as the timestamp for windowing by time. The time column must be of TimestampType.

    windowDuration

    A string specifying the width of the window, e.g. 10 minutes, 1 second. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. Note that the duration is a fixed length of time, and does not vary over time according to a calendar. For example, 1 day always means 86,400,000 milliseconds, not a calendar day.

    slideDuration

    A string specifying the sliding interval of the window, e.g. 1 minute. A new window will be generated every slideDuration. Must be less than or equal to the windowDuration. Check org.apache.spark.unsafe.types.CalendarInterval for valid duration identifiers. This duration is likewise absolute, and does not vary according to a calendar.

    startTime

    The offset with respect to 1970-01-01 00:00:00 UTC with which to start window intervals. For example, in order to have hourly tumbling windows that start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide startTime as 15 minutes.

    Since

    2.0.0
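
    The effect of startTime can be sketched as shifting the epoch-aligned grid by a fixed offset. The plain-Scala example below reproduces the hourly windows starting 15 minutes past the hour mentioned above, using epoch seconds. WindowOffsetSketch and windowWithOffset are hypothetical names for this illustration, not part of the Spark API.

```scala
object WindowOffsetSketch {
  // Returns the half-open tumbling window [start, end) containing ts when
  // window boundaries sit at startTimeSeconds + k * windowSeconds from the epoch.
  def windowWithOffset(ts: Long, windowSeconds: Long, startTimeSeconds: Long): (Long, Long) = {
    val start = ts - java.lang.Math.floorMod(ts - startTimeSeconds, windowSeconds)
    (start, start + windowSeconds)
  }

  def main(args: Array[String]): Unit = {
    val hour = 3600L
    val quarter = 15 * 60L
    val t1240 = 12 * hour + 40 * 60L // 12:40:00 on 1970-01-01 UTC
    val t1210 = 12 * hour + 10 * 60L // 12:10:00 on 1970-01-01 UTC
    println(windowWithOffset(t1240, hour, quarter)) // 12:15 to 13:15
    println(windowWithOffset(t1210, hour, quarter)) // 11:15 to 12:15
  }
}
```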

  318. def year(e: Column): Column

    Extracts the year as an integer from a given date/timestamp/string.

    Since

    1.5.0

Deprecated Value Members

  1. def approxCountDistinct(columnName: String, rsd: Double): Column

    Annotations
    @deprecated
    Deprecated

    (Since version 2.1.0) Use approx_count_distinct

    Since

    1.3.0

  2. def approxCountDistinct(e: Column, rsd: Double): Column

    Annotations
    @deprecated
    Deprecated

    (Since version 2.1.0) Use approx_count_distinct

    Since

    1.3.0

  3. def approxCountDistinct(columnName: String): Column

    Annotations
    @deprecated
    Deprecated

    (Since version 2.1.0) Use approx_count_distinct

    Since

    1.3.0

  4. def approxCountDistinct(e: Column): Column

    Annotations
    @deprecated
    Deprecated

    (Since version 2.1.0) Use approx_count_distinct

    Since

    1.3.0

  5. def monotonicallyIncreasingId(): Column

    A column expression that generates monotonically increasing 64-bit integers.

    The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. This assumes that the data frame has fewer than 1 billion partitions, and that each partition has fewer than 8 billion records.

    As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs:

    0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
    Annotations
    @deprecated
    Deprecated

    (Since version 2.0.0) Use monotonically_increasing_id()

    Since

    1.4.0
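
    The bit layout described above can be reproduced with a few lines of plain Scala. This is a sketch of the arithmetic only, under the stated 31-bit / 33-bit split; MonotonicIdSketch, makeId, partitionOf and recordOf are hypothetical names and not part of the Spark API.

```scala
object MonotonicIdSketch {
  // Number of low-order bits reserved for the record number within a partition.
  val RecordBits = 33

  // Packs the partition ID into the upper bits and the record number into the lower 33 bits.
  def makeId(partitionId: Int, recordNumber: Long): Long =
    (partitionId.toLong << RecordBits) | recordNumber

  // Recovers the two components from a packed ID.
  def partitionOf(id: Long): Int = (id >>> RecordBits).toInt
  def recordOf(id: Long): Long = id & ((1L << RecordBits) - 1)

  def main(args: Array[String]): Unit = {
    // Two partitions with three records each, as in the example above.
    val ids = (0 until 2).flatMap(p => (0L until 3L).map(r => makeId(p, r)))
    println(ids.mkString(", "))
  }
}
```

    IDs from different partitions never collide because they differ in the upper bits, which is also why the sequence jumps by 1L << 33 between partitions instead of being consecutive.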

  6. def toDegrees(columnName: String): Column

    Annotations
    @deprecated
    Deprecated

    (Since version 2.1.0) Use degrees

    Since

    1.4.0

  7. def toDegrees(e: Column): Column

    Annotations
    @deprecated
    Deprecated

    (Since version 2.1.0) Use degrees

    Since

    1.4.0

  8. def toRadians(columnName: String): Column

    Annotations
    @deprecated
    Deprecated

    (Since version 2.1.0) Use radians

    Since

    1.4.0

  9. def toRadians(e: Column): Column

    Annotations
    @deprecated
    Deprecated

    (Since version 2.1.0) Use radians

    Since

    1.4.0
