pyspark.sql.DataFrame.to

DataFrame.to(schema: pyspark.sql.types.StructType) → pyspark.sql.dataframe.DataFrame[source]

Returns a new DataFrame where each row is reconciled to match the specified schema.

New in version 3.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
schemaStructType

Specified schema.

Returns
DataFrame

Reconciled DataFrame.

Notes

  • Reorder columns and/or inner fields by name to match the specified schema.

  • Project away columns and/or inner fields that are not needed by the specified schema.

    Missing columns and/or inner fields (present in the specified schema but not input DataFrame) lead to failures.

  • Cast the columns and/or inner fields to match the data types in the specified schema,

    if the types are compatible, e.g., numeric to numeric (error if overflows), but not string to int.

  • Carry over the metadata from the specified schema, while the columns and/or inner fields

    still keep their own metadata if not overwritten by the specified schema.

  • Fail if the nullability is not compatible. For example, the column and/or inner field

    is nullable but the specified schema requires them to be not nullable.

Examples

>>> from pyspark.sql.types import StructField, StringType
>>> df = spark.createDataFrame([("a", 1)], ["i", "j"])
>>> df.schema
StructType([StructField('i', StringType(), True), StructField('j', LongType(), True)])
>>> schema = StructType([StructField("j", StringType()), StructField("i", StringType())])
>>> df2 = df.to(schema)
>>> df2.schema
StructType([StructField('j', StringType(), True), StructField('i', StringType(), True)])
>>> df2.show()
+---+---+
|  j|  i|
+---+---+
|  1|  a|
+---+---+