pyspark.RDD.takeOrdered
- RDD.takeOrdered(num, key=None)
Get the first N elements of an RDD, sorted in ascending order or by the optional key function.
New in version 1.0.0.
- Parameters
- num : int
the number of elements to return
- key : function, optional
a function that generates the key used to compare elements
- Returns
- list
the first N elements in the requested order
Notes
This method should only be used if the resulting list is expected to be small, as all the data is loaded into the driver’s memory.
Examples
>>> sc.parallelize([10, 1, 2, 9, 3, 4, 5, 6, 7]).takeOrdered(6)
[1, 2, 3, 4, 5, 6]
>>> sc.parallelize([10, 1, 2, 9, 3, 4, 5, 6, 7], 2).takeOrdered(6, key=lambda x: -x)
[10, 9, 7, 6, 5, 4]
>>> sc.emptyRDD().takeOrdered(3)
[]
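
The ordering semantics can also be illustrated without a Spark cluster. The following is a minimal pure-Python sketch, not PySpark's actual implementation: `take_ordered_sketch` is a hypothetical helper, and plain lists stand in for RDD partitions. It assumes that each partition can be reduced to its own N best candidates before the candidates are merged, which is why the final result stays small enough for the driver.

```python
import heapq
from functools import reduce


def take_ordered_sketch(partitions, num, key=None):
    """Hypothetical stand-in for RDD.takeOrdered over plain Python lists."""
    # Each "partition" keeps only its own num smallest elements (by key).
    per_partition = [heapq.nsmallest(num, part, key=key) for part in partitions]
    # Merge the candidate lists pairwise, trimming back to num each time.
    # The initial value [] makes the empty case return an empty list.
    return reduce(
        lambda a, b: heapq.nsmallest(num, a + b, key=key),
        per_partition,
        [],
    )


data = [10, 1, 2, 9, 3, 4, 5, 6, 7]
partitions = [data[:4], data[4:]]  # two illustrative partitions

ascending = take_ordered_sketch(partitions, 6)
descending = take_ordered_sketch(partitions, 6, key=lambda x: -x)
empty = take_ordered_sketch([], 3)

print(ascending)   # [1, 2, 3, 4, 5, 6]
print(descending)  # [10, 9, 7, 6, 5, 4]
print(empty)       # []
```

Negating the value in `key` reverses the order, so the same call returns the largest elements first, matching the second example above.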