pyspark.RDD.takeOrdered

RDD.takeOrdered(num, key=None)

Get the first N elements of an RDD, sorted in ascending order or in the order defined by the optional key function.

New in version 1.0.0.

Parameters
num : int

the number of elements to return (N)

key : function, optional

a function that computes the comparison key for each element

Returns
list

the first N elements in ascending order, or in the order defined by key

Notes

Use this method only when the resulting list is expected to be small, because the entire result is loaded into the driver’s memory.

Examples

>>> sc.parallelize([10, 1, 2, 9, 3, 4, 5, 6, 7]).takeOrdered(6)
[1, 2, 3, 4, 5, 6]
>>> sc.parallelize([10, 1, 2, 9, 3, 4, 5, 6, 7], 2).takeOrdered(6, key=lambda x: -x)
[10, 9, 7, 6, 5, 4]
>>> sc.emptyRDD().takeOrdered(3)
[]
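
The ordering semantics shown above can be modeled without a Spark cluster. The following is a minimal plain-Python sketch, not Spark's own code: the helper name `take_ordered` and the representation of an RDD as a list of partition lists are assumptions made for illustration. It keeps at most `num` elements per partition and then merges the partial results, so only small lists ever need to be combined.

```python
import heapq
from functools import reduce


def take_ordered(partitions, num, key=None):
    """Hypothetical pure-Python model of takeOrdered.

    `partitions` is a list of lists standing in for the partitions of an RDD.
    """
    # Keep at most `num` smallest elements (by `key`) from each partition.
    per_partition = [heapq.nsmallest(num, part, key=key) for part in partitions]
    # Merge partial results pairwise, again keeping only the `num` smallest.
    # The initial value [] makes the empty case return an empty list.
    return reduce(
        lambda a, b: heapq.nsmallest(num, a + b, key=key),
        per_partition,
        [],
    )


data = [[10, 1, 2, 9], [3, 4, 5, 6, 7]]
print(take_ordered(data, 6))                    # [1, 2, 3, 4, 5, 6]
print(take_ordered(data, 6, key=lambda x: -x))  # [10, 9, 7, 6, 5, 4]
print(take_ordered([], 3))                      # []
```

Negating the value in `key` reverses the order, which is why the second call returns the largest elements first.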