pyspark.RDD.pipe#

RDD.pipe(command, env=None, checkCode=False)[source]#

Return an RDD created by piping elements to a forked external process.

New in version 0.7.0.

Parameters

commandstr: command to run.
envdict, optional: environment variables to set.
checkCodebool, optional: whether to check the return value of the shell command.

Returns

Examples

>>> sc.parallelize(['1', '2', '', '3']).pipe('cat').collect()
['1', '2', '', '3']