Drop Column example is incorrect #14

@jdfrost

Regarding: src/main/scala/com/sparkbyexamples/spark/dataframe/examples/DropColumn.scala

I am running these examples in PySpark 3.3 on Azure and noticed that df.drop('colname') does NOT drop the column from the df DataFrame. It returns a new DataFrame without that column, so the column is missing only from the value returned by that statement.

Try these three lines in pyspark:

df.drop("first_name").printSchema()  # prints the schema without the first_name column, same as in your examples

df.drop("first_name")  # run this without displaying output
df.printSchema()  # prints the schema WITH the first_name column

Conclusion: df.drop('col') does NOT change the df DataFrame. To keep the change, the result has to be assigned, for example df = df.drop('col').
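
To make the difference easy to check, here is a minimal sketch of the same experiment wrapped in a function. It assumes df is an existing PySpark DataFrame (for example the one already loaded in the pyspark shell); the helper name columns_before_and_after_drop is hypothetical and not part of the repository or of PySpark. It relies only on DataFrame.drop and DataFrame.columns.

```python
def columns_before_and_after_drop(df, column):
    """Show that drop() leaves df untouched until the result is assigned.

    Assumption: `df` is an existing PySpark DataFrame. This helper is
    hypothetical and only illustrates the behaviour described above.
    """
    # Bare call: the returned DataFrame is discarded, so df keeps the column.
    df.drop(column)
    after_bare_call = list(df.columns)

    # Assigning the result rebinds the local name to the new DataFrame.
    df = df.drop(column)
    after_assignment = list(df.columns)

    return after_bare_call, after_assignment
```

Calling columns_before_and_after_drop(df, "first_name") in the pyspark shell should return two lists: the first still contains first_name, the second does not. The caller's df is not affected either way, because only the local name inside the function is rebound.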
