DataFrame Transformation
API
This method applies a transformation to a provided Spark DataFrame, the specifics of which are determined by the desc
parameter:
- param desc: A natural language string that outlines the specific transformation to be applied on the DataFrame.
- param cache: If
True
, fetches cached data, if available. IfFalse
, retrieves fresh data and updates cache. - return: Returns a new Spark DataFrame that is the result of applying the specified transformation on the input DataFrame.
Example
Given the following DataFrame df
:
df = spark_ai._spark.createDataFrame(
[
("Normal", "Cellphone", 6000),
("Normal", "Tablet", 1500),
("Mini", "Tablet", 5500),
("Mini", "Cellphone", 5000),
("Foldable", "Cellphone", 6500),
("Foldable", "Tablet", 2500),
("Pro", "Cellphone", 3000),
("Pro", "Tablet", 4000),
("Pro Max", "Cellphone", 4500)
],
["product", "category", "revenue"]
)
You can write English to perform transformations. For example:
df.ai.transform("What are the best-selling and the second best-selling products in every category?").show()
product | category | revenue |
---|---|---|
Foldable | Cellphone | 6500 |
Nromal | Cellphone | 6000 |
Mini | Tablet | 5500 |
Pro | Tablet | 4000 |
Category | Normal | Mini | Foldable | Pro | Pro Max |
---|---|---|---|---|---|
Cellphone | 6000 | 5000 | 6500 | 3000 | 4500 |
Tablet | 1500 | 5500 | 2500 | 4000 | null |
For a detailed walkthrough of the transformations, please refer to our transform_dataframe.ipynb notebook.