Data Ingestion
API
```python
spark_ai.create_df(
    desc: str,
    columns: Optional[List[str]] = None,
    cache: bool = True,
) -> DataFrame
```
Given a SparkAI instance `spark_ai`, you can use this method to create a Spark DataFrame by having an LLM extract the data from web search results.
- param desc: a description of the desired DataFrame, used as the web search query
- param columns: the expected column names of the resulting DataFrame
- param cache: if `True`, fetches cached data when available; if `False`, retrieves fresh data and updates the cache
- return: a Spark DataFrame
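The `cache` flag describes a read-through cache with a forced-refresh option. The sketch below models only those documented semantics in plain Python. `CachedFetcher`, its `fetch` callback and `fake_fetch` are hypothetical names used for illustration. They are not part of the pyspark-ai API, and the library's real cache storage may work differently.

```python
from typing import Callable, Dict, List


class CachedFetcher:
    """Hypothetical model of the documented `cache` flag (not pyspark-ai's implementation)."""

    def __init__(self, fetch: Callable[[str], List[dict]]) -> None:
        self._fetch = fetch  # stands in for "search the web and ask the LLM"
        self._cache: Dict[str, List[dict]] = {}

    def get(self, desc: str, cache: bool = True) -> List[dict]:
        # cache=True: return the stored result for this description, if one exists.
        if cache and desc in self._cache:
            return self._cache[desc]
        # cache=False, or a cache miss: fetch fresh data and overwrite the stored entry.
        rows = self._fetch(desc)
        self._cache[desc] = rows
        return rows


# Example: a stand-in fetcher that records how often it is called.
calls: List[str] = []


def fake_fetch(desc: str) -> List[dict]:
    calls.append(desc)
    return [{"fetch_number": len(calls)}]


fetcher = CachedFetcher(fake_fetch)
first = fetcher.get("2022 US auto sales by brand")               # miss: fetches
second = fetcher.get("2022 US auto sales by brand")              # hit: served from cache
fresh = fetcher.get("2022 US auto sales by brand", cache=False)  # forced refresh
```

After these three calls, `fake_fetch` has run twice. The second call was served from the cache, and the third bypassed the cache and replaced the stored entry.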
Example
If you have set up the Google Python client, you can ingest data via a search engine by passing a plain-text description as `desc`.
Otherwise, you can ingest data via a URL:

```python
auto_df = spark_ai.create_df("https://www.carpro.com/blog/full-year-2022-national-auto-sales-by-brand")
```
Take a look at the data:
| rank | brand | us_sales_2022 | sales_change_vs_2021 |
|---|---|---|---|
| 1 | Toyota | 1849751 | -9 |
| 2 | Ford | 1767439 | -2 |
| 3 | Chevrolet | 1502389 | 6 |
| 4 | Honda | 881201 | -33 |
| 5 | Hyundai | 724265 | -2 |
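In the examples above, the single `desc` parameter carries either a search description or a URL, so the two forms must be told apart somewhere. The sketch below shows one way to make that distinction with the standard library's `urllib.parse.urlparse`. `classify_desc` is a hypothetical helper, and the check it performs is an assumption about the kind of test involved, not pyspark-ai's actual logic.

```python
from urllib.parse import urlparse


def classify_desc(desc: str) -> str:
    """Hypothetical helper: return "url" for http(s) URLs, otherwise "search"."""
    parsed = urlparse(desc.strip())
    # Treat the input as a URL only if it has an http/https scheme and a host.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Anything else is assumed to be a plain-text description for a web search.
    return "search"


print(classify_desc("https://www.carpro.com/blog/full-year-2022-national-auto-sales-by-brand"))  # url
print(classify_desc("2022 US national auto sales by brand"))  # search
```

Requiring both a scheme and a host keeps inputs such as `"https://"` or free text containing a colon from being mistaken for a page to fetch.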