Installation and Setup
pyspark-ai can be installed via pip from PyPI:
pip install pyspark-ai
pyspark-ai can also be installed with optional dependencies to enable additional functionality. For example, to install pyspark-ai with the optional dependencies needed to plot data from a DataFrame:
pip install "pyspark-ai[plot]"
For a full list of optional dependencies, see the Optional Dependencies section.
Configuring OpenAI LLMs
As of July 2023, we have found that GPT-4 works best with the English SDK. The model is available to developers through the OpenAI API.
To use OpenAI's Large Language Models (LLMs), set your OpenAI secret key as the OPENAI_API_KEY environment variable. This key can be found in your OpenAI account:
export OPENAI_API_KEY='sk-...'
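A missing key otherwise only shows up when the first request to the API fails, so it can help to check for it up front. The following is a minimal sketch, not part of pyspark-ai: the helper name get_openai_api_key is hypothetical, and it only reads the OPENAI_API_KEY variable described above.

```python
import os


def get_openai_api_key(environ=os.environ):
    """Return the OpenAI key from the environment, or fail with a clear message.

    `environ` is any mapping; it defaults to the real process environment.
    This helper is illustrative only and is not part of pyspark-ai.
    """
    key = environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before creating a SparkAI instance."
        )
    return key
```

Calling get_openai_api_key() before constructing a SparkAI instance turns a missing or empty key into an immediate, readable error.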
By default, SparkAI instances use the GPT-4 model. However, you're encouraged to experiment with other LLMs, which can be passed when initializing SparkAI instances for various use cases.
You can also pass other LLMs when constructing the SparkAI instance, for example an Azure OpenAI chat model.
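A minimal sketch of passing a custom LLM might look like the following. It assumes a LangChain release that exposes AzureChatOpenAI under langchain.chat_models and accepts the deployment_name and model_name keywords (newer releases may differ), and that Azure credentials are already configured in the environment as LangChain expects. The wrapper name build_spark_ai_with_azure is hypothetical.

```python
def build_spark_ai_with_azure(deployment_name, model_name):
    """Create a SparkAI instance backed by an Azure OpenAI chat model.

    Both arguments are placeholders for values from your own Azure
    OpenAI resource; nothing here is specific to a real deployment.
    """
    # Imports are deferred so that defining this helper does not require
    # langchain or pyspark-ai to be installed.
    from langchain.chat_models import AzureChatOpenAI  # assumed import path
    from pyspark_ai import SparkAI

    llm = AzureChatOpenAI(
        deployment_name=deployment_name,
        model_name=model_name,
    )
    spark_ai = SparkAI(llm=llm)  # use the custom LLM instead of the default
    spark_ai.activate()  # activate partial functions for Spark DataFrame
    return spark_ai
```

Any other LangChain-compatible model can be substituted for the Azure model in the same way, by passing it as the llm argument.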
As per Microsoft's Data Privacy page, using the Azure OpenAI service can provide better data privacy and security.
Optional Dependencies
pyspark-ai has many optional dependencies that are only used for specific methods. For example, ingestion via spark_ai.create_df("...") requires the requests package, while plotting via df.plot() requires the plotly package.
If an optional dependency is not installed, pyspark-ai raises an Exception when a method requiring that dependency is called.
If using pip, optional dependencies can be installed as extras, e.g.
pip install "pyspark-ai[ingestion, plot]".
All optional dependencies can be installed with
pip install "pyspark-ai[all]".
Specific groups and their associated dependencies are listed below. For more details about groups, see the README.md.
| Group | Description | Dependencies |
|---|---|---|
| Plot | Generate visualizations for DataFrame | pandas, plotly |
| Vector Search | Improve query generation accuracy in transformations | faiss-cpu, sentence-transformers, torch |
| Ingestion | Ingest data into a DataFrame, from URLs or descriptions | requests, tiktoken, beautifulsoup4, google-api-python-client |
| Spark Connect | Support Spark Connect | grpcio, grpcio-status, pyarrow |
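Because a missing optional dependency only surfaces as an Exception when the relevant method is called, it can be useful to check ahead of time which groups are usable in the current environment. The sketch below is not part of pyspark-ai. It uses the standard library's importlib.util.find_spec, and the import names (for example bs4 for beautifulsoup4, faiss for faiss-cpu, grpc for grpcio) are assumptions about how each listed package is imported.

```python
from importlib.util import find_spec

# Group labels follow the table above. Import names are assumptions and
# differ from the PyPI package names in several cases.
OPTIONAL_GROUPS = {
    "Plot": ["pandas", "plotly"],
    "Vector Search": ["faiss", "sentence_transformers", "torch"],
    "Ingestion": ["requests", "tiktoken", "bs4", "googleapiclient"],
    "Spark Connect": ["grpc", "grpc_status", "pyarrow"],
}


def missing_modules(module_names):
    """Return the top-level modules that cannot be found, without importing them."""
    return [name for name in module_names if find_spec(name) is None]


def report_missing(groups=OPTIONAL_GROUPS):
    """Map each group label to the list of its modules that are not installed."""
    return {group: missing_modules(mods) for group, mods in groups.items()}


if __name__ == "__main__":
    for group, missing in report_missing().items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{group}: {status}")
```

Running the script prints one line per group. A group reported as missing can then be installed through the corresponding pip extra.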