Spark’s primary abstraction is a distributed collection of items called a Dataset. Datasets can be created from Hadoop InputFormats (such as HDFS files) or by transforming other Datasets.

Downloading. Get Spark from the downloads page of the project website. This documentation is for Spark version 3.5.5. Spark uses Hadoop’s client libraries for HDFS and YARN.