In the previous post we studied how to calculate a histogram on an RDD[String] using the aggregate method; in this post we move on to data manipulation with Spark DataFrames. Once our datasets are saved as Spark DataFrames, we can apply standard data manipulation techniques to explore and visualize them. A DataFrame, once created, can be manipulated using the various domain-specific-language (DSL) functions defined in the DataFrame class itself, the Column class, and the functions object. Grouping can be expressed either with the GROUP BY clause in SQL queries or with the DSL syntax of the DataFrame/Dataset APIs; note that the DataFrame groupBy is a variant that can only group by existing columns using column names (i.e., not by arbitrary expressions). While you cannot modify a column as such, you can operate on a column and return a new DataFrame reflecting that change. Columns are referenced with expressions such as col("age") (in Java as well as Scala), and the Column type can be manipulated further through its various functions. When two DataFrames are joined on a common column such as "DepartmentID", Spark automatically removes the duplicated column, so column names stay unique and no table prefix is needed to address them.

Aggregation exists at several levels in Spark. On RDDs there are aggregate() and the PairRDDFunctions methods, notably aggregateByKey(), which aggregates the values for each key and adds the ability to return a different value type; groupBy() can be used on both unpaired and paired RDDs. On DataFrames, the aggregate methods are similar to what we have in SQL, like MAX, MIN, and SUM, and since the 1.4 release DataFrames in Apache Spark provide improved support for statistical and mathematical functions, including random data generation, summary and descriptive statistics, sample covariance and correlation, cross tabulation, frequent items, and mathematical functions; user-defined aggregate functions (UDAFs) in Scala cover whatever the built-ins miss. A pivot is an aggregation where one (or more, in the general case) of the grouping columns has its distinct values transposed into individual columns. Row-wise aggregation, such as taking the mean over the columns of a DataFrame rather than down the rows, is also possible by combining column expressions directly. Finally, when a DataFrame is created from a Seq of case class instances, the names of the arguments to the case class are read using reflection and become the names of the columns.

This is easiest to demonstrate with an example, so as a running dataset we will use a CSV containing the list of restaurant inspections in NYC.
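Here is a minimal, self-contained sketch of that setup; the file path and the BORO column name are assumptions about the inspections CSV, not anything fixed by Spark.

```scala
import org.apache.spark.sql.SparkSession

object InspectionStats extends App {
  // Local session for experimentation; any Spark 2.x+ distribution behaves the same.
  val spark = SparkSession.builder()
    .appName("Test")
    .master("local[*]")
    .getOrCreate()

  // header picks up the column names; inferSchema derives the column types.
  val inspections = spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/tmp/nyc_inspections.csv") // hypothetical path to the downloaded CSV

  // The same GROUP BY, expressed through the DSL: inspection counts per borough.
  inspections.groupBy("BORO").count().show() // "BORO" is an assumed column name
}
```

The equivalent SQL form, after registering the DataFrame with inspections.createOrReplaceTempView("inspections"), would be spark.sql("SELECT BORO, COUNT(*) FROM inspections GROUP BY BORO").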
A DataFrame lets us work in a context of rows and columns: in Apache Spark, a DataFrame is a distributed collection of data grouped into named columns, one of the major abstractions in SparkSQL and similar to the DataFrame construct found in R and Pandas (in pandas, for comparison, dropping a variable requires axis=1 to denote that we mean a column, not a row). Each Scala type we are used to seeing has a corresponding SQL type that Spark SQL sees, so values move between the two worlds predictably. Sparkour, an open-source collection of programming recipes for Apache Spark designed as an approachable, understandable, and actionable cookbook for distributed data processing, is a good companion reference for this material.

On the SQL side, the familiar aggregate functions are available: MAX(Column_Name), MIN(Column_Name), COUNT(Column_Name), AVG(Column_Name); they are conventionally written in capital letters, although SQL itself is case-insensitive. On the DataFrame side, groupBy is a transformation operation, and the available aggregate methods are defined in functions; this matters for teams migrating workloads, for example from Hive queries to Spark SQL. While there are many functions in the PairRDDFunctions class, the one worth singling out is aggregateByKey, which requires three parameters: a zero value, a function that folds a value into the accumulator within a partition, and a function that merges accumulators across partitions.

Two practical techniques round this out. First, dropDuplicates removes duplicates purely based on a subset of columns while retaining all columns of the original data frame, which is what you want when grouping by every non-aggregated column is not an option. Second, the Scala foldLeft method can be used to iterate over a data structure and perform multiple operations on a Spark DataFrame in one pass, for example adding or renaming several columns without separate add and delete steps. In order to explain the usage of an aggregate function such as collect_set, let us create a DataFrame with 3 columns.
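A sketch of that demonstration; the dept/name/amount schema is made up for illustration. collect_set gathers the distinct values per group, next to an ordinary sum.

```scala
import org.apache.spark.sql.functions.{collect_set, sum}
import spark.implicits._ // assumes the SparkSession from earlier is in scope as `spark`

// Three illustrative columns.
val sales = Seq(
  ("dept1", "alice", 100),
  ("dept1", "bob",   200),
  ("dept2", "carol", 300),
  ("dept1", "alice", 150)
).toDF("dept", "name", "amount")

sales.groupBy("dept")
  .agg(collect_set("name").as("names"), sum("amount").as("total"))
  .show(false)
// dept1 -> names [alice, bob], total 450; dept2 -> names [carol], total 300
```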
This post continues the discussion of RDDs, DataFrames, and Datasets using the Spark Scala APIs. The Apache Spark DataFrame API provides a rich set of functions (select columns, filter, join, aggregate, and so on) that allow you to solve common data analysis problems efficiently, and there is a lot of cool engineering behind it, such as code generation, manual memory management, and the Catalyst optimizer. Datasets are the typed counterpart: the simplest way to create one is to apply a toDS() function to a sequence of numbers, and for case classes an encoder tells Spark how to serialize the objects; for example, given a class Person with two fields, name (string) and age (int), the encoder generates code at runtime to serialize a Person object into a binary structure. To experiment, start Spark's interactive shell and create a SQL context with val sqlContext = new org.apache.spark.sql.SQLContext(sc); if you are on SnappyData, a SnappySession extends the SparkSession, so you can mutate data and get much higher performance after importing the Snappy extensions.

One pitfall to be aware of: when we generate a DataFrame by grouping and then join the aggregate column back onto the original DataFrame, we get an AnalysisException, because the ambiguous column names confuse the analyzer; renaming the aggregate column before the join avoids it. Window functions, which Spark supports from release 1.4 onward, are often a cleaner alternative to that group-then-join pattern. For custom aggregates there are two routes: write a UDAF and register it for use in Apache Spark SQL, or, if what you really want is extra Column methods, monkey-patch the org.apache.spark.sql.Column class yourself or leverage the spark-daria project. A small language aside that helps when class names collide: Scala offers a cool feature where you can rename a class when you import it, and it works for both Scala and Java classes; the usual motivation is avoiding namespace clashes.

Returning to the histogram mentioned at the top: we can create a dummy RDD[String] and apply the aggregate method to calculate the histogram, where the second function passed to aggregate merges two partial results.
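A minimal sketch, assuming one token per RDD element: the first function folds an element into a partition-local map, and the second (the one highlighted above) merges two partial maps.

```scala
// Reuses the `spark` session from earlier.
val words = spark.sparkContext.parallelize(Seq("a", "b", "a", "c", "b", "a"))

val histogram = words.aggregate(Map.empty[String, Int])(
  // seqOp: add one element to the partition-local histogram
  (acc, word) => acc + (word -> (acc.getOrElse(word, 0) + 1)),
  // combOp: merge two partial histograms coming from different partitions
  (m1, m2) => m2.foldLeft(m1) { case (acc, (k, v)) =>
    acc + (k -> (acc.getOrElse(k, 0) + v))
  }
)
println(histogram) // e.g. Map(a -> 3, b -> 2, c -> 1)
```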
Back to columns. To rename a column you just have to make use of the withColumnRenamed() method; later we will also check how to drop an existing column and rename columns in bulk. Despite the in-place look of these calls, a DataFrame is immutable: it is actually a wrapper around RDDs, the basic data structure in Spark, and every operation returns a new DataFrame, with the operations themselves very similar to those available in the data frame abstraction in R or Python. You can access the standard functions using the following import statement in your Scala application: import org.apache.spark.sql.functions._. In-memory computation and parallel processing are some of the major reasons Apache Spark has become so popular in the big data industry for dealing with data products at large scale. Left outer join is a very common operation, especially if there are nulls or gaps in the data. Two grouping details are worth knowing: the GROUPING__ID function is the solution when grouping sets introduce nulls that may conflict with genuine null values in the column itself, and setting spark.sql.retainGroupColumns to false drops the grouping columns from the aggregate output. One renaming case is genuinely awkward: is there a convenient way to rename multiple columns of a Dataset when the key column is a struct (due to a groupBy operation)? Imposing a schema with as does not work well, because there is no direct way to define a case class with a StructType in it; renaming the struct fields through select expressions is the practical way out. Counting fully duplicated rows, incidentally, comes down to comparing the row count before and after dropDuplicates.

On the RDD side, this brings us back to aggregateByKey. An aggregation function used with the by-key operators should have two important properties: it should be associative and commutative, so that partial results can be merged in any order across partitions. groupByKey offers no such pre-aggregation; it is very expensive and requires a complete reshuffle of all of your data to ensure all records with the same key end up on the same Spark worker node, which is why aggregateByKey (or reduceByKey) is preferred. An example is finding the minimum temperature of a city.
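A sketch of those three parameters in action on the temperature example; the city names and readings are invented.

```scala
val temps = spark.sparkContext.parallelize(Seq(
  ("nyc", 31), ("nyc", 28), ("boston", 25), ("boston", 30)
))

// Parameter 1 is the zero value; the two functions are the per-partition
// seqOp and the cross-partition combOp. Here the accumulator type matches
// the value type, but aggregateByKey also allows it to differ.
val minTemp = temps.aggregateByKey(Int.MaxValue)(
  (acc, t) => math.min(acc, t), // fold one reading into the running minimum
  (a, b) => math.min(a, b)      // merge two partition-local minimums
)
minTemp.collect().foreach(println) // (nyc,28) and (boston,25)
```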
A few more column-level tools deserve a quick tour. The isin method on Column takes a sequence of values and produces a boolean expression that is evaluated to true if the value of this expression is contained in the evaluated values of the arguments. The explode family returns a new DataFrame where each row has been expanded to zero or more rows by the provided function, which is similar to a LATERAL VIEW in HiveQL, and related functions extract a value or values from a complex type such as an array, map, or struct. Adding a new column with a constant value is done with withColumn and lit; for example, a new column named "new_col" with a constant value will carry that value for every row in the DataFrame. Casting works through the DataFrame API too, and creating a new column this way does not take more time than going through Spark SQL. When reading data, the spark.sql.parquet.binaryAsString flag tells Spark SQL to treat binary-encoded data as strings, and option("inferSchema", "true") asks the CSV reader to infer the column types.

Renaming with regular expressions needs one caveat: when replacing a column named alpha, write the pattern as ^alpha$. The ^ and $ surrounding alpha are there to ensure that the entire string matches; without them, if there were a column named alphabet, it would also match, and the replacement would be onebet. And when the built-in aggregates are not enough, a Spark UDAF can calculate, for example, the most common element in a column, that is, the statistical mode for a given column.

Spark DataFrames were introduced in early 2015, in Spark 1.3, and a DataFrame in Spark is similar to a SQL table, an R dataframe, or a pandas dataframe. They build on the usual Spark execution engine, where the main abstraction is the RDD, the Resilient Distributed Dataset, which you can think of as a replicated, parallelised collection. (I recently took the Big Data Analysis with Scala and Spark course on Coursera, which covers this background, and I highly recommend it.) Window functions give us sliding aggregates over such data: we will create the window by first ordering the data by the "uxt" timestamp column, and then creating a range that goes from 3600 time units before this row's timestamp up to this row.
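A sketch of that window, assuming a DataFrame events with a numeric Unix-timestamp column uxt and a value column to sum; the 3600 is interpreted in whatever units uxt carries.

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

// Range frame: from 3600 time units before this row's uxt up to the row itself.
val lastHour = Window
  .orderBy("uxt")
  .rangeBetween(-3600L, Window.currentRow)

// `events` and its "uxt"/"value" columns are assumptions for illustration.
val withRollingSum = events.withColumn("rolling_sum", sum("value").over(lastHour))
```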
We can also perform aggregation on some specific columns, which is equivalent to the GROUP BY clause we have in typical SQL. The first method we can use is agg, called on the grouped data returned by groupBy. The same result can be achieved with plain SQL through spark.sql(query), but the syntax is a little cumbersome, which is why the DSL usually wins for this kind of work. When the source is a CSV file, handling headers and column types up front (as we did with the header and inferSchema options) keeps the aggregation code clean. For key/value RDDs built from tuples such as (1, 2), remember that Spark treats the first item (i.e., 1) as the key and the second item (i.e., 2) as the associated value. The Spark Scala shell automatically creates a Spark context called sc and a Spark SQL context called sqlContext, and we will use both of these in the commands below. Since Spark 1.4, the better way to deduplicate is the dropDuplicates DataFrame API rather than hand-rolled groupBy tricks. And when a transformation would clobber a column, the standard workaround is to write the result into a temporary column, drop the old one, and then rename the new column back to the original name; you don't need separate add and delete steps when withColumnRenamed alone will do.

Apache Spark has emerged as the premium tool for big data analysis, and Scala is the preferred language for writing Spark applications, although it also has great APIs for Java and Python; part of the appeal is that setting up a cluster using just bare metal machines can be quite complicated and expensive. Internally, DataFrames hold data in a columnar format, which is one reason aggregations are fast. As a smaller aside, the aggregate function itself is applicable to both Scala's mutable and immutable collection data structures, not only to RDDs.

Pivot was first introduced in Apache Spark 1.6. A pivot transposes the distinct values of a grouping column into individual output columns; the canonical example is to compute the sum of earnings for each year by course, with each course as a separate column.
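The earnings example, sketched with invented figures; listing the pivot values explicitly spares Spark an extra pass to discover them.

```scala
import spark.implicits._

val earnings = Seq(
  (2012, "dotNET", 10000), (2012, "Java", 20000),
  (2013, "dotNET", 48000), (2013, "Java", 30000)
).toDF("year", "course", "earnings")

// Each distinct course value becomes its own output column.
earnings.groupBy("year")
  .pivot("course", Seq("dotNET", "Java"))
  .sum("earnings")
  .show()
// year | dotNET | Java
// 2012 |  10000 | 20000
// 2013 |  48000 | 30000
```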
If you need to select only some columns and rename them at the same time, select with aliases is the other renaming option, complementing the withColumnRenamed and foldLeft approaches above. Clean names matter more than you might expect: column names containing whitespace, such as those sometimes read from a Parquet file, need to be removed or replaced before downstream operations will accept them, and well-named columns help to validate and analyze the data. Watch out for names that collide with API methods as well, for instance trying to call groupBy on a column literally named "count". Related column gymnastics include writing a UDF to transform a single data frame column into multiple columns, and wishing for a generic reduceBy function that works like an RDD's reduceByKey but lets you group data by any column in a Spark DataFrame; groupBy followed by agg is the idiomatic answer to the latter. To aggregate many columns at once you can build the expressions programmatically, as in val exprs = df.columns.map(_ -> "sum").toMap followed by df.groupBy("key").agg(exprs). For completeness, Spark's ORC data source supports the complex types (array, map, and struct) and provides read and write access to ORC files. Occasionally the DataFrame route is simply convenient for delivery: the output artifact of a Spark job may be a top percentage aggregate count of your data, written as a CSV that a non-technical co-worker needs for a report. Below is a sketch of both renaming options side by side.
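The fname/lname columns and the rename map are illustrative assumptions; df stands for any DataFrame that has them.

```scala
import org.apache.spark.sql.functions.col

val renames = Map("fname" -> "first_name", "lname" -> "last_name")

// Option 1: select only the columns you need, renaming as you go.
val subset = df.select(col("fname").as("first_name"), col("lname").as("last_name"))

// Option 2: foldLeft over the map, renaming in place while keeping every column.
val renamed = renames.foldLeft(df) { case (acc, (from, to)) =>
  acc.withColumnRenamed(from, to)
}
```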
A few casting and nested-column patterns round out the toolkit. When you have nested columns on a Spark DataFrame and want to rename one, use withColumn to create a new column from the existing one, for instance building an "fname" column from the nested "name.firstname" field, and then drop the "old" column. Casting follows the same shape: withColumn("age", col("age").cast(IntegerType)) reuses the column name, so the column will be replaced with the new one rather than added. Aggregating multiple columns with methods that cannot be called directly from the GroupedData class (like last()) works the same way: pass the corresponding functions expressions to agg and then rename the results back to the original names, instead of processing each column in its own loop. Though we have covered most of the examples here in Scala, the same concept can be used in PySpark to rename a DataFrame column. So clearly, the select and rename operations have had an effect in terms of how the Spark DataFrame is used.

Some context notes. An Azure Databricks table is a collection of structured data, and tables are equivalent to Apache Spark DataFrames. Scala interoperates with Java, so you can create Java objects, call their methods, and inherit from Java classes transparently from Scala; just remember that once you alias a class during import, the original (real) name of the class can't be used in your code. A window specification, to compare with grouped aggregation, defines a frame corresponding to the current row and returns a new value for each row via an aggregate/window function, and you can use SQL grammar or the DataFrame API to express it. On the history side, SPARK-9576 is the ticket for Spark 1.6 under which Pearson correlation aggregation functionality was added to Spark SQL, and a later pull request changed the name of the columns returned by the SHOW PARTITIONS and SHOW COLUMNS commands, which until then both used `result` as the column name.

Finally, when datasets are described in terms of key/value pairs, it is common to want to aggregate statistics across all elements with the same key. Reduce is an aggregation of elements using a function, and on pair RDDs the per-key variants are reduceByKey and foldByKey, with flatMap often used to prepare the pairs in the first place.
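A small sketch of both per-key operators on an invented pair RDD; as noted earlier, the merge function should be associative and commutative.

```scala
val pairs = spark.sparkContext.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// reduceByKey merges the values of each key with the given function.
val summed = pairs.reduceByKey(_ + _)   // (a,4), (b,2)

// foldByKey does the same but starts each key from an explicit zero value.
val folded = pairs.foldByKey(0)(_ + _)  // same result

summed.collect().foreach(println)
```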
To wrap up: the aggregation function is just one kind of expression in Spark SQL, and the Column class defines the usual column operations, such as the minus operator shown below. Everything in this post was written and tested against a SparkSession on Spark 2.x, and there are a few ways to read data into Spark as a dataframe beyond the CSV reader we used. If you are interested in using Python instead, check out the Spark SQL JSON in Python tutorial; its companion covers using Spark SQL with a JSON file input data source in Scala. For MongoDB users, Spark can be configured to connect to MongoDB, load data, and write queries, and depending on the dataset, filtering data using MongoDB's aggregation framework may perform more efficiently than the direct use of RDD filters and dataset filters; depending on your setup, start the pyspark shell with a --packages command-line argument to pull in the connector. Apache Spark also ships as a component of IBM Open Platform with Apache Spark and Apache Hadoop. A few loose ends: the 2.4 release extends the pivoting functionality to SQL users as well; R users keep asking for a simple, straightforward way to extract the current names of columns in sparklyr, a la names() in base R; and as of today there is no single well-organized reference listing the Spark SQL data types, which is worth remembering when you cast. If we want the mean of a single column of our choice, rather than of every numeric column, we select it explicitly, and selecting a particular column works the same way in PySpark as it does in Scala.
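A closing sketch tying the pieces together; fname, gross, tax, and age are hypothetical column names on an assumed DataFrame people.

```scala
import org.apache.spark.sql.functions.{col, mean}

// withColumnRenamed returns a new DataFrame; the original is left untouched.
val renamed = people.withColumnRenamed("fname", "first_name")

// Column arithmetic with the minus operator mentioned above.
val withNet = people.withColumn("net", col("gross") - col("tax"))

// The mean of a single column of our choice, and selecting a particular column.
people.select(mean(col("age"))).show()
people.select(col("age")).show()
```

That covers the core manipulation toolkit: reference a column, operate on it, rename it, and aggregate over it.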