CSV Files in Scala
2025-03-22 04:15 UTC by Emmanouil Varvarigos
1. Introduction
In this tutorial, we’ll demonstrate different ways to read and write CSV files in Scala. Of course, our demonstration wouldn’t be complete if we didn’t explore some of the most popular libraries that provide CSV read and write capabilities.
Hence, we’ve included sections on the Scala CSV, OpenCSV, and Apache Commons CSV libraries.
2. Building Blocks
To make our examples comparable and easier to work with, let’s define two Scala traits that every CSV library will implement, namely the CommaSeparatedValuesWriter and the CommaSeparatedValuesReader.
Specifically, the CommaSeparatedValuesReader defines a method that yields a convenient CSV file digest:
import java.io.File

trait CommaSeparatedValuesReader {
  def read(file: File): CSVReadDigest
}
The CSVReadDigest, which is the return type of the read function, contains headers and rows as properties:
case class CSVReadDigest(headers: Seq[String], rows: Seq[Seq[String]])
Similarly to the CommaSeparatedValuesReader, the CommaSeparatedValuesWriter defines a method that writes headers and rows to a given file:
import java.io.File
import scala.util.Try

trait CommaSeparatedValuesWriter {
  def write(
    file: File,
    headers: Seq[String],
    rows: Seq[Seq[String]]
  ): Try[Unit]
}
3. PrintWriter and BufferedReader
Before we import any external libraries, it’s useful to showcase how we can write and read a CSV file using readily available Java tools, the PrintWriter and BufferedReader.
3.1. Write
The SimpleCSVWriter prints the file contents by iterating through the input data and appending them to the underlying PrintWriter:
class SimpleCSVWriter extends CommaSeparatedValuesWriter {
  override def write(
    file: File,
    headers: Seq[String],
    rows: Seq[Seq[String]]
  ): Try[Unit] = Try {
    val writer = new PrintWriter(file)
    writer.println(headers.mkString(","))
    rows.foreach(row => writer.println(row.mkString(",")))
    writer.close()
  }
}
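The writer above calls close() only when no exception is thrown inside the Try block. As a hedged alternative, assuming Scala 2.13 or later, scala.util.Using can manage the PrintWriter so that it is closed in either case. The names SafeCsvWriteSketch and writeCsv are hypothetical and not part of the tutorial's traits:

```scala
import java.io.{File, IOException, PrintWriter}
import scala.util.{Try, Using}

object SafeCsvWriteSketch {
  // Hypothetical helper: Using.apply closes the PrintWriter even if the
  // body throws, and wraps the outcome in a Try.
  def writeCsv(
    file: File,
    headers: Seq[String],
    rows: Seq[Seq[String]]
  ): Try[Unit] =
    Using(new PrintWriter(file)) { writer =>
      writer.println(headers.mkString(","))
      rows.foreach(row => writer.println(row.mkString(",")))
      // PrintWriter does not throw IOExceptions on write;
      // checkError() flushes the stream and reports whether one occurred.
      if (writer.checkError())
        throw new IOException(s"Failed writing ${file.getPath}")
    }
}
```

Like the SimpleCSVWriter, this sketch joins values with plain commas and doesn't quote or escape them.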
3.2. Read
The SimpleCSVReader reads a file’s contents by leveraging the BufferedReader class. The file contents are consumed by recursively calling the readLinesRecursively method, which invokes the BufferedReader readLine method for each line until the end of the file is reached:
import java.io.{BufferedReader, File, FileInputStream, InputStreamReader}
import scala.annotation.tailrec

class SimpleCSVReader extends CommaSeparatedValuesReader {
  override def read(file: File): CSVReadDigest = {
    val in = new InputStreamReader(new FileInputStream(file))
    val bufferedReader = new BufferedReader(in)

    @tailrec
    def readLinesRecursively(
      currentBufferedReader: BufferedReader,
      result: Seq[Seq[String]]
    ): Seq[Seq[String]] = {
      currentBufferedReader.readLine() match {
        case null => result // readLine returns null at the end of the file
        case line =>
          readLinesRecursively(
            currentBufferedReader,
            // A limit of -1 keeps trailing empty fields, as in "a,b,,"
            result :+ line.split(",", -1).toSeq
          )
      }
    }

    // Vector appends in effectively constant time, unlike List
    val csvLines = readLinesRecursively(bufferedReader, Vector.empty)
    bufferedReader.close()
    // head throws NoSuchElementException if the file is empty
    CSVReadDigest(
      csvLines.head,
      csvLines.tail
    )
  }
}
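One subtlety when splitting lines by hand is that String.split with a single argument discards trailing empty strings, so a row whose last columns are empty comes back shorter than the header. A minimal sketch of the difference follows (the object and method names are hypothetical):

```scala
object SplitPitfallSketch {
  // Default split: trailing empty strings are removed from the result.
  def naiveSplit(line: String): Seq[String] = line.split(",").toSeq

  // A negative limit keeps every field, including trailing empty ones.
  def keepEmptySplit(line: String): Seq[String] = line.split(",", -1).toSeq
}
```

For the input a,b,, the first method yields two fields and the second yields four. Neither variant understands quoted values, so a field such as "Doe, Jane" is still split at its comma.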
4. Scala CSV
Scala CSV is a Scala library whose reader and writer accept and return Scala data structures, which makes CSV handling less cumbersome since no Java-to-Scala conversions are needed.
4.1. Write
The library’s class that provides CSV write capabilities is the CSVWriter. The ScalaCSVWriter wraps the CSVWriter, which trivializes the task of writing lines to a CSV file by exposing functions that accept Seq[Any] arguments:
import java.io.File
import scala.util.Try
import com.github.tototoshi.csv.CSVWriter

class ScalaCSVWriter extends CommaSeparatedValuesWriter {
  override def write(
    file: File,
    headers: Seq[String],
    rows: Seq[Seq[String]]
  ): Try[Unit] = Try {
    val writer = CSVWriter.open(file)
    writer.writeRow(headers)
    writer.writeAll(rows)
    writer.close()
  }
}
4.2. Read
Likewise, reading a file with the Scala CSV CSVReader is straightforward. The reader’s all method returns the file contents as a List[List[String]], hence the ScalaCSVReader wrapper implementation is quite short:
import java.io.File
import com.github.tototoshi.csv.CSVReader

class ScalaCSVReader extends CommaSeparatedValuesReader {
  override def read(file: File): CSVReadDigest = {
    val reader = CSVReader.open(file)
    val all = reader.all() // every line, header line included
    reader.close()
    CSVReadDigest(all.head, all.tail)
  }
}
5. OpenCSV
OpenCSV is a popular and widely used Java library for reading and writing CSV files, so we’ll create our own Scala wrapper implementations to showcase it.
5.1. Write
To use the writeAll method of the OpenCSV CSVWriter, our input must first be transformed into a Java Iterable of String arrays. Let’s write the OpenCSVWriter:
import java.io.{BufferedWriter, File, FileWriter}
import scala.util.Try
// Scala 2.13+; on 2.12, scala.collection.JavaConverters._ provides asJava
import scala.jdk.CollectionConverters._
import com.opencsv.CSVWriter

class OpenCSVWriter extends CommaSeparatedValuesWriter {
  override def write(
    file: File,
    headers: Seq[String],
    rows: Seq[Seq[String]]
  ): Try[Unit] = Try(
    new CSVWriter(new BufferedWriter(new FileWriter(file)))
  ).flatMap((csvWriter: CSVWriter) =>
    Try {
      csvWriter.writeAll(
        (headers +: rows).map(_.toArray).asJava,
        false // don't wrap every value in quotes
      )
      csvWriter.close()
    }
  )
}
Furthermore, the OpenCSV CSVWriter class includes methods that accept java.sql.ResultSet arguments, which makes it very useful when dealing with data fetched from databases.
5.2. Read
Similar to the SimpleCSVReader, the OpenCSVReader uses recursion to read a file’s contents. The recursive method readLinesRecursively collects the CSV rows by calling the CSVReader’s readNext method repeatedly, much like an iterator:
import java.io.{File, FileInputStream, InputStreamReader}
import scala.annotation.tailrec
import com.opencsv.CSVReader

class OpenCSVReader extends CommaSeparatedValuesReader {
  override def read(file: File): CSVReadDigest = {
    val reader = new CSVReader(
      new InputStreamReader(new FileInputStream(file))
    )

    @tailrec
    def readLinesRecursively(
      currentReader: CSVReader,
      result: Seq[Seq[String]]
    ): Seq[Seq[String]] = {
      currentReader.readNext() match {
        case null => result // readNext returns null at the end of the file
        case line => readLinesRecursively(currentReader, result :+ line.toSeq)
      }
    }

    // Vector appends in effectively constant time, unlike List
    val csvLines = readLinesRecursively(reader, Vector.empty)
    reader.close()
    CSVReadDigest(
      csvLines.head,
      csvLines.tail
    )
  }
}
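Both recursive readers rely on the same contract: a read method that returns null once the input is exhausted. As a sketch, the same pull loop can be written without explicit recursion using Iterator.continually. To keep the example free of external dependencies, it uses a BufferedReader as a stand-in for the CSVReader, and the names PullLoopSketch and readAllLines are hypothetical:

```scala
import java.io.{BufferedReader, StringReader}

object PullLoopSketch {
  // Calls readLine repeatedly and stops at the first null,
  // which is how BufferedReader signals the end of the input.
  def readAllLines(reader: BufferedReader): Vector[String] =
    Iterator
      .continually(reader.readLine())
      .takeWhile(_ != null)
      .toVector
}
```

A call such as PullLoopSketch.readAllLines(new BufferedReader(new StringReader("h1,h2\n1,2\n"))) collects the two lines in order. The iterator is lazy, so the reader must stay open until toVector has consumed it.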
6. Apache Commons CSV
The Apache Commons CSV library enables us to read and write CSV files in various formats. It’s been evolving since 2014 and is widely used, mainly by Java projects.
6.1. Write
In contrast with the other implementations in our examples, the Apache Commons CSVPrinter’s format is configured using a second constructor argument, the CSVFormat. The CSVFormat builder allows us to choose a base format and override any property we need. In our example, the ApacheCommonsCSVWriter, we override the header property of the default CSVFormat:
import java.io.{File, FileWriter}
import scala.util.Try
import org.apache.commons.csv.{CSVFormat, CSVPrinter}

class ApacheCommonsCSVWriter extends CommaSeparatedValuesWriter {
  override def write(
    file: File,
    headers: Seq[String],
    rows: Seq[Seq[String]]
  ): Try[Unit] = Try {
    // With a header configured, the printer writes the header line itself
    val csvFormat = CSVFormat.DEFAULT
      .builder()
      .setHeader(headers: _*)
      .build()
    val out = new FileWriter(file)
    val printer = new CSVPrinter(out, csvFormat)
    rows.foreach(row => printer.printRecord(row: _*))
    printer.close()
  }
}
6.2. Read
The CSVFormat is also used for CSV parsing, but in a different way. Note that in our implementation, calling setHeader with no arguments is the Apache Commons CSV way of telling the parser to read the headers from the first line of the file. The call to the parse method of the CSVFormat yields a CSVParser, which provides a variety of methods for reading the records of the input file. So, let’s write our example, keeping in mind that the returned objects need Java-to-Scala conversions:
import java.io.{File, FileInputStream, InputStreamReader}
// Scala 2.13+; on 2.12, scala.collection.JavaConverters._ provides asScala
import scala.jdk.CollectionConverters._
import org.apache.commons.csv.CSVFormat

class ApacheCommonsCSVReader extends CommaSeparatedValuesReader {
  override def read(file: File): CSVReadDigest = {
    val in = new InputStreamReader(new FileInputStream(file))
    val csvParser = CSVFormat.DEFAULT
      .builder()
      .setHeader() // no arguments: take the header names from the first line
      .build()
      .parse(in)
    val result = CSVReadDigest(
      csvParser.getHeaderNames.asScala.toSeq,
      csvParser.getRecords.asScala.map(r => r.values().toSeq).toSeq
    )
    csvParser.close()
    result
  }
}
7. CSV Delimiters
Before we conclude our short tutorial, it’s crucial to mention the most frequent issue encountered in CSV datasets: commas inside the delimited values, which lead to parse errors and wrong results.
A good compromise is to choose a delimiter that’s very unlikely to appear in our data, such as a special character or a more complex sequence of characters. Alternatively, values can be enclosed in quotes, but then any quotes inside the values must be escaped as well.
To conclude, we should know the contents of our files well before choosing a delimiter for our dataset.
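To make the quoting option concrete, here is a minimal sketch that follows the common RFC 4180 convention: a value is wrapped in double quotes when it contains a comma, a quote, or a line break, and embedded quotes are doubled. The object and method names are hypothetical, and the parser handles a single line only, so it's an illustration rather than a replacement for the libraries above:

```scala
object CsvQuotingSketch {
  // Quote a field only when needed; double any embedded quotes.
  def escapeField(field: String): String =
    if (field.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + field.replace("\"", "\"\"") + "\""
    else field

  def toCsvLine(fields: Seq[String]): String =
    fields.map(escapeField).mkString(",")

  // Splits one line into fields, honoring quotes.
  // Assumption: fields don't span multiple lines.
  def parseLine(line: String): Vector[String] = {
    val fields = Vector.newBuilder[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (inQuotes) {
        if (c == '"') {
          if (i + 1 < line.length && line.charAt(i + 1) == '"') {
            current.append('"') // doubled quote stands for a literal quote
            i += 1
          } else {
            inQuotes = false // closing quote
          }
        } else {
          current.append(c) // commas inside quotes are ordinary characters
        }
      } else if (c == '"') {
        inQuotes = true
      } else if (c == ',') {
        fields += current.toString
        current.clear()
      } else {
        current.append(c)
      }
      i += 1
    }
    fields += current.toString
    fields.result()
  }
}
```

With this sketch, a value such as Doe, Jane survives a round trip through toCsvLine and parseLine as a single field instead of being split at its comma.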
8. Conclusion
In this short tutorial, we demonstrated how to read and write CSV files with Scala.
Additionally, we included some examples with Scala and Java libraries for good measure. Finally, we highlighted the common delimiter issue that developers usually have to solve.
The post CSV Files in Scala first appeared on Baeldung on Scala.