This CSV file reader avoids Java’s String.split() in favor of a more comprehensive approach to organizing and parsing string data for easy access and reading.
CSV files are extensively used in data interchange between applications. They’re especially useful when the only structure to the data being exchanged is rows and columns. This format is particularly popular as the data can be imported into Microsoft Excel and used for charts and visualization.
In this article, we present an easy-to-use class for parsing and reading CSV data in Java. The class allows retrieval of each row of the CSV file as an array of columns. This row can then be processed further for filtering, inserting into a database, etc.
The CSV format defines certain conventions that are commonly used by applications that import and export CSV data. One of the most common applications used for visualizing CSV data is Excel. Many applications, including Excel, follow these conventions, so a CSV reader must take them into consideration. Some of these are:
Commas are used to separate fields, and a Carriage-Return Line-Feed (CRLF) combination is used to separate rows.
When a comma needs to be included as part of a field value, the field must be enclosed in double quotes (").
Multi-line fields can be present in the CSV file and these fields must also be quoted.
Double quotes can be included within a field by repeating the double-quote character.
A CSV file generated by an application on Windows might include a BOM (Byte Order Mark) character at the very beginning of the file. This character, if present, can be used to determine the encoding of the file from among UTF-8, UTF-16BE (Big Endian) or UTF-16LE (Little Endian). The CSV Reader module (presented next) uses a routine to strip the BOM from the CSV file.
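The BOM check described above can be sketched as follows. This is only an illustration, not the routine used by the CSV Reader module: the class name `BomSkipper`, its methods `open` and `decode`, and the choice of UTF-8 as the fallback when no BOM is found are all assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: detects and skips a leading BOM.
public class BomSkipper {

    // Returns a Reader positioned after the BOM, decoding with the charset the BOM indicates.
    public static Reader open(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) {                       // read up to three bytes
            int r = pin.read(head, n, 3 - n);
            if (r < 0) break;                 // stream ended early
            n += r;
        }

        Charset cs = StandardCharsets.UTF_8;  // assumption: UTF-8 when no BOM is present
        int bomLen = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            bomLen = 3;                       // UTF-8 BOM: EF BB BF
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            cs = StandardCharsets.UTF_16BE;   // UTF-16BE BOM: FE FF
            bomLen = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            cs = StandardCharsets.UTF_16LE;   // UTF-16LE BOM: FF FE
            bomLen = 2;
        }

        // Push back whatever was read beyond the BOM so no data is lost.
        if (n > bomLen) {
            pin.unread(head, bomLen, n - bomLen);
        }
        return new InputStreamReader(pin, cs);
    }

    // Convenience for trying the helper on an in-memory byte array.
    public static String decode(byte[] data) {
        try (Reader r = open(new ByteArrayInputStream(data))) {
            StringBuilder sb = new StringBuilder();
            for (int c; (c = r.read()) != -1; ) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a file, the same helper would be called as `BomSkipper.open(new FileInputStream(path))`, and the returned Reader handed to the CSV parsing code.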
A field that contains commas can be enclosed in double quotes, and the commas are then kept as part of the field value.
Single instances of double quotes are read and stripped away from the field value; they are assumed to quote text containing commas and newlines.
To include a double quote as part of a field value, the quote character must be repeated and the field enclosed within double quotes.
Text spanning multiple lines is sometimes included within CSV data. The class handles this case as well.
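The quoting rules above can be illustrated with a small character-by-character parser. This is a minimal sketch and not the class presented in this article: the name `CsvSketch` and its `parse` method are hypothetical, and the sketch takes the whole input as a String rather than reading from a stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the quoting rules; not the article's reader class.
public class CsvSketch {

    // Parses CSV text into rows, each row being an array of column values.
    public static List<String[]> parse(String text) {
        List<String[]> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;   // true while inside a quoted field
        boolean pending = false;    // true once the current row has any content

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"');  // doubled quote -> one literal quote
                        i++;
                    } else {
                        inQuotes = false;   // closing quote is stripped
                    }
                } else {
                    field.append(c);        // commas and newlines are kept as data
                }
            } else if (c == '"') {
                inQuotes = true;            // opening quote is stripped
                pending = true;
            } else if (c == ',') {
                row.add(field.toString());  // end of field
                field.setLength(0);
                pending = true;
            } else if (c == '\r' || c == '\n') {
                // Treat CRLF as a single row separator; a bare CR or LF also ends a row.
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                row.add(field.toString());
                field.setLength(0);
                rows.add(row.toArray(new String[0]));
                row.clear();
                pending = false;
            } else {
                field.append(c);
                pending = true;
            }
        }
        // Flush the last row when the input does not end with a line break.
        if (pending) {
            row.add(field.toString());
            rows.add(row.toArray(new String[0]));
        }
        return rows;
    }
}
```

For the input `a,"b,c","say ""hi"""` followed by CRLF and `"line1<LF>line2",x`, the sketch yields two rows: the first has the columns `a`, `b,c` and `say "hi"`, and the second has a two-line first column and `x`. Note that in this sketch a blank line produces a row with a single empty column; a real reader may prefer to skip such lines.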