How I'm coding a CSV file import system in Node.js

In this post, I explain how I'm coding a CSV file import system using Node.js.

As I have free rein over the technology used for importing and manipulating CSV files, I start by checking the most popular CSV parsing libraries on Google.

There is a difference between parsing and generating libraries. Parsing means reading a file, while generating, as the name suggests, means creating one. Most libraries I've seen so far do either one or the other.

I glance at the official npm pages and blog posts about manipulating CSV files. The authors recommend streaming for such operations, as it is more efficient for large datasets. My shortlist:

  1. csv-parse
  2. csv-parser
  3. fast-csv

I try csv-parse, which looks quite elegant, but I can't make it work. I don't understand the problem, and at this point I don't really have time to investigate.

I move to the second option, csv-parser, which is straightforward to use.

As users will upload multiple files, the idea is to loop over each file, create a read stream, and pipe it into an instance of csv-parser. In addition, I set the separator (delimiter) to ";", as used in the CSV files. I also set UTF-8 encoding, since the CSV files are in French.

If you don't know what streams are, you can check it here. This is how I've written the code in CoffeeScript:

# The encoding belongs on the read stream; csv-parser itself only
# takes parsing options such as the separator.
fs.createReadStream(file, { encoding: 'utf8' })
  .pipe(csv({ separator: ';' }))
  .on('headers', (headers) ->
  )
  .on('data', (row) ->
  )
  .on('end', () ->
  )

As I wrap the stream handling in a promise, I call resolve(true) in the end handler.

I find the encoding tricky in the sense that the file should already be in UTF-8; otherwise, we don't get the correct characters. If users don't upload files with that encoding, I can use another library to convert the encoding first.
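The post mentions reaching for a separate library for the conversion; as an illustration of the idea, Node's built-in Buffer can already transcode the simple case of Latin-1 (a common legacy encoding for French files) into a proper UTF-8 string:

```javascript
// Sketch: decoding a Latin-1 (ISO-8859-1) buffer into a JS string
// before parsing; writing that string back out yields valid UTF-8.
function latin1ToUtf8(buf) {
  return buf.toString('latin1');
}

// "é" is the single byte 0xE9 in Latin-1:
const raw = Buffer.from([0x63, 0x61, 0x66, 0xe9]); // "café" in Latin-1
console.log(latin1ToUtf8(raw)); // prints "café"
```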

For the moment, the validation process is simple:

  • check if the file mimetype is text/csv
  • read the rows, and check if the headers match those of a particular table (its name is passed through the multipart form params)
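The two validation steps above can be sketched like this; the `EXPECTED_HEADERS` map, the table name, and the column names are all illustrative assumptions, not from the post:

```javascript
// Map each table name to the headers its CSV must carry (illustrative).
const EXPECTED_HEADERS = {
  clients: ['nom', 'prenom', 'email'],
};

// Return an error message, or null if the upload passes validation.
function validateUpload(mimetype, tableName, headers) {
  if (mimetype !== 'text/csv') return 'invalid mimetype';
  const expected = EXPECTED_HEADERS[tableName];
  if (!expected) return 'unknown table';
  // Headers must match exactly, in order.
  if (expected.length !== headers.length ||
      !expected.every((h, i) => h === headers[i])) {
    return 'headers do not match';
  }
  return null;
}

console.log(validateUpload('text/csv', 'clients', ['nom', 'prenom', 'email'])); // prints null
```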

I can also deduce the type of file from the headers, but I prefer avoiding magic: as the file import features scale, this could become too complicated.

If some columns are required, I'll check that a value is present before inserting into the database. There are several ways to do that:

  • I can keep an object for each header and check that its value is not empty
  • I can also use an array, say required_headers = []
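The array approach can be sketched as follows; the column names in `required_headers` are illustrative:

```javascript
// Columns that must be non-empty in every row (names are assumptions).
const required_headers = ['nom', 'email'];

// Return the required columns whose value is absent or blank in a row.
function missingRequired(row) {
  return required_headers.filter(
    (h) => row[h] === undefined || String(row[h]).trim() === ''
  );
}

console.log(missingRequired({ nom: 'Dupont', email: '' })); // prints [ 'email' ]
```

A row would only be inserted when `missingRequired(row)` comes back empty; otherwise the missing columns can go straight into the error report.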

As in any effective file import system, there needs to be an error log and a success report. I've handled this with two arrays, one for each. Whenever there is an error, it is pushed to the error array. For the success report, I push each item to the success array, such as the number of lines added.
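The two-array report can be sketched like this; the helper name and the example messages are mine:

```javascript
// Collect error and success entries in two separate arrays.
function makeReport() {
  const errors = [];
  const successes = [];
  return {
    error: (msg) => errors.push(msg),
    success: (msg) => successes.push(msg),
    summary: () => ({ errors, successes }),
  };
}

const report = makeReport();
report.error('line 12: missing required column "email"');
report.success('42 lines added');
console.log(report.summary().errors.length); // prints 1
```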