Building a CSV Importer - Part 1

Written By Jesse Schutt
Posted on

As with the majority of programming techniques there are a myriad of ways to accomplish any given task. Some of the approaches shine in certain areas while others may need to be tailored to fit a specific need.

I thought I'd share how I've handled a particular task, the importing of data from a CSV file, in hopes you might benefit from seeing another angle.

As I started writing out a sample app to demonstrate the flow I realized that there are a number of considerations that must be made, so I've broken this tutorial into 3 parts:

  • Uploading, storing, and mapping the CSV data
  • Processing and distributing the individual rows
  • Error handling

Each section depends on the previous, so make sure to go through them in order.

Setup

To get started we need to create the following models with corresponding migrations and factories.

CSVUpload

This model will hold the high-level details for an uploaded CSV, including all the data, a boolean to know if we should attempt to pull headers from the first row, and the filename.

<?php

use Illuminate\Support\Facades\Schema;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Database\Migrations\Migration;

class CreateCsvUploadsTable extends Migration
{

    public function up()
    {
        Schema::create('csv_uploads', function (Blueprint $table) {
            $table->increments('id');
            $table->string('original_filename');
            $table->boolean('has_headers')->default(false);
            $table->longText('file_contents');
            $table->text('column_mapping')->nullable();
            $table->timestamps();
        });
    }
    
    public function down()
    {
        Schema::dropIfExists('csv_uploads');
    }
}

And the corresponding factory:

 <?php
    
    use Faker\Generator as Faker;
    
    $factory->define(\App\CSVUpload::class, function(Faker $faker) {
        return [
            'original_filename' => $faker->word . '.csv',
            'has_headers'       => true,
            'file_contents'     => [],
            'column_mapping'    => []
        ];
    });    

CSVRow

After the initial CSVUpload has been created we will map each CSV row to individual CSVRow records to allow for additional processing.

<?php

use Illuminate\Support\Facades\Schema;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Database\Migrations\Migration;

class CreateCsvRowsTable extends Migration
{

    public function up()
    {
        Schema::create('csv_rows', function (Blueprint $table) {
            $table->increments('id');
            $table->text('contents');
            $table->unsignedInteger('csv_upload_id')->index();;
            $table->foreign('csv_upload_id')
                ->references('id')
                ->on('csv_uploads')
                ->onDelete('cascade');
            $table->dateTime('imported_at')->nullable();
            $table->dateTime('warned_at')->nullable();
            $table->dateTime('failed_at')->nullable();
            $table->timestamps();
        });
    }

    public function down()
    {
        Schema::dropIfExists('csv_rows');
    }
}

Factory:

<?php
    
use Faker\Generator as Faker;
    
$factory->define(\App\CSVRow::class, function (Faker $faker) {
    return [
        'contents'      => [],
        'csv_upload_id' => function () {
            return factory(\App\CSVUpload::class)->create()->id;
        }
    ];
});    

Uploading, Storing, and Mapping CSV Data

The process of uploading the CSV can be done in a variety of ways but for simplicity's sake I've used a simple Blade form that has two inputs: a file input and a checkbox indicating whether the first row holds the heading names for each column or not.

<form action="{{ route('csv-uploads.store') }}" method="post" class="bg-white shadow-md rounded px-8 pt-6 pb-8 mb-4"
      enctype="multipart/form-data">
    {{ csrf_field() }}
    {{ method_field('POST') }}
    <h3 class="pb-4 mt-0 font-bold">Upload CSV</h3>
    <div class="mb-4">
        <input type="file" name="csvFile" id="csvFile">
    </div>
    <div class="mb-4">
        <input id="hasHeaders" name="hasHeaders" type="checkbox">
        <label for="hasHeaders">File contains header row</label>
    </div>
    <div class="mt-4">
        <button class="bg-blue hover:bg-blue-dark text-white font-bold py-2 px-4 rounded" type="submit">Upload</button>
    </div>
</form>
Csv Importer Upload Form

This form posts to a controller method that does the following:

public function store(StoreCSVUploadRequest $request)
{
    $data = collect(array_map('str_getcsv', file($request->file('csvFile')->getRealPath())));

    if ($request->has('hasHeaders')) {
        $headerRow = $data->shift();

        $data->transform(function ($row) use ($headerRow) {
            return collect($row)->mapWithKeys(function ($value, $index) use ($headerRow) {
                return [$headerRow[$index] => $value];
            })->toArray();
        });
    }

    if ($data->count() >= 1) {
        $csvUpload = CSVUpload::create([
            'original_filename' => $request->file('csvFile')->getClientOriginalName(),
            'has_headers'       => $request->has('hasHeaders'),
            'file_contents'     => $data
        ]);

        return redirect(route('csv-uploads.map-columns.show', $csvUpload->getKey()));
    } else {
        return back()->withError('We were not able to locate any eligible rows. Check the document and try again.');
    }
}

The logic defined in this controller method is a good candidate for abstraction into a repository or service, but for brevity I've left it inline.

Let's talk through what's happening.

The StoreCSVUploadRequest asserts that the request has a valid CSV file.

<?php

namespace App\Requests\CSVUploads;

use Illuminate\Foundation\Http\FormRequest;

class StoreCSVUploadRequest extends FormRequest
{

    public function authorize()
    {
        return true;
    }

    public function rules()
    {
        return [
            'csvFile' => 'required|mimes:csv,txt'
        ];
    }
}

Assuming the CSV file passes validation, we use str_getcsv to pull the rows out of the uploaded file. Once we have an array of data we check to see if the user has specified that the file has provided a header row. If they have we use a Collection method to apply the header row to each of the keys in the rows.

To give some context: without a header row the data looks like this:

[
    [
        0 => 'Value 1',
        1 => 'Value 2',
        2 => 'Value 3',
    ]
]

With the header row applied we'd end up with this:

[
    [
        'header_1' => 'Value 1',
        'header_2' => 'Value 2',
        'header_3' => 'Value 3',
    ]
]	

Finally, the controller method checks to see if it was able to pull any valid rows out of the CSV. If so, it creates a new CSVUpload record with the processed data. If it could not find any rows it simply returns a redirect with an error status.

Mapping Columns to Uploaded Rows

Now that we have a CSVUpload record we can redirect the user to a view that allows them to specify which columns should be assigned which value within the data structure of the application.

In a simple case this might look like mapping a "First Name" column to "first_name" within the application.

public function show(CSVUpload $csvUpload)
{
    return view('csv-uploads.map-columns', compact('csvUpload'));
}
<form class="form-horizontal" method="POST" action="{{ route('csv-uploads.map-columns.store', $csvUpload->getKey() )}}">
    {{ csrf_field() }}
    {{ method_field('post') }}

    <table class="w-full mb-8 -mx-2">
        @if ($csvUpload->has_headers)
        <tr>
            @foreach ($csvUpload->headerRow as $headerField)
            <th class="text-left p-2">{{ $headerField }}</th>
            @endforeach
        </tr>
        @endif
        @foreach ($csvUpload->previewRows as $row)
        <tr>
            @foreach ($row as $key => $value)
            <td class="p-2 text-sm">{{ $value }}</td>
            @endforeach
        </tr>
        @endforeach
        @if($csvUpload->additionalRowCount)
        <tr>
            <td colspan="100" class="p-2">
                <div class="rounded border border-grey-lighter bg-grey-lightest py-2 text-xs text-center ">
                    +{{ $csvUpload->additionalRowCount }} more...
                </div>
            </td>
        </tr>
        @endif
        <tr>
            @foreach ($csvUpload->previewRows[0] as $key => $value)
            <td class="p-2">
                <div class="inline-block relative w-full">
                    <select name="fields[{{ $key }}]" class="block appearance-none w-full bg-white border border-grey-light hover:border-grey px-4 py-2 pr-8 rounded shadow leading-tight">
                        @foreach ($csvUpload->availableFields as $availableField)
                        <option value="{{ $availableField }}" @if ($key === $availableField) selected @endif>{{ $availableField }}</option>
                        @endforeach
                    </select>
                    <div class="pointer-events-none absolute pin-y pin-r flex items-center px-2 text-grey-darker">
                        <svg class="fill-current h-4 w-4" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20"><path d="M9.293 12.95l.707.707L15.657 8l-1.414-1.414L10 10.828 5.757 6.586 4.343 8z"/></svg>
                    </div>
                </div>
            </td>
            @endforeach
        </tr>
    </table>

    <button type="submit" class="bg-blue hover:bg-blue-dark text-white font-bold py-2 px-4 rounded">
        Import Data
    </button>

</form>	
Csv Importer Map Cols

There is a fair bit going on in this template but the underlying concept is this: Present the user with a few sample rows so they can say "this column represents this data".

You might notice that there are a few accessors in the blade template: $csvUpload->headerRow, $csv->additionalRowCount, and $csv->previewRows.

I added these as convenience methods on the CSVUpload model:

public function getHeaderRowAttribute()
{
    return array_keys($this->file_contents[0]);
}

public function getPreviewRowsAttribute()
{
    return array_slice($this->file_contents, 0, 5);
}

public function getAdditionalRowCountAttribute()
{
    return (count($this->file_contents) - 5) < 0 ? 0 : count($this->file_contents) - 5;
}

Perhaps the most complicated portion of the template is the <select> box that configures the column mapping.

@foreach ($csvUpload->previewRows[0] as $key => $value)
<td class="p-2">
    <div class="inline-block relative w-full">
        <select name="fields[{{ $key }}]" class="block appearance-none w-full bg-white border border-grey-light hover:border-grey px-4 py-2 pr-8 rounded shadow leading-tight">
            @foreach ($csvUpload->availableFields as $availableField)
            <option value="{{ $availableField }}" @if ($key === $availableField) selected @endif>{{ $availableField }}</option>
            @endforeach
        </select>
        <div class="pointer-events-none absolute pin-y pin-r flex items-center px-2 text-grey-darker">
            <svg class="fill-current h-4 w-4" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20"><path d="M9.293 12.95l.707.707L15.657 8l-1.414-1.414L10 10.828 5.757 6.586 4.343 8z"/></svg>
        </div>
    </div>
</td>
@endforeach

Essentially it is generating a <select> with all of the available columns and pushing them into an array that we will later use to map the CSV values into our application. The availableFields could be generated from anywhere, whether it be a config file, or, like I've used here, an accessor:

public function getAvailableFieldsAttribute()
{
    return [
        'first_name',
        'last_name',
        'email',
    ];
}

Once the user has defined which CSV columns should map to the application values, submitting the form posts to the following method on the MapColumnsController:

public function store(CSVUpload $csvUpload, StoreCSVUploadColumnMappingRequest $request)
{
    $csvUpload->update([
        'column_mapping' => $request->fields,
    ]);

    $this->dispatch(new DistributeCSVUploadContentIntoCSVRows($csvUpload));

    return redirect(route('csv-uploads.index'));
}

The column mapping is updated on the CSVUpload, a new job is dispatched, and the user is redirected back to the CSV Import Index.

Distributing CSV Data into Individual Rows

Once the column mapping is provided we have enough information to create the individual CSVRow records. I used a Job here so that we can queue the process using Laravel's queue.

<?php

namespace App\Jobs;

use App\CSVRow;
use App\CSVUpload;
use Illuminate\Bus\Queueable;
use Illuminate\Queue\SerializesModels;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;

class DistributeCSVUploadContentIntoCSVRows implements ShouldQueue
{

    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;
    
    private $csvUpload;

    public function __construct(CSVUpload $csvUpload)
    {
        $this->csvUpload = $csvUpload;
    }

    public function handle()
    {
        collect($this->csvUpload->file_contents)
            ->each(function ($csvRow) {
                dispatch(new ImportCSVRow(CSVRow::create([
                    'csv_upload_id' => $this->csvUpload->getKey(),
                    'contents'      => $this->normalizeCSVRow($csvRow)
                ])));
            });
    }

    private function normalizeCSVRow(array $csvRow)
    {
        return collect($this->csvUpload->column_mapping)
            ->flatMap(function ($columnName, $index) use ($csvRow) {
                return [$columnName => $csvRow[$index]];
            })->toArray();
    }
}

Essentially this job takes each row in the file_contents column, applies the newly provided column mapping, creates a CSVRow record, and passes it off to the ImportCSVRow job.

The ImportCSVRow will distribute the CSV row into the proper locations in the application.

In Conclusion

Phew, that was a lot to cover! Here's the high level flow:

  1. Upload a CSV file
  2. Create a CSVUpload record to hold all the data
  3. Define the column mapping from the CSV to our application
  4. Split the rows into individual CSVRow records

In the next article we will cover how to handle the distribution of each row into the application!

  1. Hero Image
  2. The original concept for this article came from Povilas Korop.
  3. The code samples in this article have been truncated for clarity.