Building a CSV Importer - Part 1
As with the majority of programming techniques there are a myriad of ways to accomplish any given task.
Some of the approaches shine in certain areas while others may need to be tailored to fit a specific need.
I thought I'd share how I've handled a particular task, the importing of data from a CSV file, in hopes you might benefit from seeing another angle.
As I started writing out a sample app to demonstrate the flow I realized that there are a number of considerations that must be made, so I've broken this tutorial into 3 parts:
- Uploading, storing, and mapping the CSV data
- Processing and distributing the individual rows
- Error handling
Each section depends on the previous, so make sure to go through them in order.
Setup
To get started we need to create the following models with corresponding migrations and factories.
CSVUpload
This model will hold the high-level details for an uploaded CSV, including all the data, a boolean to know if we should attempt to pull headers from the first row, and the filename.
<?php
use Illuminate\Support\Facades\Schema;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Database\Migrations\Migration;
class CreateCsvUploadsTable extends Migration
{
public function up()
{
Schema::create('csv_uploads', function (Blueprint $table) {
$table->increments('id');
$table->string('original_filename');
$table->boolean('has_headers')->default(false);
$table->longText('file_contents');
$table->text('column_mapping')->nullable();
$table->timestamps();
});
}
public function down()
{
Schema::dropIfExists('csv_uploads');
}
}
And the corresponding factory:
<?php
use Faker\Generator as Faker;
$factory->define(\App\CSVUpload::class, function(Faker $faker) {
return [
'original_filename' => $faker->word . '.csv',
'has_headers' => true,
'file_contents' => [],
'column_mapping' => []
];
});
CSVRow
After the initial CSVUpload
has been created we will map each CSV row to individual CSVRow
records to allow for additional processing.
<?php
use Illuminate\Support\Facades\Schema;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Database\Migrations\Migration;
class CreateCsvRowsTable extends Migration
{
public function up()
{
Schema::create('csv_rows', function (Blueprint $table) {
$table->increments('id');
$table->text('contents');
$table->unsignedInteger('csv_upload_id')->index();;
$table->foreign('csv_upload_id')
->references('id')
->on('csv_uploads')
->onDelete('cascade');
$table->dateTime('imported_at')->nullable();
$table->dateTime('warned_at')->nullable();
$table->dateTime('failed_at')->nullable();
$table->timestamps();
});
}
public function down()
{
Schema::dropIfExists('csv_rows');
}
}
Factory:
<?php
use Faker\Generator as Faker;
$factory->define(\App\CSVRow::class, function (Faker $faker) {
return [
'contents' => [],
'csv_upload_id' => function () {
return factory(\App\CSVUpload::class)->create()->id;
}
];
});
Uploading, storing, and mapping CSV data
The process of uploading the CSV can be done in a variety of ways but for simplicity's sake I've used a simple Blade form that has two inputs: a file input and a checkbox indicating whether the first row holds the heading names for each column or not.
<form action="{{ route('csv-uploads.store') }}" method="post" class="bg-white shadow-md rounded px-8 pt-6 pb-8 mb-4"
enctype="multipart/form-data">
{{ csrf_field() }}
{{ method_field('POST') }}
<h3 class="pb-4 mt-0 font-bold">Upload CSV</h3>
<div class="mb-4">
<input type="file" name="csvFile" id="csvFile">
</div>
<div class="mb-4">
<input id="hasHeaders" name="hasHeaders" type="checkbox">
<label for="hasHeaders">File contains header row</label>
</div>
<div class="mt-4">
<button class="bg-blue hover:bg-blue-dark text-white font-bold py-2 px-4 rounded" type="submit">Upload</button>
</div>
</form>
This form posts to a controller method that does the following:
public function store(StoreCSVUploadRequest $request)
{
$data = collect(array_map('str_getcsv', file($request->file('csvFile')->getRealPath())));
if ($request->has('hasHeaders')) {
$headerRow = $data->shift();
$data->transform(function ($row) use ($headerRow) {
return collect($row)->mapWithKeys(function ($value, $index) use ($headerRow) {
return [$headerRow[$index] => $value];
})->toArray();
});
}
if ($data->count() >= 1) {
$csvUpload = CSVUpload::create([
'original_filename' => $request->file('csvFile')->getClientOriginalName(),
'has_headers' => $request->has('hasHeaders'),
'file_contents' => $data
]);
return redirect(route('csv-uploads.map-columns.show', $csvUpload->getKey()));
} else {
return back()->withError('We were not able to locate any eligible rows. Check the document and try again.');
}
}
The logic defined in this controller method is a good candidate for abstraction into a repository or service, but for brevity I've left it inline.
Let's talk through what's happening.
The StoreCSVUploadRequest
asserts that the request has a valid CSV file.
<?php
namespace App\Requests\CSVUploads;
use Illuminate\Foundation\Http\FormRequest;
class StoreCSVUploadRequest extends FormRequest
{
public function authorize()
{
return true;
}
public function rules()
{
return [
'csvFile' => 'required|mimes:csv,txt'
];
}
}
Assuming the CSV file passes validation, we use str_getcsv
to pull the rows out of the uploaded file. Once we have an array of data we check to see if the user has specified that the file has provided a header row. If they have we use a Collection method to apply the header row to each of the keys in the rows.
To give some context: without a header row the data looks like this:
[
[
0 => 'Value 1',
1 => 'Value 2',
2 => 'Value 3',
]
]
With the header row applied we'd end up with this:
[
[
'header_1' => 'Value 1',
'header_2' => 'Value 2',
'header_3' => 'Value 3',
]
]
Finally, the controller method checks to see if it was able to pull any valid rows out of the CSV. If so, it creates a new CSVUpload
record with the processed data. If it could not find any rows it simply returns a redirect with an error status.
Mapping columns to uploaded rows
Now that we have a CSVUpload
record we can redirect the user to a view that allows them to specify which columns should be assigned which value within the data structure of the application.
In a simple case this might look like mapping a "First Name" column to "first_name" within the application.
public function show(CSVUpload $csvUpload)
{
return view('csv-uploads.map-columns', compact('csvUpload'));
}
The Blade template:
<form class="form-horizontal" method="POST" action="{{ route('csv-uploads.map-columns.store', $csvUpload->getKey() )}}">
{{ csrf_field() }}
{{ method_field('post') }}
<table class="w-full mb-8 -mx-2">
@if ($csvUpload->has_headers)
<tr>
@foreach ($csvUpload->headerRow as $headerField)
<th class="text-left p-2">{{ $headerField }}</th>
@endforeach
</tr>
@endif
@foreach ($csvUpload->previewRows as $row)
<tr>
@foreach ($row as $key => $value)
<td class="p-2 text-sm">{{ $value }}</td>
@endforeach
</tr>
@endforeach
@if($csvUpload->additionalRowCount)
<tr>
<td colspan="100" class="p-2">
<div class="rounded border border-grey-lighter bg-grey-lightest py-2 text-xs text-center ">
+{{ $csvUpload->additionalRowCount }} more...
</div>
</td>
</tr>
@endif
<tr>
@foreach ($csvUpload->previewRows[0] as $key => $value)
<td class="p-2">
<div class="inline-block relative w-full">
<select name="fields[{{ $key }}]" class="block appearance-none w-full bg-white border border-grey-light hover:border-grey px-4 py-2 pr-8 rounded shadow leading-tight">
@foreach ($csvUpload->availableFields as $availableField)
<option value="{{ $availableField }}" @if ($key === $availableField) selected @endif>{{ $availableField }}</option>
@endforeach
</select>
<div class="pointer-events-none absolute pin-y pin-r flex items-center px-2 text-grey-darker">
<svg class="fill-current h-4 w-4" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20"><path d="M9.293 12.95l.707.707L15.657 8l-1.414-1.414L10 10.828 5.757 6.586 4.343 8z"/></svg>
</div>
</div>
</td>
@endforeach
</tr>
</table>
<button type="submit" class="bg-blue hover:bg-blue-dark text-white font-bold py-2 px-4 rounded">
Import Data
</button>
</form>
There is a fair bit going on in this template but the underlying concept is this: Present the user with a few sample rows so they can say "this column represents this data".
You might notice that there are a few accessors in the blade template: $csvUpload->headerRow
, $csv->additionalRowCount
, and $csv->previewRows
.
I added these as convenience methods on the CSVUpload
model:
public function getHeaderRowAttribute()
{
return array_keys($this->file_contents[0]);
}
public function getPreviewRowsAttribute()
{
return array_slice($this->file_contents, 0, 5);
}
public function getAdditionalRowCountAttribute()
{
return (count($this->file_contents) - 5) < 0 ? 0 : count($this->file_contents) - 5;
}
Perhaps the most complicated portion of the template is the <select>
box that configures the column mapping.
@foreach ($csvUpload->previewRows[0] as $key => $value)
<td class="p-2">
<div class="inline-block relative w-full">
<select name="fields[{{ $key }}]" class="block appearance-none w-full bg-white border border-grey-light hover:border-grey px-4 py-2 pr-8 rounded shadow leading-tight">
@foreach ($csvUpload->availableFields as $availableField)
<option value="{{ $availableField }}" @if ($key === $availableField) selected @endif>{{ $availableField }}</option>
@endforeach
</select>
<div class="pointer-events-none absolute pin-y pin-r flex items-center px-2 text-grey-darker">
<svg class="fill-current h-4 w-4" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20"><path d="M9.293 12.95l.707.707L15.657 8l-1.414-1.414L10 10.828 5.757 6.586 4.343 8z"/></svg>
</div>
</div>
</td>
@endforeach
Essentially it is generating a <select>
with all of the available columns and pushing them into an array that we will later use to map the CSV values into our application. The availableFields
could be generated from anywhere, whether it be a config file, or, like I've used here, an accessor:
public function getAvailableFieldsAttribute()
{
return [
'first_name',
'last_name',
'email',
];
}
Once the user has defined which CSV columns should map to the application values, submitting the form posts to the following method on the MapColumnsController
:
public function store(CSVUpload $csvUpload, StoreCSVUploadColumnMappingRequest $request)
{
$csvUpload->update([
'column_mapping' => $request->fields,
]);
$this->dispatch(new DistributeCSVUploadContentIntoCSVRows($csvUpload));
return redirect(route('csv-uploads.index'));
}
The column mapping is updated on the CSVUpload
, a new job is dispatched, and the user is redirected back to the CSV Import Index.
Distributing CSV data into individual rows
Once the column mapping is provided we have enough information to create the individual CSVRow
records. I used a Job here so that we can queue the process using Laravel's queue.
<?php
namespace App\Jobs;
use App\CSVRow;
use App\CSVUpload;
use Illuminate\Bus\Queueable;
use Illuminate\Queue\SerializesModels;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
class DistributeCSVUploadContentIntoCSVRows implements ShouldQueue
{
use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;
private $csvUpload;
public function __construct(CSVUpload $csvUpload)
{
$this->csvUpload = $csvUpload;
}
public function handle()
{
collect($this->csvUpload->file_contents)
->each(function ($csvRow) {
dispatch(new ImportCSVRow(CSVRow::create([
'csv_upload_id' => $this->csvUpload->getKey(),
'contents' => $this->normalizeCSVRow($csvRow)
])));
});
}
private function normalizeCSVRow(array $csvRow)
{
return collect($this->csvUpload->column_mapping)
->flatMap(function ($columnName, $index) use ($csvRow) {
return [$columnName => $csvRow[$index]];
})->toArray();
}
}
Essentially this job takes each row in the file_contents
column, applies the newly provided column mapping, creates a CSVRow
record, and passes it off to the ImportCSVRow
job.
The ImportCSVRow
will distribute the CSV row into the proper locations in the application.
In conclusion
Phew, that was a lot to cover! Here's the high level flow:
- Upload a CSV file
- Create a
CSVUpload
record to hold all the data - Define the column mapping from the CSV to our application
- Split the rows into individual
CSVRow
records
In the next article we will cover how to handle the distribution of each row into the application!
(The original concept for this article came from Povilas Korop.)
(The code samples in this article have been truncated for clarity.)
Want to read more tips and insights on working with a website development team that wants to help your organization grow for good? Sign up for our bimonthly newsletter.
By Jesse Schutt
Director of Engineering
Jesse is our resident woodworker. His signature is to find the deeper meaning in a project and the right tool for the job.