Skip to main content

By Jesse Schutt

Building a CSV Importer - Part 2

In the previous article, we created the CSVUpload, gathered the column mapping, and split the data into individual CSVRow records. At this point, we are ready to implement the business-specific rules for integrating the data into our application.

If you recall, we left off with the DistributeCSVUploadContentIntoCSVRows job pulling each row out of the CSV file and storing it in CSVRow records.

public function handle()
{
    collect($this->csvUpload->file_contents)
        ->each(function ($csvRow) {
            dispatch(new ImportCSVRow(CSVRow::create([
                'csv_upload_id' => $this->csvUpload->getKey(),
                'contents'      => $this->normalizeCSVRow($csvRow)
            ])));
        });
}

The ImportCSVRow job that is dispatched at the time of the new row creation passes the CSVRow off to a CSVImporter class that will initiate the integration of the CSV row data into the application. Again, this is handled in a job so that we have the freedom to push it onto a queue.

Introducing the Laravel pipeline

This could be, and has been, the topic of a blog series on its own, but we'll go through it in light of our specific use-case.

Essentially a pipeline in Laravel takes "pipes", or individual classes, that receive a data object, act on it, and return the data to be processed by the next "pipe". If you've ever had a long method that you can trace through and say "First it does A, then it does B, and finally it does C", you may have a good candidate for a pipeline!

Setup

This particular pipeline flow requires the following scaffolding:

  • CSVImporter
  • CSVTraveler
  • Individual "pipes"

Let's walk through them, starting with the CSVImporter class.

<?php

namespace App\Services\CSVImporter;

use App\CSVRow;
use App\Services\CSVImporter\Pipes\AssignEmergencyContact;
use App\Services\CSVImporter\Pipes\ImportParticipant;
use App\Services\CSVImporter\Pipes\MapAllergies;
use Illuminate\Pipeline\Pipeline;

class CSVImporter
{

    private $traveler;

    public function __construct(CSVImportTraveler $traveler)
    {
        $this->traveler = $traveler;
    }

    public function importRow(CSVRow $row)
    {
        return app(Pipeline::class)
            ->send($this->traveler->setRow($row))
            ->through([
                ImportParticipant::class,
                MapAllergies::class,
                AssignEmergencyContact::class
            ])->then(function ($traveler) {
                return $traveler;
            });
    }
}

The service is primarily bootstrap code that creates a new CSVImportTraveler instance, sets the CSVRow on it, and passes it through a series of pipes, finally returning the traveler object at the end.

<?php
    
namespace App\Services\CSVImporter;
    
use App\CSVRow;
    
class CSVImportTraveler
{

    private $row;

    public function setRow(CSVRow $row): CSVImportTraveler
    {
        $this->row = $row;

        return $this;
    }

    public function getRow(): CSVRow
    {
        return $this->row;
    }
}

I find it helpful to think of the CSVImportTraveler object as a way to move data between the pipes. For example, in the first pipe we might do something like create a Participant model, and since the successive pipes may need access to that model we can add a getter/setter on the traveler class to store the Participant.

The pipes

For our example, let's say we have a CSV file containing some contact details, allergy requirements, and emergency contact information for a list of participants in an upcoming event.

first_name,last_name,email,allergies,emergency_contact_name,emergency_contact_phone
John,Doe,[email protected],"peanuts, gluten",Richard Doe,5551234567
Jane,Doe,[email protected],soy,Richard Doe,5551234567
Billy,Johnson,[email protected],,Paul Johnson,555-555-5555
Amanda,Johnson,[email protected],"gluten, dairy",Anita Johnson,555-555-4567

The business rules say that we must import the participants, correctly identify their allergies, and assign them to their emergency contacts.

Let's break down the steps into discrete classes like this:

Step 1: ImportParticipant
Step 2: MapAllergies
Step 3: AssignEmergencyContact

Separating these concerns allows us to focus the logic on each step independently of the others, and it gives an easy way to test that each portion is processed correctly. Here is an example of what might go in the ImportParticipant pipe. Of course this is going to be dependent on what your application needs to do. (You can see the other pipe examples in the repository.)

<?php
    
namespace App\Services\CSVImporter\Pipes;
    
use App\Participant;
use App\Services\CSVImporter\CSVImportTraveler;
    
class ImportParticipant implements CSVImporterPipe
{
   
    public function handle(CSVImportTraveler $traveler, \Closure $next)
    {
        $participant = Participant::firstOrCreate([
            'email' => $traveler->getRow()->contents['email']
        ], [
            'first_name' => $traveler->getRow()->contents['first_name'],
            'last_name'  => $traveler->getRow()->contents['last_name']
        ]);

        $traveler->setParticipant($participant);

        return $next($traveler);
    }
}

Testing pipes

Along with the previously mentioned benefits, using a pipeline allows us to write tests that assert each portion of the import is working properly. Simply set up the $traveler, fire the handle() method on the pipe, and assert that the outcome is as expected!

<?php

namespace Tests\Services\CSVImporter\Pipes;

use App\CSVRow;
use App\Services\CSVImporter\CSVImportTraveler;
use App\Services\CSVImporter\Pipes\ImportParticipant;
use Illuminate\Foundation\Testing\RefreshDatabase;
use Tests\TestCase;

class ImportParticipantTest extends TestCase
{
    use RefreshDatabase;

    /** @test */
    public function it_imports_a_new_participant_from_a_csv_row()
    {
        $rowContents = [
            'first_name' => 'John',
            'last_name' => 'Doe',
            'email' => '[email protected]'
        ];

        $csvRow = factory(CSVRow::class)->create(['contents' => $rowContents]);

        (new ImportParticipant())->handle(
            (new CSVImportTraveler())->setRow($csvRow), 
            function () {}
        );

        $this->assertDatabaseHas('participants', $rowContents);
    }
}

Conclusion

In this article we covered how to use the Laravel Pipeline flow to process individual CSV rows and distribute the content across our application however the business requirements dictate. In the follow-up article I'll show a way to handle logging events to keep track of what happened on each row!

Want to read more tips and insights on working with a website development team that wants to help your organization grow for good? Sign up for our bimonthly newsletter.

By Jesse Schutt

Director of Engineering

Jesse is our resident woodworker. His signature is to find the deeper meaning in a project and the right tool for the job.