If an API is missing, fragile, or simply not worth the integration work, SFTP is a clean way to move ETL files between vendors, customers, and third parties. Think of it as a controlled handoff point: drop files in, validate them, load them, and archive them, all on a repeatable loop.


What ETL means in practice

ETL stands for Extract, Transform, Load. It is the usual pattern for pulling data from one or more sources, reshaping it into something usable, and loading it into a destination such as a data warehouse.

You will also hear ELT (Extract, Load, Transform). In practice, the label matters less than the reality: data needs to arrive in a form you can trust, on a cadence you can rely on, with enough consistency that you are not firefighting every week.

There are plenty of orchestration tools for this, from long-standing platforms like Informatica to newer cloud services like Integrate.io and home-grown pipelines. But the part that tends to cause the real pain is not the transformation logic. It is the handoff between systems and organisations.


Where ETL breaks when the data is not yours

As soon as the data comes from outside your walls, the failure modes get familiar:

  • The interface changes
    • APIs change, endpoints get retired, fields appear, disappear, or shift meaning.
  • The data arrives late, incomplete, or duplicated
    • Retries, partial exports, and backfills create strange batches and mismatched counts.
  • The transfer method is informal
    • Email attachments and ad hoc shares might “work”, but they leave you with no clean trail and no dependable automation.
  • Security and governance show up as an afterthought
    • Access, audit evidence, and retention get handled differently by each person and each team.

Sometimes the simplest fix is also the most durable: agree on a file contract, then exchange those files securely and predictably.


Why SFTP fits ETL data exchange

SFTP (SSH File Transfer Protocol) is widely supported, stable, and encrypted in transit. It is a practical “meeting point” between parties because almost anything can talk to it: operating systems, ETL tools, scripts, SDKs, and command line clients.

The big point is this: SFTP is rarely the whole solution. The win is using SFTP to standardise the handoff so everything around it becomes easier to automate, validate, and debug.


A simple SFTP-based ETL handoff pattern

The goal is to make your file exchange boring. Boring is good. Here is a pattern that holds up well:

Agree on a transfer contract

    • Folder per partner or data source
    • File naming that encodes dataset, date, and batch identifier
    • A schema version field (or schema file), plus rules for backwards compatibility

Make uploads safe and predictable

    • Upload to a temporary name, then rename when complete
    • Publish a checksum or manifest so you can validate integrity
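The two rules above can be sketched with the local filesystem standing in for the remote side; on a real SFTP session you would use your client's put and rename operations instead:

```python
import hashlib
import os

def safe_publish(data: bytes, final_path: str) -> str:
    """Write to a temporary name, then rename so readers never see a partial file.

    os.rename stands in for the SFTP rename step in this local sketch.
    Returns the SHA-256 checksum to publish alongside the file.
    """
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
    os.rename(tmp_path, final_path)  # atomic on the same filesystem
    checksum = hashlib.sha256(data).hexdigest()
    with open(final_path + ".sha256", "w") as f:
        f.write(checksum)  # sidecar checksum file the consumer can verify
    return checksum
```

The `.tmp` suffix and `.sha256` sidecar are conventions assumed for this sketch; what matters is that both sides agree on one completion signal and one integrity check.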

Trigger processing reliably

    • Poll on a schedule if that is all you have
    • Prefer event-driven triggers (webhooks) when they are available
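If polling is all you have, the key is to ignore in-flight files and files you have already handled. A minimal local sketch, assuming the `.tmp` and `.sha256` conventions from the upload step:

```python
import os

def ready_files(inbox: str, seen: set[str]) -> list[str]:
    """Return files that are complete (no .tmp suffix) and not yet processed.

    `seen` is the caller's record of already-handled filenames; a real poller
    would run this on a schedule and persist that record between runs.
    """
    out = []
    for name in sorted(os.listdir(inbox)):
        if name.endswith(".tmp") or name.endswith(".sha256"):
            continue  # still uploading, or a sidecar checksum file
        if name in seen:
            continue  # already picked up on a previous poll
        out.append(name)
    return out
```

With webhooks, the same filter logic still applies, but it runs once per notification instead of on a timer.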

Validate before loading

    • Check checksum, size, and expected columns or fields
    • Make the load idempotent so reprocessing the same batch does not duplicate data
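A sketch of both checks together, verifying the published checksum and skipping batches already recorded as processed. The in-memory set stands in for a durable record such as a control table:

```python
import hashlib

def validate_and_load(path: str, expected_sha256: str, batch_id: str,
                      processed: set[str], load) -> bool:
    """Validate integrity, then load exactly once per batch id.

    `load` is whatever function performs the actual load; `processed` is a
    stand-in for a durable record of completed batches. Returns True if the
    batch was loaded on this call.
    """
    if batch_id in processed:
        return False  # idempotent: reprocessing the same batch is a no-op
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"checksum mismatch for {path}")
    load(path)
    processed.add(batch_id)  # record only after a successful load
    return True
```

Recording the batch only after a successful load means a crash mid-load leads to a retry, not a silent gap.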

Load, archive, and retain

    • Load into a staging area first, then promote to curated tables
    • Move processed files into an archive path with a clear retention policy
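The archive step can be as simple as moving a processed file into a dated path, so a retention policy can later delete whole directories. The `archive/<YYYY>/<MM>/` layout here is one reasonable convention, not a requirement:

```python
import os
import shutil

def archive_file(path: str, archive_root: str, day: str) -> str:
    """Move a processed file into archive/<YYYY>/<MM>/ for retention by directory.

    `day` is the batch date in YYYYMMDD form, e.g. parsed from the filename.
    """
    dest_dir = os.path.join(archive_root, day[:4], day[4:6])
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(path))
    shutil.move(path, dest)
    return dest
```

Moving (rather than copying) the file also doubles as the "this batch is done" signal for any poller watching the inbox.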

File formats that work well over SFTP

There is no universal “best” file format for ETL. What matters is that it is consistent and easy to validate. These are common choices:

  • CSV
    • Widely supported, but it is not forgiving when schemas evolve and it is awkward for nested data.
  • JSON
    • Flexible, but large JSON arrays can be clumsy to stream and validate.
  • JSONL (newline-delimited JSON)
    • Often a strong fit for batch feeds because it is stream-friendly and handles schema evolution more gracefully.
  • Compression
    • Gzip is a practical default for large exports, especially when bandwidth is a constraint.

Whichever you pick, pick one format per dataset, document it, and version it when it changes. Most “ETL mysteries” start with undocumented drift.
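Because JSONL is one record per line, a gzip-compressed feed can be validated and loaded without holding the whole batch in memory. A minimal streaming sketch:

```python
import gzip
import json

def read_jsonl_gz(path: str):
    """Stream records from a gzip-compressed JSONL file, one line at a time."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate trailing blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as e:
                raise ValueError(f"bad record on line {line_no}") from e
```

Failing with the offending line number, rather than rejecting the whole file opaquely, makes partner-side fixes much faster.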


Where a managed SFTP service helps

SFTP To Go’s cloud SFTP service adds features on top of plain SFTP that matter in ETL and data processing pipelines:

  • Multi-protocol support
    Users and applications can use whichever protocol suits them to read and write data files. SFTP is the default, but FTPS is also supported, and Amazon S3 access is available for S3-style workflows against the same storage.
  • Filesystem change webhook notifications
    In many cases, ETL processes are triggered by a time-based scheduler, for example: “load customer data every midnight”. With webhook notifications, you can trigger ingestion the moment a file is dropped into your storage, so data is processed as it arrives instead of waiting for a scheduled run.
  • Management APIs for automation
    Automate provisioning and maintenance tasks such as creating or disabling users, rotating credentials, and managing integrations, so ETL access does not depend on manual steps.
  • Per-user isolation and access control
    Use per-user home directories and scoped permissions so each vendor or system has a defined lane for uploads and downloads, rather than sharing a single set of credentials.
  • Audit trails for troubleshooting and compliance
    Keep a clear record of file activity and access so “what changed, who touched it, and when” is easier to answer when a load goes sideways.
  • Network restrictions where needed
    Restrict inbound access when you have fixed source IPs or stricter network requirements, which helps lock down partner feeds.
  • Web portal for manual sources
    When a “data source” is really a person exporting and uploading files, the web portal provides a simpler way to deliver batches without teaching everyone an SFTP client.

Frequently asked questions

What does SFTP mean in ETL workflows?

In ETL, SFTP is usually the secure handoff point for data files between systems or organisations. One side exports files to an SFTP location, and the other side validates, loads, and archives those files as part of the pipeline.

Is SFTP better than an API for ETL data exchange?

Not always. APIs are great when they are stable and well maintained. SFTP tends to win when an API is missing, fragile, rate-limited, or expensive to integrate, and the workflow is naturally file-based, like nightly exports, batch loads, or partner feeds.

How do I automate ETL file transfers over SFTP?

A common pattern is: upload into a known folder, verify the upload is complete, validate the file, then trigger a load. Uploading to a temporary filename and renaming on completion helps prevent downstream jobs from picking up half-written files.

What file format works best for ETL over SFTP?

CSV is common but brittle when schemas evolve. JSONL (newline-delimited JSON) is often a strong choice for batch feeds because it is stream-friendly and handles schema evolution more gracefully. The best format is the one you can validate, version, and process consistently.

How do I prevent duplicate loads when ingesting files from SFTP?

Make your pipeline idempotent. Track a batch identifier from the filename or a manifest, store which batches have been processed, and archive or move files only after a successful load so retries do not create duplicate rows.

How can I tell when an SFTP upload is finished?

Use a completion signal. Common approaches are uploading to a temporary name and renaming at the end, publishing a manifest or checksum file after the data file, or requiring a “done” marker file that the ETL job waits for.

Is SFTP secure enough for sensitive ETL data?

SFTP encrypts authentication and file transfer in transit, but overall security depends on access control, key management, and logging. For higher assurance, isolate partners to separate folders, apply least-privilege permissions, rotate keys, and keep retention rules explicit.

What is the difference between SFTP and FTPS for ETL?

SFTP runs over SSH and typically uses one encrypted connection. FTPS is FTP with TLS added and keeps separate control and data connections, which can add firewall and certificate management overhead. Both can be secure, but SFTP is often simpler to automate consistently for ETL file exchange.