How To Resume Interrupted SFTP Downloads In Python
If you’ve ever needed to transfer large files over SFTP, then you know that connection drops and network timeouts can be a real headache, especially when you have to start all the way from the beginning again. This guide will show you how to use Python to pick up where you left off whenever your file downloads get interrupted.
The secret sauce of our script is Paramiko, a Python SSH library that lets us connect to SFTP services, check file sizes, and seek to specific byte offsets so that we can continue downloads when they get interrupted. By the end of this article, you'll have functioning code that works with any standard SFTP server, or with cloud-based SFTP services like SFTP To Go.
What does “resume” mean for SFTP downloads?
Resuming SFTP downloads simply means that we can continue a download without losing progress should it get interrupted along the way. It feels like something that SFTP clients should do by default, but not all of them do.
SFTP downloads can be resumed because random access to remote files is part of the protocol itself; it's up to us to either use a client that supports download resuming, or to build one for ourselves. The client needs to track how much of a file has already been downloaded, and then use that information to continue reading the remote file from the right offset. The logic looks like this:
- Check if a partial local file exists (we will use a .part extension for our downloads)
- Get its size in bytes
- Get the remote file’s total size
- Open the remote file and seek to its matching offset
- Read from that offset, and then append it to the local file
- When the download completes, we rename the .part file to the final filename
It’s important to note that this only works reliably if the remote file hasn’t changed since the partial download started. If the remote file was modified, then we’d be appending new data to our existing old local data, which will give us a sad and corrupted file if the download actually manages to complete. But don’t worry, we’ll add a check for this scenario as well.
Prerequisites
For this demo you’ll need Python 3.7+ plus the Tenacity and Paramiko libraries:
pip install paramiko tenacity
Connect to your SFTP server
Before we build the resume logic, we’ll need to test basic connectivity. Here’s a minimal example that uses SSH key authentication. Remember to change the default hostname, port, username, and key path parameters in these partial scripts if you would like to follow along.
# sftp-1.py example script
import paramiko
import os

# Here's our function to create an SFTP client using Ed25519 key authentication
def create_sftp_client(hostname, port, username, key_path):
    """Create and return an SFTP client connection."""
    ssh = paramiko.SSHClient()
    ssh.load_system_host_keys()
    ssh.set_missing_host_key_policy(paramiko.RejectPolicy())
    private_key = paramiko.Ed25519Key.from_private_key_file(key_path)
    ssh.connect(hostname, port=port, username=username, pkey=private_key)
    sftp = ssh.open_sftp()
    return ssh, sftp

# This is the main part of the script that uses the function above
if __name__ == "__main__":
    # Replace these with your actual values
    ssh, sftp = create_sftp_client(
        hostname="your_host",
        port=22,
        username="your_username",
        key_path=os.path.expanduser("your_path")
    )
    # List the default directory to confirm the connection works
    print(sftp.listdir())
    sftp.close()
    ssh.close()
A few things about this setup:
- We are using load_system_host_keys() to verify the server’s identity from the ~/.ssh/known_hosts file
- The RejectPolicy() will refuse connections to unknown hosts
- Ed25519 keys are quite secure for SFTP, but you can swap in your preferred key type if you have specific requirements for HIPAA, SOC 2, or GDPR (see the snippet below)
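For instance, if you'd rather authenticate with an RSA key, it's a one-line swap inside create_sftp_client() (the complete script at the end of the article auto-detects the key type for you):

private_key = paramiko.RSAKey.from_private_key_file(key_path)  # RSA instead of Ed25519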
For this test, you’ll want to make sure that your SFTP host has been added to your known_hosts file before running this demo script. If the host is not present in that file, then you will see an error like:

paramiko.ssh_exception.SSHException: Server 'your_host' not found in known_hosts

To avoid this, just run the command ssh your_hostname from a terminal. If you are greeted with a “Last login” message, then the host is already known and everything worked without any issues. If not, you’ll be prompted to confirm the server’s fingerprint; accept it, enter your password if asked, and the host will be added for you.
Now that you have your host added to the known_hosts file, you might still run into authentication errors. If you do, make sure that you’ve added your public key to your SFTP server. For SFTP To Go, you can do this from the Dashboard -> Credentials tab. Check out the documentation for adding SSH keys for all the details.
Once that’s done, give the script a quick test by running it:

python sftp-1.py

Provided your SFTP connection is set up correctly, you should see a list of all the files in your default directory.
Check file sizes
If we want our downloads to resume after interruptions, we need to answer two questions: how much of the file do we have locally, and how big is the remote file? The code below gives us these answers.
def get_local_size(local_path):
    """Return size of local file, or 0 if it doesn't exist."""
    if os.path.exists(local_path):
        return os.path.getsize(local_path)
    return 0

def get_remote_size(sftp, remote_path):
    """Return size of remote file."""
    return sftp.stat(remote_path).st_size
The .part file is our partially downloaded file. If we see that it exists, then we’ll continue from where it left off. If not, we can go ahead and start the download from scratch. Below is a snippet from the download_with_resume function that explains how it works.
def download_with_resume(sftp, remote_path, local_path, chunk_size=32768):
    """
    Download a file with resume support.
    Uses a .part file during transfer and renames on success.
    Returns the total bytes downloaded in this session.
    """
    part_path = local_path + ".part"

    # Get file sizes
    remote_size = get_remote_size(sftp, remote_path)
    local_size = get_local_size(part_path)

    # Already complete?
    if local_size >= remote_size:
        print(f"File already complete ({local_size:,} bytes)")
        if os.path.exists(part_path):
            os.rename(part_path, local_path)
        return 0

    print(f"Remote: {remote_size:,} bytes")
    print(f"Local: {local_size:,} bytes")
    if local_size > 0:
        print(f"Resuming from byte {local_size:,}")

Download with resume support
Here’s a partial version of our final script (don’t worry, the full version is towards the end of the article). In this section you’ll see how it opens the remote file, seeks to the offset, and then reads it in chunks.
Note: To help you understand the functionality of the script, I’ve added a file creation step that generates a large(ish) .bin file of about 10MB, and I’ve deliberately slowed the transfer rate with a time.sleep(0.05) delay in the while loop so that you can press Ctrl+C to interrupt the transfer. When you run the script again, it automatically resumes from where it left off.
# sftp-2-resume.py example script
import paramiko
import os
import time

def create_sftp_client(hostname, port, username, key_path):
    """Create and return an SFTP client connection."""
    ssh = paramiko.SSHClient()
    ssh.load_system_host_keys()
    ssh.set_missing_host_key_policy(paramiko.RejectPolicy())
    private_key = paramiko.Ed25519Key.from_private_key_file(key_path)
    ssh.connect(hostname, port=port, username=username, pkey=private_key)
    sftp = ssh.open_sftp()
    return ssh, sftp

def get_remote_size(sftp, remote_path):
    """Get the size of a remote file."""
    return sftp.stat(remote_path).st_size

def get_local_size(local_path):
    """Get the size of a local file, or 0 if it doesn't exist."""
    if os.path.exists(local_path):
        return os.path.getsize(local_path)
    return 0

def download_with_resume(sftp, remote_path, local_path, chunk_size=32768):
    """
    Download a file with resume support.
    Uses a .part file during transfer and renames on success.
    Returns the total bytes downloaded in this session.
    """
    part_path = local_path + ".part"

    # Let's get those file sizes
    remote_size = get_remote_size(sftp, remote_path)
    local_size = get_local_size(part_path)

    # Is it already complete?
    if local_size >= remote_size:
        print(f"File already complete ({local_size:,} bytes)")
        if os.path.exists(part_path):
            os.rename(part_path, local_path)
        return 0

    print(f"Remote size: {remote_size:,} bytes")
    print(f"Local size: {local_size:,} bytes")
    if local_size > 0:
        print(f"Resuming from byte {local_size:,}")

    bytes_downloaded = 0

    # Next, we open the remote file and seek to its offset
    with sftp.open(remote_path, "rb") as remote_file:
        remote_file.seek(local_size)

        # Then we open the local file in append mode
        with open(part_path, "ab") as local_file:
            while True:
                chunk = remote_file.read(chunk_size)
                if not chunk:
                    break
                local_file.write(chunk)
                bytes_downloaded += len(chunk)

                # Here's the progress update
                total = local_size + bytes_downloaded
                percent = (total / remote_size) * 100
                print(f"\rProgress: {percent:.1f}% ({total:,}/{remote_size:,} bytes)", end="", flush=True)

                # Here's the slow-down I mentioned so you can press Ctrl+C to rudely interrupt the transfer
                time.sleep(0.05)

    print()  # Newline after progress indicator, looks better this way - I think?

    # Verify and rename the file so it isn't a .part file anymore
    final_size = get_local_size(part_path)
    if final_size == remote_size:
        os.rename(part_path, local_path)
        print(f"✓ Download complete: {local_path}")
    else:
        print(f"✗ Size mismatch: expected {remote_size:,}, got {final_size:,}")

    return bytes_downloaded

if __name__ == "__main__":
    # As before, we connect to SFTP here
    ssh, sftp = create_sftp_client(
        hostname="your_host",
        port=22,
        username="your_username",
        key_path=os.path.expanduser("~/.ssh/id_ed25519")
    )
    print("✓ Connected to SFTP server\n")

    # Here's where we create a large(ish) test file at around 10MB
    test_file = "/home/mint/test_large.bin"

    # It's a good idea to check if the file exists first
    try:
        existing_size = get_remote_size(sftp, test_file)
        print(f"Test file already exists: {existing_size:,} bytes\n")
    except IOError:  # stat() raises IOError if the file doesn't exist
        print(f"Creating 10MB test file: {test_file}")
        with sftp.open(test_file, "wb") as f:
            # Write 10MB of data
            for i in range(320):  # 320 * 32KB = 10MB
                f.write(b"X" * 32768)
        print(f"Test file created: {get_remote_size(sftp, test_file):,} bytes\n")

    # Now, let's download the file
    local_path = "/home/mint/Downloads/test_large.bin"
    print(f"Downloading to: {local_path}")
    print("(Press Ctrl+C to interrupt, then run again to resume)\n")

    try:
        bytes_dl = download_with_resume(sftp, test_file, local_path, chunk_size=32768)
        print(f"\n✓ Downloaded {bytes_dl:,} bytes in this session")
    except KeyboardInterrupt:
        print("\n\n⚠️ Download interrupted! Run the script again to resume.")

    # Cleanup
    sftp.close()
    ssh.close()

When I tested this, I interrupted the download a few times, and all I needed to do was run the script again and it picked up where it left off. The percentage indicator lets you watch your download’s progress.
The main parts that you need to know about for resume support are:
- sftp.open() with "rb" mode opens the remote file for reading
- remote_file.seek(local_size) is what jumps to the byte offset where we left off
- open(part_path, "ab") opens the local file in append-binary mode so we can add to the existing partial data
- chunk_size=32768: 32KB chunks balance memory usage and transfer efficiency (and 32KB also happens to be Paramiko's default max request size)
Handle connection drops with retry logic
Now that we’ve seen how the script runs with resume logic, let’s add a wrapper that automatically reconnects for us should the connection get dropped.
from tenacity import retry, stop_after_attempt, wait_fixed, retry_if_exception_type

@retry(
    stop=stop_after_attempt(3),
    wait=wait_fixed(5),
    retry=retry_if_exception_type((paramiko.SSHException, OSError, IOError)),
    reraise=True
)
def download_with_retry(hostname, port, username, key_path, remote_path, local_path):
    """
    Download a file with automatic reconnection on failure.
    Waits 5 seconds between attempts.
    """
    ssh = None
    try:
        print("\nConnecting...")
        ssh, sftp = create_sftp_client(hostname, port, username, key_path)
        download_with_resume(sftp, remote_path, local_path)
        part_path = local_path + ".part"
        if not os.path.exists(part_path):
            print("Transfer successful!")
            return True
        raise IOError("Download incomplete")
    finally:
        if ssh:
            ssh.close()
When a connection drops in the middle of a transfer, the script steps in and:
- Catches the exception so your operation doesn’t error-out completely
- Waits a little before retrying
- Reconnects and calls the download_with_resume() function again
- The resume function then picks up the pieces: it figures out where the .part file ended and continues from there (see the usage sketch below)
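Calling the wrapper is straightforward. Here’s a minimal usage sketch, reusing the placeholder connection values and the test file paths from the earlier scripts:

download_with_retry(
    hostname="your_host",
    port=22,
    username="your_username",
    key_path=os.path.expanduser("~/.ssh/id_ed25519"),
    remote_path="/home/mint/test_large.bin",
    local_path="/home/mint/Downloads/test_large.bin",
)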
Verify the remote file hasn’t changed
Resuming only works if the remote file hasn’t changed since you started the download (same size and ideally the same modified timestamp or hash). If it has, resuming can corrupt the result because you’ll append bytes from a different version of the file.
There are a few different options for working this out with Paramiko:
- File size only: This is the fastest way to check for file changes, but it’s the least reliable. A file could be edited and still stay the same size, or get truncated and rewritten. Size checks are only a viable option when you control the upload process and know files won't be modified.
- Modification timestamp (mtime): This is fast and reliable for traditional SFTP servers. It checks the st_mtime attribute from sftp.stat() (see the short sketch after this list). However, S3-backed SFTP services don't treat timestamps the same way that traditional SFTP servers do.
- Checksum (MD5 or SHA256): Probably the most reliable method because it verifies the actual file content, but the downside is speed. You would normally need to read the entire remote file to calculate the hash, which is not ideal for large files. Our example script offers you the option to read only the last 1MB of the file if you really want to use this verification method, which is better than reading the entire file.
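As a quick illustration of the timestamp option, here’s a minimal sketch (the saved value is hypothetical; the complete script below persists it in a .meta file):

# Compare the mtime recorded when the download started with the current one
saved_mtime = 1700000000.0  # hypothetical value stored at the start of the download
current_mtime = sftp.stat(remote_path).st_mtime
if current_mtime != saved_mtime:
    print("Remote file changed since the partial download. Starting fresh.")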
Choosing chunk size
The chunk_size parameter affects your script’s memory usage and how efficiently it uses the network. Here are some chunk size examples:
- Smaller chunks (8KB-16KB): Using smaller chunks uses less memory, but requires more round trips to the server. This makes the transfer slightly slower, but kinder to your system’s resources.
- Larger chunks (64KB-128KB): This increases your system memory usage, but the upside is that it requires fewer round trips, which could potentially give you faster file transfers if you have a decent connection.
- Default (32KB): This is a more balanced setup that gives you the best of both worlds, and it matches Paramiko’s internal buffer size.
For most situations, the default of 32KB will work great without any noticeable performance issues. If you are transferring very large files over a stable connection, then you could experiment with larger chunks of 64KB to get some gains in the transfer speed department.
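Once you have the complete script from the next section, you can experiment with this straight from the command line via its --chunk-size flag:

python sftp_resume.py /uploads/data.zip ./data.zip --chunk-size 65536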
Complete Script
Below is the complete working script. We’ve added argparse so that you don’t need to hardcode any values; simply pass them as arguments when you run the script. Execute the script with the --help flag to see all of the arguments that you can use with it.
#!/usr/bin/env python3
"""
SFTP downloader with resume support.

Usage:
    python sftp_resume.py <remote_path> <local_path> [options]

Examples:
    python sftp_resume.py /uploads/data.zip ./data.zip
    python sftp_resume.py /uploads/data.zip ./data.zip --verify tail
    python sftp_resume.py /uploads/data.zip ./data.zip --host myserver.sftptogo.com --user myuser --key /path/to/key
"""
import paramiko
import os
import sys
import hashlib
import argparse
from tenacity import retry, stop_after_attempt, wait_fixed, retry_if_exception_type


class SFTPDownloader:
    """SFTP client with resume support for interrupted downloads."""

    def __init__(self, hostname, port=22, username=None, key_path=None, password=None):
        """
        Initialize the SFTP downloader.

        Args:
            hostname: SFTP server hostname
            port: SFTP server port (default: 22)
            username: SFTP username
            key_path: Path to SSH private key (optional)
            password: Password for auth (optional, used if no key_path)
        """
        self.hostname = hostname
        self.port = port
        self.username = username
        self.key_path = key_path
        self.password = password
        self.ssh = None
        self.sftp = None

    def connect(self):
        """Establish SSH and SFTP connections."""
        self.ssh = paramiko.SSHClient()
        self.ssh.load_system_host_keys()
        self.ssh.set_missing_host_key_policy(paramiko.RejectPolicy())
        if self.key_path:
            # Auto-detect key type
            key_classes = [
                paramiko.Ed25519Key,
                paramiko.RSAKey,
                paramiko.ECDSAKey
            ]
            private_key = None
            for key_class in key_classes:
                try:
                    private_key = key_class.from_private_key_file(self.key_path)
                    break
                except paramiko.SSHException:
                    continue
            if private_key is None:
                raise paramiko.SSHException(f"Unable to load key from {self.key_path}")
            self.ssh.connect(self.hostname, port=self.port, username=self.username, pkey=private_key)
        else:
            self.ssh.connect(self.hostname, port=self.port, username=self.username, password=self.password)
        self.sftp = self.ssh.open_sftp()

    def disconnect(self):
        """Close SFTP and SSH connections."""
        if self.sftp:
            self.sftp.close()
            self.sftp = None
        if self.ssh:
            self.ssh.close()
            self.ssh = None

    def __enter__(self):
        """Context manager entry."""
        self.connect()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        """Context manager exit."""
        self.disconnect()
        return False

    def get_remote_size(self, remote_path):
        """Return size of remote file."""
        return self.sftp.stat(remote_path).st_size

    def get_remote_mtime(self, remote_path):
        """Return modification time of remote file."""
        return self.sftp.stat(remote_path).st_mtime

    def calculate_remote_checksum(self, remote_path, chunk_size=32768):
        """Calculate MD5 checksum of entire remote file."""
        md5 = hashlib.md5()
        with self.sftp.open(remote_path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                md5.update(chunk)
        return md5.hexdigest()

    def calculate_remote_tail_checksum(self, remote_path, tail_size=1048576, chunk_size=32768):
        """
        Calculate MD5 checksum of the last portion of a remote file.
        Much faster than full-file checksum while still detecting most changes.

        Args:
            remote_path: Path to file on server
            tail_size: Bytes to read from end of file (default: 1MB)
            chunk_size: Read chunk size in bytes

        Returns:
            Tuple of (checksum, file_size)
        """
        md5 = hashlib.md5()
        remote_size = self.get_remote_size(remote_path)
        # For small files, just checksum the whole thing
        if remote_size <= tail_size:
            return self.calculate_remote_checksum(remote_path, chunk_size), remote_size
        start_pos = remote_size - tail_size
        with self.sftp.open(remote_path, "rb") as f:
            f.seek(start_pos)
            bytes_read = 0
            while bytes_read < tail_size:
                chunk = f.read(min(chunk_size, tail_size - bytes_read))
                if not chunk:
                    break
                md5.update(chunk)
                bytes_read += len(chunk)
        return md5.hexdigest(), remote_size

    def _download_with_resume(self, remote_path, local_path, chunk_size=32768):
        """
        Download a file with resume support.
        Uses a .part file during transfer and renames on success.
        Returns the total bytes downloaded in this session.
        """
        part_path = local_path + ".part"
        remote_size = self.get_remote_size(remote_path)
        local_size = os.path.getsize(part_path) if os.path.exists(part_path) else 0

        # Handle empty files
        if remote_size == 0:
            print("Remote file is empty (0 bytes)")
            open(local_path, 'w').close()
            if os.path.exists(part_path):
                os.remove(part_path)
            return 0

        if local_size >= remote_size:
            print(f"File already complete ({local_size:,} bytes)")
            if os.path.exists(part_path):
                os.rename(part_path, local_path)
            return 0

        print(f"Remote: {remote_size:,} bytes")
        print(f"Local: {local_size:,} bytes")
        if local_size > 0:
            print(f"Resuming from byte {local_size:,}")

        bytes_downloaded = 0
        with self.sftp.open(remote_path, "rb") as remote_file:
            remote_file.seek(local_size)
            with open(part_path, "ab") as local_file:
                while True:
                    chunk = remote_file.read(chunk_size)
                    if not chunk:
                        break
                    local_file.write(chunk)
                    bytes_downloaded += len(chunk)
                    total = local_size + bytes_downloaded
                    percent = (total / remote_size) * 100
                    print(f"\rProgress: {percent:.1f}% ({total:,}/{remote_size:,} bytes)", end="", flush=True)
        print()

        final_size = os.path.getsize(part_path)
        if final_size == remote_size:
            os.rename(part_path, local_path)
            print(f"Complete: {local_path}")
        else:
            print(f"Size mismatch: expected {remote_size:,}, got {final_size:,}")
        return bytes_downloaded

    def download(self, remote_path, local_path, verify="size", tail_size=1048576, chunk_size=32768):
        """
        Download with the specified verification method.

        Args:
            remote_path: Path to file on server
            local_path: Local destination path
            verify: Verification method - "size", "timestamp", or "tail"
            tail_size: Bytes to checksum from end of file when verify="tail" (default: 1MB)
            chunk_size: Download chunk size in bytes
        """
        part_path = local_path + ".part"
        meta_path = local_path + ".meta"

        if verify == "tail":
            print(f"Calculating remote tail checksum (last {tail_size:,} bytes)...")
            checksum, size = self.calculate_remote_tail_checksum(remote_path, tail_size, chunk_size)
            remote_value = f"{checksum}:{size}"
            print(f"Remote tail MD5: {checksum} (file size: {size:,})")
        elif verify == "timestamp":
            remote_value = str(self.get_remote_mtime(remote_path))
        else:  # size
            remote_value = str(self.get_remote_size(remote_path))

        # If .part exists but .meta is missing, start fresh (can't verify integrity)
        if os.path.exists(part_path) and not os.path.exists(meta_path):
            print("Partial download found without metadata. Starting fresh.")
            os.remove(part_path)

        # Check for existing partial download
        if os.path.exists(part_path) and os.path.exists(meta_path):
            with open(meta_path, "r") as f:
                saved_value = f.read().strip()
            if saved_value != str(remote_value):
                print("Remote file changed since last download. Starting fresh.")
                os.remove(part_path)
                os.remove(meta_path)
            elif verify == "tail":
                print("Checksum matches. Resuming download...")

        # Save current value for future verification
        with open(meta_path, "w") as f:
            f.write(str(remote_value))

        # Download
        self._download_with_resume(remote_path, local_path, chunk_size)

        # Clean up meta file on success
        if not os.path.exists(part_path) and os.path.exists(meta_path):
            os.remove(meta_path)

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_fixed(5),
        retry=retry_if_exception_type((paramiko.SSHException, OSError, IOError)),
        reraise=True
    )
    def download_with_retry(self, remote_path, local_path, verify="size", tail_size=1048576, chunk_size=32768):
        """
        Download a file with automatic retry on connection failure.

        Args:
            remote_path: Path to file on server
            local_path: Local destination path
            verify: Verification method - "size", "timestamp", or "tail"
            tail_size: Bytes to checksum from end of file when verify="tail"
            chunk_size: Download chunk size in bytes

        Returns:
            True if download completed successfully
        """
        try:
            self.connect()
            self.download(remote_path, local_path, verify, tail_size, chunk_size)
            # Check if complete
            part_path = local_path + ".part"
            if not os.path.exists(part_path):
                print("\nTransfer successful!")
                return True
            # Not complete, will retry
            raise IOError("Download incomplete")
        finally:
            self.disconnect()


def main():
    parser = argparse.ArgumentParser(
        description="Download files over SFTP with resume support.",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s /uploads/data.zip ./data.zip --host myserver.sftptogo.com --user myuser
  %(prog)s /uploads/data.zip ./data.zip --host myserver.com --key ~/.ssh/id_ed25519
  %(prog)s /uploads/data.zip ./data.zip --verify tail

Verification methods:
  size       Compare remote file size (default). Fast and works with all
             servers, including cloud-hosted services like SFTP To Go.
  timestamp  Compare modification time. Fast, but not recommended for
             S3-backed SFTP services where timestamps update on every upload.
  tail       Checksum last 1MB of file (or --tail-size bytes). Good balance
             of speed and reliability for large files.
"""
    )
    parser.add_argument("remote_path", help="Path to file on SFTP server")
    parser.add_argument("local_path", help="Local destination path")
    parser.add_argument(
        "--host",
        default=os.environ.get("SFTP_HOST", ""),
        help="SFTP server hostname (or set SFTP_HOST env var)"
    )
    parser.add_argument(
        "--port",
        type=int,
        default=int(os.environ.get("SFTP_PORT", "22")),
        help="SFTP server port (default: 22)"
    )
    parser.add_argument(
        "--user",
        default=os.environ.get("SFTP_USER", ""),
        help="SFTP username (or set SFTP_USER env var)"
    )
    parser.add_argument(
        "--key",
        default=os.environ.get("SFTP_KEY", ""),
        help="Path to SSH private key (or set SFTP_KEY env var)"
    )
    parser.add_argument(
        "--password",
        default=os.environ.get("SFTP_PASSWORD", ""),
        help="SFTP password (or set SFTP_PASSWORD env var). Use --key instead when possible."
    )
    parser.add_argument(
        "--verify",
        choices=["size", "timestamp", "tail"],
        default="size",
        help="Method to detect remote file changes (default: size)"
    )
    parser.add_argument(
        "--tail-size",
        type=int,
        default=1048576,
        help="Bytes to checksum when using --verify tail (default: 1048576 = 1MB)"
    )
    parser.add_argument(
        "--chunk-size",
        type=int,
        default=32768,
        help="Download chunk size in bytes (default: 32768)"
    )
    args = parser.parse_args()

    # Validate required arguments
    if not args.host:
        parser.error("--host is required (or set SFTP_HOST environment variable)")
    if not args.user:
        parser.error("--user is required (or set SFTP_USER environment variable)")
    if not args.key and not args.password:
        parser.error("Either --key or --password is required")

    print(f"Connecting to {args.user}@{args.host}:{args.port}")
    print(f"Remote file: {args.remote_path}")
    print(f"Local file: {args.local_path}")
    print(f"Verification: {args.verify}")

    downloader = SFTPDownloader(
        hostname=args.host,
        port=args.port,
        username=args.user,
        key_path=args.key if args.key else None,
        password=args.password if args.password else None
    )
    try:
        success = downloader.download_with_retry(
            remote_path=args.remote_path,
            local_path=args.local_path,
            verify=args.verify,
            tail_size=args.tail_size,
            chunk_size=args.chunk_size
        )
        sys.exit(0 if success else 1)
    except Exception as e:
        print(f"\nDownload failed after retries: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
By default, with no --verify flag, the script performs a size check when it first runs and stores that value in a .meta file. If your connection is interrupted, that saved value is compared against the remote file’s current size to confirm the file is still the same as when the initial download started.
In one of my test runs, I pulled out my network cable when the download was just over 73% complete. The script reported a connection error and attempted to connect again. On the third attempt, I plugged the cable back in, and the SFTP download resumed and completed.
Troubleshooting
Running into glitches is all part of the scripting journey, so here are some common troubleshooting tips if you get stuck with any of the examples.
Host key verification failed
If you come across an error like:

paramiko.ssh_exception.SSHException: Server 'host.com' not found in known_hosts

you’ll need to add the host to your known_hosts file with this command:

ssh-keyscan -t ed25519 host.com >> ~/.ssh/known_hosts

Your system will now recognize the host entry and your connection will succeed.
Connection timeouts on large files
Transfers that run for a very long time can sometimes trigger server-side idle timeouts, especially if your network’s latency fluctuates. Common errors that you might see:

Socket exception: Connection reset by peer

or

paramiko.ssh_exception.SSHException: Server connection dropped

The good news is that the retry logic in our script handles this automatically: it reconnects, then resumes from where it left off. If you're getting disconnected quite often, you can also add SSH keep-alive settings to your connection:

transport.set_keepalive(30)  # Send a keepalive every 30 seconds

This tells Paramiko to send a keep-alive packet every 30 seconds to stop the server from treating your connection like it’s idle during slower portions of the download.
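In the context of our create_sftp_client() helper, you can grab the transport and enable keep-alives right after connecting. A minimal sketch (the 30-second interval is just a reasonable starting point, not a hard requirement):

def create_sftp_client(hostname, port, username, key_path):
    """Create an SFTP client with keep-alives enabled."""
    ssh = paramiko.SSHClient()
    ssh.load_system_host_keys()
    ssh.set_missing_host_key_policy(paramiko.RejectPolicy())
    private_key = paramiko.Ed25519Key.from_private_key_file(key_path)
    ssh.connect(hostname, port=port, username=username, pkey=private_key)
    ssh.get_transport().set_keepalive(30)  # send a keep-alive every 30 seconds
    sftp = ssh.open_sftp()
    return ssh, sftp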
Download finished, but the file won’t open or is corrupt
If your downloaded file won’t open after resuming, or appears corrupted, the remote file probably changed between download attempts.
For files that are likely to be updated or overwritten, you can use the --verify tail flag. It will perform a checksum on the last 1MB of the file. It’s fast, and it catches most changes.
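For example:

python sftp_resume.py /uploads/data.zip ./data.zip --host your_host --user your_username --verify tail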
Wrapping Up
And there we have it: a Python script that handles interrupted SFTP downloads without batting an eyelid. No more starting over from the beginning when your connection drops.
Check out more code samples on GitHub. If you're looking to automate these transfers on a schedule, explore SFTP automation with tools like Cron To Go.
Frequently asked questions
What’s the difference between sftp.get() and a custom resume script?
The built-in sftp.get() method tries to download files in one shot and, annoyingly, has no built-in resume support. If it fails mid-transfer, you have to start over. Our approach tracks the download’s progress in a .part file, which allows it to continue after a failed attempt.
How does the script know whether the remote file changed between attempts?
Glad you asked! We store the remote file’s size, timestamp, or checksum in a .meta file, depending on the verify method that you choose. Before resuming, we compare it to the remote file’s attributes to see if they still match. If they are different, we throw out the .part file and start the download all over again.
You can choose your verification method with the --verify flag:
- size (default): fastest, works everywhere including SFTP To Go
- timestamp: fast, but only reliable on traditional SFTP servers
- tail: performs a checksum on the last 1MB of the file, good balance of speed and reliability for checking file changes
Can I resume interrupted uploads too?
You could technically implement that feature using the same kind of logic that we used in our script. You would check the remote file size with sftp.stat(), seek to that offset in your local file, and continue writing from there using sftp.open() in append mode. It’s basically the same as the download function, but mirrored; see the sketch below.
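Here’s a minimal sketch of that mirrored logic (upload_with_resume is a hypothetical helper, not part of the article’s main code, and it assumes the same sftp client as our scripts):

def upload_with_resume(sftp, local_path, remote_path, chunk_size=32768):
    """Resume an upload by appending from the remote file's current size."""
    local_size = os.path.getsize(local_path)
    try:
        remote_size = sftp.stat(remote_path).st_size  # bytes already uploaded
    except IOError:
        remote_size = 0  # nothing uploaded yet
    if remote_size >= local_size:
        return  # already complete
    with open(local_path, "rb") as local_file:
        local_file.seek(remote_size)  # skip what the server already has
        with sftp.open(remote_path, "ab") as remote_file:  # append to the remote file
            while True:
                chunk = local_file.read(chunk_size)
                if not chunk:
                    break
                remote_file.write(chunk)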
Why use a .part extension during the download?
We do this so that we don’t end up with a partially downloaded executable or configuration file that could be read or executed accidentally. The .part extension tells anyone who stumbles across it that it is a partial file and of no use to them in its current, malnourished state.
Is there a limit to how large a file I can transfer?
The beauty of using Paramiko is that there isn’t any absolute limit on the file sizes you can transfer. SFTP supports 64-bit file offsets, so in theory you could transfer exabytes of data if you had the time, willpower, and nothing better to do. The real limits are your connection speed, network infrastructure, server resources, and storage capacity.
What happens if the script itself crashes mid-download?
The good news is that the .part file will stay exactly where it was saved, so the behavior is the same as any other kind of interruption. Simply check what went wrong in your console, and run the script again to resume.
Can I use a password instead of an SSH key?
Yes you can indeed, but you’ll need to perform a few extra steps. A simple approach that is reasonably secure is to store your password as an environment variable on your system.
On a Linux system you would add the env value like this:
export SFTP_PASSWORD='your_super_secret_password'
The script already picks up the SFTP_PASSWORD environment variable by default if you run it without the --password flag:
python sftp_resume.py /uploads/data.zip ./data.zip --host myserver.com --user myuser
A less secure option is to use the --password flag followed by your password, which is not recommended, but it is an option.
python sftp_resume.py /uploads/data.zip ./data.zip --host myserver.com --user myuser --password "your_super_secret_password"
Bear in mind that it is more secure to use SSH keys instead of passwords for automated SFTP scripts.
Can I download multiple files in one go?
All you need to do is loop over a list of remote paths and call the download_with_retry() method for each item, as shown in the sketch below. The resume logic works independently per file because each download gets its own .part and .meta file.
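A minimal batch-download sketch using the SFTPDownloader class from the complete script (the file paths and connection values are placeholders):

# Hypothetical list of (remote, local) path pairs to fetch
files_to_fetch = [
    ("/uploads/data1.zip", "./data1.zip"),
    ("/uploads/data2.zip", "./data2.zip"),
]

downloader = SFTPDownloader(
    hostname="your_host",
    username="your_username",
    key_path=os.path.expanduser("~/.ssh/id_ed25519"),
)

# download_with_retry() connects and disconnects per call,
# so each file gets its own retry and resume handling
for remote_path, local_path in files_to_fetch:
    downloader.download_with_retry(remote_path, local_path)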