Reframe landscape video to vertical (or any aspect ratio) while keeping faces and objects in shot.
An open-source take on Google AutoFlip: detect what matters in each scene, crop around it, keep the audio.
- What it does
- How it works
- Install
- Quick start (CLI)
- GUI
- Command-line options
- Object weights
- Limitations
- Contributing
- License
FrameShift takes a video, splits it into scenes, finds the faces and objects in each scene, then picks one fixed crop per scene that keeps the important content inside a target frame (9:16 for Reels/Shorts, 1:1 for square, anything you ask for). The original audio is muxed back in with FFmpeg.
The crop is stationary per scene: one crop box is held for the whole shot and recomputed at each cut. There's no panning or object tracking within a shot. That keeps the output stable and predictable, but it also means a fast-moving subject in a single long take can drift out of frame. See Limitations.
Models download themselves on first run into a local models/ folder. Each file is checked against a pinned SHA-256 before it's loaded — a .pt is a pickle archive, so loading an unverified one would run whatever code it carries. A file that fails the check is deleted, not loaded.
- Faces: `yolov11n-face.pt` (akanametov/yolo-face), with MediaPipe Face Detection as a fallback if the YOLO model won't load.
- Objects: `yolo11n.pt` (Ultralytics, 80 COCO classes). Only loaded when you ask for a non-face object weight > 0, so face-only runs stay light.
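
The checksum gate described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not FrameShift's actual code: the function names `sha256_of` and `verify_or_delete` are hypothetical, and the pinned hash is whatever value the caller supplies.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_or_delete(path, expected_sha256):
    """Return True if the file matches the pinned hash; otherwise delete it and return False.

    Hypothetical helper: the caller should only load the model when this returns True.
    """
    if sha256_of(path) == expected_sha256.lower():
        return True
    os.remove(path)  # an unverified pickle is never handed to the loader
    return False
```

The important design point is the order: hash first, load second, so a tampered file is removed before any unpickling can happen.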
- Scene detection: PySceneDetect (`ContentDetector`) finds the cuts.
- Sampling: up to 150 frames per scene are run through the detector.
- Weighted interest region: every detection contributes to a centroid weighted by its `--object_weights` value and its area; the union of the boxes sets the size.
- Crop: that region is expanded to the target aspect ratio and clamped to the frame.
- Render: the crop fills the output (pan & scan), or fits inside it with padding (black / blurred / solid colour).
- Audio: FFmpeg copies the original track onto the reframed video. No FFmpeg, no audio (you get a warning, the video still renders).
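
The sampling, interest-region, and crop steps above can be sketched as plain geometry. This is an illustration under stated assumptions, not the project's implementation: the function names are hypothetical, samples are assumed to be evenly spaced, a detection's pull is assumed to be class weight times box area, and the ratio is width divided by height.

```python
def sample_indices(start, end, max_samples=150):
    """Frame indices in [start, end), at most max_samples, evenly spaced (assumption)."""
    n = end - start
    if n <= 0:
        return []
    if n <= max_samples:
        return list(range(start, end))
    step = n / max_samples
    return [start + int(i * step) for i in range(max_samples)]


def interest_region(detections, weights):
    """detections: iterable of (label, x1, y1, x2, y2). Returns (cx, cy, w, h) or None."""
    total = cx = cy = 0.0
    boxes = []
    for label, x1, y1, x2, y2 in detections:
        weight = weights.get(label, weights.get("default", 0.0))
        score = weight * (x2 - x1) * (y2 - y1)  # class weight times box area
        if score <= 0:
            continue  # zero-weight classes do not influence the crop
        cx += score * (x1 + x2) / 2
        cy += score * (y1 + y2) / 2
        total += score
        boxes.append((x1, y1, x2, y2))
    if not boxes:
        return None
    # The union of the contributing boxes sets the region size.
    width = max(b[2] for b in boxes) - min(b[0] for b in boxes)
    height = max(b[3] for b in boxes) - min(b[1] for b in boxes)
    return cx / total, cy / total, width, height


def fit_crop(cx, cy, w, h, ratio, frame_w, frame_h):
    """Grow (w, h) to the target ratio, shrink to fit the frame, clamp the position."""
    if w / h < ratio:
        w = h * ratio  # too narrow: widen
    else:
        h = w / ratio  # too wide: heighten
    scale = min(1.0, frame_w / w, frame_h / h)
    w, h = w * scale, h * scale
    x = min(max(cx - w / 2, 0.0), frame_w - w)
    y = min(max(cy - h / 2, 0.0), frame_h - h)
    return x, y, w, h
```

For a 1920x1080 frame with a single tall detection at the right edge, `fit_crop(1860, 540, 120, 1080, 9 / 16, 1920, 1080)` widens the box to 607.5 px and slides it left so it ends exactly at the frame border.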
Python 3.8+ required.

    git clone https://github.com/fralapo/FrameShift.git
    cd FrameShift
    pip install -r requirements.txt

FFmpeg is optional but needed to keep audio. Install it from ffmpeg.org and make sure `ffmpeg` is on your PATH:

    ffmpeg -version

The two detection models (~11 MB total) download on the first run and are verified by SHA-256 before use.
Landscape to 9:16, cropped to fill the frame:

    python -m frameshift.main input.mp4 output.mp4 --ratio 9:16

Fit the whole crop with black bars instead of cropping to fill:

    python -m frameshift.main input.mp4 output.mp4 --ratio 1:1 --padding

Blurred bars instead of black (`--blur_amount` 0–10):

    python -m frameshift.main input.mp4 output.mp4 --ratio 16:9 --padding --padding_type blur --blur_amount 5

Solid-colour bars (name or `(R,G,B)`):

    python -m frameshift.main input.mp4 output.mp4 --ratio 4:3 --padding --padding_type color --padding_color_value "(0,0,255)"

Batch a folder:

    python -m frameshift.main videos/ out/ --ratio 4:5 --batch

Each output's filename gets a `_reframed` suffix in batch mode.
There's a Tkinter front-end covering most CLI options (input/output, ratio, padding, object weights), with a cancel button for long jobs.

    python frameshift_gui.py

Full walkthrough: FRAMESHIFT_GUI_MANUAL.md.
| Option | Default | What it does |
|---|---|---|
| `input` | - | Input file, or directory with `--batch` |
| `output` | - | Output file, or directory with `--batch` |
| `--ratio` | `9/16` | Target aspect ratio: `9:16`, `1:1`, or a decimal like `0.5625` |
| `--padding` | off | Fit the crop inside the frame with bars instead of cropping to fill |
| `--padding_type` | `black` | `black`, `blur`, or `color` (needs `--padding`) |
| `--blur_amount` | `5` | Blur strength 0–10 when `--padding_type blur` |
| `--padding_color_value` | `black` | Bar colour name or `"(R,G,B)"` when `--padding_type color` |
| `--output_height` | `1080` | Output height in px; width follows from `--ratio` |
| `--interpolation` | `lanczos` | `nearest`, `linear`, `cubic`, `area`, `lanczos` |
| `--content_opacity` | `1.0` | Below 1.0, blends content over a blurred full-frame background |
| `--object_weights` | `face:1.0,person:0.8,default:0.5` | Per-class importance; see below |
| `--log_file` | none | Write DEBUG logs to a file |
| `--test` | off | Run a built-in sweep of scenarios against one input |
| `--batch` | off | Process every video in the input directory |
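
To make the relationship between `--ratio` and `--output_height` concrete, here is a small sketch of how a ratio string could be turned into output dimensions. The function names are hypothetical, and rounding the width to an even number is an assumption (many encoders prefer even dimensions), not documented FrameShift behaviour.

```python
def parse_ratio(text):
    """Accept 'W:H' (or 'W/H') or a decimal string; return width / height as a float."""
    for sep in (":", "/"):
        if sep in text:
            w, h = text.split(sep, 1)
            return float(w) / float(h)
    return float(text)


def output_size(ratio, height=1080):
    """Derive the width from the ratio, rounded to the nearest even integer (assumption)."""
    width = int(round(height * ratio / 2)) * 2
    return width, height
```

With the defaults, `output_size(parse_ratio("9:16"))` gives a 608x1080 frame, since the exact width of 607.5 px is not a whole number.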
`--object_weights` decides what the crop chases. A higher weight pulls the crop harder toward that class.

    # Keep dogs in frame, care less about everything else
    python -m frameshift.main in.mp4 out.mp4 --object_weights "face:1.0,dog:0.7,default:0.2"

- `face` is the dedicated face model (or MediaPipe fallback).
- Any COCO class (`person`, `car`, `dog`, `bottle`, etc.) comes from `yolo11n.pt`.
- The object model only runs when at least one non-face class has weight > 0, so the default face-and-person setup never loads it.
- `default` covers any detected class you didn't name.
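
The weight string is a comma-separated list of `class:weight` pairs. A parser for that format might look like the sketch below. The function names are hypothetical and the real CLI may validate input differently; the fallback to `default` for unnamed classes follows the description above.

```python
def parse_object_weights(spec):
    """Turn 'face:1.0,dog:0.7,default:0.2' into {'face': 1.0, 'dog': 0.7, 'default': 0.2}."""
    weights = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas
        name, _, value = item.partition(":")
        weights[name.strip().lower()] = float(value)  # raises ValueError on a bad number
    return weights


def weight_for(label, weights):
    """Named classes use their own weight; anything else falls back to 'default'."""
    return weights.get(label, weights.get("default", 0.0))
```

With the example string above, `weight_for("dog", w)` returns 0.7 and `weight_for("car", w)` falls back to the `default` value of 0.2.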
Worth knowing before you rely on it:
- One crop per scene. No tracking or panning inside a shot. A subject that walks across a long take can leave the frame.
- Output codec is `mp4v`. Fine for previews, not the smallest or highest-quality option; re-encode with FFmpeg if you need H.264/H.265.
- Dependencies are unpinned in `requirements.txt`. Ultralytics and MediaPipe change APIs between releases, so a fresh install can pull a version that behaves differently. Pin them if you need reproducibility.
- `--test` is an output generator, not an assertion suite: it renders scenarios for you to eyeball, it doesn't pass/fail.
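
For the codec limitation, one way to re-encode a finished file to H.264 is to call FFmpeg from Python. This is a sketch, not part of FrameShift: the helper names are hypothetical, the CRF and preset values are arbitrary starting points, and it assumes your `ffmpeg` build includes `libx264`.

```python
import subprocess


def h264_command(src, dst, crf=20):
    """Build an FFmpeg argument list that re-encodes video to H.264 and copies audio as-is."""
    return [
        "ffmpeg", "-y",          # overwrite dst without prompting
        "-i", src,
        "-c:v", "libx264",       # H.264 video encoder
        "-crf", str(crf),        # lower CRF means higher quality and larger files
        "-preset", "medium",
        "-pix_fmt", "yuv420p",   # widest player compatibility
        "-c:a", "copy",          # keep the audio track untouched
        dst,
    ]


def reencode(src, dst, crf=20):
    """Run the command; raises CalledProcessError if FFmpeg exits with an error."""
    subprocess.run(h264_command(src, dst, crf), check=True)
```

For H.265, swapping `libx264` for `libx265` is the usual change, again assuming the encoder is compiled into your FFmpeg build.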
Issues and pull requests welcome at github.com/fralapo/FrameShift. If you hit a reframing result that looks wrong, attaching the input clip (or a short sample) makes it much easier to reproduce.
Ideas on the table: in-shot tracking/panning as an opt-in mode, configurable output codec, pinned dependencies.
MIT. See LICENSE.