scut

A command-line tool for parsing arbitrary text into CSV.

What scut does

Reorder fields, something cut can't do:

echo "a b c" | scut "2,1,3"
# b,a,c

Duplicate a field:

echo "x y" | scut "1,1,2"
# x,x,y

Reach into one field and re-parse it, leaving the rest alone:

echo "name=foo size=1024" | scut "2[at:=:2]"
# 1024

Glue an over-tokenized value back together and re-split it differently — the spec is short and the alternative in awk/perl isn't:

echo "May 21 09:21 a message" | scut "+1-3[on: :-]"
# May,21,09:21

Swap one delimiter for another in one step:

echo "a-b,c-d" | scut "[at:,-:+-]"
# "a,b,c,d"

(The output is CSV-quoted because the cell itself contains commas.)
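The quoting rule can be illustrated with Python's standard csv module. This is only a sketch of minimal CSV quoting in general; it assumes scut's writer behaves the same way, which the README does not state explicitly.

```python
import csv
import io

buffer = io.StringIO()
# lineterminator="\n" is chosen here for readable output (the csv default is "\r\n").
writer = csv.writer(buffer, lineterminator="\n")

writer.writerow(["a,b,c,d"])           # one cell that itself contains commas
writer.writerow(["a", "b", "c", "d"])  # four separate cells

output = buffer.getvalue()
print(output, end="")
# "a,b,c,d"
# a,b,c,d
```

The first row is a single quoted cell; the second is four unquoted cells. A CSV reader will tell them apart even though they look similar on screen.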

Cheat sheet

The selectors (what to pick from a record):

Spec    Meaning
1       field 1
1-3     fields 1 to 3
-3      up to field 3
3-      from field 3 onwards
-       every field
1,3     fields 1 and 3 (non-contiguous; emits two cells)
1,1     field 1 picked twice
+1-3    join fields 1 to 3 into a single cell
+1+3    join two non-contiguous fields into a single cell

The chops (how to split a record):

Spec            Meaning
[at:,:1]        split at ,
[at:,.:1]       split at either , or .
[on:, :1]       split on the literal sequence ", "
[chars::1-3]    split into individual characters
[ws::1-3]       split on any run of whitespace
1-3             bare form, shorthand for [ws::1-3]

Putting them together:

Spec                  Meaning
[at: :1[at:-:2],2]    sub-block re-splits only field 1; field 2 passes through
[at: :1,2][at:-:2]    a chained blend applies to every output of the first
[at: :+1-3]           + joins fields 1–3 using the chop's delimiter
1[at:-:2]             bare form also takes sub-blocks; same as [ws::1[at:-:2]]

Recipes

File size and name from ls -l:

ls -l | tail -n +2 | scut "5,9"
# 3598,CLAUDE.md
# 5538,GRAMMAR.md
# ...

PID, CPU, and full command from ps aux (the +11- joins field 11 onwards back into one cell, so a command like /sbin/init splash survives the whitespace split):

ps aux | tail -n +2 | scut "2,3,+11-"
# 1,0.0,/sbin/init splash
# 2,0.0,[kthreadd]
# ...

Year, month, day from an ISO date:

echo "2026-05-24" | scut "[at:-:1-3]"
# 2026,05,24

Mount points using /dev storage, via csvkit. scut emits headerless CSV; csvkit assumes a header row, so prepend one when chaining:

{ echo "fs,use"; df -h | tail -n +2 | scut "1,5"; } | csvgrep -c fs -m "/dev"
# fs,use
# /dev/mapper/ubuntu--vg-ubuntu--lv,17%
# /dev/nvme0n1p2,13%

For simple cases, plain sort also pairs well:

ls -l *.md | scut "5,9" | sort -t, -k1,1nr
# 7210,README.md
# 5538,GRAMMAR.md
# 3598,CLAUDE.md

scut hands off cleanly to anything that reads CSV — csvkit, miller, sqlite-utils, or your own scripts.
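As a sketch of the "your own scripts" case, the snippet below reads headerless CSV with Python's csv.DictReader and supplies the column names itself. The column names size and name are invented for illustration, and the sample text is copied from the ls -l recipe output; a real script would read sys.stdin instead.

```python
import csv
import io

# Stand-in for piped scut output; replace with sys.stdin in a real script.
scut_output = io.StringIO("3598,CLAUDE.md\n5538,GRAMMAR.md\n")

# scut emits no header row, so the field names are given explicitly.
reader = csv.DictReader(scut_output, fieldnames=["size", "name"])
rows = [{"size": int(r["size"]), "name": r["name"]} for r in reader]

largest = max(rows, key=lambda r: r["size"])
print(largest["name"])  # GRAMMAR.md
```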

How the spec works

A scut spec describes how to blend a record: chop it into fields, then glue the wanted fields back into a row.

The blend block

A blend has three colon-separated parts:

[ chop : args : glues ]

chop decides how to split the line, args is the chop's delimiter (when one is needed), and glues is a comma-separated list of selectors saying which fields to keep.
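To make the three-part layout concrete, here is a minimal Python sketch that pulls a single blend apart. The function name parse_flat_blend is hypothetical, and the sketch is not scut's parser: it assumes args contains no colon and does not interpret the glues at all. GRAMMAR.md remains the authority on the real syntax.

```python
def parse_flat_blend(spec):
    """Split '[chop:args:glues]' into (chop, args, glues).

    Illustrative only: assumes the args part contains no ':'.
    """
    if not (spec.startswith("[") and spec.endswith("]")):
        raise ValueError("expected a bracketed blend")
    # Split on the first two colons only, so colons inside sub-blocks
    # within the glues part are left untouched.
    chop, args, glues = spec[1:-1].split(":", 2)
    return chop, args, glues

print(parse_flat_blend("[at:,-:+-]"))          # ('at', ',-', '+-')
print(parse_flat_blend("[ws::1-3]"))           # ('ws', '', '1-3')
print(parse_flat_blend("[at: :1[at:-:2],2]"))  # ('at', ' ', '1[at:-:2],2')
```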

Chops

Chop     What it does
at       split at any of the characters in args
on       split on the literal sequence in args
chars    split into individual characters (no args)
ws       split on any whitespace run (no args)

echo "a,b,c"      | scut "[at:,:1-3]"
# a,b,c
echo "a, b, c"    | scut "[on:, :1-3]"
# a,b,c
echo "abcde"      | scut "[chars::1-3]"
# a,b,c

at: and ws: treat a run of delimiter characters as a single separator: a,,b with [at:,:-] yields two fields, not three. A leading or trailing run still produces one empty field on the outside, so a, and a,,, both give a followed by one empty cell. on: does not collapse — every occurrence of the literal sequence separates two fields, so a,,b with [on:,:-] yields three.
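The difference between collapsing and non-collapsing splits can be modelled in a few lines of Python. The helper names chop_at and chop_on are made up for this sketch, which mirrors the behaviour described above rather than scut's actual source.

```python
import re

def chop_at(line, chars):
    """Split at any character in `chars`, treating a run as one separator."""
    return re.split("[" + re.escape(chars) + "]+", line)

def chop_on(line, sequence):
    """Split on every occurrence of the literal `sequence`, no collapsing."""
    return line.split(sequence)

print(chop_at("a,,b", ","))  # ['a', 'b']
print(chop_at("a,", ","))    # ['a', '']
print(chop_at("a,,,", ","))  # ['a', '']
print(chop_on("a,,b", ","))  # ['a', '', 'b']
```

The trailing empty string in the second and third results corresponds to the single empty cell produced by a trailing run of delimiters.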

Glues

After the second colon comes one or more selectors, separated by commas. Fields are numbered from 1, like cut and awk — there is no field 0.

  • A single field: 2.
  • A range: 2-4. Bounds can be omitted on either side (-3, 2-, or just - for every field).
  • A concat: +2-4 (one cell, fields 2–4 joined with the chop's delimiter), +1+3+5 (one cell, three non-contiguous fields joined).

Selectors combine: 1,+2-4,5 produces a row of three cells — field 1, then fields 2–4 joined, then field 5.
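The selector rules can be sketched as a small Python function. The name pick and its signature are hypothetical, and the sketch deliberately ignores sub-blocks (which may contain commas of their own). It treats a missing field as an empty cell, as the Limitations section describes.

```python
def pick(fields, selectors, delim=" "):
    """Evaluate a comma-separated selector list against 1-based fields."""

    def field(i):
        # A field the record does not have becomes an empty cell.
        return fields[i - 1] if 1 <= i <= len(fields) else ""

    def expand(part):
        # "2" is one field; "2-4", "-3", "2-" and "-" are ranges.
        if "-" in part:
            lo, hi = part.split("-", 1)
            start = int(lo) if lo else 1
            end = int(hi) if hi else len(fields)
            return [field(i) for i in range(start, end + 1)]
        return [field(int(part))]

    row = []
    for selector in selectors.split(","):
        if selector.startswith("+"):
            # Concat: every "+"-separated part lands in one joined cell.
            values = [v for part in selector[1:].split("+") for v in expand(part)]
            row.append(delim.join(values))
        else:
            row.extend(expand(selector))
    return row

print(pick(["a", "b", "c"], "2,1,3"))               # ['b', 'a', 'c']
print(pick(["a", "b", "c", "d", "e"], "1,+2-4,5"))  # ['a', 'b c d', 'e']
print(pick(["a", "b", "c", "d", "e"], "+1+3+5"))    # ['a c e']
```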

Sub-blocks

A selector can be followed by one or more [...] blends that re-process the picked value:

echo "a-1 b-2" | scut "[at: :1[at:-:2]]"
# 1

On a pick or range, the sub-block runs per field. On a + group, it runs on the joined cell. Multiple sub-blocks chain left-to-right: 2[at:-:1][on:.:1].

Chained blends

Adjacent top-level blends chain — the output of one feeds the next:

echo "ab-12 cd-34" | scut "[at: :1,2][at:-:2]"
# 12,34
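The contrast between a sub-block and a chained blend can be traced by hand in Python. This is a sketch of the semantics as described in this README, with made-up helper names (split_at, field); it does not reproduce scut's implementation.

```python
import re

def split_at(text, chars):
    """Split at any character in `chars`, collapsing runs."""
    return re.split("[" + re.escape(chars) + "]+", text)

def field(fields, n):
    """1-based field lookup; a missing field is an empty cell."""
    return fields[n - 1] if 1 <= n <= len(fields) else ""

record = "ab-12 cd-34"
outer = split_at(record, " ")  # ['ab-12', 'cd-34']

# Sub-block, as in "[at: :1[at:-:2],2]": only field 1 is re-split.
sub_block_row = [field(split_at(field(outer, 1), "-"), 2), field(outer, 2)]
print(sub_block_row)  # ['12', 'cd-34']

# Chained blend, as in "[at: :1,2][at:-:2]": every cell from the first
# blend is fed through the second.
first = [field(outer, 1), field(outer, 2)]
chained_row = [field(split_at(cell, "-"), 2) for cell in first]
print(chained_row)    # ['12', '34']
```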

Bare form

For the common case (whitespace splitting), the outer [ws::…] brackets are optional at the top level. 1-3 is identical to [ws::1-3]. Sub-blocks always require explicit brackets — the bare form is a top-level affordance only. Brackets that appear inside a bare spec attach to the preceding selector as a sub-block, not as a chained blend.

The chop's rejoin delimiter (used by +) is the first character of at:'s args, the full sequence of on:, the empty string for chars, and a space for the default whitespace chop.
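The rejoin rule amounts to a four-way lookup. The function below is a hypothetical restatement of that sentence in Python, not code taken from scut.

```python
def rejoin_delimiter(chop, args=""):
    """Return the string that + uses to join fields for a given chop."""
    if chop == "at":
        return args[0]  # first character of the delimiter set
    if chop == "on":
        return args     # the full literal sequence
    if chop == "chars":
        return ""       # characters are glued back with nothing between
    if chop == "ws":
        return " "      # the default whitespace chop rejoins with a space
    raise ValueError("unknown chop: " + chop)

# "a-b,c-d" chopped by [at:,-:+-] yields four fields, rejoined with ",".
fields = ["a", "b", "c", "d"]
cell = rejoin_delimiter("at", ",-").join(fields)
print(cell)  # a,b,c,d
```

This is why the delimiter-swap example near the top of this README produces a single cell containing commas.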

For the full grammar, see GRAMMAR.md.

Limitations

scut is a one-way street from arbitrary text to CSV. It doesn't:

  • Aggregate or group (no sums, counts, group-by) — pipe into csvkit, miller, or awk.
  • Read CSV input — for re-processing CSV, csvkit is the right tool.
  • Join multiple files.
  • Handle quoted fields in input (input is treated as opaque text and split mechanically).
  • Process multi-line records.
  • Emit a header row.

If a record has fewer fields than the spec asks for, the missing cells are emitted empty.

Installation

scut currently has to be built from source. With Python 3.10 or newer installed:

git clone https://github.com/cyberixae/scut.git
cd scut
python3 -m venv .venv
make install-build
make build
# binary at dist/scut

For development without building, run the module directly:

echo "a b c" | python3 src/scut.py "1-3"
# a,b,c

Motivation

cut can only select bytes, characters, or fields in their original order; it can't reorder or duplicate them. csvkit can reorder and duplicate columns, but only in CSV input; it has no facility for parsing arbitrary delimited text. The traditional answer for turning raw CLI output or other unstructured text into CSV is a throwaway perl or awk script. scut is the tool you reach for instead.
