Commit 594a8ea

docs(safari): add RFC-011 data storage
Documents Safari's profile structure, per-category file layouts, and storage formats including the Safari 17+ nested WebKit Origins localStorage layout and binary SecurityOrigin serialization. Defers Keychain credential extraction to RFC-006 §7 and notes the cross-browser differences (plaintext cookies, plist bookmarks/downloads, Core Data epoch timestamps, partitioned storage).
1 parent 1561898 commit 594a8ea

1 file changed

Lines changed: 271 additions & 0 deletions

rfcs/011-safari-data-storage.md

@@ -0,0 +1,271 @@
1+
# RFC-011: Safari Data Storage
2+
3+
**Author**: moonD4rk
4+
**Status**: Living Document
5+
**Created**: 2026-04-21
6+
7+
## 1. Overview
8+
9+
Safari is **macOS-only** and sandboxed under App Sandbox. Most of Safari's user data lives inside `~/Library/Containers/com.apple.Safari/Data/Library/` (the container root) and requires **Full Disk Access (TCC)** for third-party processes to read. A few legacy files still reside at `~/Library/Safari/` for backwards compatibility.
10+
11+
Unlike Chromium and Firefox, Safari does **not** encrypt bookmarks, history, cookies, downloads, or localStorage — all are stored in plaintext on disk. Passwords are the only encrypted category and are delegated to the macOS login Keychain (see [RFC-006](006-key-retrieval-mechanisms.md) §7).
12+
13+
Safari 17 (September 2023) introduced **multi-profile support**. Profile discovery therefore has two layers: a synthetic "default" profile mapped to the pre-profile legacy paths, plus one or more named profiles enumerated from `SafariTabs.db`.
14+
15+
## 2. Profile Structure
16+
17+
Each `profileContext` (in `browser/safari/profiles.go`) tracks five fields:
18+
19+
| Field | Meaning |
20+
|-------|---------|
21+
| `name` | Human-readable profile name, disambiguated for duplicates |
22+
| `uuidUpper` | UUID in uppercase (used by `Safari/Profiles/<UUID>/` directories) |
23+
| `uuidLower` | UUID in lowercase (used by `WebKit/WebsiteDataStore/<uuid>/` directories) |
24+
| `legacyHome` | `~/Library/Safari` |
25+
| `container` | `~/Library/Containers/com.apple.Safari/Data/Library` |
26+
27+
Empty `uuidUpper` marks the synthetic default profile.
28+
29+
### 2.1 Profile Discovery
30+
31+
The default profile is always emitted first. Named profiles come from `SafariTabs.db`:
32+
33+
```sql
34+
SELECT external_uuid, title FROM bookmarks
35+
WHERE subtype = 2 AND external_uuid != 'DefaultProfile'
36+
```
37+
38+
`DefaultProfile` is Safari's sentinel string for the pre-profile era; it is filtered out because it is already represented by the synthetic default.
39+
40+
If the DB cannot be opened (missing or permission-denied), discovery falls back to scanning `Safari/Profiles/` for any directory whose name is a canonical 8-4-4-4-12 UUID, synthesizing the name as `profile-<uuid[:8]>`. This keeps profile discovery working even when TCC blocks the SQL read.
41+
42+
Duplicate display names are disambiguated with `-2`, `-3`, … suffixes, deterministically by discovery order.
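
A minimal sketch of the fallback naming and duplicate handling described above. The helper names (`isCanonicalUUID`, `fallbackName`, `disambiguate`) are illustrative and are not taken from `browser/safari/profiles.go`; the sketch also assumes that the first occurrence of a name stays unsuffixed.

```go
package main

import (
	"fmt"
	"regexp"
)

// canonicalUUID matches the 8-4-4-4-12 hexadecimal layout in either case.
var canonicalUUID = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// isCanonicalUUID reports whether a directory name looks like a profile UUID.
func isCanonicalUUID(name string) bool {
	return canonicalUUID.MatchString(name)
}

// fallbackName synthesizes "profile-<uuid[:8]>" for a directory-discovered
// profile. The caller must validate uuid with isCanonicalUUID first.
func fallbackName(uuid string) string {
	return "profile-" + uuid[:8]
}

// disambiguate appends -2, -3, ... to repeated names in discovery order.
// Assumption: the first occurrence keeps its original name.
func disambiguate(names []string) []string {
	seen := make(map[string]int, len(names))
	out := make([]string, 0, len(names))
	for _, n := range names {
		seen[n]++
		if c := seen[n]; c > 1 {
			n = fmt.Sprintf("%s-%d", n, c)
		}
		out = append(out, n)
	}
	return out
}

func main() {
	const uuid = "5604E6F5-02ED-4E40-8249-63DE7BC986C8"
	fmt.Println(isCanonicalUUID(uuid))                              // true
	fmt.Println(fallbackName(uuid))                                 // profile-5604E6F5
	fmt.Println(disambiguate([]string{"Work", "Personal", "Work"})) // [Work Personal Work-2]
}
```

The sketch does not guard against a generated suffix colliding with a profile that is literally named `Work-2`; a real implementation would need to decide how to handle that case.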
43+
44+
### 2.2 UUID Case Asymmetry
45+
46+
Safari uses two different casings for the same profile UUID across the container:
47+
48+
| Path prefix | Casing | Example |
49+
|-------------|:------:|---------|
50+
| `Safari/Profiles/<UUID>/` | Uppercase | `5604E6F5-02ED-4E40-8249-63DE7BC986C8` |
51+
| `WebKit/WebsiteDataStore/<uuid>/` | Lowercase | `5604e6f5-02ed-4e40-8249-63de7bc986c8` |
52+
53+
`profileContext` stores both to avoid case-folding at every call site.
54+
55+
## 3. Data File Locations
56+
57+
### 3.1 Default Profile
58+
59+
| Category | Path | Format |
60+
|----------|------|--------|
61+
| History | `~/Library/Safari/History.db` | SQLite |
62+
| Cookie | `Container/Cookies/Cookies.binarycookies`, then `~/Library/Cookies/Cookies.binarycookies` | BinaryCookies |
63+
| Bookmark | `~/Library/Safari/Bookmarks.plist` | plist |
64+
| Download | `~/Library/Safari/Downloads.plist` | plist |
65+
| LocalStorage | `Container/WebKit/WebsiteData/Default/` | WebKit Origins dir |
66+
| Password | macOS Keychain | n/a |
67+
68+
The Cookie path is resolved in priority order — the first candidate that exists wins. Modern (macOS 14+) installs keep cookies in the sandboxed container; the legacy path is kept as a fallback for upgraded systems.
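
The priority-order lookup can be sketched as below. `firstExisting` and `fileExists` are hypothetical helpers written for this illustration; the existence check is passed in as a function so the selection logic can be exercised without touching the filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// firstExisting returns the first candidate for which exists reports true,
// or "" when none of them does.
func firstExisting(candidates []string, exists func(string) bool) string {
	for _, p := range candidates {
		if exists(p) {
			return p
		}
	}
	return ""
}

// fileExists is a plain stat-based check. A permission error counts as
// "does not exist" here, which is a simplification of this sketch.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot resolve home directory:", err)
		return
	}
	candidates := []string{
		// Sandboxed container path, tried first.
		filepath.Join(home, "Library/Containers/com.apple.Safari/Data/Library/Cookies/Cookies.binarycookies"),
		// Legacy path, kept as a fallback.
		filepath.Join(home, "Library/Cookies/Cookies.binarycookies"),
	}
	fmt.Println(firstExisting(candidates, fileExists))
}
```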
69+
70+
### 3.2 Named Profiles
71+
72+
| Category | Path | Format |
73+
|----------|------|--------|
74+
| History | `Container/Safari/Profiles/<UUID>/History.db` | SQLite |
75+
| Cookie | `Container/WebKit/WebsiteDataStore/<uuid>/Cookies/Cookies.binarycookies` | BinaryCookies |
76+
| Download | `~/Library/Safari/Downloads.plist` (filtered by UUID) | plist |
77+
| LocalStorage | `Container/WebKit/WebsiteDataStore/<uuid>/Origins/` | WebKit Origins dir |
78+
79+
Bookmark is intentionally **omitted** from named profiles: `Bookmarks.plist` is a shared plist with no per-entry profile tag, so it is attributed to the default profile only. Duplicate bookmarks would otherwise be emitted per profile.
80+
81+
`Downloads.plist` is shared across all profiles, but each entry carries a `DownloadEntryProfileUUIDStringKey`; the extractor filters at read time so each profile sees only its own downloads.
82+
83+
Passwords live in the user-scope Keychain, not on a per-profile basis — only the default profile emits passwords to avoid duplicates across the output.
84+
85+
## 4. Data Storage Formats
86+
87+
### 4.1 History (History.db — SQLite)
88+
89+
```sql
90+
SELECT url, title, visit_count, visit_time
91+
FROM history_items
92+
LEFT JOIN history_visits ON history_items.id = history_visits.history_item
93+
```
94+
95+
Schema notes:
96+
- `visit_time` is a `REAL` column using the **Core Data epoch** (see Section 5)
97+
- One item → many visits; the extractor takes the most recent visit per item
98+
- Results are sorted by `visit_count` descending
99+
100+
### 4.2 Cookies (Cookies.binarycookies — binary)
101+
102+
Apple's proprietary BinaryCookies format — not SQLite, not a documented format. Parsed by the [go-binarycookies](https://github.com/moond4rk/go-binarycookies) library.
103+
104+
High-level layout:
105+
106+
```
107+
| "cook" magic | page_count | page_sizes[] | pages[] |
108+
|--------------|------------|------------------|--------------------------|
109+
| 4B | 4B (BE) | page_count × 4B | variable |
110+
```
111+
112+
Each page is an index-of-cookies table followed by per-cookie records. A cookie record carries flags (`isSecure`, `isHTTPOnly`), URL/name/path/value offsets into the record, and creation / expiry timestamps in Core Data epoch.
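
To make the header layout concrete, here is a minimal sketch that reads only the magic, the page count, and the page-size table. It is not the go-binarycookies API; `parseHeader` is a hypothetical function, and page and cookie-record parsing are left out.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// parseHeader checks the "cook" magic, reads the big-endian page count,
// and returns the big-endian size of each page in bytes.
func parseHeader(data []byte) ([]uint32, error) {
	if len(data) < 8 || !bytes.Equal(data[:4], []byte("cook")) {
		return nil, errors.New("not a BinaryCookies file")
	}
	count := binary.BigEndian.Uint32(data[4:8])
	// Guard against truncated input before allocating the size table.
	if uint64(len(data)-8) < uint64(count)*4 {
		return nil, errors.New("truncated page size table")
	}
	sizes := make([]uint32, count)
	for i := range sizes {
		off := 8 + i*4
		sizes[i] = binary.BigEndian.Uint32(data[off : off+4])
	}
	return sizes, nil
}

func main() {
	// Synthetic header: two pages of 16 and 256 bytes (page bodies omitted).
	hdr := []byte{'c', 'o', 'o', 'k', 0, 0, 0, 2, 0, 0, 0, 16, 0, 0, 1, 0}
	sizes, err := parseHeader(hdr)
	fmt.Println(sizes, err) // [16 256] <nil>
}
```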
113+
114+
Cookie values are **plaintext**; there is no per-cookie encryption. This is a fundamental divergence from Chromium, which stores cookie values in the `encrypted_value` column, encrypted with the OS master key.
115+
116+
### 4.3 Bookmarks (Bookmarks.plist — property list)
117+
118+
A nested dictionary tree with a `WebBookmarkType` discriminator at each node:
119+
120+
| Type | Meaning | Additional keys |
121+
|------|---------|-----------------|
122+
| `WebBookmarkTypeList` | Folder | `Children` (array) |
123+
| `WebBookmarkTypeLeaf` | URL entry | `URLString`, `URIDictionary.title` |
124+
125+
The extractor walks the tree recursively, collecting leaf nodes into a flat list. Folder names are not preserved (only URL + title pairs are exported).
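
The recursive walk can be sketched as follows. The `bookmarkNode` struct mirrors the plist keys from the table, but the Go type, the `flatten` function, and the assumption that the plist has already been decoded into this shape are all illustrative.

```go
package main

import "fmt"

// bookmarkNode mirrors the plist keys named above. How the plist is decoded
// into this struct is out of scope for the sketch.
type bookmarkNode struct {
	WebBookmarkType string
	URLString       string
	URIDictionary   map[string]string
	Children        []bookmarkNode
}

// bookmark is the flat URL + title pair that gets exported.
type bookmark struct {
	URL, Title string
}

// flatten walks the tree depth-first and collects every leaf node.
// Folder nodes contribute only their children, so folder names are dropped.
func flatten(n bookmarkNode, out []bookmark) []bookmark {
	switch n.WebBookmarkType {
	case "WebBookmarkTypeLeaf":
		out = append(out, bookmark{URL: n.URLString, Title: n.URIDictionary["title"]})
	case "WebBookmarkTypeList":
		for _, child := range n.Children {
			out = flatten(child, out)
		}
	}
	return out
}

func main() {
	root := bookmarkNode{
		WebBookmarkType: "WebBookmarkTypeList",
		Children: []bookmarkNode{
			{
				WebBookmarkType: "WebBookmarkTypeLeaf",
				URLString:       "https://example.com/",
				URIDictionary:   map[string]string{"title": "Example"},
			},
		},
	}
	for _, b := range flatten(root, nil) {
		fmt.Printf("%s\t%s\n", b.Title, b.URL)
	}
}
```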
126+
127+
### 4.4 Downloads (Downloads.plist — property list)
128+
129+
A flat structure with a `DownloadHistory` array. Relevant keys per entry:
130+
131+
| Key | Meaning |
132+
|-----|---------|
133+
| `DownloadEntryURL` | Source URL |
134+
| `DownloadEntryPath` | Local filesystem path |
135+
| `DownloadEntryBytesReceivedSoFar` | Bytes downloaded |
136+
| `DownloadEntryProfileUUIDStringKey` | Owning profile's uppercase UUID, or `"DefaultProfile"` |
137+
138+
The extractor filters by the caller-provided owner UUID so each profile reports its own downloads. MIME type and start/end times are not stored by Safari — `MimeType` is always empty in the output.
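
A minimal sketch of the owner filter, assuming the plist entries have already been decoded. The `downloadEntry` struct and `filterByOwner` are hypothetical names; the comparison is an exact string match, which relies on the key holding the uppercase UUID (or `"DefaultProfile"`) as stated in the table.

```go
package main

import "fmt"

// downloadEntry holds the keys listed above after plist decoding.
// The Go field names are illustrative.
type downloadEntry struct {
	URL         string // DownloadEntryURL
	Path        string // DownloadEntryPath
	Bytes       int64  // DownloadEntryBytesReceivedSoFar
	ProfileUUID string // DownloadEntryProfileUUIDStringKey
}

// filterByOwner keeps only the entries whose profile key equals owner.
// For the default profile the caller passes "DefaultProfile".
func filterByOwner(entries []downloadEntry, owner string) []downloadEntry {
	var out []downloadEntry
	for _, e := range entries {
		if e.ProfileUUID == owner {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	entries := []downloadEntry{
		{URL: "https://example.com/a.zip", ProfileUUID: "DefaultProfile"},
		{URL: "https://example.com/b.zip", ProfileUUID: "5604E6F5-02ED-4E40-8249-63DE7BC986C8"},
	}
	for _, e := range filterByOwner(entries, "DefaultProfile") {
		fmt.Println(e.URL)
	}
}
```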
139+
140+
### 4.5 Passwords (macOS Keychain)
141+
142+
Safari does **not** persist passwords to a file in its container. All credentials live in `login.keychain-db` as `InternetPassword` records. The extractor reads them directly through [keychainbreaker](https://github.com/moond4rk/keychainbreaker) and reconstructs the URL from `(protocol, server, port, path)`.
143+
144+
Default port handling:
145+
146+
| Protocol | Default port | URL rendering |
147+
|----------|-------------:|---------------|
148+
| `https` | 443 | `https://host/path` (port omitted) |
149+
| `http` | 80 | `http://host/path` (port omitted) |
150+
| `ftp` | 21 | `ftp://host/path` (port omitted) |
151+
| Other | none | `scheme://host:port/path` |
152+
153+
The `htps` FourCC protocol code emitted by some Keychain entries is normalized to `https`.
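
The URL reconstruction rules in the table can be sketched as below. `buildURL` is a hypothetical function, not the extractor's actual code. Treating port `0` as "not recorded" is an assumption of this sketch, as is the expectation that `path` is either empty or starts with `/`.

```go
package main

import (
	"fmt"
	"strconv"
)

// defaultPorts lists the schemes whose default port is omitted from the URL.
var defaultPorts = map[string]int{"https": 443, "http": 80, "ftp": 21}

// buildURL reassembles a URL from the Keychain (protocol, server, port, path)
// tuple, normalizing the "htps" FourCC code to "https".
func buildURL(protocol, server string, port int, path string) string {
	if protocol == "htps" {
		protocol = "https"
	}
	host := server
	// Assumption: port 0 means "not recorded" and is omitted as well.
	if port != 0 && port != defaultPorts[protocol] {
		host += ":" + strconv.Itoa(port)
	}
	return protocol + "://" + host + path
}

func main() {
	fmt.Println(buildURL("htps", "example.com", 443, "/login")) // https://example.com/login
	fmt.Println(buildURL("http", "example.com", 8080, "/"))     // http://example.com:8080/
	fmt.Println(buildURL("smb", "nas.local", 445, "/share"))    // smb://nas.local:445/share
}
```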
154+
155+
Partial-extraction mode: if the Keychain cannot be unlocked (no `--keychain-pw` supplied, or the password is wrong), metadata-only records are still emitted — URL, username, timestamps — with `PlainPassword` left blank. See [RFC-006](006-key-retrieval-mechanisms.md) §7 for the full credential-extraction architecture.
156+
157+
### 4.6 LocalStorage (WebKit Origins — nested SQLite)
158+
159+
Safari 17+ stores localStorage under a **partition-aware nested tree**, rooted at:
160+
161+
| Profile | Root path |
162+
|---------|-----------|
163+
| Default | `Container/WebKit/WebsiteData/Default/` |
164+
| Named | `Container/WebKit/WebsiteDataStore/<uuid>/Origins/` |
165+
166+
Under the root, two levels of hashed directories lead to the actual data:
167+
168+
```
169+
<root>/<top-frame-hash>/<frame-hash>/
170+
├── origin ← binary-serialized origins (top + frame)
171+
└── LocalStorage/
172+
├── localstorage.sqlite3 ← ItemTable(key TEXT UNIQUE, value BLOB NOT NULL)
173+
├── localstorage.sqlite3-shm
174+
└── localstorage.sqlite3-wal
175+
```
176+
177+
`top-frame-hash == frame-hash` for **first-party** storage. They differ for **partitioned third-party** storage (an iframe with a different origin than the top document). The named profile root additionally carries a `salt` sibling file used by WebKit's origin-hashing — skipped at traversal time.
178+
179+
The flat `WebsiteDataStore/<uuid>/LocalStorage/<scheme>_<host>_<port>.localstorage` layout used by older WebKit is **empty on modern Safari** and is not supported.
180+
181+
#### Origin file format
182+
183+
Two `origin` blocks back-to-back — top-frame then frame. Each block:
184+
185+
```
| scheme record            | host record              | port section          |
|--------------------------|--------------------------|-----------------------|
| uint32_le len | enc byte | uint32_le len | enc byte | 0x00                  |
| <len bytes>              | <len bytes>              | or                    |
|                          |                          | 0x01 | uint16_le port |
```
193+
194+
- `enc byte`: `0x01` = Latin-1/ASCII (common), `0x00` = UTF-16 LE
195+
- Port section: `0x00` marker means "use scheme default" (stored as port 0 in the parsed struct); `0x01` marker is followed by a 2-byte little-endian port
196+
197+
The extractor reads both blocks and reports the **frame origin URL** — that is what JavaScript's `window.localStorage` actually exposes in the partitioned case. If only the top-frame block is parseable, the extractor falls back to it.
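
The block layout can be read with a few lines of Go. This is a sketch under stated assumptions, not the extractor's parser: `origin`, `readString`, `readOrigin`, and `record` are hypothetical names, and only the Latin-1/ASCII encoding (`0x01`) is handled. UTF-16 records are rejected here instead of guessing how their length field should be interpreted.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// origin is one parsed block; Port 0 means "use the scheme default".
type origin struct {
	Scheme, Host string
	Port         uint16
}

// readString decodes one length-prefixed record and returns the remaining
// bytes. Only enc byte 0x01 (Latin-1/ASCII) is supported in this sketch.
func readString(b []byte) (string, []byte, error) {
	if len(b) < 5 {
		return "", nil, errors.New("short string record")
	}
	n := binary.LittleEndian.Uint32(b[:4])
	enc := b[4]
	b = b[5:]
	if enc != 0x01 {
		return "", nil, errors.New("UTF-16 records are not handled in this sketch")
	}
	if uint64(len(b)) < uint64(n) {
		return "", nil, errors.New("truncated string record")
	}
	// Latin-1 maps each byte to the code point of the same value.
	runes := make([]rune, n)
	for i, c := range b[:n] {
		runes[i] = rune(c)
	}
	return string(runes), b[n:], nil
}

// readOrigin parses scheme, host, and port section, returning the rest.
func readOrigin(b []byte) (origin, []byte, error) {
	var o origin
	var err error
	if o.Scheme, b, err = readString(b); err != nil {
		return o, nil, err
	}
	if o.Host, b, err = readString(b); err != nil {
		return o, nil, err
	}
	if len(b) < 1 {
		return o, nil, errors.New("missing port marker")
	}
	switch b[0] {
	case 0x00: // scheme default, stored as port 0
		return o, b[1:], nil
	case 0x01: // explicit 2-byte little-endian port
		if len(b) < 3 {
			return o, nil, errors.New("truncated port")
		}
		o.Port = binary.LittleEndian.Uint16(b[1:3])
		return o, b[3:], nil
	}
	return o, nil, errors.New("unknown port marker")
}

// record builds a Latin-1 string record for the synthetic demo input.
func record(s string) []byte {
	b := make([]byte, 4, 5+len(s))
	binary.LittleEndian.PutUint32(b, uint32(len(s)))
	b = append(b, 0x01)
	return append(b, s...)
}

func main() {
	// Top-frame block (default port) followed by frame block (port 8443).
	blob := append(record("https"), record("example.com")...)
	blob = append(blob, 0x00)
	blob = append(blob, record("https")...)
	blob = append(blob, record("cdn.example.net")...)
	blob = append(blob, 0x01, 0xFB, 0x20)

	top, rest, err := readOrigin(blob)
	fmt.Println(top, err) // {https example.com 0} <nil>
	frame, _, err := readOrigin(rest)
	fmt.Println(frame, err) // {https cdn.example.net 8443} <nil>
}
```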
198+
199+
#### ItemTable
200+
201+
```sql
202+
SELECT key, value FROM ItemTable
203+
```
204+
205+
Schema: `(key TEXT UNIQUE ON CONFLICT REPLACE, value BLOB NOT NULL ON CONFLICT FAIL)`.
206+
207+
Values are **UTF-16 LE** encoded JS strings. Oversized values (≥ 2048 bytes) are replaced with a size marker in the output — this matches the cap used by the Chromium extractor ([RFC-002](002-chromium-data-storage.md) §4.8) and keeps JSON/CSV exports bounded.
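
A sketch of the value decoding and size cap. `decodeValue` is a hypothetical function, and the marker text it returns for oversized values is invented for this example; the real output format may differ. An odd trailing byte is silently dropped, which is also a simplification.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// maxValueBytes is the cap stated above: values at or over it are not decoded.
const maxValueBytes = 2048

// decodeValue turns a UTF-16 LE BLOB into a Go string, or returns a size
// marker when the BLOB reaches the cap.
func decodeValue(raw []byte) string {
	if len(raw) >= maxValueBytes {
		// Hypothetical marker text, chosen only for illustration.
		return fmt.Sprintf("[value too large: %d bytes]", len(raw))
	}
	units := make([]uint16, len(raw)/2)
	for i := range units {
		// Little-endian: low byte first, then high byte.
		units[i] = uint16(raw[2*i]) | uint16(raw[2*i+1])<<8
	}
	return string(utf16.Decode(units))
}

func main() {
	fmt.Println(decodeValue([]byte{'h', 0, 'i', 0}))  // hi
	fmt.Println(decodeValue(make([]byte, 4096)))      // [value too large: 4096 bytes]
}
```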
208+
209+
## 5. Time Formats
210+
211+
Safari uses the **Core Data epoch** — 2001-01-01 00:00:00 UTC, which is **978,307,200 seconds** after the Unix epoch. To convert a Core Data timestamp to Unix time, add `978307200` seconds.
212+
213+
| Data Type | Field | Storage |
214+
|-----------|-------|---------|
215+
| History | `visit_time` | REAL seconds, Core Data epoch |
216+
| Cookies | `creation`, `expiry` | 64-bit float seconds, Core Data epoch |
217+
| Downloads | n/a | No timestamp stored |
218+
| Passwords | Keychain `Created` | Already Unix time (via keychainbreaker) |
219+
| LocalStorage | n/a | No timestamp stored |
220+
221+
Bookmarks carry no timestamp in Safari's plist representation.
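
The conversion described in this section amounts to a single addition. In the sketch below, `coreDataToTime` and `coreDataEpochOffset` are hypothetical names; the fractional part of the timestamp is carried over as nanoseconds.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// coreDataEpochOffset is the number of seconds between the Unix epoch
// (1970-01-01) and the Core Data epoch (2001-01-01), both in UTC.
const coreDataEpochOffset = 978307200

// coreDataToTime converts seconds since the Core Data epoch to a time.Time.
func coreDataToTime(seconds float64) time.Time {
	whole, frac := math.Modf(seconds)
	return time.Unix(int64(whole)+coreDataEpochOffset, int64(frac*1e9)).UTC()
}

func main() {
	fmt.Println(coreDataToTime(0).Format(time.RFC3339)) // 2001-01-01T00:00:00Z
	fmt.Println(coreDataToTime(735000000.5).Unix())     // 1713307200
}
```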
222+
223+
## 6. Encryption
224+
225+
Safari's encryption story is deliberately thin:
226+
227+
| Category | Encryption |
228+
|----------|------------|
229+
| History | None (plaintext SQLite) |
230+
| Cookies | None (plaintext binary format) |
231+
| Bookmarks | None (plaintext plist) |
232+
| Downloads | None (plaintext plist) |
233+
| LocalStorage | None (plaintext SQLite; UTF-16 LE is an encoding, not encryption) |
234+
| Passwords | macOS Keychain — see [RFC-006](006-key-retrieval-mechanisms.md) §7 |
235+
236+
The only encrypted category is passwords. Because they are not stored in Safari's own files at all, there is no Safari-specific cipher, key derivation, or master-key retrieval to document. See RFC-006 for the `InternetPassword` extraction path.
237+
238+
## 7. Platform Specifics
239+
240+
- **macOS-only**. There is no Safari on Windows or Linux.
241+
- **Full Disk Access (TCC)** is required to read the sandboxed container. Without it, cookies / history / downloads / localStorage reads fail silently with permission errors at stat or open time. Legacy paths under `~/Library/Safari/` sometimes remain readable without FDA, but are mostly empty on modern systems.
242+
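- **Live-file safety**: `SafariTabs.db`, `History.db`, and `localstorage.sqlite3` can be written to by a running Safari instance. All live SQL reads use `?mode=ro&immutable=1`, which opens the file read-only and disables locking and change detection, so the WAL is never consulted. The extractor therefore sees the main DB file as it stands at read time; content that is still in the WAL and has not been checkpointed is intentionally not replayed to avoid race-induced corruption. Note that SQLite treats an immutable file as one that never changes, so a write landing mid-read can still yield inconsistent results.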
243+
- **Multi-profile availability**: requires Safari 17 (released alongside macOS 14 Sonoma) or newer. Older Safari versions have only the default profile; discovery degrades cleanly via the ReadDir fallback described in §2.1.
244+
- **File acquisition**: all per-profile files are copied into a `filemanager.Session` temp directory before extraction, except the discovery-time `SafariTabs.db` read which opens the live file directly. See [RFC-008](008-file-acquisition-and-platform-quirks.md) for the general pattern.
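
For the live-file bullet above, the read-only URI can be built as sketched here. `readOnlyDSN` is a hypothetical helper; whether the resulting `file:` URI is honored depends on the SQLite driver in use, which this sketch does not assume.

```go
package main

import (
	"fmt"
	"net/url"
)

// readOnlyDSN builds a SQLite "file:" URI with the mode=ro and immutable=1
// query parameters. Using net/url escapes characters such as spaces or '?'
// in the path so they are not mistaken for URI syntax.
func readOnlyDSN(path string) string {
	u := url.URL{Scheme: "file", Path: path, RawQuery: "mode=ro&immutable=1"}
	return u.String()
}

func main() {
	fmt.Println(readOnlyDSN("/tmp/SafariTabs.db"))
}
```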
245+
246+
## 8. Key Differences from Chromium and Firefox
247+
248+
| Aspect | Chromium | Firefox | Safari |
249+
|--------|----------|---------|--------|
250+
| Platform | Cross-platform | Cross-platform | **macOS-only** |
251+
| Profile discovery | `Preferences` sentinel file | Any data file present | `SafariTabs.db` SQL + dir fallback |
252+
| Profile naming | `Default`, `Profile 1`, … | `<prefix>.default-release` | Human-readable title from SafariTabs.db |
253+
| Password storage | Encrypted SQLite (`Login Data`) | Encrypted JSON (`logins.json`) | **macOS Keychain** (no file) |
254+
| Cookie encryption | Encrypted with OS master key | Plaintext | **Plaintext** |
255+
| Cookie format | SQLite | SQLite | Proprietary BinaryCookies binary |
256+
| History | SQLite | SQLite (`places.sqlite`) | SQLite (Core Data epoch) |
257+
| Bookmark | JSON | SQLite (`places.sqlite`) | **plist** |
258+
| Download | SQLite (`History`, shared) | SQLite (`places.sqlite`, shared) | **plist** (filtered by UUID) |
259+
| LocalStorage | LevelDB | SQLite (`webappsstore.sqlite`) | Nested **WebKit Origins** SQLite |
260+
| LocalStorage partitioning | No | No | **Yes** (top-frame + frame hashes) |
261+
| CreditCard / SessionStorage | Supported | Not supported | Not supported |
262+
| Encryption scope | Passwords, cookies, credit cards | Passwords only | Passwords only |
263+
| Time format | WebKit microseconds since 1601 | Mixed (μs for most, ms for passwords) | Core Data seconds since 2001 |
264+
265+
## Related RFCs
266+
267+
| RFC | Topic |
268+
|-----|-------|
269+
| [RFC-001](001-project-architecture.md) | Project architecture and directory layout |
270+
| [RFC-006](006-key-retrieval-mechanisms.md) | §7 covers Safari Keychain credential extraction |
271+
| [RFC-008](008-file-acquisition-and-platform-quirks.md) | File acquisition via `filemanager.Session` |
