java_triage.py is a static triage tool for suspicious Java codebases, decompiled JARs, and Minecraft mods.
It can decompile JARs with CFR, rewrite supported obfuscated string patterns, scan suspicious strings and behaviors, identify suspicious artifacts, resolve runtime C2 hints from on-chain config data, optionally inspect a resolved stage-2 JAR in static-only mode, query external enrichment APIs (RatterScanner and JLab static scan), and produce Rich console, JSON, and HTML reports.
- Decompiles JARs with CFR as part of the workflow when scanning from a directory containing a target JAR and a local CFR JAR.
- Deobfuscates `StringDecrypt.decrypt(new byte[]{...})` calls with multi-pass rewrite support.
- Deobfuscates `load(new int[]{...}, new int[]{...}, k1, k2)` patterns.
- Includes deterministic length-seeded XOR-stream candidate support used by common Java obfuscators.
- Tracks deobfuscation stats such as seen, replaced, unresolved, per-family counts, and pass count.
- Scans plain Java string literals for suspicious indicators including URLs, command execution strings, payload paths, encoded blobs, and keywords.
- Reconstructs additional obfuscation patterns from source, including split `String[]` fragments, printable `byte[]` or `char[]` literals, and reversed `StringBuilder(...).reverse().toString()` forms.
- Detects Discord indicators, including bot tokens, webhook URLs, and snowflake IDs.
- Detects Discord Chromium encrypted-token marker payloads (`dQw4w9WgXcQ:<base64>`) and classifies them as credential-theft context.
- Detects additional comms indicators, including Telegram bot tokens, Telegram API patterns, and generic non-Discord webhook patterns.
- Detects additional encoded literals such as Base64, Base32, hex, and XOR-recovered text where possible.
- Performs a full XOR string decoding pass over all `getBytes("ISO-8859-1")` and `toCharArray()` prefixed-key patterns, capturing complete decoded strings (including JSON payload templates, User-Agent strings, static UUIDs, and persistence paths) rather than only filtering to "interesting" candidates.
- Traces Minecraft session/token/identity API calls through variable assignments to network/write sinks (data flow tracer), emitting specific `dataflow_*` behavior findings when a source-to-sink path is confirmed.
- Detects multi-payload exfiltration architectures where malware sends tiered POST requests, e.g. a lightweight prefire beacon followed by a separate full-credential profile POST.
- Detects self-copy + detached re-launch persistence chains: JAR resolves its own path → copies to LOCALAPPDATA → spawns javaw.exe detached → survives game shutdown.
- Classifies decoded strings into categories such as URLs, RPC templates, credential fields, paths, and crypto-related values.
- Falls back to `.class` constant-pool scanning when decompiled `.java` sources are unavailable.
- Expands scan roots by unpacking nested dropped JARs and embedded Base32 archive resources for recursive triage.
- Flags behavior indicators such as:
- dynamic class loading or invocation
- HTTP payload download and exfiltration patterns
- native payload extraction or loading
- command execution and dropper or elevation helpers
- CMSTP, UAC bypass, and Defender tampering indicators
- Adds explicit methodology detections for obfuscated token/session access patterns and token-harvest vectors, including:
- XOR/Base64/Caesar decoded names, MethodHandles, LambdaMetafactory, array-indirect dispatch, split-name reconstruction, Unsafe/VarHandle access, StackWalker indirection, integer-array encoded names, and classloader-bypass access
- class-sweep token harvest, spin-race window harvest, Yggdrasil internal probing, and process-argument/system-property/environment token probing paths
- Adds explicit heavy-obfuscation, decompiler-failure, and class-fallback diagnostic behaviors.
- Splits assessment behavior findings into `benign`, `needs_review`, and `suspicious`.
- Assigns behavior severities (`critical`, `high`, `medium`, `low`, `info`) and reports severity counts.
- Adds verdict-tier grading: `confirmed_behavior`, `exposed_capability`, `suspicious_capability`, and `library_noise`.
- Emits contradiction/caveat notes when evidence is exposure-only (for example, token access without proven automatic exfiltration).
- Suppresses or down-weights generic heuristic noise inside known bundled libraries (for example Gson, Java-WebSocket, SLF4J).
- Adds metadata sections such as Basic Properties, JAR Info, and Bundle Info.
- Optionally enriches metadata with Vhash, SSDEEP, TLSH, TrID, and Magika when local tools or libraries are available.
- Identifies suspicious artifacts such as `*.jar.*`, large opaque `.dat` or `.bin` files, and embedded resource payloads.
- Optionally downloads a resolved stage-2 payload JAR and performs static-only archive and content triage without executing code.
- Extracts blockchain indicators from decoded strings such as contracts, selectors, RPC hosts, and RPC URLs.
- Detects known malware variants, runs raw string detections, and applies cross-variant heuristics.
- Queries the free RatterScanner API for discovered SHA256 hashes.
- Queries the JLab public static scan API by uploading the source JAR/ZIP (when available) and includes matched signature results.
- Produces:
- human-readable console output with optional Rich tables and progress bars
- machine-readable JSON output
- standalone HTML reports with clickable sortable columns
- Offers an interactive post-scan prompt for optional stage-2 payload download and AES decryption.
- Offers optional live infrastructure probing (DNS + HTTP HEAD) via a post-scan prompt.
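The multi-pass rewrite and its statistics can be illustrated with a small sketch. It is not the tool's implementation: `decode_bytes` is a hypothetical stand-in that treats the first array element as a single-byte XOR key, because the real `StringDecrypt.decrypt` routine differs between obfuscators, and `rewrite` and its stats keys are illustrative names modeled on the counters listed above.

```python
import re
from collections import Counter
from typing import Callable, Optional

# Matches StringDecrypt.decrypt(new byte[]{...}) and captures the array body.
DECRYPT_CALL = re.compile(r"StringDecrypt\.decrypt\(new byte\[\]\{([^}]*)\}\)")


def decode_bytes(values: list[int]) -> Optional[str]:
    """Hypothetical decoder: first element is a single-byte XOR key for the rest."""
    if len(values) < 2:
        return None
    key, body = values[0], values[1:]
    try:
        return bytes((b ^ key) & 0xFF for b in body).decode("utf-8")
    except UnicodeDecodeError:
        return None


def to_java_literal(text: str) -> str:
    # Escape backslashes and quotes so the rewritten source stays valid Java.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def rewrite(source: str,
            decoder: Callable[[list[int]], Optional[str]] = decode_bytes,
            max_passes: int = 5) -> tuple[str, dict]:
    stats = {"replaced": 0, "passes": 0, "per_family": Counter()}

    def substitute(match: re.Match) -> str:
        try:
            values = [int(v, 0) for v in match.group(1).split(",") if v.strip()]
        except ValueError:
            return match.group(0)  # casts or constants this sketch cannot parse
        text = decoder(values)
        if text is None:
            return match.group(0)  # unresolved: keep the original call
        stats["replaced"] += 1
        stats["per_family"]["StringDecrypt.decrypt"] += 1
        return to_java_literal(text)

    for _ in range(max_passes):
        stats["passes"] += 1
        rewritten = DECRYPT_CALL.sub(substitute, source)
        if rewritten == source:
            break  # fixed point: this pass changed nothing
        source = rewritten

    # Whatever still matches after the last pass could not be decoded.
    stats["unresolved"] = len(DECRYPT_CALL.findall(source))
    stats["seen"] = stats["replaced"] + stats["unresolved"]
    return source, stats
```

Because the decoder is a parameter, another family such as the `load(new int[]{...}, ...)` pattern could be handled by a separate regex/decoder pair without touching the pass loop.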
By default, running `python java_triage.py <target>` will:
- Resolve the target folder or use the current directory.
- If applicable, decompile a selected JAR with CFR into a working source folder.
- Run a quick obfuscation-density probe on the scan root.
- If supported obfuscated call patterns are detected, copy the target to a deobfuscated working folder and rewrite supported string calls there.
- Scan the resulting source tree.
- Optionally resolve runtime C2 hints, perform stage-2 static analysis, and enrich results with RatterScanner and JLab static scan.
- Render the Rich console report and write JSON and HTML reports by default.
If the probe does not detect any supported obfuscated call patterns, no deobfuscated copy is created and the source tree is scanned directly.
Current default probe threshold:
- Total `StringDecrypt.decrypt(...)` + `load(new int[]{...})` calls >= 1
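A minimal sketch of the probe, assuming it simply counts the two call shapes named above across `.java` files under the scan root. The regexes, `PROBE_THRESHOLD`, and `should_rewrite` are hypothetical names, not taken from `java_triage.py`.

```python
import re
from pathlib import Path

# The two call shapes named in the probe threshold.
PROBE_PATTERNS = (
    re.compile(r"StringDecrypt\.decrypt\("),
    re.compile(r"\bload\(\s*new\s+int\[\]\s*\{"),  # \b avoids counting e.g. reload(
)
PROBE_THRESHOLD = 1  # total supported calls needed before a rewrite copy is made


def count_supported_calls(java_source: str) -> int:
    return sum(len(p.findall(java_source)) for p in PROBE_PATTERNS)


def should_rewrite(scan_root: Path) -> bool:
    total = 0
    for path in scan_root.rglob("*.java"):
        total += count_supported_calls(path.read_text(encoding="utf-8", errors="replace"))
        if total >= PROBE_THRESHOLD:
            return True  # stop early once the threshold is reached
    return False
```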
Auto output folder naming for rewritten trees:
- `<target_name>_deobfuscated`
- if that exists: `<target_name>_deobfuscated_2`, `_3`, etc.
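The naming rule can be sketched as below, assuming the working folder is created next to the target. Both that location and the helper name `deobfuscated_output_dir` are assumptions for illustration.

```python
from pathlib import Path


def deobfuscated_output_dir(target: Path) -> Path:
    """Return <target_name>_deobfuscated, or the first free _2, _3, ... variant.

    Hypothetical helper; assumes the folder is created next to the target.
    """
    base = target.parent / f"{target.name}_deobfuscated"
    if not base.exists():
        return base
    suffix = 2
    while (candidate := target.parent / f"{base.name}_{suffix}").exists():
        suffix += 1
    return candidate
```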
Default report naming:
- scanning `ExampleMod` writes `ExampleMod.json` and `ExampleMod.html`
- scanning a directory such as `example_project` writes `example_project.json` and `example_project.html`
String literal scanning includes:
- URLs and endpoint-like strings
- Command and LOLBin patterns such as `cmd.exe`, `powershell`, and `cmstp`
- Path and payload indicators such as `.exe`, `.dll`, `.jar`, `.dat`, `.bin`, and temp or appdata paths
- High-entropy encoded blobs
- Suspicious keywords such as `token`, `authorization`, `webhook`, and `defender`
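A simplified sketch of literal extraction and classification. The regex ignores Java text blocks, and the entropy and length thresholds, tag names, and helper names are illustrative assumptions rather than the tool's actual values.

```python
import math
import re
from collections import Counter

# Simplified Java string-literal matcher: handles escapes, not text blocks.
STRING_LITERAL = re.compile(r'"((?:[^"\\\n]|\\.)*)"')

LOLBINS = ("cmd.exe", "powershell", "cmstp")
PAYLOAD_EXTS = (".exe", ".dll", ".jar", ".dat", ".bin")
STAGING_HINTS = ("%temp%", "appdata", "java.io.tmpdir")
KEYWORDS = ("token", "authorization", "webhook", "defender")


def shannon_entropy(s: str) -> float:
    """Bits per character of the string's own character distribution."""
    n = len(s)
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def classify_literal(value: str) -> list[str]:
    low = value.lower()
    tags = []
    if low.startswith(("http://", "https://")):
        tags.append("url")
    if any(b in low for b in LOLBINS):
        tags.append("command")
    if low.endswith(PAYLOAD_EXTS) or any(h in low for h in STAGING_HINTS):
        tags.append("path")
    # Illustrative thresholds only; real tuning would differ.
    if len(value) >= 32 and shannon_entropy(value) >= 4.5:
        tags.append("encoded_blob")
    if any(k in low for k in KEYWORDS):
        tags.append("keyword")
    return tags


def scan_literals(java_source: str) -> list[tuple[str, list[str]]]:
    hits = []
    for match in STRING_LITERAL.finditer(java_source):
        tags = classify_literal(match.group(1))
        if tags:
            hits.append((match.group(1), tags))
    return hits
```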
Behavior scanning also includes:
- Environment variable access (`System.getenv`)
- Dynamic class loading via `URLClassLoader`
- Local Minecraft session or account file path references such as `session.json`, `launcher_accounts.json`, and `.minecraft`
- Possible identity exfiltration when username or UUID reads appear alongside outbound HTTP activity
Discord-focused detection includes:
- Bot tokens
- Webhook URLs (`discord.com/api/webhooks/...`)
- Snowflake IDs (17-20 digit IDs)
- Contextual IDs in literals containing labels like `guild_id`, `channel_id`, `user_id`, `role_id`, and `application_id`
- Encrypted Chromium token marker blobs (`dQw4w9WgXcQ:<base64>`) commonly used in token-stealer chains
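The Discord checks can be approximated with regular expressions like the ones below. These patterns are assumptions written for illustration, not the expressions used by `java_triage.py`; bot-token matching is omitted because token formats vary.

```python
import re

WEBHOOK_URL = re.compile(
    r"https://(?:ptb\.|canary\.)?discord(?:app)?\.com/api/webhooks/(\d{17,20})/([\w-]+)"
)
# 17-20 digits not embedded in a longer run of digits.
SNOWFLAKE = re.compile(r"(?<!\d)\d{17,20}(?!\d)")
CONTEXT_ID = re.compile(
    r"\b(guild_id|channel_id|user_id|role_id|application_id)\b\W{0,5}(\d{17,20})(?!\d)"
)
CHROMIUM_TOKEN_MARKER = re.compile(r"dQw4w9WgXcQ:([A-Za-z0-9+/=]+)")


def discord_indicators(literal: str) -> dict[str, list[str]]:
    return {
        "webhook_urls": [m.group(0) for m in WEBHOOK_URL.finditer(literal)],
        "snowflake_ids": SNOWFLAKE.findall(literal),
        "contextual_ids": [f"{label}={value}" for label, value in CONTEXT_ID.findall(literal)],
        "encrypted_token_markers": CHROMIUM_TOKEN_MARKER.findall(literal),
    }
```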
To reduce false positives, session or account path detection requires:
- the token to appear inside a Java string literal such as `session.json`, `launcher_accounts.json`, or `.minecraft`
- file I/O usage in the same file such as `new File(`, `Paths.get(`, `Files.read...`, `FileInputStream(`, or `FileReader(`
This helps avoid import-only or UI text being misclassified as file access. If outbound HTTP is also present in that file, an additional high-severity signal is raised for possible exfiltration.
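The two-condition gate and the HTTP escalation can be sketched per file as follows. Only the "high" severity of the escalated signal comes from the text above; the finding IDs, the "medium" severity, and the marker tuples are hypothetical placeholders.

```python
import re

SESSION_TOKENS = ("session.json", "launcher_accounts.json", ".minecraft")
FILE_IO_MARKERS = ("new File(", "Paths.get(", "Files.read", "FileInputStream(", "FileReader(")
HTTP_MARKERS = ("HttpClient", "HttpURLConnection", "OkHttpClient", "http://", "https://")
STRING_LITERAL = re.compile(r'"((?:[^"\\\n]|\\.)*)"')


def session_path_findings(java_source: str) -> list[tuple[str, str]]:
    literals = [m.group(1) for m in STRING_LITERAL.finditer(java_source)]
    # Condition 1: the token must appear inside a string literal,
    # so package names like net.minecraft in imports do not count.
    if not any(tok in lit for lit in literals for tok in SESSION_TOKENS):
        return []
    # Condition 2: the same file must perform file I/O; otherwise it is likely UI text.
    if not any(marker in java_source for marker in FILE_IO_MARKERS):
        return []
    findings = [("minecraft_session_file_access", "medium")]  # hypothetical ID/severity
    if any(marker in java_source for marker in HTTP_MARKERS):
        findings.append(("possible_session_file_exfiltration", "high"))  # hypothetical ID
    return findings
```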
The scanner also flags a high-severity indicator when user identifiers are read and outbound HTTP appears in the same file:
- Username reads: `method_1676()`, `getName()`, `getUsername()`
- UUID reads: `method_44717()`, `GameProfile.getId()`, `Session.getUuid()`, and mapped or Yarn variants
- Outbound HTTP markers: discovered host URLs, `HttpClient.send(...)`, `OkHttpClient.newCall(...)`, `HttpURLConnection`
If any username or UUID read appears with outbound HTTP, the tool emits possible_minecraft_identity_exfiltration with the source location and evidence.
Expanded alias coverage includes:
- Session presence or access:
- Session presence or access: `method_1548()`, `getSession()`, `getUser()`, `net.minecraft.client.util.Session`, `new Session(...)`
- Username access: `method_1676()`, `getName()`, `getUsername()`
- UUID access: `method_44717()`, `getProfileId()`, `getUuid()`, `GameProfile.getId()`
- Token access: `method_1674()`, `getAccessToken()`, `session.getAccessToken()`
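Combining the username/UUID aliases with the identity-exfiltration rule gives a per-file check along these lines. It is a sketch: it uses plain substring matching, reports only the first matching line, and its `HTTP_MARKERS` tuple is an assumption; only the finding ID `possible_minecraft_identity_exfiltration` comes from the text above.

```python
from typing import Optional

USERNAME_READS = ("method_1676()", "getName()", "getUsername()")
UUID_READS = ("method_44717()", "getProfileId()", "getUuid()", "GameProfile.getId()")
# Assumed marker list, loosely following the outbound HTTP markers described above.
HTTP_MARKERS = ("HttpClient.send(", "OkHttpClient.newCall(", "HttpURLConnection", "http://", "https://")


def identity_exfil_finding(path: str, java_source: str) -> Optional[dict]:
    reads = [alias for alias in USERNAME_READS + UUID_READS if alias in java_source]
    http = [marker for marker in HTTP_MARKERS if marker in java_source]
    if not reads or not http:
        return None
    # Report the first line containing an identity read as the source location.
    line_no = next(
        i for i, text in enumerate(java_source.splitlines(), start=1)
        if any(alias in text for alias in reads)
    )
    return {
        "id": "possible_minecraft_identity_exfiltration",
        "severity": "high",
        "location": f"{path}:{line_no}",
        "evidence": reads + http,
    }
```

Generic aliases such as `getName()` match many unrelated classes, which is why a production check would need more context than this substring test.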
When enabled, Java Triage will attempt to upload the original source JAR/ZIP to:
https://jlab.threat.rip/api/public/static-scan
Behavior details:
- Enabled by default (`--jlab-static-scan`)
- Can be disabled with `--no-jlab-static-scan`
- Requires network access (disabled by `--no-network`)
- Upload target priority:
  - source JAR metadata path/name fallback for directory scans that originated from a JAR
  - scan root file if the internal analysis root resolves to a `.jar`/`.zip`
- Size and format guardrails:
  - only `.jar`/`.zip` files are uploaded
  - max upload size handled by the tool: 50 MB
Returned data is stored under jlab_static_scan in JSON and rendered in Rich/HTML reports, including:
- upload metadata (filename, size, status)
- rate-limit metadata when available
- matched signature count and signature rows (severity, id, name, description, type, count, match preview)
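The upload-target priority and guardrails can be expressed as a small selection function. This is a sketch under assumptions: `pick_upload_target` is a hypothetical name, "50 MB" is interpreted as 50 MiB, and the sketch stops at selection rather than guessing the JLab request format.

```python
from pathlib import Path
from typing import Optional

ALLOWED_SUFFIXES = {".jar", ".zip"}
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # "50 MB", interpreted here as 50 MiB


def pick_upload_target(source_jar: Optional[Path], scan_root: Path) -> Optional[Path]:
    """Pick the first existing file by priority, then apply the guardrails."""
    candidates = [p for p in (source_jar, scan_root) if p is not None and p.is_file()]
    if not candidates:
        return None  # e.g. a plain source-folder scan with no originating JAR
    target = candidates[0]
    if target.suffix.lower() not in ALLOWED_SUFFIXES:
        return None
    if target.stat().st_size > MAX_UPLOAD_BYTES:
        return None
    return target
```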
The tool can generate an AI executive summary using either OpenAI or DeepSeek.
- `OPENAI_API_KEY`: enables OpenAI Chat Completions
- `DEEPSEEK_API_KEY`: enables DeepSeek Chat Completions
- `TRIAGE_LLM_PROVIDER`: optional provider selector:
  - `auto` (default): tries OpenAI first, then DeepSeek
  - `openai`: use only OpenAI
  - `deepseek`: use only DeepSeek
- `TRIAGE_OPENAI_MODEL`: OpenAI model override (default: `gpt-4.1-mini`)
- `TRIAGE_DEEPSEEK_MODEL`: DeepSeek model override (default: `deepseek-v4-flash`)
  - Common values: `deepseek-v4-flash`, `deepseek-v4-pro`
- `TRIAGE_DEEPSEEK_REASONING_EFFORT`: DeepSeek reasoning effort (default: `high`)
If neither API key is present, the tool behaves as if this feature does not exist and does not mention AI in the output.
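Provider selection from the environment variables above might look like the sketch below. It assumes "tries OpenAI first" means "use OpenAI if its key is set"; the real tool may also fall back to DeepSeek after a failed OpenAI request, which is not modeled here. `resolve_llm_provider` is a hypothetical name.

```python
import os
from collections.abc import Mapping
from typing import Optional

KEY_VARS = {"openai": "OPENAI_API_KEY", "deepseek": "DEEPSEEK_API_KEY"}
MODEL_VARS = {"openai": "TRIAGE_OPENAI_MODEL", "deepseek": "TRIAGE_DEEPSEEK_MODEL"}
DEFAULT_MODELS = {"openai": "gpt-4.1-mini", "deepseek": "deepseek-v4-flash"}


def resolve_llm_provider(env: Mapping[str, str] = os.environ) -> Optional[tuple[str, str]]:
    """Return (provider, model), or None when no usable key is set (feature hidden)."""
    choice = env.get("TRIAGE_LLM_PROVIDER", "auto").strip().lower()
    order = ["openai", "deepseek"] if choice == "auto" else [choice]
    for provider in order:
        if provider in KEY_VARS and env.get(KEY_VARS[provider]):
            return provider, env.get(MODEL_VARS[provider], DEFAULT_MODELS[provider])
    return None
```

Passing the environment as a mapping keeps the function easy to test without mutating `os.environ`.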
- Python 3.10+ recommended
- Optional: `rich` for enhanced terminal output
- Optional CLI tools for metadata enrichment: `ssdeep`, `tlsh`, `trid`, `vhash`
- Optional Python package for metadata enrichment: `magika`
No package install is required for the script itself.
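One way a single-file script can keep these dependencies optional is to probe for them at runtime and degrade gracefully. The helper names below are hypothetical; `importlib.util.find_spec` and `shutil.which` are standard-library calls.

```python
import importlib.util
import shutil

OPTIONAL_MODULES = ("rich", "magika")
OPTIONAL_TOOLS = ("ssdeep", "tlsh", "trid", "vhash")


def detect_optional_features() -> dict[str, bool]:
    """Report which optional Python packages and CLI tools are available."""
    features = {name: importlib.util.find_spec(name) is not None for name in OPTIONAL_MODULES}
    features.update({tool: shutil.which(tool) is not None for tool in OPTIONAL_TOOLS})
    return features


def make_printer():
    """Use Rich when present, otherwise fall back to plain print()."""
    if importlib.util.find_spec("rich") is not None:
        from rich.console import Console
        return Console().print
    return print
```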
# optional, for rich UI output
pip install rich
# optional, for magika metadata enrichment
pip install magika

python java_triage.py [target]

`target` is a directory path (omit it to scan the current directory).
For a full list of options at any time:
python java_triage.py --help

# Scan current directory
python java_triage.py
# Scan a specific unpacked source tree
python java_triage.py ./sample_project
# Disable default auto-decrypt copy or rewrite behavior
python java_triage.py ./sample_project --no-auto-decrypt
# Explicitly write a decrypted copy to a chosen path, then scan it
python java_triage.py ./sample_project --decrypt-codebase-out ./sample_project_deobf
# Rewrite in-place
python java_triage.py ./sample_project --decrypt-codebase-in-place
# Rewrite only, then skip the post-decrypt triage scan
python java_triage.py ./sample_project --no-rescan-after-decrypt
# Disable JSON output
python java_triage.py ./sample_project --no-json
# Save JSON report to a custom file
python java_triage.py ./sample_project --out report.json
# Disable HTML report output
python java_triage.py ./sample_project --no-html
# Save HTML report to a custom file
python java_triage.py ./sample_project --html-out report.html
# Disable all network lookups during analysis
python java_triage.py ./sample_project --no-network
# Disable stage-2 static analysis
python java_triage.py ./sample_project --no-analyze-stage2
# Disable JLab static scan enrichment
python java_triage.py ./sample_project --no-jlab-static-scan
# Wider rich output
python java_triage.py ./sample_project --rich-width 220

Options:
- `target`: folder to scan (default: current directory)
- `--json`: emit JSON output (enabled by default)
- `--no-json`: emit text or Rich output instead of JSON
- `--out <path>`: write output to file
- `--html`: also emit an HTML report (enabled by default)
- `--no-html`: disable HTML report output
- `--html-out <path>`: write HTML report to a custom file
- `--no-progress`: disable progress messages
- `--no-network`: disable runtime C2 resolution and related network lookups
- `--jlab-static-scan`: upload source JAR/ZIP to JLab public static scan API and include matched signature results (enabled by default)
- `--no-jlab-static-scan`: disable JLab public static scan lookup
- `--analyze-stage2`: after resolving a runtime payload endpoint, download the stage-2 JAR and perform static-only analysis (enabled by default)
- `--no-analyze-stage2`: disable stage-2 static analysis
- `--rich-width <int>`: preferred Rich console width for progress and final report rendering
- `--decrypt-codebase-in-place`: rewrite supported encrypted string calls in the target tree directly
- `--decrypt-codebase-out <path>`: copy the tree to `<path>`, rewrite there, then scan that rewritten tree
- `--no-rescan-after-decrypt`: perform rewrite only and exit
- `--no-auto-decrypt`: disable opportunistic auto-decrypt probe and rewrite behavior
- `--decipher-codebase`: produce a deciphered copy of the target with all XOR-obfuscated `getBytes`/`toCharArray` strings replaced by decoded literals, then scan both copies (enabled by default; disable with `--no-auto-decrypt`)
- `--decipher-only <path>`: decipher a single `.java` file and write decoded strings to JSON (no scan)
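The paired on/off flags follow a common `argparse` pattern. The sketch below is a hypothetical, partial reconstruction for illustration only: it is not the tool's parser and covers just a few options. `argparse.BooleanOptionalAction` requires Python 3.9+.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="java_triage.py")
    parser.add_argument("target", nargs="?", default=".", help="folder to scan")
    # BooleanOptionalAction generates both --json and --no-json from one definition.
    parser.add_argument("--json", action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--html", action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--jlab-static-scan", action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--analyze-stage2", action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--no-network", action="store_true")
    parser.add_argument("--rich-width", type=int)
    return parser
```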
The following behavior IDs were added for explicit methodology coverage and can be searched directly in JSON output:
- `obf_xor_encoded_name_access`
- `obf_base64_encoded_name_access`
- `obf_caesar_encoded_name_access`
- `obf_methodhandle_token_access`
- `obf_lambdametafactory_token_access`
- `obf_array_indirect_dispatch_token_access`
- `obf_split_reassembled_name_access`
- `obf_unsafe_field_token_access`
- `obf_varhandle_field_token_access`
- `obf_stackwalker_indirect_access`
- `obf_int_array_encoded_name_access`
- `obf_classloader_bypass_token_access`
- `token_class_sweep_static_field_harvest`
- `token_spin_race_window_harvest`
- `token_yggdrasil_internal_probe`
- `token_process_commandline_harvest`
- `token_processhandle_commandline_probe`
- `token_runtime_mxbean_arg_probe`
- `token_system_property_auth_probe`
- `token_environment_auth_probe`
- `token_sun_java_command_probe`
- `token_jdk_internal_process_probe`
- `dataflow_token_to_network_sink`
- `dataflow_username_to_network_sink`
- `dataflow_uuid_to_network_sink`
- `two_payload_exfil_architecture`
- `persistence_filesystem_copy_relaunch_chain`
- `persistence_detached_process_relaunch`
- `c2_fallback_domain`
- `payload_download_endpoint`
- `persistence_install_directory`
- `python_executable_reference`
- `python_script_reference`
- `exfil_endpoint_prefiremcexfil_endpoint_submit_log`
- `python_subprocess_argument_chain`
- `detached_process_runtime_indicator`
- `minecraft_coordinate_exfiltration`
- `discord_webhook_url_reassembly`
- `multi_path_exfil_breakdown`
- `inline_xor_string_decoder`
- `sensitive_game_data_comment`
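Filtering a saved report for these IDs could look like the sketch below. It assumes each entry in `behavior_findings` is an object with `id` and `severity` fields; the real field names in the JSON report may differ, so check an actual report first. The helper names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_report(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8"))


def filter_behaviors(report: dict, wanted_ids: set[str]) -> list[dict]:
    # ASSUMPTION: behavior_findings is a list of objects with "id" and "severity" keys.
    return [b for b in report.get("behavior_findings", []) if b.get("id") in wanted_ids]


def severity_counts(findings: list[dict]) -> Counter:
    return Counter(f.get("severity", "unknown") for f in findings)
```

For example, `filter_behaviors(load_report(Path("ExampleMod.json")), {"dataflow_token_to_network_sink"})` would return only the token-to-network data-flow findings, if the assumed schema holds.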
The decipher section in JSON reports contains counts of XOR strings replaced and files changed when --decipher-codebase is used (enabled by default).
Text and Rich output include:
- Basic Properties, JAR Info, and Bundle Info
- Cryptocurrency Addresses
- Discord / Webhook Indicators
- Windows Persistence / Staging Indicators
- Decode and string findings (sorted by category priority)
- Assessment findings (`benign`, `needs_review`, `suspicious`)
- Behavioral findings (sorted by severity)
- Artifact findings
- Network Endpoint Assessment
- Runtime C2 Resolution
- Assembled C2 URLs
- Infrastructure Probe Results
- Blockchain Indicators
- Variant Detections
- Raw String Detections
- Heuristic Detections
- RatterScanner results
- JLab static scan results (sorted by severity)
- Stage-2 Analysis status
- Interactive post-scan download + decrypt prompt
- Summary counts and verdict layers
JSON output includes the full scan payload, including:
- `target_metadata`
- `runtime_c2`
- `url_assembly`
- `infra_probe`
- `stage2_analysis`
- `blockchain_indicators`
- `network_endpoint_assessment`
- `variant_detections`
- `raw_string_detections`
- `heuristic_detections`
- `ratter_scanner`
- `jlab_static_scan`
- `decipher`
- `findings`
- `behavior_findings`
- `artifact_findings`
- `summary`
HTML output is a standalone styled report and includes:
- top-level summary cards and overall assessment
- executive summary, when available
- expanded metadata and enrichment sections
- clickable column headers for sorting tables
- omission of categories that are completely empty
- Bitcoin/cryptocurrency address detection: Base58 P2PKH/P2SH + Bech32 regex, dedicated `cryptocurrency_address` category
- Java comment scanning: extracts `//` and `/* */` comments for malware self-documentation (coordinate exfil, stealer labels, C2 references)
- Inline XOR string decoder: Skidfuscator-style `byte[] arr = "XORdata"` first-byte-key patterns
- Full XOR decode pass: captures all decoded `getBytes`/`toCharArray` strings, not just "interesting" ones
- Discord keyword detection: catches `"Discord Notification"` and similar in decoded strings
- Coordinate exfiltration detection: `minecraft_coordinate_exfiltration` behavior when position reads meet Discord/HTTP
- Discord webhook URL reassembly detection: flags XOR-fragmented webhook URLs with snowflake IDs
- Multi-path exfiltration breakdown: `multi_path_exfil_breakdown` describes exactly which data flows to which endpoint
- Windows persistence/staging: dedicated section showing env vars, staging paths, executables, launched payloads, confirmed/not-confirmed persistence
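A Base58 P2PKH/P2SH plus Bech32 detector can be sketched as follows. The regexes and helper names are illustrative assumptions. The Base58Check checksum step is an extra filter added here to cut false positives and is not claimed to be part of the tool; Bech32 checksums are not verified in this sketch.

```python
import hashlib
import re

BASE58_ADDR = re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b")
BECH32_ADDR = re.compile(r"\bbc1[ac-hj-np-z02-9]{11,71}\b")
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_ok(address: str) -> bool:
    """Verify the 4-byte double-SHA256 checksum of a P2PKH/P2SH address."""
    num = 0
    for ch in address:
        num = num * 58 + B58_ALPHABET.index(ch)
    try:
        raw = num.to_bytes(25, "big")  # version byte + 20-byte hash + 4-byte checksum
    except OverflowError:
        return False
    digest = hashlib.sha256(hashlib.sha256(raw[:-4]).digest()).digest()
    return digest[:4] == raw[-4:]


def find_btc_addresses(text: str) -> list[str]:
    hits = [m for m in BASE58_ADDR.findall(text) if base58check_ok(m)]
    hits += BECH32_ADDR.findall(text)  # regex-only for Bech32
    return hits
```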
Category reclassifications:
- `LOCALAPPDATA`/`APPDATA`/`TEMP` → `path` (was `string`)
- `-restarted`/`-cp`/`-Detached` → `dynamic_execution` (was `path`)
- `java.home` → `path` (was `comms_indicator`)
- `"null"` JSON placeholder → `string` (was `path`)
- `User-Agent`/`Content-Type:` → `http_header` (was `string`)
- Bitcoin addresses → `cryptocurrency_address` (was missing or `hex_decoded_binary`)
- `"Discord Notification"` → `discord_indicator` (was `string`)
- C2 URL assembler: `assemble_c2_urls()` builds full URLs from the blockchain-resolved domain + decoded path fragments
- Infrastructure probe: DNS + HTTP HEAD (`Range: bytes=0-0` for CDN), opt-in via post-scan prompt
- Enhanced `resolve_runtime_c2()`: assembles the correct `payload_endpoint` from path fragments instead of guessing `/api/delivery/handler`
- AES stage-2 decryption: `_aes_decrypt_stage2_blob()` decrypts Zenith-style AES/CBC/NoPadding payloads using the key from source
- `--analyze-stage2` no longer auto-downloads; the download is deferred to the interactive Y/N prompt after the scan
- Tables sorted by priority — decoded findings by category danger, behaviors by severity, JLab signatures by severity
- Clickable HTML headers — all smart-tables have click-to-sort column headers
- HTML column width improvements — behavior File/Behavior thinner, Evidence wider; JLab ID thinner, Name wider
- Rich & HTML parity — same sections in same order across console and HTML
- Windows Persistence / Staging section — env vars, paths, executables, payloads, confirmed/not-confirmed
- Cryptocurrency Addresses section — dedicated card for BTC addresses
- Discord / Webhook Indicators section — dedicated card with signal type + value
- Assembled C2 URLs section — full URLs with method + description
- Infrastructure Probe Results section — live/dead/error status per endpoint
- Post-scan Y/N prompt for stage-2 download + AES decrypt
- Post-scan y/N prompt for endpoint probing
- Neither runs automatically — fully opt-in
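The URL assembly and probe steps can be sketched with the standard library. Everything here is hypothetical: `assemble_urls`, `build_probe_request`, and `probe` are not the tool's `assemble_c2_urls()` or its probe code. The sketch only illustrates joining a resolved domain with decoded path fragments, then a DNS lookup followed by a `HEAD` request carrying `Range: bytes=0-0`.

```python
import socket
import urllib.error
import urllib.request
from urllib.parse import urljoin, urlsplit


def assemble_urls(domain: str, path_fragments: list[str], scheme: str = "https") -> list[str]:
    """Join a resolved domain with decoded path fragments, dropping duplicates."""
    base = domain if "://" in domain else f"{scheme}://{domain}"
    base = base.rstrip("/") + "/"
    urls: list[str] = []
    for fragment in path_fragments:
        url = urljoin(base, fragment.lstrip("/"))
        if url not in urls:
            urls.append(url)
    return urls


def build_probe_request(url: str) -> urllib.request.Request:
    # HEAD plus a one-byte Range keeps CDN-fronted hosts from sending a body.
    return urllib.request.Request(url, method="HEAD", headers={"Range": "bytes=0-0"})


def probe(url: str, timeout: float = 5.0) -> str:
    """Opt-in live check: DNS resolution first, then a HEAD request."""
    host = urlsplit(url).hostname or ""
    try:
        socket.getaddrinfo(host, None)
    except socket.gaierror:
        return "dead (no DNS)"
    try:
        with urllib.request.urlopen(build_probe_request(url), timeout=timeout) as resp:
            return f"live ({resp.status})"
    except urllib.error.HTTPError as exc:
        return f"live ({exc.code})"  # the server answered, even if with an error code
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return f"error ({exc})"
```

Keeping `probe` separate from URL assembly mirrors the opt-in design: URLs can be assembled and reported without generating any network traffic.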
- This is a triage helper, not a full malware sandbox or decompiler.
- The deobfuscation stage is deterministic and heuristic-based; unsupported custom routines may still remain unresolved.
- Class-constant fallback mode provides useful indicators but less semantic context than full source scanning.
- Behavioral and signature detections are heuristic-based and may produce false positives or miss novel techniques.
- Network-based runtime C2 resolution and stage-2 enrichment are best-effort and may fail due to missing indicators, DNS failure, RPC issues, or decoding variance.
- External API enrichments (RatterScanner/JLab) are best-effort and may fail due to network issues, API errors, rate limits, or response format changes.
- JLab public scan is an external experimental endpoint; response fields and behavior may change over time.
- Metadata enrichments such as SSDEEP, TLSH, TrID, Magika, and Vhash are best-effort and only appear when dependencies are available.
- Nested archive or payload extraction is heuristic and best-effort; highly custom packers may still evade static expansion.
- Do not rely on this tool alone to determine whether a Java application is safe.
