feat(aws): add --vllm-extra-args flag and auto GPU detection for RHEL AI #848

Open
are-ces wants to merge 1 commit into redhat-developer:main from are-ces:rhelai-tool-calling

@are-ces are-ces commented Jun 27, 2026

Contributor

Summary

Add --vllm-extra-args flag to mapt aws rhel-ai create for passing additional vLLM arguments, and auto-detect GPU count for tensor parallelism.

--vllm-extra-args

A string flag that appends extra arguments to the RHAIIS vLLM Exec= line in install.conf. Allows configuring tool calling, context length, chat templates, and any future vLLM options without requiring new MAPT flags.

  • If --max-model-len is included in the extra args, it replaces the default 4096 value
  • If not included, the default 4096 is preserved and extra args are appended after it
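The two rules above can be sketched as a small shell function. This is an illustration only, not the code in rhelai.go: the name `merge_vllm_args` is hypothetical, and it assumes the extra args arrive as one string using the space-separated `--max-model-len N` form.

```shell
#!/bin/sh
# Hypothetical helper: combine the default context length with user extra args.
# Assumption: the default is "--max-model-len 4096" and extra args are one string.
merge_vllm_args() {
    extra="$1"
    case " $extra " in
        *" --max-model-len "*)
            # The user supplied their own value, so the default is dropped.
            printf '%s\n' "$extra"
            ;;
        *)
            # Keep the default and append the extra args after it, if any.
            if [ -n "$extra" ]; then
                printf '%s\n' "--max-model-len 4096 $extra"
            else
                printf '%s\n' "--max-model-len 4096"
            fi
            ;;
    esac
}

merge_vllm_args '--enable-auto-tool-choice --tool-call-parser llama3_json'
merge_vllm_args '--max-model-len 131072 --enable-auto-tool-choice'
```

Note that this sketch would not recognise the `--max-model-len=131072` spelling; whether the real implementation does is not stated in the description.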

Auto GPU detection

Automatically detects the number of GPUs on the instance via nvidia-smi and sets --tensor-parallel-size accordingly. Works for any instance type without additional configuration.
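As a rough sketch of the detection step: `nvidia-smi -L` prints one line per GPU, so counting those lines gives the GPU count. The function names below are hypothetical, and reading the listing from stdin is an assumption made so the logic can be exercised on a machine without GPUs. The sample UUIDs are made up.

```shell
#!/bin/sh
# Count GPUs from `nvidia-smi -L`-style output on stdin
# (one "GPU N: ..." line per device).
count_gpus() {
    # grep -c exits non-zero when nothing matches but still prints 0.
    grep -c '^GPU [0-9]' || true
}

# Map a GPU count to a --tensor-parallel-size value.
# Falls back to 1 when the count is zero or not a number.
tensor_parallel_size() {
    count="$1"
    if [ "$count" -gt 0 ] 2>/dev/null; then
        printf '%s\n' "$count"
    else
        printf '1\n'
    fi
}

# Sample listing standing in for real `nvidia-smi -L` output (illustrative values).
sample='GPU 0: NVIDIA A10G (UUID: GPU-aaaa)
GPU 1: NVIDIA A10G (UUID: GPU-bbbb)'
gpus=$(printf '%s\n' "$sample" | count_gpus)
echo "--tensor-parallel-size $(tensor_parallel_size "$gpus")"
```

On a real instance the input would come from `nvidia-smi -L | count_gpus` instead of the sample string.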

Usage:

mapt aws rhel-ai create \
    --auto-start \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --vllm-extra-args '--max-model-len 131072 --enable-auto-tool-choice --tool-call-parser llama3_json --chat-template /opt/app-root/template/tool_chat_template_llama3.1_json.jinja' \
    --expose-ports 8000 \
    ...

Updated Tekton task template with vllm-extra-args param.

Co-Authored-By: Claude Opus 4.6 (1M context) [email protected]

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Walkthrough

Adds RHEL AI vLLM extra-args wiring across the API, CLI, provider setup script, and Tekton task definitions. Also adds Azure RHEL AI Tekton parameters for GPU count and GPU manufacturer, and forwards them into the create command.

Changes

RHEL AI vLLM Extra Args

| Layer / File(s) | Summary |
| --- | --- |
| API, params, and CLI wiring (pkg/target/host/rhelai/api.go, cmd/mapt/cmd/params/params.go, cmd/mapt/cmd/aws/hosts/rhelai.go) | RHELAIArgs gains VLLMExtraArgs; new parameter constants are added; the create command wires the new Viper-backed flag into the provider call. |
| Provider request and setup script (pkg/provider/aws/action/rhel-ai/rhelai.go) | rhelAIRequest gains vllmExtraArgs, Create copies the value from args, and rhaiisSetupScript adjusts install.conf for GPU count and vLLM extra arguments. |
| Tekton task parameters and command assembly (tkn/infra-aws-rhel-ai.yaml, tkn/template/infra-aws-rhel-ai.yaml) | The RHEL AI Tekton task adds vllm-extra-args and conditionally appends the matching create-mode CLI flag when non-empty. |

Azure RHEL AI GPU Parameters

| Layer / File(s) | Summary |
| --- | --- |
| Task params and command wiring (tkn/infra-azure-rhel-ai.yaml) | The Azure Tekton task adds gpus and gpu-manufacturer params and conditionally appends the corresponding create flags. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main AWS RHEL AI changes: vLLM extra args support and GPU auto-detection. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description matches the changes, covering the new vllm-extra-args flag and GPU auto-detection for RHEL AI create. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/provider/aws/action/rhel-ai/rhelai.go`:
- Around line 380-388: The install.conf update in rhelai.go only runs when
toolCallParser is set, so a standalone chatTemplate value is dropped even though
it was accepted earlier. Update the config-building logic around the rhelai.go
script assembly to either validate the chatTemplate/toolCallParser dependency up
front or apply chatTemplate independently in this block, using the existing
toolCallParser and chatTemplate symbols so both settings are handled correctly.
- Around line 381-387: The shell command built in rhelai.go is directly
embedding user-controlled values from r.toolCallParser and r.chatTemplate into
the sed script, which can break the command or allow injection. Update the logic
around toolArgs and the fmt.Sprintf that appends to script so these values are
safely escaped or quoted for single-quoted shell/sed usage before insertion. Use
the existing rhelai configuration-building path as the place to sanitize both
fields consistently before generating install.conf updates.

In `@tkn/infra-aws-rhel-ai.yaml`:
- Around line 326-331: The cmd-building logic in the Tekton script appends raw
params for tool-call-parser and chat-template into an eval string, which can
break quoting and enable shell injection. Update the command assembly to stop
concatenating these params into cmd; instead, build argv incrementally in the
same script section that handles params and execute the command directly without
eval, using the existing parameter checks around tool-call-parser and
chat-template as the place to refactor.

In `@tkn/template/infra-aws-rhel-ai.yaml`:
- Around line 326-331: The command-building logic in the Tekton template is
unsafe because `cmd` is later executed with `eval`, and
`params.tool-call-parser` / `params.chat-template` are being appended as raw
quoted text. Update the script to stop concatenating Tekton params into the eval
string; instead, build the argument list incrementally in the same section that
sets `cmd`, and execute the command directly without `eval`. Use the existing
parameter-handling block around `tool-call-parser` and `chat-template` as the
place to switch from string concatenation to argv-based execution.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: febda5c2-da8c-430e-82cd-93979e54553a

📥 Commits

Reviewing files that changed from the base of the PR and between a43bcd4 and e11da37.

📒 Files selected for processing (6)
  • cmd/mapt/cmd/aws/hosts/rhelai.go
  • cmd/mapt/cmd/params/params.go
  • pkg/provider/aws/action/rhel-ai/rhelai.go
  • pkg/target/host/rhelai/api.go
  • tkn/infra-aws-rhel-ai.yaml
  • tkn/template/infra-aws-rhel-ai.yaml

Comment on lines +380 to +388
if len(*r.toolCallParser) > 0 {
	toolArgs := fmt.Sprintf(`--enable-auto-tool-choice \\\n --tool-call-parser %s`, *r.toolCallParser)
	if len(*r.chatTemplate) > 0 {
		toolArgs += fmt.Sprintf(` \\\n --chat-template %s`, *r.chatTemplate)
	}
	script += fmt.Sprintf(
		` && sudo sed -i 's|--max-model-len.*|--max-model-len 4096 \\\n %s|' %s/install.conf`,
		toolArgs, confDir)
}

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

--chat-template is silently ignored unless --tool-call-parser is also set.

Line 380 gates the entire config update on toolCallParser, so a standalone --chat-template value is accepted by the CLI/Tekton layers but never reaches install.conf. Either validate that dependency earlier or handle chatTemplate independently here.
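One way to handle the two settings independently is sketched below, in shell rather than in the Go of rhelai.go. The function name `build_tool_args` is hypothetical; the flags are the ones discussed in this comment. Escaping of the values is a separate concern and is deliberately left out here.

```shell
#!/bin/sh
# Hypothetical sketch: emit the vLLM flags for each setting on its own,
# so a chat template is not dropped when no parser is given.
build_tool_args() {
    parser="$1"
    template="$2"
    out=""
    if [ -n "$parser" ]; then
        out="--enable-auto-tool-choice --tool-call-parser $parser"
    fi
    if [ -n "$template" ]; then
        # Add a separating space only when earlier flags are present.
        out="${out:+$out }--chat-template $template"
    fi
    printf '%s\n' "$out"
}

build_tool_args llama3_json ''
build_tool_args '' /opt/app-root/template/tool_chat_template_llama3.1_json.jinja
```

The alternative named in the comment, rejecting a chat template without a parser up front, would replace the second branch with an early error.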

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/provider/aws/action/rhel-ai/rhelai.go` around lines 380 - 388, The
install.conf update in rhelai.go only runs when toolCallParser is set, so a
standalone chatTemplate value is dropped even though it was accepted earlier.
Update the config-building logic around the rhelai.go script assembly to either
validate the chatTemplate/toolCallParser dependency up front or apply
chatTemplate independently in this block, using the existing toolCallParser and
chatTemplate symbols so both settings are handled correctly.

Comment thread pkg/provider/aws/action/rhel-ai/rhelai.go Outdated
Comment thread tkn/infra-aws-rhel-ai.yaml Outdated
Comment on lines +326 to +331
if [[ "$(params.tool-call-parser)" != "" ]]; then
  cmd+="--tool-call-parser '$(params.tool-call-parser)' "
fi
if [[ "$(params.chat-template)" != "" ]]; then
  cmd+="--chat-template '$(params.chat-template)' "
fi

🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

Do not append raw Tekton params into the eval command string.

These values flow into cmd and are executed via eval later in the script. A tool-call-parser or chat-template containing a single quote can break quoting and run arbitrary shell in the task pod, exposing the mounted credentials. Build argv incrementally and exec it directly instead of extending the eval string.
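A minimal sketch of the argv-based approach, under stated assumptions: the values are held in ordinary shell variables (in a Tekton step they would more safely arrive as environment variables than as text substituted into the script), the function names `build_and_run` and `print_args` are hypothetical, and `print_args` stands in for the real `mapt` binary.

```shell
#!/bin/sh
# Sketch: build the argument list with positional parameters
# instead of concatenating a string that is later passed to eval.
build_and_run() {
    runner="$1"; parser="$2"; template="$3"
    set -- aws rhel-ai create
    if [ -n "$parser" ]; then
        set -- "$@" --tool-call-parser "$parser"
    fi
    if [ -n "$template" ]; then
        set -- "$@" --chat-template "$template"
    fi
    # Each value stays a single argument; quotes inside it are never
    # re-parsed by the shell because no eval is involved.
    "$runner" "$@"
}

# Stand-in for the real binary: prints one argument per line.
print_args() {
    for a in "$@"; do printf '[%s]\n' "$a"; done
}

# A hostile value that would break out of single quotes in an eval string.
hostile="x'; echo pwned; '"
build_and_run print_args "$hostile" ""
```

Run as written, the hostile value is printed as one bracketed argument and `echo pwned` is never executed.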

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tkn/infra-aws-rhel-ai.yaml` around lines 326 - 331, The cmd-building logic in
the Tekton script appends raw params for tool-call-parser and chat-template into
an eval string, which can break quoting and enable shell injection. Update the
command assembly to stop concatenating these params into cmd; instead, build
argv incrementally in the same script section that handles params and execute
the command directly without eval, using the existing parameter checks around
tool-call-parser and chat-template as the place to refactor.

Comment thread tkn/template/infra-aws-rhel-ai.yaml Outdated
Comment on lines +326 to +331
if [[ "$(params.tool-call-parser)" != "" ]]; then
  cmd+="--tool-call-parser '$(params.tool-call-parser)' "
fi
if [[ "$(params.chat-template)" != "" ]]; then
  cmd+="--chat-template '$(params.chat-template)' "
fi

🔒 Security & Privacy | 🔴 Critical | ⚡ Quick win

Do not append raw Tekton params into the eval command string.

These values flow into cmd and are executed via eval later in the script. A tool-call-parser or chat-template containing a single quote can break quoting and run arbitrary shell in the task pod, exposing the mounted credentials. Build argv incrementally and exec it directly instead of extending the eval string.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tkn/template/infra-aws-rhel-ai.yaml` around lines 326 - 331, The
command-building logic in the Tekton template is unsafe because `cmd` is later
executed with `eval`, and `params.tool-call-parser` / `params.chat-template` are
being appended as raw quoted text. Update the script to stop concatenating
Tekton params into the eval string; instead, build the argument list
incrementally in the same section that sets `cmd`, and execute the command
directly without `eval`. Use the existing parameter-handling block around
`tool-call-parser` and `chat-template` as the place to switch from string
concatenation to argv-based execution.

@are-ces are-ces force-pushed the rhelai-tool-calling branch 2 times, most recently from 98e4848 to 2d7165f Compare June 27, 2026 18:25

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/mapt/cmd/params/params.go`:
- Around line 136-137: The standalone --chat-template flag is accepted in the
CLI but ignored unless --tool-call-parser is also set, so the dependency needs
to be enforced or the flag wiring fixed. Update the RHEL AI flow in
cmd/mapt/cmd/aws/hosts/rhelai.go and pkg/provider/aws/action/rhel-ai/rhelai.go
so that either chat-template cannot be used without tool-call-parser (fail fast
with validation) or the provider always passes through chat-template when set,
regardless of toolCallParser.

In `@tkn/infra-azure-rhel-ai.yaml`:
- Around line 88-90: The Tekton param default for gpus is forcing the Azure RHEL
AI task to always pass --gpus 8 when callers omit the value, bypassing the
provider’s unset/0 normalization path. Update the gpus parameter in
infra-azure-rhel-ai.yaml so it does not default to 8, and let the existing
CLI/provider handling in the task logic around the gpus condition at the
referenced block decide the effective GPU count unless a caller explicitly opts
in.
- Around line 238-243: The command-building logic in the Tekton task is
vulnerable because params like gpus and gpu-manufacturer are interpolated into
cmd and later executed with eval. Harden the command assembly by validating or
sanitizing these params before appending them, or avoid eval entirely by passing
arguments safely; update the logic in the script section where cmd is built so
single quotes or shell metacharacters cannot break out of the intended argument.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 4280c80e-9c50-42e3-ba62-d4cda787ff9a

📥 Commits

Reviewing files that changed from the base of the PR and between e11da37 and 2d7165f.

📒 Files selected for processing (7)
  • cmd/mapt/cmd/aws/hosts/rhelai.go
  • cmd/mapt/cmd/params/params.go
  • pkg/provider/aws/action/rhel-ai/rhelai.go
  • pkg/target/host/rhelai/api.go
  • tkn/infra-aws-rhel-ai.yaml
  • tkn/infra-azure-rhel-ai.yaml
  • tkn/template/infra-aws-rhel-ai.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
  • pkg/target/host/rhelai/api.go
  • cmd/mapt/cmd/aws/hosts/rhelai.go
  • pkg/provider/aws/action/rhel-ai/rhelai.go

Comment thread cmd/mapt/cmd/params/params.go Outdated
Comment on lines +88 to +90
- name: gpus
  description: Number of GPUs for the cloud instance (valid marketplace values are 1, 2, 4, 8)
  default: "8"

🚀 Performance & Scalability | 🟠 Major | ⚡ Quick win

Preserve the Azure RHEL AI GPU default unless callers opt in.

With default: "8", the condition on Line 238 is always true for omitted Tekton params, so this task now always passes --gpus 8. That bypasses the CLI/provider default path where unset/0 GPUs are normalized to 1, which can unexpectedly select larger marketplace SKUs and hit quota/cost limits.
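The effect of the default on the non-empty check can be illustrated with a small sketch. The function name `gpus_flag` is hypothetical; it mirrors the shape of the task's condition rather than reproducing the task script.

```shell
#!/bin/sh
# Mirrors the task's non-empty check: the flag is emitted only
# when the resolved param value is non-empty.
gpus_flag() {
    gpus="$1"
    if [ -n "$gpus" ]; then
        printf '%s\n' "--gpus $gpus"
    fi
}

# With default "8", an omitted param still resolves to "8",
# so the flag is always passed.
echo "default 8:     '$(gpus_flag 8)'"
# With default "", an omitted param resolves to empty, no flag is
# passed, and the CLI/provider default applies.
echo "default empty: '$(gpus_flag '')'"
```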

Proposed fix
     - name: gpus
       description: Number of GPUs for the cloud instance (valid marketplace values are 1, 2, 4, 8)
-      default: "8"
+      default: ""

As per path instructions, **: “Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.”

Also applies to: 238-240

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tkn/infra-azure-rhel-ai.yaml` around lines 88 - 90, The Tekton param default
for gpus is forcing the Azure RHEL AI task to always pass --gpus 8 when callers
omit the value, bypassing the provider’s unset/0 normalization path. Update the
gpus parameter in infra-azure-rhel-ai.yaml so it does not default to 8, and let
the existing CLI/provider handling in the task logic around the gpus condition
at the referenced block decide the effective GPU count unless a caller
explicitly opts in.

Source: Path instructions

Comment on lines +238 to +243
if [[ "$(params.gpus)" != "" ]]; then
  cmd+="--gpus '$(params.gpus)' "
fi
if [[ "$(params.gpu-manufacturer)" != "" ]]; then
  cmd+="--gpu-manufacturer '$(params.gpu-manufacturer)' "
fi

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Validate new params before adding them to the eval’d command.

These Tekton params are interpolated into cmd and executed via eval on Line 272. A value containing a single quote can break out of the intended argument and execute shell syntax.

Proposed hardening
           if [[ "$(params.gpus)" != "" ]]; then
-            cmd+="--gpus '$(params.gpus)' "
+            case "$(params.gpus)" in
+              *[!0-9]*) echo "Parameter gpus must be numeric"; exit 1 ;;
+            esac
+            cmd+="--gpus $(params.gpus) "
           fi
           if [[ "$(params.gpu-manufacturer)" != "" ]]; then
+            case "$(params.gpu-manufacturer)" in
+              *[!A-Za-z0-9._-]*) echo "Parameter gpu-manufacturer contains unsupported characters"; exit 1 ;;
+            esac
             cmd+="--gpu-manufacturer '$(params.gpu-manufacturer)' "
           fi

As per path instructions, **: “Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tkn/infra-azure-rhel-ai.yaml` around lines 238 - 243, The command-building
logic in the Tekton task is vulnerable because params like gpus and
gpu-manufacturer are interpolated into cmd and later executed with eval. Harden
the command assembly by validating or sanitizing these params before appending
them, or avoid eval entirely by passing arguments safely; update the logic in
the script section where cmd is built so single quotes or shell metacharacters
cannot break out of the intended argument.

Source: Path instructions

@are-ces are-ces force-pushed the rhelai-tool-calling branch from 2d7165f to 872698e Compare June 27, 2026 21:41
@adrianriobo

Collaborator

Hey @are-ces, I created a follow-up issue for auto-start, #847, with one specific bullet for extended args, which is basically what you are doing here, but for other content...

I was wondering whether it makes sense (as I had in mind) to just allow a flag at the mapt level named (e.g.) --vllm-extra-args, so you can pass them all together...

mapt aws rhel-ai \
     --auto-start \
     ....
     --vllm-extra-args '--tool-call-parser llama3_json --chat-template tool_chat_template_llama3.1_json.jinja  --max-model-len 16384'

The mapt flag may end up a bit long, but that will allow us to pass along basically any vLLM setup, so we can then use it as an OpenAI-compatible solution...

@are-ces

are-ces commented Jun 29, 2026

Contributor Author

Yes, I was wondering whether this would be the right approach... let me update the PR.

Enable vLLM tool calling by adding --tool-call-parser and --chat-template
flags. When --tool-call-parser is set, --enable-auto-tool-choice is
automatically added. Chat template filename is resolved to /opt/app-root/template/.
Updated Tekton task template with new params.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@are-ces are-ces force-pushed the rhelai-tool-calling branch from 872698e to 5cf9ea8 Compare June 29, 2026 07:52
@are-ces are-ces changed the title from "feat(aws): add --tool-call-parser and --chat-template flags for RHEL AI" to "feat(aws): add --vllm-extra-args flag and auto GPU detection for RHEL AI" on Jun 29, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/provider/aws/action/rhel-ai/rhelai.go`:
- Around line 378-380: The GPU probe in the script built by rhelai.go is
currently chained with && so a zero-GPU result aborts the rest of the setup and
prevents later install.conf changes and systemctl start rhaiis from running.
Update the logic around the GPU_COUNT check in the rhelai setup flow so that a
missing or zero GPU count is treated as non-fatal, only applying the
tensor-parallel-size sed replacement when GPUs are present and otherwise
continuing the script normally.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 8abc72b4-c336-4e6f-875f-e274eeb9d0e8

📥 Commits

Reviewing files that changed from the base of the PR and between 872698e and 5cf9ea8.

📒 Files selected for processing (7)
  • cmd/mapt/cmd/aws/hosts/rhelai.go
  • cmd/mapt/cmd/params/params.go
  • pkg/provider/aws/action/rhel-ai/rhelai.go
  • pkg/target/host/rhelai/api.go
  • tkn/infra-aws-rhel-ai.yaml
  • tkn/infra-azure-rhel-ai.yaml
  • tkn/template/infra-aws-rhel-ai.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • tkn/infra-azure-rhel-ai.yaml

Comment on lines +378 to +380
script += fmt.Sprintf(
	` && GPU_COUNT=$(nvidia-smi -L 2>/dev/null | wc -l) && [ "$GPU_COUNT" -gt 0 ] && sudo sed -i "s|--tensor-parallel-size 1|--tensor-parallel-size $GPU_COUNT|" %s/install.conf`,
	confDir)

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Do not let a zero-GPU probe abort the whole setup.

&& [ "$GPU_COUNT" -gt 0 ] && ... turns “no GPUs detected yet” into a hard failure, so the later install.conf updates and systemctl start rhaiis are skipped. Make this branch non-fatal.
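The difference between the chained test and a guarded `if` can be seen in a small standalone sketch. The function names are hypothetical, and the `echo` lines stand in for the real `sed` edit and `systemctl start rhaiis` steps.

```shell
#!/bin/sh
# Chained form: when the count is 0 the test fails,
# and every later step in the && chain is skipped with it.
chained() {
    count="$1"
    : 'earlier step' && [ "$count" -gt 0 ] && echo "set tensor-parallel-size to $count" && echo "start service"
}

# Guarded form: an if statement with a false condition and no else
# returns 0, so the chain continues to the later steps.
guarded() {
    count="$1"
    : 'earlier step' && if [ "$count" -gt 0 ]; then echo "set tensor-parallel-size to $count"; fi && echo "start service"
}

chained 0 || echo "chained: aborted before starting the service"
guarded 0
```

With a count of 0, `chained` prints nothing of its own and returns non-zero, while `guarded` skips only the tensor-parallel step and still reaches the service start.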

Suggested fix
-	script += fmt.Sprintf(
-		` && GPU_COUNT=$(nvidia-smi -L 2>/dev/null | wc -l) && [ "$GPU_COUNT" -gt 0 ] && sudo sed -i "s|--tensor-parallel-size 1|--tensor-parallel-size $GPU_COUNT|" %s/install.conf`,
-		confDir)
+	script += fmt.Sprintf(
+		` && GPU_COUNT=$(nvidia-smi -L 2>/dev/null | wc -l) && if [ "$GPU_COUNT" -gt 0 ]; then sudo sed -i "s|--tensor-parallel-size 1|--tensor-parallel-size $GPU_COUNT|" %s/install.conf; fi`,
+		confDir)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/provider/aws/action/rhel-ai/rhelai.go` around lines 378 - 380, The GPU
probe in the script built by rhelai.go is currently chained with && so a
zero-GPU result aborts the rest of the setup and prevents later install.conf
changes and systemctl start rhaiis from running. Update the logic around the
GPU_COUNT check in the rhelai setup flow so that a missing or zero GPU count is
treated as non-fatal, only applying the tensor-parallel-size sed replacement
when GPUs are present and otherwise continuing the script normally.
