
[FEAT] Model is always loaded in VRAM #32

@ecker00

Description

Is this a new feature request?

  • I have searched the existing issues

Wanted change

Save GPU VRAM when the model is not in use. VRAM is a valuable resource, so it should be possible to configure a keep_alive value. For example, Ollama configures it like this:

  • keep_alive=-1 keeps model in memory indefinitely
  • keep_alive=0 unloads model after each use
  • keep_alive=60 keeps the model in memory for 1 minute after use

This could be an environment variable, defaulting to -1 so it is not a breaking change for anyone.
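A minimal sketch of how such a keep_alive setting could work, assuming the three semantics listed above (-1 = keep indefinitely, 0 = unload after each use, N > 0 = unload N seconds after last use). All names here (KEEP_ALIVE, ModelManager, load_fn, unload_fn) are hypothetical, not from any existing codebase:

```python
import os
import threading

# Hypothetical sketch: unload the model after KEEP_ALIVE seconds of inactivity.
# -1 keeps the model loaded indefinitely (the current behavior, so the default),
# 0 unloads after each use, N > 0 unloads N seconds after the last use.
KEEP_ALIVE = int(os.environ.get("KEEP_ALIVE", "-1"))


class ModelManager:
    def __init__(self, load_fn, unload_fn, keep_alive=KEEP_ALIVE):
        self._load_fn = load_fn      # loads the model into VRAM, returns it
        self._unload_fn = unload_fn  # frees the model from VRAM
        self._keep_alive = keep_alive
        self._model = None
        self._timer = None
        self._lock = threading.Lock()

    def run(self, request):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # best-effort cancel of a pending unload
                self._timer = None
            if self._model is None:
                self._model = self._load_fn()  # lazy-load on first use
            result = self._model(request)
            if self._keep_alive == 0:
                self._unload()
            elif self._keep_alive > 0:
                self._timer = threading.Timer(self._keep_alive, self._unload_locked)
                self._timer.daemon = True
                self._timer.start()
            return result

    def _unload(self):
        # Caller must already hold self._lock.
        if self._model is not None:
            self._unload_fn(self._model)
            self._model = None

    def _unload_locked(self):
        with self._lock:
            self._unload()
```

With keep_alive=-1 the timer branch is never taken, so existing deployments see no change; only users who opt in to 0 or a positive value would see the model leave VRAM between requests.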

Reason for change

Right now the model is loaded into memory as soon as the container starts, and stays there even when not in use.

Proposed code change

No response

Metadata

Status: Done