[BETA] Create Deployment

POST /ai/deployment

Deploy a model on an inference server

application/json

Body Required

gpu-count integer(int64) Required

Number of GPUs (1-8)

Minimum value is 1.

inference-engine-version string

Inference engine version

Values are 0.12.0, 0.15.1, 0.16.0, or 0.17.0. Default value is 0.17.0.

name string Required

Deployment name

Minimum length is 1.

gpu-type string Required

GPU type family (e.g., gpua5000, gpu3080ti)

replicas integer(int64) Required

Number of replicas (>=1)

Minimum value is 1.

inference-engine-parameters array[string]

Optional extra command-line arguments passed to the inference engine server

model object Required
- name string
  
  Associated model name
  
  Minimum length is 1.
- id string(uuid)
  
  Associated model ID
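
The body constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: `build_deployment_body` is a hypothetical helper, and treating 8 as an upper bound for `gpu-count` is an assumption taken from the field description (the schema itself only states a minimum of 1).

```python
# Hypothetical helper: not part of any official SDK.
ALLOWED_ENGINE_VERSIONS = ("0.12.0", "0.15.1", "0.16.0", "0.17.0")
DEFAULT_ENGINE_VERSION = "0.17.0"


def build_deployment_body(name, gpu_type, gpu_count, replicas, model,
                          engine_version=DEFAULT_ENGINE_VERSION,
                          engine_parameters=None):
    """Return a dict shaped like the documented request body, or raise ValueError."""
    if len(name) < 1:
        raise ValueError("name must be at least 1 character long")
    if not gpu_type:
        raise ValueError("gpu-type is required")
    # Assumption: the upper bound of 8 comes from the field description;
    # the schema itself only states a minimum of 1.
    if not 1 <= gpu_count <= 8:
        raise ValueError("gpu-count must be between 1 and 8")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if engine_version not in ALLOWED_ENGINE_VERSIONS:
        raise ValueError("unsupported inference-engine-version: " + engine_version)
    if not isinstance(model, dict):
        raise ValueError("model must be an object with a name and/or an id")

    body = {
        "name": name,
        "gpu-type": gpu_type,
        "gpu-count": gpu_count,
        "replicas": replicas,
        "inference-engine-version": engine_version,
        "model": model,
    }
    # The optional array is only included when extra arguments are given.
    if engine_parameters:
        body["inference-engine-parameters"] = list(engine_parameters)
    return body
```

For example, `build_deployment_body("demo", "gpua5000", 1, 1, {"name": "my-model"})` yields a body that uses the default engine version and omits `inference-engine-parameters`. The names `demo` and `my-model` are placeholders.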

Responses

412 application/json

- type string(uri-reference) Required
- title string Required
- status integer Required
  
  Minimum value is 100, maximum value is 599.
- detail string Required
- instance string(uri-reference)
- errors array[object]
  - path string
  - detail string
  - pointer string
  - location string
200 application/json

- id string(uuid)
  
  Operation ID
- reason string
  
  Operation failure reason
  
  Values are incorrect, unknown, unavailable, forbidden, busy, fault, partial, not-found, interrupted, unsupported, or conflict.
- reference object
  
  Related resource reference
  
  - id string(uuid)
    
    Reference ID
  - link string
    
    Link to the referenced resource
  - command string
    
    Command name
- message string
  
  Operation message
- state string
  
  Operation status
  
  Values are failure, pending, success, or timeout.
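
Because the 200 schema allows `state` to be `failure` or `timeout`, a client probably should not treat an HTTP 200 as success on its own. The sketch below shows one way to turn the operation object into a short status line. `summarize_operation` is a hypothetical helper, and reading `pending` as "not finished yet" is an assumption based on the value names only.

```python
# Hypothetical helper: interprets the documented operation object.
KNOWN_STATES = ("failure", "pending", "success", "timeout")


def summarize_operation(op):
    """Return a one-line summary of an operation dict."""
    state = op.get("state")
    if state not in KNOWN_STATES:
        raise ValueError("unexpected state: %r" % (state,))
    if state == "success":
        return "succeeded"
    if state == "pending":
        # Assumption: the operation has been accepted but has not finished.
        return "still pending"
    # failure or timeout: include the reason and message when present
    text = "%s (%s)" % (state, op.get("reason", "unknown"))
    message = op.get("message")
    if message:
        text += ": " + message
    return text
```

With the made-up input `{"state": "failure", "reason": "busy", "message": "no capacity"}` this returns `failure (busy): no capacity`.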
400 application/json

- type string(uri-reference) Required
- title string Required
- status integer Required
  
  Minimum value is 100, maximum value is 599.
- detail string Required
- instance string(uri-reference)
- errors array[object]
  - path string
  - detail string
  - pointer string
  - location string

POST /ai/deployment

curl \
 --request POST 'https://api-ch-gva-2.exoscale.com/v2/ai/deployment' \
 --header "Content-Type: application/json" \
 --data '{"gpu-count":1,"inference-engine-version":"0.17.0","name":"string","gpu-type":"string","replicas":42,"inference-engine-parameters":["string"],"model":{"name":"string","id":"string"}}'

Request examples

{
  "gpu-count": 1,
  "inference-engine-version": "0.17.0",
  "name": "string",
  "gpu-type": "string",
  "replicas": 42,
  "inference-engine-parameters": [
    "string"
  ],
  "model": {
    "name": "string",
    "id": "string"
  }
}
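
The same request could be prepared from Python with only the standard library. This is a sketch under stated assumptions: the field values (`demo`, `gpua5000`, `my-model`) are placeholders, `make_request` is a hypothetical helper, and authentication is left out because the curl example above does not show any.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api-ch-gva-2.exoscale.com/v2/ai/deployment"


def make_request(body):
    """Build (but do not send) a POST request carrying a JSON body."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Placeholder values chosen to satisfy the documented constraints.
req = make_request({
    "name": "demo",
    "gpu-type": "gpua5000",
    "gpu-count": 1,
    "replicas": 1,
    "model": {"name": "my-model"},
})

# Sending is left to the caller, for example:
#   with urllib.request.urlopen(req) as resp:
#       operation = json.load(resp)
```

Building the request separately from sending it keeps the payload easy to inspect before any call is made.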

Response examples (412)

{
  "type": "string",
  "title": "string",
  "status": 412,
  "detail": "string",
  "instance": "string",
  "errors": [
    {
      "path": "string",
      "detail": "string",
      "pointer": "string",
      "location": "string"
    }
  ]
}

Response examples (200)

{
  "id": "string",
  "reason": "incorrect",
  "reference": {
    "id": "string",
    "link": "string",
    "command": "string"
  },
  "message": "string",
  "state": "failure"
}

Response examples (400)

{
  "type": "string",
  "title": "string",
  "status": 400,
  "detail": "string",
  "instance": "string",
  "errors": [
    {
      "path": "string",
      "detail": "string",
      "pointer": "string",
      "location": "string"
    }
  ]
}
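
The 412 and 400 bodies share one shape, so a single function can render both for logs or user-facing messages. The following is a minimal sketch: `format_problem` is a hypothetical helper, and preferring `pointer` over `path` and `location` when labelling an entry is an arbitrary choice, since the reference does not describe how those fields relate.

```python
# Hypothetical helper: renders the documented error body as text.
def format_problem(problem):
    """Return a multi-line, human-readable summary of an error response."""
    # type, title, status and detail are marked Required in the schema.
    lines = ["%s %s: %s" % (problem["status"], problem["title"], problem["detail"])]
    for err in problem.get("errors") or []:
        # Assumption: use whichever locator field is present, pointer first.
        where = err.get("pointer") or err.get("path") or err.get("location") or "?"
        lines.append("  %s: %s" % (where, err.get("detail", "")))
    return "\n".join(lines)
```

A call with a made-up 400 body that contains one entry, `{"pointer": "/gpu-count", "detail": "must be >= 1"}`, produces a header line followed by an indented line for that entry.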