
Deployments API

This reference documents the Deployments endpoints in the PaiTIENT Secure Model Service REST API, allowing you to manage model deployments programmatically.

Create Deployment

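Creates a new model deployment. The call returns before provisioning finishes: the response reports a status of "creating", and Get Deployment reports "running" once the deployment is ready.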

Request

http
POST /v1/deployments
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID
json
{
  "model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
  "deployment_name": "clinical-assistant",
  "compute_type": "gpu",
  "instance_type": "g4dn.xlarge",
  "min_replicas": 1,
  "max_replicas": 3,
  "auto_scaling": true,
  "region": "us-east-1",
  "vpc_config": {
    "subnet_ids": ["subnet-abc123", "subnet-def456"],
    "security_group_ids": ["sg-123456"]
  },
  "tags": {
    "department": "clinical-research",
    "project": "diabetes-assistant",
    "environment": "production"
  },
  "model_config": {
    "context_length": 4096,
    "max_output_tokens": 1024,
    "default_temperature": 0.7,
    "default_top_p": 0.95
  },
  "security_settings": {
    "network_isolation": true,
    "private_endpoints": true,
    "encryption_level": "maximum",
    "audit_logging": true,
    "compliance_mode": "hipaa"
  }
}

Response

json
{
  "id": "dep_12345abcde",
  "model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
  "deployment_name": "clinical-assistant",
  "status": "creating",
  "endpoint": "https://api.paitient.com/v1/deployments/dep_12345abcde/generate",
  "created_at": "2023-11-01T12:34:56Z",
  "updated_at": "2023-11-01T12:34:56Z",
  "compute_type": "gpu",
  "instance_type": "g4dn.xlarge",
  "min_replicas": 1,
  "max_replicas": 3,
  "auto_scaling": true,
  "region": "us-east-1",
  "tags": {
    "department": "clinical-research",
    "project": "diabetes-assistant",
    "environment": "production"
  }
}
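For reference, the request above can be assembled with Python's standard library. This is a minimal sketch, not an official client. The host https://api.paitient.com is inferred from the endpoint URLs in the example responses, and the function name build_create_deployment_request is hypothetical. The request is built but not sent, so it can be inspected first.

```python
import json
import urllib.request

# Assumption: API host inferred from the endpoint URLs in the example responses.
BASE_URL = "https://api.paitient.com"


def build_create_deployment_request(api_key: str, client_id: str,
                                    body: dict) -> urllib.request.Request:
    """Build, but do not send, a POST /v1/deployments request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/deployments",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
            "X-Client-ID": client_id,
        },
    )


if __name__ == "__main__":
    body = {
        "model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
        "deployment_name": "clinical-assistant",
        "compute_type": "gpu",
        "instance_type": "g4dn.xlarge",
        "min_replicas": 1,
        "max_replicas": 3,
        "auto_scaling": True,
        "region": "us-east-1",
    }
    req = build_create_deployment_request("YOUR_API_KEY", "YOUR_CLIENT_ID", body)
    print(req.get_method(), req.full_url)
    # To send it:
    # with urllib.request.urlopen(req) as resp:
    #     deployment = json.load(resp)  # deployment["status"] starts as "creating"
```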

Get Deployment

Retrieves details about a specific deployment.

Request

http
GET /v1/deployments/{deployment_id}
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID

Response

json
{
  "id": "dep_12345abcde",
  "model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
  "deployment_name": "clinical-assistant",
  "status": "running",
  "endpoint": "https://api.paitient.com/v1/deployments/dep_12345abcde/generate",
  "created_at": "2023-11-01T12:34:56Z",
  "updated_at": "2023-11-01T12:35:30Z",
  "compute_type": "gpu",
  "instance_type": "g4dn.xlarge",
  "min_replicas": 1,
  "max_replicas": 3,
  "current_replicas": 1,
  "auto_scaling": true,
  "region": "us-east-1",
  "tags": {
    "department": "clinical-research",
    "project": "diabetes-assistant",
    "environment": "production"
  },
  "model_config": {
    "context_length": 4096,
    "max_output_tokens": 1024,
    "default_temperature": 0.7,
    "default_top_p": 0.95
  },
  "security_settings": {
    "network_isolation": true,
    "private_endpoints": true,
    "encryption_level": "maximum",
    "audit_logging": true,
    "compliance_mode": "hipaa"
  }
}
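A new deployment starts in "creating", so clients usually poll this endpoint until it reports "running". The sketch below assumes you pass in your own get_deployment callable that performs the GET request and returns the parsed JSON. That callable and the poll interval and timeout defaults are illustrative assumptions. "failed", one of the statuses listed under List Deployments, is treated as terminal.

```python
import time
from typing import Callable, Iterable


def wait_for_status(get_deployment: Callable[[str], dict], deployment_id: str,
                    target: str = "running",
                    failure_states: Iterable[str] = ("failed",),
                    poll_seconds: float = 5.0, timeout_seconds: float = 900.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the deployment's status equals `target`.

    get_deployment is a hypothetical callable that performs
    GET /v1/deployments/{deployment_id} and returns the parsed JSON body.
    """
    failure_states = set(failure_states)
    waited = 0.0  # counts time spent sleeping, not request latency
    while True:
        deployment = get_deployment(deployment_id)
        status = deployment["status"]
        if status == target:
            return deployment
        if status in failure_states:
            raise RuntimeError(f"{deployment_id} entered status {status!r}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"{deployment_id} still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Canned responses stand in for real HTTP calls.
    responses = iter([{"status": "creating"},
                      {"status": "running", "id": "dep_12345abcde"}])
    ready = wait_for_status(lambda _id: next(responses), "dep_12345abcde",
                            sleep=lambda s: None)
    print(ready["id"])  # dep_12345abcde
```

Injecting sleep keeps the loop testable without real delays.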

List Deployments

Retrieves a list of all deployments.

Request

http
GET /v1/deployments
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID

Query Parameters

| Parameter | Type | Description |
|---|---|---|
| limit | integer | Maximum number of deployments to return (default: 20, max: 100) |
| offset | integer | Offset for pagination (default: 0) |
| status | string | Filter by status (e.g., "creating", "running", "failed", "deleted") |
| model_name | string | Filter by model name |
| region | string | Filter by region |
| tags | object | Filter by tags (e.g., tags[environment]=production) |

Response

json
{
  "deployments": [
    {
      "id": "dep_12345abcde",
      "model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
      "deployment_name": "clinical-assistant",
      "status": "running",
      "endpoint": "https://api.paitient.com/v1/deployments/dep_12345abcde/generate",
      "created_at": "2023-11-01T12:34:56Z",
      "updated_at": "2023-11-01T12:35:30Z",
      "compute_type": "gpu",
      "instance_type": "g4dn.xlarge",
      "tags": {
        "environment": "production"
      }
    },
    {
      "id": "dep_67890fghij",
      "model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
      "deployment_name": "research-assistant",
      "status": "running",
      "endpoint": "https://api.paitient.com/v1/deployments/dep_67890fghij/generate",
      "created_at": "2023-10-15T09:12:34Z",
      "updated_at": "2023-10-15T09:15:21Z",
      "compute_type": "gpu",
      "instance_type": "g4dn.xlarge",
      "tags": {
        "environment": "development"
      }
    }
  ],
  "pagination": {
    "total": 7,
    "limit": 2,
    "offset": 0,
    "next_offset": 2
  }
}
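The pagination object returns next_offset alongside total, so a client can walk all pages by passing next_offset back as offset. This is a minimal sketch that assumes next_offset is omitted or null on the last page, because the example shows only a middle page. fetch_page is a hypothetical callable you supply that performs GET /v1/deployments with the given limit and offset. build_list_query shows how the tags[key]=value filter form is encoded by urlencode, which percent-encodes the brackets.

```python
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import urlencode


def build_list_query(limit: int = 20, offset: int = 0,
                     status: Optional[str] = None,
                     tags: Optional[Dict[str, str]] = None) -> str:
    """Encode List Deployments query parameters, using tags[key]=value for tags."""
    params = {"limit": limit, "offset": offset}
    if status is not None:
        params["status"] = status
    for key, value in (tags or {}).items():
        params[f"tags[{key}]"] = value
    return urlencode(params)


def iter_deployments(fetch_page: Callable[[int, int], dict],
                     limit: int = 100) -> Iterator[dict]:
    """Yield every deployment by following pagination.next_offset.

    fetch_page(limit, offset) is a hypothetical callable that performs
    GET /v1/deployments and returns the parsed JSON body.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page["deployments"]
        pagination = page.get("pagination") or {}
        next_offset = pagination.get("next_offset")
        # Assumption: next_offset is absent/null on the last page. Also stop if it
        # fails to advance or reaches total, to avoid looping forever.
        if next_offset is None or next_offset <= offset:
            return
        if "total" in pagination and next_offset >= pagination["total"]:
            return
        offset = next_offset


if __name__ == "__main__":
    print(build_list_query(status="running", tags={"environment": "production"}))
    # limit=20&offset=0&status=running&tags%5Benvironment%5D=production
```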

Update Deployment

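Updates an existing deployment. Only the fields included in the request body change. In the example below, nested objects such as tags and model_config are merged with their existing values rather than replaced, and the response reports a status of "updating".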

Request

http
PATCH /v1/deployments/{deployment_id}
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID
json
{
  "min_replicas": 2,
  "max_replicas": 5,
  "tags": {
    "environment": "production",
    "version": "2.0"
  },
  "model_config": {
    "default_temperature": 0.8
  }
}

Response

json
{
  "id": "dep_12345abcde",
  "model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
  "deployment_name": "clinical-assistant",
  "status": "updating",
  "endpoint": "https://api.paitient.com/v1/deployments/dep_12345abcde/generate",
  "created_at": "2023-11-01T12:34:56Z",
  "updated_at": "2023-11-01T14:23:45Z",
  "compute_type": "gpu",
  "instance_type": "g4dn.xlarge",
  "min_replicas": 2,
  "max_replicas": 5,
  "auto_scaling": true,
  "tags": {
    "department": "clinical-research",
    "project": "diabetes-assistant",
    "environment": "production",
    "version": "2.0"
  },
  "model_config": {
    "context_length": 4096,
    "max_output_tokens": 1024,
    "default_temperature": 0.8,
    "default_top_p": 0.95
  }
}
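In the request and response above, fields omitted from the PATCH body keep their values, and nested objects are merged key by key. For example, tags keeps department and project while gaining version, and model_config changes only default_temperature. The sketch below reproduces that behaviour locally, for example to preview the resulting configuration. It is an inference from this single example, not a documented contract. The reference does not say whether null removes a key, as it does in JSON Merge Patch (RFC 7396), so this sketch treats null as an ordinary value. The function name merge_patch is hypothetical.

```python
import copy


def merge_patch(current: dict, patch: dict) -> dict:
    """Return a copy of `current` with `patch` applied, merging nested objects.

    Models the behaviour visible in the Update Deployment example; an
    inference, not the service's specification. None is stored as a value.
    """
    result = copy.deepcopy(current)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_patch(result[key], value)  # merge nested object
        else:
            result[key] = copy.deepcopy(value)  # scalars and lists replace
    return result
```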

Delete Deployment

Deletes a deployment. The response reports a status of "deleting" rather than confirming that deletion has finished.

Request

http
DELETE /v1/deployments/{deployment_id}
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID

Query Parameters

| Parameter | Type | Description |
|---|---|---|
| force | boolean | Force deletion without confirmation (default: false) |

Response

json
{
  "id": "dep_12345abcde",
  "status": "deleting"
}

Get Deployment Metrics

Retrieves performance metrics for a deployment.

Request

http
GET /v1/deployments/{deployment_id}/metrics
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID

Query Parameters

| Parameter | Type | Description |
|---|---|---|
| start_time | string | Start time in ISO 8601 format |
| end_time | string | End time in ISO 8601 format |
| metrics | array | List of metrics to retrieve (e.g., "latency", "throughput", "error_rate", "token_usage") |
| interval | string | Time interval for data points (e.g., "1m", "5m", "1h", "1d") |

Response

json
{
  "deployment_id": "dep_12345abcde",
  "start_time": "2023-11-01T00:00:00Z",
  "end_time": "2023-11-05T23:59:59Z",
  "interval": "1d",
  "metrics": {
    "latency": {
      "average": [120, 115, 118, 125, 122],
      "p50": [110, 105, 108, 112, 109],
      "p95": [180, 175, 178, 185, 182],
      "p99": [220, 215, 218, 225, 222],
      "timestamps": [
        "2023-11-01T00:00:00Z",
        "2023-11-02T00:00:00Z",
        "2023-11-03T00:00:00Z",
        "2023-11-04T00:00:00Z",
        "2023-11-05T00:00:00Z"
      ]
    },
    "throughput": {
      "requests_per_second": [12.5, 13.2, 12.8, 14.1, 13.5],
      "timestamps": [
        "2023-11-01T00:00:00Z",
        "2023-11-02T00:00:00Z",
        "2023-11-03T00:00:00Z",
        "2023-11-04T00:00:00Z",
        "2023-11-05T00:00:00Z"
      ]
    },
    "error_rate": {
      "percentage": [0.2, 0.1, 0.15, 0.25, 0.18],
      "timestamps": [
        "2023-11-01T00:00:00Z",
        "2023-11-02T00:00:00Z",
        "2023-11-03T00:00:00Z",
        "2023-11-04T00:00:00Z",
        "2023-11-05T00:00:00Z"
      ]
    },
    "token_usage": {
      "total": [250000, 263000, 258000, 271000, 265000],
      "input": [120000, 125000, 122000, 128000, 124000],
      "output": [130000, 138000, 136000, 143000, 141000],
      "timestamps": [
        "2023-11-01T00:00:00Z",
        "2023-11-02T00:00:00Z",
        "2023-11-03T00:00:00Z",
        "2023-11-04T00:00:00Z",
        "2023-11-05T00:00:00Z"
      ]
    }
  },
  "summary": {
    "latency": {
      "average": 120,
      "p50": 109,
      "p95": 180,
      "p99": 222
    },
    "throughput": {
      "average_requests_per_second": 13.2
    },
    "error_rate": {
      "average_percentage": 0.18
    },
    "token_usage": {
      "total": 1307000,
      "input": 619000,
      "output": 688000
    }
  }
}
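The summary block can be cross-checked against the per-interval series. The sketch below recomputes the simple means and token totals. On the example response it reproduces 120, 13.2, 0.18, and the token totals 1,307,000 / 619,000 / 688,000. There are two caveats. First, a simple mean of per-interval averages equals the true overall average only when every interval carries the same number of requests. Second, percentiles such as p95 cannot be recovered by averaging per-interval percentiles, so the sketch does not attempt them. This reference also does not state latency units. The function name summarize is hypothetical.

```python
from statistics import mean


def summarize(metrics: dict) -> dict:
    """Recompute simple summary figures from the per-interval series."""
    out = {}
    if "latency" in metrics:
        # Unweighted mean: exact only if intervals have equal request counts.
        out["latency_average"] = round(mean(metrics["latency"]["average"]), 1)
    if "throughput" in metrics:
        out["average_requests_per_second"] = round(
            mean(metrics["throughput"]["requests_per_second"]), 1)
    if "error_rate" in metrics:
        out["average_error_percentage"] = round(
            mean(metrics["error_rate"]["percentage"]), 2)
    if "token_usage" in metrics:
        usage = metrics["token_usage"]
        # Token counts are additive, so summing each series gives exact totals.
        out["token_usage"] = {k: sum(usage[k]) for k in ("total", "input", "output")}
    return out
```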

Get Deployment Logs

Retrieves logs for a deployment.

Request

http
GET /v1/deployments/{deployment_id}/logs
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID

Query Parameters

| Parameter | Type | Description |
|---|---|---|
| start_time | string | Start time in ISO 8601 format |
| end_time | string | End time in ISO 8601 format |
| limit | integer | Maximum number of logs to return (default: 100, max: 1000) |
| filter | string | Filter expression (e.g., "level=error") |
| order | string | Order of logs ("asc" or "desc" by timestamp, default: "desc") |

Response

json
{
  "logs": [
    {
      "timestamp": "2023-11-01T12:34:56Z",
      "level": "info",
      "message": "Deployment dep_12345abcde is now running",
      "component": "deployment-service",
      "request_id": "req_abcdefg12345"
    },
    {
      "timestamp": "2023-11-01T12:34:55Z",
      "level": "info",
      "message": "Replica 1/1 is now ready",
      "component": "deployment-service",
      "request_id": "req_abcdefg12345"
    },
    {
      "timestamp": "2023-11-01T12:34:50Z",
      "level": "info",
      "message": "Loading model ZimaBlueAI/HuatuoGPT-o1-8B",
      "component": "model-service",
      "request_id": "req_abcdefg12345"
    }
  ],
  "pagination": {
    "next_token": "eyJsYXN0X2V2YWx1YXRlZF9rZXkiOnsidGltZXN0YW1wIjoiMjAyMy0xMS0wMVQxMjozNDo1MFoifX0="
  }
}
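Log results are paginated with an opaque next_token rather than an offset. The query parameter for sending the token back is not listed in the table above, so the sketch leaves that to a hypothetical fetch_page callable you supply and shows only the loop: keep requesting while a next_token is returned, and treat the token as opaque.

```python
from typing import Callable, Iterator, Optional


def iter_logs(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield log entries from every page until no next_token is returned.

    fetch_page(token) is a hypothetical callable that performs
    GET /v1/deployments/{deployment_id}/logs, forwarding `token` when it is
    not None (the parameter name for that is an assumption left to the caller).
    """
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("logs", [])
        token = (page.get("pagination") or {}).get("next_token")
        if not token:  # absent, null or empty: no more pages
            return
```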

Create Endpoint

Creates a new endpoint that routes to a specific deployment.

Request

http
POST /v1/endpoints
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID
json
{
  "name": "clinical-assistant-production",
  "deployment_id": "dep_12345abcde"
}

Response

json
{
  "id": "ep_12345abcde",
  "name": "clinical-assistant-production",
  "deployment_id": "dep_12345abcde",
  "url": "https://api.paitient.com/v1/endpoints/ep_12345abcde/generate",
  "created_at": "2023-11-01T12:40:00Z",
  "updated_at": "2023-11-01T12:40:00Z",
  "status": "active"
}

Create Canary Endpoint

Creates a new canary endpoint that splits traffic between multiple deployments according to each deployment's traffic_percentage.

Request

http
POST /v1/canary-endpoints
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID
json
{
  "name": "clinical-assistant-canary",
  "deployments": [
    {
      "id": "dep_12345abcde",
      "traffic_percentage": 90
    },
    {
      "id": "dep_67890fghij",
      "traffic_percentage": 10
    }
  ]
}

Response

json
{
  "id": "cep_12345abcde",
  "name": "clinical-assistant-canary",
  "deployments": [
    {
      "id": "dep_12345abcde",
      "traffic_percentage": 90
    },
    {
      "id": "dep_67890fghij",
      "traffic_percentage": 10
    }
  ],
  "url": "https://api.paitient.com/v1/endpoints/cep_12345abcde/generate",
  "created_at": "2023-11-01T13:00:00Z",
  "updated_at": "2023-11-01T13:00:00Z",
  "status": "active"
}
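In both canary examples the traffic_percentage values add up to 100. If that is a requirement (this reference does not state the server's validation rule), a client can check a split before sending it. The helper name build_canary_body and the integer-only check are assumptions of this sketch.

```python
from typing import Dict


def build_canary_body(name: str, split: Dict[str, int]) -> dict:
    """Build a Create Canary Endpoint body from {deployment_id: traffic_percentage}.

    Assumption: percentages are integers that must sum to 100, as in the examples.
    """
    if len(split) < 2:
        raise ValueError("a canary endpoint needs at least two deployments")
    for dep_id, pct in split.items():
        if not isinstance(pct, int) or isinstance(pct, bool) or not 0 <= pct <= 100:
            raise ValueError(f"{dep_id}: traffic_percentage must be an integer from 0 to 100")
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"traffic percentages sum to {total}, expected 100")
    return {
        "name": name,
        "deployments": [{"id": d, "traffic_percentage": p} for d, p in split.items()],
    }
```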

Update Canary Endpoint

Updates traffic distribution for a canary endpoint.

Request

http
PATCH /v1/canary-endpoints/{endpoint_id}
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID
json
{
  "deployments": [
    {
      "id": "dep_12345abcde",
      "traffic_percentage": 50
    },
    {
      "id": "dep_67890fghij",
      "traffic_percentage": 50
    }
  ]
}

Response

json
{
  "id": "cep_12345abcde",
  "name": "clinical-assistant-canary",
  "deployments": [
    {
      "id": "dep_12345abcde",
      "traffic_percentage": 50
    },
    {
      "id": "dep_67890fghij",
      "traffic_percentage": 50
    }
  ],
  "url": "https://api.paitient.com/v1/endpoints/cep_12345abcde/generate",
  "created_at": "2023-11-01T13:00:00Z",
  "updated_at": "2023-11-01T14:30:00Z",
  "status": "active"
}
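A common use of this call is a staged rollout that shifts traffic to the new deployment in steps, with a check of Get Deployment Metrics (for example error_rate and latency) between steps. The sketch below only generates the PATCH bodies. The stage values are illustrative, and stages of 0 or 100 are rejected because this reference does not say whether a traffic_percentage of 0 is accepted. The function name rollout_steps is hypothetical.

```python
from typing import List, Sequence


def rollout_steps(stable_id: str, canary_id: str,
                  stages: Sequence[int] = (10, 25, 50)) -> List[dict]:
    """Return one Update Canary Endpoint body per stage.

    Each stage is the canary's share; the stable deployment gets the remainder.
    """
    bodies = []
    previous = 0
    for canary_pct in stages:
        if not previous < canary_pct < 100:
            raise ValueError(f"stages must increase strictly and stay below 100, got {canary_pct}")
        bodies.append({
            "deployments": [
                {"id": stable_id, "traffic_percentage": 100 - canary_pct},
                {"id": canary_id, "traffic_percentage": canary_pct},
            ]
        })
        previous = canary_pct
    return bodies
```

With stages=(10, 50), the second body matches the request shown above.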

Error Codes

| Code | Description |
|---|---|
| invalid_request_error | The request was malformed or missing required parameters |
| authentication_error | API key or client ID is invalid or missing |
| permission_denied | The API key doesn't have permission to perform the operation |
| resource_not_found | The requested resource (deployment, endpoint) doesn't exist |
| quota_exceeded | Account quota for deployments has been exceeded |
| rate_limit_exceeded | Too many requests in a given time period |
| deployment_error | Error occurred during deployment creation or update |
| instance_type_unavailable | The requested instance type is not available in the selected region |
| model_not_found | The specified model doesn't exist |
| validation_error | One or more parameters failed validation |
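Of the codes above, rate_limit_exceeded is the one that describes a condition expected to clear on its own. The others indicate a request that will likely fail again if resent unchanged. The sketch below encodes that judgement, which is an assumption and not a documented retry policy, and computes exponential backoff delays with full jitter. This reference does not specify where the error code appears in a response body, so the sketch takes the code string directly.

```python
import random
from typing import List, Optional

# Assumption (not a documented policy): only rate limiting is treated as transient.
RETRYABLE_CODES = frozenset({"rate_limit_exceeded"})


def is_retryable(code: str) -> bool:
    return code in RETRYABLE_CODES


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Delays for successive retries: exponential growth with full jitter.

    Retry n waits a random time in [0, min(cap, base * 2**n)] seconds.
    """
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]
```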

Webhook Notifications

Subscribe to webhook notifications to receive real-time updates about your deployments.

Deployment Events

| Event | Description |
|---|---|
| deployment.created | A new deployment has been created |
| deployment.updated | A deployment has been updated |
| deployment.deleted | A deployment has been deleted |
| deployment.status_changed | A deployment's status has changed |
| deployment.health_changed | A deployment's health status has changed |
| deployment.scaling_changed | A deployment's scaling configuration has changed |
| deployment.error | An error occurred with a deployment |
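A receiver usually routes each notification by its event type. This reference does not document the payload format or any signing scheme, so the sketch assumes a JSON body with a top-level "event" field holding one of the names above. WebhookDispatcher is a hypothetical helper. If the service signs deliveries, verify the signature before dispatching; that step is omitted here.

```python
import json
from typing import Callable, Dict

# Event names from the table above.
DEPLOYMENT_EVENTS = frozenset({
    "deployment.created", "deployment.updated", "deployment.deleted",
    "deployment.status_changed", "deployment.health_changed",
    "deployment.scaling_changed", "deployment.error",
})


class WebhookDispatcher:
    """Route webhook payloads to handlers by event name (hypothetical helper)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], None]] = {}

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        if event not in DEPLOYMENT_EVENTS:
            raise ValueError(f"unknown event type: {event}")
        self._handlers[event] = handler

    def dispatch(self, raw_body: bytes) -> bool:
        """Parse a delivery and call its handler; return False if none is registered.

        Assumption: the body is JSON with a top-level "event" field.
        """
        payload = json.loads(raw_body)
        handler = self._handlers.get(payload.get("event"))
        if handler is None:
            return False
        handler(payload)
        return True
```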

Webhook Configuration

Configure webhooks through the PaiTIENT dashboard or API.

Released under the MIT License.