Deployments API
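This reference documents the Deployments endpoints of the PaiTIENT Secure Model Service REST API. Use them to create, inspect, update, and delete model deployments, retrieve their metrics and logs, and route traffic to them through endpoints and canary endpoints. Every request must include your API key in the Authorization: Bearer header and your client ID in the X-Client-ID header.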
Create Deployment
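Creates a new model deployment. Provisioning is asynchronous: the response returns immediately with status "creating", and the deployment reports "running" once its replicas are ready.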
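The Create Deployment request can be assembled in Python with only the standard library. This is a minimal sketch, not an official client: the base URL https://api.paitient.com is inferred from the endpoint URLs in the example responses, build_create_deployment_request is a hypothetical helper, and its replica check is a client-side convenience rather than documented server validation.

```python
import json
import urllib.request

# Assumption: base URL inferred from the endpoint URLs in the example responses.
BASE_URL = "https://api.paitient.com"


def build_create_deployment_request(
    api_key: str, client_id: str, config: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/deployments request.

    Hypothetical helper. The replica check is a client-side sanity check,
    not documented server-side validation.
    """
    lo, hi = config.get("min_replicas"), config.get("max_replicas")
    if lo is not None and hi is not None and lo > hi:
        raise ValueError("min_replicas must not exceed max_replicas")
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/deployments",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
            "X-Client-ID": client_id,
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it; the body of a successful response is JSON like the example below.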
Request
http
POST /v1/deployments
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID
json
{
"model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
"deployment_name": "clinical-assistant",
"compute_type": "gpu",
"instance_type": "g4dn.xlarge",
"min_replicas": 1,
"max_replicas": 3,
"auto_scaling": true,
"region": "us-east-1",
"vpc_config": {
"subnet_ids": ["subnet-abc123", "subnet-def456"],
"security_group_ids": ["sg-123456"]
},
"tags": {
"department": "clinical-research",
"project": "diabetes-assistant",
"environment": "production"
},
"model_config": {
"context_length": 4096,
"max_output_tokens": 1024,
"default_temperature": 0.7,
"default_top_p": 0.95
},
"security_settings": {
"network_isolation": true,
"private_endpoints": true,
"encryption_level": "maximum",
"audit_logging": true,
"compliance_mode": "hipaa"
}
}
Response
json
{
"id": "dep_12345abcde",
"model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
"deployment_name": "clinical-assistant",
"status": "creating",
"endpoint": "https://api.paitient.com/v1/deployments/dep_12345abcde/generate",
"created_at": "2023-11-01T12:34:56Z",
"updated_at": "2023-11-01T12:34:56Z",
"compute_type": "gpu",
"instance_type": "g4dn.xlarge",
"min_replicas": 1,
"max_replicas": 3,
"auto_scaling": true,
"region": "us-east-1",
"tags": {
"department": "clinical-research",
"project": "diabetes-assistant",
"environment": "production"
}
}
Get Deployment
Retrieves details about a specific deployment.
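Because provisioning is asynchronous, a client typically polls this endpoint until the deployment leaves "creating". A minimal sketch, assuming "running" is the success state and "failed" or "deleted" are terminal (statuses taken from the List Deployments filter values). wait_until_running and its timeout defaults are hypothetical, and fetch_status stands for whatever code issues the GET request and returns the status field.

```python
import time
from typing import Callable


def wait_until_running(
    fetch_status: Callable[[], str],
    timeout_s: float = 900.0,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the deployment reports "running".

    Raises RuntimeError on a terminal failure status and TimeoutError if
    the deployment is still transitioning after timeout_s seconds.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status == "running":
            return status
        if status in ("failed", "deleted"):
            # Assumption: these states never transition to "running".
            raise RuntimeError(f"deployment is {status!r}; check its logs")
        if waited >= timeout_s:
            raise TimeoutError(f"still {status!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Injecting sleep keeps the function testable without real waiting; in production fetch_status would parse the status field from the JSON body of GET /v1/deployments/{deployment_id}.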
Request
http
GET /v1/deployments/{deployment_id}
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID
Response
json
{
"id": "dep_12345abcde",
"model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
"deployment_name": "clinical-assistant",
"status": "running",
"endpoint": "https://api.paitient.com/v1/deployments/dep_12345abcde/generate",
"created_at": "2023-11-01T12:34:56Z",
"updated_at": "2023-11-01T12:35:30Z",
"compute_type": "gpu",
"instance_type": "g4dn.xlarge",
"min_replicas": 1,
"max_replicas": 3,
"current_replicas": 1,
"auto_scaling": true,
"region": "us-east-1",
"tags": {
"department": "clinical-research",
"project": "diabetes-assistant",
"environment": "production"
},
"model_config": {
"context_length": 4096,
"max_output_tokens": 1024,
"default_temperature": 0.7,
"default_top_p": 0.95
},
"security_settings": {
"network_isolation": true,
"private_endpoints": true,
"encryption_level": "maximum",
"audit_logging": true,
"compliance_mode": "hipaa"
}
}
List Deployments
Retrieves a paginated list of deployments, optionally filtered by status, model name, region, or tags.
Request
http
GET /v1/deployments
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| limit | integer | Maximum number of deployments to return (default: 20, max: 100) |
| offset | integer | Number of deployments to skip before the first result (default: 0); pass pagination.next_offset from the previous page |
| status | string | Filter by status (e.g., "creating", "running", "failed", "deleted") |
| model_name | string | Filter by model name |
| region | string | Filter by region |
| tags | object | Filter by tags (e.g., tags[environment]=production) |
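A minimal sketch of encoding these query parameters and walking every page via pagination.next_offset. Assumptions: next_offset is absent or null on the last page, and fetch_page is a stand-in for code that issues GET /v1/deployments and returns the decoded JSON. list_query and iter_deployments are hypothetical helpers.

```python
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import urlencode


def list_query(
    limit: int = 20,
    offset: int = 0,
    tags: Optional[Dict[str, str]] = None,
    **filters: str,
) -> str:
    """Encode List Deployments query parameters (tags use the tags[key]=value form)."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    params: Dict[str, object] = {"limit": limit, "offset": offset, **filters}
    for key, value in (tags or {}).items():
        params[f"tags[{key}]"] = value
    return urlencode(params)


def iter_deployments(
    fetch_page: Callable[[int, int], dict], limit: int = 100
) -> Iterator[dict]:
    """Yield every deployment, following pagination.next_offset."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page["deployments"]
        next_offset = page.get("pagination", {}).get("next_offset")
        # Stop on the last page, or defensively on an empty page.
        if next_offset is None or not page["deployments"]:
            return
        offset = next_offset
```

urlencode percent-encodes the brackets, producing tags%5Benvironment%5D=production, which a standard server decodes back to tags[environment].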
Response
json
{
"deployments": [
{
"id": "dep_12345abcde",
"model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
"deployment_name": "clinical-assistant",
"status": "running",
"endpoint": "https://api.paitient.com/v1/deployments/dep_12345abcde/generate",
"created_at": "2023-11-01T12:34:56Z",
"updated_at": "2023-11-01T12:35:30Z",
"compute_type": "gpu",
"instance_type": "g4dn.xlarge",
"tags": {
"environment": "production"
}
},
{
"id": "dep_67890fghij",
"model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
"deployment_name": "research-assistant",
"status": "running",
"endpoint": "https://api.paitient.com/v1/deployments/dep_67890fghij/generate",
"created_at": "2023-10-15T09:12:34Z",
"updated_at": "2023-10-15T09:15:21Z",
"compute_type": "gpu",
"instance_type": "g4dn.xlarge",
"tags": {
"environment": "development"
}
}
],
"pagination": {
"total": 7,
"limit": 2,
"offset": 0,
"next_offset": 2
}
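}
Update Deployment
Updates an existing deployment. Send only the fields you want to change. As the example response shows, nested objects such as tags and model_config are merged with the existing values rather than replaced, and the deployment reports status "updating" while the change is applied.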
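To check what a PATCH will produce before sending it, the merge behavior visible in the example can be mirrored locally. A minimal sketch, assuming (from the example response only) that nested objects are merged key by key and scalar fields are replaced; preview_patch is hypothetical, and this reference does not document how to remove a tag.

```python
import copy


def preview_patch(current: dict, patch: dict) -> dict:
    """Return the expected deployment state after applying patch to current.

    Assumption: nested dicts (tags, model_config) merge recursively and
    all other values are replaced. current is not modified.
    """
    result = copy.deepcopy(current)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = preview_patch(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result
```

Applied to the example request, the preview keeps department and project, adds version, and changes only default_temperature inside model_config, matching the example response.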
Request
http
PATCH /v1/deployments/{deployment_id}
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID
json
{
"min_replicas": 2,
"max_replicas": 5,
"tags": {
"environment": "production",
"version": "2.0"
},
"model_config": {
"default_temperature": 0.8
}
}
Response
json
{
"id": "dep_12345abcde",
"model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
"deployment_name": "clinical-assistant",
"status": "updating",
"endpoint": "https://api.paitient.com/v1/deployments/dep_12345abcde/generate",
"created_at": "2023-11-01T12:34:56Z",
"updated_at": "2023-11-01T14:23:45Z",
"compute_type": "gpu",
"instance_type": "g4dn.xlarge",
"min_replicas": 2,
"max_replicas": 5,
"auto_scaling": true,
"tags": {
"department": "clinical-research",
"project": "diabetes-assistant",
"environment": "production",
"version": "2.0"
},
"model_config": {
"context_length": 4096,
"max_output_tokens": 1024,
"default_temperature": 0.8,
"default_top_p": 0.95
}
}
Delete Deployment
Deletes a deployment. Deletion is asynchronous: the response reports status "deleting".
Request
http
DELETE /v1/deployments/{deployment_id}
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| force | boolean | Force deletion without confirmation (default: false) |
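A minimal sketch of building the DELETE request with the optional force flag. Assumptions: the base URL https://api.paitient.com is inferred from the example endpoint URLs, and booleans are sent as lowercase "true" in the query string; build_delete_request is a hypothetical helper.

```python
import urllib.request
from urllib.parse import quote, urlencode

# Assumption: base URL inferred from the example endpoint URLs.
BASE_URL = "https://api.paitient.com"


def build_delete_request(
    api_key: str, client_id: str, deployment_id: str, force: bool = False
) -> urllib.request.Request:
    """Build (but do not send) DELETE /v1/deployments/{deployment_id}."""
    url = f"{BASE_URL}/v1/deployments/{quote(deployment_id, safe='')}"
    if force:
        # Assumption: boolean query parameters are encoded as "true".
        url += "?" + urlencode({"force": "true"})
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}", "X-Client-ID": client_id},
        method="DELETE",
    )
```

Omitting force leaves the query string off entirely, so the server applies its documented default of false.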
Response
json
{
"id": "dep_12345abcde",
"status": "deleting"
}
Get Deployment Metrics
Retrieves performance metrics for a deployment.
Request
http
GET /v1/deployments/{deployment_id}/metrics
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| start_time | string | Start time in ISO 8601 format |
| end_time | string | End time in ISO 8601 format |
| metrics | array | List of metrics to retrieve (e.g., "latency", "throughput", "error_rate", "token_usage") |
| interval | string | Time interval for data points (e.g., "1m", "5m", "1h", "1d") |
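A minimal sketch for requesting metrics and working with the per-interval series in the response. Assumption: array parameters are sent as repeated keys (metrics=latency&metrics=throughput). The helpers are hypothetical. The sketch only averages series; it does not recompute percentiles, because a percentile over a whole period cannot in general be derived from per-interval percentiles.

```python
from statistics import mean
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def metrics_query(start: str, end: str, metrics: List[str], interval: str) -> str:
    """Encode the metrics query string (assumes repeated keys for arrays)."""
    params = {"start_time": start, "end_time": end, "metrics": metrics, "interval": interval}
    return urlencode(params, doseq=True)


def series_rows(metric: Dict[str, list], field: str) -> List[Tuple[str, float]]:
    """Pair one series of a metric (e.g. latency "p95") with its timestamps."""
    values, stamps = metric[field], metric["timestamps"]
    if len(values) != len(stamps):
        raise ValueError(f"{field}: {len(values)} values but {len(stamps)} timestamps")
    return list(zip(stamps, values))


def period_average(metric: Dict[str, list], field: str) -> float:
    """Unweighted mean of one series across the returned intervals."""
    return mean(metric[field])
```

With the example response, the five daily requests_per_second values average 13.22, which matches the reported summary of 13.2.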
Response
json
{
"deployment_id": "dep_12345abcde",
"start_time": "2023-11-01T00:00:00Z",
"end_time": "2023-11-30T23:59:59Z",
"interval": "1d",
"metrics": {
"latency": {
"average": [120, 115, 118, 125, 122],
"p50": [110, 105, 108, 112, 109],
"p95": [180, 175, 178, 185, 182],
"p99": [220, 215, 218, 225, 222],
"timestamps": [
"2023-11-01T00:00:00Z",
"2023-11-02T00:00:00Z",
"2023-11-03T00:00:00Z",
"2023-11-04T00:00:00Z",
"2023-11-05T00:00:00Z"
]
},
"throughput": {
"requests_per_second": [12.5, 13.2, 12.8, 14.1, 13.5],
"timestamps": [
"2023-11-01T00:00:00Z",
"2023-11-02T00:00:00Z",
"2023-11-03T00:00:00Z",
"2023-11-04T00:00:00Z",
"2023-11-05T00:00:00Z"
]
},
"error_rate": {
"percentage": [0.2, 0.1, 0.15, 0.25, 0.18],
"timestamps": [
"2023-11-01T00:00:00Z",
"2023-11-02T00:00:00Z",
"2023-11-03T00:00:00Z",
"2023-11-04T00:00:00Z",
"2023-11-05T00:00:00Z"
]
},
"token_usage": {
"total": [250000, 263000, 258000, 271000, 265000],
"input": [120000, 125000, 122000, 128000, 124000],
"output": [130000, 138000, 136000, 143000, 141000],
"timestamps": [
"2023-11-01T00:00:00Z",
"2023-11-02T00:00:00Z",
"2023-11-03T00:00:00Z",
"2023-11-04T00:00:00Z",
"2023-11-05T00:00:00Z"
]
}
},
"summary": {
"latency": {
"average": 120,
"p50": 109,
"p95": 180,
"p99": 222
},
"throughput": {
"average_requests_per_second": 13.2
},
"error_rate": {
"average_percentage": 0.18
},
"token_usage": {
"total": 1307000,
"input": 619000,
"output": 688000
}
}
}
Get Deployment Logs
Retrieves logs for a deployment.
Request
http
GET /v1/deployments/{deployment_id}/logs
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| start_time | string | Start time in ISO 8601 format |
| end_time | string | End time in ISO 8601 format |
| limit | integer | Maximum number of logs to return (default: 100, max: 1000) |
| filter | string | Filter expression (e.g., "level=error") |
| order | string | Order of logs by timestamp: "asc" or "desc" (default: "desc") |
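A minimal sketch that builds the logs query string from timezone-aware datetimes and enforces the limits in the table. The filter syntax follows the table's example (level=error); the lower bound of 1 for limit, and the helpers iso_utc and logs_query, are assumptions.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode


def iso_utc(dt: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 UTC with a trailing Z."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def logs_query(
    start: datetime,
    end: datetime,
    limit: int = 100,
    level: Optional[str] = None,
    order: str = "desc",
) -> str:
    """Encode Get Deployment Logs query parameters, checking documented limits."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    if order not in ("asc", "desc"):
        raise ValueError('order must be "asc" or "desc"')
    if end <= start:
        raise ValueError("end must be later than start")
    params = {
        "start_time": iso_utc(start),
        "end_time": iso_utc(end),
        "limit": limit,
        "order": order,
    }
    if level is not None:
        # Filter syntax taken from the table's example: level=error
        params["filter"] = f"level={level}"
    return urlencode(params)
```

Requiring aware datetimes avoids silently interpreting a naive local time as UTC.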
Response
json
{
"logs": [
{
"timestamp": "2023-11-01T12:34:56Z",
"level": "info",
"message": "Deployment dep_12345abcde is now running",
"component": "deployment-service",
"request_id": "req_abcdefg12345"
},
{
"timestamp": "2023-11-01T12:34:55Z",
"level": "info",
"message": "Replica 1/1 is now ready",
"component": "deployment-service",
"request_id": "req_abcdefg12345"
},
{
"timestamp": "2023-11-01T12:34:50Z",
"level": "info",
"message": "Loading model ZimaBlueAI/HuatuoGPT-o1-8B",
"component": "model-service",
"request_id": "req_abcdefg12345"
}
],
"pagination": {
"next_token": "eyJsYXN0X2V2YWx1YXRlZF9rZXkiOnsidGltZXN0YW1wIjoiMjAyMy0xMS0wMVQxMjozNDo1MFoifX0="
}
}
Create Endpoint
Creates a new endpoint that routes to a specific deployment.
Request
http
POST /v1/endpoints
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID
json
{
"name": "clinical-assistant-production",
"deployment_id": "dep_12345abcde"
}
Response
json
{
"id": "ep_12345abcde",
"name": "clinical-assistant-production",
"deployment_id": "dep_12345abcde",
"url": "https://api.paitient.com/v1/endpoints/ep_12345abcde/generate",
"created_at": "2023-11-01T12:40:00Z",
"updated_at": "2023-11-01T12:40:00Z",
"status": "active"
}
Create Canary Endpoint
Creates a new canary endpoint that splits traffic between multiple deployments.
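A minimal sketch that builds the Create Canary Endpoint body from a mapping of deployment ID to traffic percentage and rejects obviously invalid splits before sending. The rules it enforces (at least two deployments, non-negative whole-number percentages totaling 100) are assumptions drawn from the example, not rules stated by this reference; canary_body is hypothetical.

```python
from typing import Dict


def canary_body(name: str, weights: Dict[str, int]) -> dict:
    """Build a POST /v1/canary-endpoints body from {deployment_id: traffic_percentage}.

    Assumed rules: at least two deployments, non-negative integer
    percentages, and a total of exactly 100.
    """
    if len(weights) < 2:
        raise ValueError("a canary endpoint needs at least two deployments")
    if any(not isinstance(p, int) or p < 0 for p in weights.values()):
        raise ValueError("traffic percentages must be non-negative integers")
    total = sum(weights.values())
    if total != 100:
        raise ValueError(f"traffic percentages total {total}, expected 100")
    return {
        "name": name,
        "deployments": [
            {"id": dep_id, "traffic_percentage": pct} for dep_id, pct in weights.items()
        ],
    }
```

Dictionaries preserve insertion order, so the deployments appear in the body in the order they were given.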
Request
http
POST /v1/canary-endpoints
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID
json
{
"name": "clinical-assistant-canary",
"deployments": [
{
"id": "dep_12345abcde",
"traffic_percentage": 90
},
{
"id": "dep_67890fghij",
"traffic_percentage": 10
}
]
}
Response
json
{
"id": "cep_12345abcde",
"name": "clinical-assistant-canary",
"deployments": [
{
"id": "dep_12345abcde",
"traffic_percentage": 90
},
{
"id": "dep_67890fghij",
"traffic_percentage": 10
}
],
"url": "https://api.paitient.com/v1/endpoints/cep_12345abcde/generate",
"created_at": "2023-11-01T13:00:00Z",
"updated_at": "2023-11-01T13:00:00Z",
"status": "active"
}
Update Canary Endpoint
Updates traffic distribution for a canary endpoint.
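A common use of this endpoint is a staged rollout that shifts traffic toward the canary in steps, checking the new deployment's metrics (for example error_rate from Get Deployment Metrics) between steps. The sketch below only generates the request bodies; the step schedule, the strictly increasing rule, and rollout_bodies itself are illustrative assumptions, not API requirements.

```python
from typing import List


def rollout_bodies(stable_id: str, canary_id: str, canary_steps: List[int]) -> List[dict]:
    """Return one PATCH /v1/canary-endpoints/{endpoint_id} body per step.

    Step i sends canary_steps[i] percent of traffic to canary_id and the
    remainder to stable_id. Steps must increase strictly within 0..100.
    """
    bodies = []
    previous = -1
    for pct in canary_steps:
        if not 0 <= pct <= 100 or pct <= previous:
            raise ValueError(f"invalid step {pct}: steps must increase within 0..100")
        previous = pct
        bodies.append({
            "deployments": [
                {"id": stable_id, "traffic_percentage": 100 - pct},
                {"id": canary_id, "traffic_percentage": pct},
            ]
        })
    return bodies
```

For the schedule [10, 25, 50], the last body is identical to the example request that moves the endpoint to a 50/50 split.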
Request
http
PATCH /v1/canary-endpoints/{endpoint_id}
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID
json
{
"deployments": [
{
"id": "dep_12345abcde",
"traffic_percentage": 50
},
{
"id": "dep_67890fghij",
"traffic_percentage": 50
}
]
}
Response
json
{
"id": "cep_12345abcde",
"name": "clinical-assistant-canary",
"deployments": [
{
"id": "dep_12345abcde",
"traffic_percentage": 50
},
{
"id": "dep_67890fghij",
"traffic_percentage": 50
}
],
"url": "https://api.paitient.com/v1/endpoints/cep_12345abcde/generate",
"created_at": "2023-11-01T13:00:00Z",
"updated_at": "2023-11-01T14:30:00Z",
"status": "active"
}
Error Codes
| Code | Description |
|---|---|
| invalid_request_error | The request was malformed or missing required parameters |
| authentication_error | API key or client ID is invalid or missing |
| permission_denied | The API key doesn't have permission to perform the operation |
| resource_not_found | The requested resource (deployment, endpoint) doesn't exist |
| quota_exceeded | Account quota for deployments has been exceeded |
| rate_limit_exceeded | Too many requests in a given time period |
| deployment_error | Error occurred during deployment creation or update |
| instance_type_unavailable | The requested instance type is not available in the selected region |
| model_not_found | The specified model doesn't exist |
| validation_error | One or more parameters failed validation |
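A minimal client-side retry policy keyed on these codes. This reference does not show how the code appears in an error response body, so the functions take the code string directly. Treating only rate_limit_exceeded as retryable is a conservative assumption, and the backoff uses the common "full jitter" scheme; should_retry and backoff_delay are hypothetical helpers.

```python
import random
from typing import Callable

# Assumption: only rate limiting is treated as transient. The other codes are
# treated as needing a change (request, credentials, quota, region) first.
RETRYABLE_CODES = frozenset({"rate_limit_exceeded"})


def should_retry(error_code: str, attempt: int, max_attempts: int = 5) -> bool:
    """Return True if a call that failed with error_code should be retried.

    attempt counts retries already made, starting at 0.
    """
    return error_code in RETRYABLE_CODES and attempt < max_attempts


def backoff_delay(
    attempt: int,
    base_s: float = 1.0,
    cap_s: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Exponential backoff with full jitter.

    Returns a delay drawn uniformly from [0, min(cap_s, base_s * 2**attempt)).
    """
    return rng() * min(cap_s, base_s * 2 ** attempt)
```

Randomizing the whole delay spreads retries from many clients apart, which helps a rate-limited service recover instead of receiving synchronized bursts.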
Webhook Notifications
Subscribe to webhook notifications to receive real-time updates about your deployments.
Deployment Events
| Event | Description |
|---|---|
| deployment.created | A new deployment has been created |
| deployment.updated | A deployment has been updated |
| deployment.deleted | A deployment has been deleted |
| deployment.status_changed | A deployment's status has changed |
| deployment.health_changed | A deployment's health status has changed |
| deployment.scaling_changed | A deployment's scaling configuration has changed |
| deployment.error | An error occurred with a deployment |
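A minimal receiver-side sketch that routes deliveries to handlers by event name. The payload format is not documented in this reference, so the sketch assumes a JSON body with the event name in a top-level "event" field. WebhookRouter is hypothetical, and verifying that a delivery really came from PaiTIENT is outside its scope.

```python
import json
from typing import Callable, Dict, List

Handler = Callable[[dict], None]

# Event names from the Deployment Events table.
DEPLOYMENT_EVENTS = frozenset({
    "deployment.created", "deployment.updated", "deployment.deleted",
    "deployment.status_changed", "deployment.health_changed",
    "deployment.scaling_changed", "deployment.error",
})


class WebhookRouter:
    """Route webhook payloads to registered handlers by event name.

    Assumption: payloads are JSON objects with a top-level "event" field.
    """

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def on(self, event: str, handler: Handler) -> None:
        """Register handler for one documented event name."""
        if event not in DEPLOYMENT_EVENTS:
            raise ValueError(f"unknown event {event!r}")
        self._handlers.setdefault(event, []).append(handler)

    def dispatch(self, raw_body: bytes) -> int:
        """Invoke every handler registered for the payload's event; return how many ran."""
        payload = json.loads(raw_body)
        handlers = self._handlers.get(payload.get("event"), [])
        for handler in handlers:
            handler(payload)
        return len(handlers)
```

Rejecting unknown event names at registration time catches typos such as deployment.status_change early, rather than leaving a handler that never fires.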
Webhook Configuration
Configure webhooks through the PaiTIENT dashboard or API.