Deployments API

This reference documents the Deployments endpoints of the PaiTIENT Secure Model Service REST API, which let you create, inspect, update, and delete model deployments programmatically.

Create Deployment

Creates a new model deployment.

Request

http

POST /v1/deployments
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID

json

{
  "model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
  "deployment_name": "clinical-assistant",
  "compute_type": "gpu",
  "instance_type": "g4dn.xlarge",
  "min_replicas": 1,
  "max_replicas": 3,
  "auto_scaling": true,
  "region": "us-east-1",
  "vpc_config": {
    "subnet_ids": ["subnet-abc123", "subnet-def456"],
    "security_group_ids": ["sg-123456"]
  },
  "tags": {
    "department": "clinical-research",
    "project": "diabetes-assistant",
    "environment": "production"
  },
  "model_config": {
    "context_length": 4096,
    "max_output_tokens": 1024,
    "default_temperature": 0.7,
    "default_top_p": 0.95
  },
  "security_settings": {
    "network_isolation": true,
    "private_endpoints": true,
    "encryption_level": "maximum",
    "audit_logging": true,
    "compliance_mode": "hipaa"
  }
}

Response

json

{
  "id": "dep_12345abcde",
  "model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
  "deployment_name": "clinical-assistant",
  "status": "creating",
  "endpoint": "https://api.paitient.com/v1/deployments/dep_12345abcde/generate",
  "created_at": "2023-11-01T12:34:56Z",
  "updated_at": "2023-11-01T12:34:56Z",
  "compute_type": "gpu",
  "instance_type": "g4dn.xlarge",
  "min_replicas": 1,
  "max_replicas": 3,
  "auto_scaling": true,
  "region": "us-east-1",
  "tags": {
    "department": "clinical-research",
    "project": "diabetes-assistant",
    "environment": "production"
  }
}
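As a minimal sketch of how the request above could be assembled in Python using only the standard library: the base URL is inferred from the example `endpoint` values in this reference, and the helper name `build_create_request` is hypothetical. The payload is a trimmed version of the example body; which fields are required is not stated here.

```python
import json
import urllib.request

# Assumption: host inferred from the example endpoint URLs in this reference.
BASE_URL = "https://api.paitient.com"


def build_create_request(api_key: str, client_id: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/deployments request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/deployments",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
            "X-Client-ID": client_id,
        },
    )


# Trimmed version of the example request body.
payload = {
    "model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
    "deployment_name": "clinical-assistant",
    "compute_type": "gpu",
    "instance_type": "g4dn.xlarge",
    "min_replicas": 1,
    "max_replicas": 3,
}
request = build_create_request("YOUR_API_KEY", "YOUR_CLIENT_ID", payload)
```

Passing `request` to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run without credentials.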

Get Deployment

Retrieves details about a specific deployment.

Request

http

GET /v1/deployments/{deployment_id}
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID

Response

json

{
  "id": "dep_12345abcde",
  "model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
  "deployment_name": "clinical-assistant",
  "status": "running",
  "endpoint": "https://api.paitient.com/v1/deployments/dep_12345abcde/generate",
  "created_at": "2023-11-01T12:34:56Z",
  "updated_at": "2023-11-01T12:35:30Z",
  "compute_type": "gpu",
  "instance_type": "g4dn.xlarge",
  "min_replicas": 1,
  "max_replicas": 3,
  "current_replicas": 1,
  "auto_scaling": true,
  "region": "us-east-1",
  "tags": {
    "department": "clinical-research",
    "project": "diabetes-assistant",
    "environment": "production"
  },
  "model_config": {
    "context_length": 4096,
    "max_output_tokens": 1024,
    "default_temperature": 0.7,
    "default_top_p": 0.95
  },
  "security_settings": {
    "network_isolation": true,
    "private_endpoints": true,
    "encryption_level": "maximum",
    "audit_logging": true,
    "compliance_mode": "hipaa"
  }
}
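A new deployment starts in status "creating" and later reports "running", so a client will typically poll this endpoint. The sketch below shows one way to do that. It assumes "failed" and "deleted" are terminal states, which this reference does not state explicitly, and the `fetch` callable stands in for whatever code performs the GET request.

```python
import time
from typing import Callable


def wait_until_running(
    fetch: Callable[[], dict],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll `fetch` until the deployment reports status "running"."""
    deadline = time.monotonic() + timeout_s
    while True:
        deployment = fetch()
        status = deployment["status"]
        if status == "running":
            return deployment
        # Assumption: these statuses never transition back to "running".
        if status in ("failed", "deleted"):
            raise RuntimeError(f"deployment {deployment['id']} ended in status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"deployment still {status!r} after {timeout_s}s")
        sleep(interval_s)


# Usage with canned responses instead of real HTTP calls.
responses = iter([
    {"id": "dep_12345abcde", "status": "creating"},
    {"id": "dep_12345abcde", "status": "running"},
])
result = wait_until_running(lambda: next(responses), sleep=lambda _: None)
```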

List Deployments

Retrieves a paginated list of deployments, optionally filtered by status, model name, region, or tags.

Request

http

GET /v1/deployments
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID

Query Parameters

Parameter	Type	Description
`limit`	integer	Maximum number of deployments to return (default: 20, max: 100)
`offset`	integer	Offset for pagination (default: 0)
`status`	string	Filter by status (e.g., "creating", "running", "failed", "deleted")
`model_name`	string	Filter by model name
`region`	string	Filter by region
`tags`	object	Filter by tags (e.g., `tags[environment]=production`)

Response

json

{
  "deployments": [
    {
      "id": "dep_12345abcde",
      "model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
      "deployment_name": "clinical-assistant",
      "status": "running",
      "endpoint": "https://api.paitient.com/v1/deployments/dep_12345abcde/generate",
      "created_at": "2023-11-01T12:34:56Z",
      "updated_at": "2023-11-01T12:35:30Z",
      "compute_type": "gpu",
      "instance_type": "g4dn.xlarge",
      "tags": {
        "environment": "production"
      }
    },
    {
      "id": "dep_67890fghij",
      "model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
      "deployment_name": "research-assistant",
      "status": "running",
      "endpoint": "https://api.paitient.com/v1/deployments/dep_67890fghij/generate",
      "created_at": "2023-10-15T09:12:34Z",
      "updated_at": "2023-10-15T09:15:21Z",
      "compute_type": "gpu",
      "instance_type": "g4dn.xlarge",
      "tags": {
        "environment": "development"
      }
    }
  ],
  "pagination": {
    "total": 7,
    "limit": 2,
    "offset": 0,
    "next_offset": 2
  }
}
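The following sketch illustrates how the query parameters and the `pagination` object might be used together. It assumes the last page omits `next_offset` (or sets it to null), which the example does not show, and `fetch_page` is a hypothetical callable that performs the GET request for a given offset.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode


def list_query(limit: int = 20, offset: int = 0, status: Optional[str] = None,
               tags: Optional[dict] = None) -> str:
    """Encode query parameters, using the tags[key]=value form from the table above."""
    params = {"limit": limit, "offset": offset}
    if status is not None:
        params["status"] = status
    for key, value in (tags or {}).items():
        params[f"tags[{key}]"] = value
    # urlencode percent-encodes the square brackets in the tag keys.
    return urlencode(params)


def iter_deployments(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every deployment, following pagination.next_offset while it is present."""
    offset: Optional[int] = 0
    while offset is not None:
        page = fetch_page(offset)
        yield from page["deployments"]
        # Assumption: the final page has no next_offset (or a null one).
        offset = page["pagination"].get("next_offset")


# Usage with canned pages instead of real HTTP calls.
pages = {
    0: {"deployments": [{"id": "dep_a"}, {"id": "dep_b"}],
        "pagination": {"total": 3, "limit": 2, "offset": 0, "next_offset": 2}},
    2: {"deployments": [{"id": "dep_c"}],
        "pagination": {"total": 3, "limit": 2, "offset": 2}},
}
ids = [d["id"] for d in iter_deployments(lambda offset: pages[offset])]
query = list_query(limit=2, tags={"environment": "production"})
```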

Update Deployment

Updates an existing deployment.

Request

http

PATCH /v1/deployments/{deployment_id}
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID

json

{
  "min_replicas": 2,
  "max_replicas": 5,
  "tags": {
    "environment": "production",
    "version": "2.0"
  },
  "model_config": {
    "default_temperature": 0.8
  }
}

Response

json

{
  "id": "dep_12345abcde",
  "model_name": "ZimaBlueAI/HuatuoGPT-o1-8B",
  "deployment_name": "clinical-assistant",
  "status": "updating",
  "endpoint": "https://api.paitient.com/v1/deployments/dep_12345abcde/generate",
  "created_at": "2023-11-01T12:34:56Z",
  "updated_at": "2023-11-01T14:23:45Z",
  "compute_type": "gpu",
  "instance_type": "g4dn.xlarge",
  "min_replicas": 2,
  "max_replicas": 5,
  "auto_scaling": true,
  "tags": {
    "department": "clinical-research",
    "project": "diabetes-assistant",
    "environment": "production",
    "version": "2.0"
  },
  "model_config": {
    "context_length": 4096,
    "max_output_tokens": 1024,
    "default_temperature": 0.8,
    "default_top_p": 0.95
  }
}
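In the example above, the response keeps the `department` and `project` tags and the untouched `model_config` fields even though the request omitted them. This suggests that nested objects are merged key by key rather than replaced. The sketch below reproduces that behavior locally; it is inferred from the example only and is not a documented guarantee, and `merge_patch` is a hypothetical helper.

```python
def merge_patch(current: dict, patch: dict) -> dict:
    """Return `current` with `patch` applied; nested objects are merged key by key."""
    merged = dict(current)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_patch(merged[key], value)
        else:
            merged[key] = value
    return merged


# State before the update, taken from the Get Deployment example.
before = {
    "min_replicas": 1,
    "max_replicas": 3,
    "tags": {"department": "clinical-research", "project": "diabetes-assistant",
             "environment": "production"},
    "model_config": {"context_length": 4096, "max_output_tokens": 1024,
                     "default_temperature": 0.7, "default_top_p": 0.95},
}
# The PATCH body from the example request.
patch = {
    "min_replicas": 2,
    "max_replicas": 5,
    "tags": {"environment": "production", "version": "2.0"},
    "model_config": {"default_temperature": 0.8},
}
after = merge_patch(before, patch)
```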

Delete Deployment

Deletes a deployment.

Request

http

DELETE /v1/deployments/{deployment_id}
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID

Query Parameters

Parameter	Type	Description
`force`	boolean	Force deletion without confirmation (default: false)

Response

json

{
  "id": "dep_12345abcde",
  "status": "deleting"
}

Get Deployment Metrics

Retrieves performance metrics for a deployment.

Request

http

GET /v1/deployments/{deployment_id}/metrics
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID

Query Parameters

Parameter	Type	Description
`start_time`	string	Start time in ISO 8601 format
`end_time`	string	End time in ISO 8601 format
`metrics`	array	List of metrics to retrieve (e.g., "latency", "throughput", "error_rate", "token_usage")
`interval`	string	Time interval for data points (e.g., "1m", "5m", "1h", "1d")

Response

json

{
  "deployment_id": "dep_12345abcde",
  "start_time": "2023-11-01T00:00:00Z",
  "end_time": "2023-11-30T23:59:59Z",
  "interval": "1d",
  "metrics": {
    "latency": {
      "average": [120, 115, 118, 125, 122],
      "p50": [110, 105, 108, 112, 109],
      "p95": [180, 175, 178, 185, 182],
      "p99": [220, 215, 218, 225, 222],
      "timestamps": [
        "2023-11-01T00:00:00Z",
        "2023-11-02T00:00:00Z",
        "2023-11-03T00:00:00Z",
        "2023-11-04T00:00:00Z",
        "2023-11-05T00:00:00Z"
      ]
    },
    "throughput": {
      "requests_per_second": [12.5, 13.2, 12.8, 14.1, 13.5],
      "timestamps": [
        "2023-11-01T00:00:00Z",
        "2023-11-02T00:00:00Z",
        "2023-11-03T00:00:00Z",
        "2023-11-04T00:00:00Z",
        "2023-11-05T00:00:00Z"
      ]
    },
    "error_rate": {
      "percentage": [0.2, 0.1, 0.15, 0.25, 0.18],
      "timestamps": [
        "2023-11-01T00:00:00Z",
        "2023-11-02T00:00:00Z",
        "2023-11-03T00:00:00Z",
        "2023-11-04T00:00:00Z",
        "2023-11-05T00:00:00Z"
      ]
    },
    "token_usage": {
      "total": [250000, 263000, 258000, 271000, 265000],
      "input": [120000, 125000, 122000, 128000, 124000],
      "output": [130000, 138000, 136000, 143000, 141000],
      "timestamps": [
        "2023-11-01T00:00:00Z",
        "2023-11-02T00:00:00Z",
        "2023-11-03T00:00:00Z",
        "2023-11-04T00:00:00Z",
        "2023-11-05T00:00:00Z"
      ]
    }
  },
  "summary": {
    "latency": {
      "average": 120,
      "p50": 109,
      "p95": 180,
      "p99": 220
    },
    "throughput": {
      "average_requests_per_second": 13.2
    },
    "error_rate": {
      "average_percentage": 0.18
    },
    "token_usage": {
      "total": 1307000,
      "input": 619000,
      "output": 688000
    }
  }
}
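Each metric is returned as parallel arrays: the value at index i belongs to the timestamp at index i. The sketch below pairs them up and recomputes two summary figures from the example data. That the summary holds per-series sums for token usage and means for latency is an observation about this example, not a documented rule, and `series_points` is a hypothetical helper.

```python
from statistics import mean


def series_points(metric: dict, field: str) -> list:
    """Pair each timestamp with the value at the same index in `field`."""
    return list(zip(metric["timestamps"], metric[field]))


# Values copied from the example response.
token_usage = {
    "total": [250000, 263000, 258000, 271000, 265000],
    "input": [120000, 125000, 122000, 128000, 124000],
    "output": [130000, 138000, 136000, 143000, 141000],
    "timestamps": [
        "2023-11-01T00:00:00Z",
        "2023-11-02T00:00:00Z",
        "2023-11-03T00:00:00Z",
        "2023-11-04T00:00:00Z",
        "2023-11-05T00:00:00Z",
    ],
}
latency_p95 = [180, 175, 178, 185, 182]

points = series_points(token_usage, "total")
total_tokens = sum(token_usage["total"])  # agrees with summary.token_usage.total
mean_p95 = mean(latency_p95)              # agrees with summary.latency.p95
```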

Get Deployment Logs

Retrieves logs for a deployment.

Request

http

GET /v1/deployments/{deployment_id}/logs
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID

Query Parameters

Parameter	Type	Description
`start_time`	string	Start time in ISO 8601 format
`end_time`	string	End time in ISO 8601 format
`limit`	integer	Maximum number of logs to return (default: 100, max: 1000)
`filter`	string	Filter expression (e.g., "level=error")
`order`	string	Sort order by timestamp: "asc" or "desc" (default: "desc")

Response

json

{
  "logs": [
    {
      "timestamp": "2023-11-01T12:34:56Z",
      "level": "info",
      "message": "Deployment dep_12345abcde is now running",
      "component": "deployment-service",
      "request_id": "req_abcdefg12345"
    },
    {
      "timestamp": "2023-11-01T12:34:55Z",
      "level": "info",
      "message": "Replica 1/1 is now ready",
      "component": "deployment-service",
      "request_id": "req_abcdefg12345"
    },
    {
      "timestamp": "2023-11-01T12:34:50Z",
      "level": "info",
      "message": "Loading model ZimaBlueAI/HuatuoGPT-o1-8B",
      "component": "model-service",
      "request_id": "req_abcdefg12345"
    }
  ],
  "pagination": {
    "next_token": "eyJsYXN0X2V2YWx1YXRlZF9rZXkiOnsidGltZXN0YW1wIjoiMjAyMy0xMS0wMVQxMjozNDo1MFoifX0="
  }
}
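Because logs are returned newest first by default, a client that wants to print them in the order they happened needs to re-sort them (or request `order=asc`). A minimal sketch of the client-side approach, with hypothetical helper names, could look like this:

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse the ISO 8601 UTC timestamps used in the examples (e.g. 2023-11-01T12:34:56Z)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")


def chronological_lines(response: dict) -> list:
    """Format log entries as text lines, oldest first."""
    entries = sorted(response["logs"], key=lambda e: parse_ts(e["timestamp"]))
    return [
        f'{e["timestamp"]} [{e["level"]}] {e["component"]}: {e["message"]}'
        for e in entries
    ]


# Two entries from the example response, in the default (descending) order.
response = {
    "logs": [
        {"timestamp": "2023-11-01T12:34:56Z", "level": "info",
         "message": "Deployment dep_12345abcde is now running",
         "component": "deployment-service", "request_id": "req_abcdefg12345"},
        {"timestamp": "2023-11-01T12:34:50Z", "level": "info",
         "message": "Loading model ZimaBlueAI/HuatuoGPT-o1-8B",
         "component": "model-service", "request_id": "req_abcdefg12345"},
    ]
}
lines = chronological_lines(response)
```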

Create Endpoint

Creates a new endpoint that routes to a specific deployment.

Request

http

POST /v1/endpoints
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID

json

{
  "name": "clinical-assistant-production",
  "deployment_id": "dep_12345abcde"
}

Response

json

{
  "id": "ep_12345abcde",
  "name": "clinical-assistant-production",
  "deployment_id": "dep_12345abcde",
  "url": "https://api.paitient.com/v1/endpoints/ep_12345abcde/generate",
  "created_at": "2023-11-01T12:40:00Z",
  "updated_at": "2023-11-01T12:40:00Z",
  "status": "active"
}

Create Canary Endpoint

Creates a new canary endpoint that splits traffic between multiple deployments.

Request

http

POST /v1/canary-endpoints
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID

json

{
  "name": "clinical-assistant-canary",
  "deployments": [
    {
      "id": "dep_12345abcde",
      "traffic_percentage": 90
    },
    {
      "id": "dep_67890fghij",
      "traffic_percentage": 10
    }
  ]
}

Response

json

{
  "id": "cep_12345abcde",
  "name": "clinical-assistant-canary",
  "deployments": [
    {
      "id": "dep_12345abcde",
      "traffic_percentage": 90
    },
    {
      "id": "dep_67890fghij",
      "traffic_percentage": 10
    }
  ],
  "url": "https://api.paitient.com/v1/endpoints/cep_12345abcde/generate",
  "created_at": "2023-11-01T13:00:00Z",
  "updated_at": "2023-11-01T13:00:00Z",
  "status": "active"
}

Update Canary Endpoint

Updates traffic distribution for a canary endpoint.

Request

http

PATCH /v1/canary-endpoints/{endpoint_id}
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
X-Client-ID: YOUR_CLIENT_ID

json

{
  "deployments": [
    {
      "id": "dep_12345abcde",
      "traffic_percentage": 50
    },
    {
      "id": "dep_67890fghij",
      "traffic_percentage": 50
    }
  ]
}

Response

json

{
  "id": "cep_12345abcde",
  "name": "clinical-assistant-canary",
  "deployments": [
    {
      "id": "dep_12345abcde",
      "traffic_percentage": 50
    },
    {
      "id": "dep_67890fghij",
      "traffic_percentage": 50
    }
  ],
  "url": "https://api.paitient.com/v1/endpoints/cep_12345abcde/generate",
  "created_at": "2023-11-01T13:00:00Z",
  "updated_at": "2023-11-01T14:30:00Z",
  "status": "active"
}
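The two canary examples move traffic from a 90/10 split to a 50/50 split. The sketch below generates such request bodies for a staged rollout. It assumes the percentages must add up to 100, as they do in both examples; this reference does not state that rule, and `canary_payload` is a hypothetical helper.

```python
def canary_payload(stable_id: str, canary_id: str, canary_percentage: int) -> dict:
    """Build a `deployments` body that sends `canary_percentage` of traffic to the canary."""
    if not 0 <= canary_percentage <= 100:
        raise ValueError("canary_percentage must be between 0 and 100")
    return {
        "deployments": [
            # Assumption: the two shares must sum to 100.
            {"id": stable_id, "traffic_percentage": 100 - canary_percentage},
            {"id": canary_id, "traffic_percentage": canary_percentage},
        ]
    }


# The 10% step matches the create example; the 50% step matches the update example.
steps = [canary_payload("dep_12345abcde", "dep_67890fghij", pct) for pct in (10, 50)]
```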

Error Codes

Code	Description
`invalid_request_error`	The request was malformed or missing required parameters
`authentication_error`	The API key or client ID is invalid or missing
`permission_denied`	The API key doesn't have permission to perform the operation
`resource_not_found`	The requested resource (deployment, endpoint) doesn't exist
`quota_exceeded`	The account's deployment quota has been exceeded
`rate_limit_exceeded`	Too many requests were sent in a given time period
`deployment_error`	An error occurred during deployment creation or update
`instance_type_unavailable`	The requested instance type is not available in the selected region
`model_not_found`	The specified model doesn't exist
`validation_error`	One or more parameters failed validation
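A client usually needs to decide which of these codes are worth retrying. The sketch below treats only `rate_limit_exceeded` as transient and retries it with exponential backoff. That classification, the delay values, and the helper names are assumptions for illustration; this reference does not describe the error response body or any retry guidance.

```python
# Assumption: only rate limiting is treated as transient here. The other codes
# describe problems that a retry of the same request is unlikely to fix.
RETRYABLE_CODES = {"rate_limit_exceeded"}


def should_retry(error_code: str, attempt: int, max_attempts: int = 5) -> bool:
    """Return True if a request that failed with `error_code` should be retried."""
    return error_code in RETRYABLE_CODES and attempt < max_attempts


def backoff_delays(attempts: int, base_s: float = 1.0, cap_s: float = 30.0) -> list:
    """Exponential backoff delays in seconds, doubling each attempt up to `cap_s`."""
    return [min(cap_s, base_s * 2 ** i) for i in range(attempts)]


delays = backoff_delays(4)
```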

Webhook Notifications

Subscribe to webhook notifications to receive real-time updates about your deployments.

Deployment Events

Event	Description
`deployment.created`	A new deployment has been created
`deployment.updated`	A deployment has been updated
`deployment.deleted`	A deployment has been deleted
`deployment.status_changed`	A deployment's status has changed
`deployment.health_changed`	A deployment's health status has changed
`deployment.scaling_changed`	A deployment's scaling configuration has changed
`deployment.error`	An error occurred with a deployment

Webhook Configuration

Configure webhooks through the PaiTIENT dashboard or API.
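On the receiving side, a webhook consumer typically routes each notification to a handler based on its event name. The sketch below shows one way to do that. The payload shape (`{"event": ..., "data": {...}}`) is hypothetical, since this reference lists the event names but not the payload format; adjust the field names to match the actual notifications.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], str]
HANDLERS: Dict[str, Handler] = {}


def on(event: str) -> Callable[[Handler], Handler]:
    """Register a handler for one of the event names listed above."""
    def register(handler: Handler) -> Handler:
        HANDLERS[event] = handler
        return handler
    return register


@on("deployment.status_changed")
def status_changed(data: dict) -> str:
    return f"{data['id']} is now {data['status']}"


def dispatch(raw_body: bytes) -> str:
    """Route a webhook body to its handler; unknown events are ignored."""
    # Assumption: the body is JSON with "event" and "data" keys.
    payload = json.loads(raw_body)
    handler = HANDLERS.get(payload["event"])
    if handler is None:
        return "ignored"
    return handler(payload["data"])


handled = dispatch(
    b'{"event": "deployment.status_changed",'
    b' "data": {"id": "dep_12345abcde", "status": "running"}}'
)
skipped = dispatch(b'{"event": "deployment.created", "data": {}}')
```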
