
Python SDK: Text Generation

This guide covers the text generation capabilities of the PaiTIENT Secure Model Service Python SDK, which lets you securely generate text from deployed models in a HIPAA- and SOC 2-compliant environment.

Basic Text Generation

Generate text from a deployed model:

python
from paitient_secure_model import Client

# Initialize client
client = Client()

# Generate text
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt="What are the potential side effects of metformin?"
)

print(response.text)

Generation Options

Text Parameters

Customize text generation with various parameters:

python
# Generation with parameters
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt="What are the potential side effects of metformin?",
    max_tokens=500,              # Maximum length of the generated text
    temperature=0.7,             # Controls randomness (0.0-1.0)
    top_p=0.95,                  # Nucleus sampling parameter
    top_k=50,                    # Top-k sampling parameter
    stop=["\n\n", "END"],        # Stop sequences
    repetition_penalty=1.1,      # Penalize repeated tokens
    presence_penalty=0.0,        # Penalize tokens that have already appeared at all
    frequency_penalty=0.0        # Penalize tokens in proportion to how often they appear
)

System Messages

Use system messages to control the model's behavior:

python
# Generation with system message
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt="What are the potential side effects of metformin?",
    system_message="You are a helpful medical assistant providing accurate information to healthcare professionals. Always include references to clinical guidelines and mention important warnings."
)

Context Management

Maintain conversation context:

python
# Conversation with context
conversation = client.create_conversation(deployment_id="dep_12345abcde")

# First message
response1 = conversation.generate_text(
    prompt="What are the potential side effects of metformin?"
)
print("Response 1:", response1.text)

# Follow-up question (context is automatically maintained)
response2 = conversation.generate_text(
    prompt="What about patients with kidney disease?"
)
print("Response 2:", response2.text)

# Another follow-up
response3 = conversation.generate_text(
    prompt="Are there any alternatives for these patients?"
)
print("Response 3:", response3.text)

Advanced Generation Features

Streaming Responses

Stream the response as it's generated:

python
# Stream the response
for chunk in client.generate_text_stream(
    deployment_id="dep_12345abcde",
    prompt="Write a detailed summary of diabetes management techniques.",
    max_tokens=1000
):
    print(chunk.text, end="", flush=True)
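
To keep the complete text while displaying it incrementally, accumulate the chunks as they arrive. A minimal sketch, assuming each chunk exposes .text as shown above:

python
# Stream to the console while accumulating the full response
chunks = []
for chunk in client.generate_text_stream(
    deployment_id="dep_12345abcde",
    prompt="Write a detailed summary of diabetes management techniques.",
    max_tokens=1000
):
    print(chunk.text, end="", flush=True)
    chunks.append(chunk.text)

full_text = "".join(chunks)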

Batch Processing

Process multiple prompts efficiently:

python
# Define a list of prompts
prompts = [
    "What are the symptoms of hypertension?",
    "What are common treatments for type 2 diabetes?",
    "Explain the mechanism of action for statins."
]

# Process in batch
results = client.generate_text_batch(
    deployment_id="dep_12345abcde",
    prompts=prompts,
    max_tokens=300
)

for i, result in enumerate(results):
    print(f"Prompt {i+1}: {prompts[i]}")
    print(f"Response: {result.text}")
    print()

Function Calling

Define functions that the model can call:

python
from paitient_secure_model import Client
from paitient_secure_model.functions import FunctionDefinition

# Initialize client
client = Client()

# Define functions
functions = [
    FunctionDefinition(
        name="search_medication_interactions",
        description="Search for potential interactions between medications",
        parameters={
            "type": "object",
            "properties": {
                "medications": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "List of medications to check for interactions"
                }
            },
            "required": ["medications"]
        }
    ),
    FunctionDefinition(
        name="calculate_dosage",
        description="Calculate medication dosage based on patient parameters",
        parameters={
            "type": "object",
            "properties": {
                "medication": {"type": "string", "description": "Medication name"},
                "weight_kg": {"type": "number", "description": "Patient weight in kg"},
                "age_years": {"type": "number", "description": "Patient age in years"},
                "kidney_function": {"type": "string", "description": "Kidney function (normal, impaired, severe)"}
            },
            "required": ["medication", "weight_kg", "age_years"]
        }
    )
]

# Generate text with function calling
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt="Check for interactions between metformin, lisinopril, and simvastatin.",
    functions=functions,
    function_call="auto"  # Options: "auto", "none", or {"name": "specific_function"}
)

# Check if the model decided to call a function
if response.function_call:
    function_name = response.function_call.name
    function_args = response.function_call.arguments
    
    print(f"Model called function: {function_name}")
    print(f"Arguments: {function_args}")
    
    # Here you would actually execute the function with the provided arguments
    # and then potentially continue the conversation with the result
    if function_name == "search_medication_interactions":
        # Simulate function execution
        result = {"interactions": [
            {"medications": ["metformin", "lisinopril"], "severity": "low", "description": "..."},
            {"medications": ["metformin", "simvastatin"], "severity": "moderate", "description": "..."}
        ]}
        
        # Continue the conversation with the function result
        follow_up = client.generate_text(
            deployment_id="dep_12345abcde",
            prompt="What should I do about these interactions?",
            function_results=[{
                "name": function_name,
                "arguments": function_args,
                "result": result
            }]
        )
        
        print("Follow-up:", follow_up.text)
else:
    print("Model response:", response.text)

Security Controls

Apply security controls to text generation:

python
from paitient_secure_model import Client
from paitient_secure_model.security import SecuritySettings, DataFiltering

# Initialize client
client = Client()

# Generate text with security controls
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt="Patient with diabetes and hypertension, currently taking metformin and lisinopril.",
    security_settings=SecuritySettings(
        data_filtering=DataFiltering(
            detect_pii=True,           # Detect personally identifiable information
            redact_phi=True,           # Redact protected health information
            content_filtering="strict", # Apply strict content filtering
            detect_toxic_content=True  # Detect potentially harmful content
        )
    )
)

Response Analysis

Response Metadata

Access additional information about the generation:

python
# Analyze response metadata
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt="What are the potential side effects of metformin?",
    max_tokens=500,
    return_metadata=True
)

print("Response:", response.text)
print("Token count:", response.usage.total_tokens)
print("Prompt tokens:", response.usage.prompt_tokens)
print("Completion tokens:", response.usage.completion_tokens)
print("Finish reason:", response.finish_reason)
print("Model used:", response.model)
print("Created at:", response.created_at)

Content Analysis

Analyze the generated content:

python
# Analyze generated content
analysis = client.analyze_text(
    text=response.text,
    analyses=["toxicity", "factuality", "bias", "medical_accuracy"]
)

print("Toxicity score:", analysis.toxicity)
print("Factuality score:", analysis.factuality)
print("Bias score:", analysis.bias)
print("Medical accuracy score:", analysis.medical_accuracy)

Generation Control

Rate Limiting

Implement rate limiting for your application:

python
from paitient_secure_model import Client
from paitient_secure_model.rate_limit import RateLimiter
from paitient_secure_model.exceptions import RateLimitError

# Initialize client with rate limiter
client = Client()
limiter = RateLimiter(
    requests_per_minute=60,
    burst_size=10
)

# Generate text with rate limiting
try:
    with limiter:
        response = client.generate_text(
            deployment_id="dep_12345abcde",
            prompt="What are the potential side effects of metformin?"
        )
        print(response.text)
except RateLimitError as e:
    print(f"Rate limit exceeded: {e}")

Timeout Control

Control timeouts for requests:

python
# Generate text with timeout control
try:
    response = client.generate_text(
        deployment_id="dep_12345abcde",
        prompt="Write a detailed analysis of current diabetes management guidelines.",
        max_tokens=2000,
        timeout=30.0  # Timeout in seconds
    )
    print(response.text)
except TimeoutError:
    print("Request timed out. Try again with fewer tokens or simpler prompt.")

Advanced Usage

Async Support

Use the async client to run multiple requests concurrently:

python
import asyncio
from paitient_secure_model import AsyncClient

async def generate_responses():
    # Initialize async client
    client = AsyncClient()
    
    # Generate multiple responses concurrently
    prompts = [
        "What are the symptoms of hypertension?",
        "What are common treatments for type 2 diabetes?",
        "Explain the mechanism of action for statins."
    ]
    
    tasks = [client.generate_text(
        deployment_id="dep_12345abcde",
        prompt=prompt,
        max_tokens=300
    ) for prompt in prompts]
    
    responses = await asyncio.gather(*tasks)
    
    for i, response in enumerate(responses):
        print(f"Prompt: {prompts[i]}")
        print(f"Response: {response.text}")
        print()

# Run the async function
asyncio.run(generate_responses())

Custom Models

Generate text from custom fine-tuned models:

python
# Get fine-tuned model ID
fine_tuning_job = client.get_fine_tuning_job("ft_12345abcde")
fine_tuned_model = fine_tuning_job.fine_tuned_model

# Deploy the fine-tuned model
deployment = client.create_deployment(
    model_name=fine_tuned_model,
    deployment_name="custom-medical-assistant"
)

# Wait for deployment to complete
deployment.wait_until_ready()

# Generate text from custom model
response = client.generate_text(
    deployment_id=deployment.id,
    prompt="What are the potential side effects of metformin?"
)

print(response.text)

Multi-tenant Usage

For applications serving multiple clients:

python
# Generate text in multi-tenant context
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt="What are the potential side effects of metformin?",
    tenant_id="tenant_12345",  # Ensures strong isolation
    user_id="user_67890"       # For audit and attribution
)

Error Handling

Implement robust error handling:

python
from paitient_secure_model import Client
from paitient_secure_model.exceptions import (
    GenerationError,
    ResourceNotFoundError,
    RateLimitError,
    ContentFilterError,
    InvalidParameterError
)

client = Client()

try:
    response = client.generate_text(
        deployment_id="dep_12345abcde",
        prompt="What are the potential side effects of metformin?",
        max_tokens=500
    )
    print(response.text)
except ResourceNotFoundError:
    print("Deployment not found. Check your deployment ID.")
except RateLimitError as e:
    print(f"Rate limit exceeded. Retry after {e.retry_after} seconds.")
except ContentFilterError as e:
    print(f"Content filtered: {e.reason}")
except InvalidParameterError as e:
    print(f"Invalid parameter: {e}")
except GenerationError as e:
    print(f"Generation failed: {e}")
    print(f"Request ID for troubleshooting: {e.request_id}")

Best Practices

Prompt Engineering

Follow these best practices for effective prompts:

  1. Be Specific: Provide clear, detailed instructions
  2. Establish Context: Include relevant background information
  3. Structure Output: Specify desired format and structure
  4. Use Examples: Include examples for complex tasks
  5. Iterate: Refine prompts based on results

Example of a well-structured prompt:

python
prompt = """
You are a clinical assistant helping a healthcare provider. 
Provide information about metformin for diabetes management.

Please include:
1. Common side effects and their frequency
2. Contraindications
3. Recommended dosage adjustments for patients with renal impairment
4. Drug interactions to be aware of

Format the response with clear headings and bullet points.
Cite relevant clinical guidelines where appropriate.
"""

response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt=prompt,
    max_tokens=800
)

Performance Optimization

Optimize text generation performance:

  1. Batch Requests: Use batch API for multiple prompts
  2. Stream Long Responses: Use streaming for better UX
  3. Optimize Tokens: Keep prompts concise
  4. Cache Common Responses: Implement response caching (see the sketch after this list)
  5. Right-size Parameters: Adjust max_tokens to actual needs
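
As an illustration of response caching, a small in-memory cache can skip regeneration for repeated prompts. A minimal sketch; cached_generate and the keying scheme are hypothetical, and identical prompts are assumed to warrant identical answers. Avoid caching responses that may contain PHI unless your storage meets the same compliance requirements:

python
import hashlib

_cache = {}

def cached_generate(client, deployment_id, prompt, **kwargs):
    """Return a cached response for a repeated prompt; generate otherwise."""
    # Key on the deployment and prompt text only (illustrative; include
    # generation parameters in the key if you vary them per request)
    key = hashlib.sha256(f"{deployment_id}:{prompt}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = client.generate_text(
            deployment_id=deployment_id,
            prompt=prompt,
            **kwargs
        )
    return _cache[key]

response = cached_generate(
    client,
    "dep_12345abcde",
    "What are the symptoms of hypertension?",
    max_tokens=300
)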

Security

Ensure secure text generation:

  1. Sanitize Inputs: Validate and clean user inputs (see the sketch after this list)
  2. Enable Content Filtering: Prevent harmful outputs
  3. Use PII/PHI Detection: Protect sensitive information
  4. Audit Generations: Track and review outputs
  5. Implement Rate Limiting: Prevent abuse
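
As an example of input sanitization, a small validator can reject empty, oversized, or control-character-laden prompts before they reach the model. A minimal sketch with a hypothetical length limit; pair it with the SecuritySettings shown earlier for output-side controls:

python
MAX_PROMPT_CHARS = 4000  # hypothetical application-level limit

def sanitize_prompt(raw: str) -> str:
    """Basic validation and cleanup before sending a prompt to the model."""
    prompt = raw.strip()
    if not prompt:
        raise ValueError("Prompt must not be empty.")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(f"Prompt exceeds {MAX_PROMPT_CHARS} characters.")
    # Drop control characters that can corrupt logs or downstream parsing
    return "".join(ch for ch in prompt if ch.isprintable() or ch in "\n\t")

user_input = "  What are the potential side effects of metformin?  "
response = client.generate_text(
    deployment_id="dep_12345abcde",
    prompt=sanitize_prompt(user_input)
)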
