LLM Action Best Practices

When and how to use LLM actions effectively in Agent Studio workflows.

Agent Studio workflows are fundamentally deterministic. HTTP actions call APIs, mappers transform data, DSL handles logic. The reasoning engine handles slot collection and inference. But at certain points in a workflow, you need language understanding that deterministic code can’t provide: classifying ambiguous input, summarizing a verbose API response, extracting structured data from free text, or making routing decisions from natural language. That’s where LLM actions come in.

This page covers when and how to use LLM actions effectively. For the API reference (parameters, output schema, model table), see LLM Actions.

When to Reach for an LLM Action

LLM actions fit a specific set of problems in Agent Studio workflows. Here are the real use cases where they shine:

  • Classification — routing tickets by category, determining intent from user input, tagging content. The input is unstructured text, and the output is one of a known set of labels.
  • Summarization — condensing verbose API responses into human-readable summaries. A 50-field ServiceNow incident record becomes a 3-sentence status update.
  • Extraction — pulling structured fields from unstructured text. Names, dates, dollar amounts, or reference numbers buried in a free-text description.
  • Transformation — reformatting data that doesn’t follow a predictable pattern. Parsing inconsistent date formats from different systems, normalizing address fields, or cleaning up user input.
  • Decision support — analyzing context to recommend an action or route in a workflow. Given a set of facts, determine the best next step.
| Use Case | Example | Action Type |
| --- | --- | --- |
| Classify a support ticket | Route to IT vs HR vs Facilities | generate_structured_value_action |
| Summarize an incident | 50-field API response to 3-sentence summary | generate_text_action |
| Extract entities from text | Pull names, dates, amounts from a description | generate_structured_value_action |
| Draft a response | Generate a reply email from context | generate_text_action |
| Route a workflow | Decide next step based on user input | generate_structured_value_action |

generate_text_action vs generate_structured_value_action

Both actions call an LLM, but they serve different purposes. Picking the right one upfront saves you from wrestling with output parsing later.

Use generate_text_action when:

  • The output IS the deliverable (summaries, drafts, explanations)
  • You don’t need to parse the output programmatically
  • The downstream consumer is a human (via the reasoning engine)

Use generate_structured_value_action when:

  • You need to use the output in downstream logic (if/else, routing, API calls)
  • You need consistent, parseable fields (classification labels, extracted entities, boolean decisions)
  • Output reliability matters more than prose quality

Side-by-Side: Same Ticket, Different Actions

Consider a support ticket with this description:

Employee Sarah Chen (Building 3, Floor 2) reports that her Dell Latitude 5540 laptop screen has been intermittently flickering since the company-wide Windows 11 23H2 update pushed last Thursday. The flickering occurs every 10-15 minutes and lasts approximately 30 seconds each time. During these episodes she is unable to read text on screen and gets disconnected from Microsoft Teams video calls, which has caused her to miss portions of three client meetings this week. She has already tried restarting the laptop, updating the Intel Iris Xe display driver through Device Manager, and connecting an external monitor (which works fine, suggesting the issue is with the built-in display panel or its driver). She notes that two other colleagues on the same floor are experiencing similar symptoms after the same update. Her ticket was originally filed under “General IT Request” but she believes it should be escalated given the impact on client-facing work.

Text action for a human-readable summary:

generate_text_action
- action:
    action_name: mw.generate_text_action
    input_args:
      system_prompt: >
        Summarize this support ticket in 2-3 sentences for a help desk agent.
        Include the reported symptom, business impact, and what the user has
        already tried. Be concise.
      user_input: data.ticket_description
    output_key: ticket_summary

Output: "Sarah Chen's laptop screen flickers every 10-15 minutes since the Windows 11 23H2 update, dropping her from Teams video calls and impacting client meetings. She's tried driver updates and an external monitor (which works), pointing to a display panel or driver issue. Two colleagues on the same floor have the same problem, suggesting a broader rollout issue."

Structured action for routing logic:

generate_structured_value_action
- action:
    action_name: mw.generate_structured_value_action
    input_args:
      system_prompt: >
        You are a support ticket classifier for an enterprise IT help desk.
        Given a ticket description, determine the category, priority, and
        affected system.
      user_input: data.ticket_description
      output_schema: >-
        {
          "type": "object",
          "properties": {
            "category": {
              "type": "string",
              "enum": ["IT_Hardware", "IT_Software", "IT_Access", "HR", "Facilities", "Finance"]
            },
            "priority": {
              "type": "string",
              "enum": ["P1_Critical", "P2_High", "P3_Medium", "P4_Low"]
            },
            "affected_system": {
              "type": "string"
            }
          },
          "required": ["category", "priority", "affected_system"],
          "additionalProperties": false
        }
      strict: true
    output_key: ticket_classification

Output: { "category": "IT_Software", "priority": "P2_High", "affected_system": "Windows Display Driver" }

Notice additionalProperties: false and strict: true on the structured action. These are required for reliable output. Without them, the model may add extra fields or deviate from your schema. The output_schema must be written as a JSON string using >- block syntax, not as native YAML.
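To see why additionalProperties: false matters, it helps to check candidate outputs against the schema yourself. The sketch below is a minimal hand-rolled check run outside Agent Studio (not a full JSON Schema validator, and not part of the platform); it mimics what strict mode prevents the model from doing.

```python
import json

# The same schema the structured action above sends to the model.
schema = json.loads("""
{
  "type": "object",
  "properties": {
    "category": {
      "type": "string",
      "enum": ["IT_Hardware", "IT_Software", "IT_Access", "HR", "Facilities", "Finance"]
    },
    "priority": {
      "type": "string",
      "enum": ["P1_Critical", "P2_High", "P3_Medium", "P4_Low"]
    },
    "affected_system": {"type": "string"}
  },
  "required": ["category", "priority", "affected_system"],
  "additionalProperties": false
}
""")

def violates_schema(obj, schema):
    """Return a list of problems: missing required fields, extra fields
    (what additionalProperties: false forbids), and out-of-enum values."""
    problems = []
    props = schema["properties"]
    for field in schema["required"]:
        if field not in obj:
            problems.append(f"missing required field: {field}")
    for field, value in obj.items():
        if field not in props:
            problems.append(f"unexpected field: {field}")
        elif "enum" in props[field] and value not in props[field]["enum"]:
            problems.append(f"{field} not in enum: {value}")
    return problems

good = {"category": "IT_Software", "priority": "P2_High",
        "affected_system": "Windows Display Driver"}
bad = {"category": "Software", "priority": "P2_High",
       "affected_system": "Display", "note": "extra commentary"}

print(violates_schema(good, schema))  # []
print(violates_schema(bad, schema))   # label drift + extra field
```

Without strict mode, outputs like `bad` above — a label that almost matches the enum, plus an unrequested field — are exactly what you end up debugging downstream.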

Writing Effective System Prompts

The system prompt is the most important input to an LLM action. A vague prompt produces vague results.

Be specific about the task

Tell the model exactly what you want, including the full set of valid outputs.

Weak:

system_prompt: '''Analyze this ticket and tell me what category it belongs to.'''

Strong:

system_prompt: >
  You are a support ticket classifier. Given a ticket description,
  classify it into exactly one category: IT, HR, Facilities, or Finance.
  Respond with only the category name.

The weak prompt leaves the model guessing about what categories exist and how to format the response. The strong prompt eliminates ambiguity.

Constrain the output

For generate_text_action, tell the model what format you expect:

system_prompt: >
  Summarize this incident in exactly 3 bullet points. Each bullet should
  be one sentence. Do not include a header or introduction.

For generate_structured_value_action, the output schema handles format constraints, but the system prompt should still describe what each field means and how to determine its value.

Include examples

Few-shot prompting works well for edge cases. Show the model 2-3 input/output pairs:

system_prompt: |
  You are a support ticket classifier. Given a ticket description,
  classify it into exactly one category: IT, HR, Facilities, or Finance.
  Respond with only the category name.

  Examples:
  - "My laptop won't connect to WiFi" -> IT
  - "I need to update my direct deposit info" -> HR
  - "The AC in building 3 is broken" -> Facilities
  - "I need to submit a purchase order for new monitors" -> Finance
  - "Can't access Salesforce after password reset" -> IT

Examples are especially useful when categories overlap. “Can’t access Salesforce” could be IT or Finance depending on context. The examples anchor the model’s interpretation.
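One caveat when you classify with generate_text_action: even a strong prompt only asks for a bare label, it doesn't guarantee one. If the label drives routing, guard it before use. A hypothetical downstream check (the function and category set here are illustrative, not part of Agent Studio):

```python
VALID_CATEGORIES = {"IT", "HR", "Facilities", "Finance"}

def normalize_label(raw):
    """Trim whitespace/punctuation the model sometimes adds, then match
    case-insensitively against the known label set."""
    cleaned = raw.strip().strip(".\"'")
    for label in VALID_CATEGORIES:
        if cleaned.lower() == label.lower():
            return label
    return None  # unrecognized output -> route to a default queue or human triage

print(normalize_label("IT"))            # "IT"
print(normalize_label(" facilities."))  # "Facilities"
print(normalize_label("Category: IT"))  # None
```

If you find yourself writing guards like this, that's usually the signal to switch to generate_structured_value_action with an enum instead.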

Don’t contradict yourself

If you say “be concise” in one sentence and “explain your reasoning in detail” in the next, the model will pick one or produce something awkward. Review your prompt for conflicting instructions before deploying.

Model Selection

Pick the smallest model that handles your task well. You can always upgrade if the output quality isn’t sufficient.

| Model | Best For | Notes |
| --- | --- | --- |
| gpt-4o-mini | Classification, extraction, simple summarization | Fast, cheap, good enough for most tasks. Default choice. |
| gpt-4o | Nuanced summarization, complex extraction | Better quality when mini isn't cutting it |
| gpt-5 | Multi-step reasoning, edge-case classification | Use when you need the model to think through ambiguity |

Set the model in input_args:

input_args:
  model: '''gpt-4o-mini'''

If you don’t specify a model, it defaults to gpt-4o-mini. For most classification and extraction tasks, the default is fine. Start there and upgrade only if you see quality issues in testing.

Temperature

Temperature controls output randomness. Lower values produce more consistent, deterministic results. Higher values produce more varied, creative output.

  • Low (0-0.3): Classification, extraction, routing, anything where consistency matters.
  • Medium (0.5-0.7): Summarization, content generation, drafting responses.

The default works for most use cases. Override it when you have a specific need:

input_args:
  temperature: 0.2
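The intuition behind these ranges: temperature rescales the model's token scores before sampling, roughly p_i proportional to exp(logit_i / T), so low T concentrates probability on the top candidate while high T spreads it out. A small sketch of the effect (illustrative math only, not any provider's actual sampling implementation; the logits are made up):

```python
import math

def softmax_with_temperature(logits, t):
    """Divide logits by t, then softmax. Lower t sharpens the distribution."""
    scaled = [x / t for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

low = softmax_with_temperature(logits, 0.2)
high = softmax_with_temperature(logits, 1.0)
print([round(p, 3) for p in low])   # top token dominates -> consistent output
print([round(p, 3) for p in high])  # probability spread out -> varied output
```

This is why classification at temperature 0.1-0.2 returns the same label run after run, while drafting at 0.5-0.7 gives usefully varied prose.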

Workflow Example: Classify and Route a Support Ticket

Here’s a complete compound action that shows an LLM action in context. The workflow fetches a ticket from ServiceNow, classifies it with a structured LLM action, then assigns it to the right team.

Compound Action: Classify and Route Ticket
steps:
  # Step 1: Fetch ticket details from ServiceNow
  - action:
      action_name: fetch_ticket_details
      input_args:
        ticket_id: data.ticket_id
      output_key: ticket_data

  # Step 2: Classify the ticket using an LLM
  - action:
      action_name: mw.generate_structured_value_action
      input_args:
        system_prompt: |
          You are a support ticket classifier for an enterprise IT help desk.
          Given a ticket's short description and full description, determine
          the category, priority level, and the team that should handle it.

          Categories: IT_Hardware, IT_Software, IT_Access, HR, Facilities, Finance
          Priority: P1_Critical (system down, multiple users affected),
          P2_High (single user blocked), P3_Medium (degraded but functional),
          P4_Low (cosmetic or future request)
          Teams: Desktop_Support, Network_Ops, Identity_Access, HR_Operations,
          Facilities_Mgmt, Finance_Ops

          Examples:
          - "Laptop won't power on" -> IT_Hardware, P2_High, Desktop_Support
          - "Need VPN access for new contractor" -> IT_Access, P3_Medium, Identity_Access
          - "Office kitchen sink is leaking" -> Facilities, P3_Medium, Facilities_Mgmt
        user_input:
          short_description: data.ticket_data.short_description
          description: data.ticket_data.description
          reported_by: data.ticket_data.caller_id
          created: data.ticket_data.sys_created_on
        output_schema: >-
          {
            "type": "object",
            "properties": {
              "category": {
                "type": "string",
                "enum": ["IT_Hardware", "IT_Software", "IT_Access", "HR", "Facilities", "Finance"]
              },
              "priority": {
                "type": "string",
                "enum": ["P1_Critical", "P2_High", "P3_Medium", "P4_Low"]
              },
              "suggested_team": {
                "type": "string",
                "enum": ["Desktop_Support", "Network_Ops", "Identity_Access", "HR_Operations", "Facilities_Mgmt", "Finance_Ops"]
              },
              "reasoning": {
                "type": "string"
              }
            },
            "required": ["category", "priority", "suggested_team", "reasoning"],
            "additionalProperties": false
          }
        strict: true
        model: '''gpt-4o-mini'''
        temperature: 0.1
      output_key: classification

  # Step 3: Assign the ticket to the suggested team
  - action:
      action_name: assign_ticket
      input_args:
        ticket_id: data.ticket_id
        assignment_group: data.classification.suggested_team
        priority: data.classification.priority
        category: data.classification.category
        work_notes: |
          Auto-classified by Agent Studio.
          Category: {{data.classification.category}}
          Priority: {{data.classification.priority}}
          Team: {{data.classification.suggested_team}}
          Reasoning: {{data.classification.reasoning}}
      output_key: assignment_result

A few things to note in this example:

  • Low temperature (0.1) because classification needs to be consistent. The same ticket should get the same classification every time.
  • strict: true and additionalProperties: false on the output schema guarantee the model returns exactly the fields you expect.
  • The output_schema uses >- block syntax to write JSON inline in YAML. This is required for generate_structured_value_action.
  • The reasoning field is included in the schema so the model explains its decision. This gets written to the ticket’s work notes for the human agent, but it doesn’t affect routing logic.
  • Step 3 only uses the structured fields (suggested_team, priority, category) from the classification. No free-text parsing needed.
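That last point is the payoff of the structured action: downstream consumption is a plain field lookup, no text parsing. The sketch below shows the shape of that deterministic step in Python, outside Agent Studio; the team-to-queue mapping and queue IDs are hypothetical, and the classification dict stands in for the action's output.

```python
# Illustrative only: deterministic logic consuming the structured output.
# Queue IDs are made up for the example.
TEAM_QUEUES = {
    "Desktop_Support": "grp-001",
    "Network_Ops": "grp-002",
    "Identity_Access": "grp-003",
    "HR_Operations": "grp-004",
    "Facilities_Mgmt": "grp-005",
    "Finance_Ops": "grp-006",
}

# Stand-in for the `classification` output from Step 2.
classification = {
    "category": "IT_Hardware",
    "priority": "P2_High",
    "suggested_team": "Desktop_Support",
    "reasoning": "Single user blocked by a hardware failure.",
}

# Because the schema used enums with strict mode, this lookup cannot miss:
# suggested_team is guaranteed to be one of the six keys above.
queue_id = TEAM_QUEUES[classification["suggested_team"]]
is_urgent = classification["priority"] in {"P1_Critical", "P2_High"}
print(queue_id, is_urgent)  # grp-001 True
```

Compare this with parsing a free-text answer like "I'd route this to Desktop Support since it's hardware": the enum-constrained field makes the difference between a dictionary lookup and a regex you'll be patching forever.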