Simple Sampling
👨‍💼 Our users love the prompt functionality, but they asked why the LLM couldn't
just suggest tags when they create a post, and we thought that was a great idea!
So now your goal is to make your server request a simple completion from the
language model whenever a new journal entry is created.
In this first step, we'll just get things wired up and then we'll work on our
prompt for the LLM in the next step.
Here's what you'll do:
- When the `create_entry` tool is called, call the `suggestTagsSampling` function with the `agent` and the `createdEntry.id` (see the sketch after this list).
- We don't want to wait for the sampling function to finish, so instead of `await`, use `void`, which effectively ignores the promise.
- First check if the client supports sampling. If it does, proceed with the sampling request.
- Implement a function that sends a sampling request to the client using `agent.server.server.createMessage` (the `server.server` thing is funny, but our MCP server manages an internal server and that's what we're accessing).
- Use a simple system prompt (e.g., "You are a helpful assistant.") and a user message that references the new journal entry's ID (we'll enhance this next).
- Set a reasonable `maxTokens` value for the response.
- Parse the model's response using a provided Zod schema.
- Send a notification with the result so you can see the model's output.

And don't forget to call it when the user creates a new journal entry!
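To make the wiring concrete, here's a rough sketch of that call site inside the `create_entry` tool handler. The `agent.db.createEntry` helper and the response shape are assumptions for illustration — keep whatever your handler already does and just add the `void suggestTagsSampling(...)` line:

```ts
// inside your existing create_entry tool handler:
const createdEntry = await agent.db.createEntry(entryInput) // hypothetical helper

// fire-and-forget: void ignores the promise so the tool can respond
// immediately instead of waiting on the LLM
void suggestTagsSampling(agent, createdEntry.id)

return {
	content: [{ type: 'text', text: `Entry ${createdEntry.id} created` }],
}
```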
Here's an example of how to check if the client supports sampling:
```ts
// the server.server is how the MCP SDK exposes the underlying server
// instance for more advanced APIs like this one
const capabilities = agent.server.server.getClientCapabilities()
if (!capabilities?.sampling) {
	console.error('Client does not support sampling, skipping sampling request')
	return
}
// proceed with the sampling request here
```
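Putting that together, the function you'll implement could be shaped roughly like this. The parameter types are assumptions — match whatever your project already defines:

```ts
import type { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'

// parameter types here are assumptions for illustration
async function suggestTagsSampling(
	agent: { server: McpServer },
	entryId: number,
) {
	const capabilities = agent.server.server.getClientCapabilities()
	if (!capabilities?.sampling) {
		console.error('Client does not support sampling, skipping sampling request')
		return
	}
	// next: send the sampling request (see the example below)
}
```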
And here's an example of how to send a sampling request:
```ts
const result = await agent.server.server.createMessage({
	// the system prompt is used to explain to the LLM its purpose
	systemPrompt: '...',
	messages: [
		{
			role: 'user',
			content: {
				type: 'text',
				// the user message is the input to the LLM
				text: '...',
			},
		},
	],
	maxTokens: 100,
})
```
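Once you have the `result`, the parsing and notification bullets from the list above could look roughly like this. This is a sketch: `responseSchema` is a hypothetical stand-in for the Zod schema the exercise provides, and the notification payload is just an example:

```ts
import { z } from 'zod'

// hypothetical stand-in — use the schema provided in the exercise
const responseSchema = z.object({
	content: z.object({
		type: z.literal('text'),
		text: z.string(),
	}),
})

const parsed = responseSchema.parse(result)

// fire off a logging notification so you can see the model's output
// (see the logging example below)
void agent.server.server.sendLoggingMessage({
	level: 'info',
	data: { suggestedTags: parsed.content.text },
})
```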
The `maxTokens` option sets the maximum number of tokens the model should return. The system prompt + messages + output message can't exceed the context window size of the model your user is using.

Here's an example of how to send a notification:
```ts
// void is to communicate we don't care about the return value. Fire and forget.
void agent.server.server.sendLoggingMessage({
	// "error" | "debug" | "info" | "notice" | "warning" | "critical" | "alert" | "emergency"
	level: 'info',
	// data can be any JSON-serializable value
	data: {
		a: 1,
		otherStuff: '...',
	},
})
```
Check the specification for more info:
https://modelcontextprotocol.io/specification/2025-06-18/server/utilities/logging#log-message-notifications
To be able to send a logging message, you first need to advertise that your server supports logging and add support for the `SetLevelRequestSchema` request. You'll do this in . As a general reminder, here's how you add request handlers:

```ts
// this.server.server to access the underlying server instance
this.server.server.setRequestHandler(
	RequestSchema, // this is the request schema you're adding support for
	async (request) => {
		// do something with the request
		return // return the appropriate response
	},
)
```
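Applied to this step, that could look something like the sketch below. The server name/version are placeholders and the no-op handler body is an assumption — adapt it to your project:

```ts
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'
import { SetLevelRequestSchema } from '@modelcontextprotocol/sdk/types.js'

// advertise logging support when constructing the server
// (name and version here are placeholders)
this.server = new McpServer(
	{ name: 'my-journal-server', version: '1.0.0' },
	{ capabilities: { logging: {} } },
)

// acknowledge logging/setLevel requests; returning an empty object is a
// valid no-op response for this exercise
this.server.server.setRequestHandler(SetLevelRequestSchema, async () => {
	return {}
})
```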
This step will help you get comfortable with the basic request/response flow for
sampling in MCP, and set the stage for more advanced prompt engineering in the
next step.
There are tests to help verify your sampling request is working.
To test this in the MCP inspector:
- run the `create_entry` tool
- check the "Sampling" navigation tab and you should have a sampling request
- at this point, YOU are the LLM. You can copy/paste this into your response:

```json
{
	"model": "stub-model",
	"stopReason": "endTurn",
	"role": "assistant",
	"content": {
		"type": "text",
		"text": "Good job!"
	}
}
```