Simple Sampling

👨‍💼 Our users love the prompt functionality, but they asked why the LLM couldn't just suggest tags when they create a post, and we thought that was a great idea!
So now your goal is to make your server request a simple completion from the language model whenever a new journal entry is created.
In this first step, we'll just get things wired up and then we'll work on our prompt for the LLM in the next step.
Here's what you'll do:
  • When the create_entry tool is called, call the suggestTagsSampling function with the agent and the createdEntry.id.
  • We don't want to wait for the sampling function to finish, so instead of "await", use "void", which effectively ignores the promise (see the sketch after this list).
  • First check if the client supports sampling. If it does, proceed with the sampling request.
  • Implement a function that sends a sampling request to the client using agent.server.server.createMessage (the server.server thing is funny, but our MCP server manages an internal server and that's what we're accessing).
  • Use a simple system prompt (e.g., "You are a helpful assistant.") and a user message that references the new journal entry's ID (we'll enhance this next).
  • Set a reasonable maxTokens value for the response.
  • Parse the model's response using a provided Zod schema.
  • Send a notification with the result so you can see the model's output.
And don't forget to call it when the user creates a new journal entry!
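To make the first two steps concrete, here's a rough sketch of the wiring (treat EpicMeMCP as a placeholder for whatever your agent class is actually called):
// inside the create_entry tool handler, right after the entry has been created.
// The void keyword makes it explicit that we intentionally ignore the promise.
void suggestTagsSampling(agent, createdEntry.id)

// skeleton of the function itself; the snippets below fill in the body
async function suggestTagsSampling(agent: EpicMeMCP, entryId: number) {
	// 1. bail out if the client doesn't support sampling
	// 2. send the createMessage request
	// 3. parse the result and send a logging notification
}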
Here's an example of how to check if the client supports sampling:
// the server.server is how the MCP SDK exposes the underlying server
// instance for more advanced APIs like this one.
const capabilities = agent.server.server.getClientCapabilities()
if (!capabilities?.sampling) {
	console.error('Client does not support sampling, skipping sampling request')
	return
}
// Proceed with sampling request here
And here's an example of how to send a sampling request:
const result = await agent.server.server.createMessage({
	// the system prompt is used to explain to the LLM its purpose
	systemPrompt: '...',
	messages: [
		{
			role: 'user',
			content: {
				type: 'text',
				// the user message is the input to the LLM
				text: '...',
			},
		},
	],
	maxTokens: 100,
})
The maxTokens option specifies the maximum number of tokens the model should return.
The system prompt, the messages, and the output message together can't exceed the context window size of the model your user is using.
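Once the result comes back, validate its shape with the Zod schema the exercise provides before pulling the text out. The schema below is only an illustrative stand-in for that provided schema:
import { z } from 'zod'

// stand-in for the provided schema: we expect a single text content block back
const suggestTagsResultSchema = z.object({
	content: z.object({
		type: z.literal('text'),
		text: z.string(),
	}),
})

const parsedResult = suggestTagsResultSchema.parse(result)
// parsedResult.content.text now holds the model's reply as a string

Parsing up front means you get a clear error if the model (or the human playing the model in the inspector) sends back something unexpected.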
Here's an example of how to send a notification:
// void is to communicate we don't care about the return value. Fire and forget.
void agent.server.server.sendLoggingMessage({
	level: 'info', // "error" | "debug" | "info" | "notice" | "warning" | "critical" | "alert" | "emergency"
	// data can be any JSON-serializable value
	data: {
		a: 1,
		otherStuff: '...',
	},
})
To be able to send a logging message, you first need to advertise that your server supports logging and add support for the SetLevelRequestSchema request. You'll do this in . As a general reminder, here's how you add request handlers:
// this.server.server to access the underlying server instance
this.server.server.setRequestHandler(
	RequestSchema, // this is the request schema you're adding support for
	async (request) => {
		// do something with the request
		return // return the appropriate response
	},
)
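Putting that together for logging, a sketch might look something like this (note that registerCapabilities must run before the transport is connected; if your server declares capabilities in its constructor options, you can advertise logging there instead):
import { SetLevelRequestSchema } from '@modelcontextprotocol/sdk/types.js'

// advertise that this server can send logging notifications
this.server.server.registerCapabilities({ logging: {} })

// accept logging/setLevel requests from the client
this.server.server.setRequestHandler(SetLevelRequestSchema, async (request) => {
	// request.params.level is the minimum level the client wants to receive;
	// you could store it and filter your sendLoggingMessage calls accordingly
	return {}
})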

This step will help you get comfortable with the basic request/response flow for sampling in MCP, and set the stage for more advanced prompt engineering in the next step.
There are tests to help verify your sampling request is working.
To test this in the MCP inspector:
  • run the create_entry tool
  • check the "Sampling" navigation tab and you should have a sampling request
  • at this point, YOU are the LLM. You can copy/paste this into your response:
{
	"model": "stub-model",
	"stopReason": "endTurn",
	"role": "assistant",
	"content": {
		"type": "text",
		"text": "Good job!"
	}
}
