Exploring Semantic Kernel
Introduction
Over the past months we have learned that AI-powered large language models (LLMs) are good at generating text based on user input. The obvious benefits and disadvantages became apparent quickly once these models were made publicly available.
The less talked about topic is how traditional businesses have utilized the power of LLMs and AI. The question I often bump into is "can it DO anything other than generate text?" That question was the foundation of my interest in Semantic Kernel, an AI orchestration tool for developers and data experts.
Semantic Kernel tries to narrow the gap between traditional software development and AI services. You can create your own plugins or attach existing ones, and orchestrate them as a sequence of actions to build different kinds of workflows based on semantic input.
Use case
At a high level, I wanted my workflow to fetch some data from somewhere, transform it, and then do something useful with it. I ended up with the following setup:
- Search for book reviews based on user input.
- Get the best result and translate it to Finnish.
- Summarize the review in two sentences.
- Send the summary to my personal email address.
[High-level program flow]
To make it work, I needed to set up Azure OpenAI Service (you can also use ChatGPT) for text completion capabilities and Azure Cognitive Search for semantic search capabilities (it can also be utilized as a memory store). I also needed email functionality, so I decided to use the Graph API for that. Azure Cognitive Search with semantic search support is a bit pricey (the Standard plan is required), so you might want to use a different storage solution for PoC purposes. However, that would require more work on the native function implementation.
Implementation
After provisioning the previously mentioned Azure services, we need to implement the actual flow. I'm using C# as the language here.
Create a project and install the Microsoft.SemanticKernel NuGet package, e.g. with dotnet add package Microsoft.SemanticKernel --prerelease. At the time of writing, the latest version (0.19.230804.2-preview) is still in preview.
Plugins
We'll first create the required plugins, which are essentially groups of functions. There are two types of functions, semantic and native: the former is essentially an AI prompt, while the latter is a custom C# class implementation. We'll create two plugins with native functions, one for searching the data and one for sending the email. We'll also utilize the ready-made WriterPlugin semantic functions for the text transformations.
We'll keep the native function implementations simple, as the main objective is to illustrate the Semantic Kernel capabilities. First, let's create the class representing our Cognitive Search data model.
public class AzureCognitiveSearchRecord
{
    // Property names are lowercase to match the field names in the search index.
    [SimpleField(IsKey = true, IsFilterable = false)]
    public string id { get; set; } = string.Empty;

    [SearchableField]
    public string content { get; set; } = string.Empty;

    [SearchableField]
    public string title { get; set; } = string.Empty;

    [SearchableField]
    public string author { get; set; } = string.Empty;
}
Next we create the plugin that executes the actual search. For this to work, we'll need to add the Azure.Search.Documents NuGet package; we need the beta version (11.5.0-beta.4), which has the required semantic capabilities. The important part of a native function is its description, which helps the orchestration find the correct function for the requested prompt.
public class SearchPlugin
{
    private readonly SearchIndexClient _adminClient;
    private const string _indexName = "{}";

    // Cache SearchClient instances per index so we don't recreate them on every call.
    private readonly ConcurrentDictionary<string, SearchClient> _clientsByIndex = new();

    public SearchPlugin(IOptionsMonitor<SearchConfigurationSettings> options)
    {
        var searchOptions = new SearchClientOptions();
        AzureKeyCredential credentials = new(options.CurrentValue.Key);
        _adminClient = new SearchIndexClient(new Uri(options.CurrentValue.Url), credentials, searchOptions);
    }

    // The description below is what the planner uses to match this function to the ask.
    [SKFunction]
    [Description("Search for book reviews.")]
    public async Task<string> Search(string input)
    {
        var result = await SearchAsync(input);
        return result?.content ?? string.Empty;
    }

    public async Task<AzureCognitiveSearchRecord?> SearchAsync(
        string query,
        CancellationToken cancellationToken = default)
    {
        var client = GetSearchClient(_indexName);

        // Semantic query against the "default" semantic configuration, returning only the top hit.
        var options = new SearchOptions
        {
            QueryType = SearchQueryType.Semantic,
            SemanticConfigurationName = "default",
            QueryLanguage = "en-us",
            Size = 1,
        };

        var searchResult = await client
            .SearchAsync<AzureCognitiveSearchRecord>(query, options, cancellationToken: cancellationToken)
            .ConfigureAwait(false);

        if (searchResult != null)
        {
            await foreach (var doc in searchResult.Value.GetResultsAsync())
            {
                return doc.Document ?? null;
            }
        }

        return null;
    }

    private SearchClient GetSearchClient(string indexName)
    {
        return _clientsByIndex.GetOrAdd(indexName, name => _adminClient.GetSearchClient(name));
    }
}
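The plugin reads its endpoint and key from an options class that isn't shown above; the same goes for the OpenAI settings used later in the workflow. A minimal sketch, with property names inferred from how they're accessed in the surrounding code:

// Bound from configuration (e.g. appsettings.json); only the properties
// the code actually reads are included here.
public class SearchConfigurationSettings
{
    public string Url { get; set; } = string.Empty;   // Cognitive Search endpoint
    public string Key { get; set; } = string.Empty;   // Cognitive Search API key
}

public class OpenAIConfigurationSettings
{
    public string DeploymentName { get; set; } = string.Empty;
    public Uri? EndpointBaseUrl { get; set; }
    public string ApiKey { get; set; } = string.Empty;
}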
Next we implement the email plugin. We're using the Graph API in this example, so you'll need a mechanism for requesting authentication tokens, or you can utilize another email sending service.
internal class GraphSearchService : IGraphSearchService
{
    public async Task SendMailAsync(string recipient, string subject, string content)
    {
        // CustomTokenProvider is our own mechanism for acquiring Graph access tokens.
        var authenticationProvider = new BaseBearerTokenAuthenticationProvider(new CustomTokenProvider());
        var graphServiceClient = new GraphServiceClient(authenticationProvider);

        // Resolve the signed-in user so we can set the sender address.
        var me = await graphServiceClient.Me.GetAsync();

        var message = new Microsoft.Graph.Me.SendMail.SendMailPostRequestBody
        {
            Message = new Message
            {
                ToRecipients = new List<Recipient> {
                    new Recipient { EmailAddress = new EmailAddress { Address = recipient } },
                },
                Sender = new Recipient { EmailAddress = new EmailAddress { Address = me!.Mail } },
                Subject = subject,
                Body = new ItemBody { Content = content, ContentType = BodyType.Text },
            },
        };

        await graphServiceClient.Me.SendMail.PostAsync(message);
    }
}
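CustomTokenProvider above is my own helper and isn't shown in full; a minimal sketch of what it could look like, assuming you acquire the token elsewhere (IAccessTokenProvider comes from Microsoft.Kiota.Abstractions.Authentication and is what BaseBearerTokenAuthenticationProvider expects):

internal class CustomTokenProvider : IAccessTokenProvider
{
    public AllowedHostsValidator AllowedHostsValidator { get; } = new();

    public Task<string> GetAuthorizationTokenAsync(
        Uri uri,
        Dictionary<string, object>? additionalAuthenticationContext = null,
        CancellationToken cancellationToken = default)
    {
        // Return a valid Graph access token here, e.g. one acquired via MSAL.
        // A placeholder is returned just to keep the sketch compilable.
        return Task.FromResult("<access-token>");
    }
}

With the service in place, the SendMailPlugin wraps it as a native function: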
public class SendMailPlugin
{
    private readonly IGraphSearchService _graphSearchService;

    public SendMailPlugin(IGraphSearchService graphSearchService)
    {
        _graphSearchService = graphSearchService;
    }

    [SKFunction]
    [Description("Send mail body to a recipient.")]
    [SKParameter("input", "The body of the message")]
    [SKParameter("to", "The recipient of the message")]
    public async Task<string> Send(SKContext context)
    {
        // Multiple parameters arrive through the context variables, not as method arguments.
        var recipient = context["to"];
        var body = context["input"];

        await _graphSearchService.SendMailAsync(recipient, "Email from SendMail plugin", body);
        return body;
    }
}
Normally, a native function accepts only a single input string. To pass multiple variables to a function, we need to use the Semantic Kernel context, as seen above. All parameters must be of type string.
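To illustrate, here's how you could invoke the function directly with multiple variables (a minimal sketch against the preview API; the kernel and service instances are assumed to exist, and the email address is a placeholder):

var sendMailSkill = kernel.ImportSkill(new SendMailPlugin(graphSearchService), "SendMailPlugin");

var variables = new ContextVariables();
variables.Set("input", "The translated and summarized review text.");
variables.Set("to", "someone@example.com");

// RunAsync feeds the variables into the function's SKContext.
await kernel.RunAsync(variables, sendMailSkill["Send"]);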
There is a set of ready-made semantic functions available in the Semantic Kernel GitHub repository. Semantic functions are basically AI prompts fitted for a purpose and saved in a file. Semantic Kernel then looks up the given function from the file structure under the plugin directory and attaches it to the kernel.
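On disk, each semantic function lives in its own folder under the plugin directory, containing the prompt template and its configuration. For the WriterPlugin used here, the layout looks roughly like this (folder names beyond Translate depend on which functions you copy over from the repo):

Plugins/
  WriterPlugin/
    Translate/
      skprompt.txt   <- the prompt template
      config.json    <- completion settings and parameter descriptions
    ...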
Planner
We use a special Semantic Kernel feature called Planner for the orchestration. It is basically a function that receives the execution prompt and returns a plan: the sequence of actions to be executed. The plan then uses the skills/plugins assigned to the kernel to execute the workflow.
internal class EmailWorkFlow
{
    private readonly IKernel _kernel;
    private readonly Plan _plan;
    private const string _emailAddress = "{}";

    public EmailWorkFlow(ILoggerFactory loggerFactory,
        IOptionsMonitor<OpenAIConfigurationSettings> openAIConfigurationSettings,
        IOptionsMonitor<SearchConfigurationSettings> searchConfigurationSettings,
        IGraphSearchService graphSearchService)
    {
        var logger = loggerFactory.CreateLogger<EmailWorkFlow>();

        // Build the kernel with the Azure OpenAI text completion backend.
        _kernel = Kernel.Builder
            .WithLogger(logger)
            .WithAzureTextCompletionService(
                openAIConfigurationSettings.CurrentValue.DeploymentName,
                openAIConfigurationSettings.CurrentValue.EndpointBaseUrl?.AbsoluteUri ?? string.Empty,
                openAIConfigurationSettings.CurrentValue.ApiKey)
            .Build();

        // Attach the file-based semantic functions and our two native plugins.
        _kernel.ImportSemanticSkillFromDirectory(Path.Combine(Directory.GetCurrentDirectory(), "Plugins"), "WriterPlugin");
        _kernel.ImportSkill(new SearchPlugin(searchConfigurationSettings), "SearchPlugin");
        _kernel.ImportSkill(new SendMailPlugin(graphSearchService), "SendMailPlugin");

        var planner = new SequentialPlanner(_kernel);
        var ask = string.Format(@"Search book reviews with {{$input}} then translate the result to Finnish and send the translated result as email to {0}.", _emailAddress);

        // Blocking on the async call for brevity; the plan is generated once up front.
        _plan = planner.CreatePlanAsync(ask).Result;
    }

    public async Task<string> ExecuteAsync(string query, CancellationToken cancellationToken)
    {
        var context = new SKContext();
        context.Variables["input"] = query;

        var planResult = await _plan.InvokeAsync(context, new CompleteRequestSettings { MaxTokens = 1000 });
        return planResult.Result;
    }
}
The most important part of the above implementation is the ask, which is passed to plan creation. It's the guideline the planner uses to look for skills/plugins that match the requested criteria. The generated plan can be seen by examining the resulting plan instance.
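For example, a quick way to see which functions the planner picked (a sketch against the preview Plan API, where each step is itself a Plan exposing its skill and function name):

foreach (var step in _plan.Steps)
{
    // Each step maps to one function, e.g. SearchPlugin.Search or WriterPlugin.Translate.
    Console.WriteLine($"{step.SkillName}.{step.Name}");
}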
It's worth mentioning that creating and executing these plans consumes tokens, as they are generated on the Azure OpenAI/ChatGPT end. I quickly ran out of tokens and hit rpm (requests per minute) limits while testing the solution on the ChatGPT free plan.
ExecuteAsync is initiated outside the Semantic Kernel application context; the trigger could be, for example, a user prompt for finding specific book reviews. The input is passed to the context, which is then handled on the function side.
await flow.ExecuteAsync("foundation", default(CancellationToken));
This prompt finds the correct title from the data source and returns the results (shortened for readability). Extra care has to be taken when crafting the prompt: even small changes to the prompt string might produce a different plan with unexpected results.
Isaac Asimov's 'Foundation' is a science fiction masterpiece that captivates the imagination with its grand scope, intricate world-building, and thought-provoking exploration of the rise and fall of civilizations. Originally published as a series of short stories in the 1940s, 'Foundation' is the first installment of Asimov's eponymous series, which has had a profound influence on the genre. The narrative unfolds in a distant future where a mathematician named Hari Seldon has developed 'psychohistory', a revolutionary concept that combines history, sociology, and mathematics to predict the future behavior of large populations. Recognizing the inevitable decline of the Galactic Empire, Seldon orchestrates the creation of the Foundation, a hidden colony tasked with preserving knowledge and guiding the galaxy through a dark age of chaos...
Result after the translation and summarization:
[Translated output]
[Summary]
Final thoughts
I was excited to get it all working, even though the result was not quite perfect. My initial thoughts were along the lines of "wow, I can create an orchestrated workflow with natural language" and "the sky's the limit with this thing". But when I paused for a while and started thinking about the bigger picture, and how it all integrates into new or existing software, I realized that my initial excitement was mostly technical.
Since then, I've occasionally revisited the solution and the whole skill/plugin framework, and I'm stuck with the dilemma of making this naturally approachable solution available to users. If it's used just for orchestration as shown above, it doesn't offer much compared to a more traditional implementation. And if users are allowed to create the plans themselves, how do you make them aware of all the possible skills/plugins in the underlying system? It feels like an extra layer that adds complexity with little added value. But I can definitely see some potential use cases for this kind of workflow as well.
There is definitely more to learn about Semantic Kernel than what I've just shown. There are more fine-grained use cases available with features like memory and vector databases. Maybe I'll come back to it after clearing my head with some traditional development duties.