AI Practice: Calling the Ollama API
With the current wave of AI, everyone wants to try running their own LLM server. If you have a GPU, LM Studio is a decent choice, but for me it comes with a UI when it runs, which is not ideal for a server that only needs to run in the background. Fortunately, there is another option: Ollama.
After installing Ollama, you can download a model with the command ollama pull [model name], which feels a bit like Docker. Once a model is downloaded, you can talk to it through Ollama's Web API at http://localhost:11434/api/generate. Because it returns JSON as a stream, integrating with it is a little different from a typical Web API.
First, define the Model for its response:
public class OllamaResponseModel
{
    public string model { get; set; } = string.Empty;       // name of the model that produced this chunk
    public string created_at { get; set; } = string.Empty;  // timestamp of the chunk
    public string response { get; set; } = string.Empty;    // the generated text fragment
    public bool done { get; set; } = false;                  // true on the final chunk of the stream
}
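Each line the API streams back is a standalone JSON object that maps onto this model. The lines look roughly like the following (model name and timestamps are illustrative only; the final chunk also carries some extra statistics fields, which this model simply ignores):

{"model":"phi3","created_at":"2024-05-01T12:00:00Z","response":"Hello","done":false}
{"model":"phi3","created_at":"2024-05-01T12:00:01Z","response":" world","done":false}
{"model":"phi3","created_at":"2024-05-01T12:00:02Z","response":"","done":true}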
And its Request Model looks like this:
public class PromptRequestModel
{
    public string model { get; set; } = string.Empty;   // model name, e.g. one you pulled with "ollama pull"
    public string prompt { get; set; } = string.Empty;  // the full prompt text to send
}
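As a side note, /api/generate streams by default. As far as I know it also accepts a stream flag, so a property like the one below (an assumption, not something the code in this post relies on) can be added to the request model to ask for one complete response instead of a stream:

public bool stream { get; set; } = true;   // assumed Ollama option: false = return a single complete JSON response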
Next is the part that sends the prompt:
public async IAsyncEnumerable<OllamaResponseModel> SendPrompt(string userPrompt)
{
    string systemPrompt = "You are a knowledgeable and friendly assistant. Answer the following question as clearly and concisely as possible, providing any relevant information and examples.";

    PromptRequestModel requestModel = new PromptRequestModel
    {
        model = _llmModel,
        // Wrap the system and user prompts in the chat template the model expects.
        prompt = $"<|system|>{systemPrompt}<|end|><|user|>{userPrompt}<|end|><|assistant|>"
    };

    HttpClient client = new HttpClient();
    client.BaseAddress = new Uri(_host);

    var jsonResponse = await client.PostAsJsonAsync(@"/api/generate", requestModel);
    Stream stream = await jsonResponse.Content.ReadAsStreamAsync();

    // The body is a sequence of JSON objects, one per line; deserialize them one at a time.
    foreach (string? item in ReadJsonStreamMultipleContent(stream))
    {
        if (!string.IsNullOrWhiteSpace(item))
        {
            var chunk = System.Text.Json.JsonSerializer.Deserialize<OllamaResponseModel>(item);
            if (chunk != null)
            {
                yield return chunk;
            }
        }
    }
}
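One caveat: PostAsJsonAsync buffers the whole response body before it returns, so the loop above only starts once the model has finished generating. If you want the chunks as they are produced, something like the following sketch should work (same client and requestModel as above, just sent with SendAsync and ResponseHeadersRead); treat it as an untested variant rather than part of the original helper:

// Sketch: start reading the body as soon as the headers arrive, so chunks stream in live.
using var request = new HttpRequestMessage(HttpMethod.Post, "/api/generate")
{
    Content = JsonContent.Create(requestModel)   // requires System.Net.Http.Json
};
using var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
Stream stream = await response.Content.ReadAsStreamAsync();
// ...then iterate ReadJsonStreamMultipleContent(stream) exactly as in the foreach above.

ReadJsonStreamMultipleContent itself just reads the body line by line: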
public IEnumerable<string?> ReadJsonStreamMultipleContent(Stream stream)
{
    using StreamReader sr = new StreamReader(stream);
    while (!sr.EndOfStream)
    {
        yield return sr.ReadLine();
    }
}
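For completeness, the fragments above all live in a helper class (the OllamaHelper used below); the _host and _llmModel fields they read are not shown anywhere, so here is a minimal skeleton of how they could be wired up. The host is Ollama's default address, and the model name is just an assumption - use whatever you pulled with ollama pull:

using System.Net.Http.Json;

public class OllamaHelper
{
    // Ollama's default listening address; change it if your server runs elsewhere.
    private readonly string _host = "http://localhost:11434";

    // A model you have already pulled, e.g. via "ollama pull phi3" (assumed name).
    private readonly string _llmModel = "phi3";

    // SendPrompt(...) and ReadJsonStreamMultipleContent(...) from above go here.
}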
Finally, here is how it is called:
string prompt = "Why is the sky blue?";   // any user question
OllamaHelper ollamaHelper = new OllamaHelper();
await foreach (var item in ollamaHelper.SendPrompt(prompt))
{
    Console.Write(item.response);
}