With the current AI boom, everyone wants to try running their own LLM server, and if you have a GPU, LM Studio is a decent choice. For me, though, it runs with a UI, which is not very convenient for a server that only needs to run in the background. Fortunately, there is another option: Ollama.

After installing Ollama, you download a model with the command ollama pull [model name], which works a bit like Docker. Once a model is downloaded, you can talk to it through Ollama's web API at http://localhost:11434/api/generate. Because it returns its JSON as a stream, integrating with it differs from a typical web API.
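
Concretely, while the model is generating, the endpoint writes one complete JSON object per line, and the final line has done set to true. An illustrative exchange (the values below are made up):

{"model":"phi3","created_at":"2024-01-01T00:00:00Z","response":"Hel","done":false}
{"model":"phi3","created_at":"2024-01-01T00:00:01Z","response":"lo","done":false}
{"model":"phi3","created_at":"2024-01-01T00:00:02Z","response":"","done":true}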

First, define a model for the values it returns:

public class OllamaResponseModel
{
    public string model { get; set; } = string.Empty;       // name of the model that produced this chunk
    public string created_at { get; set; } = string.Empty;  // timestamp of the chunk
    public string response { get; set; } = string.Empty;    // the piece of generated text in this chunk
    public bool done { get; set; } = false;                 // true on the final chunk of the stream
}

Its request model looks like this:

public class PromptRequestModel
{
    public string model { get; set; } = string.Empty;   // the name used with ollama pull
    public string prompt { get; set; } = string.Empty;  // the full prompt text
}
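
As a side note, the generate endpoint also accepts a stream flag. If you ever want the whole answer as one JSON response instead of a stream, a minimal sketch would be to extend the request model like this (the stream property is my addition, not part of the helper below):

public class PromptRequestModel
{
    public string model { get; set; } = string.Empty;
    public string prompt { get; set; } = string.Empty;
    public bool stream { get; set; } = true;  // false asks Ollama for a single, complete JSON reply
}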

Next comes the part that sends the prompt:

public async IAsyncEnumerable<OllamaResponseModel> SendPrompt(string userPrompt)
{
    string systemPrompt = "You are a knowledgeable and friendly assistant. Answer the following question as clearly and concisely as possible, providing any relevant information and examples.";
    PromptRequestModel requestModel = new PromptRequestModel
    {
        model = _llmModel,
        // Role-tagged prompt template; adjust the tags to match the model you pulled.
        prompt = $"<|system|>{systemPrompt}<|end|><|user|>{userPrompt}<|end|><|assistant|>"
    };

    using HttpClient client = new HttpClient();
    client.BaseAddress = new Uri(_host);

    // ResponseHeadersRead lets us read the body while it is still streaming,
    // instead of waiting for the whole response to be buffered.
    using var request = new HttpRequestMessage(HttpMethod.Post, "/api/generate")
    {
        Content = JsonContent.Create(requestModel)  // System.Net.Http.Json, same namespace as PostAsJsonAsync
    };
    using var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
    response.EnsureSuccessStatusCode();

    Stream stream = await response.Content.ReadAsStreamAsync();
    await foreach (string item in ReadJsonStreamMultipleContent(stream))
    {
        if (string.IsNullOrWhiteSpace(item))
        {
            continue;
        }

        var chunk = System.Text.Json.JsonSerializer.Deserialize<OllamaResponseModel>(item);
        if (chunk != null)
        {
            yield return chunk;
        }
    }
}
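
ReadJsonStreamMultipleContent is the small helper that splits the streamed body into lines; reading asynchronously hands each line back as soon as Ollama writes it: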

public async IAsyncEnumerable<string> ReadJsonStreamMultipleContent(Stream stream)
{
    // Each line of the response body is one complete JSON object.
    using StreamReader sr = new StreamReader(stream);
    string? line;
    while ((line = await sr.ReadLineAsync()) != null)
    {
        yield return line;
    }
}
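
Both methods reference _host and _llmModel, which are fields on the surrounding class. A minimal sketch of that class (the field values are examples; use your own server address and model name):

public class OllamaHelper
{
    private readonly string _host = "http://localhost:11434";  // Ollama's default listen address
    private readonly string _llmModel = "phi3";                // any model you pulled with ollama pull

    // SendPrompt and ReadJsonStreamMultipleContent from above go here.
}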

Finally, here is how to call it:

OllamaHelper ollamaHelper = new OllamaHelper();
await foreach (var item in ollamaHelper.SendPrompt(prompt))
{
    Console.Write(item.response);
}
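
If you also need the complete reply as a single string, say for logging, you can accumulate the fragments while printing them; a small sketch:

var sb = new System.Text.StringBuilder();
await foreach (var item in ollamaHelper.SendPrompt(prompt))
{
    Console.Write(item.response);  // show each fragment as it arrives
    sb.Append(item.response);      // and keep it for later use
}
string fullReply = sb.ToString();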
