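A Hands-On Guide to Setting Up Ollama and Integrating It with Spring AI

This article takes a close look at Ollama, an open-source project that makes it easy to run LLMs locally. It also walks you step by step through integrating Ollama with Spring AI, so you can use Ollama models inside a Spring AI project. We start with Ollama's basic concepts, then cover installation, downloading and running models, and finally the concrete steps for the Spring AI integration.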

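I. What Is Ollama?

Ollama is an open-source project that makes running large language models (LLMs) on your own machine much easier. It is a bit like Docker: where Docker manages a project's external dependencies, such as a database or JMS, Ollama focuses on running LLMs. It simplifies the otherwise fiddly process of downloading, installing, and interacting with models, and it supports many popular ones, including Llama 2, Mistral, and Code Llama. We can also adjust a model's behavior, for example its parameters or system prompt, to suit our needs.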

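II. Installing Ollama

Ollama offers three main installation methods, described below.

(1) Installing with the installer

For newcomers on Windows and Mac, this is the simplest route.

1. Open the official Ollama website: https://ollama.com/ .
2. Find and click the "Download" button; the site detects your operating system and offers the matching installer.
3. Download the installer (.exe on Windows, .dmg on Mac).
4. Double-click the downloaded file and follow the on-screen prompts to finish the installation.
5. Once the installation is done, open a terminal and start Ollama with the command below (if the desktop app is already running Ollama in the background, this step can be skipped):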

ollama serve

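(2) Installing from the command line

This method gives Linux users more flexibility.

1. Open a terminal window.
2. Enter the following command, which downloads and installs Ollama in one step:

curl -fsSL https://ollama.com/install.sh | sh

3. After installation, start Ollama from the same terminal with the command below (if the install script has already set Ollama up as a background service, the server is running and this step can be skipped):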

ollama serve

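(3) Installing with Docker

If you want to run Ollama in a containerized environment, this method is a good fit, since it isolates and manages resources more cleanly. Before you start, make sure Docker is installed on your system (Docker Desktop, or Docker Engine on Linux).

1. Open a terminal and pull the official Ollama Docker image: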

docker pull ollama/ollama

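2. Create and start the container. On a machine without a GPU, run:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

(Optional) If you have a compatible GPU, pass it through to the container with --gpus instead (for NVIDIA cards this requires the NVIDIA Container Toolkit):

docker run -d --gpus all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

If you have several GPUs, replace all with a specific device, for example --gpus device=0. Both commands mount a volume (ollama) to persist data and map the container's port (11434) to the same port (11434) on the host.
3. docker run -d already starts the container. If the container exists but has been stopped, with or without a GPU, start it again with:

docker start ollama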

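The container runs ollama serve by default, so the server is listening on port 11434 as soon as the container is up and does not need to be started separately. To run Ollama commands inside the container, use docker exec, for example:

docker exec -it ollama ollama run gemma2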

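III. Downloading and Running Models

With Ollama installed, you can download any of the LLMs it supports, run it, and start interacting with it. Note that model files can be large, often several GB, so make sure you have enough free disk space.

The simplest way to download and run a model is the ollama run modelname command. For example, to download and run the Gemma2 model, enter: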

ollama run gemma2

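When you run this command, Ollama downloads the model if it is not yet present locally, then loads it and opens an interactive prompt. From there we can type text prompts, or any instructions that fit the model's capabilities, and Ollama processes them with the downloaded model.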

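IV. Connecting to the Ollama API

Ollama exposes a REST-style API that makes it easy to integrate AI features into third-party client applications. By default the API is available at http://localhost:11434. Ollama must be running in the background for the API to be reachable.

We perform operations by calling the endpoints it supports. For example, the POST /api/generate endpoint generates content: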

curl http://localhost:11434/api/generate -d '{ "model": "llama3", "prompt": "Why is the sky blue?" }'
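
To show the same call from the application side, here is a minimal Java sketch that builds and sends this request with the JDK's built-in HttpClient (Java 11+). The class and method names (OllamaGenerateExample, jsonEscape, generateBody, generateRequest) are made up for illustration. The sketch also adds "stream": false, which is not part of the curl command, so that the reply arrives as one JSON object instead of a stream.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch: the /api/generate call from the curl example, using only the JDK. */
final class OllamaGenerateExample {

    // Default local address of the Ollama API; adjust if Ollama listens elsewhere.
    static final String BASE_URL = "http://localhost:11434";

    /** Escapes quotes, backslashes and control characters for use inside a JSON string. */
    static String jsonEscape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds the JSON body; "stream": false asks for a single, complete reply. */
    static String generateBody(String model, String prompt) {
        return "{\"model\": \"" + jsonEscape(model) + "\", "
                + "\"prompt\": \"" + jsonEscape(prompt) + "\", "
                + "\"stream\": false}";
    }

    /** Builds the POST request without sending it. */
    static HttpRequest generateRequest(String baseUrl, String model, String prompt) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(generateBody(model, prompt)))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpRequest request = generateRequest(BASE_URL, "llama3", "Why is the sky blue?");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        } catch (IOException e) {
            // Typically means Ollama is not running on BASE_URL.
            System.err.println("Could not reach Ollama at " + BASE_URL + ": " + e);
        }
    }
}
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a running Ollama instance. In a real project a JSON library would normally replace the hand-written escaping.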

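Besides the cURL command-line tool, we can send HTTP requests to this API from our own applications. For demonstration purposes, an API client such as Postman works too, sending the same kind of request: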

curl http://localhost:11434/api/generate -d '{"model": "gemma2", "prompt": "Why is the sky blue?"}'

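By default the endpoint streams its answer as a series of JSON objects. When testing in Postman, adding "stream": false to the request body returns the whole answer as a single JSON object similar in shape to the following: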

{
  "model": "gemma2",
  "created_at": "2024-07-21T19:18:07.6379526Z",
  "response": "{\n \"sunrise\": \"orange, red, yellow\",\n \"morning\": \"blue, light blue\",\n \"afternoon\": \"blue, bright blue\", \n \"sunset\": \"orange, red, purple,pink\", \n \"night\": \"black or dark blue (with stars)\"\n}",
  "done": true,
  "done_reason": "stop",
  "context": [106, 1645, 108, 1841, 2881]
}
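
An application usually only wants the text in the response field. The sketch below pulls that field out of a reply using nothing but the JDK, and also joins the fragments of a streamed reply, where Ollama sends one JSON object per line. It is a deliberately small illustration under stated assumptions: the names (OllamaResponseText, extractResponse, joinStream) and the sample chunks are invented, and the scanner simply takes the first "response" key it finds, so a proper JSON library is the better choice in real code.

```java
import java.util.List;

/** Sketch: extracts the "response" text from Ollama /api/generate replies. */
final class OllamaResponseText {

    /** Returns the unescaped value of the first "response" string field, or "" if absent. */
    static String extractResponse(String json) {
        int key = json.indexOf("\"response\"");
        if (key < 0) return "";
        int colon = json.indexOf(':', key);
        if (colon < 0) return "";
        int open = json.indexOf('"', colon);
        if (open < 0) return "";

        StringBuilder text = new StringBuilder();
        for (int i = open + 1; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c == '"') break;                        // unescaped quote ends the string
            if (c != '\\' || i + 1 >= json.length()) {  // ordinary character
                text.append(c);
                continue;
            }
            char esc = json.charAt(++i);                // character after the backslash
            switch (esc) {
                case 'n': text.append('\n'); break;
                case 't': text.append('\t'); break;
                case 'r': text.append('\r'); break;
                case 'b': text.append('\b'); break;
                case 'f': text.append('\f'); break;
                case 'u':                               // four hex digits follow
                    text.append((char) Integer.parseInt(json.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default: text.append(esc);              // escaped quote, backslash or slash
            }
        }
        return text.toString();
    }

    /** Concatenates the fragments of a streamed reply (one JSON object per line). */
    static String joinStream(List<String> lines) {
        StringBuilder full = new StringBuilder();
        for (String line : lines) {
            full.append(extractResponse(line));
        }
        return full.toString();
    }

    public static void main(String[] args) {
        // Invented sample chunks in the shape of a streamed reply.
        List<String> chunks = List.of(
                "{\"model\":\"gemma2\",\"response\":\"The sky \",\"done\":false}",
                "{\"model\":\"gemma2\",\"response\":\"is blue.\",\"done\":false}",
                "{\"model\":\"gemma2\",\"response\":\"\",\"done\":true}");
        System.out.println(joinStream(chunks));
    }
}
```

With "stream": false the whole body is a single object, so extractResponse alone is enough. With streaming, each line carries one fragment and the last line has "done": true.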

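V. Integrating with Spring AI

As with other LLM providers, Spring AI supports calling the Ollama API through its ChatModel and EmbeddingModel interfaces. Internally, Spring AI creates instances of the OllamaChatModel and OllamaEmbeddingModel classes. The integration steps are as follows.

(1) Adding the Maven dependency

First, add the required dependency to the project. For a new project, follow the Spring AI getting-started guide for the overall setup. Then add the following dependency to the project's pom.xml. No version is given here, so it is expected to come from the Spring AI BOM. This is the starter name used by pre-1.0 Spring AI releases; newer releases rename it to spring-ai-starter-model-ollama.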

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>

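(2) Configuring the base URL and model name

By default, Spring AI uses http://localhost:11434 as the base URL and mistral as the model name. If your Ollama instance runs on a different host or port, or you want a different model, configure it in the properties file. For example: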

spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=gemma
spring.ai.ollama.chat.options.temperature=0.4

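You can also consult the other supported properties and change them as needed.

With Java configuration, we pass the base URL to the OllamaApi constructor and the model name and other options through an OllamaOptions instance. For example: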

import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.ai.ollama.api.OllamaApi;
import org.springframework.ai.ollama.api.OllamaOptions;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Configuration class that defines beans for the Spring container
@Configuration
public class OllamaConfig {

    // Defines the ollamaChatModel bean of type OllamaChatModel.
    // The ${...} placeholder makes Spring inject the property value rather than the literal key.
    @Bean
    OllamaChatModel ollamaChatModel(@Value("${spring.ai.ollama.base-url}") String baseUrl) {
        return new OllamaChatModel(new OllamaApi(baseUrl),
                OllamaOptions.create()
                        .withModel("gemma")
                        .withTemperature(0.4f));
    }
}

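(3) Sending prompts and reading responses

When working with Spring AI, the recommended way to talk to an LLM is through its model abstractions (ChatModel, ImageModel, EmbeddingModel, and so on). For Ollama, we use the ChatModel interface.

The following code calls the streaming chat API, stream(prompt), which produces the output as a stream: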

import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.ai.ollama.api.OllamaOptions;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

// Component that implements CommandLineRunner, so run() executes at application startup
@Component
public class OllamaChatApp implements CommandLineRunner {

    // Injects the OllamaChatModel instance
    @Autowired
    OllamaChatModel chatModel;

    @Override
    public void run(String... args) throws Exception {
        // Calls stream() to send the prompt and handle the response as it arrives
        chatModel.stream(new Prompt(
                "Generate the names of 5 famous pirates.",
                OllamaOptions.create()
                        .withModel("gemma2")
                        .withTemperature(0.4F)
        )).subscribe(chatResponse -> {
            // Prints each chunk of the response
            System.out.print(chatResponse.getResult().getOutput().getContent());
        });
    }
}

To access the chat model synchronously, use the call(prompt) method instead. For example:

import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.ai.ollama.api.OllamaOptions;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

// Component that implements CommandLineRunner, so run() executes at application startup
@Component
public class OllamaSyncChatApp implements CommandLineRunner {

    // Injects the OllamaChatModel instance
    @Autowired
    OllamaChatModel chatModel;

    @Override
    public void run(String... args) throws Exception {
        // Calls call() to send the prompt and wait for the complete response
        ChatResponse response = chatModel.call(
                new Prompt(
                        "Generate the names of 5 famous pirates.",
                        OllamaOptions.create()
                                .withModel("gemma2")
                                .withTemperature(0.4F)
                ));

        // Prints the content of every generation in the response
        response.getResults()
                .stream()
                .map(generation -> generation.getOutput().getContent())
                .forEach(System.out::println);
    }
}

Running the code above may produce output similar to this:

Here are 5 famous pirates:

1. **Blackbeard** (Edward Teach)
2. **Captain Henry Morgan**
3. **Anne Bonny**
4. **Bartholomew Roberts** ("Black Bart")
5. **Captain Jack Sparrow** (fictional, but very famous!)

Let me know if you'd like more!
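
For readers curious what such a chat call looks like on the wire: Ollama also has a chat endpoint, POST /api/chat, which takes a list of role-tagged messages plus an options object. The sketch below builds a request body roughly equivalent to the prompt above (model gemma2, temperature 0.4). It is only an illustration. The payload Spring AI actually sends may contain more fields, and the class and method names (OllamaChatBodyExample, chatBody, jsonEscape) are hypothetical.

```java
/** Sketch: builds a JSON body for Ollama's POST /api/chat endpoint using only the JDK. */
final class OllamaChatBodyExample {

    /** Escapes quotes, backslashes and control characters for use inside a JSON string. */
    static String jsonEscape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** One user message, a temperature option, and a non-streamed reply. */
    static String chatBody(String model, String userMessage, double temperature) {
        return "{\"model\": \"" + jsonEscape(model) + "\", "
                + "\"messages\": [{\"role\": \"user\", \"content\": \""
                + jsonEscape(userMessage) + "\"}], "
                + "\"options\": {\"temperature\": " + temperature + "}, "
                + "\"stream\": false}";
    }

    public static void main(String[] args) {
        // Prints the body only; nothing is sent over the network here.
        System.out.println(chatBody("gemma2", "Generate the names of 5 famous pirates.", 0.4));
    }
}
```

Posting this body to http://localhost:11434/api/chat returns a JSON object whose message.content field holds the assistant's reply, which is the text Spring AI surfaces through getOutput().getContent().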

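VI. Summary

In this tutorial on setting up Spring AI with a local Ollama, we first learned how to download, install, and run LLMs with Ollama. Ollama manages the whole lifecycle of locally running LLMs and provides an API for interacting with each model according to its capabilities. Next, we used tools such as cURL and Postman to access the installed Ollama models through that API. Finally, we set up a Spring AI project and accessed the Ollama chat API through the ChatModel abstraction provided by the Spring AI module. Hopefully this article helps you pick up these techniques and apply them in your own development work. If you have questions, you are welcome to join the discussion.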