MongoDB¶

面向文档的 NoSQL 数据库，适合半结构化数据、快速迭代业务、内容系统、日志事件、用户画像等场景。以 MongoDB 6.x/7.x 为参考。

目录¶

基础篇
- 安装与连接
- 核心概念
- 数据库与集合
- 文档 CRUD
查询篇
- 条件查询
- 投影、排序、分页
- 数组与嵌套文档查询
- 聚合管道
索引篇
- 单字段索引
- 复合索引
- 唯一索引
- TTL 索引
- 文本索引
建模篇
- 内嵌与引用
- 一对一、一对多、多对多
- Schema 设计原则
事务与一致性篇
复制集与分片篇
性能优化篇
Python 使用
运维与监控篇
常见业务场景
面试要点

一、基础篇¶

1.1 安装与连接¶

Docker 快速启动：

docker run -d --name mongodb \
  -p 27017:27017 \
  -e MONGO_INITDB_ROOT_USERNAME=root \
  -e MONGO_INITDB_ROOT_PASSWORD=example \
  mongo:7

连接：

mongosh "mongodb://root:example@localhost:27017/admin"

常用命令：

show dbs
use mydb
show collections
db.stats()
db.version()

常用端口：

端口	说明
27017	MongoDB 默认服务端口
27018	常用于分片节点
27019	常用于 config server

1.2 核心概念¶

关系型数据库	MongoDB
Database	Database
Table	Collection
Row	Document
Column	Field
Index	Index
Join	`$lookup` 或应用层关联

文档示例：

{
  "_id": ObjectId("..."),
  "username": "alice",
  "age": 20,
  "tags": ["python", "backend"],
  "profile": {
    "city": "Shanghai",
    "level": 3
  },
  "created_at": ISODate("2026-04-28T12:00:00Z")
}

特点：

文档结构类似 JSON，实际以 BSON 存储。
同一集合中的文档可以有不同字段。
支持二级索引、聚合、事务、复制集、分片。
更强调按业务访问模式建模。

1.3 数据库与集合¶

use blog

db.createCollection("users")
db.createCollection("posts")

show collections

db.users.drop()
db.dropDatabase()

带校验规则的集合：

db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["username", "email"],
      properties: {
        username: { bsonType: "string" },
        email: { bsonType: "string" },
        age: { bsonType: "int", minimum: 0 }
      }
    }
  }
})

1.4 插入文档¶

db.users.insertOne({
  username: "alice",
  email: "alice@example.com",
  age: 20,
  created_at: new Date()
})

db.users.insertMany([
  { username: "bob", email: "bob@example.com", age: 23 },
  { username: "tom", email: "tom@example.com", age: 28 }
])

1.5 查询文档¶

db.users.find()
db.users.findOne({ username: "alice" })

db.users.find({ age: { $gte: 18 } })
db.users.find({ username: "alice" }, { password: 0 })
db.users.find({}, { username: 1, email: 1, _id: 0 })

1.6 更新文档¶

db.users.updateOne(
  { username: "alice" },
  { $set: { age: 21, updated_at: new Date() } }
)

db.users.updateMany(
  { age: { $lt: 18 } },
  { $set: { status: "minor" } }
)

db.users.updateOne(
  { username: "new_user" },
  { $set: { email: "new@example.com" } },
  { upsert: true }
)

常用更新操作符：

操作符	说明
`$set`	设置字段
`$unset`	删除字段
`$inc`	数值自增
`$push`	数组追加
`$addToSet`	数组去重追加
`$pull`	从数组删除元素
`$rename`	重命名字段

1.7 删除文档¶

db.users.deleteOne({ username: "alice" })
db.users.deleteMany({ status: "inactive" })

删除数据要谨慎，生产环境建议先 find() 确认条件。

二、查询篇¶

2.1 条件查询¶

db.users.find({ age: { $gt: 18 } })
db.users.find({ age: { $gte: 18, $lte: 30 } })
db.users.find({ username: { $in: ["alice", "bob"] } })
db.users.find({ email: { $exists: true } })
db.users.find({ username: /^a/ })

逻辑操作：

db.users.find({
  $or: [
    { age: { $lt: 18 } },
    { status: "vip" }
  ]
})

db.users.find({
  $and: [
    { age: { $gte: 18 } },
    { status: "active" }
  ]
})

2.2 投影、排序、分页¶

db.users
  .find({ status: "active" }, { username: 1, email: 1, _id: 0 })
  .sort({ created_at: -1 })
  .skip(20)
  .limit(10)

深分页不建议长期使用 skip，数据量大时应使用游标分页：

db.posts
  .find({ _id: { $lt: last_id } })
  .sort({ _id: -1 })
  .limit(20)

2.3 数组与嵌套文档查询¶

db.users.find({ tags: "python" })

db.users.find({
  profile: {
    city: "Shanghai",
    level: 3
  }
})

db.users.find({ "profile.city": "Shanghai" })

db.orders.find({
  items: {
    $elemMatch: {
      sku: "A001",
      quantity: { $gte: 2 }
    }
  }
})

2.4 聚合管道¶

聚合管道由多个 stage 组成，数据依次流过。

db.orders.aggregate([
  { $match: { status: "paid" } },
  {
    $group: {
      _id: "$user_id",
      total_amount: { $sum: "$amount" },
      order_count: { $sum: 1 }
    }
  },
  { $sort: { total_amount: -1 } },
  { $limit: 10 }
])

常用 stage：

Stage	说明
`$match`	过滤
`$project`	字段投影和计算
`$group`	分组聚合
`$sort`	排序
`$limit` / `$skip`	限制和跳过
`$unwind`	展开数组
`$lookup`	关联查询
`$addFields`	添加字段

2.5 `$lookup` 关联查询¶

db.orders.aggregate([
  {
    $lookup: {
      from: "users",
      localField: "user_id",
      foreignField: "_id",
      as: "user"
    }
  },
  { $unwind: "$user" },
  {
    $project: {
      order_no: 1,
      amount: 1,
      username: "$user.username"
    }
  }
])

建议：

高频关联优先考虑内嵌或冗余字段。
$lookup 适合低频后台查询或小规模关联。
关联字段需要建立索引。

三、索引篇¶

3.1 单字段索引¶

db.users.createIndex({ username: 1 })
db.users.getIndexes()
db.users.dropIndex({ username: 1 })

排序规则：

1 表示升序。
-1 表示降序。

3.2 复合索引¶

db.orders.createIndex({ user_id: 1, created_at: -1 })

遵循最左前缀原则：

// 可使用索引
db.orders.find({ user_id: 1001 })
db.orders.find({ user_id: 1001 }).sort({ created_at: -1 })

// 难以充分使用上述索引
db.orders.find({ created_at: { $gte: ISODate("2026-01-01") } })

复合索引字段顺序建议：

等值查询字段。
排序字段。
范围查询字段。

3.3 唯一索引¶

db.users.createIndex({ email: 1 }, { unique: true })

适合用户名、邮箱、手机号等唯一业务键。

3.4 TTL 索引¶

db.sessions.createIndex(
  { created_at: 1 },
  { expireAfterSeconds: 3600 }
)

TTL 索引用于自动删除过期数据，常见于会话、验证码、临时日志。

3.5 文本索引¶

db.articles.createIndex({ title: "text", content: "text" })

db.articles.find({
  $text: { $search: "mongodb index" }
})

中文复杂搜索一般使用 Elasticsearch 或 OpenSearch 更合适。

3.6 查看执行计划¶

db.orders.find({ user_id: 1001 })
  .sort({ created_at: -1 })
  .explain("executionStats")

重点关注：

COLLSCAN：全表扫描，通常需要优化。
IXSCAN：使用索引扫描。
totalDocsExamined：扫描文档数量。
totalKeysExamined：扫描索引数量。
executionTimeMillis：执行耗时。

四、建模篇¶

4.1 内嵌与引用¶

内嵌示例：

{
  "_id": ObjectId("..."),
  "username": "alice",
  "addresses": [
    { "city": "Shanghai", "detail": "xxx" },
    { "city": "Hangzhou", "detail": "yyy" }
  ]
}

引用示例：

{
  "_id": ObjectId("..."),
  "user_id": ObjectId("..."),
  "amount": 99.9,
  "status": "paid"
}

选择建议：

场景	建模方式
子对象随主对象一起读取	内嵌
子对象数量有限	内嵌
子对象无限增长	引用
多个对象共享同一实体	引用
需要独立更新和查询	引用

4.2 一对多¶

少量一对多可以内嵌：

{
  "post_id": ObjectId("..."),
  "title": "MongoDB",
  "comments": [
    { "user": "alice", "content": "good" },
    { "user": "bob", "content": "nice" }
  ]
}

大量一对多应拆集合：

// posts
{ "_id": ObjectId("p1"), "title": "MongoDB" }

// comments
{ "post_id": ObjectId("p1"), "user": "alice", "content": "good" }

4.3 Schema 设计原则¶

以查询模式为中心设计，而不是完全照搬关系模型。
高频一起读取的数据尽量放在同一文档。
避免文档无限增长。
高频排序和过滤字段必须有索引。
可以适度冗余字段，换取查询性能。
单文档最大 16MB，设计时要避免接近上限。

五、事务与一致性篇¶

5.1 单文档原子性¶

MongoDB 对单个文档的写操作天然原子。

db.accounts.updateOne(
  { _id: 1, balance: { $gte: 100 } },
  { $inc: { balance: -100 } }
)

5.2 多文档事务¶

const session = db.getMongo().startSession()
session.startTransaction()

try {
  const accounts = session.getDatabase("bank").accounts

  accounts.updateOne({ _id: 1 }, { $inc: { balance: -100 } })
  accounts.updateOne({ _id: 2 }, { $inc: { balance: 100 } })

  session.commitTransaction()
} catch (e) {
  session.abortTransaction()
  throw e
} finally {
  session.endSession()
}

注意：

多文档事务依赖复制集或分片集群。
事务会带来性能开销。
能通过单文档建模解决的问题，不要过度使用事务。

5.3 Read Concern 与 Write Concern¶

db.collection.find().readConcern("majority")

db.collection.insertOne(
  { name: "alice" },
  { writeConcern: { w: "majority", j: true } }
)

参数	说明
`w: 1`	主节点确认即可
`w: "majority"`	多数节点确认
`j: true`	写入 journal 后确认
`readConcern: "local"`	读取本地已写入数据
`readConcern: "majority"`	读取多数节点确认的数据

六、复制集与分片篇¶

6.1 复制集¶

Primary  <--- 写入
  |
  +--> Secondary
  +--> Secondary

作用：

自动故障转移。
数据冗余。
读写分离。
支持事务。

常用命令：

rs.status()
rs.conf()
rs.stepDown()

6.2 分片集群¶

Application
    |
  mongos
    |
Config Servers
    |
Shard 1 / Shard 2 / Shard 3

启用分片：

sh.enableSharding("shop")
sh.shardCollection("shop.orders", { user_id: "hashed" })

分片键选择：

基数高。
写入分布均匀。
高频查询能带上分片键。
避免单调递增键导致写入热点。

七、性能优化篇¶

7.1 查询优化¶

优化清单：

使用 explain("executionStats") 检查执行计划。
高频过滤、排序字段建立索引。
避免返回过多字段，使用投影。
深分页使用游标分页。
聚合管道尽量把 $match 放前面。
$sort 尽量借助索引。
控制 $lookup 规模。

7.2 写入优化¶

db.logs.insertMany(docs, { ordered: false })

建议：

批量写入优于逐条写入。
索引不是越多越好，索引会降低写入速度。
高并发计数可使用分桶计数。
避免单文档热点更新。

7.3 常见反模式¶

反模式	问题
无索引查询大集合	全表扫描
滥用 `$lookup`	类似关系库多表 JOIN，性能不稳定
文档数组无限增长	接近 16MB 限制，更新变慢
深分页 `skip`	跳过大量数据
过多索引	写入变慢，占用内存和磁盘
分片键低基数	数据倾斜

八、Python 使用¶

安装：

pip install pymongo

连接：

from pymongo import MongoClient

client = MongoClient("mongodb://root:example@localhost:27017/admin")
db = client["blog"]
users = db["users"]

插入：

result = users.insert_one({
    "username": "alice",
    "email": "alice@example.com",
    "age": 20,
})

print(result.inserted_id)

查询：

for user in users.find({"age": {"$gte": 18}}, {"password": 0}):
    print(user)

更新：

users.update_one(
    {"username": "alice"},
    {"$set": {"age": 21}},
)

事务：

with client.start_session() as session:
    with session.start_transaction():
        db.accounts.update_one(
            {"_id": 1},
            {"$inc": {"balance": -100}},
            session=session,
        )
        db.accounts.update_one(
            {"_id": 2},
            {"$inc": {"balance": 100}},
            session=session,
        )

九、运维与监控篇¶

常用命令：

db.serverStatus()
db.currentOp()
db.killOp(opid)
db.collection.stats()
db.collection.getIndexes()

备份恢复：

mongodump --uri="mongodb://root:example@localhost:27017/admin" --out=backup
mongorestore --uri="mongodb://root:example@localhost:27017/admin" backup

监控指标：

QPS、连接数、慢查询。
CPU、内存、磁盘 IO。
Page Fault、缓存命中。
复制延迟。
锁等待和队列。
分片数据倾斜。

开启慢查询：

db.setProfilingLevel(1, { slowms: 100 })
db.system.profile.find().sort({ ts: -1 }).limit(5)

十、常见业务场景¶

10.1 用户画像¶

{
  "user_id": 1001,
  "basic": { "gender": "female", "city": "Shanghai" },
  "tags": ["vip", "python", "active"],
  "metrics": { "login_days": 120, "order_count": 30 }
}

10.2 内容系统¶

文章、评论、标签可以根据访问模式选择内嵌或引用。

10.3 日志与事件¶

短期日志可用 TTL 索引自动清理：

db.events.createIndex({ created_at: 1 }, { expireAfterSeconds: 86400 * 30 })

十一、面试要点¶

11.1 MongoDB 和 MySQL 的区别？¶

对比	MongoDB	MySQL
数据模型	文档模型	关系模型
Schema	灵活	固定
扩展	原生分片	通常依赖中间件或分库分表
事务	支持，但不宜滥用	成熟事务
查询	文档查询、聚合管道	SQL

11.2 MongoDB 索引原理？¶

MongoDB 索引通常基于 B-Tree。索引可以减少扫描文档数量，但会增加写入成本和存储空间。复合索引遵循最左前缀原则。

11.3 什么时候用内嵌，什么时候用引用？¶

一起读、一起改、数量有限：内嵌。
独立读写、数量无限增长、多处共享：引用。

11.4 如何排查慢查询？¶

开启 profiler 或查看慢日志。
使用 explain("executionStats")。
检查是否 COLLSCAN。
优化索引和查询字段。
检查返回数据量、排序、聚合和 $lookup。

参考资源¶

MongoDB 官方文档：https://www.mongodb.com/docs/
PyMongo 文档：https://pymongo.readthedocs.io/
MongoDB 聚合管道：https://www.mongodb.com/docs/manual/aggregation/

MongoDB¶

目录¶

一、基础篇¶

1.1 安装与连接¶

1.2 核心概念¶

1.3 数据库与集合¶

1.4 插入文档¶

1.5 查询文档¶

1.6 更新文档¶

1.7 删除文档¶

二、查询篇¶

2.1 条件查询¶

2.2 投影、排序、分页¶

2.3 数组与嵌套文档查询¶

2.4 聚合管道¶

2.5 $lookup 关联查询¶

三、索引篇¶

3.1 单字段索引¶

3.2 复合索引¶

3.3 唯一索引¶

3.4 TTL 索引¶

3.5 文本索引¶

3.6 查看执行计划¶

四、建模篇¶

4.1 内嵌与引用¶

4.2 一对多¶

4.3 Schema 设计原则¶

五、事务与一致性篇¶

5.1 单文档原子性¶

5.2 多文档事务¶

5.3 Read Concern 与 Write Concern¶

六、复制集与分片篇¶

6.1 复制集¶

6.2 分片集群¶

七、性能优化篇¶

7.1 查询优化¶

7.2 写入优化¶

7.3 常见反模式¶

八、Python 使用¶

九、运维与监控篇¶

十、常见业务场景¶

10.1 用户画像¶

10.2 内容系统¶

10.3 日志与事件¶

十一、面试要点¶

11.1 MongoDB 和 MySQL 的区别？¶

11.2 MongoDB 索引原理？¶

11.3 什么时候用内嵌，什么时候用引用？¶

11.4 如何排查慢查询？¶

参考资源¶

2.5 `$lookup` 关联查询¶