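Before using curl to work with Elasticsearch, we need a basic environment in place. I recommend spinning up an ES instance with Docker: it is clean and leaves your local machine untouched. This walkthrough uses ES 7.x (the original article used 6.8, but 7.x is the mainstream version at the time of writing):
docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.17.0
Once the container is up (give it a short while to finish starting), a simple curl request confirms that ES is running:
curl -X GET "localhost:9200/"
You should see a response similar to this: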
{
  "name" : "node-1",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "abcdefg",
  "version" : {
    "number" : "7.17.0",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "abcdef",
    "build_date" : "2022-02-01T00:00:00.000Z",
    "build_snapshot" : false,
    "lucene_version" : "8.11.1",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
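Elasticsearch does not accept requests the instant the container starts, so scripts that run right after `docker run` tend to fail on the first call. The following is a minimal sketch of a readiness check, not part of the original article: `wait_for_es` and `es_version` are hypothetical helper names, and the default URL, retry count, and delay are assumptions to adjust for your setup.

```bash
# Sketch only: helper names and default values are illustrative assumptions.

# wait_for_es [URL] [TRIES] [DELAY_SECONDS]
# Polls the root endpoint until it returns a successful response.
wait_for_es() (
  url=${1:-http://localhost:9200}
  tries=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$tries" ]; do
    # -s: silent, -f: fail on HTTP errors, -o /dev/null: discard the body
    if curl -s -f -o /dev/null "$url"; then
      exit 0   # leaves only the function's subshell
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  exit 1
)

# es_version: reads the root response on stdin and prints version.number.
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

A typical use would be `wait_for_es && curl -s localhost:9200/ | es_version`, which prints the version string once the node answers.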
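A few key concepts are worth understanding clearly at this point:
Creating an index is the first step in using ES, but many people copy configurations straight from the internet, when every parameter should really be weighed against business requirements. Below I use an e-commerce product search case to explain what each setting does.
Suppose we need a product index with roughly 1 million new documents per day and a query load of about 500 QPS. For that workload, a reasonable index configuration is: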
curl -X PUT "localhost:9200/products" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "30s",
    "analysis": {
      "analyzer": {
        "product_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stemmer"]
        }
      }
    }
  }
}
'
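Here is the reasoning behind each setting. `number_of_shards: 3` splits the index into three primary shards so that data and query load can spread across nodes; it is fixed at creation time and can only be changed afterwards by reindexing or with the split/shrink APIs. `number_of_replicas: 1` keeps one copy of every primary shard for failover and extra read throughput, and can be changed at any time; on the single-node Docker instance above the replicas cannot be allocated, so the index will report `yellow` health. `refresh_interval: 30s` makes newly indexed documents searchable every 30 seconds instead of the default 1 second, trading search freshness for indexing throughput. `product_analyzer` is a custom analyzer that tokenizes with the `standard` tokenizer, lowercases the tokens, and applies the `stemmer` filter, which defaults to English stemming and therefore has no effect on Chinese text.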
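To make the shard count less of a guess, a back-of-the-envelope calculation helps. The sketch below is illustrative only: the 1 million documents per day comes from the scenario above, but the average document size (1 KB), the retention period (90 days), and the target shard size (30 GB, inside the commonly cited 10 to 50 GB range) are assumptions, and `primary_shards` is a hypothetical helper name. Real on-disk size also differs from raw JSON size, so treat the result as a starting point.

```bash
# Sketch only: every input except the daily document count is an assumption.

# primary_shards DOCS_PER_DAY AVG_DOC_BYTES RETENTION_DAYS TARGET_SHARD_GB
# Prints the number of primary shards needed to stay under the target size.
primary_shards() (
  docs_per_day=$1
  avg_doc_bytes=$2
  retention_days=$3
  target_shard_gb=$4
  total_bytes=$((docs_per_day * avg_doc_bytes * retention_days))
  shard_bytes=$((target_shard_gb * 1024 * 1024 * 1024))
  # Ceiling division, with a minimum of one shard.
  shards=$(((total_bytes + shard_bytes - 1) / shard_bytes))
  if [ "$shards" -lt 1 ]; then
    shards=1
  fi
  echo "$shards"
)

primary_shards 1000000 1024 90 30   # prints 3 under these assumptions
```

Under these assumed inputs the estimate matches the three shards used above; with larger documents or longer retention the number grows accordingly.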
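Mapping design is the most critical and most error-prone part of working with ES. I have seen too many projects forced to rebuild their indices later because the initial mapping was poorly designed. Using the product data as an example, here are some practical techniques: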
curl -X PUT "localhost:9200/products/_mapping" -H 'Content-Type: application/json' -d'
{
  "properties": {
    "product_id": {
      "type": "keyword"
    },
    "product_name": {
      "type": "text",
      "analyzer": "product_analyzer",
      "fields": {
        "keyword": {
          "type": "keyword"
        },
        "suggest": {
          "type": "completion"
        }
      }
    },
    "price": {
      "type": "scaled_float",
      "scaling_factor": 100
    },
    "categories": {
      "type": "keyword"
    },
    "attributes": {
      "type": "nested"
    },
    "sales": {
      "type": "integer"
    },
    "created_at": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss||epoch_millis"
    }
  }
}
'
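The key design points: `product_id` and `categories` are `keyword` fields because they are only matched exactly, filtered, or aggregated, never analyzed (an array of categories needs no special mapping). `product_name` is an analyzed `text` field for full-text search, with a `keyword` sub-field (`product_name.keyword`) for exact matching, sorting, and aggregations. `price` uses `scaled_float` with a `scaling_factor` of 100, so it is stored internally as a whole number of cents, which compresses better on disk than a floating-point type. `attributes` is `nested` so that each object in the array is indexed as its own hidden document and its fields are matched together rather than across objects. `created_at` accepts either a `yyyy-MM-dd HH:mm:ss` string or epoch milliseconds.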
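The effect of `scaling_factor` on the `price` field is easiest to see with a little arithmetic. This sketch mimics what a `scaled_float` field does at index time (multiply by the factor, round to the nearest integer); `scaled_value` is a hypothetical helper for illustration, not an Elasticsearch API.

```bash
# Sketch only: reproduces the scaled_float arithmetic locally with awk.

# scaled_value PRICE FACTOR
# Prints the integer that a scaled_float field would store for PRICE.
scaled_value() {
  awk -v v="$1" -v f="$2" 'BEGIN { printf "%.0f\n", v * f }'
}

scaled_value 299.99 100   # prints 29999
scaled_value 19.999 100   # prints 2000, the same as 20.00
```

With a factor of 100, anything beyond two decimal places is rounded away for searching, sorting, and aggregations, which is fine for prices in cents but worth checking for other numeric fields.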
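With the index in place, the next step is CRUD operations on the data. Here are some efficient techniques drawn from real projects.
Bulk insert (Bulk API; the body is newline-delimited JSON and must end with a newline):
curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/json' -d'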
{ "index" : { "_index" : "products", "_id" : "1001" } }
{ "product_id": "1001", "product_name": "无线蓝牙耳机", "price": 299.99, "categories": ["电子产品", "音频设备"], "sales": 1500 }
{ "index" : { "_index" : "products", "_id" : "1002" } }
{ "product_id": "1002", "product_name": "智能手表", "price": 899.00, "categories": ["电子产品", "智能设备"], "sales": 800 }
'
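One pitfall with the Bulk API is that the HTTP status is 200 even when individual items fail; the response body carries a top-level `errors` flag instead. The following is a minimal grep-based sketch for checking that flag; `bulk_failed` is a hypothetical helper name, and a JSON-aware tool such as `jq` would be more robust if it is available.

```bash
# Sketch only: a text match on the top-level "errors" flag of a _bulk response.

# bulk_failed: reads a _bulk response on stdin.
# Exit status 0 means at least one item failed; 1 means no errors were reported.
bulk_failed() {
  grep -q '"errors"[[:space:]]*:[[:space:]]*true'
}

# Example use (bulk.ndjson is an assumed file name holding the request body):
#   curl -s -X POST "localhost:9200/_bulk" \
#        -H 'Content-Type: application/x-ndjson' \
#        --data-binary @bulk.ndjson | bulk_failed && echo "some items failed" >&2
```

When the body is read from a file, `--data-binary` matters because plain `-d @file` strips newlines and breaks the newline-delimited format.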
Partial update of selected fields:
curl -X POST "localhost:9200/products/_update/1001" -H 'Content-Type: application/json' -d'
{
  "doc": {
    "sales": 1600
  }
}
'
Delete by query:
curl -X POST "localhost:9200/products/_delete_by_query" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "sales": {
        "lt": 100
      }
    }
  }
}
'
Batch retrieval (Multi Get):
curl -X GET "localhost:9200/_mget" -H 'Content-Type: application/json' -d'
{
  "docs": [
    {
      "_index": "products",
      "_id": "1001"
    },
    {
      "_index": "products",
      "_id": "1002"
    }
  ]
}
'
Search is the core capability of ES, but writing efficient queries requires understanding many details. A few typical scenarios illustrate this:
Basic search:
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "product_name": "蓝牙耳机" } }
      ],
      "filter": [
        { "range": { "price": { "gte": 200, "lte": 500 } } },
        { "term": { "categories": "电子产品" } }
      ]
    }
  },
  "sort": [
    { "sales": { "order": "desc" } }
  ],
  "from": 0,
  "size": 10
}
'
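The `from` and `size` parameters in the search above implement offset paging, and by default Elasticsearch rejects requests where `from + size` exceeds 10,000 (the `index.max_result_window` setting). The sketch below computes the offset for a 1-based page number and refuses pages past that limit; `page_from` is a hypothetical helper name, and the limit is hard-coded on the assumption that the default has not been changed.

```bash
# Sketch only: assumes the default index.max_result_window of 10000.
MAX_RESULT_WINDOW=10000

# page_from PAGE SIZE
# Prints the "from" offset for a 1-based PAGE, or fails if the page is invalid
# or would exceed the result window.
page_from() (
  page=$1
  size=$2
  if [ "$page" -lt 1 ] || [ "$size" -lt 1 ]; then
    echo "page and size must be at least 1" >&2
    exit 1   # leaves only the function's subshell
  fi
  from=$(((page - 1) * size))
  if [ $((from + size)) -gt "$MAX_RESULT_WINDOW" ]; then
    echo "from + size would exceed $MAX_RESULT_WINDOW" >&2
    exit 1
  fi
  echo "$from"
)

page_from 3 10   # prints 20
```

For result sets that need to go deeper than the window allows, the `search_after` parameter is the usual alternative to raising the limit.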
Aggregations:
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "category_stats": {
      "terms": { "field": "categories" },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } },
        "total_sales": { "sum": { "field": "sales" } }
      }
    }
  }
}
'
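Search suggestions (the completion suggester only works on a field of type `completion`, so `product_name.suggest` has to be mapped as a `completion` sub-field of `product_name`):
curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "suggest": {
    "product_suggest": {
      "prefix": "蓝牙",
      "completion": {
        "field": "product_name.suggest",
        "size": 5
      }
    }
  }
}
'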
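Performance tips, based on the queries above: put conditions that do not affect relevance (ranges, exact terms) in the `filter` clause, which skips scoring and lets ES cache the results; set `size` to 0 when only aggregation results are needed; and avoid deep `from`/`size` paging, since `from + size` is capped at 10,000 by default (`index.max_result_window`).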
After an index is created it still needs regular maintenance. Here are some useful maintenance commands:
Check index status:
curl -X GET "localhost:9200/_cat/indices/products?v"
Force-merge segment files (to reduce fragmentation):
curl -X POST "localhost:9200/products/_forcemerge?max_num_segments=1"
Change the replica count (adjustable at runtime):
curl -X PUT "localhost:9200/products/_settings" -H 'Content-Type: application/json' -d'
{
  "index.number_of_replicas": 2
}
'
Close/open an index (used during maintenance):
curl -X POST "localhost:9200/products/_close"
curl -X POST "localhost:9200/products/_open"
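Index alias management (for seamless switching). An alias cannot share its name with an existing index, so this example assumes the data lives in an index named `products_v1` and that no index called `products` exists:
curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
{
  "actions": [
    { "add": { "index": "products_v1", "alias": "products" } }
  ]
}
'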
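The seamless switch itself happens when the alias is moved from the old index to a new one: placing a `remove` and an `add` action in the same `_aliases` request applies both atomically, so searches against the alias never see a gap. Below is a minimal sketch that builds such a request body; `alias_swap_body` is a hypothetical helper, and `products_v2` is an assumed name for the rebuilt index.

```bash
# Sketch only: builds the JSON body for an atomic alias swap.
# Assumes index and alias names contain no characters that need JSON escaping.

# alias_swap_body ALIAS OLD_INDEX NEW_INDEX
alias_swap_body() {
  printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}\n' \
    "$2" "$1" "$3" "$1"
}

# Example use (products_v2 is an assumed index name):
#   curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' \
#        -d "$(alias_swap_body products products_v1 products_v2)"
```

Applications keep querying `products` throughout, and the old index can be deleted once the new one has been verified.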
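In real projects I usually schedule a job that runs forcemerge and cache clearing in the early hours every day to keep index performance stable. Elastic's documentation recommends force-merging only indices that no longer receive writes, so it is best to restrict such a job to read-only indices. I also recommend using Elasticsearch's built-in monitoring APIs, such as `_cluster/health`, to check cluster health regularly.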
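A scheduled job is safer if it checks cluster health before doing heavy work such as a force merge. The sketch below parses the `status` field from a `_cluster/health` response; `health_status` and `is_healthy` are hypothetical helper names, and treating only `green` as healthy is an assumption (a single-node cluster with replicas configured will stay `yellow`, so adjust the check accordingly).

```bash
# Sketch only: text-based parsing of the _cluster/health response.

# health_status: reads a _cluster/health JSON response on stdin, prints its status.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# is_healthy: succeeds only when the status read from stdin is green.
is_healthy() {
  [ "$(health_status)" = "green" ]
}

# Example use inside a maintenance script:
#   if curl -s "localhost:9200/_cluster/health" | is_healthy; then
#     curl -s -X POST "localhost:9200/products/_forcemerge?max_num_segments=1"
#   fi
```

Skipping maintenance while the cluster is recovering avoids piling extra I/O onto nodes that are already busy relocating shards.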