Elasticsearch: learning with the TMDB dataset

Getting started with TMDB: create the movie index with the following mapping.

PUT /movie
{
 "settings": {
   "number_of_shards": 1,
   "number_of_replicas": 0
 },
 "mappings": {
   "properties": {
     "title":{"type": "text","analyzer":"english"},
     "tagline":{ "type": "text","analyzer": "english"},
     "release_date":{"type": "date","format": "8yyyy/MM/dd||yyyy/M/dd||yyyy/MM/d||yyyy/M/d"},
     "popularity":{"type": "double"},
     "overview":{"type": "text","analyzer": "english"},
     "cast":{
       "type": "object",
       "properties": {
         "character":{"type":"text","analyzer":"standard"},
         "name":{"type":"text","analyzer":"standard"}
       }
     }
   }
 }
}

Import the data through the csvImporter project

http://127.0.0.1:8091/es/import-data
http://127.0.0.1:8091/es/get?id=931

  • match query (full-text match; the input is analyzed)
    GET /movie/_search
    {
    "query": {
     "match": {
       "cast.name": "Neeson"
     }
    }
    }
    
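  • term exact query

    The course says that a term query performs no analysis.
    In testing, however, it only matches when the input is identical to an indexed token: "cast.name": "Neeson" finds nothing, while "cast.name": "neeson" finds results. The reason is that the query input is not analyzed, but cast.name was indexed with the standard analyzer, which lowercases every token. The input therefore has to be lowercased first.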

    GET /movie/_search
    {
    "query": {
     "term": {
       "cast.name": "neeson"
     }
    }
    }
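
The behaviour observed above can be reproduced with a small toy model. This is only a sketch, not Elasticsearch code: it assumes the standard analyzer can be approximated by "split into words and lowercase", and the function names and the sample value "Liam Neeson" are made up for illustration.

```python
import re

def toy_standard_analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]

def toy_term_match(indexed_tokens, term):
    """term query: the input is compared with the indexed tokens as-is."""
    return term in indexed_tokens

def toy_match(indexed_tokens, query_text):
    """match query: the input is analyzed first; any token may match (operator or)."""
    return any(tok in indexed_tokens for tok in toy_standard_analyze(query_text))

# Hypothetical value of cast.name, analyzed at index time.
indexed = toy_standard_analyze("Liam Neeson")

print(toy_term_match(indexed, "Neeson"))  # False: no indexed token equals "Neeson"
print(toy_term_match(indexed, "neeson"))  # True
print(toy_match(indexed, "Neeson"))       # True: the input is lowercased before lookup
```

The real analysis result for a field can always be checked with the _analyze API on that field.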
    
  • AND and OR logic between the analyzed terms

    match uses or by default

    GET /movie/_search
    {
    "query": {
      "match": {
        "title": "basketball with cartoom aliens"
      }
    }
    }
    

    Check how the text is analyzed

    GET /movie/_analyze
    {
    "field": "title",
    "text": "basketball with cartoom aliens"
    }
    

    Switch to and

GET /movie/_search
{
  "query": {
    "match": {
      "title": {
        "query": "basketball with cartoom aliens",
        "operator": "and"
      }
    }

  }
}
  • Minimum number of matching terms: minimum_should_match

    Defaults: operator = or, minimum_should_match = 1

    GET /movie/_search
    {
    "query": {
      "match": {
        "title": {
          "query": "Alien: Resurrection",
          "operator": "or",
          "minimum_should_match": 2 
        }
      }
    }
    }
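
How operator and minimum_should_match interact can be sketched with a toy model. It is an illustration under simplifying assumptions, not Elasticsearch code: tokens are only lowercased (the english analyzer used on title also removes stop words and stems), and the function names are made up.

```python
import re

def toy_tokens(text):
    """Simplified analysis: split into words and lowercase (no stemming)."""
    return [t.lower() for t in re.findall(r"\w+", text)]

def toy_match(title, query, operator="or", minimum_should_match=1):
    """Return True if the title satisfies the match query under the toy rules."""
    title_tokens = set(toy_tokens(title))
    query_tokens = toy_tokens(query)
    hits = sum(1 for t in query_tokens if t in title_tokens)
    if operator == "and":
        # and: every query term must be present
        return hits == len(query_tokens)
    # or: at least minimum_should_match query terms must be present
    return hits >= minimum_should_match

query = "Alien: Resurrection"
print(toy_match("Alien", query))                                   # True
print(toy_match("Alien", query, minimum_should_match=2))           # False
print(toy_match("Alien: Resurrection", query, minimum_should_match=2))  # True
print(toy_match("Alien", query, operator="and"))                   # False
```

With two query terms, minimum_should_match = 2 gives the same result as operator = and; the setting becomes more interesting with longer queries.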
    
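  • Phrase query
The query text is still analyzed, but the resulting terms must all appear in the field next to each other and in the same order, the way "bc" appears inside "abcd".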

GET /movie/_search
{
  "query": {
    "match_phrase": {
      "title": "Alien: "
    }
  }
}
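
The difference between a phrase match and a plain match can be sketched as follows. This is a toy model, not Elasticsearch code: analysis is reduced to "split into words and lowercase", slop is ignored, and the function names and sample titles are only illustrative.

```python
import re

def toy_tokens(text):
    """Simplified analysis: split into words and lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text)]

def toy_match_phrase(field_text, phrase):
    """True if the phrase tokens occur consecutively, in order, in the field."""
    field = toy_tokens(field_text)
    wanted = toy_tokens(phrase)
    n = len(wanted)
    if n == 0:
        return False
    return any(field[i:i + n] == wanted for i in range(len(field) - n + 1))

def toy_match_or(field_text, query):
    """Plain match with operator or: any query token may appear anywhere."""
    field = set(toy_tokens(field_text))
    return any(t in field for t in toy_tokens(query))

print(toy_match_phrase("Alien: Resurrection", "Alien: "))               # True
print(toy_match_phrase("Resurrection of the Alien", "alien resurrection"))  # False
print(toy_match_or("Resurrection of the Alien", "alien resurrection"))      # True
```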
  • Multi-field query
    GET /movie/_search
    {
    "query": {
      "multi_match": {
        "query": "basketball with cartoom aliens",
        "fields": ["title","overview"]
      }
    }
    }
    
  • Explaining the score: _explanation
    GET /movie/_search
    {
    "explain": true,
    "query": {
      "match": {
        "title": "steve"
      }
    }
    }
    
    GET /movie/_search
    {
    "explain": true,
    "query": {
      "multi_match": {
        "query": "basketball with cartoom aliens",
        "fields": ["title","overview"]
      }
    }
    }
    
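  • Tuning multi-field queries
 Field weights: ^10 boosts the field by a factor of 10 (in the explain output the boost factor goes from 2.2 to 22).
 tie_breaker: the scores of the other matching fields are multiplied by 0.3 and added to the score of the best field.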

GET /movie/_search
{
  "explain": true,
  "query": {
    "multi_match": {
      "query": "basketball with cartoom aliens",
      "fields": ["title^10","overview"],
      "tie_breaker": 0.3
    }
  }
}
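
The effect of tie_breaker on the final score can be written down in a few lines. The sketch below only models the combination step (best field score plus tie_breaker times the remaining field scores); the function name and the per-field scores are made-up example values, not output from the movie index.

```python
def dis_max_score(field_scores, tie_breaker=0.0):
    """Combine per-field scores: best score + tie_breaker * sum of the others."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others

# Hypothetical scores of one document for the fields title and overview.
scores = [4.0, 2.0]

print(round(dis_max_score(scores), 2))                   # 4.0: only the best field counts
print(round(dis_max_score(scores, tie_breaker=0.3), 2))  # 4.6: 4.0 + 0.3 * 2.0
print(round(dis_max_score(scores, tie_breaker=1.0), 2))  # 6.0: all field scores are summed
```

A tie_breaker of 0 keeps only the best field, while 1.0 treats all fields equally; values in between let the other fields break ties without dominating.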
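  • bool query
must: every clause must match
must_not: no clause may match
should: at least one clause must match (when the query has no must or filter clause)
The more should clauses match, the higher the score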

GET /movie/_search
{
  "query": {
    "bool": {
      "should": [
           { "match": {"title": "basketball with cartoom aliens"}},
           { "match": {"overview": "basketball with cartoom aliens"}}
      ]
    }
  }
}
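multi_match queries come in several types
best_fields: the default scoring mode. The highest score among the fields becomes the score of the document ("best match" mode). Equivalent to dis_max.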

GET /movie/_search
{
  "query": {
    "multi_match": {
      "query":"basketball with cartoom aliens",
      "fields": ["title","overview"],
      "type": "best_fields"
    }
  }
}

GET /movie/_search
{
  "query": {
    "dis_max": {
      "queries": [
           { "match": {"title": "basketball with cartoom aliens"}},
           { "match": {"overview": "basketball with cartoom aliens"}}
      ]
    }
  }
}
Another way to see how the query is interpreted: the validate API

GET /movie/_validate/query?explain
{
  "query": {
    "multi_match": {
      "query":"basketball with cartoom aliens",
      "fields": ["title","overview"],
      "type": "best_fields"
    }
  }
}
most_fields: takes all matching fields of the document into account and adds their scores together

GET /movie/_search
{
  "query": {
    "multi_match": {
      "query":"basketball with cartoom aliens",
      "fields": ["title^10","overview^0.1"],
      "type": "most_fields"
    }
  }
}
GET /movie/_validate/query?explain
{
  "query": {
    "multi_match": {
      "query":"basketball with cartoom aliens",
      "fields": ["title^10","overview^0.1"],
      "type": "most_fields"
    }
  }
}
cross_fields: the score is computed term by term across the fields, as if they were one combined field; suited to term-centric matching.

GET /movie/_search
{
  "query": {
    "multi_match": {
      "query":"steve job",
      "fields": ["title","overview"],
      "type": "cross_fields"
    }
  }
}
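  • query string
A convenient way to use AND, OR and NOT (these are keywords and must be uppercase)
steve AND Jobs --> only matches documents that contain both steve and Jobs
steve NOT Jobs --> matches documents that contain steve but not Jobs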

GET /movie/_search
{
  "query": {
    "query_string": {
      "fields":["title"],
      "query":"steve NOT Jobs"
    }
  }
}
  • filter query
Single-condition filter (no scoring, "_score" : 0.0)

GET /movie/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "title": "steve"
        }
      }
    }
  }
}
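  • Multi-condition filter (the conditions are combined with AND)
In real code, an input such as "Bullock" has to be lowercased before it is used in a term filter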

GET /movie/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term":{"title":"steve"}},
        {"term":{"cast.name":"bullock"}},
        {"range":{"release_date":{"lte":"2015/01/01"}}},
        {"range":{"popularity":{"gte":"12"}}}
      ]
    }
  },
  "sort": [
    {
      "popularity": {
        "order": "desc"
      }
    }
  ]
}
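  • filter combined with a scoring match (with filter alone, _score is 0)
Adding a should clause brings scoring back. Because a filter is present, the should clause is optional: documents that do not match it are still returned, with a score of 0.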

GET /movie/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "Search"
          }
        }
      ], 
      "filter": [
        {"term":{"title":"steve"}},
        {"term":{"cast.name":"bullock"}},
        {"range":{"release_date":{"lte":"2015/01/01"}}},
        {"range":{"popularity":{"gte":"12"}}}
      ]
    }
  }
}
  • Custom score calculation
Used to balance recall and precision
function_score
GET /movie/_search
{
  "explain": true,
  // the original query produces oldScore
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "steve job",
          "fields": ["title", "overview"],
          "operator": "or",
          "type": "most_fields"
       }
     },
     "functions": [
       {"field_value_factor": {
         "field": "popularity", // the field whose value adjusts the score
         "modifier": "log2p",
         "factor": 10}},
          {"field_value_factor": {
         "field": "popularity",  // the field whose value adjusts the score
         "modifier": "log2p",
         "factor": 5}}
     ],
     "score_mode": "sum",
     "boost_mode": "sum"
   }
 }
}
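
The arithmetic behind this request can be sketched as below. It assumes the documented meaning of the settings: log2p computes log10(factor * value + 2), score_mode sum adds the function results together, and boost_mode sum adds that total to oldScore. The function names and the sample values for oldScore and popularity are made up for illustration.

```python
import math

def log2p(value, factor):
    """field_value_factor with modifier log2p: log10(factor * value + 2)."""
    return math.log10(factor * value + 2)

def toy_function_score(old_score, popularity):
    """Combine the query score with the two functions from the request."""
    function_results = [
        log2p(popularity, factor=10),
        log2p(popularity, factor=5),
    ]
    functions_total = sum(function_results)  # score_mode: sum
    return old_score + functions_total       # boost_mode: sum

# Hypothetical document: oldScore 3.0 from the multi_match, popularity 10.0.
new_score = toy_function_score(old_score=3.0, popularity=10.0)
print(round(new_score, 4))
```

Because of the logarithm, a very popular movie gains only a little more than a moderately popular one, so popularity nudges the ranking without overriding the text relevance.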