indexing - MongoDB - shard collection by hashed index on custom _id field


Question: How can I shard a collection by a hashed index on a custom _id field?

Problem details:

  • I store url => my_value in MongoDB.
  • The URL must be unique.
  • I will execute a lot of queries to check whether I already have a document with a given URL: {_id: md5(url_to_check)}
  • The collection will be very large (billions of URLs => billions of my_value), so I want to shard it by URL.
  • I do not create an index on my_value; _id is indexed by default.

Questions:

  • I want to shard the collection by _id. A hashed shard key would be perfect, but do I have to create a hashed shard key, or can I just shard on the regular _id key? I already calculate the MD5 myself.
  • What do you think about storing the non-hashed URL in _id and querying by it? I would use less space (no stored md5(url)), but the index would be on a large text field (a normal URL is longer than 32 characters).
  • What is the best way to solve this kind of problem? The best outcome for me is fast queries that use as little index space as necessary.

I want to shard the collection by _id. A hashed shard key would be perfect, but do I have to create a hashed shard key, or can I just shard on the regular _id key? I already calculate the MD5 myself.

The aim of hashed shard keys is to take fields that increase monotonically (such as ObjectId() values or timestamps) and distribute writes across your shards. If you have already hashed your _id value (or another field you want to shard on) to provide an even distribution of load, you can use it as your shard key instead of having the server calculate the hash. FYI, MongoDB (as of 2.6) uses MD5 to compute the hashed shard key, so effectively you are already doing the same work in your application code. If you use your prehashed _id value, you only need the one _id index, compared to two indexes otherwise: the default {_id: 1} index plus an additional hashed index {_id: "hashed"}.
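As a minimal sketch of the prehashed approach, the Python below computes the MD5-based _id in application code and builds the two alternative shardCollection command documents. The namespace mydb.urls, the sample URL, and the helper names url_id and shard_command are assumptions made for illustration; nothing here connects to a server.

```python
import hashlib


def url_id(url: str) -> str:
    """Return the hex MD5 digest of a URL, to be used as the document _id."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def shard_command(namespace: str, hashed: bool) -> dict:
    """Build a shardCollection admin command document (it is not sent anywhere)."""
    key = {"_id": "hashed"} if hashed else {"_id": 1}
    return {"shardCollection": namespace, "key": key}


# Hypothetical document and lookup filter for one URL.
url = "http://example.com/some/page"
document = {"_id": url_id(url), "my_value": 42}
lookup_filter = {"_id": url_id(url)}

# Ranged shard key on the prehashed _id: reuses the default _id index.
ranged = shard_command("mydb.urls", hashed=False)

# Server-side hashed shard key: needs an extra {_id: "hashed"} index.
server_hashed = shard_command("mydb.urls", hashed=True)

print(document)
print(ranged)
print(server_hashed)
```

With a driver such as pymongo, a command document like ranged would typically be passed to the admin database of a sharded cluster after sharding has been enabled for the database; that step is left out because it needs a running cluster.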

What do you think about storing the non-hashed URL in _id and querying by it? I would use less space (no stored md5(url)), but the index would be on a large text field (a normal URL is longer than 32 characters).

If index size is a concern, the smaller precomputed hash values will definitely save you space in the _id index (particularly if you are storing billions of URLs and only want to find them by MD5 hash).
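To make the size argument concrete, here is a rough sketch that compares the byte length of a few made-up URLs with the fixed 32-character hex MD5 digest. The sample URLs are hypothetical, and the figures cover only the raw key bytes, not the real on-disk index size, which also depends on per-entry overhead and the storage engine.

```python
import hashlib


def key_sizes(url: str) -> tuple:
    """Return (bytes in the raw URL, bytes in its hex MD5 digest)."""
    raw = url.encode("utf-8")
    return len(raw), len(hashlib.md5(raw).hexdigest())


# Hypothetical URLs, chosen only to illustrate typical lengths.
sample_urls = [
    "http://example.com/",
    "http://example.com/articles/2014/06/a-fairly-long-article-title",
    "http://example.com/search?q=mongodb+hashed+shard+key&page=3&sort=date",
]

for url in sample_urls:
    url_bytes, hash_bytes = key_sizes(url)
    print(f"url key: {url_bytes:3d} bytes, md5 hex key: {hash_bytes} bytes")
```

The hash key has the same length for every URL, while the URL key grows with the URL, so the saving is largest for long URLs.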

What is the best way to solve this kind of problem? The best outcome for me is fast queries that use as little index space as necessary.

The best solution is highly subjective, but it seems you have come up with a reasonable one given your use case.

It is worth noting that collisions are possible in any hash namespace, so you may want to take that into account when choosing your hash algorithm. Although a collision is extremely unlikely, with the hash value as your _id you would only store the first URL seen for any hash collision (or have to do something less efficient, such as comparing the document's URL with the original URL you were expecting).
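The two collision behaviours described above can be sketched with an in-memory dict standing in for the collection. The helper names, the stored url field, and the key_fn parameter (which exists only so a collision can be simulated) are assumptions for illustration, not part of any MongoDB API.

```python
import hashlib


def url_id(url: str) -> str:
    """Return the hex MD5 digest of a URL, used as the document _id."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def insert_url(store: dict, url: str, my_value, key_fn=url_id) -> bool:
    """Insert only if the _id is free, so the first URL seen for a hash wins."""
    key = key_fn(url)
    if key in store:
        return False
    # Keeping the original URL costs extra space but allows verification.
    store[key] = {"_id": key, "url": url, "my_value": my_value}
    return True


def find_url(store: dict, url: str, key_fn=url_id):
    """Look up by hash, then compare the stored URL with the expected one."""
    doc = store.get(key_fn(url))
    if doc is not None and doc["url"] == url:
        return doc
    return None


store = {}  # stands in for the collection, keyed by _id
print(insert_url(store, "http://example.com/a", 1))  # first insert succeeds
print(insert_url(store, "http://example.com/a", 2))  # same _id, rejected
print(find_url(store, "http://example.com/a"))
```

Dropping the url field and the comparison in find_url gives the cheaper variant, at the cost of treating any colliding URL as already present.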
