Postgres troubles

(Regular readers of this blog, please ignore this post. We’re casting a wide net to try and find help for our problems.)

UPDATE: This problem has been resolved and all of our services are returning to their normally dubious service levels. For a technical explanation of what went wrong, see here.

Dear Postgres gurus:

We at MusicBrainz have been very happy postgres users for over a decade now and Postgres is something that gives is very few headaches compared to all the other things that we run. But last week we started having some really vexing issues with our server. Here is some back-story:

https://blog.musicbrainz.org/2015/03/14/hosting-issues-downtime-tonight/

When our load spiked, we did the normal set of things that you do:

  • Check for missing indexes, made some new ones, no change. (see details below)
  • Examined for new traffic; none of our web front end servers showed an increase in traffic.
  • Eliminated non-mission-critical uses of the DB server: stop building indexes for search, turn off lower priority sites. No change.
  • Review the performance settings of the server. Debate each setting as a team and tune. shared_buffers and work_mem tuning has made the server more resilient to recover from spikes, but still, we get massive periodic spikes.

From a restart, everything is happy and working well. Postgres will use all available ram for a while, but stay out of swap, exactly what we want it to do. But then, it tips the scales and digs into swap and everything goes to hell. We’ve studied this post for quite some time and ran queries to understand how Posgres manages its ram:

http://www.depesz.com/2012/06/09/how-much-ram-is-postgresql-using/

And sure enough ram usage just keeps increasing and once we go beyond physical ram, it goes into swap. Not rocket science. We’ve noticed that our back ends keep growing in size. According to top, once we start having processes that are 10+% of ram, we’re nearly on the cusp of entering swap. It happens predictably time and time again. Selective use of pg_terminate_backend() of these large back ends can keep us out of swap. A new, smaller backend gets created, RAM usage goes down. However, this is hardly a viable solution.

We’re now on Postgres 9.1.15, and we have a lot of downstream users who also need to upgrade when we do, so this is something that we need to coordinate months in advance. Going to 9.4 is out in the short term. 😦 Ideally we can figure out what might be going wrong so we can fix it post-haste. MusicBrainz has been barely usable for the past few days. 😦

One final thought: We have an several tables from a previous version of the DB sitting in the public schema not being used at all. We keep meaning to drop those tables, but haven’t gotten around to it yet. They tables are not being used at all, so we assume that they should not impact the performance of Postgres. Might this be a problem?

So, any tips or words of advice you have for us, would be deeply appreciated. And now for way too much information about our setup:

Postgres:

9.1.15 (from ubuntu packages)

Host:

  • Linux totoro 3.2.0-57-generic #87-Ubuntu SMP Tue Nov 12 21:35:10 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
  • 48GB ram
  • Raid 1,0 disks
  • PgBouncer in use.
  • Running postgres is its only task

postgresql.conf:

archive_command = '/bin/true'
archive_mode = 'on'
autovacuum = 'on'
checkpoint_segments = '128'
datestyle = 'iso, mdy'
default_statistics_target = '300'
default_text_search_config = 'pg_catalog.english'
data_directory = '/home/postgres/postgres9'
effective_cache_size = '30GB'
hot_standby = 'on'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
listen_addresses = '*'
log_destination = 'syslog'
log_line_prefix = '<%r %a %p>'
log_lock_waits = 'on'
log_min_duration_statement = '1000'
maintenance_work_mem = '64MB'
max_connections = '500'
max_prepared_transactions = '25'
max_wal_senders = '3'
custom_variable_classes = 'pg_stat_statements'
pg_stat_statements.max = '1000'
pg_stat_statements.save = 'off'
pg_stat_statements.track = 'top'
pg_stat_statements.track_utility = 'off'
shared_preload_libraries = 'pg_stat_statements,pg_amqp'
shared_buffers = '12GB'
silent_mode = 'on'
temp_buffers = '8MB'
track_activities = 'on'
track_counts = 'on'
wal_buffers = '16MB'
wal_keep_segments = '128'
wal_level = 'hot_standby'
wal_sync_method = 'fdatasync'
work_mem = '64MB'

pgbouncer.ini:

[databases]
musicbrainz_db_20110516 = host=127.0.0.1 dbname=musicbrainz_db_20110516

[pgbouncer]
pidfile=/home/postgres/postgres9/pgbouncer.pid
listen_addr=*
listen_port=6899
user=postgres
auth_file=/etc/pgbouncer/userlist.txt
auth_type=trust
pool_mode=session
min_pool_size=10
default_pool_size=320
reserve_pool_size=10
reserve_pool_timeout=1.0
idle_transaction_timeout=0
max_client_conn=400
log_connections=0
log_disconnections=0
stats_period=3600
stats_users=postgres
admin_users=musicbrainz_user

Monitoring:

To see these, enter these anti-spam passwords: User: “musicbrainz” passwd: “musicbrainz”

Load: http://stats.musicbrainz.org/mrtg/drraw/drraw.cgi?Mode=view;Template=1196205794.8081;Base=%2Fvar%2Fwww%2Fmrtg%2F%2Ftotoro_load.rrd

Disk IO: http://stats.musicbrainz.org/mrtg/drraw/drraw.cgi?Mode=view;Template=1196376086.1393;Base=%2Fvar%2Fwww%2Fmrtg%2F%2Ftotoro_diskstats-sda-count.rrd

RAM Use: http://stats.musicbrainz.org/mrtg/drraw/drraw.cgi?Mode=view;Template=1196204920.6439;Base=%2Fvar%2Fwww%2Fmrtg%2F%2Ftotoro_disk-physicalmemory.rrd

Swap use: http://stats.musicbrainz.org/mrtg/drraw/drraw.cgi?Mode=view;Template=1196204920.6439;Base=%2Fvar%2Fwww%2Fmrtg%2F%2Ftotoro_disk-swapspace.rrd

Processes: http://stats.musicbrainz.org/mrtg/drraw/drraw.cgi?Mode=view;Template=1196376477.1968;Base=%2Fvar%2Fwww%2Fmrtg%2F%2Ftotoro_processes.rrd

Indexes:

We ran the query from this suggestion to identify possible missing indexes:

http://stackoverflow.com/questions/3318727/postgresql-index-usage-analysis

this is our result:

https://gist.github.com/mayhem/423b084043235fb78642

Most of these tables are tiny and kept in ram. Postgres opts to not use any indexes we create on the DB, so no change.

UPDATES:

  • Five months ago we double the RAM from 24GB to 48GB, but our traffic has not increased.
  • We’ve set kernel.swapiness to 0 with no real change.
  • free -m:
             total       used       free     shared    buffers     cached
Mem:         48295      31673      16622          0          5      12670
-/+ buffers/cache:      18997      29298
Swap:        22852       2382      20470

15 thoughts on “Postgres troubles

  1. Nico

    RAM is cheap. Buy more RAM to give you time to analyse the problem? Lowering the urgency would probably help…

  2. ruaok Post author

    We doubled the RAM from 24 to 48GB about 5 months ago and our traffic hasn’t increased. That shouldn’t be the problem.

  3. pronik

    Judging from my one visit of PGConf.eu, I would ping 2ndQuadrant or EnterpriseDB for help. Or maybe ask on one of the PostgreSQL mailing lists, these guys are usually extremely competent and helpful.

  4. Rovastar

    Although I don’t have any experience in the technology in your setup I do specialise in troubleshooting in Microsoft environments.

    Things I would be looking at:

    although you talk about load being the same I would focus more on the type of traffic as that might have changed.

    Running a profiler/analyser type tool. This is so you can see live what are the long running queries and how many of them.
    In the Microsoft I know the tools and queries needed.
    It seems that PGfouine http://pgfouine.projects.pgfoundry.org/
    will do that and there sample queries and reports there should tell you the bare bones of this information and you can tweak the queries to deep dive further.
    Obviously this is most useful to have a set of average benchmarks in a normal working version but you don’t have this. It is always worth saving this results as benchmarks somewhere.

    You might see long running queries, maybe duplicate queries, maybe loads of one type. total time of queries sorted by query type, etc

    Marry this up with your Web front end requests. I presume you have a multi tiered siloed architecture and all request to the db have to be passed through the Web tier.
    try and compare this to older logs when it was working (as you might well have these unlike profiler Sql type stats)
    is the front end to the db any different in the traffic?

    I hope that helps somehow in the troubleshooting process.

  5. Rovastar

    Also make sure there are not regular maintenence tasks running out of control.
    I have seen re indexing and backup type tasks that are scheduled to run every x hours but they are not completing in that time and are rerun and not killing the old process/task.
    so reviewing all your schedule tasks like that.

  6. J-C Berthon (@jcberthon)

    Most of the monitoring information you posted are from the helicopter view which does not help providing you with an answer. You need to dig deeper to be able to understand what is going on.

    I would recommend using a tool such as sar (on Ubuntu it is in the package sysstat), see here for some quick start info: http://www.leonardoborda.com/blog/how-to-configure-sysstatsar-on-ubuntudebian/ Once you have collected information when the problem occured, use the sar command with those flags (specific for memory analysis) for RAM: -BrRH; for swap: -BSW. You can use each flag separately or all together.
    Then I would also use the vmstat command and log its output to a file.
    Finally, I would add the tools pidstat (flag -r for memory) or ps (but you would need to write a small script to iterate several times) to the list of monitoring tools to better track the process which you suspect is the culprit.

    Now if I had to shoot in the dark, I would recommend having a look at the huge page configuration of your OS. Do you use huge page, or do you use transparent huge page, or is it all deactivated? If you don’t know what huge page are, check on google as I cannot give a lecture on this topic in a comment. There are many good resources regarding this on LWN.net, and on the major DB vendors. Briefly, huge pages are a means to boost DB performances when using lots of memory (like you), pages are usually 4kB and with huge page it can be 2MB or more. This helps the OS managing more efficiently huge chunks of RAM. Transparent huge pages are a feature on Linux where the kernel tries to identify huge contiguous allocated memory and switch this area to huge page. This takes some resources and it is sometimes buggy, especially because huge pages (and transparent HP) cannot be put in the swap, or are freed with many difficulties. This sometimes can result in behaviour that looks like there is a memory leak. So, if you want also to shoot in the dark, deactivate transparent huge pages and see if things gets better. If it seems better, try to optimise your DB by activating huge page (so telling the kernel to reserve chunks of memory for huge pages, and then telling Postgres to use them). And anyway, add those tools for the memory monitoring in order to assess that this is indeed solved.

  7. PATATE12

    > Regular readers of this blog, please ignore this post. We’re casting a wide net to try and find help for our problems.

    It was very interesting for us too. We MB users now have better understanding of your current difficulties with MBS, we can morale support you now ! I’m sending you some great psycho energy ! Good luck !

  8. Florian Heigl

    Agreeing with pronik.
    Be professional and get support instead of tinkering.

  9. J-C Berthon (@jcberthon)

    @Florian I don’t think this is being helpful. Musicbrainz is more a community project than a professional service. Professional services means it cost quite a lot, and not everyone has the money to support that.
    In addition, how does one learn and get better if one call for assistance for every problem?
    It is important to learn, relying on external service means that it can take some time before they support you. If you learn, you can fix the problem yourself next time and be more responsive.

    Finally it is important to have methodology. Having methods is being professional. Calling for someone to solve your problem is not being professional. One should not try to tinker first, but take the time to analyse and understand a problem. Once you know why, the problem is solved. If you tinker in the dark, new code paths could be triggered which are temporarily hiding the problem.

  10. Scot Bontrager

    I don’t now PostgreSQL, always been a MySQL and Oracle user myself, but I’d review the logs to see if there is a corrupt table (or tables). Also, the output of dmesg should show if there is any disk thrashing due to bad/failing blocks. If a disk is starting to go, it can lead to failed writes which Postgres would want to keep retrying, tying up server processes/threads, which would consume memory, drive to swap (and more disk thrashing…)

  11. nemenoga

    Beyond PostgreSQL, we had an issue on a computing cluster which gave kind of similar symptoms. They were also not fixed with swapiness=0.
    A brutal fix was to echo 3 > /proc/sys/vm/drop_caches from crontab.
    My colleague who fought a lot with it, mumbled something about a kernel bug, possibly fixed around ~3.19. At least we do not see this behavior anymore. Sorry for being vague, but maybe it is not a problem in PostgreSQL after all.

  12. chtfn

    Reblogged this on goFLOSS! and commented:
    Hi everyone!

    I am reposting this as the lovely people at MusicBrainz are having really serious issues with their server at the moment, and they are looking for help from anyone savvy with similar software.

    Cheers and spread the info!

  13. Nico

    I wasn’t suggesting a ram upgrade as the solution, rather a way to treat the symptom. That way you buy peace of mind (not having to be on call the restart the server that often and not having to worry about impacts on users) to focus on solving the root cause.
    In other words, a temporary workaround.

  14. ruaok Post author

    Josh Berkus says: (he wasn’t able to post the comment himself)

    What you’ve posted doesn’t give us some important detail; I’ve sent you
    an email asking for it. There’s three likely possibilities here (and
    others somewhat less likely):

    1) You are experiencing a Linux kernel, filesystem or hardware issue
    which is causing your memory usage to be very inefficient. Accordingly,
    I’ve asked for your kernel version (and upgrade history).

    2) Your pgbouncer config supports up to 330 database connections and
    does not recycle them quickly. Each connection is allowed up to 128MB
    private RAM (2xwork_mem). That’s 42GB right there, if you’re maxing out
    connections and they’re all using their max allowed RAM, and the peak
    allocation wouldn’t go away until they are recycled (and pgbouncer
    wouldn’t drop them for an hour). Accordingly: what does connection
    concurrency look like, and how often does your app drop connections?

    3) Even though the majority of your database consists of large
    historical tables which you don’t *think* you’re accessing, you are
    nevertheless accessing them, and it’s blowing out your FS cache
    regularly. So, what’s the size distribution of tables, and have you
    done full query logging to see what’s being touched?

    –Josh Berkus
    PostgreSQL Experts, Inc.
    http://www.pgexperts.com

  15. joao cosme

    Could you post your new pgbouncer config file??? How many connections hits your pgbouncer??

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s