Labs Server Admin Log
Projects are listed in order of most recently updated.
Also see the recent changes for nova resources (atom).
To log a message in #wikimedia-labs, use the following format: !log <project> <message>
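For example, a hypothetical entry for the tools project could look like this:
!log tools restarted webservicemonitor on tools-services-02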
Contents
- 1 Nova_Resource:Tools/SAL
- 1.1 2016-05-08
- 1.2 2016-05-05
- 1.3 2016-04-28
- 1.4 2016-04-24
- 1.5 2016-04-11
- 1.6 2016-04-06
- 1.7 2016-04-05
- 1.8 2016-04-04
- 1.9 2016-03-30
- 1.10 2016-03-28
- 1.11 2016-03-27
- 1.12 2016-03-18
- 1.13 2016-03-11
- 1.14 2016-03-02
- 1.15 2016-02-29
- 1.16 2016-02-28
- 1.17 2016-02-26
- 1.18 2016-02-25
- 1.19 2016-02-24
- 1.20 2016-02-22
- 1.21 2016-02-19
- 1.22 2016-02-18
- 1.23 2016-02-16
- 1.24 2016-02-12
- 1.25 2016-02-05
- 1.26 2016-02-03
- 1.27 2016-01-31
- 1.28 2016-01-30
- 1.29 2016-01-29
- 1.30 2016-01-28
- 1.31 2016-01-27
- 1.32 2016-01-26
- 1.33 2016-01-25
- 1.34 2016-01-23
- 1.35 2016-01-21
- 1.36 2016-01-12
- 1.37 2016-01-11
- 1.38 2016-01-09
- 1.39 2016-01-08
- 1.40 2015-12-30
- 1.41 2015-12-29
- 1.42 2015-12-28
- 1.43 2015-12-23
- 1.44 2015-12-22
- 1.45 2015-12-21
- 1.46 2015-12-20
- 1.47 2015-12-18
- 1.48 2015-12-16
- 1.49 2015-12-12
- 1.50 2015-12-10
- 1.51 2015-12-07
- 1.52 2015-12-06
- 1.53 2015-12-04
- 1.54 2015-12-02
- 1.55 2015-12-01
- 1.56 2015-11-25
- 1.57 2015-11-20
- 1.58 2015-11-17
- 1.59 2015-11-16
- 1.60 2015-11-03
- 1.61 2015-11-02
- 1.62 2015-10-26
- 1.63 2015-10-11
- 1.64 2015-10-09
- 1.65 2015-10-06
- 1.66 2015-10-02
- 1.67 2015-10-01
- 1.68 2015-09-30
- 1.69 2015-09-29
- 1.70 2015-09-28
- 1.71 2015-09-25
- 1.72 2015-09-24
- 1.73 2015-09-23
- 1.74 2015-09-16
- 1.75 2015-09-15
- 1.76 2015-09-14
- 1.77 2015-09-13
- 1.78 2015-09-11
- 1.79 2015-09-08
- 1.80 2015-09-07
- 1.81 2015-09-03
- 1.82 2015-09-02
- 1.83 2015-09-01
- 1.84 2015-08-31
- 1.85 2015-08-30
- 1.86 2015-08-29
- 1.87 2015-08-27
- 1.88 2015-08-26
- 1.89 2015-08-25
- 1.90 2015-08-24
- 1.91 2015-08-20
- 1.92 2015-08-19
- 1.93 2015-08-18
- 1.94 2015-08-17
- 1.95 2015-08-15
- 1.96 2015-08-14
- 1.97 2015-08-13
- 1.98 2015-08-12
- 1.99 2015-08-11
- 1.100 2015-08-04
- 1.101 2015-08-03
- 1.102 2015-08-01
- 1.103 2015-07-30
- 1.104 2015-07-29
- 1.105 2015-07-28
- 1.106 2015-07-27
- 1.107 2015-07-19
- 1.108 2015-07-11
- 1.109 2015-07-10
- 1.111 July 6
- 1.112 July 2
- 1.113 June 29
- 1.114 June 21
- 1.115 June 19
- 1.116 June 10
- 1.117 June 8
- 1.118 June 7
- 1.119 June 5
- 1.120 June 2
- 1.121 May 29
- 1.122 May 28
- 1.123 May 27
- 1.124 May 23
- 1.125 May 22
- 1.126 May 20
- 1.127 May 19
- 1.128 May 18
- 1.129 May 15
- 1.130 May 14
- 1.131 May 10
- 1.132 May 5
- 1.133 May 4
- 1.134 May 2
- 1.135 May 1
- 1.136 April 30
- 1.137 April 29
- 1.138 April 28
- 1.139 April 25
- 1.140 April 24
- 1.141 April 23
- 1.142 April 20
- 1.143 April 18
- 1.144 April 17
- 1.145 April 16
- 1.146 April 15
- 1.147 April 14
- 1.148 April 13
- 1.149 April 12
- 1.150 April 11
- 1.151 April 10
- 1.152 April 9
- 1.153 April 8
- 1.154 April 7
- 1.155 April 5
- 1.156 April 4
- 1.157 April 3
- 1.158 April 2
- 1.159 April 1
- 1.160 March 31
- 1.161 March 30
- 1.162 March 29
- 1.163 March 28
- 1.164 March 26
- 1.165 March 25
- 1.166 March 24
- 1.167 March 23
- 1.168 March 22
- 1.169 March 21
- 1.170 March 15
- 1.171 March 13
- 1.172 March 11
- 1.173 March 9
- 1.174 March 7
- 1.175 March 6
- 1.176 March 2
- 1.177 March 1
- 1.178 February 28
- 1.179 February 27
- 1.180 February 24
- 1.181 February 16
- 1.182 February 13
- 1.183 February 1
- 1.184 January 29
- 1.185 January 27
- 1.186 January 19
- 1.187 January 16
- 1.188 January 15
- 1.189 January 11
- 1.190 January 8
- 1.191 December 23
- 1.192 December 22
- 1.193 December 19
- 1.194 December 17
- 1.195 December 12
- 1.196 December 11
- 1.197 December 8
- 1.198 December 7
- 1.199 December 2
- 1.200 November 26
- 1.201 November 25
- 1.202 November 24
- 1.203 November 22
- 1.204 November 17
- 1.205 November 16
- 1.206 November 15
- 1.207 November 14
- 1.208 November 13
- 1.209 November 12
- 1.210 November 7
- 1.211 November 6
- 1.212 November 5
- 1.213 November 4
- 1.214 November 1
- 1.215 October 30
- 1.216 October 27
- 1.217 October 26
- 1.218 October 24
- 1.219 October 23
- 1.220 October 14
- 1.221 October 11
- 1.222 October 4
- 1.223 October 2
- 1.224 September 28
- 1.225 September 25
- 1.226 September 17
- 1.227 September 15
- 1.228 September 13
- 1.229 September 12
- 1.230 September 8
- 1.231 September 5
- 1.232 September 4
- 1.233 September 2
- 1.234 August 23
- 1.235 August 21
- 1.236 August 15
- 1.237 August 14
- 1.238 August 12
- 1.239 August 2
- 1.240 August 1
- 1.241 July 24
- 1.242 July 21
- 1.243 July 18
- 1.244 July 16
- 1.245 July 15
- 1.246 July 14
- 1.247 July 13
- 1.248 July 12
- 1.249 July 11
- 1.250 July 10
- 1.251 July 9
- 1.252 July 8
- 1.253 July 6
- 1.254 July 5
- 1.255 July 4
- 1.256 July 3
- 1.257 July 2
- 1.258 July 1
- 1.259 June 30
- 1.260 June 29
- 1.261 June 28
- 1.262 June 21
- 1.263 June 20
- 1.264 June 16
- 1.265 June 15
- 1.266 June 13
- 1.267 June 10
- 1.268 June 3
- 1.269 June 2
- 1.270 May 27
- 1.271 May 25
- 1.272 May 23
- 1.273 May 22
- 1.274 May 20
- 1.275 May 16
- 1.276 May 14
- 1.277 May 13
- 1.278 May 10
- 1.279 May 9
- 1.280 May 6
- 1.281 April 28
- 1.282 April 27
- 1.283 April 24
- 1.284 April 20
- 1.285 April 13
- 1.286 April 12
- 1.287 April 11
- 1.288 April 10
- 1.289 April 8
- 1.290 April 4
- 1.291 March 30
- 1.292 March 29
- 1.293 March 28
- 1.294 March 21
- 1.295 March 20
- 1.296 March 5
- 1.297 March 4
- 1.298 March 3
- 1.299 March 1
- 1.300 February 28
- 1.301 February 27
- 1.302 February 25
- 1.303 February 23
- 1.304 February 21
- 1.305 February 20
- 1.306 February 19
- 1.307 February 18
- 1.308 February 14
- 1.309 February 13
- 1.310 February 12
- 1.311 February 11
- 1.312 February 10
- 1.313 February 9
- 1.314 February 6
- 1.315 February 4
- 1.316 January 31
- 1.317 January 30
- 1.318 January 28
- 1.319 January 25
- 1.320 January 24
- 1.321 January 23
- 1.322 January 21
- 1.323 January 20
- 1.324 January 16
- 1.325 January 15
- 1.326 January 14
- 1.327 January 10
- 1.328 January 9
- 1.329 January 8
- 1.330 January 7
- 1.331 January 6
- 1.332 January 1
- 1.333 December 27
- 1.334 December 23
- 1.335 December 21
- 1.336 December 19
- 1.337 December 17
- 1.338 December 14
- 1.339 December 4
- 1.340 December 1
- 1.341 November 25
- 1.342 November 24
- 1.343 November 14
- 1.344 November 13
- 1.345 November 3
- 1.346 November 1
- 1.347 October 23
- 1.348 October 20
- 1.349 October 15
- 1.350 October 10
- 1.351 September 23
- 1.352 September 11
- 1.353 August 24
- 1.354 August 23
- 1.355 August 22
- 1.356 August 20
- 1.357 August 19
- 1.358 August 16
- 1.359 August 15
- 1.360 August 11
- 1.361 August 10
- 1.362 August 6
- 1.363 August 5
- 1.364 August 2
- 1.365 August 1
- 1.366 July 31
- 1.367 July 30
- 1.368 July 29
- 1.369 July 25
- 1.370 July 20
- 1.371 July 19
- 1.372 July 10
- 1.373 July 5
- 1.374 July 3
- 1.375 July 2
- 1.376 July 1
- 1.377 June 30
- 1.378 June 26
- 1.379 June 25
- 1.380 June 24
- 1.381 June 19
- 1.382 June 17
- 1.383 June 16
- 1.384 June 15
- 1.385 June 14
- 1.386 June 13
- 1.387 June 11
- 1.388 June 10
- 1.389 June 9
- 1.390 June 8
- 1.391 June 7
- 1.392 June 5
- 1.393 June 4
- 1.394 June 3
- 1.395 June 2
- 1.396 June 1
- 1.397 May 31
- 1.398 May 30
- 1.399 May 29
- 1.400 May 28
- 1.401 May 27
- 1.402 May 24
- 1.403 May 23
- 1.404 May 22
- 1.405 May 21
- 1.406 May 19
- 1.407 May 14
- 1.408 May 10
- 1.409 May 9
- 1.410 May 6
- 1.411 May 4
- 1.412 May 2
- 1.413 May 1
- 1.414 April 27
- 1.415 April 26
- 1.416 April 25
- 1.417 April 24
- 1.418 April 23
- 1.419 April 19
- 1.420 April 15
- 1.421 April 11
- 2 Server Admin Log
- 3 Nova_Resource:Rcm.cac/SAL
- 4 Nova_Resource:Tools.wikibugs/SAL
- 4.1 2016-05-06
- 4.2 2016-05-05
- 4.3 2016-04-22
- 4.4 2016-04-21
- 4.5 2016-04-15
- 4.6 2016-04-11
- 4.7 2016-04-08
- 4.8 2016-04-01
- 4.9 2016-03-29
- 4.10 2016-03-28
- 4.11 2016-02-18
- 4.12 2016-02-16
- 4.13 2016-02-11
- 4.14 2016-02-08
- 4.15 2016-01-07
- 4.16 2015-12-21
- 4.17 2015-12-07
- 4.18 2015-12-03
- 4.19 2015-11-04
- 4.20 2015-10-28
- 4.21 2015-10-21
- 4.22 2015-10-07
- 4.23 2015-10-06
- 4.24 2015-09-27
- 4.25 2015-09-24
- 4.26 2015-09-23
- 4.27 2015-09-22
- 4.28 2015-09-19
- 4.29 2015-09-18
- 4.30 2015-09-16
- 4.31 2015-09-07
- 4.32 2015-09-04
- 4.33 2015-09-03
- 4.34 2015-09-02
- 4.35 2015-08-28
- 4.36 2015-08-25
- 4.37 2015-08-19
- 4.38 2015-08-04
- 4.39 2015-07-30
- 4.40 2015-07-28
- 4.41 July 2
- 4.42 June 15
- 4.43 June 10
- 4.44 June 9
- 4.45 June 5
- 4.46 May 25
- 4.47 May 19
- 4.48 May 2
- 4.49 May 1
- 4.50 April 29
- 4.51 April 24
- 4.52 April 21
- 4.53 April 20
- 4.54 April 18
- 4.55 April 13
- 4.56 April 7
- 4.57 April 1
- 4.58 March 30
- 4.59 March 23
- 4.60 March 18
- 4.61 March 16
- 4.62 March 13
- 4.63 March 11
- 4.64 March 10
- 4.65 March 9
- 4.66 March 8
- 4.67 March 3
- 4.68 February 28
- 4.69 February 24
- 4.70 February 22
- 4.71 February 20
- 4.72 February 18
- 4.73 February 17
- 4.74 February 16
- 4.75 February 13
- 4.76 February 11
- 4.77 February 7
- 4.78 February 6
- 4.79 February 5
- 4.80 February 3
- 4.81 February 2
- 4.82 January 30
- 4.83 January 28
- 4.84 January 22
- 4.85 January 19
- 4.86 January 15
- 4.87 January 14
- 4.88 January 12
- 4.89 January 11
- 4.90 January 10
- 4.91 January 9
- 4.92 January 8
- 4.93 January 7
- 4.94 January 5
- 4.95 December 31
- 4.96 December 22
- 4.97 December 18
- 4.98 December 17
- 4.99 December 16
- 4.100 December 10
- 4.101 December 9
- 4.102 December 4
- 4.103 December 1
- 4.104 November 29
- 4.105 November 25
- 4.106 November 24
- 4.107 November 18
- 4.108 October 11
- 4.109 September 24
- 4.110 August 19
- 4.111 July 1
- 4.112 May 22
- 4.113 April 30
- 4.114 April 28
- 4.115 April 27
- 5 Nova_Resource:Tools.heritage/SAL
- 6 Release Engineering/SAL
- 7 Nova_Resource:Tools.admin/SAL
- 8 Nova_Resource:Mobile/SAL
- 9 Nova_Resource:Redirects/SAL
- 10 Nova_Resource:Math/SAL
Nova_Resource:Tools/SAL
2016-05-08
- 07:06 YuviPanda: restarted admin tool
2016-05-05
- 13:11 godog: cherry-pick https://gerrit.wikimedia.org/r/#/c/280652/ on puppetmaster
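A cherry-pick like the one above is usually applied on the project's self-hosted puppetmaster along these lines (the checkout path, repository and patchset number are assumptions, not taken from the log entry):
cd /var/lib/git/operations/puppet        # assumed location of the puppet checkout
PS=1                                     # patchset number, illustrative
git fetch https://gerrit.wikimedia.org/r/operations/puppet refs/changes/52/280652/$PS
git cherry-pick FETCH_HEAD               # keep the change applied locally until it is merged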
2016-04-28
- 04:15 YuviPanda: delete half of the trusty webservice jobs
- 04:00 YuviPanda: deleted all precise webservice jobs, waiting for webservicemonitor to bring them back up
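One way to bulk-delete webservice jobs as described above (a sketch only, not necessarily the command that was used; the queue pattern and header-skipping filter are illustrative):
# list jobs on the precise (12xx) lighttpd webgrid nodes, drop the two header lines, delete the rest;
# webservicemonitor then restarts each tool's webservice from its service.manifest
qstat -u '*' -q 'webgrid-lighttpd@tools-webgrid-lighttpd-12*' | awk 'NR>2 {print $1}' | xargs -r qdel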
2016-04-24
- 12:22 YuviPanda: force deleted job 5435259 from pbbot per PeterBowman
2016-04-11
- 14:20 andrewbogott: moving tools-bastion-mtemp to labvirt1009
2016-04-06
- 15:20 bd808: Removed local hack for T131906 from tools-puppetmaster-01
2016-04-05
- 21:24 bd808: Committed local hack on tools-puppetmaster-01 to get elasticsearch working again
- 21:02 bd808: Forcing puppet runs to fix elasticsearch
- 20:39 bd808: Elasticsearch processes down. Looks like a prod puppet change that needs tweaking for tool labs
2016-04-04
- 19:43 YuviPanda: new bastion!
- 19:15 chasemp: reboot tools-bastion-05
2016-03-30
- 15:50 andrewbogott: rebooting tools-proxy-01 in hopes of clearing some bad caches
2016-03-28
- 20:51 yuvipanda: lifted RAM quota from 900Gigs to 1TB?!
- 20:30 chasemp: change perms on grant files from create-dbusers to chmod 400 and chattr +i
2016-03-27
- 17:40 scfc_de: tools-webgrid-generic-1405, tools-webgrid-lighttpd-1411, tools-web-static-01, tools-web-static-02: "apt-get install cloud-init" and accepted changes for /etc/cloud/cloud.cfg (users: + default; cloud_config_modules: + ssh-import-id, + puppet, + chef, + salt-minion; system_info/package_mirrors/arches[i386, amd64]/search/primary: + http://%(region)s.clouds.archive.ubuntu.com/ubuntu/).
2016-03-18
- 15:47 chasemp: had to kill stalkboten as it was logging constant errors filling logs to the tune of hundreds of gigs
- 15:36 chasemp: cleanup huge log collection for broken bot: /srv/project/tools/project/betacommand-dev/tspywiki/irc/logs# rm -fR SpamBotLog.log\.*
2016-03-11
- 20:57 mutante: reverted font changes - puppet runs recovering
- 20:37 mutante: more puppet issues due to font dependencies on trusty, on it
- 19:39 mutante: should a tools-exec server be influenced by font packages on an mw appserver?
- 19:39 mutante: fixed puppet runs on tools-exec (gerrit 276792)
2016-03-02
- 14:56 chasemp: qdel 3956069 and 3758653 for abusing auth
2016-02-29
- 21:49 scfc_de: tools-exec-1218: rm -f /usr/local/lib/nagios/plugins/check_eth to work around "Got passed new contents for sum" (https://tickets.puppetlabs.com/browse/PUP-1334).
- 21:20 scfc_de: tools-exec-1209: rm -f /var/lib/puppet/state/agent_catalog_run.lock (no Puppet process running, probably from the reboots).
- 20:58 scfc_de: Ran "dpkg --configure -a" on all instances.
- 13:50 scfc_de: Deployed jobutils/misctools 1.10.
2016-02-28
- 20:08 bd808: Removed unwanted NFS mounts from tools-elastic-01.tools.eqiad.wmflabs
2016-02-26
- 19:08 bd808: Upgraded Elasticsearch on tools-elastic-0[123] to 1.7.5
2016-02-25
- 21:43 scfc_de: Deployed jobutils/misctools 1.9.
2016-02-24
- 19:46 chasemp: runonce deployed for https://gerrit.wikimedia.org/r/#/c/272891/
2016-02-22
- 15:55 andrewbogott: redirecting tools-login.wmflabs.org to tools-bastion-05
2016-02-19
- 15:58 chasemp: rerollout tools nfs shaping pilot for sanity in anticipation of formalization
- 09:21 _joe_: killed cluebot3 instance on tools-exec-1207, writing 20 M/s to the error log
- 00:50 yuvipanda: failover services to services-02
2016-02-18
- 20:37 yuvipanda: failover proxy back to tools-proxy-01
- 19:46 chasemp: repool labvirt1003 and depool labvirt1004
- 18:19 chasemp: draining nodes from labvirt1001
2016-02-16
- 21:33 chasemp: reboot of bastion-1002
2016-02-12
- 19:56 chasemp: nfs traffic shaping pilot round 2
2016-02-05
- 22:01 chasemp: throttle some vm nfs write speeds
- 16:49 scfc_de: find /data/project/wikidata-edits -group ssh-key-ldap-lookup -exec chgrp tools.wikidata-edits \{\} + (probably a remnant of the work on ssh-key-ldap-lookup last summer).
- 16:45 scfc_de: Removed /data/project/test300 (uid/gid 52080; none of them resolves, no databases, just an unmodified pywikipedia clone inside).
2016-02-03
- 03:00 YuviPanda: upgraded flannel on all hosts running it
2016-01-31
- 20:01 scfc_de: tools-webgrid-generic-1405: Rebooted via wikitech; rebooting via "shutdown -r now" did not seem to work.
- 18:51 bd808: tools-elastic-01.tools.eqiad.wmflabs console shows blocked tasks, possible kernel bug?
- 18:49 bd808: tools-elastic-01.tools.eqiad.wmflabs not responsive to ssh or Elasticsearch requests; rebooting via wikitech interface
- 13:32 hashar: restarted qamorebot
2016-01-30
- 06:38 scfc_de: tools-webgrid-generic-1405: Rebooted for load ~ 175 and lots of processes stuck in D.
2016-01-29
- 21:25 YuviPanda: restarted image-resize-calc manually, no service.manifest file
2016-01-28
- 15:02 scfc_de: tools-cron-01: Rebooted via wikitech as "shutdown -r now" => "@sbin/plymouthd --mode=shutdown" => "/bin/sh -e /proc/self/fd/9" => "/bin/sh /etc/init.d/rc 6" => "/bin/sh /etc/rc6.d/S20sendsigs stop" => "sync" stuck in D. *argl*
- 14:56 scfc_de: tools-cron-01: Rebooted due to high number of processes stuck in D and load >> 100.
- 14:54 scfc_de: tools-cron-01: HUPped 43 wikitrends/refresh.sh processes, though a lot of the processes seem to be stuck in D, so I'll reboot this instance.
- 14:50 scfc_de: tools-cron-01: HUPped 85 processes /usr/lib/php5/sessionclean.
2016-01-27
- 23:07 YuviPanda: removed all members of templatetiger, added self instead, removed active shell sessions
- 20:24 chasemp: master stop, truncate accounting log to accounting.01272016, master start
- 19:34 chasemp: master start grid master
- 19:23 chasemp: stopped master
- 19:11 YuviPanda: depooled tools-webgrid-1405 to prep for restart, lots of stuck processes
- 18:29 valhallasw`cloud: job 2551539 is ifttt, which is also running as 2700629. Killing 2551539 .
- 18:26 valhallasw`cloud: messages repeatedly reports "01/27/2016 18:26:17|worker|tools-grid-master|E|[email protected] reports running job (2551539.1/master) in queue "[email protected]" that was not supposed to be there - killing". SSH'ing there to investigate
- 18:24 valhallasw`cloud: 'sleep' test job also seems to work without issues
- 18:23 valhallasw`cloud: no errors in log file, qstat works
- 18:23 chasemp: master sge restarted post dump and restart for jobs db
- 18:22 valhallasw`cloud: messages file reports 'Wed Jan 27 18:21:39 UTC 2016 db_load_sge_maint_pre_jobs_dump_01272016'
- 18:20 chasemp: master db_load -f /root/sge_maint_pre_jobs_dump_01272016 sge_job
- 18:19 valhallasw`cloud: dumped jobs database to /root/sge_maint_pre_jobs_dump_01272016, 4.6M
- 18:17 valhallasw`cloud: SGE Configuration successfully saved to /root/sge_maint_01272016 directory.
- 18:14 chasemp: grid master stopped
- 00:56 scfc_de: Deployed admin/www bde15df..12a3586.
2016-01-26
- 21:28 YuviPanda: qstat -u '*' | grep E | awk '{print $1}' | xargs -L1 qmod -cj
- 21:16 chasemp: reboot tools-exec-1217.tools.eqiad.wmflabs
2016-01-25
- 20:30 YuviPanda: switched over cron host to tools-cron-01, manually copied all old cron files from tools-submit to tools-cron-01
- 19:06 chasemp: kill python merge/merge-unique.py tools-exec-1213 as it seemed to be overwhelming nfs
- 17:07 scfc_de: Deployed admin/www at bde15df2a379c33edfb8350afd2f0c7186705a93.
2016-01-23
- 15:49 scfc_de: Removed remnant send_puppet_failure_emails cron entries except from unreachable hosts sacrificial-kitten, tools-worker-06 and tools-worker-1003.
2016-01-21
- 22:24 YuviPanda: deleted tools-redis-01 and -02 (are on 1001 and 1002 now)
- 21:13 YuviPanda: repooled exec nodes on labvirt1010
- 21:08 YuviPanda: gridengine-master started, verified shadow hasn't started
- 21:00 YuviPanda: stop gridengine master
- 20:51 YuviPanda: repooled exec nodes on labvirt1007 (correction to the last message)
- 20:51 YuviPanda: repooled exec nodes on labvirt1006
- 20:39 YuviPanda: failover tools-static to tools-web-static-01
- 20:38 YuviPanda: failover tools-checker to tools-checker-01
- 20:32 YuviPanda: depooled exec nodes on 1007
- 20:32 YuviPanda: repooled exec nodes on 1006
- 20:14 YuviPanda: depooled all exec nodes in labvirt1006
- 20:11 YuviPanda: repooled exec nodes on 1005
- 19:53 YuviPanda: depooled exec nodes on labvirt1005
- 19:49 YuviPanda: repooled exec nodes from labvirt1004
- 19:48 YuviPanda: failed over proxy to tools-proxy-01 again
- 19:31 YuviPanda: depooled exec nodes from labvirt1004
- 19:29 YuviPanda: repooled exec nodes from labvirt1003
- 19:13 YuviPanda: depooled instances on labvirt1003
- 19:06 YuviPanda: re-enabled queues on exec nodes that were on labvirt1002
- 19:02 YuviPanda: failed over tools proxy to tools-proxy-02
- 18:46 YuviPanda: drained and disabled queues on all nodes on labvirt1002
- 18:38 YuviPanda: restarted all restartable jobs in instances on labvirt1001 and deleted all non-restartable ghost jobs. these were already dead
2016-01-12
- 09:48 scfc_de: tools-checker-01: Removed exim paniclog (OOM).
2016-01-11
- 22:19 valhallasw`cloud: reset maxujobs 0->128, job_load_adjustments none->np_load_avg=0.50, load_ad... -> 0:7:30
- 22:12 YuviPanda: restarted gridengine master again
- 22:07 valhallasw`cloud: set job_load_adjustments from np_load_avg=0.50 to none and load_adjustment_decay_time to 0:0:0
- 22:05 valhallasw`cloud: set maxujobs back to 0, but doesn't help
- 21:57 valhallasw`cloud: reset to 7:30
- 21:57 valhallasw`cloud: that cleared the measure, but jobs still not starting. Ugh!
- 21:56 valhallasw`cloud: set job_load_adjustments_decay_time = 0:0:0
- 21:45 YuviPanda: restarted gridengine master
- 21:43 valhallasw`cloud: qstat -j <jobid> shows all queues overloaded; seems to have started just after a load test for the new maxujobs setting
- 21:42 valhallasw`cloud: resetting to 0:7:30, as it's not having the intended effect
- 21:41 valhallasw`cloud: currently 353 jobs in qw state
- 21:40 valhallasw`cloud: that's load_adjustment_decay_time
- 21:40 valhallasw`cloud: temporarily sudo qconf -msconf to 0:0:1
- 19:59 YuviPanda: Set maxujobs (max concurrent jobs per user) on gridengine to 128
- 17:51 YuviPanda: kill all queries running on labsdb1003
- 17:20 YuviPanda: stopped webservice for quentinv57-tools
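For reference, the parameters tuned above (maxujobs, job_load_adjustments, load_adjustment_decay_time) all live in the grid engine scheduler configuration; a minimal sketch of how they are inspected and edited, with the values mentioned in the entries:
qconf -ssconf        # show the current scheduler configuration
sudo qconf -msconf   # open it in an editor; the relevant lines look like:
#   maxujobs                     128
#   job_load_adjustments         np_load_avg=0.50
#   load_adjustment_decay_time   0:7:30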
2016-01-09
- 21:07 valhallasw`cloud: moved tools-checker/208.80.155.229 back to tools-checker-01
- 21:02 andrewbogott: rebooting tools-checker-01 as it is unresponsive.
- 13:12 valhallasw`cloud: tools-worker-1002. is unresponsive. Maybe that's where the other grrrit-wm is hiding? Rebooting.
2016-01-08
- 19:46 chasemp: couldn't get into tools-mail-01 at all and it seemed borked so I rebooted
- 17:23 andrewbogott: killing tools.icelab as per https://wikitech.wikimedia.org/wiki/User_talk:Torin#Running_queries_on_tools-dev_.28tools-bastion-02.29
2015-12-30
- 04:06 YuviPanda: delete all webgrid jobs to start with a clean slate
- 03:54 YuviPanda: qmod -rj all tools in the continuous queue, they are all orphaned
- 02:39 YuviPanda: remove lbenedix and ebekebe from tools.hcclab
- 00:40 YuviPanda: restarted master on grid-master
- 00:40 YuviPanda: copied and cleaned out spooldb
- 00:10 YuviPanda: reboot tools-grid-shadow
- 00:08 YuviPanda: attempt to stop shadowd
- 00:03 YuviPanda: attempting to start gridengine-master on tools-grid-shadow
- 00:00 YuviPanda: kill -9'd gridengine master
2015-12-29
- 23:31 YuviPanda: rebooting tools-grid-master
- 23:22 YuviPanda: restart gridengine-master on tools-grid-master
- 00:18 YuviPanda: shut down redis on tools-redis-01
2015-12-28
- 22:34 chasemp: attempt to unmount nfs volumes on tools-redis-01 to debug but it hangs (I am on console and see root at console hang on login)
- 22:31 YuviPanda: disable NFS on tools-redis-1001 and 1002
- 21:32 YuviPanda: disable puppet on tools-redis-01 and -02
- 21:27 YuviPanda: created tools-redis-1001
2015-12-23
- 21:21 YuviPanda: deleted tools-worker-01 to -05, creating tools-worker-1001 to 1005
- 21:19 valhallasw`cloud: tools-proxy-01: umount /home /data/project /data/scratch /public/dumps
- 19:01 valhallasw`cloud: ah, connections that are kept open. A new incognito window is routed correctly.
- 18:59 valhallasw`cloud: switched to -02, worked correctly, switched back. Switching back does not seem to fully work?!
- 18:40 valhallasw`cloud: scratch that, first going to eat dinner
- 18:38 valhallasw`cloud: dynamicproxy ban system deployed on tools-proxy-02 working correctly for localhost; switching over users there by moving the external IP.
- 14:42 valhallasw`cloud: toollabs homepage is unhappy because tools.xtools-articleinfo is using a lot of cpu on tools-webgrid-lighttpd-1409. Checking to see what's happening there.
- 10:46 YuviPanda: migrate tools-worker-01 to 3.19 kernel
2015-12-22
- 18:30 YuviPanda: rescheduling all webservices
- 18:17 YuviPanda: failed over active proxy to proxy-01
- 18:12 YuviPanda: upgraded kernel and rebooted tools-proxy-01
- 01:42 YuviPanda: rebooting tools-worker-08
2015-12-21
- 18:44 YuviPanda: reboot tools-proxy-01
- 18:31 YuviPanda: failover proxy to tools-proxy-02
2015-12-20
- 00:00 YuviPanda: tools-worker-08 stuck again :|
2015-12-18
- 15:16 andrewbogott: rebooting locked up host tools-exec-1409
2015-12-16
- 23:14 andrewbogott: rebooting tools-exec-1407, unresponsive
- 22:48 YuviPanda: run qmod -c '*' to clear error state on gridengine
- 21:28 andrewbogott: deleted tools-docker-registry-01
- 16:24 andrewbogott: rebooting tools-exec-1221 as it was in kernel lockup
2015-12-12
- 10:08 YuviPanda: restarted cron on tools-submit
2015-12-10
- 12:47 valhallasw`cloud: broke tools-proxy-02 login (for valhallasw, root still works) by restarting nslcd. Restarting; current proxy is -01.
2015-12-07
- 13:46 Coren: The new grid masters are happy, killing the old ones (-shadow, -master)
- 10:46 YuviPanda: restarted nscd on tools-proxy-01
2015-12-06
- 10:29 YuviPanda: did webservice start on tool 'derivative', was missing service.manifest
2015-12-04
- 19:33 Coren: switching master role to tools-grid-master
- 04:42 yuvipanda: disabled puppet on tools-puppetmaster-01 because everything sucks
- 04:09 bd808: Cherry-picked https://gerrit.wikimedia.org/r/#/c/256618 to tools-puppetmaster-01
2015-12-02
- 18:29 Coren: switching gridmaster activity to tools-grid-shadow
- 05:13 yuvipanda: increased security groups quota to 50 because why not
2015-12-01
- 21:07 yuvipanda: added bd808 as admin
- 21:01 andrewbogott: deleted tool/service group tools.test300
2015-11-25
- 15:42 Coren: migrating tools-web-static-02 to labvirt1010 to free space on labvirt1002
2015-11-20
- 22:02 Coren: tools-webgrid-lighttpd-1412 tools-webgrid-lighttpd-1413 tools-webgrid-lighttpd-1414 tools-webgrid-lighttpd-1415 done and back in rotation.
- 21:46 Coren: tools-webgrid-lighttpd-1411 tools-webgrid-lighttpd-1211 done and back in rotation.
- 21:30 Coren: tools-webgrid-lighttpd-1410 tools-webgrid-lighttpd-1210 done and back in rotation.
- 21:25 Coren: tools-webgrid-lighttpd-1409 tools-webgrid-lighttpd-1209 done and back in rotation.
- 21:13 Coren: tools-webgrid-lighttpd-1408 tools-webgrid-lighttpd-1208 done and back in rotation.
- 20:58 Coren: tools-webgrid-lighttpd-1407 tools-webgrid-lighttpd-1207 done and back in rotation.
- 20:53 Coren: tools-webgrid-lighttpd-1406 tools-webgrid-lighttpd-1206 done and back in rotation.
- 20:41 Coren: tools-webgrid-lighttpd-1405 tools-webgrid-lighttpd-1205 tools-webgrid-generic-1405 done and back in rotation.
- 20:28 Coren: tools-webgrid-lighttpd-1404 tools-webgrid-lighttpd-1204 tools-webgrid-generic-1404 done and back in rotation.
- 19:49 Coren: done, and putting back in rotation: tools-webgrid-lighttpd-1403 tools-webgrid-lighttpd-1203 tools-webgrid-generic-1403
- 19:25 Coren: -lighttpd-1403 wants a restart.
- 19:15 Coren: done, and putting back in rotation: tools-webgrid-lighttpd-1402 tools-webgrid-lighttpd-1202 tools-webgrid-generic-1402
- 18:55 Coren: Putting -lighttpd-1401 -lighttpd-1201 -generic-1401 back in rotation, disabling the others.
- 18:24 Coren: Beginning draining web nodes; -lighttpd-1401 -lighttpd-1201 -generic-1401
- 18:10 Coren: disabling puppet on the grid nodes listed at https://phabricator.wikimedia.org/P2337 so that the /tmp change in https://gerrit.wikimedia.org/r/#/c/252506/ do not apply early and break services
2015-11-17
- 19:39 YuviPanda: created tools-worker-03 to be k8s worker node
- 19:34 YuviPanda: blanked 'realm' for tools-bastion-01 to figure out what happens
2015-11-16
- 20:44 PlasmaFury: switch over the proxy to tools-proxy-01
- 17:38 PlasmaFury: deleted tools-webgrid-lighttpd-1412 for https://phabricator.wikimedia.org/T118654
2015-11-03
- 03:59 scfc_de: tools-submit, tools-webgrid-lighttpd-1409, tools-webgrid-lighttpd-1411: Removed exim paniclog (OOM).
2015-11-02
- 22:57 YuviPanda: pooled tools-webgrid-lighttpd-1413
- 22:10 YuviPanda: created tools-webgrid-lighttpd-1414 and 1415
- 22:04 YuviPanda: created tools-webgrid-lighttpd-1412 and 1413
- 19:53 YuviPanda: drained continuous jobs and disabled queues on tools-exec-1203 and tools-exec-1402
- 19:50 YuviPanda: drain webgrid-lighttpd-1408 of jobs
2015-10-26
- 20:53 YuviPanda: updated 6.9 ssh backport to all trusty hosts
2015-10-11
- 22:54 yuvipanda: delete service.manifest for tool wikiviz to prevent it from attempting to be started. It set itself up for nodejs but didn't actually have any code
2015-10-09
- 22:47 yuvipanda: kill NFS on tools-puppetmaster-01 with https://wikitech.wikimedia.org/wiki/Hiera:Tools/host/tools-puppetmaster-01
- 14:37 Coren: Beginning rotation of execution nodes to apply fix for T106170
2015-10-06
- 04:35 yuvipanda: created tools-puppetmaster-02 as hot spare
2015-10-02
- 17:30 scfc_de: tools-webgrid-lighttpd-1402: Removed exim paniclog (OOM).
2015-10-01
- 23:38 yuvipanda: actually rebooting tools-worker-02, had actually rebooted -01 earlier #facepalm
- 23:20 yuvipanda: rebooting tools-worker-02 to pickup new kernel
- 23:10 yuvipanda: failed over tools-proxy-01 to -02, restarting -01 to pick up new kernel
- 22:58 yuvipanda: rebooted tools-proxy-02 to pick up new kernel
2015-09-30
- 07:12 yuvipanda: deleted tools-webproxy-01 and -02, running on proxy-01 and -02 now
- 06:40 yuvipanda: migrated webproxy to tools-proxy-01
2015-09-29
- 12:08 scfc_de: tools-bastion-01: Removed exim paniclog (OOM).
2015-09-28
- 15:24 Coren: rebooting tools-shadow after mount option changes.
2015-09-25
- 16:02 scfc_de: tools-webgrid-lighttpd-1403: Removed exim paniclog (OOM).
2015-09-24
- 14:06 scfc_de: tools-exec-1201: Restarted grid engine exec for T109485.
- 13:56 scfc_de: tools-master: Restarted grid engine master for T109485.
2015-09-23
- 18:22 valhallasw`cloud: here = https://etherpad.wikimedia.org/p/74j8K2zIob
- 18:22 valhallasw`cloud: experimenting with https://github.com/jordansissel/fpm on tools-packages, and manually installing packages for that. Noting them here.
2015-09-16
- 17:33 scfc_de: Removed python-tools-webservice from precise-tools as apparently old version of tools-webservice.
- 01:17 YuviPanda: attempting to move grrrit-wm to kubernetes
- 01:17 YuviPanda: attempting to move to kubernetes
2015-09-15
- 01:18 scfc_de: Added unixodbc_2.2.14p2-5_amd64.deb back to precise-tools to diagnose if it is related to T111760.
2015-09-14
- 23:47 scfc_de: Archived unixodbc_2.2.14p2-5_amd64 from deb-precise and aptly, no reference in Puppet or Phabricator and same version as distribution.
2015-09-13
- 20:53 scfc_de: Archived lua-json_1.3.2-1 from labsdebrepo and aptly, upgraded manually to Trusty's new 1.3.1-1ubuntu0.1~ubuntu14.04.1, restarted nginx on tools-webproxy-01 and tools-webproxy-02, checked that proxy and localhost:8081/list works.
- 20:42 scfc_de: rm -f /etc/apt/apt.conf.d/20auto-upgrades.ucf-dist on all hosts (cf. T110055).
2015-09-11
- 14:54 scfc_de: tools-webgrid-lighttpd-1403: Removed exim paniclog (OOM).
2015-09-08
- 08:05 valhallasw`cloud: Publish for local repo ./trusty-tools [all, amd64] publishes {main: [trusty-tools]} has been successfully updated. Publish for local repo ./precise-tools [all, amd64] publishes {main: [precise-tools]} has been successfully updated.
- 08:04 valhallasw`cloud: added all packages in data/project/.system/deb-precise to aptly repo precise-tools
- 08:03 valhallasw`cloud: added all packages in data/project/.system/deb-trusty to aptly repo trusty-tools
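The aptly workflow implied by the entries above is roughly the following (the exact flags and the published distribution name are assumptions, not copied from the log):
# add every package from the shared deb directory to the local repo, then refresh the published repo
aptly repo add precise-tools /data/project/.system/deb-precise/
aptly publish update precise-tools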
2015-09-07
- 18:49 valhallasw`cloud: ran sudo mount -o remount /data/project on tools-static-01, which also solved the issue, so skipping the reboot
- 18:47 valhallasw`cloud: switched static webserver to tools-static-02
- 18:45 valhallasw`cloud: weird NFS issue on tools-web-static-01. Switching over to -02 before rebooting.
- 17:57 YuviPanda: created tools-k8s-master-01 with jessie, will be etcd and kubernetes master
2015-09-03
- 07:09 valhallasw`cloud: and just re-running puppet solves the issue. Sigh.
- 07:09 valhallasw`cloud: last message in puppet.log.1.gz is Error: /Stage[main]/Toollabs::Exec_environ/Package[fonts-ipafont-gothic]/ensure: change from 00303-5 to latest failed: Could not get latest version: Execution of '/usr/bin/apt-cache policy fonts-ipafont-gothic' returned 100: fonts-ipafont-gothic: (...) E: Cache is out of sync, can't x-ref a package file
- 07:07 valhallasw`cloud: err, is empty.
- 07:07 valhallasw`cloud: puppet failure on tools-exec-1215 is CRITICAL 66.67% of data above the critical threshold -- but /var/log/puppet.log doesn't exist?!
2015-09-02
- 15:01 scfc_de: Added -M option to qsub call for crontab of tools.sdbot.
- 13:58 valhallasw`cloud: rebooting tools-exec-1403; https://phabricator.wikimedia.org/T107052 happening, also causing significant NFS server load
- 13:55 valhallasw`cloud: restarted gridengine_exec on tools-exec-1403
- 13:53 valhallasw`cloud: tools-exec-1403 does lots of locking operations. Only job there was jid 1072678 = /data/project/hat-collector/irc-bots/snitch.py. Rescheduled that job.
- 13:16 YuviPanda: deleted all jobs of ralgisbot
- 13:12 YuviPanda: suspended all jobs in ralgisbot temporarily
- 12:57 YuviPanda: rescheduled all jobs of ralgisbot, was suffering from stale NFS file handles
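For context on the 15:01 entry: -M sets the mail recipient for grid engine job notifications. A sketch of such a qsub call (address, mail flag and script name are illustrative, not copied from the tool's crontab):
qsub -M tools.sdbot@tools.wmflabs.org -m a job.sh   # -m a: send mail if the job is aborted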
2015-09-01
- 21:01 valhallasw`cloud: killed one of the grrrit-wm jobs; for some reason two of them were running?! Not sure what SGE is up to lately.
- 16:12 scfc_de: tools-bastion-01: Killed bot of tools.cobain.
- 15:47 valhallasw`cloud: git reset --hard cdnjs on tools-web-static-01
- 06:23 valhallasw`cloud: seems to have worked. SGE :(
- 06:17 valhallasw`cloud: going to restart sge_qmaster, hoping this solves the issue :/
- 06:08 valhallasw`cloud: e.g. "queue instance "[email protected]" dropped because it is overloaded: np_load_avg=1.820000 (= 0.070000 + 0.50 * 14.000000 with nproc=4) >= 1.75" but the actual load is only 0.3?!
- 06:06 valhallasw`cloud: test job does not get submitted because all queues are overloaded?!
- 06:06 valhallasw`cloud: investigating SGE issues reported on irc/email
2015-08-31
- 23:20 scfc_de: Changed host name tools-webgrid-generic-1405 in "qconf -mq webgrid-generic" to fix the "au" state of the queue on that host.
- 21:21 valhallasw`cloud: webservice: error: argument server: invalid choice: 'generic' (choose from 'lighttpd', 'tomcat', 'uwsgi-python', 'nodejs', 'uwsgi-plain') (for tools.javatest)
- 21:20 valhallasw`cloud: restarted webservicemonitor
- 21:19 valhallasw`cloud: seems to have some errors in restarting: subprocess.CalledProcessError: Command '['/usr/bin/sudo', '-i', '-u', 'tools.javatest', '/usr/local/bin/webservice', '--release', 'trusty', 'generic', 'restart']' returned non-zero exit status 2
- 21:18 valhallasw`cloud: running puppet agent -tv on tools-services-02 to make sure webservicemonitor is running
- 21:15 valhallasw`cloud: several webservices seem to actually have not gotten back online?! what on earth is going on.
- 21:10 valhallasw`cloud: some jobs still died (including tools.admin). I'm assuming service.manifest will make sure they start again
- 20:29 valhallasw`cloud: |sort is not so spread out in terms of affected hosts because a lot of jobs were started on lighttpd-1409 and -1410 around the same time.
- 20:25 valhallasw`cloud: ca 500 jobs @ 5s/job = approx 40 minutes
- 20:23 valhallasw`cloud: doh. accidentally used the wrong file, causing restarts for another few uwsgi hosts. Three more jobs dead *sigh*
- 20:21 valhallasw`cloud: now doing more rescheduling, with 5 sec intervals, on a sorted list to spread load between queues
- 19:36 valhallasw`cloud: last restarted job is 1423661, rest of them are still in /home/valhallaw/webgrid_jobs
- 19:35 valhallasw`cloud: one per second still seems to make SGE unhappy; there's a whole set of jobs dying, mostly uwsgi?
- 19:31 valhallasw`cloud: https://phabricator.wikimedia.org/T110861 : rescheduling 521 webgrid jobs, at a rate of one per second, while watching the accounting log for issues
- 07:31 valhallasw`cloud: removed paniclog on tools-submit; probably related to the NFS outage yesterday (although I'm not sure why that would give OOMs)
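The mass rescheduling described above amounts to a loop along these lines (a sketch; the job-id file and the 5-second interval come from the entries, the rest is illustrative):
# reschedule each webgrid job id from the prepared list, pausing between jobs to avoid overloading the scheduler
while read jobid; do
    sudo qmod -rj "$jobid"
    sleep 5
done < /home/valhallaw/webgrid_jobs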
2015-08-30
- 13:23 valhallasw`cloud: killed wikibugs-backup and grrrit-wm on tools-webproxy-01
- 13:20 valhallasw`cloud: disabling 503 error page
2015-08-29
- 04:09 scfc_de: Disabled queue [email protected] (qmod -d) because I can't ssh to it and jobs deployed there fail with "failed assumedly before job:can't get password entry for user".
2015-08-27
- 15:00 valhallasw`cloud: killed multiple kmlexport processes on tools-webgrid-lighttpd-1401 again
2015-08-26
- 01:10 scfc_de: Felt lucky: kill -STOP bigbrother on tools-submit, installed I00cd7a90273e0d745699855eb671710afb4e85a7 on tools-services-02 and service bigbrothermonitor start. If it goes berserk, please service bigbrothermonitor stop.
2015-08-25
- 20:23 scfc_de: tools-webgrid-generic-1405: killall mpt-statusd.
- 14:58 YuviPanda: pooled in two new instances for the precise exec pool
- 14:45 YuviPanda: reboot tools-exec-1221
- 14:26 YuviPanda: rebooting tools-exec-1220 because NFS wedge...
- 14:18 YuviPanda: pooled in tools-webgrid-generic-1405
- 10:16 YuviPanda: created tools-webgrid-generic-1405
- 10:04 YuviPanda: apply exec node puppet roles to tools-exec-1220 and -1221
- 09:59 YuviPanda: created tools-exec-1220 and -1221
2015-08-24
- 16:37 valhallasw`cloud: more processes were started, so added a talk page message on User:Coet (who was starting the processes according to /var/log/auth.log) and using 'write coet' on tools-bastion-01
- 16:15 valhallasw`cloud: kill -9'ing because normal killing doesn't work
- 16:13 valhallasw`cloud: killing all processes of tools.cobain which are flooding tools-bastion-01
2015-08-20
- 18:44 valhallasw`cloud: both are now at 3dbbc87
- 18:43 valhallasw`cloud: running git reset --hard origin/master on both checkouts. Old HEAD is 86ec36677bea85c28f9a796f7e57f93b1b928fa7 (-01) / c4abeabd3acf614285a40e36538f50655e53b47d (-02).
- 18:42 valhallasw`cloud: tools-web-static-01 has the same issue, but with different commit ids (because different hostname). No local changes on static-01. The initial merge commit on -01 is 57994c, merging 1e392ab and fc918b8; on -02 it's 511617f, merging a90818c and fc918b8.
- 18:39 valhallasw`cloud: cdnjs on tools-web-static-02 can't pull because it has a dirty working tree, and there's a bunch of weird merge commits. Old commit is c4abeabd3acf614285a40e36538f50655e53b47d, the dirty working tree is changes from http to https in various files
- 17:06 valhallasw`cloud: wait, what timezone is this?!
2015-08-19
- 10:45 valhallasw`cloud: ran `for i in $(qstat -f -xml | grep "<state>au" -B 6 | grep "<name>" | cut -d'@' -f2 | cut -d. -f1); do echo $i; ssh $i sudo service gridengine-exec start; done`; this fixed queues on tools-exec-1404 tools-exec-1409 tools-exec-1410 tools-webgrid-lighttpd-1406
2015-08-18
- 15:53 scfc_de: Added valhallasw as grid manager (qconf -am valhallasw).
- 14:42 scfc_de: tools-webgrid-lighttpd-1411: Killed mpt-statusd (T104779).
- 13:57 valhallasw`cloud: same issue seems to happen with the other hosts: tools-exec-1401.tools.eqiad.wmflabs vs tools-exec-1401.eqiad.wmflabs and tools-exec-catscan.tools.eqiad.wmflabs vs tools-exec-catscan.eqiad.wmflabs.
- 13:55 valhallasw`cloud: no, wait, that's tools-webgrid-lighttpd-1411.eqiad.wmflabs, not the actual host tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs. We should fix that dns mess as well.
- 13:54 valhallasw`cloud: tried to restart gridengine-exec on tools-exec-1401, no effect. tools-webgrid-lighttpd-1411 also just went into 'au' state.
- 13:47 valhallasw`cloud: that brought tools-exec-1403, tools-exec-1406 and tools-webgrid-generic-1402 back up, tools-exec-1401 and tools-exec-catscan are still in 'au' state
- 13:46 valhallasw`cloud: starting gridengine-exec on hosts with queues in 'au' (=alarm, unknown) state using
for i in $(qstat -f -xml | grep "<state>au" -B 6 | grep "<name>" | cut -d'@' -f2 | cut -d. -f1); do echo $i; ssh $i sudo service gridengine-exec start; done
- 08:37 valhallasw`cloud: sudo service gridengine-exec start on tools-webgrid-lighttpd-1404.eqiad.wmflabs, tools-webgrid-lighttpd-1406.eqiad.wmflabs, tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs
- 08:33 valhallasw`cloud: tools-webgrid-lighttpd-1403.eqiad.wmflabs, tools-webgrid-lighttpd-1404.eqiad.wmflabs and tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs are all broken (queue dropped because it is temporarily not available)
- 08:30 valhallasw`cloud: hostname mismatch: host is called tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs in config, but it was named tools-webgrid-lighttpd-1411.eqiad.wmflabs in the hostgroup config
- 08:21 valhallasw`cloud: still sudo qmod -e "*@tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs" -> invalid queue "*@tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs"
- 08:20 valhallasw`cloud: sudo qconf -mhgrp "@webgrid", added tools-webgrid-lighttpd-1411.eqiad.wmflabs
- 08:14 valhallasw`cloud: and the hostgroup @webgrid doesn't even exist? (╯°□°)╯︵ ┻━┻
- 08:10 valhallasw`cloud: /var/lib/gridengine/etc/queues/webgrid-lighttpd does not seem to be the correct configuration as the current config refers to '@webgrid' as host list.
- 08:07 valhallasw`cloud: sudo qconf -Ae /var/lib/gridengine/etc/exechosts/tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs -> [email protected] added "tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs" to exechost list
- 08:06 valhallasw`cloud: ok, success. /var/lib/gridengine/etc/exechosts/tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs now exists. Do I still have to add it manually to the grid? I suppose so.
- 08:04 valhallasw`cloud: installing packages from /data/project/.system/deb-trusty seems to fail. sudo apt-get update helps.
- 08:00 valhallasw`cloud: running puppet agent -tv again
- 07:55 valhallasw`cloud: argh. Disabling toollabs::node::web::generic again and enabling toollabs::node::web::lighttpd
- 07:54 valhallasw`cloud: various issues such as Error: /Stage[main]/Gridengine::Submit_host/File[/var/lib/gridengine/default/common/accounting]/ensure: change from absent to link failed: Could not set 'link' on ensure: No such file or directory - /var/lib/gridengine/default/common at 17:/etc/puppet/modules/gridengine/manifests/submit_host.pp; probably an ordering issue in
- 07:53 valhallasw`cloud: Setting up adminbot (1.7.8) ... chmod: cannot access '/usr/lib/adminbot/README': No such file or directory --- ran sudo touch /usr/lib/adminbot/README
- 07:37 valhallasw`cloud: applying role::labs::tools::compute and toollabs::node::web::generic to tools-webgrid-lighttpd-1411
- 07:31 valhallasw`cloud: reading puppet suggests I should qconf -ah /var/lib/gridengine/etc/exechosts/tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs but that file is missing?
- 07:26 valhallasw`cloud: andrewbogott built tools-webgrid-lighttpd-1411 yesterday but it's not actually added as exec host. Trying to figure out how to do that...
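Pieced together from the entries above (newest first), registering the new node with the grid went roughly like this; treat it as a sketch rather than the exact commands run:
# register the exec host from its puppet-generated config file
sudo qconf -Ae /var/lib/gridengine/etc/exechosts/tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs
# add the host to the @webgrid hostgroup so its queue instances exist (opens an editor)
sudo qconf -mhgrp @webgrid
# enable the queue instances on the host, then start the exec daemon there
sudo qmod -e "*@tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs"
sudo service gridengine-exec start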
2015-08-17
- 19:00 scfc_de: tools-checker-01, tools-exec-1410, tools-exec-catscan, tools-redis-01, tools-redis-02, tools-web-static-01, tools-webgrid-lighttpd-1406, tools-webproxy-02: Remounted /public/dumps (T109261).
- 16:17 andrewbogott: disable queues for tools-exec-1205 tools-exec-1207 tools-exec-1208 tools-exec-140 tools-exec-1404 tools-exec-1409 tools-exec-1410 tools-exec-catscan tools-web-static-01 tools-webgrid-lighttpd-1201 tools-webgrid-lighttpd-1205 tools-webgrid-lighttpd-1206 tools-webgrid-lighttpd-1406 tools-webproxy-02
- 15:33 andrewbogott: re-enabling the queue on tools-exec-1211 tools-exec-1212 tools-exec-1215 tools-exec-1403 tools-exec-1406 tools-master tools-shadow tools-webgrid-generic-1402 tools-webgrid-lighttpd-1203 tools-webgrid-lighttpd-1208 tools-webgrid-lighttpd-1403 tools-webgrid-lighttpd-1404 tools-webproxy-01
- 14:50 andrewbogott: killing remaining jobs on tools-exec-1211 tools-exec-1212 tools-exec-1215 tools-exec-1403 tools-exec-1406 tools-master tools-shadow tools-webgrid-generic-1402 tools-webgrid-lighttpd-1203 tools-webgrid-lighttpd-1208 tools-webgrid-lighttpd-1403 tools-webgrid-lighttpd-1404 tools-webproxy-01
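Disabling and re-enabling queues around a hypervisor reboot, as logged above, is typically done with qmod (a sketch; the host is just one from the list above and the exact invocation is not recorded in the log):
sudo qmod -d "*@tools-exec-1205.eqiad.wmflabs"   # stop new jobs from landing on the node
sudo qmod -e "*@tools-exec-1205.eqiad.wmflabs"   # put it back in rotation after the reboot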
2015-08-15
- 05:14 andrewbogott: resumed tools-exec-gift, seems not to have been the culprit
- 05:10 andrewbogott: suspending tools-exec-gift, just for a moment...
2015-08-14
- 17:21 andrewbogott: disabling grid jobqueue for tools-exec-1211 tools-exec-1212 tools-exec-1215 tools-exec-1403 tools-exec-1406 tools-master tools-shadow tools-webgrid-generic-1402 tools-webgrid-lighttpd-1203 tools-webgrid-lighttpd-1208 tools-webgrid-lighttpd-1403 tools-webgrid-lighttpd-1404 tools-webproxy-01 in anticipation of monday reboot of labvirt1004
- 15:20 andrewbogott: Adding back to the grid engine queue: tools-exec-1216 tools-exec-1219 tools-exec-1407 tools-mail tools-services-02 tools-webgrid-generic-1401 tools-webgrid-lighttpd-1202 tools-webgrid-lighttpd-1207 tools-webgrid-lighttpd-1210 tools-webgrid-lighttpd-1402 tools-webgrid-lighttpd-1407
- 14:43 andrewbogott: killing remaining jobs on tools-exec-1216 tools-exec-1219 tools-exec-1407 tools-mail tools-services-02 tools-webgrid-generic-1401 tools-webgrid-lighttpd-1202 tools-webgrid-lighttpd-1207 tools-webgrid-lighttpd-1210 tools-webgrid-lighttpd-1402 tools-webgrid-lighttpd-1407
2015-08-13
- 18:51 valhallasw`cloud: which was resolved by scfc earlier
- 18:50 valhallasw`cloud: tools-exec-1201/Puppet staleness was critical due to an agent lock (Ignoring stale puppet agent lock for pid
Run of Puppet configuration client already in progress; skipping (/var/lib/puppet/state/agent_catalog_run.lock exists))
- 18:08 scfc_de: scfc@tools-exec-1201: Removed stale /var/lib/puppet/state/agent_catalog_run.lock; Puppet run was started Aug 12 15:06:08, instance was rebooted ~ 15:14.
- 16:44 andrewbogott: disabling job queue for tools-exec-1216 tools-exec-1219 tools-exec-1407 tools-mail tools-services-02 tools-webgrid-generic-1401 tools-webgrid-lighttpd-1202 tools-webgrid-lighttpd-1207 tools-webgrid-lighttpd-1210 tools-webgrid-lighttpd-1402 tools-webgrid-lighttpd-1407
- 14:48 andrewbogott: and tools-webgrid-lighttpd-1408
- 14:48 andrewbogott: rescheduling (and in some cases killing) jobs on tools-exec-1203 tools-exec-1210 tools-exec-1214 tools-exec-1402 tools-exec-1405 tools-exec-gift tools-services-01 tools-web-static-02 tools-webgrid-generic-1403 tools-webgrid-lighttpd-1204 tools-webgrid-lighttpd-1209 tools-webgrid-lighttpd-1401 tools-webgrid-lighttpd-1405
2015-08-12
- 16:05 andrewbogott: depooling tools-exec-1203 tools-exec-1210 tools-exec-1214 tools-exec-1402 tools-exec-1405 tools-exec-gift tools-services-01 tools-web-static-02 tools-webgrid-generic-1403 tools-webgrid-lighttpd-1204 tools-webgrid-lighttpd-1209 tools-webgrid-lighttpd-1401 tools-webgrid-lighttpd-1405 tools-webgrid-lighttpd-1408
- 15:20 valhallasw`cloud: re-enabling queues on restarted hosts
- 14:41 andrewbogott: forcing reschedule of jobs on tools-exec-1201 tools-exec-1202 tools-exec-1204 tools-exec-1206 tools-exec-1209 tools-exec-1213 tools-exec-1217 tools-exec-1218 tools-exec-1408 tools-webgrid-generic-1404 tools-webgrid-lighttpd-1409 tools-webgrid-lighttpd-1410
2015-08-11
- 18:17 andrewbogott: depooling tools-exec-1201 tools-exec-1202 tools-exec-1204 tools-exec-1206 tools-exec-1209 tools-exec-1213 tools-exec-1217 tools-exec-1218 tools-exec-1408 tools-webgrid-generic-1404 tools-webgrid-lighttpd-1409 tools-webgrid-lighttpd-1410 in anticipation of labvirt1001 reboot tomorrow
2015-08-04
- 13:43 scfc_de: Fixed owner of ~tools.kasparbot/error.log (T99576).
2015-08-03
- 19:13 andrewbogott: deleted tools-static-01
2015-08-01
- 18:09 andrewbogott: depooling/rebooting tools-webgrid-lighttpd-1407 because it’s unable to fork
- 16:54 scfc_de: tools-webgrid-lighttpd-1407: Removed exim paniclog (OOM).
2015-07-30
- 15:00 andrewbogott: rebooting tools-bastion-01 aka tools-login
- 14:46 scfc_de: tools-webgrid-lighttpd-1408, tools-webgrid-lighttpd-1409: Removed exim paniclog (OOM).
- 02:53 scfc_de: "webservice uwsgi-python start" for blogconverter.
- 02:40 scfc_de: qdel 545479 (hazard-bot, "release=trusty-quiet", stuck since July 9th).
- 02:39 scfc_de: qdel 301895 (projanalysis, "release=trust", stuck since July 1st).
- 02:38 scfc_de: tools-webgrid-generic-1401, tools-webgrid-generic-1402, tools-webgrid-generic-1403: Rebooted for T107052 (disabled queue, killall -TERM lighttpd, let tools-manifest restart webservices elsewhere, reboot, enabled queue).
- 01:41 scfc_de: tools-webgrid-lighttpd-1406: Rebooted for T107052 (disabled queue, killall -TERM lighttpd, let tools-manifest restart webservices elsewhere, reboot, enabled queue).
2015-07-29
- 23:43 andrewbogott: draining, rebooting tools-webgrid-lighttpd-1408
- 20:11 andrewbogott: rebooting tools-webgrid-lighttpd-1404
- 19:58 scfc_de: tools-*: sudo rmdir /etc/ssh/userkeys/ubuntu{/.ssh{/authorized_keys\ {/public{/keys{/ubuntu{/.ssh,},},},},},}
2015-07-28
- 17:49 valhallasw`cloud: Jobs were drained at 19:43, but this did not decrease the rate, which is still at ~50k/minute. Now running "sysctl -w sunrpc.nfs_debug=1023 && sleep 2 && sysctl -w sunrpc.nfs_debug=0" which hopefully doesn't kill the server
- 17:43 valhallasw`cloud: rescheduled all webservice jobs on tools-webgrid-lighttpd-1401.eqiad.wmflabs, server is now empty
- 17:16 valhallasw`cloud: disabled queue "[email protected]"
- 02:07 YuviPanda: removed pacct files from tools-bastion-01
2015-07-27
- 21:27 valhallasw`cloud: turned off process accounting on tools-login while we try to find the root cause of phab:T107052:
accton off
2015-07-19
- 01:51 scfc_de: tools-bastion-01: Removed exim paniclog (OOM).
2015-07-11
- 00:01 mutante: fixing puppet runs on tools-webgrid-* via salt
2015-07-10
- 23:59 mutante: fixing puppet runs on tools-exec via salt
- 20:09 valhallasw`cloud: it took three of us, but adminbot is updated!
July 6
- 09:49 valhallasw`cloud: 10:14 <jynus> s51053 is abusing his/her access to replica dbs and creating lag for other users. His/her queries are to be terminated. (= tools.jackbot / user jackpotte)
July 2
- 17:07 valhallasw`cloud: can't login to tools-mailrelay-01., probably because puppet was disabled for too long. Deleting instance.
- 16:12 valhallasw`cloud: I mean tools-bastion-01
- 16:12 valhallasw`cloud: stopping puppet on tools-login and tools-mail to check for changes in deploying https://gerrit.wikimedia.org/r/#/c/205914/
June 29
- 17:29 YuviPanda: failed over tools webproxy to tools-webproxy-02
June 21
- 18:57 scfc_de: tools-precise-dev: apt-get purge python-ldap3 (the previous fix for "Cache has broken packages, exiting" didn't work).
- 16:39 scfc_de: tools-precise-dev: apt-get clean ("Cache has broken packages, exiting").
- 16:33 scfc_de: tools-submit: Removed exim4 paniclog (OOM).
June 19
- 15:07 YuviPanda: remounting /data/scratch
June 10
- 11:52 YuviPanda: tools-trusty be gone
June 8
- 16:31 YuviPanda: added Nova Tools Bot as admin, for automated nova API access
June 7
- 17:05 YuviPanda: killed sort /data/project/templatetiger/public_html/dumps/ruwiki-2015-03-24.txt -k4,4 -k2,2 -k3,3n -k5,5n -t? -o /data/project/templatetiger/public_html/dumps/sort/ruwiki-2015-03-24.txt -T /data/project/templatetiger to rescue NFS
June 5
- 17:44 YuviPanda: migrate tools-shadow to labvirt1002
June 2
- 18:34 Coren: rebooting tools-webgrid-lighttpd-1406.eqiad.wmflabs
- 16:27 YuviPanda: cleaned out /etc/hosts file on tools-shadow
- 16:20 Coren: switching back to tools-master
- 16:10 YuviPanda: restart nscd on tools-submit
- 15:54 Coren: Switching names for tools-exec-1401
- 15:43 Coren: adding the "new" exec nodes (aka, current nodes with new names)
- 14:34 YuviPanda: turned off dnsmasq for toollabs
- 13:54 Coren: adding new-style names for submit hosts
- 13:53 YuviPanda: moved tools-master / shadow to designate
- 13:52 Coren: new-style names for gridengine admin hosts added
- 13:28 Coren: sge_shadowd started a new master as expected, after /two/ timeouts of 60s (unexpected)
- 13:23 Coren: stracing the shadowd to see what's up; master is down as expected.
- 13:17 Coren: killing the sge_qmaster to test failover
- 12:56 YuviPanda: switched labs webproxies to designate, forcing puppet run and restarting nscd
May 29
- 13:39 YuviPanda: tools-redis-01 is redis master now
- 13:35 YuviPanda: enable puppet on all hosts, redis move-around completed
- 13:01 YuviPanda: recreating tools-redis-01 and -02
- 12:52 YuviPanda: disable puppet on all toollabs hosts for tools-redis update
- 12:27 YuviPanda: created two redis instances (tools-redis-01 and tools-redis-02), beginning to set up stuff
May 28
- 12:22 wm-bot: petrb: inserted some local IP's to hosts file
- 12:15 wm-bot: petrb: shutting nscd off on tools-master
- 12:14 wm-bot: petrb: test
- 11:28 petan: syslog is full of these May 28 11:27:36 tools-master nslcd[1041]: [81823a] <group=550> error writing to client: Broken pipe
- 11:25 petan: rebooted tools-master in order to try fix that network issues
May 27
- 20:10 LostPanda: disabled puppet on tools-shadow too
- 19:46 LostPanda: echo -n 'tools-master.eqiad.wmflabs' > /var/lib/gridengine/default/common/act_qmaster haaail someone?
- 19:10 YuviPanda: reverted gridengine-common on tools-shadow to 6.2u5-4 as well, to match tools-master
- 18:58 YuviPanda: rebooting tools-master after switchover failed and it cannot seem to do DNS
May 23
- 19:56 scfc_de: tools-webgrid-lighttpd-1410: Removed exim4 paniclog (OOM).
May 22
- 20:37 yuvipanda: deleted and depooled tools-exec-07
May 20
- 20:09 yuvipanda: transient shinken puppet alerts because I tried to force puppet runs on all tools hosts but cancelled
- 20:01 yuvipanda: enabling puppet on all hosts
- 20:01 yuvipanda: tested new /etc/hosts on tools-bastion-01, puppet run produced no diffs, all good
- 19:56 yuvipanda: copy cleaned up and regenerated /etc/hosts from tools-precise-dev to all toollabs hosts
- 19:54 yuvipanda: copy cleaned up hosts file to /etc/hosts on tools-precise-dev
- 19:54 yuvipanda: enabled puppet on tools-precise-dev
- 19:33 yuvipanda: disabling puppet on *all* hosts for https://gerrit.wikimedia.org/r/#/c/210000/
- 06:21 yuvipanda: killed a bunch of webservice jobs stuck in dRr state
May 19
- 21:06 yuvipanda: failed over services to tools-services-02, -01 was refusing to start some webservices with permission denied errors for setegid
- 20:16 yuvipanda: qdel -f for all webservice jobs that were in dr state
- 20:12 yuvipanda: force killed croptool webservice
May 18
- 01:36 yuvipanda: created new tools-checker-01, applying role and provisioning
- 01:32 yuvipanda: killed tools-checker-01 instance, recreating
May 15
- 12:06 valhallasw: killed those perl scripts; kmlexport's lighttpd is also using excessive memory (5%), so restarting that
- 12:01 valhallasw: webgrid-lighttpd-1402 puppet failure caused by major memory usage; tools.kmlexport is running heavy perl scripts
- 00:27 yuvipanda: cleared graphite data for /var/* mounts on tools-redis
May 14
- 21:53 valhallasw: shut down & removed "tools-exec-08.eqiad.wmflabs" from execution host list
- 21:11 valhallasw: forced rescheduling of (non-cont) welcome.py job (iluvatarbot, jobid 8869)
- 03:29 yuvipanda: drained, depooled and deleted tools-exec-15
May 10
- 22:08 yuvipanda: created tools-precise-dev instance
- 09:28 yuvipanda: cleared and depooled tools-exec-02 and -13. only job running was deadlocked for a long, long time (week)
- 05:47 scfc_de: tools-submit: Removed paniclog (OOM) and stopped apache2.
May 5
- 18:50 Betacommand: helperbot (WP:AIV bot) was running logged out and its owner is MIA; Coren killed the job from 1204 and commented out the crontab
May 4
- 21:24 yuvipanda: reboot tools-submit, was stuck
May 2
- 10:21 yuvipanda: drained all the old webgrid nodes, pooled in all the new webgrid nodes! POTATO!
- 10:13 yuvipanda: cleaned out webgrid jobs from tools-webgrid-03
- 10:12 yuvipanda: pooled tools-webgrid-lighttpd-{06-10}
- 08:56 yuvipanda: drained and deleted tools-webgrid-01
- 07:31 yuvipanda: depooled and deleted tools-webgrid-{01,02}
- 07:31 yuvipanda: disabled catmonitor task / cron, was heavily using an sqlite db on NFS
- 06:56 yuvipanda: pooled tools-webgrid-generic-{01-04}
- 03:44 yuvipanda: drained and deleted old trusty webgrid tools-webgrid-{05-07}
- 02:13 yuvipanda: created tools-webgrid-lighttpd-12{01-05} and tools-webgrid-generic-14{01-04}
- 01:59 yuvipanda: created tools-webgrid-lighttpd-14{01-10}
- 01:58 yuvipanda: increased tools instance quota
May 1
- 03:55 YuviKTM: depooled and deleted tools-exec-20
- 03:54 YuviKTM: killed final job in tools-exec-20 (9911317), decommissioning node
April 30
- 19:33 YuviKTM: depooled and deleted tools-exec-01, -05, -06 and -11.
- 19:31 YuviKTM: depooled and deleted tools-exec-01, -05, -06 and -11.
- 06:30 YuviKTM: added public IPs for all exec nodes so IRC tools continue to work. Removed all associated hostnames, let’s not do those
- 06:13 YuviKTM: allocating new floating IPs for the new instances, because IRC bots need them.
- 05:42 YuviKTM: disabled and drained tools-exec-1{1-5} of continuous jobs
- 05:40 YuviKTM: pooled in tools-exec-121{1-9}
- 05:39 YuviKTM: rebooted tools-exec-121{1-9} instances so they can apply gridengine-common properly
- 05:39 YuviKTM: created new instances tools-exec-121{1-9} as precise
- 05:39 YuviKTM: killed tools-dev, nobody still ssh’d in, no crontabs
- 05:39 YuviKTM: depooled exec-{06-10}, rejigged jobs to newer nodes
- 05:39 YuviKTM: delete tools-exec-10, was out of jobs
- 04:28 YuviKTM: deleted tools-exec-09
- 04:27 YuviKTM: depooled tools-exec-09.eqiad.wmflabs
- 04:23 YuviKTM: repooled tools-exec-1201 is all good now
- 04:19 YuviKTM: rejuggle jobs again in trustyland
- 04:14 YuviKTM: repooled tools-exec-09, apt troubles fixed
- 04:08 YuviKTM: depooled tools-exec-09, apt troubles
- 04:04 YuviKTM: pooled tools-exec-1408 and tools-exec-1409
- 04:00 YuviKTM: pooled tools-exec-1406 and 1407
- 03:58 YuviKTM: pooled tools-exec-12{02-10}, forgot to put appropriate roles on 1201, fixing now
- 03:54 YuviKTM: tools-exec-03 and -04 have been deleted a long time ago
- 03:53 YuviKTM: depooled tools-exec-03 / 04
- 03:31 YuviKTM: depooled and deleted tools-exec-12 had nothing on it
- 03:28 YuviKTM: deleted tools-exec-21 to 24, one task still running on tools-exec
- 03:24 YuviKTM: disabled and drained continuous tasks off tools-exec-20 to tools-exec-24
- 03:18 YuviKTM: pooled tools-exec-1403, 1404
- 03:13 YuviKTM: pooled tools-exec-1402
- 03:07 YuviKTM: pooled tools-exec-1405
- 03:04 YuviKTM: pooled tools-exec-1401
- 02:53 YuviKTM: created tools-exec-14{06-10}
- 02:14 YuviKTM: created tools-exec-14{01-05}
- 01:09 YuviPanda: killing local copy of python-requests, there seems to be a newer version in prod
April 29
- 19:33 valhallasw`cloud: re-created tools-mailrelay-01 with precise: Nova_Resource:I-00000bca.eqiad.wmflabs
- 19:30 YuviPanda: set appropriate classes for recreated tools-exec-12* nodes
- 19:28 YuviPanda: recreated tools-static-02
- 19:11 YuviPanda: failed over tools-static to tools-static-01
- 14:47 andrewbogott: deleting tools-exec-04
- 14:44 Coren: -exec-04 drained; removed from queues. Rest well, old friend.
- 14:41 Coren: disabled -exec-04 (going away)
- 02:35 YuviPanda: set tools-exec-12{01-10} to configure as exec nodes
- 02:27 YuviPanda: created tools-exec-12{01-10}
April 28
- 21:41 andrewbogott: shrinking tools-master
- 21:33 YuviPanda: failover is going to take longer than actual recompression for tools-master, so let’s just recompress. tools-shadow should take over automatically if that doesn’t work
- 21:32 andrewbogott: shrinking tools-redis
- 21:28 YuviPanda: attempting to failover gridengine to tools-shadow
- 21:27 andrewbogott: shrinking tools-submit
- 21:21 YuviPanda: backed up crontabs onto NFS (see the sketch after this block)
- 21:18 andrewbogott: shrinking tools-webproxy-02
- 21:14 andrewbogott: shrinking tools-static-01
- 21:11 andrewbogott: shrinking tools-exec-gift
- 21:06 YuviPanda: failover tools-webproxy to tools-webproxy-01
- 21:06 andrewbogott: stopping, shrinking and starting tools-exec-catscan
- 21:01 YuviPanda: failover tools-static to tools-static-02
- 20:53 andrewbogott: stopping, shrinking, restarting tools-shadow
- 20:43 andrewbogott: stopping, shrinking, starting tools-static-02
- 20:39 valhallasw`cloud: created tools-mailrelay-01 Nova_Resource:I-00000bac.eqiad.wmflabs
- 20:26 YuviPanda: failed over tools-services to services-01
- 18:11 Coren: reenabled -webgrid-generic-02
- 18:05 Coren: reenabled -webgrid-03, -webgrid-08, -webgrid-generic-01; drained -webgrid-generic-02
- 17:44 Coren: -webgrid-03, -webgrid-08 and -webgrid-generic-01 drained
- 14:04 Coren: reenable -exec-11 for jobs.
- 13:55 andrewbogott: stopping tools-exec-11 for a resize experiment
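A sketch of the kind of crontab backup mentioned in the 21:21 entry, assuming root on the submit host; the NFS target directory is hypothetical:
    sudo sh -c 'cp -a /var/spool/cron/crontabs/* /data/project/.system/crontab-backup/'   # one file per user; destination name made up for illustration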
April 25
- 01:32 YuviPanda: deleted tools-static, tools-static-01 has taken over
- 01:02 YuviPanda: deleted tools-login, tools-bastion-01 has been running for long enough
April 24
- 16:29 Coren: repooled -exec-02, -08, -12
- 16:05 Coren: -exec-02, -08 and -12 draining
- 15:54 Coren: reenabled tools-exec-07, -10 and -11 after reboot of host
- 15:41 Coren: -exec-03 goes away for good.
- 15:31 Coren: draining -exec-03 to ease migration
- 13:43 Coren: draining tools-exec-07,10,11 to allow virt host reboot
April 23
- 22:41 YuviPanda: disabled *@tools-exec-09
- 22:40 YuviPanda: add tools-exec-09 back to @general
- 22:38 YuviPanda: take tools-exec-09 out of the @general group (qconf sketch after this block)
- 20:53 YuviPanda: restart bigbrother
- 20:28 YuviPanda: restarted nscd on tools-login and tools-dev
- 20:22 valhallasw`cloud: removed "10.68.16.4 tools-webproxy tools.wmflabs.org" from /etc/hosts
- 13:17 andrewbogott: beginning migration of tools instances to labvirt100x hosts
- 01:00 YuviPanda: good bye tools-login.eqiad.wmflabs
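A sketch of how a node is taken out of and put back into the @general host group (per the 22:38/22:40 entries), assuming standard qconf attribute editing:
    qconf -shgrp @general                                                  # list current members
    qconf -dattr hostgroup hostlist tools-exec-09.eqiad.wmflabs @general   # take the node out
    qconf -aattr hostgroup hostlist tools-exec-09.eqiad.wmflabs @general   # put it back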
April 20
- 13:38 scfc_de: tools-mail: Removed paniclog and killed superfluous exim.
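A sketch of this recurring tools-mail clean-up, assuming the stock Debian exim4 layout; the extra process has to be picked out by hand:
    pgrep -fl exim4                        # look for more than one queue-runner daemon
    sudo kill <pid-of-extra-exim>          # keep only the instance started by the init script
    sudo rm -f /var/log/exim4/paniclog     # clear the paniclog so monitoring stops alerting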
April 18
- 20:09 YuviPanda: sysctl vm.overcommit_memory=1 on tools-redis to allow it to bgsave again (see the sketch after this block)
- 19:52 valhallasw`cloud: tools-redis unresponsive (T96485); rebooting
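A sketch of making the overcommit change above persistent, assuming a sysctl.d layout; the file name is made up:
    sudo sysctl vm.overcommit_memory=1                                                  # immediate: lets redis fork for BGSAVE even under memory pressure
    echo 'vm.overcommit_memory = 1' | sudo tee /etc/sysctl.d/60-redis-overcommit.conf   # so the setting survives a reboot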
April 17
- 01:48 YuviPanda: disable puppet on live webproxy (-01) to apply firewall changes to -02
April 16
- 20:57 Coren: -webgrid-08 drained, rebooting
- 20:46 Coren: -webgrid-03 repooled, depooling -webgrid-08
- 20:45 Coren: -webgrid-03 drained, rebooting
- 20:38 Coren: -webgrid-03 depooled
- 20:38 Coren: -webgrid-02 repooled
- 20:35 Coren: -webgrid-02 drained, rebooting
- 20:33 Coren: -webgrid-02 depooled
- 20:32 Coren: -webgrid-01 repooled
- 20:06 Coren: -webgrid-01 drained, rebooting.
- 19:56 Coren: depooling -webgrid-01 for reboot
- 14:37 Coren: rebooting -master
- 14:29 Coren: rebooting -mail
- 14:22 Coren: rebooting -shadow
- 14:22 Coren: -exec-15 repooled
- 14:19 Coren: -exec-15 drained, rebooting.
- 13:46 Coren: -exec-14 repooled. That's it for general exec nodes.
- 13:44 Coren: -exec-14 drained, rebooting.
April 15
- 21:06 Coren: -exec-10 repooled
- 20:55 Coren: -exec-10 drained, rebooting
- 20:49 Coren: -exec-07 repooled.
- 20:47 Coren: -exec-07 drained, rebooting
- 20:43 Coren: -exec-06 requeued
- 20:41 Coren: -exec-06 drained, rebooting
- 20:15 Coren: repool -exec-05
- 20:10 Coren: -exec-05 drained, rebooting.
- 19:56 Coren: -exec-04 repooled
- 19:52 Coren: -exec-04 drained, rebooting.
- 19:41 Coren: disabling new jobs on remaining (exec) precise instances
- 19:32 Coren: repool -exec-02
- 19:30 Coren: draining -exec-04
- 19:29 Coren: -exec-02 drained, rebooting
- 19:28 Coren: -exec-03 rebooted, requeueing
- 19:26 Coren: -exec-03 drained, rebooting
- 18:50 Coren: dequeuing tools-exec-03 whilst waiting for -02 to drain.
- 18:43 Coren: tools-exec-01 back sans idmap, returning to pool
- 18:40 Coren: tools-exec-01 drained of jobs; rebooting
- 18:39 YuviPanda: disabled puppet on running webproxy, tools-webproxy-01
- 18:25 Coren: disabled -exec-01 and -exec-02 to new jobs.
April 14
- 13:13 scfc_de: tools-submit: Removed exim paniclog (OOM doom).
- 13:13 scfc_de: tools-mail: Killed superfluous exim and removed paniclog.
April 13
- 21:11 YuviPanda: restart portgranter on all webgrid nodes
April 12
- 10:52 scfc_de: tools-mail: Killed superfluous exims and removed paniclog.
April 11
- 21:49 andrewbogott: moved /data/project/admin/toollabs to /data/project/admin/toollabsbak on tools-webproxy-01 and tools-webproxy-02 to fix permission errors
- 02:15 YuviPanda: rebooted tools-submit, was not responding
April 10
- 07:10 PissedPanda: take out tools-services-01 to test switchover and also to recreate as small
- 05:20 YuviPanda: delete the tomcat node finally :D
April 9
- 23:24 scfc_de: rm -f /puppet_{host,service}groups.cfg on all hosts (apparently a Puppet/hiera mishap last November).
- 23:11 scfc_de: tools-webgrid-04: Rescheduled all jobs running on this instance (T95537).
- 08:32 scfc_de: tools-mail: Removed paniclog (multiple exims, but only one found).
April 8
- 13:25 scfc_de: Repaired servicegroups repository and restarted toolhistory job; was stuck at 2015-03-29T09:15:05Z (NFS?).
- 12:01 scfc_de: Removed empty tools with no maintainers javed/javedbaker/shell.
- 09:10 scfc_de: Removed stale proxy entries for analytalks/anno/commons-coverage/coursestats/eagleeye/hashtags/itwiki/mathbot/nasirkhanbot/rc-vikidia/wikistream.
April 7
- 07:42 scfc_de: tools-mail: Killed superfluous exim and removed paniclog.
April 5
- 10:11 scfc_de: tools-mail: Killed superfluous exims and removed paniclog.
April 4
- 22:48 scfc_de: Removed zombie jobs (qdel 1991607,1994800,1994826,1994827,2054201,3449476,3450329,3451518,3451549,3451590,3451628,3451635,3451830,3451869,3452632,3452633,3452654,3452655,3452657,3452668,4218785,4219210,4219674,4219722,4219791,4219923,4220646).
- 08:49 scfc_de: tools-submit: Restarted bigbrother because it didn't notice admin's .bigbrotherrc.
- 08:49 scfc_de: Added webservice to .bigbrotherrc for the admin tool (see the sketch after this block).
- 03:35 scfc_de: Deployed jobutils/misctools 1.5 (T91954).
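A sketch of the .bigbrotherrc change in the 08:49 entry, assuming bigbrother's convention of one command to keep alive per line; the path follows from the admin tool's home directory:
    echo 'webservice' >> /data/project/admin/.bigbrotherrc   # bigbrother should now restart the admin webservice if it dies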
April 3
- 22:55 scfc_de: Removed empty cgi-bin directories.
- 20:35 scfc_de: tools-mail: Killed superfluous exims and removed paniclog.
April 2
- 20:07 scfc_de: tools-mail: Killed superfluous exims and removed paniclog.
- 20:06 scfc_de: tools-submit: Removed exim paniclog (OOM).
- 01:25 YuviPanda: created tools-bastion-02
April 1
- 00:14 scfc_de: tools-webgrid-03: Rebooted, was stuck on console input when unable to mount NFS on boot (per wikitech console output).
March 31
- 14:02 Coren: rebooting tools-submit
- 07:07 YuviPanda: moved tools.wmflabs.org to tools-webproxy-01
- 07:02 YuviPanda: reboot tools-webgrid-03 and tools-exec-03
- 00:21 andrewbogott: temporarily shutting ‘toolsbeta-pam-sshd-motd-test’ down to conserve resources. It can be restarted any time.
March 30
- 22:53 Coren: resyncing project storage with rsync
- 22:40 Coren: reboot tools-login
- 22:30 Coren: also bastion2
- 22:28 Coren: reboot bastion1 so users can log in
- 21:49 Coren: rebooting dedicated exec nodes.
- 21:49 Coren: rebooting tools-submit
- 17:27 scfc_de: tools-mail: Removed paniclog (multiple exims, but only one found).
March 29
- 19:30 scfc_de: tools-submit: Restarted bigbrother for T90384.
March 28
- 19:42 YuviPanda: created tools-exec-20
March 26
- 21:24 scfc_de: tools-mail: Killed superfluous exims and removed paniclog.
March 25
- 16:49 scfc_de: tools-mail: Removed paniclog (multiple exims, but only one found).
March 24
- 16:03 scfc_de: tools-login: Removed exim paniclog (entries from Sunday).
- 15:51 scfc_de: tools-mail: Killed superfluous exims and removed paniclog.
March 23
- 21:23 scfc_de: tools-login, tools-dev, tools-trusty: Now actually disabled role::labs::bastion per T93661 :-).
- 21:08 scfc_de: tools-login, tools-dev, tools-trusty: role::labs::bastion is still enabled due to T93663.
- 20:57 scfc_de: tools-login, tools-dev, tools-trusty: Disabled role::labs::bastion per T93661.
- 03:02 andrewbogott: wiped out atop.log on tools-dev because /var was filling up
March 22
- 23:08 scfc_de: qconf -ah tools-bastion-01.eqiad.wmflabs
- 23:07 scfc_de: for host in {tools-bastion-01,tools-webgrid-07,tools-webgrid-generic-{01,02}}.eqiad.wmflabs; do qconf -as "$host"; done
- 23:07 yuvipanda: copied /etc/hosts into place on tools-bastion-01
March 21
- 16:18 scfc_de: tools-mail: Killed superfluous exim and removed paniclog.
March 15
- 22:38 scfc_de: tools-mail: Killed superfluous exims and removed paniclog.
March 13
- 16:23 YuviPanda: cleaned out / on tools-trusty
March 11
- 04:28 YuviPanda: tools-redis is back now, as trusty and hopefully slightly more fortified
- 04:14 YuviPanda: kill tools-redis instance, upgrade to trusty while it is down anyway
- 03:56 YuviPanda: restarted redis server, it had OOM-killed
March 9
- 11:02 scfc_de: Deleted probably outdated proxy entry for tool wp-signpost and restarted webservice.
- 10:22 scfc_de: Deleted obsolete proxy entries without webservice for tools bracketbot/herculebot/extreg-wos/pirsquared/searchsbl/translate/yifeibot.
- 10:11 scfc_de: Restarted webservices for tools blahma/catmonitor/catscan2/contributions-summary/eagleeye/imagemapedit/jackbot/tb-dev/vcat/wikihistory/xtools-ec (cf. T91939).
- 08:27 scfc_de: qmod -cq [email protected] (OOM of two jobs in the past).
March 7
- 12:17 scfc_de: Moved obsolete packages that are installed on no instance at all from /data/project/.system/deb to ~tools.admin/archived-packages.
March 6
- 07:46 scfc_de: Set role::labs::tools::toolwatcher for tools-login.
- 07:43 scfc_de: Deployed jobutils/misctools 1.4.
March 2
- 09:53 YuviPanda: added ananthrk to project
- 08:41 YuviPanda: delete tools-uwsgi-01
- 08:11 YuviPanda: delete tools-uwsgi-02 because https://phabricator.wikimedia.org/T91065
March 1
- 15:11 YuviPanda|brb: pooled in tools-webgrid-07 to lighty webgrid, moving some tools off -05 and -06 to relieve pressure
February 28
- 07:51 YuviPanda: create tools-webgrid-07
- 01:00 Coren: Set vm.overcommit_memory=0 on -webgrid-05 (also trusty)
- 01:00 Coren: Also, that was -webgrid-05
- 00:59 Coren: set exec-06 to vm.overcommit_memory=0 for now, until the vm behaviour difference between precise and trusty can be nailed down.
February 27
- 17:53 YuviPanda: increased quota to 512G RAM and 256 cores
- 15:33 Coren: Switched back to -master. I'm making a note here: great success.
- 15:27 Coren: Gridengine master failover test part three; killing the master with -9
- 15:20 Coren: Gridengine master failover test part deux - now with verbose logs
- 15:10 YuviPanda: created tools-webgrid-generic-02
- 15:10 YuviPanda: increase instance quota to 64
- 15:10 Coren: Master restarted - test not successful.
- 14:50 Coren: testing gridengine master failover starting now (see the sketch after this block)
- 08:27 YuviPanda: restart *all* webtools (with qmod -rj webgrid-lighttpd) to have tools-webproxy-01 and -02 pick them up as well
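A rough sketch of what the failover test above exercises, assuming Debian's default gridengine paths; not the exact procedure used:
    cat /var/lib/gridengine/default/common/act_qmaster   # shows which host currently owns the qmaster role
    # Killing sge_qmaster on tools-master stops its heartbeat; after a timeout, sge_shadowd on tools-shadow
    # starts its own sge_qmaster and rewrites act_qmaster, so clients follow the new master automatically.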
February 24
- 18:33 Coren: tools-submit not recovering well from outage, kicking it.
- 17:58 YuviPanda: rebooting *all* webgrid jobs on toollabs
February 16
- 02:31 scfc_de: rm -f /var/log/exim4/paniclog.
February 13
- 18:01 Coren: tools-redis is dead, long live tools-redis
- 17:48 Coren: rebuilding tools-redis with moar ramz
- 17:38 legoktm: redis on tools-redis is OOMing?
- 17:26 marktraceur: restarting grrrit-wm because it's not behaving
February 1
- 10:55 scfc_de: Submitted dummy jobs for tools ftl/limesmap/newwebtest/osm-add-tags/render/tsreports/typoscan/usersearch to get bigbrother to recognize those users and cleaned up output files afterwards.
- 07:51 YuviPanda: cleared error state of stuck queues (see the sketch after this block)
- 06:41 YuviPanda: ran chmod +xw manually on /var/run/lighttpd on webgrid-05; need to investigate why it was necessary
- 05:47 YuviPanda: completed migrating magnus' tools to trusty, more details at https://etherpad.wikimedia.org/p/tools-trusty-move
- 05:37 YuviPanda: added tools-webgrid-06 as trusty webnode, operational now
- 04:52 YuviPanda: migrating all of magnus’ tools, after consultation with him (https://etherpad.wikimedia.org/p/tools-trusty-move for status)
- 04:10 YuviPanda: widar moved to trusty
- 03:01 YuviPanda: ran salt -G 'instanceproject:tools' cmd.run 'sudo rm -rf /var/tmp/core' because disks were getting full.
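A sketch of how stuck queues are usually found and cleared (per the 07:51 entry), assuming plain gridengine tooling; the queue name is illustrative:
    qstat -f -explain E                            # list queue instances in error state and the reason
    qmod -cq 'task@tools-exec-06.eqiad.wmflabs'    # clear the error flag once the cause is understood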
January 29
- 17:26 YuviPanda: reschedule all tomcat jobs
January 27
- 23:27 YuviPanda: qdel -f 7662482 7661111 for Merlissimo
January 19
- 20:51 YuviPanda: because valhallasw is nice
- 10:34 YuviPanda: manually started tools-webgrid-generic-01
- 09:48 YuviPanda: restarted tools-webgrid-03
- 08:42 scfc_de: qmod -cq {continuous,mailq,task}@tools-exec-{06,10,11,15}.eqiad.wmflabs
- 08:36 scfc_de: tools-mail: rm -f /var/log/exim4/paniclog and killed second exim (belated SAL amendment).
January 16
- 22:11 scfc_de: tools-mail: rm -f /var/log/exim4/paniclog.
January 15
- 22:10 YuviPanda: created instance tools-webgrid-generic-01
January 11
- 06:38 scfc_de: tools-mail: rm -f /var/log/exim4/paniclog.
January 8
- 07:40 YuviPanda: increase memory limit for autolist from 4G to 7G
December 23
- 06:00 YuviPanda: tools-uwsgi-01 randomly went to SHUTOFF state, rebooting from virt1000
December 22
- 07:43 YuviPanda: increased RAM and Cores quota for tools
December 19
- 16:38 YuviPanda: puppet disabled on tools-webproxy because urlproxy.lua is handhacked to remove stupid syntax errors that got merged.
- 12:00 YuviPanda|brb: created tools-static, static http server
- 07:07 scfc_de: tools-webgrid-02: rm -f /var/log/exim4/paniclog (OOM, again).
December 17
- 22:38 YuviPanda: touched /data/project/repo/Packages so tools-webproxy stops complaining about that not existing and never running apt-get
December 12
- 14:08 scfc_de: Ran Puppet on all hosts to fix puppet-run issue.
December 11
- 07:58 YuviPanda: rebooted tools-login, wasn’t responsive.
December 8
- 00:15 YuviPanda: killed all db and tools-webproxy aliases in /etc/hosts for tools-webproxy; otherwise puppet fails because ec2id thinks we’re not in labs: hostname -d is empty, since we set /etc/hosts to resolve the IP directly to tools-webproxy
December 7
- 06:31 scfc_de: tools-webgrid-02: rm -f /var/log/exim4/paniclog (OOM, again).
- 06:31 scfc_de: tools-mail: rm -f /var/log/exim4/paniclog (multiple exim4 processes, again).
December 2
- 21:31 scfc_de: tools-mail: rm -f /var/log/exim4/paniclog (multiple exim4 processes, again).
- 21:30 scfc_de: tools-webgrid-02: rm -f /var/log/exim4/paniclog (OOM, again).
November 26
- 19:26 YuviPanda: created tools-webgrid-05 on trusty to set up a working webnode for trusty
November 25
- 06:53 scfc_de: tools-webgrid-02: rm -f /var/log/exim4/paniclog (OOM, again).
November 24
- 14:02 YuviPanda: rebooting tools-login, OOM'd
- 02:51 scfc_de: tools-webgrid-02: rm -f /var/log/exim4/paniclog (OOM, again).
November 22
- 19:05 scfc_de: tools-webgrid-02: rm -f /var/log/exim4/paniclog (OOM, again).
November 17
- 20:40 YuviPanda: cleaned out /tmp on tools-login
November 16
- 21:31 matanya: back to normal
- 21:27 matanya: "Could not resolve hostname bastion.wmflabs.org"
November 15
- 07:24 YuviPanda|zzz: move coredumps from tools-webgrid-04 to /home/yuvipanda
November 14
- 20:23 YuviPanda: cleared out coredumps on tools-webgrid-01 to free up space
- 18:26 YuviPanda: cleaned out core dumps on tools-webgrid
- 16:55 scfc_de: tools-webgrid-02: rm -f /var/log/exim4/paniclog (OOM).
November 13
- 21:11 YuviPanda: disable puppet on tools-dev to check shinken
- 21:00 scfc_de: qmod -cq continuous@tools-exec-09,continuous@tools-exec-11,continuous@tools-exec-13,continuous@tools-exec-14,mailq@tools-exec-09,mailq@tools-exec-11,mailq@tools-exec-13,mailq@tools-exec-14,task@tools-exec-06,task@tools-exec-09,task@tools-exec-11,task@tools-exec-13,task@tools-exec-14,task@tools-exec-15,webgrid-lighttpd@tools-webgrid-01,webgrid-lighttpd@tools-webgrid-02,webgrid-lighttpd@tools-webgrid-04 (fallout from /var being full).
- 20:38 YuviPanda: didn't actually stop puppet, need more patches
- 20:38 YuviPanda: stopping puppet on tools-dev to test shinken
- 15:30 scfc_de: tools-exec-06, tools-webgrid-01: rm -f /var/tmp/core/*.
- 13:31 scfc_de: tools-exec-09, tools-exec-11, tools-exec-13, tools-exec-14, tools-exec-15, tools-webgrid-02, tools-webgrid-04: rm -f /var/tmp/core/*.
November 12
- 22:07 StupidPanda: enabled puppet on tools-exec-07
- 21:47 StupidPanda: removed coredumps from tools-webgrid-04 to reclaim space
- 21:45 StupidPanda: removed coredump from tools-webgrid-01 to reclaim space
- 20:31 YuviPanda: disabling puppet on tools-exec-07 to test shinken
November 7
- 13:56 scfc_de: tools-submit, tools-webgrid-04: rm -f /var/log/exim4/paniclog (OOM around the time of the filesystem outage).
November 6
- 13:21 scfc_de: tools-dev: Gzipped /var/log/account/pacct.0 (804111872 bytes); looks like root had his own bigbrother instance running on tools-dev (multiple invocations of webservice per second).
November 5
- 19:15 mutante: exec nodes have p7zip-full now
- 10:07 YuviPanda: cleaned out pacct and atop logs on tools-login
November 4
- 19:50 mutante: apt-get clean on tools-login, and gzipped some logs
November 1
- 12:51 scfc_de: Removed log files in /var/log/diamond older than five weeks (pdsh -f 1 -g tools sudo find /var/log/diamond -type f -mtime +35 -ls -delete).
October 30
- 14:37 YuviPanda: cleaned out pacct and atop logs on tools-dev
- 06:18 paravoid: killed a "vi" process belonging to user icelabs and running for two days saturating the I/O network bandwidth, and rm'ed a 3.5T(!) .final_mg.txt.swp
October 27
- 16:06 scfc_de: tools-mail: Killed -HUP old queue runners and restarted exim4; probably the source of paniclog's "re-exec of exim (/usr/sbin/exim4) with -Mc failed: No such file or directory".
- 15:36 scfc_de: tools-exec-07, tools-exec-14, tools-exec-15: Recreated (empty) /var/log/apache2 and /var/log/upstart.
October 26
- 12:35 scfc_de: tools-exec-07, tools-exec-14, tools-exec-15: Created /var/log/account.
- 12:33 scfc_de: tools-trusty: Went through shadowed /var and rebooted.
- 12:31 scfc_de: tools-exec-07, tools-exec-14, tools-exec-15: Created /var/log/exim4, started exim4 and ran queue.
October 24
- 20:31 andrewbogott: moved tools-exec-12, tools-shadow and tools-mail to virt1006
October 23
- 22:55 Coren: reboot tools-shadow, upstart seems hosed
October 14
- 23:22 YuviPanda|zzz: removed stale puppet lockfile and ran puppet manually on tools-exec-07
October 11
- 15:31 andrewbogott: rebooting tools-master, stab in the dark
- 06:01 YuviPanda: restarted gridengine-master on tools-master
October 4
- 18:31 scfc_de: tools-mail: Deleted /usr/local/bin/collect_exim_stats_via_gmetric and root's crontab; clean-up for Ic9e0b5bb36931aacfb9128cfa5d24678c263886b
October 2
- 17:59 andrewbogott: added Ryan back to tools admins because that turned out to not have anything to do with the bounce messages
- 17:32 andrewbogott: removing ryan lane from tools admins, because his email in ldap is defunct and I get bounces every time something goes wrong in tools
September 28
- 14:45 andrewbogott: rebased /var/lib/git/operations/puppet on toolsbeta-puppetmaster3
September 25
- 14:43 YuviPanda: cleaned up ghost /var/log (from before biglogs mount) that was taking up space, /var space situation better now
September 17
- 21:40 andrewbogott: caused a brief auth outage while messing with codfw ldap
September 15
- 11:00 YuviPanda: tested CPU monitoring on tools-exec-12 by running stress, seems to work
September 13
- 20:52 yuvipanda: cleaned out rotated log files on tools-webproxy
September 12
- 21:54 jeremyb: [morebots] booted all bots, reverted to using systemwide (.deb) codebase
September 8
- 16:08 scfc_de: tools-login: rm -f /var/log/exim4/paniclog (OOM @ 2014-09-07 15:13:59)
September 5
- 22:22 scfc_de: Deleted stale nginx entries for "rightstool" and "svgcheck"
- 22:20 scfc_de: Stopped 12 webservices for tool "meta" and started one
- 18:50 scfc_de: geohack's lighttpd dumped core and left an entry in Redis behind; tools-webproxy: "DEL prefix:geohack"; geohack: "webservice start"
September 4
- 19:47 lokal-profil: local-heritage: Renamed two Swedish tables
September 2
- 04:31 scfc_de: "iptables -A OUTPUT -d 10.68.16.1 -p udp -m udp --dport 53" on all hosts in support of bug #70076
August 23
- 17:44 scfc_de: qmod -cq task@tools-exec-07 (job #2796555, "11 : before job")
August 21
- 20:05 scfc_de: Deployed release 1.0.11 of jobutils and miscutils
August 15
- 16:45 legoktm: fixed grrrit-wm
- 16:36 legoktm: restarting grrrit-wm
August 14
- 22:36 scfc_de: Removed again jobs in error state due to LDAP with "for JOBID in $(qstat -u \* | sed -ne 's/^\([0-9]\+\) .*Eqw.*$/\1/p;'); do if qstat -j "$JOBID" | fgrep -q "can't get password entry for user"; then qdel "$JOBID"; fi; done"; cf. also bug #69529
August 12
- 03:32 scfc_de: tools-exec-08, tools-exec-wmt, tools-webgrid-02, tools-webgrid-03, tools-webgrid-04: Removed stale "apt-get update" processes to get Puppet working again
August 2
- 16:39 scfc_de: tools.mybot's crontab uses qsub without -M, added that as a temporary measure and will inform user later
- 16:36 scfc_de: Manually rerouted mails for [email protected]
August 1
- 22:41 scfc_de: Deleted all jobs in "E" state that were caused by an LDAP failure at ~ 2014-07-30 07:00Z ("can't get password entry for user [...]")
July 24
- 20:53 scfc_de: Set SGE "mailer" parameter again for bug #61160
- 14:51 scfc_de: Removed ignored file /etc/apt/preferences.d/puppet_base_2.7 on all hosts
July 21
- 18:39 scfc_de: Removed stale Redis entries for currentevents, misc2svg, osm4wiki, wp-signpost, wscredits and yadfa
- 18:38 scfc_de: Restarted webservice for stewardbots because it wasn't in Redis
- 18:33 scfc_de: Stopped eight (!) webservices of tools.bookmanagerv2 and started one again
July 18
- 14:29 scfc_de: admin: Set up .bigbrotherrc for toolhistory
- 13:24 scfc_de: Made tools-webgrid-04 a grid submit host
- 12:58 scfc_de: Made tools-webgrid-03 a grid submit host
July 16
- 22:41 YuviPanda: reloaded nginx on tools-webproxy to pick up https://gerrit.wikimedia.org/r/#/c/146466/3
- 15:18 scfc_de: replagstats OOMed four hours after start on May 6th; with ganglia.wmflabs.org down, not restarting
- 15:14 scfc_de: Restarted toolhistory with 350 MBytes; OOMed June 1st
July 15
- 11:31 scfc_de: Started webservice for sulinfo; stopped at 2014-06-29 18:31:04
July 14
- 20:40 andrewbogott: on tools-login
- 20:39 andrewbogott: manually deleted /var/lib/apt/lists/lock, forcing apt to update
July 13
- 13:13 scfc_de: tools-exec-13: Moved /var/log around, reboot, iptables-restore & reenabled queues
- 13:11 scfc_de: tools-exec-12: Moved /var/log around, reboot & iptables-restore
July 12
- 17:57 scfc_de: tools-exec-11: Stopping apache2 service; no clue how it got there
- 17:53 scfc_de: tools-exec-11: Moved log files around, rebooted, restored iptables and reenabled queue ("qmod -e {continuous,task}@tools-exec-11...")
- 13:00 scfc_de: tools-exec-11, tools-exec-13: qmod -r continuous@tools-exec-1[13].eqiad.wmflabs in preparation of reboot
- 12:58 scfc_de: tools-exec-11, tools-exec-13: Disabled queues in preparation of reboot
- 11:58 scfc_de: tools-exec-11, tools-exec-12, tools-exec-13: mkdir -m 2750 /var/log/exim4 && chown Debian-exim:adm /var/log/exim4; I'll file a bug why the directory wasn't created later
July 11
- 11:59 scfc_de: tools-exec-11, tools-exec-12, tools-exec-13: cp -f /data/project/.system/hosts /etc/hosts
July 10
- 20:35 scfc_de: tools-exec-11, tools-exec-12, tools-exec-13: iptables-restore /data/project/.system/iptables.conf
- 16:00 YuviPanda: manually removed mariadb remote repo from tools-exec-12 instance, won't be added to new instances (puppet patch was merged)
- 01:33 YuviPanda|zzz: tools-exec-11 and tools-exec-13 have been added to the @general hostgroup
July 9
- 23:14 YuviPanda: applied execnode, hba and biglogs to tools-exec-11 and tools-exec-13
- 23:09 YuviPanda: created tools-exec-13 with precise
- 23:08 YuviPanda: created tools-exec-12 as trusty by accident, will keep on standby for testing
- 23:07 YuviPanda: created tools-exec-12
- 23:06 YuviPanda: created tools-exec-11
- 19:23 scfc_de: tools-webproxy: "iptables -A INPUT -p tcp \! --source 127/8 --dport 6379 -j REJECT" to block connections from other Tools instances to Redis again
- 14:12 scfc_de: tools-exec-cyberbot: Reran Puppet successfully and hotfixed the Peachy temporary file issue; will mail labs-l later
- 13:33 scfc_de: tools-exec-cyberbot: Freed 402398 inodes ...
- 12:50 scfc_de: tools-exec-cyberbot: "find /tmp -maxdepth 1 -type f -name \*cyberbotpeachy.cookies\* -mtime +30 -delete" as a first step
- 12:40 scfc_de: tools-exec-cyberbot: Root partition has run out of inodes
- 12:34 scfc_de: tools-exec-gift: Forgot to log yesterday: The problems were due to overload (load >> 150); SGE shouldn't have allowed that
- 12:28 YuviPanda: cleaned out old diamond archive logs on tools-master
- 12:28 YuviPanda: cleaned out old diamond archive logs on tools-webgrid-04
- 12:25 YuviPanda: cleaned out old diamond archive logs from tools-exec-08
July 8
- 20:57 scfc_de: tools-exec-gift: Puppet hangs due to "apt-get update" not finishing in time; manual runs of the latter take forever
- 19:52 scfc_de: tools-exec-wmt, tools-shadow: Removed stale Puppet lock files and reran manually (handy: "sudo find /var/lib/puppet/state -maxdepth 1 -type f -name agent_catalog_run.lock -ls -ok rm -f \{\} \; -exec sudo puppet agent apply -tv \;")
- 18:09 scfc_de: tools-webgrid-03, tools-webgrid-04: killall -TERM gmond (bug #64216)
- 17:57 scfc_de: tools-exec-08, tools-exec-09, tools-webgrid-02, tools-webgrid-03: Removed stale Puppet lock files and reran manually
- 17:26 scfc_de: tools-tcl-test: Rebooted because system said so
- 17:04 YuviPanda: webservice start on tools.meetbot since it seemed down
- 14:55 YuviPanda: cleaned out old diamond archive logs on tools-webproxy
- 13:39 scfc_de: tools-login: rm -f /var/log/exim4/paniclog ("daemon: fork of queue-runner process failed: Cannot allocate memory")
July 6
- 12:09 scfc_de: tools-mail: rm -f /var/log/exim4/paniclog after I20afa5fb2be7d8b9cf5c3bf4018377d0e847daef got merged
July 5
- 22:36 YuviPanda: cleared diamond archive logs on a bunch of machines, submitted patch to get rid of archive logs
- 22:17 YuviPanda: changed grid scheduling config, set weight_priority to 0.1 from 0.0 for https://bugzilla.wikimedia.org/show_bug.cgi?id=67555
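A sketch of how that scheduler knob is changed, assuming plain gridengine tooling:
    qconf -ssconf | grep weight_priority   # show the current scheduler configuration value
    qconf -msconf                          # opens the scheduler config in $EDITOR; set weight_priority to 0.100000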
July 4
- 08:51 scfc_de: tools-exec-08 (some hours ago): rm -f /var/log/diamond/* && restart diamond
- 00:02 scfc_de: tools-master: rm -f /var/log/diamond/* && restart diamond
July 3
- 16:59 Betacommand: Coren: It may take a while though; what the catscan queries were blocking is a DDL query changing the schema, and that pauses replication.
- 16:58 Betacommand: Coren: transactions over 30ks killed; the DB should start catching up soon.
- 14:37 Betacommand: replication for enwiki is halted; current lag is at 9876
July 2
- 00:21 YuviPanda: restarted diamond on almost all nodes to stop sending nfs stats, some still need to be flushed
- 00:21 YuviPanda: restarted diamond on all exec nodes to stop sending nfs stats
July 1
- 23:09 legoktm: tools-pywikibot started the webservice, don't know why it wasn't running
- 21:08 scfc_de: Reset queues in error state again
- 17:51 YuviPanda: tools-exec-04 removed stale pid file and force puppet run
- 16:07 YuviPanda: applied biglogs to tools-exec-02 and rejigged things
- 15:54 YuviPanda: tools-exec-02 removed stale puppet pid file, forcing run
- 15:51 Coren: adjusted resource limits for -exec-07 to match the smaller instance size.
- 15:50 Coren: created logfile disk for -exec-07 by hand (smaller instance)
- 01:53 YuviPanda: tools-exec-10 applied biglogs, moved logs around, killed some old diamond logs
- 01:41 YuviPanda: tools-exec-03 restarted diamond, atop, exim4, ssh to pick up new log partition
- 01:40 YuviPanda: tools-exec-03 applied biglogs, moved logs around, killed some old diamond logs
- 01:34 scfc_de: tools-exec-03, tools-exec-10: Removed /var/log/diamond/diamond.log, restarted diamond and bzip2'ed /var/log/diamond/*.log.2014*
June 30
- 22:10 YuviPanda: ran webservice start for enwp10
- 22:06 YuviPanda: stale lockfile in tools-login as well, removing and forcing puppet run
- 22:01 YuviPanda: removed stale lockfile for puppet, forcing run
- 19:58 YuviPanda|food: added tools-webgrid-04 to webgrid queue, had to start portgranter manually
- 17:43 YuviPanda: created tools-webgrid-04, applying webnode role and running puppet
- 17:27 YuviPanda: created tools-webgrid-03 and added it to the queue
June 29
- 19:45 scfc_de: magnustools: "webservice start"
- 18:24 YuviPanda: rebooted tools-webgrid-02. Could not ssh, was dead
June 28
- 21:07 YuviPanda: removed alias for tools-webproxy and tools.wmflabs.org from /etc/hosts on tools-webproxy
June 21
- 20:09 scfc_de: Created tool mediawiki-mirror (yuvipanda + Nemo_bis) and chown'ed & chmod o-w /shared/mediawiki
June 20
- 21:01 scfc_de: tools-webgrid-tomcat: Added to submit host list with "qconf -as" for bug #66882
- 14:47 scfc_de: Restarted webservice for mono; cf. bug #64219
June 16
- 23:50 scfc_de: Shut down diamond services and removed log files on all hosts
June 15
- 17:12 YuviPanda: deleted tools-mongo. MongoDB pre-allocates db files, and so allocating one db to every tool fills up the disk *really* quickly, even with 0 data. Their non preallocating version is 'not meant for production', so putting on hold for now
- 16:50 scfc_de: qmod -cq [email protected]
- 16:48 scfc_de: tools-exec-cyberbot: rm -f /var/log/diamond/diamond.log && restart diamond
- 16:48 scfc_de: tools-exec-cyberbot: No DNS entry (again)
June 13
- 22:59 YuviPanda: "sudo -u ineditable -s" to force creation of homedir, since the user was unable to login before. /var/log/auth.log had no record of their attempts, but now seems to work. straange
June 10
- 21:51 scfc_de: Restarted diamond service on all Tools hosts to actually free the disk space :-)
- 21:36 scfc_de: Deleted /var/log/diamond/diamond.log on all Tools hosts to free up space on /var
June 3
- 17:50 Betacommand: Brief network outage. Cause: not clearly determined yet; we aborted the investigation to roll back and restore service. As far as we can tell, there is something subtly wrong with the LACP switch configuration.
June 2
- 20:15 YuviPanda: create instance tools-trusty-test to test nginx proxy on trusty
- 19:00 scfc_de: zoomviewer: Set TMPDIR to /data/project/zoomviewer/var/tmp and ./webwatcher.sh; cannot see *any* temporary files being created anywhere, though. iipsrv.fcgi however has TMPDIR set as planned.
May 27
- 18:49 wm-bot: petrb: temporarily hardcoding tools-exec-cyberbot to /etc/hosts so that host resolution works
- 10:36 scfc_de: tools-webgrid-01: removed all files of tools.zoomviewer in /tmp
- 10:22 scfc_de: tools-webgrid-01: /tmp was full, removed files of tools.zoomviewer older than five days
- 07:52 wm-bot: petrb: restarted webservice of tool admin in order to purge that huge access.log
May 25
- 14:27 scfc_de: tools-mail: "rm -f /var/log/exim4/paniclog" to leave only relay_domains errors
May 23
- 14:14 andrewbogott: rebooting tools-webproxy so that services start logging again
- 14:10 andrewbogott: applying role::labs::lvm::biglogs on tools-webproxy because /var/log was full and causing errors
May 22
- 02:45 scfc_de: tools-mail: Enabled role::labs::lvm::biglogs, moved data around & rebooted (see the sketch after this block).
- 02:36 scfc_de: tools-mail: Removed all jsub notifications from hazard-bot from queue.
- 01:46 scfc_de: hazard-bot: Disabled minutely cron job github-updater
- 01:36 scfc_de: tools-mail: Freezing all messages to Yahoo!: "421 4.7.1 [TS03] All messages from 208.80.155.162 will be permanently deferred; Retrying will NOT succeed. See http://postmaster.yahoo.com/421-ts03.html"
- 01:12 scfc_de: tools-mail: /var is full
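A generic sketch of what the biglogs-style move involves (normally applied via the puppet role rather than by hand); the volume group name and size are placeholders, the spare /dev/vdb is per the labs images:
    pvcreate /dev/vdb && vgcreate <vg> /dev/vdb
    lvcreate -L 8G -n logs <vg> && mkfs.ext4 /dev/<vg>/logs
    mount /dev/<vg>/logs /mnt && rsync -a /var/log/ /mnt/ && umount /mnt
    mount /dev/<vg>/logs /var/log    # plus an fstab entry so it survives reboots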
May 20
- 18:34 YuviPanda: back to homerolled nginx 1.5 on proxy, newer versions causing too many issues
May 16
- 17:01 scfc_de: tools-webgrid-02: rm -f /tmp/core (tools.misc2svg, May 13 06:10, 3861106688)
May 14
- 16:31 scfc_de: tools-webproxy: "iptables -A INPUT -p tcp \! --source 127/8 --dport 6379 -j REJECT" to block connections from other Tools instances to Redis
- 00:23 Betacommand: 503's related to bug 65179
May 13
- 20:36 YuviPanda: restarting redis on tools-webproxy fixed 503s
- 20:36 valhallasw: redis failed, causing tools-webproxy to throw 503's
- 19:09 marktraceur: Restarted grrrit because it had a stupid nick
May 10
- 14:50 YuviPanda: upgraded nginx to 1.7.0 on tools-webproxy to get SPDY/3.1
May 9
- 13:16 scfc_de: Cleared error state of queues {continuous,mailq,task}@tools-exec-06 and webgrid-lighttpd; no obvious or persistent causes
May 6
- 19:31 scfc_de: replagstats fixed; Ganglia graphs are now under the virtual host "tools-replags"
- 17:53 scfc_de: Don't think replagstats is really working ...
- 16:40 scfc_de: Moved ~scfc/bin/replagstats to ~tools.admin/bin/ and enabled as a continuous job (cf. also bug #48694).
April 28
- 11:51 YuviPanda: pywikibugs Deployed bf1be7b
April 27
- 13:34 scfc_de: Restarted webservice for geohack and moved {access,error}.log to {access,error}.log.1
April 24
- 23:39 YuviPanda: restarted grrrit-wm, not greg-g. greg-g does not survive restarts and hence care must be taken to make sure he is not.
- 23:38 YuviPanda: restarted greg-g after cherry-picking aec09a6 for auth of IRC bot
- 23:33 legoktm: restarting grrrit-wm https://gerrit.wikimedia.org/r/129610
- 13:07 scfc_de: tools-mail: rm -f /var/log/exim4/paniclog (relay_domains bug)
April 20
- 14:27 scfc_de: tools-redis: Set role::labs::lvm::mnt and $lvm_mount_point=/var/lib, moved the data around and rebooted
- 14:08 scfc_de: tools-redis: /var is full
- 08:59 legoktm: grrrit-wm: 2014-04-20T08:28:15.889Z - error: Caught error in redisClient.brpop: Redis connection to tools-redis:6379 failed - connect ECONNREFUSED
- 08:48 legoktm: Your job 438884 ("lolrrit-wm") has been submitted
- 08:47 legoktm: [01:28:28] * grrrit-wm has quit (Remote host closed the connection)
April 13
- 14:20 scfc_de: Restarted webservice for wikihistory to see if the change to PHP_FCGI_MAX_REQUESTS increases reliability
- 14:17 scfc_de: tools-webgrid-01, tools-webgrid-02: Set PHP_FCGI_MAX_REQUESTS to 500 in /usr/local/bin/lighttpd-starter per http://redmine.lighttpd.net/projects/1/wiki/docs_performancefastcgi#Why-is-my-PHP-application-returning-an-error-500-from-time-to-time
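The tweak above boils down to an exported environment variable in the (shell) starter script; a one-line sketch, with the rest of the script unchanged:
    export PHP_FCGI_MAX_REQUESTS=500   # each php-cgi worker exits and is respawned after 500 requests, limiting leak build-up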
April 12
- 23:51 scfc_de: tools-mail: rm -f /var/log/exim4/paniclog ("unknown named domain list "+relay_domains"")
April 11
- 16:21 scfc_de: tools-login: Killed -HUP process consuming 2.6 GByte; cf. wikitech:User talk:Ralgis#Welcome to Tool Labs
April 10
- 18:20 scfc_de: tools-webgrid-01, tools-webgrid-02: "kill -HUP" all php-cgis that are not (grand-)children of lighttpd processes
April 8
- 05:06 Ryan_Lane: restart nginx on tools-proxy-test
- 05:03 Ryan_Lane: upgraded libssl on all nodes
April 4
- 15:48 Coren: Moar powar!!1!one: added two exec nodes (-09 -10) and one webgrid node (-02)
- 11:11 scfc_de: Set /data/project/.system/config/wikihistory.workers to 20 on apper's request
March 30
- 18:16 scfc_de: Removed empty directories /data/project/{d930913,sudo-test{,-2},testbug{,2,3}}: Corresponding service groups don't exist (anymore)
- 18:13 scfc_de: Removed /data/project/backup: Only empty dynamic-proxy backup files of January 3rd and earlier
March 29
- 10:14 wm-bot: petrb: disabled 1 job in cron in -login of user tools.tools-info which was killing login server
March 28
- 11:53 wm-bot: petrb: did the same on -mail server (removed /var/log/exim4/paniclog) so that we don't get spam every day
- 11:51 wm-bot: petrb: removed content of /var/log/exim4/paniclog
- 11:49 wm-bot: petrb: disabled default vimrc which everybody hates on -login
March 21
- 16:50 scfc_de: tools-login: pkill -u tools.bene (OOM)
- 16:13 scfc_de: rmdir /home/icinga (totally empty, "drwxr-xr-x 2 nemobis 50383 4096 Mar 17 16:42", perhaps artifact of mass migration?)
- 15:49 scfc_de: sudo cp -R /etc/skel /home/csroychan && sudo chown -R csroychan.wikidev /home/csroychan; that should close [[bugzilla:62132]]
- 15:15 scfc_de: sudo cp -R /etc/skel /home/annabel && sudo chown -R annabel.wikidev /home/annabel
- 15:14 scfc_de: sudo chown -R torin8.wikidev /home/torin8
March 20
- 18:36 scfc_de: Pointed tools-dev.wmflabs.org at tools-dev.eqiad.wmflabs; cf. [[Bugzilla:62883]]
March 5
- 13:57 wm-bot: petrb: test
March 4
- 22:35 wm-bot: petrb: uninstalling it from -login too
- 22:32 wm-bot: petrb: uninstalling apache2 from tools-dev it has nothing to do there
March 3
- 19:20 wm-bot: petrb: shutting down almost all services on webserver-02 in order to make system useable and finish upgrade
- 19:17 wm-bot: petrb: upgrading all packages on webserver-02
- 19:15 petan: rebooting webserver-01 which is totally dead
- 19:07 wm-bot: petrb: restarting apache on webserver-02 it complains about OOM but the server has more than 1.5g memory free
- 19:03 wm-bot: petrb: switched local-svg-map-maker to webserver-02 because 01 is not accessible to me, hence I can't debug that
- 16:44 scfc_de: tools-webserver-03: Apache was swamped by request for /guc. "webservice start" for that, and pkill -HUP -u local-guc.
- 12:54 scfc_de: tools-webserver-02: Rebooted, apache2/error.log told of OOM, though more than 1G free memory.
- 12:50 scfc_de: tools-webserver-03: Rebooted, scripts were timing out
- 12:42 scfc_de: tools-webproxy: Rebooted; wasn't accessible by ssh.
March 1
- 03:42 Coren: disabled puppet in pmtpa tool labs
February 28
- 14:46 wm-bot: petrb: extending /usr on tools-dev by 800mb
- 00:26 scfc_de: tools-webserver-02: Rebooted; inaccessible via ssh, http said "500 Internal Server Error"
February 27
- 15:28 scfc_de: chmod g-w ~fsainsbu/.forward
February 25
- 22:48 rdwrer: Lol, so, something happened with grrrit-wm earlier and nobody logged any of it. It was yoyoing, Yuvi killed it, then aude did something and now it's back.
February 23
- 20:46 scfc_de: morebots: labs HUPped to reconnect to IRC
February 21
- 17:32 scfc_de: tools-dev: mount -t nfs -o nfsvers=3,ro labstore1.pmtpa.wmnet:/publicdata-project /public/datasets; automount seems to have been stuck
- 15:24 scfc_de: tools-webserver-03: Rebooted, wasn't accessible by ssh and apparently no access to /public/datasets either
February 20
- 21:23 scfc_de: tools-login: Disabled crontab for local-rezabot and left a message at User talk:Reza#Running bots on tools-login, etc. (fa:بحث_کاربر:Reza1615 is write-protected)
- 20:15 scfc_de: tools-login: Disabled crontab for local-chobot and left a message at ko:사용자토론:ChongDae#Running bots on tools-login, etc.
- 10:42 scfc_de: tools-mail: rm -f /var/log/exim4/paniclog ("User 0 set for local_delivery transport is on the never_users list", cf. [[bugzilla:61583]])
- 10:30 scfc_de: tools-login: rm -f /var/log/exim4/paniclog (OOM)
- 10:28 scfc_de: Reset error status of task@tools-exec-09 ("can't get password entry for user 'local-voxelbot'"); "getent passwd local-voxelbot" works on tools-exec-09, possibly a glitch
February 19
- 20:21 scfc_de: morebots: Set "enable_twitter=False" in confs/labs-logbot.py and restarted labs-morebots
- 19:14 scfc_de: tools-login: Disabled crontab and pkill -HUP -u fatemi127
February 18
- 11:42 scfc_de: tools-mail: Rerouted queued mail (@tools-login.pmtpa.wmflabs => @tools.wmflabs.org)
- 11:34 scfc_de: tools-exec-08: Rebooted due to not responding on ssh and SGE
- 10:39 scfc_de: tools-mail: rm -f /var/log/exim4/paniclog ("User 0 set for local_delivery transport is on the never_users list" => probably artifacts from Coren's LDAP changes)
- 10:37 scfc_de: tools-login: rm -f /var/log/exim4/paniclog (OOM)
February 14
- 23:54 legoktm: restarting grrrit-wm since it disappeared
- 08:19 scfc_de: tools-login: rm -f /var/log/exim4/paniclog (OOM)
February 13
- 13:11 scfc_de: Deleted old job of user veblenbot stuck in error state
- 13:08 scfc_de: Deleted old jobs of user v2 stuck in error state
- 10:49 scfc_de: tools-login: Commented out local-shuaib-bot's crontab with a pointer to Tools/Help
February 12
- 07:51 wm-bot: petrb: removed /data/project/james/adminstats/wikitools per request from james on irc
February 11
- 15:47 scfc_de: Restarted webservice for geohack
- 13:02 scfc_de: tools-login: rm -f /var/log/exim4/paniclog (OOM)
- 13:00 scfc_de: Killed -HUP local-hawk-eye-bot's jobs; one was hanging with a stale NFS handle on tools-exec-05
February 10
- 23:16 Coren: rebooting webproxy (braindead autofs)
February 9
- 18:14 legoktm: restarting grrrit-wm, it keeps joining and quitting
- 04:27 legoktm: rebooting grrrit-wm - https://gerrit.wikimedia.org/r/#/c/112308
February 6
- 22:50 legoktm: restarting grrrit-wm https://gerrit.wikimedia.org/r/111889
February 4
- 20:38 legoktm: restarting grrrit-wm: 'Send mediawiki/extension/Thanks to -corefeatures' https://gerrit.wikimedia.org/r/111257
January 31
- 03:43 scfc_de: Cleaned up all exim queues
- 01:26 scfc_de: chmod g-w ~{bgwhite,daniel,euku,fale,henna,hydriz,lfaraone}/.forward (test: sudo find /home -mindepth 2 -maxdepth 2 -type f -name .forward -perm /g=w -ls)
January 30
- 21:48 scfc_de: chmod g-w ~fluff/.forward
- 21:40 scfc_de: local-betabot: Added "-M" option to crontab's qsub call and rerouted queued mail (freeze, exim -Mar, exim -Mmd, thaw; sketched after this block)
- 18:33 scfc_de: tools-exec-04: puppetd --enable (apparently disabled sometime around 2014-01-16?!)
- 17:25 scfc_de: tools-exec-06: mv -f /etc/init.d/nagios-nrpe-server{.dpkg-dist,} (nagios-nrpe-server didn't start because start-up script tried to "chown icinga" instead of "chown nagios")
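The freeze/reroute/thaw recipe from the 21:40 entry, sketched for a single queued message; the message id and addresses are placeholders:
    exim -Mf <msg-id>                   # freeze the message
    exim -Mar <msg-id> <new-address>    # add the corrected recipient
    exim -Mmd <msg-id> <old-address>    # mark the old recipient as already delivered
    exim -Mt <msg-id>                   # thaw so the next queue run delivers it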
January 28
- 04:27 scfc_de: tools-webproxy: Blocked Phonifier
January 25
- 05:37 scfc_de: tools-webserver-02: rm -f /var/log/exim4/paniclog (OOM)
January 24
- 01:07 scfc_de: tools-db: Removed /var/lib/mysql2, set expire_logs_days to 1 day
- 00:11 scfc_de: tools-db: and restarted mysqld
- 00:11 scfc_de: tools-db: Moved 4.2 GBytes of the oldest binlogs to /var/lib/mysql2/
January 23
- 19:24 legoktm: restarting grrrit-wm now https://gerrit.wikimedia.org/r/#/c/109116/
- 19:23 legoktm: ^ was for grrrit-wm
- 19:23 legoktm: re-committed password to local repo, not sure why that wasn't committed already
January 21
- 17:41 scfc_de: tools-exec-09: iptables-restore /data/project/.system/iptables.conf
January 20
- 07:02 andrewbogott: merged a lint patch to the gridengine module. Should be a noop
January 16
- 17:11 scfc_de: tools-exec-09: "iptables-restore /data/project/.system/iptables.conf" after reboot
January 15
- 13:36 scfc_de: After reboot of tools-exec-09, all continuous jobs were successfully restarted ("Rr"); task jobs (1974113, 2188472) failed ("19 : before writing exit_status")
- 13:27 scfc_de: tools-login: rm -f /var/log/exim4/paniclog (OOM)
- 08:54 andrewbogott: rebooted tools-exec-09
- 08:32 andrewbogott: rebooted tools-db
January 14
- 15:10 scfc_de: tools-login: pkill -u local-mlwikisource: Freed 1 GByte of memory
- 14:58 scfc_de: tools-login: Disabled local-mlwikisource's crontab with explanation
- 13:57 scfc_de: tools-webserver-02: rm -f /var/log/exim4/paniclog (out of memory errors on 2014-01-10)
January 10
- 10:41 legoktm: grrrit-wm: restarting https://gerrit.wikimedia.org/r/106670
- 09:00 legoktm: grrrit-wm: setting up #mediawiki-feed, https://gerrit.wikimedia.org/r/106555
January 9
- 18:26 legoktm: rebased grrrit-wm on origin/master since fetching gerrit was failing
- 18:21 legoktm: restarting grrrit-wm https://gerrit.wikimedia.org/r/#/c/106501/
January 8
- 13:44 scfc_de: Cleared error states of continuous@tools-exec-05, task@tools-exec-05, task@tools-exec-09
January 7
- 18:59 scfc_de: tools-login, tools-mail: rm -f /var/log/exim4/paniclog (apparently some artifacts of the LDAP failure)
January 6
- 14:06 YuviPanda: deleted instance tools-mc, didn't know it had come back from the dead
January 1
- 13:24 scfc_de: tools-exec-02, tools-master, tools-shadow, tools-webserver-01: Commented out duplicate MariaDB entries in /etc/apt/sources.list and re-ran apt-get update
- 11:27 scfc_de: tools-webserver-01, tools-webserver-01: rm -f /var/log/exim4/paniclog; out of memory errors
- 11:18 scfc_de: Emptied /{data/project,home}/.snaplist as the snapshots themselves are not available
December 27
- 07:39 legoktm: grrrit-wm restart didn't really work.
- 07:38 legoktm: restarting grrrit-wm, for some reason it reconnected and lost its cloak
December 23
- 18:30 marktraceur: restart grrrit-wm for subbu
December 21
- 06:50 scfc_de: tools-exec-01: Commented out duplicate MariaDB entries in /etc/apt/sources.list and re-ran apt-get update
December 19
- 17:22 marktraceur: deploying grrrit config change
December 17
- 23:19 legoktm: rebooted grrrit-wm with new config stuffs
December 14
- 18:13 marktraceur: restarting grrrit-wm to fix its nickname
- 13:17 scfc_de: tools-exec-08: Purged packages libapache2-mod-suphp and suphp-common (probably remnants from when the host was misconfigured as a webserver)
- 13:09 scfc_de: tools-dev, tools-login, tools-mail, tools-webserver-01, tools-webserver-02: rm /var/log/exim4/paniclog (mostly out of memory errors)
December 4
- 22:15 Coren: tools-exec-01 rebooted to fix the autofs issue; will return to rotation shortly.
- 16:33 Coren: rebooting webproxy with new kernel settings to help against the DDOS
December 1
- 14:05 Coren: underlying virtualization hardware rebooted; tools-master and friends coming back up.
November 25
- 21:03 YuviPanda: created tools-proxy-test instance to play around with the dynamicproxy
- 12:16 wm-bot: petrb: deswapping -login (swapoff -a && swapon -a)
November 24
- 07:19 paravoid: disabled crontab for user avocato on tools-login, see above
- 07:17 paravoid: pkill -u avocato on tools-login, multiple /home/avocato/pywikipedia/redirect.py DoSing the bastion
November 14
- 09:12 ori-l: Added aude to lolrrit-wm maintainers group
November 13
- 22:36 andrewbogott: removed 'imagescaler' class from tools-login because that class hasn't existed for a year, which is before that instance even existed, so what the heck?
November 3
- 16:49 ori-l: grrrit-wm stopped receiving events. restarted it; didn't help. then restarted gerrit-to-redis, which seems to have fixed it.
November 1
- 16:11 wm-bot: petrb: restarted terminator daemon on -login to sort out memory issues caused by heavy mysql client by elbransco
October 23
- 15:19 Coren: deleted tools-tyrant and tools-exec-cyberbot (cleanup of obsoleted instances)
October 20
- 18:52 wm-bot: petrb: everything looks better
- 18:51 wm-bot: petrb: restarting apache server on tools-webproxy
- 18:49 wm-bot: petrb: installed links on -dev and going to investigate what is wrong with apaches, documentation, Coren, please update it
October 15
- 21:03 Coren: labs-login rebooted to fix the ownership/take issue with success.
October 10
- 09:49 addshore: tools-webserver-01 is getting a 500 Internal Server Error again
September 23
- 06:44 YuviPanda: remove unpuppetized install of openjdk-6 packages causing problems in -dev (for bug: 54444)
- 05:15 legoktm: logging a log to test the log logging
- 05:13 legoktm: logging a log to test the log logging
September 11
- 09:39 wm-bot: petrb: started toolwatcher
August 24
- 18:00 wm-bot: petrb: freed 1600mb of ram by killing yasbot processes on -login
- 17:59 wm-bot: petrb: killing all python processes of yasbot on -login; this bot needs to run on the grid, -login is constantly getting OOM because of it
August 23
- 12:17 wm-bot: petrb: test
- 12:15 wm-bot: petrb: making pv from /dev/vdb on new nodes
- 11:49 wm-bot: petrb: syncing packages of -login with exec nodes
- 11:48 petan: someone installed firefox on exec nodes, should investigate / remove
August 22
- 01:24 scfc_de: tools-webserver-03: Installed python-oursql
August 20
- 23:00 scfc_de: Opened port 3000 for intra-Labs traffic in execnode security group for YuviPanda's proxy experiments
August 19
- 09:52 wm-bot: petrb: deleting fatestwiki tool, requested by creator
August 16
- 00:16 scfc_de: tools-exec-01 doesn't come up again even after repeated reboots
August 15
- 15:14 scfc_de: tools-webserver-01: Simplified /usr/local/bin/php-wrapper
- 14:31 scfc_de: tools-webserver-01: "dpkg --configure -a" on apt-get's advice
- 14:24 scfc_de: chmod 644 ~magnus/.forward
- 03:07 scfc_de: tools-webproxy: Temporarily serving 403s to AhrefsBot/bingbot/Googlebot/PaperLiBot/TweetmemeBot/YandexBot until they reread robots.txt
- 02:02 scfc_de: robots.txt: "Disallow: /"
August 11
- 03:14 scfc_de: tools-mc: Purged memcached
August 10
- 02:36 scfc_de: Disabled terminatord on tools-login and tools-dev
- 02:24 scfc_de: chmod g-w ~whym/.forward
August 6
- 19:26 scfc_de: Set up basic robots.txt to exclude Geohack to see how that affects traffic
- 02:09 scfc_de: tools-mail: Enabled rudimentary Ganglia monitoring in root's crontab
August 5
- 20:32 scfc_de: chmod g-w ~ladsgroup/.forward
August 2
- 23:45 scfc_de: tools-dev: Installed dialog for testing
August 1
- 19:57 scfc_de: Created new instance tools-redis with redis_maxmemory = "7GB"
- 19:56 scfc_de: Added redis_maxmemory to wikitech Puppet variables
July 31
- 10:50 HenriqueCrang: ptwikis: added graph with mobile edits
July 30
- 19:08 scfc_de: tools-webproxy: Purged popularity-contest and ubuntu-standard
- 07:32 wm-bot: petrb: deleted local-addbot jobs
- 02:01 scfc_de: tools-webserver-01: Symlinked /usr/local/bin/{job,jstart,jstop,jsub} to /usr/bin; were obsolete versions.
July 29
- 15:15 scfc_de: tools-webserver-01: rm /var/log/exim4/paniclog
- 15:10 scfc_de: Purged popularity-contest from tools-webserver-01.
- 02:40 scfc_de: Restarted toolwatcher on tools-login.
- 02:11 scfc_de: Reboot tools-login, was not responsive
July 25
- 23:37 Ryan_Lane: added myself to lolrrit-wm tool
- 12:06 wm-bot: petrb: test
- 07:11 wm-bot: petrb: created /var/log/glusterfs/bricks/ to stop rotatelogs from complaining about it being missing
July 20
- 15:19 petan: rebooting tools-redis
July 19
- 07:06 petan: instances were rebooted for unknown reasons
- 00:42 helderwiki: it works! :-)
- 00:41 legoktm: test
July 10
- 18:04 wm-bot: petrb: installing mysqltcl on grid
- 18:01 wm-bot: petrb: installing tclodbc on grid
July 5
- 19:38 AzaToth: test
- 19:36 AzaToth: test for example
- 18:23 Coren: brief outage of webproxy complete (back to business!)
- 18:13 Coren: brief outage of webproxy (rollback 2.4 upgrade)
July 3
- 13:44 scfc_de: Set "HostbasedAuthentication yes" and "EnableSSHKeysign yes" in tools-dev's /etc/ssh/ssh_config
- 12:58 petan: rebooting -mc, it's apparently OOM-dying
July 2
- 16:24 wm-bot: petrb: installed maria on all nodes so we can connect to the db even from SGE
- 12:19 wm-bot: petrb: installing packages -- libmediawiki-api-perl libdatetime-format-strptime-perl libbot-basicbot-perl libdatetime-format-duration-perl
July 1
- 18:39 wm-bot: petrb: started toolwatcher on -login
- 14:22 wm-bot: petrb: installing following packages on grid: libdata-dumper-simple-perl libhtml-html5-entities-perl libirc-utils-perl libtask-weaken-perl libobject-pluggable-perl libpoe-component-syndicator-perl libpoe-filter-ircd-perl libsocket-getaddrinfo-perl libpoe-component-irc-perl libxml-simple-perl
- 12:05 wm-bot: petrb: starting toolwatcher
- 11:40 wm-bot: petrb: tools is back o/
- 09:42 wm-bot: petrb: installing python-zmq and python-matplotlib on -dev
- 03:33 scfc_de: Rebooted tools-login apparently out of memory and not responding to ssh
June 30
- 17:58 scfc_de: Set ssh_hba to yes on tools-exec-06
- 17:13 scfc_de: Installed python-matplotlib and python-zmq on tools-login for YuviPanda
June 26
- 21:16 Coren: +Tim Landscheidt to project admins, local-admin
- 14:23 wm-bot: petrb: updating several packages on -login
- 13:43 wm-bot: petrb: killing old instance of redis: Jun15 ? 00:06:49 /usr/bin/redis-server /etc/redis/redis.conf
- 13:42 wm-bot: petrb: restarting redis
- 13:28 wm-bot: petrb: running puppet on -mc
- 13:27 wm-bot: petrb: adding ::redis role to tools-mc - if anything will break, YuviPanda did it :P
- 09:35 wm-bot: petrb: updated status.php to version which display free vmem as well
June 25
- 12:34 wm-bot: petrb: installing php5-mcrypt on exec and web
June 24
- 15:45 wm-bot: petrb: changed colors of root prompt: production vs testing
- 07:57 wm-bot: petrb: 50527 4186 22830 1 Jun23 pts/41 00:08:54 python fill2.py eats 48% of ram on -login
June 19
- 12:17 wm-bot: petrb: increasing limit on mysql connections
June 17
- 17:34 wm-bot: petrb: /var/spool/cron/crontabs/ has -rw------- 1 8006 crontab 1176 Apr 11 14:07 local-voxelbot fixing
June 16
- 21:23 Coren: 1.0.3 deployed (jobutils, misctools)
June 15
- 21:40 wm-bot: petrb: there is no LVM on -db, which we badly need; therefore no swap either, nor storage for binary logs :( I've got a feeling that mysql will die OOM soonish
- 21:39 wm-bot: petrb: db has 5% free RAM eeeek
- 18:36 wm-bot: root: removed a lot of "audit" logs from exec-04; they were eating too much storage
- 18:23 wm-bot: petrb: temporarily disabling /tmp on exec-04 in order to set up lvm
- 18:23 wm-bot: petrb: exec-04 96% / usage, creating a new volume
- 12:33 wm-bot: petrb: installing redis on tools-mc
June 14
- 12:35 wm-bot: petrb: updating logsplitter to new version
June 13
- 21:59 wm-bot: petrb: replaced logsplitter on both apache servers with a far more powerful C++ version, saving a lot of resources on both servers
- 12:43 wm-bot: petrb: tools-webserver-01 is running a quite expensive python job (currently eating almost 1 GB of RAM); it may need to be fixed or moved to a separate webserver; adding swap to prevent the machine dying OOM
- 12:22 wm-bot: petrb: killing process 31187 sort -T./enwiki/target -t of user local-enwp10 for same reason as previous one
- 12:21 wm-bot: petrb: killing process 31190 sort -T./enwiki/target of user local-enwp10 for same reason as previous one
- 12:17 wm-bot: petrb: killing process 31186 31185 69 Jun11 pts/32 1-13:14:41 /usr/bin/perl ./bin/catpagelinks.pl ./enwiki/target/main_pages_sort_by_ids.lst ./enwiki/target/pagelinks_main_sort_by_ids.lst because it seems to be a bot running on login server eating too many resources
June 11
- 07:36 wm-bot: petrb: installed libdigest-crc-perl
June 10
- 13:05 wm-bot: petrb: installing libcrypt-gcrypt-perl
- 08:45 wm-bot: petrb: updated /usr/local/bin/logsplitter on webserver-01 in order to fix !b 49383
- 08:25 wm-bot: petrb: fixing missing packages on exec nodes
June 9
- 20:44 wm-bot: petrb: moved logs on -login to separate storage
June 8
- 21:24 wm-bot: petrb: installing python-imaging-tk on grid
- 21:20 wm-bot: petrb: installing python-tk
- 21:16 wm-bot: petrb: installing python-flickrapi on grid
- 21:16 wm-bot: petrb: installing
- 16:49 wm-bot: petrb: turned off wmf style of vi on tools-dev feel free to slap me :o or do cat /etc/vim/vimrc.local >> .vimrc if you love it
- 15:33 wm-bot: petrb: grid is overloaded, needs to be either enlarged or jobs calmed down :o
- 09:55 wm-bot: petrb: backporting tcl 8.6 from debian
- 09:38 wm-bot: petrb: update python requests to version 1.2.3.1
June 7
- 15:29 Coren: Deleted no-longer-needed tools-exec-cg node (spun off to its own project)
June 5
- 09:52 wm-bot: petrb: on -dev
- 09:52 wm-bot: petrb: moving /usr to a separate volume, expect problems :o
- 09:41 wm-bot: petrb: moved /var/log to separate volume on -dev
- 09:31 wm-bot: petrb: Houston, we have a problem: / on -dev is 94% full
- 09:28 wm-bot: petrb: installed openjdk7 on -dev
- 09:00 wm-bot: petrb: removing wd-terminator service
- 08:39 wm-bot: petrb: started toolwatcher
- 07:04 wm-bot: petrb: installing maven on -dev
June 4
- 14:49 wm-bot: petrb: installing sbt in order to fix b48859
- 13:28 wm-bot: petrb: installing csh on cluster
- 08:37 wm-bot: petrb: installing python-memcache on exec nodes
June 3
- 21:40 Coren: Rebooting -login; it's thrashing. Will keep an eye on it.
- 14:15 wm-bot: petrb: removing popularity-contest
- 14:11 wm-bot: petrb: removing /etc/logrotate.d/glusterlogs on all servers to fix logrotate daemon
- 09:43 wm-bot: petrb: syncing packages on exec nodes to avoid trouble with missing libs on some of them
June 2
- 08:39 wm-bot: petrb: installing ack-grep everywhere per yuvipanda and irc
June 1
- 20:57 wm-bot: petrb: installed the following on exec nodes because they were present on some and missing on others: cpp-4.4 cpp-4.5 cython dbus dosfstools ed emacs23 ftp gcc-4.4-base iptables iputils-tracepath ksh lsof ltrace lshw mariadb-client-5.5 nano python-dbus python-egenix-mxdatetime python-egenix-mxtools python-gevent python-greenlet strace telnet time
- 20:42 wm-bot: petrb: installing wikitools cluster wide
- 20:40 wm-bot: petrb: installing oursql cluster wide
- 10:46 wm-bot: petrb: created new instance for experiments with sasl memcache tools-mc
May 31
- 19:17 petan: deleting xtools project (requested by Cyberpower678)
- 17:24 wm-bot: petrb: removing old kernels from -dev because / is almost full
- 17:17 wm-bot: petrb: installed lsof to -dev
- 15:55 wm-bot: petrb: installed subversion on exec nodes for legoktm
- 15:47 wm-bot: petrb: replacing mysql with maria on exec nodes
- 15:46 wm-bot: petrb: replacing mysql with maria on exec nodes
- 15:14 wm-bot: petrb: installing default-jre in order to satisfy its dependencies
- 15:13 wm-bot: petrb: installing /data/project/.system/deb/all/sbt.deb to -dev in order to test it
- 13:04 wm-bot: petrb: installing bashdb on tools and -dev
- 12:27 wm-bot: petrb: removing project local-jimmyxu - per request on irc
- 10:54 wm-bot: petrb: killing process 3060 on -login (mahdiz 3060 1964 88 May30 ? 21:32:51 /bin/nano /tmp/crontab.Ht3bSO/crontab) it takes max cpu and doesn't seem to be attached
May 30
- 12:24 wm-bot: petrb: deleted job 1862 from queue (error state)
- 08:26 wm-bot: petrb: updated sql command
May 29
- 21:05 wm-bot: petrb: running sudo apt-get install php5-gd
May 28
- 20:00 wm-bot: petrb: installing p7zip-full to -dev and -login
May 27
- 08:46 wm-bot: petrb: changed the MySQL config to use /mnt as the path for binary logs; this however requires the server to be restarted
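A minimal sketch of that change, assuming the Debian-style /etc/mysql/conf.d/ include directory; the snippet file name and exact binlog path are illustrative, not the actual ones used:
  # point MySQL binary logs at the larger /mnt disk instead of the root fs
  mkdir -p /mnt/mysql-binlogs && chown mysql:mysql /mnt/mysql-binlogs
  printf '[mysqld]\nlog_bin = /mnt/mysql-binlogs/mysql-bin\n' > /etc/mysql/conf.d/binlog-path.cnf
  service mysql restart                 # the restart noted in the entry above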
May 24
- 08:44 petan: setting up LVM on the new exec nodes because it is more flexible and allows us to change the size of volumes on the fly (see the resize sketch after this list)
- 08:28 petan: created 2 more exec nodes, setting up now...
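The "change the size of volumes on the fly" point, as a sketch; the device, volume-group and volume names and sizes are assumptions:
  # initial setup on the instance's second disk
  pvcreate /dev/vdb
  vgcreate exec /dev/vdb
  lvcreate -L 20G -n var exec
  mkfs.ext4 /dev/exec/var
  # later, when the volume fills up, grow it without repartitioning
  lvextend -L +10G /dev/exec/var
  resize2fs /dev/exec/var               # ext4 supports online growth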
May 23
- 09:20 wm-bot: petrb: process 27618 on -login is constantly eating 100% of CPU; changing its priority to 20
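Changing the priority as above is just a renice; a sketch (the PID is the one from the log, the account name in the alternative is hypothetical):
  renice -n 20 -p 27618      # 20 = lowest priority, so it only gets otherwise-idle CPU
  # or for every process of a given tool account (hypothetical name):
  # renice -n 20 -u local-sometool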
May 22
- 20:54 wm-bot: petrb: changing ownership of /data/project/bracketbot/ to local-bracketbot
- 14:28 labs-logs-bottie: petrb: installed netcat as well
- 14:28 labs-logs-bottie: petrb: installed telnet to -dev
- 14:02 Coren: tools-webserver-02 now live; / and /cluebot/ moved there
May 21
- 20:27 labs-logs-bottie: petrb: uploaded hosts to -dev
May 19
- 13:40 labs-logs-bottie: petrb: killing that nano process; it seems to be hung and unattached anyway
- 12:59 labs-logs-bottie: petrb: changed priority of nano process to 19
- 12:55 labs-logs-bottie: petrb: local-hawk-eye-bot's /bin/nano /tmp/crontab.d4JhUj/crontab eats too much CPU
- 12:50 petan: nvm previous line
- 12:50 labs-logs-bottie: petrb: vul alias viewuserlang
May 14
- 21:22 labs-logs-bottie: petrb: created a separate volume for /tmp on -login so that temp files do not fragment the root fs and it does not get filled up by them; it also makes it easier to track filesystem usage
- 13:16 Coren: reboot -dev, need to test kernel upgrade
May 10
- 15:08 Coren: create tools-webserver-02 for Apache 2.4 experimentation
May 9
- 04:12 Coren: added -exec-03 and -exec-04. Moar power!!1!
May 6
- 19:59 Coren: made tools-dev.wmflabs.org public
- 08:04 labs-logs-bottie: petrb: created a small swap on -login so that users cannot bring it to OOM so easily, and so that unused memory blocks can be swapped out in order to use the remaining memory more effectively (swap sketch after this list)
- 08:00 labs-logs-bottie: petrb: making an LVM volume from the unused disk at /mnt on -login so that we can eventually use it somewhere if needed
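The log does not say whether that swap was a file or an LVM volume; a swap-file variant as a minimal sketch, with the size assumed:
  dd if=/dev/zero of=/swapfile bs=1M count=2048   # 2 GB; size is an assumption
  chmod 600 /swapfile
  mkswap /swapfile
  swapon /swapfile
  echo '/swapfile none swap sw 0 0' >> /etc/fstab # persist across reboots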
May 4
- 17:50 labs-logs-bottie: petrb: foobar as well
- 17:47 labs-logs-bottie: petrb: removing project flask-stub using rmtool
- 15:33 labs-logs-bottie: petrb: fixing missing db user for local-stub
- 12:51 labs-logs-bottie: petrb: creating mysql accounts by hand for alchimista and fubar
May 2
- 20:49 labs-logs-bottie: petrb: uploaded motd to exec-N as well, with information about which server users connected to
May 1
- 16:59 labs-logs-bottie: petrb: fixed invalid permissions on /home
April 27
- 18:54 labs-logs-bottie: petrb: installing pymysql using pip on the whole grid because it is needed for greenrosseta (for some reason it is better than the python-mysql package)
April 26
- 23:55 Coren: reboot to finish security updates
- 08:00 labs-logs-bottie: petrb: patching qtop
- 07:57 labs-logs-bottie: petrb: added tools-dev to the admin host list so that qtop works, and fixing a bug in qtop
- 07:28 labs-logs-bottie: petrb: installing GE tools to -dev so that we can develop new j|q* stuff there
April 25
- 19:00 Coren: Maintenance over; systems restarted and should be working.
- 18:18 labs-logs-bottie: petrb: we are getting into trouble with memory on tools-db; less than 20% of memory is free
- 18:01 Coren: Begin maintenance (login disabled)
- 13:21 petan: removing local-wikidatastats from ldap
April 24
- 13:17 labs-logs-bottie: petrb: sudo chown local-peachy PeachyFrameworkLogo.png
- 11:37 labs-logs-bottie: petrb: created new project stats and cloned acl from wikidatastats, which is supposed to be deleted
- 11:32 legoktm: wikidatastats attempting to install limn
- 11:15 labs-logs-bottie: petrb: installing npm to -login instance
- 07:34 petan: creating project wikidatastats for legoktm addshore and yuvipandianablah :P
April 23
- 13:32 labs-logs-bottie: petrb: changing permissions of cyberbot and peachy to 775 so that it is easier to use them
- 12:14 labs-logs-bottie: petrb: qtop on -dev
- 12:12 labs-logs-bottie: petrb: removed part of motd from login server that got there in a mysterious way
April 19
- 22:38 Coren: reboot -login, all done with the NFS config. yeay.
- 17:13 Coren: (final?) reboot of -login with the new autofs configuration
- 16:24 Coren: (rebooted -login)
- 16:24 Coren: autofs + gluster = fail
- 14:45 Coren: reboot -login (NFS mount woes)
April 15
- 22:29 Coren: also a test; note how said bot knows its place. :-)
- 22:14 andrewbogott: this is a test of labs-morebots.
- 21:49 andrewbogott: this is a test
- 15:41 labs-logs-bottie: petrb: installing p7zip everywhere
- 08:00 labs-logs-bottie: petrb: installing dev packages needed for YuviPanda on login box
April 11
- 22:39 Coren: rebooted tools-puppet-test (no end-user impact): hung filesystem prevents login
- 07:42 labs-logs-bottie: petrb: removed reboot information from motd
Nova_Resource:Rcm.cac/SAL
2016-05-07
- 22:37 Luke081515: updating the repos
2016-05-04
- 18:00 Luke081515: updating to master
2016-05-01
- 20:51 Luke081515: Updating repos & databases
2016-04-22
- 20:50 Luke081515: Upgrading all repos
2016-04-12
- 21:35 Luke081515: updating all repos before testing a patch from gerrit
2016-04-02
- 23:32 Luke081515: updating the repos
2016-03-30
- 23:41 Luke081515: updating all repos
2016-03-29
- 00:37 Luke081515: updating all repos to master
2016-03-26
- 16:44 Luke081515: updating all git repos
2016-03-25
- 14:10 Luke081515: enabling role monobook
- 14:04 Luke081515: starting the vm and running git-update
- 00:33 Luke081515: rebuild the instance
Nova_Resource:Tools.wikibugs/SAL
2016-05-06
- 19:47 valhallasw`cloud: temporarily offline for Danny_B's batch job
- 09:12 wikibugs: Updated channels.yaml to: 904ce69d7f515ea9b47f621686f612a04ae94cc2 Send WMDE-Design to #wikimedia-de-tech
2016-05-05
- 21:37 wikibugs: Updated channels.yaml to: 6e0a2d140f923140c73b0dd5a356816a1cb47aa5 Add wildcard for Collaboration team (Collab-Team(-.*)?)
2016-04-22
- 00:38 wikibugs: Updated channels.yaml to: 98b78de2f209713741863c9dbf5a14eba7164ccd Update for renamed xTools project
2016-04-21
- 18:04 wikibugs: Updated channels.yaml to: 68d22345224b2e64698527a65b21c1a9cba9f84f Added #wikimedia-interactive
2016-04-15
- 16:40 wikibugs: Updated channels.yaml to: 63a166edb4d8218e1edf42346b5b8a90e4b6d647 Merge "Revert "Log hackathon tasks to #wmhack""
2016-04-11
- 17:58 ircnotifier: valhallasw: Deployed 170e3ace519867782ecb709eb095059b063d1cd1 Merge "Try to fix display of colours appearing directly before numbers" wb2-irc
- 17:57 wikibugs: Updated channels.yaml to: 4acf9e002ad00a8af3553833f99b754bfc0e189c Merge "Add #wikimedia-ai channel"
2016-04-08
- 22:51 wikibugs: Updated channels.yaml to: 2fc4b94daebe62f8e5e8712d753fabb5878e418a Echo was renamed to Notifications
2016-04-01
- 09:54 wikibugs: Updated channels.yaml to: a1754bac96c8fcb14532b808d5ca9db15e8f3c25 Merge "Log hackathon tasks to #wmhack"
2016-03-29
- 18:53 wikibugs: Updated channels.yaml to: ad14e0d1a0a07b775d014e6f8f1edaf145349116 Add two Collaboration team boards
2016-03-28
- 02:30 wikibugs: Updated channels.yaml to: fcffc4d11cddf0e11b5353bf50a9c36f4d989090 Send Community-Wishlist-Survey stuff to #wikimedia-commtech
2016-02-18
- 18:58 ircnotifier: legoktm: Deployed 8e84aa72b250ef18421c47ace7b4422e949c5837 Ignore comments posted to Phabricator by Stashbot wb2-irc
2016-02-16
- 16:30 wikibugs: Updated channels.yaml to: f4951fed10578ac63d80a6ec95e0927aaa90411e Remove ContentTranslations bugs from #mediawiki-i18n
2016-02-11
- 02:16 wikibugs: Updated channels.yaml to: 164d65c02aa22ec6e53f05ec74c35dfd58d11c24 -releng and -devtools changes
2016-02-08
- 22:10 wikibugs: Updated channels.yaml to: 2dd0d574c0e2bfcd5285493664f884f2ddc54b99 Send Education-* to wikimedia-ed
2016-01-07
- 20:53 wikibugs: Updated channels.yaml to: db26b7db94db89a49fac63df54d0189cf39ffc90 Send Labs* to `#wikimedia-labs`
2015-12-21
- 18:31 valhallasw`cloud: and restarted with fab start-jobs. Welcome back, wikibugs.
- 18:30 valhallasw`cloud: ah, there are SGE processes running. OK, killing those as well.
- 18:28 valhallasw`cloud: what's even weirder is that it starts both wikibugs.py and redis2irc.py, which are two distinct SGE jobs. Uuh?
- 18:27 valhallasw`cloud: yet it respawns! What on earth. Again from 208.80.155.186, and killed again.
- 18:26 valhallasw`cloud: killed wikibugs manually, no SGE in sight.
- 18:24 valhallasw`cloud: using `listlogins` in nickserv, we find one running on 208.80.155.186 (-1409), one on 208.80.155.145 (-1405, just restarted)
- 18:20 valhallasw`cloud: duplicate wikibugs, trying qmod -rj
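Condensed, the recovery above looks roughly like this; only qmod -rj and fab start-jobs are confirmed by the entries themselves, and the tool account name and job ID are placeholders:
  qstat -u tools.wikibugs        # find the duplicate grid jobs (account name assumed)
  qmod -rj 12345                 # try rescheduling the stuck job (ID illustrative)
  qdel 12345                     # or delete it outright
  pkill -f wikibugs.py           # kill any stray processes left behind
  fab start-jobs                 # restart everything, as in the 18:31 entry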
2015-12-07
- 20:39 valhallasw`cloud: wb2-irc thinks it's connected but messages don't actually get out to IRC. Restarting.
2015-12-03
- 23:11 wikibugs: Updated channels.yaml to: 74f9c1e0e07d47abc0ca706040faaf90b1ea585d Add PAWS to pywikibot and labs channels
2015-11-04
- 16:44 wikibugs: Updated channels.yaml to: 9a7f239ec5c34604d9e48901fc28e997ea53a5e4 Add #Testing-Initiative-2015 to -releng
- 04:31 wikibugs: Updated channels.yaml to: 775987cc6b7998d7495fcae652546ab2df0d1d6a Send all User-* projects to /dev/null
2015-10-28
- 18:28 wikibugs: Updated channels.yaml to: e5e90fdb7faaa2b992321b1facd2799ae25d61e7 send Mailing lists tickets to #wikimedia-mailman
2015-10-21
- 18:32 wikibugs: Updated channels.yaml to: b4e285f9673929b8547902be466e04d903d3237d Add WMDE-Analytics-Engineering to #wikimedia-de-tech
2015-10-07
- 16:57 wikibugs: Updated channels.yaml to: c78efa6e621b316c26fca060661f59558e8bafa5 Merge "Also exclude TCB-Team- from #wikimedia-fundraising"
2015-10-06
- 00:34 wikibugs: Updated channels.yaml to: 9da0a4809b8d990d2a87d465868d8a8c8fd549b1 Send Beta-Cluster-Infrastructure to #wikimedia-releng
2015-09-27
- 19:34 wikibugs: Updated channels.yaml to: d2f4a855aa8e5f6a9d06e43c35abcc9448f57b73 Add MediaWiki-Codesniffer to -releng
2015-09-24
- 17:41 wikibugs: Updated channels.yaml to: bf1ac0ad9fd2f358aeb62335516c6ca4304b649d Add #MediaWiki-Releasing to -releng
- 16:40 wikibugs: Updated channels.yaml to: eca76c2669d6b0d308b2457b146dbef7f5f91d26 Add #releng-epics to -releng
2015-09-23
- 14:02 wikibugs: Updated channels.yaml to: 06f2a7d0a89baac1a1255c8e2524afa9654e09c1 Send all Community-Tech-* traffic to #wikimedia-commtech
2015-09-22
- 22:54 wikibugs: Updated channels.yaml to: 40a3bfed7706047eb5df3b7f36fda29c54cec3fc Pywikibot-Flow → #wikimedia-collaboration
2015-09-19
- 05:02 wikibugs: Updated channels.yaml to: 4d6ce23148a4fa57c84139465beeb84361ebb6de Add releng-(.*) to catch all releng planning tags
2015-09-18
- 09:20 wikibugs: Updated channels.yaml to: 6a3566594f734c84b9e7d85600ae39b33ab366de releng is now Release-Engineering-Team
- 03:11 wikibugs: Updated channels.yaml to: f9dab0c9be84689bc24046982ef22e22d45402b7 OTRS → #wikimedia-otrs
2015-09-16
- 17:27 ircnotifier: legoktm: Deployed bcce439fda97c0a91b6ef983221f336a3da0cf99 Wait at least 1 second before pushing into redis wb2-phab
- 15:04 wikibugs: Updated channels.yaml to: 08ac39ff3184c179434bb9f36187adcea5ea8f24 Remove ECT and old/dead projects from -devtools
2015-09-07
- 11:07 wikibugs: Updated channels.yaml to: 04c06838cc50d916c6cc11b20776a62b8b5fbdc1 Report ArticlePlaceholder to #wikidata-feed
2015-09-04
- 20:56 wikibugs: Updated channels.yaml to: 0a914ec9da79d3e0b8a6dcbe825a2ac54eb03446 add notifs for #wikimedia-ios room
- 03:13 wikibugs: Updated channels.yaml to: d18a53d6498e5faa77a03b35261f9d26fd51766d Add #Differential for -releng
2015-09-03
- 16:26 wikibugs: Updated channels.yaml to: 6790b1bed753260168db20bb78dcfa5726be2aec Deprecate gitblit, and migrate gerrit
- 05:46 ircnotifier: legoktm: Deployed 1564da8bd2a53f9899e93497f03ba13e4a6b734f Forrestbot → ReleaseTaggerBot wb2-irc
2015-09-02
- 00:03 wikibugs: Updated channels.yaml to: 5f5fbb9243566dd512f1b1bf65ed60e5e08d6e92 Naming is hard
2015-08-28
- 05:14 wikibugs: Updated channels.yaml to: 30a3e422993291ab995a487ae1d4bbc2e7cd4013 Send Community-Tech traffic to #wikimedia-commtech
2015-08-25
- 18:04 wikibugs: Updated channels.yaml to: a541227fe36479f99a74f8d56586bcf4b8f55108 Send Collaboration-Team(-.*)? to #wikimedia-collaboration
2015-08-19
- 08:25 wikibugs: Updated channels.yaml to: b03ba14e0809cf29d6fad807c931b7b9bafb0b2f Add RelEng-Admin, CI-Config, Scap3 plus reorder
2015-08-04
- 17:48 wikibugs: Updated channels.yaml to: 5b039d4ec16094a553be431e4d94f0bf880cfa47 Send GlobalRename changes to #wikimedia-rename
2015-07-30
- 21:11 wikibugs: Updated channels.yaml to: f638b92139d824b52c8e37284b4e5bedf07cf52c Filter WMDE- out of #wikimedia-fundraising
2015-07-28
- 20:59 ircnotifier: legoktm: Deployed 680c8aad81158a3ddb1c4018233c07729c163cc0 Don't notify if multiple ignored actions were triggered wb2-phab
July 2
- 18:21 wikibugs: Updated channels.yaml to: ac57db111909071aa63faf61ff9a2a1fee1c693f xtools moved to wikimedia-xtools Change-Id: Ie84324e718d8025f4a1381f36d9ff5f4e9c5848d
- 17:52 valhallasw`cloud: restarting wikibugs using fab to get verbose logs & logrotate back
- 17:33 ircnotifier: valhallasw: Deployed 0f163852ed56e50bbfeb53377a0913570ee21fea Merge "Revert "Use NOTICE instead of PRIVMSG"" wb2-irc
- 17:29 ircnotifier: valhallasw: Deployed bd6cbfabe79cc254c9525f136ae981ef20479c1e Merge "Use NOTICE instead of PRIVMSG" wb2-irc
June 15
- 20:13 wikibugs: Updated channels.yaml to: 317ea9408296ac9c0e0b8cfe3b9fe1952ac57f04 Change #wikidata to #wikidata-feed
June 10
- 11:38 wikibugs: Updated channels.yaml to: a8b1cb73fc9c7f07e8d8329a5fac09f4974a2c5d Add 3 probjects to #wikimedia-de-tech
June 9
- 17:21 wikibugs: Updated channels.yaml to: fb7b824b7b6310ecbe957d07d66eaa4b1dbb8e6e Move ResourceLoader and Performance-Team #wikimedia-perf
June 5
- 20:59 ircnotifier: legoktm: Deployed 82b0b9f487ece85a40595b80f3f690554743e472 Ignore Forrestbot wb2-phab, wb2-irc
May 25
- 08:18 ircnotifier: valhallasw: Deployed 82b0b9f487ece85a40595b80f3f690554743e472 Ignore Forrestbot wb2-phab, wb2-irc
May 19
- 18:50 ircnotifier: valhallasw: Deployed f9b9d5bda60b9f1f6aac196254f8b6cfff6d58a2 Send Graph-VE MW extension project to VE channel wb2-phab, wb2-irc
May 2
- 12:37 wikibugs: Updated channels.yaml to: f9b9d5bda60b9f1f6aac196254f8b6cfff6d58a2 Send Graph-VE MW extension project to VE channel
May 1
- 22:10 wikibugs: Updated channels.yaml to: 831099cc50dbc6828c2ef5ff8f2e6aa41cd97310 Put a few things into -editing.
- 21:55 wikibugs: Updated channels.yaml to: b6c7fa03a61f5b27061be11900b6e432d500b765 Remove definitions for #wikimedia-mobile
April 29
- 16:43 ircnotifier: valhallasw: Deployed 1d785dc3ad22a434749f8ec0d466180f3de9ea52 channels: Continuous-Integration is now Continuous-Integration-Infrastructure wb2-phab, wb2-irc
April 24
- 21:14 wikibugs: Updated channels.yaml to: 1d785dc3ad22a434749f8ec0d466180f3de9ea52 channels: Continuous-Integration is now Continuous-Integration-Infrastructure
April 21
- 03:39 ircnotifier: legoktm: Deployed 8e88fc89deaa41b2a720845f5d20aa871ffa09d9 Add Blueprint skin to notify list for #wikimedia-design wb2-irc
- 03:39 ircnotifier: legoktm: Deployed 8e88fc89deaa41b2a720845f5d20aa871ffa09d9 Add Blueprint skin to notify list for #wikimedia-design wb2-phab
April 20
- 11:18 wikibugs: Updated channels.yaml to: 8e88fc89deaa41b2a720845f5d20aa871ffa09d9 Add Blueprint skin to notify list for #wikimedia-design
April 18
- 20:24 valhallasw`cloud: file system corruption?? channels.yaml is all \x00s and .git/objects/* is corrupt. Cleared .git/objects, ran git fetch --all and git checkout channels.yaml, which seems to bring wikibugs back to life (recovery sketch after this list)
- 19:43 valhallasw`cloud: tools-redis doesn't respond to commands, which could explain why wb2-phab was hanging. But why is tools-redis completely broken?
- 19:41 valhallasw`cloud: now wb2-phab is functioning again, but wb2-irc is not reporting?! Restarting that as well
- 19:38 valhallasw`cloud: that is, the last message to irc. The bot is still running and doing ping/pongs. However, wikibugs.log is completely silent after that time. wb2-phab.err does have errors, but without timestamps, so it's basically useless. Restarting wb2-phab to see if that helps
- 19:36 valhallasw`cloud: last message in redis2irc.log was 2015-04-18 02:10:26,157
- 19:36 valhallasw`cloud: wikibugs has broken down again. Trying to figure out why.
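The repair from the 20:24 entry, spelled out as a sketch; the checkout path is an assumption:
  cd /data/project/wikibugs/wikibugs2     # path is an assumption
  rm -rf .git/objects/*                   # drop the corrupted object files
  git fetch --all                         # re-fetch objects from the remotes
  git checkout channels.yaml              # restore the zeroed-out working copy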
April 13
- 19:05 wikibugs: Updated channels.yaml to: 4500101f021b8eec83899848932edaee98bd680a Merge "Tools-Labs-xTools to #xtools"
April 7
- 16:13 wikibugs: Updated channels.yaml to: 8a4346f6c5d0f826a9b3099d5f76339d7a64dcad Merge "Remove Quality Assurance from -releng"
April 1
- 20:56 wikibugs: Updated channels.yaml to: f09815aee08458b7fb283db7c7e0aed49e3b149d HACK: Always join channels on privmsg
- 20:56 ircnotifier: legoktm: Deployed f09815aee08458b7fb283db7c7e0aed49e3b149d HACK: Always join channels on privmsg wb2-irc
March 30
- 19:45 wikibugs: Updated channels.yaml to: b2c38567b32d82881baab5c3227f14a9b8e9fff5 Send MediaWiki-API-Team and Blocked-on-MediaWiki-API-Team to #mediawiki-core
March 23
- 18:44 wikibugs: Updated channels.yaml to: 90eed2a902164a9a1cf7930c7d9fb599ec9ae660 Send Commons to #wikimedia-commons-tech, per Steinsplitter
March 18
- 11:49 ircnotifier: yuvipanda: Deployed 23240bd0dc5aebcc2a94b6f1ac268e2e3ad41114 Add more projects for devtools and mobile wb2-phab, wb2-irc
March 16
- 17:16 wikibugs: Updated channels.yaml to: 23240bd0dc5aebcc2a94b6f1ac268e2e3ad41114 Add more projects for devtools and mobile
March 13
- 22:29 legoktm: restarted wb2-irc to see if it rejoins channels properly
- 20:44 wikibugs: Updated channels.yaml to: 9eafe437ff005a3232e7f7e89dbb2be54437f76c tox: Rename channels env to standard py34
March 11
- 04:36 legoktm: restarted both wb2-phab and wb2-irc
March 10
- 19:30 wikibugs: Updated channels.yaml to: 614ee42338f6ab3f8d0705d3f0358523189af00e send WMT bugs to #wmt
March 9
- 21:11 ircnotifier: legoktm: Deployed 614ee42338f6ab3f8d0705d3f0358523189af00e send WMT bugs to #wmt wb2-irc
- 20:36 wikibugs: Updated channels.yaml to: 614ee42338f6ab3f8d0705d3f0358523189af00e send WMT bugs to #wmt
- 20:26 wikibugs: Updated channels.yaml to: 614ee42338f6ab3f8d0705d3f0358523189af00e send WMT bugs to #wmt
- 20:18 ircnotifier: legoktm: Deployed 614ee42338f6ab3f8d0705d3f0358523189af00e send WMT bugs to #wmt wb2-irc
March 8
- 19:14 wikibugs: Updated channels.yaml to: 614ee42338f6ab3f8d0705d3f0358523189af00e send WMT bugs to #wmt
March 3
- 20:02 wikibugs: Updated channels.yaml to: 6da78462504cd023e0c31babb5cc56a7eae3a88a Merge "Use brown instead of red for orange (=release) projects"
- 16:44 ircnotifier: legoktm: Deployed 6da78462504cd023e0c31babb5cc56a7eae3a88a Merge "Use brown instead of red for orange (=release) projects" wb2-irc
February 28
- 22:57 valhallasw`cloud: also restart wikibugs; it seems the PRIVMSGs in the log don't actually show up on irc
- 22:53 valhallasw`cloud: false alarm, messages were reported (2015-02-28 22:50:43,202 - irc3.wikibugs - DEBUG - > PRIVMSG #mediawiki-parsoid :10Parsoid, 10VisualEditor, 10VisualEditor-EditingTools, etc) which is a few minutes ago
- 22:52 valhallasw`cloud: restarted wb2-phab to see if we get stuff from phab again
February 24
- 23:16 ircnotifier: legoktm: Deployed 6da78462504cd023e0c31babb5cc56a7eae3a88a Merge "Use brown instead of red for orange (=release) projects" wb2-irc
- 23:16 ircnotifier: legoktm: Deployed 6da78462504cd023e0c31babb5cc56a7eae3a88a Merge "Use brown instead of red for orange (=release) projects" wb2-phab
February 22
- 22:54 ircnotifier: valhallasw: Deployed 6da78462504cd023e0c31babb5cc56a7eae3a88a Merge "Use brown instead of red for orange (=release) projects" wb2-irc
- 21:44 ircnotifier: legoktm: Deployed 1f579477957417a693308fa9a23d2080821eb551 Volunteer? --> Lowest (priority) wb2-irc
February 20
- 20:04 wikibugs: Updated channels.yaml to: 26b8b9f5f812b092b02d33b3b29cf448dafb663a More fundraising projects to #wikimedia-fundraising
- 18:46 ircnotifier: valhallasw: Deployed 4cf33271a75d3655addac95cc16413ab1adc6488 Merge "Always show four tags, most relevant first" wb2-irc
February 18
- 17:31 ircnotifier: legoktm: Deployed 8ba77ed2d2c039a231f3265da01215e721480ce0 Merge "Log ALL the things!" wb2-phab
February 17
- 22:32 wikibugs: Updated channels.yaml to: 8ed0a167e287b7c3374f8b9b7e556e9b4b6180d6 Send AutoWikiBrowser to #autowikibrowser
February 16
- 13:51 valhallasw`cloud: reverted locally (git revert <new formatting commit>) and restarted as it was breaking people's workflows
- 04:35 wikibugs: Updated channels.yaml to: eb4a51a4628a6b26d4e798b3e55d8749231bb72c Add Blocked-on-RelEng to -releng
- 00:54 ircnotifier: legoktm: Deployed 4c82585a9c01bceeb91acabbc5b481ea5928327d Merge "Send Wikibase stuff to #wikidata" wb2-irc
- 00:53 ircnotifier: legoktm: Deployed 4c82585a9c01bceeb91acabbc5b481ea5928327d Merge "Send Wikibase stuff to #wikidata" wb2-phab
- 00:05 ircnotifier: legoktm: Deployed d5922a4d10169ec8870e55de1b74ea9e39dc8c5c Make sure URL is always present wb2-phab
- 00:05 ircnotifier: legoktm: Deployed d5922a4d10169ec8870e55de1b74ea9e39dc8c5c Make sure URL is always present wb2-irc
February 13
- 18:26 ircnotifier: valhallasw: Deployed 0941e5af42ab1c035b023246da5dde30b17c0f63 Remove Phabricator and Code-Review from -releng wb2-irc
- 18:01 ircnotifier: legoktm: Deployed 0941e5af42ab1c035b023246da5dde30b17c0f63 Remove Phabricator and Code-Review from -releng wb2-phab
- 17:30 ircnotifier: legoktm: Deployed 0941e5af42ab1c035b023246da5dde30b17c0f63 Remove Phabricator and Code-Review from -releng wb2-irc
February 11
- 17:53 wikibugs: Updated channels.yaml to: 0941e5af42ab1c035b023246da5dde30b17c0f63 Remove Phabricator and Code-Review from -releng
February 7
- 22:38 ircnotifier: valhallasw: Deployed 6d78d47f1eae25b63f4cd322a6737db58b8d5c7a Rework logging infrastructure wb2-phab, wb2-irc
February 6
- 19:42 ircnotifier: legoktm: Deployed 1b6bbd391ad1f23a8270d3547b2540064e452d94 Fix project tag screen scraping wb2-phab
February 5
- 22:51 ircnotifier: valhallasw: Deployed d9a83a0d71b0dd4500d40ebba5232b2ded362be5 Assume channel list is utf-8 wb2-irc
- 22:13 ircnotifier: valhallasw: Deployed 9054845f4a69a7364f5270e2ada574f696e4f70f Add MoodBar to wikimedia-collaboration wb2-phab
February 3
- 06:05 ircnotifier: legoktm: Deployed 9054845f4a69a7364f5270e2ada574f696e4f70f Add MoodBar to wikimedia-collaboration wb2-phab
- 05:39 wikibugs: Updated channels.yaml to: 9054845f4a69a7364f5270e2ada574f696e4f70f Add MoodBar to wikimedia-collaboration
February 2
- 19:08 wikibugs: Updated channels.yaml to: 490e8ba1784e8ef7b04d2f51d2697f1d670d6cb1 Announce Staging bugs to -releng
January 30
- 04:28 wikibugs: Updated channels.yaml to: 4fe2e5b9f9d699d3547aba5b320fdf9ce1bd96b0 Send fundraising stuff to our channel
January 28
- 14:29 wikibugs: Updated channels.yaml to: 9f4845ee4937cc9bf890bc7ea2251ca6613080e0 Merge "Labs-Team was renamed to Labs"
January 22
- 22:21 wikibugs: Updated channels.yaml to: 6e130fecc19a39a5caed12ca0dda25ad28df62f0 -WikidataRepo is now -WikidataRepository
January 19
- 20:52 valhallasw: is this really broken? :(
January 15
- 20:03 ircnotifier: legoktm: Deployed c61edcfab64d62081edc3ccf89534764017f4a1c Make sure we're in the channel before messaging it wb2-irc
January 14
- 22:52 ircnotifier: legoktm: Deployed 492438a4da3bd10a6e53bd248c997a02edb9d781 Fix wikibugs after Phabricator update wb2-phab
- 12:17 ircnotifier: yuvipanda: Deployed 9521ec19491d35ebc40fdccb34e75e0bd7f9399f Turn ssl off wb2-phab, wb2-irc
- 12:10 ircnotifier: yuvipanda: Deployed 8736032750b4fead35646ea9120621bf9d0ccb7e Only join actual channels wb2-phab, wb2-irc
- 12:09 ircnotifier: yuvipanda: Deployed 8736032750b4fead35646ea9120621bf9d0ccb7e Only join actual channels wb2-phab, wb2-irc
- 11:05 ircnotifier: yuvipanda: Deployed 2b66af26ca2a7343d0743423a4c9fcc6b8296e5e Disentangle tag lists for filtering vs display wb2-phab, wb2-irc
- 11:02 ircnotifier: yuvipanda: Deployed 2b66af26ca2a7343d0743423a4c9fcc6b8296e5e Disentangle tag lists for filtering vs display wb2-phab, wb2-irc
January 12
- 22:04 wikibugs: Updated channels.yaml to: f1ee8fb8bc64186a1613ec9f9faf0aef6315759a Merge "team-practices -> #wikimedia-teampractices"
January 11
- 00:06 wikibugs: Updated channels.yaml to: 2257da8655036ce4555e88c01dad4a85f0b7946e WM-Bot -> #wm-bot
January 10
- 22:26 wikibugs: Updated channels.yaml to: f6b5ed8212e566b726a59de69a707ebea6c70d4e Remove -qa from announce list (moved to -releng)
- 18:06 wikibugs: Updated channels.yaml to: 09630c2cd5beead10c0ab2bfd8df84e7337a0208 LabsDB-Auditor -> labs
January 9
- 18:28 wikibugs: Updated channels.yaml to: 4c4fc344a850a36ef47a7c9965c853a27863baac Make sure to reset to origin/master, and show current sha1 before doing so
January 8
- 10:59 wikibugs: Updated channels.yaml to: 29b1c027a31c7650094b195e70e3a4ac82c05d00 Merge "Add Wikibugs to -labs"
- 09:51 wikibugs: Updated channels.yaml to: 019f6b0366a97df69733f7c80303aec8058ecb79 Wikibugs should listen to the Multimedia project for the multimedia channel
January 7
- 23:32 wikibugs: Updated channels.yaml to: 9538cc69ef4226d248a38fa86dadca6d646b6b37 Merge branch 'master' of https://github.com/wikimedia/labs-tools-wikibugs2
- 15:29 wikibugs: Updated channels.yaml to: cc8bc876e23c6b58f06a7379273f34e858b6ade5 Merge branch 'master' of https://github.com/wikimedia/labs-tools-wikibugs2
January 5
- 21:32 wikibugs: Updated channels.yaml to: 9003536427ce097a62c3a8cec310f7ca4f0edab0 Merge branch 'master' of https://github.com/wikimedia/labs-tools-wikibugs2
- 20:52 wikibugs: Updated channels.yaml to: c45fa33e94f5a34fa7618eaad1669104ace2a342 Merge branch 'master' of https://github.com/wikimedia/labs-tools-wikibugs2
December 31
- 16:48 wikibugs: Updated channels.yaml to: 0ba0b2c47cd593b64c4149931bfdaf022dff230c Merge branch 'master' of https://github.com/wikimedia/labs-tools-wikibugs2
December 22
- 23:18 ircnotifier: legoktm: Deployed 101deca5ee3884d19a61fe7098f8296ddb0c43e0 Escape newlines in IRC output wb2-irc
December 18
- 20:47 wikibugs: Updated channels.yaml to: 9ad8e090c8fb06d487b89255562483f08cf354e3 Send Spam-* to /dev/null
- 20:27 ircnotifier: legoktm: Deployed 3dc8fd7f3f8fdaacec5998913278179382b8594f Report IRC using Python and Yuvi's ircnotifier wb2-irc
December 17
- 00:39 wm-bot: legoktm: Deployed 8502072659ddb8c55ae45026d54867771e3122e7 redis2irc: join channels after reloading config wb2-irc
- 00:28 wm-bot: legoktm: Deployed 8502072659ddb8c55ae45026d54867771e3122e7 redis2irc: join channels after reloading config wb2-irc
December 16
- 23:35 wikibugs: Updated channels.yaml to: 432e66a45273e9798e26a8df08caa5a102eeec97 Add #wikimedia-services reporting
- 22:05 wm-bot: valhallasw: Deployed 3ec300c6605ed2087ad6bf25bf43abb4c0319d18 fab: set use_ssh_config = True (no jobs restarted)
- 21:46 wm-bot: valhallasw: Deployed 366f1b524cb4aecbdf4825a8b96e9f66524fa727 Add fabric runner wb2-phab
- 21:14 wikibugs: Updated channels.yaml to: 9649aa14cf1b8fd63a0e6efd3ac1aff0c351b141 Auto-detect changes to channels.yaml and !log it
- 21:06 wm-bot: valhallasw: Deployed 9649aa14cf1b8fd63a0e6efd3ac1aff0c351b141 wb2-phab, wb2-irc
- 20:44 legoktm: restarting for https://gerrit.wikimedia.org/r/180245
December 10
- 19:18 legoktm: restarting phab listener for https://gerrit.wikimedia.org/r/178880
- 19:16 legoktm: restarting to pick up https://gerrit.wikimedia.org/r/178874 https://gerrit.wikimedia.org/r/178880
December 9
- 19:46 legoktm: restarting for https://gerrit.wikimedia.org/r/178578
- 19:22 legoktm: restarting for https://gerrit.wikimedia.org/r/178561 https://gerrit.wikimedia.org/r/178563
December 4
- 02:48 legoktm: restarted for https://gerrit.wikimedia.org/r/177372
December 1
- 14:31 YuviPanda: killed irc bot for now
November 29
- 21:25 legoktm: restarting for https://gerrit.wikimedia.org/r/176486
- 21:14 legoktm: restarting for https://gerrit.wikimedia.org/r/176483
- 14:19 valhallasw`cloud: deployed 0a6dedd75e203f5005a15e340b2fed5ba4c67224
November 25
- 23:21 legoktm: restarted wikibugs.py listener for https://gerrit.wikimedia.org/r/175890
- 00:07 legoktm: restarting wikibugs for -qa changes
November 24
- 18:29 legoktm: restarting wikibugs for https://gerrit.wikimedia.org/r/175474
- 02:54 legoktm: RIP pywikibugs
November 18
- 11:07 valhallasw`cloud: deployed https://github.com/legoktm/wikibugs2/commit/842d2d25a827dd2311ed98d1e4cd8af078bf10bb
October 11
- 20:14 legoktm: deploying https://gerrit.wikimedia.org/r/166217
September 24
- 04:45 legoktm: deployed https://gerrit.wikimedia.org/r/162201 (add OpenStackManager component to #wikimedia-labs)
August 19
- 19:06 valhallasw`cloud: new version deployed (gerrit 143239 and 143238)
July 1
- 16:47 valhallasw: restarted wikibugs with new channel config / https://gerrit.wikimedia.org/r/#/c/142992/ / Nemo_bis
May 22
- 19:14 valhallasw: changed git repo to have gerrit as master
- 12:27 valhallasw: [email protected] delivery functional again and wikibugs is correctly reporting to IRC
- 12:18 valhallasw: gmail-to-wikibugs delivery is now functional; hopefully [email protected] delivery too...
- 12:17 valhallasw: mail delivery broken; direct mails complain about open("~/mailout.log", "a") in to_redis.py; commented out those lines
April 30
- 20:59 valhallasw: Deployed a48a000
April 28
- 20:11 YuviPanda: restarted wikibugs, seems to have died
- 11:54 YuviPanda: deployed bf1be7b
- 06:42 valhallasw: deployed 2.0-1-gb7f4290
- 06:23 valhallasw: Merging and deploying b7bbf92
- 06:21 valhallasw: NameError: name '_wsp_splitter' is not defined in /data/project/wikibugs/src/pywikibugs/get_unstructured.py. Apparently the line 'from email._header_value_parser import _wsp_splitter, _validate_xtext' had not made it into the git repo, and was cleared by accident on deployment
- 06:17 valhallasw: wikibugs stopped reporting; investigating
Nova_Resource:Tools.heritage/SAL
2016-05-06
- 19:47 Lokal_Profil: Deployed latest pywikibot-core/2.0 from Git
- 19:26 Lokal_Profil: Deployed latest from Git, a724279 , d9ae73d (reverts 766d814 )
- 18:42 Lokal_Profil: Deployed latest from Git, 2d3ee40 (T39974), 766d814, e2fac07 and d2c242a (T39422)
- 14:58 JeanFred: Deployed latest from Git: e5a9f01 and d509343 (T134567)
- 12:15 JeanFred: Deployed latest from Git: db46042, c765e76, b5a731a, 7c27207, d4de720 (T134236), e7823ab & c83003b (T132647), 615ab28
2016-04-20
- 13:01 JeanFred: Deployed latest from Git, 48bce77 and dfbff9b (T132029)
2016-04-01
- 22:51 multichill2: JeanFred did a git pull for Phab:T131344 and others
2016-03-31
- 09:14 multichill: Commented out the Russian Wikipedia in user-config.py for Phab:T131344
2016-03-16
- 20:45 multichill: jsubbed populate_image_table.py for https://phabricator.wikimedia.org/T130107 (see crontab -l for exact command)
2015-08-30
- 14:38 multichill: Made local change to unused_images.py to get it to work, see https://phabricator.wikimedia.org/T110829
- 09:14 multichill: Updated ~/pywikibot to latest version, but still getting a FamilyMaintenanceWarning
2015-08-22
- 13:35 JeanFred: After backporting all local changes to Gerrit, updating local checkout to latest Git version.
2015-07-15
- 16:50 JeanFred: Checked out pywikibot-core
February 23
- 20:30 multichill: Merged https://gerrit.wikimedia.org/r/192258, but can't deploy it because api/includes/FormatHtml.php has local (i18n) changes. Anyone feel like fixing it?
December 21
- 11:49 multichill: After the toolserver.org dns move the http://toolserver.org/~erfgoed/ redirects seem to be broken. Akoopal mentioned this, see https://lists.wikimedia.org/pipermail/labs-l/2014-December/003216.html
September 20
- 15:56 multichill: Fixed https://bugzilla.wikimedia.org/show_bug.cgi?id=70806 and deployed 2 new sk tables
August 27
- 18:05 multichill: Added Oren to the project
July 10
- 13:25 multichill: DNS was broken; because of that the API has been acting up for the last 2 (?) hours
- 11:46 lokal-profil: Corrected commands at commons:Commons:Monuments_database/Harvesting
- 11:20 multichill: Created ~/temp so that the change in https://gerrit.wikimedia.org/r/#/c/145254/1/api/includes/Defaults.php doesn't produce an error any more
- 09:57 lokal-profil: Images and markers in Kml now load from // instead of http://, gerrit
- 09:42 lokal-profil: Added se-arbetsliv a list for Working Life Museums in Sweden, gerrit
- 09:42 lokal-profil: Updated Default.php to point to toollabs instead of toolserver, gerrit
June 29
- 21:43 multichill: I put the RCE mysql conversion User:Akoopal made in ~/rce-nl-data . Still need to import it in Mysql to be useful. Data is CC0
- 19:30 multichill: Web service was down for all accounts. Back up and running. Api seems to have been down from 19:30 to 21:15 (Amsterdam time)
- 13:36 multichill: Burned the old ~erfgoed account on the Toolserver and uploading the backup to ~/toolserver_backup/
June 19
- 17:10 multichill: Fixed database_statistics.py after notification on https://commons.wikimedia.org/wiki/Commons_talk:Monuments_database/Statistics#Bug_in_the_URL . Still have to commit it
June 15
- 11:17 multichill: Did some hacks with Krinkle to get i18n working(ish) again (api.php and html formatters). Still need to commit it
June 14
- 19:28 multichill: Did the first steps to import the data to Wikidata. I wonder when we can deprecate the monument database
- 19:26 multichill: I sent out the Toolserver will die email. http://lists.wikimedia.org/pipermail/labs-l/2014-June/002672.html . I plan to drop the database p_erfgoed_p on the 21st.
- 11:33 multichill: Added Lokal Profil per request at Commons
June 7
- 16:57 multichill: While updating documentation I found https://git.wikimedia.org/summary/wikimedia%2Fwlm-api . Should probably be dropped, everything is in http://git.wikimedia.org/log/labs%2Ftools%2Fheritage.git
- 16:12 multichill: http://toolserver.org/~erfgoed/ now redirects to http://tools.wmflabs.org/heritage/ . Didn't move everything so that might give some 404's
- 16:06 multichill: prox_search completed without problems. update_monuments.sh should now run without failures.
- 15:52 multichill: symlinked ~/prox_search, fixed path (need to commit that), create_table_prox_search.sql, doing a manual run
- 15:35 multichill: Had to increase memory for statistics to 512M. Still need to commit that. jsubbed build_stats_test again and it finished with Memory usage: 396588928
- 15:04 multichill: Symlinked ~/public_html/maintenance and created tables statistics and statisticsct. jsubbed build_stats_test to test it
- 14:49 multichill: Fixed populate_adm_tree.php and populated the table. Still need to commit it
June 4
- 20:04 multichill: Managed to get the image database updated by switching latin1 -> utf8. Still have to commit. https://commons.wikimedia.org/wiki/Commons:Monuments_database/Indexed_images/Statistics
- 19:58 multichill: Pointed https://commons.wikimedia.org/wiki/Template:Monuments_database_more_images to the api on labs. Was 15K hits on the Toolserver (?!)
- 19:23 multichill: https://gerrit.wikimedia.org/r/137398 pretty images live, see http://tools.wmflabs.org/heritage/api/api.php?action=images&imcountry=ad&imid=100&format=html&props=img_name
- 19:17 multichill: Fixed the mysqldump and enabled /data/project/heritage/erfgoedbot/populate_image_table.py
June 1
- 20:18 multichill: Set up cron to run the update_monuments job every night (cron sketch after this list). Some parts of it will still fail.
- 20:05 multichill: Some tweaks in https://gerrit.wikimedia.org/r/136683; database is filled. API is working (admintree and statistics still missing)
- 17:14 multichill: Updated ~/bin/create_all_monuments_tables.sh and created 105 tables. Fired up update_database.py to fill the database
- 17:01 multichill: Pulled pywikibot (compat) and heritage. Symlinked them and set up the bot
- 16:46 multichill: Moved erfgoedbot, public_html & pywikipedia to ~/old/. to make room
- 16:41 multichill: Fixed ~/.database.inc , still have to do the i18n part
- 16:35 multichill: Cleaned out some code in https://gerrit.wikimedia.org/r/136649 and merged it
- 16:18 multichill: Created the s51138__heritage_p database
- 16:16 multichill: Replaced the .my.cnf with the right credentials
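A hypothetical crontab line for the nightly run set up in the 20:18 entry; the schedule, job name and script path are assumptions (the real command lives in the tool's crontab):
  # m h  dom mon dow   command
  0 2 * * * jsub -once -N update_monuments /data/project/heritage/erfgoedbot/update_monuments.sh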
Release Engineering/SAL
Nova_Resource:Tools.admin/SAL
2016-05-06
- 18:45 bd808: Unbroke webservice
- 18:44 bd808: restarted webservice
Nova_Resource:Mobile/SAL
2016-05-05
- 23:22 bd808: Deleted jitsu instance. Replaced with https://tools.wmflabs.org/hatjitsu/
2015-08-24
- 17:05 YuviPanda: kill instance android-build to make space on labvirt1007 (android-builder is the successor)
December 31
- 20:54 bd808: Added jhobs as a project member
January 15
- 08:43 andrewbogott: rebooted mobile-varnish
November 1
- 16:26 MaxSem: Deleted old instances mobile-solr2, mobile-solr3 and mobile-osm2
October 31
- 19:00 andrewbogott: rebased and updated puppet files on mobile-solr2
January 19
- 21:24 Ryan_Lane: adding DNS name for newly allocated IP (mobile-geo.wmflabs.org)
- 21:24 Ryan_Lane: associated new IP with mobile-en
- 21:24 Ryan_Lane: allocated a new IP
- 21:24 Ryan_Lane: upped the floating IP quota to 2
January 4
- 20:31 preilly: associated mobile-feeds host name on wmflabs.org domain
- 20:27 Ryan_Lane: allocated IP 208.80.153.216
- 20:25 Ryan_Lane: upped the quota for floating ips to 1
Nova_Resource:Redirects/SAL
2016-05-05
- 23:21 bd808: Configured redirect for hatjitsu.wmflabs.org
Nova_Resource:Math/SAL
2016-05-05
- 22:58 bd808: Joined project as admin
March 4
- 21:35 andrewbogott: (and also because Howie requested it)
- 21:34 andrewbogott: moved http://drmf-beta.wmflabs.org to point to the drmf-beta instance, and http://drmf.wmflabs.org to point to the drmf instance. Previously it was the other way around, which was super confusing.
September 16
- 19:36 andrewbogott: moving and rebooting mws instance
January 17
- 08:57 andrewbogott: moving math-semantics to a new virt host to avoid a storage crunch. This will reboot the instance.
January 15
- 08:48 andrewbogott: rebooted latexml-test
August 29
- 06:05 Ryan_Lane: adding jiabao to work on math support for visualeditor