How IKEA Patched Shellshock
jones_supa writes: Magnus Glantz, IT manager at IKEA, revealed that the Swedish furniture retailer has more than 3,500 Red Hat Enterprise Linux servers. With Shellshock, every single one of those servers needed to be patched to limit the risk of exploitation. So how did IKEA patch all those servers? Glantz showed a simple one-line Linux command and then jokingly walked away from the podium stating "That's it, thanks for coming." On a more serious note, he said that it took approximately two and a half hours to upgrade their infrastructure to defend against Shellshock. The key was having a consistent approach to system management, which begins with a well-defined Standard Operating Environment (SOE). Additionally, Glantz has defined a lifecycle management plan that describes how Linux will be used at IKEA for the next seven years.
What was the command? (Score:1)
I imagine it was sudo rm -rf /, but I could be way off.
Re: (Score:2)
yum update -y && reboot
Re:What was the command? (Score:5, Informative)
yum update -y && reboot
You're going to type that on 3500 servers?
I think you'll want to use your configuration management platform to kick off the update. That's how we did it -- applied the update to the dev servers, did some testing, then the same to qa, then preprod, then finally to the production servers. Took us more than 2.5 hours to test and validate everywhere, but actually pushing out the patch to 1200 servers was a single line command.
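For anyone wanting the shape of that staged push, here is a rough sketch, not anyone's actual tooling. PATCH_CMD and VERIFY_CMD are made-up hooks standing in for whatever your configuration management platform provides, and the defaults only echo, so trying it touches nothing.

```shell
# Staged rollout sketch: patch each environment in order, stop at the first failure.
# PATCH_CMD and VERIFY_CMD are hypothetical hooks; the defaults only print.
PATCH_CMD=${PATCH_CMD:-echo patching}
VERIFY_CMD=${VERIFY_CMD:-echo verifying}

rollout() {
    local stage
    for stage in "$@"; do
        # Apply the update to every host in this stage.
        $PATCH_CMD "$stage" || { echo "patch failed in $stage" >&2; return 1; }
        # Validate the stage before promoting the patch to the next one.
        $VERIFY_CMD "$stage" || { echo "verification failed in $stage" >&2; return 1; }
    done
}

# Usage: rollout dev qa preprod production
```

The point of the early return is that a failed check in qa never reaches preprod or production.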
Re: (Score:3)
Indeed, you definitely do NOT want hundreds-to-thousands of servers doing an update all at the same time, or, worse, rebooting all at the same time. The first has the potential to saturate your network and bring the entire setup to its knees, and the second will blow your rack supplies. I speak from experience on the latter, having been the one who identified the issue with our weekly DB scrubbing procedure once the company I was working for grew to more than a half dozen servers.
You want to stagger things by a few 10s of seconds per server on each rack to avoid power supply issues.
Re: (Score:2)
Indeed, you definitely do NOT want hundreds-to-thousands of servers doing an update all at the same time, or, worse, rebooting all at the same time. The first has the potential to saturate your network and bring the entire setup to its knees, and the second will blow your rack supplies. I speak from experience on the latter, having been the one who identified the issue with our weekly DB scrubbing procedure once the company I was working for grew to more than a half dozen servers.
You want to stagger things by a few 10s of seconds per server on each rack to avoid power supply issues.
Man....I'd forgotten about the PDUs. Had that problem at one place where I brought down the DMZ because I rebooted a server. Fortunately that got a much needed datacenter review underway and people started distributing power correctly.
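The staggering described above could look something like this. It is only a sketch: stagger_run and reboot_host are names invented here, and the per-rack host file in the usage line is an assumption about how you keep your inventory.

```shell
# stagger_run DELAY CMD < hostlist
# Runs "CMD HOST" for each host read from stdin, pausing DELAY seconds between hosts.
stagger_run() {
    local delay=$1 cmd=$2 host first=1
    while IFS= read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue            # skip blank lines
        if [ "$first" -eq 1 ]; then
            first=0
        else
            sleep "$delay"                    # spread the power draw over time
        fi
        $cmd "$host" </dev/null               # keep CMD from consuming the host list
    done
}

# Hypothetical wrapper around the real action.
reboot_host() { ssh -n "root@$1" reboot; }

# Usage (one invocation per rack): stagger_run 30 reboot_host < rack07-hosts.txt
```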
Re: (Score:2)
In some enterprise shops it is just SOP to reboot, usually a policy written by some change management managerial type who doesn't know when a reboot is actually required.
Re: (Score:1)
Rebooting regularly is good practice to ensure your servers are capable of coming back up if something accidentally knocks them down unexpectedly. See also Netflix's Chaos Monkey for a different but similar concept.
Re: (Score:2)
Here, scheduling the reboot of the 900 servers was the longest part of that patching effort.
O'Reilly? You had to reboot? And you still get paid as a sysadmin?!!(sigh).
Demonoid-Penguin - moderating (the non-stupid).
If you're just running a generic "yum update", then you have pretty good chances a new kernel will be pulled in...so yeah a reboot was probably called for.
Re: (Score:2)
Well I'd wrap it in a loop of some kind:
for host in `cat /dev/storage/admin/servers.dat`; do ssh root@$host "yum update -y && reboot"; done
Re: (Score:3)
Well I'd wrap it in a loop of some kind:
for host in `cat /dev/storage/admin/servers.dat`; do ssh root@$host "yum update -y && reboot"; done
You're going to watch the output for 1000+ servers to see which ones failed?
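One way around watching the output, sketched under assumptions: log each host to its own file and print only the hosts whose command exited non-zero. run_all, patch_host and LOG_DIR are names made up for this example.

```shell
# run_all CMD < hostlist
# Runs "CMD HOST" for every host on stdin, writes each host's output to
# $LOG_DIR/HOST.log, and prints only the hosts where CMD exited non-zero.
LOG_DIR=${LOG_DIR:-./patch-logs}

run_all() {
    local cmd=$1 host failed=0
    mkdir -p "$LOG_DIR" || return 2
    while IFS= read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        if ! $cmd "$host" </dev/null >"$LOG_DIR/$host.log" 2>&1; then
            echo "$host"                      # report the failure
            failed=1
        fi
    done
    return "$failed"                          # 1 if any host failed, else 0
}

# Hypothetical per-host action; BatchMode stops ssh from hanging on a password prompt.
patch_host() { ssh -o BatchMode=yes "root@$1" 'yum update -y'; }

# Usage: run_all patch_host < servers.txt > failed-hosts.txt
```

The list of failures is then short enough to read, and the per-host log is there when you need the details.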
Re: (Score:2)
Duh, yeah! That's why we have interns!
Re: (Score:2)
Well, no. You'd run that inside a screen session, and with an ampersand not a semicolon.
Re: (Score:1)
Re: (Score:2)
pdsh FTW
Re: (Score:2)
Why not do it the way our ancestors did it? :P
for i in $(cat ips.txt); do
XXXXXXXXX
done;
Re: (Score:3)
You mean in an amateurish way that can overload shell buffers?
Try
while read i; do ...; done < ips.txt
or
xargs ... < ips.txt
Re: (Score:2)
while read i; do ...; done < ips.txt
How amateurish to spawn an unnecessary subshell.
xargs ... < ips.txt
Yes.
Re: (Score:2)
Not so fast, Sherlock.
xargs doesn't handle shell functions, only external binaries.
Re: (Score:2)
Well I don't know your preferred shell, but I suspect updating servers isn't implemented as shell built-ins, so we're good ;)
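For what it's worth, both forms from this subthread can be sketched side by side. With a redirect (not a pipe), the while-read loop runs in the current shell, so it can call a shell function and its counter survives the loop. xargs can only exec programs, so the per-host action has to be inlined through sh -c. do_host, loop_hosts and xargs_hosts are names invented for this example, and the action only echoes.

```shell
# Hypothetical stand-in for the real per-host action.
do_host() { echo "would patch $1"; }

# while-read with a redirect: no subshell, so functions and variables are usable.
loop_hosts() {
    local host count=0
    while IFS= read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        do_host "$host" </dev/null            # stop ssh-like commands eating the list
        count=$((count + 1))
    done < "$1"
    echo "hosts processed: $count"            # still set after the loop
}

# xargs: runs external programs only, so the action is passed to sh -c.
# -P (parallel jobs) is a GNU/BSD extension, not POSIX.
xargs_hosts() {
    xargs -n 1 -P "${2:-4}" sh -c 'echo "would patch $1"' _ < "$1"
}

# Usage: loop_hosts ips.txt
#        xargs_hosts ips.txt 8
```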
Re: (Score:1)
Cool story, bro.
Re:What was the command? (Score:4, Informative)
Re: (Score:2)
this is why God invented Ansible.
Re: (Score:2)
We're currently evaluating Ansible. I expect us to make the switch permanently as part of our move to docker containers. Currently, our puppet manifests are unwieldy and a biatch to maintain.
Re: (Score:3)
If you don't mind my asking, what's the difference between QA and preprod for you?
Re: (Score:2)
for i in {1..3500}; do ssh server$i yum update -y; ssh server$i reboot; done
better?
Re: (Score:2)
Your joke mighta been funny if it had contained a humorous punchline.
Re: (Score:2)
yum update -y && reboot
Actually, it kicked off a bash script that consisted of 100,000 commands that took a team of programmers six months to write and debug. But to him, management, it was just a single command that he typed in and took all the credit.
(it's a joke people)
Re: (Score:3)
I like Apple propaganda. It's much better than that awful Windoze propaganda.
Re: (Score:3)
I like Apple propaganda. And hypnotoad.
Someone post the one line command... (Score:2)
Let's save ourselves from unnecessary clickbait.
Re: Someone post the one line command... (Score:1)
The video is on the summit YouTube channel, but the command was ./patch
I was there too, it was a really good presentation.
Re: (Score:1)
You're averse to clickbait so you frequent Slashdot?
They Were Only Able to Do It (Score:2)
stage management (Score:3)
The moment would have been perfect if he'd just dropped the mic.
Comment removed (Score:5, Funny)
Re:a solid business model helps. (Score:5, Insightful)
If you have troubles putting together IKEA furniture, I imagine Duplo LEGO would be out of your league too...
Re: (Score:3)
Re: (Score:2)
Yes I realize thi
Configuration management (Score:1)
Re: (Score:2)
Shellshock took less than 4 hours to fix across 20k hardware boxes and many, many VMs. Most of that was testing the puppet manifest.
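Whatever pushes the patch, a quick probe afterwards tells you whether it took. The sketch below wraps the widely circulated one-liner for the original CVE-2014-6271 vector only; it says nothing about the follow-up CVEs. The function name shellshock_check is made up here.

```shell
# shellshock_check [SHELL]
# Probes SHELL (default: bash) with a crafted environment variable.
# A vulnerable bash runs the trailing "echo vulnerable" while importing x.
shellshock_check() {
    local shell_bin=${1:-bash} out
    command -v "$shell_bin" >/dev/null 2>&1 || { echo "not found"; return 2; }
    out=$(env x='() { :;}; echo vulnerable' "$shell_bin" -c 'echo probe' 2>/dev/null)
    case $out in
        *vulnerable*) echo "vulnerable"; return 1 ;;
        *)            echo "not vulnerable"; return 0 ;;
    esac
}

# Usage: shellshock_check bash
```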
Just like their furniture (Score:3, Funny)
By making the customers do most of it themselves.
Re: (Score:1)
Whaddya mean? They made me put in just about every screw and peg on a bookshelf I bought, not just the "last one". I wish it was only the last few.
Note they couldn't pack it into a flat box if they did much of the construction themselves.
In other news (Score:5, Insightful)
Man holding hammer demonstrates ease of driving a nail into wood. Thousands holding screwdrivers are amazed.
Shellshock (Score:1)
Was it "chsh -s dash www_data"?
Re: (Score:2)
News for nerds? (Score:2)
Re: (Score:2)
OMG, IKEA uses RH enterprise support for managing their servers... Slash *used* to be news for nerds. I have used scripts, after that RunDeck and now Ansible + Debian. And they do not need a subscription and better yet, are *distribution agnostic*.
Do you manage 3500 servers for a company with $32.65 billion in revenue?
Re: (Score:2)
Ob (Score:2)
# find /placewithtaxes -iregex ".*\(money\|geld\|argent\).*" -exec mv '{}' /offshore \;
How quickly we forget Y2K (Score:2)
In the heyday of Y2K remediation, I helped set up a push of a SOE to 275,000 distributed PCs in a weekend. It went off without a hitch. Management was happy, but the cries of thousands of employees who lost all their personal files and documents were ignored.
If you are willing to be heavy handed and brutal, you can accomplish miracles. Surely there is no news in that.
Re: (Score:2)
This is what Ops is really about (Score:2)
"The key was having a consistent approach to system management, which begins with a well-defined Standard Operating Environment (SOE). Additionally, Glantz has defined a lifecycle management plan that describes the lifecycle of how Linux will be used at Ikea for the next seven years."
And why I regard DevOps as a disaster in the making. While "DevOps" isn't bad for small companies, like ones I've worked for, where you 'wear many hats' or a rapidly moving R and D environment it is very dangerous in a real pro
With satellite, it's easy (Score:1)
Go to Satellite, click on Errata, and set it to update. If you have it set up for communications, IKEA would probably have been done in half an hour at most. Otherwise the servers update when they next check in, up to 4 hours later.
What's the big deal?
The solution? (Score:1)
Re:that's it...thanks (Score:5, Interesting)
I was there. It was said in a very joking manner. From the moment he started he showed his sense of humour.
In fact, his whole presentation was funny, amusing and had some good information.
The idea that he showed a one line command to patch wasn't the biggest shock of the talk. (Sorry, I don't recall the command.) It was the fact that he patches the 3,500 servers ONCE A MONTH. Straight into production. This caused some questions and discussion.
FTFA, "One of the potential challenges of constantly updating servers is the risk that applications break when new server operating system software is loaded. Glantz, however, isn't worried and noted that RHEL offers the promise of Application Binary Interface (ABI) compatibility across updates." The rest of his reasoning, and another amusing moment, is described at the end of the article.
Vip
Re: that's it...thanks (Score:1)
./patch
but the interesting bit was the getting to that, yeah.
Re: (Score:2)
The article did not say what it was, but anyone with Red Hat experience already KNOWS this...
As root, do
"yum update"
Two words, that is it.
Re: (Score:2)
Well, I sure as hell wouldn't run that on all my production systems without a wee bit of testing first...
Re: that's it...thanks (Score:2)
Re: (Score:2)
Re: (Score:1)
https://www.youtube.com/watch?... [youtube.com]
-- Red Hat security in a post-Shellshock world - 2015 Red Hat Summit
Re:that's it...thanks (Score:5, Insightful)
From the article the grandparent obviously did not read "Glantz showed a simple one-line Linux command and then jokingly walked away from the podium stating "That's it, thanks for coming," as the audience erupted into boisterous applause.". So in fact top notch people skills.
Re: (Score:2)
What about files which don't contain a . character?
Re: (Score:1)
We keep those.
Re: What was the command? (Score:2)
Re: (Score:2)
Real sysadmins
a) think before executing potentially disastrous commands, and therefore tend to not need the rm -i crutch
b) automate the repetitive parts of their jobs, in which rm -i obviously does not make sense
c) don't experiment around on production servers
d) have arranged their systems so that accidentally removing stuff can be recovered from.
Thanks for playing, though
Re: (Score:3)
If you alias rm to rm -i, what do you think rm -fr gets expanded to?
Could it be rm -i -fr in which case the -f overrides the -i anyway? Oh great sysadmin, can you clarify?
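To answer that with something you can run: the sketch below spells out what the alias would expand to, inside a throwaway directory. rm_alias_demo is a name made up for this example. Both POSIX and GNU rm let a later -f cancel an earlier -i, so the file goes without a prompt.

```shell
# Shows in a scratch directory that "rm -i -f" does not prompt.
rm_alias_demo() {
    local dir
    dir=$(mktemp -d) || return 1
    touch "$dir/victim"
    # What "rm -f victim" becomes once rm is aliased to "rm -i".
    # stdin is /dev/null, so a prompt, if there were one, would get no "yes".
    rm -i -f "$dir/victim" </dev/null
    if [ -e "$dir/victim" ]; then
        echo "kept"
    else
        echo "removed without prompting"
    fi
    rm -rf "$dir"                             # clean up the scratch directory
}

# Usage: rm_alias_demo
```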
Re: (Score:2)
Are you referring to the zsh option which also wouldn't protect you from rm -fr /, funny man?
Re: (Score:1)
mv *.* /dev/null
With only one matching file, you'll get:
mv: inter-device move failed: `foo.bar' to `/dev/null'; unable to remove target: Permission denied
If you got more than one file matching that pattern, you'll get:
mv: target `/dev/null' is not a directory
But thanks for playing...
Re: (Score:2)
It was in Perl:
./update-all-3500-servers-at-once.pl
one line.
Re: (Score:1)
Re: Ikea running RH? (Score:2, Insightful)
Professionals look and dress like professionals. If you insist on wearing grubby t-shirts and faded jeans at work don't be surprised if you're always kept out of the loop, never ever considered for promotion and ultimately the first to be let go when downsizing.
Re: (Score:2)
Re: (Score:1)
Re: (Score:1, Insightful)
So, what you are saying is I haven't bothered to read anything, or look at anything, but here is my completely irrelevant opinion?
Man, this place used to be something...
Re: (Score:2)
Re: (Score:1)
You and the mod who chose Interesting are fucking idiots. Go kill yourselves to make the world a better place.
Re: (Score:2)
Re: (Score:2)
the man goes full commando and updates everything live without testing.
That's an assumption on your part. Sure, it may be implied, but isn't confirmed. I've seen places large enough that their OS provider would test on their behalf. So he can claim "no testing" and the answer is it was tested. Well tested. I've seen it done before.
Re: (Score:2)
You have obviously never worked with your average big corp windows admin.