Digital census 2020: How the US Census Bureau is surveying 330M people across 4M square miles


How do you survey 330 million people across 4 million square miles? Every 10 years the United States is constitutionally obligated to do a national census. This year, digital census 2020 was the goal.

In 2010, it cost $12 billion. They printed 17 million pages of paper maps and 134 million paper questionnaires. In 2020, the Census Bureau went digital in 59 languages. It was a massive development project with no real ability to test before you go live in front of the entire country.

And then, of course: Coronavirus! In this edition of TechFirst with John Koetsier, here’s how the Census Bureau did it …

Listen: Digital census 2020

Better yet, subscribe to TechFirst with John Koetsier wherever podcasts are published.

Watch: Digital census 2020

Subscribe to my YouTube channel so you’ll get notified when I go live with future guests, or see the videos later.

And the full transcript: Digital census 2020

John Koetsier: How do you survey 330 million people across 4 million square miles?  Welcome to TechFirst with John Koetsier.

Every 10 years the United States runs a census. In 2010, the last time, it cost $12 billion. They apparently printed 17 million pages of paper maps and had 50 million paper questionnaires. In 2020, they’ve been working on bringing it digital. They’ve had to do it in 59 different languages. It’s been a massive development project, you can just imagine the size of the website that you need, the capacity you need, and there’s not really a huge ability to test before you’re going live, and then of course Coronavirus happens.

To find out how that’s working, how it’s going, I’m bringing in a couple people from the Census Bureau right now… Stephen Buckner and Zachary Schwartz.

Welcome!

Stephen Buckner: Hey John.

Zachary Schwartz: Thank you.

John Koetsier: Thank you for taking a little bit of time with us today. Really do appreciate it, I know you’re super busy. Tell us kind of the progress.

Where are you with the 2020 census?

Stephen Buckner: Well, it’s an exciting time for the Census Bureau. We do this once every 10 years, it’s written into the constitution, and we take it pretty serious because it’s so important. It dictates all of your political power in Congress but it also distributes billions and billions of dollars back to federal, state, and local governments every year.

So right now we are nearing the 60% response rate mark, which is ahead of schedule, which is really encouraging, but we didn’t start this census thinking that we’d be facing a national pandemic in terms of COVID-19 and some of the things we had to do. So midstream, when things started shutting down we halted all of our field operations and stopped hiring people to go do various operations, and we’ve had to delay certain field operations.

And right now as we get into the summer months and things are starting to open back up a little bit, we’re coordinating very closely with local health officials to make sure that as we bring our field workers back on that they’re able to do their work safely and not put themselves at risk, or the public. So we’ve got about 90% of the area census offices open right now on a phased basis, working with state and local officials and trying to finish out some of the field operations that are necessary before we have to start going door to door, which we’ve pushed back to August now, so…

John Koetsier: Wow.

Stephen Buckner: Again, normally the census wraps up around December 31st, by law we have to provide the counts to the President and to Congress. We’ve pushed that back til March 31st now, so again, it’s really hitting us.

John Koetsier: Stephen, first of all, amazing that you started working on the whole digitization of it, digital transformation of the census, and imagine where you would be right now if you hadn’t done that in previous years.

But maybe even before we get into that … talk a little bit about, you said billions of dollars rest on this and other things… can you talk a little bit about that? A lot of people might think of the census as, hey, we’re counting the people, we’re finding out where they are, who’s around.

What are the financial, economic implications of the census?

Stephen Buckner: Well, a lot of the funding that comes from the federal government back to state and local organizations and municipalities is totally based on population counts.

So if you are a city or a county of 5,000 or more, you get a certain level of percentage of federal funds for roads, schools, hospitals, daycare, things like that. If you hit 20,000, 50,000, 100,000, 250,000, even a million or more, you start getting more and more resources from the federal government based on those population funding formulas. And so that’s why it’s so important.

And because we only do it once every 10 years, if you’re not counted you’re basically, your community is going to get the funding of a smaller town or a smaller area than it deserves. And so as areas are growing and changing rapidly, it’s very important to make sure you’re getting those funds to be able to fund core social service programs, and right now during the pandemic, healthcare services are heavily dependent upon it. So it’s really, really important, and again, most people that hear about the census do realize it’s important, but we’re trying to make sure everybody’s counted and we need everybody to respond.

And so while we’re nearing 60% as of this weekend, we certainly want to go out and try to get as many people to self respond so that when we have to go knock on doors to get those last couple million homes, we’re able to do so in a safe way during the pandemic.
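Editor's sketch: the population thresholds Buckner lists earlier in this answer can be illustrated with a few lines of Python. This is only an illustration under stated assumptions. The threshold values come from the interview, but the function names and the idea of numbering the tiers are hypothetical, and the interview does not say how much funding each tier carries.

```python
from bisect import bisect_right

# Population thresholds named in the interview. What each tier is worth is not
# stated there, so this sketch only reports which thresholds a count clears.
THRESHOLDS = [5_000, 20_000, 50_000, 100_000, 250_000, 1_000_000]


def funding_tier(population: int) -> int:
    """Return how many of the thresholds the population meets (0 to 6)."""
    return bisect_right(THRESHOLDS, population)


def tier_lost_to_undercount(true_population: int, counted_fraction: float) -> bool:
    """True if counting only a fraction of residents drops the community a tier."""
    counted = int(true_population * counted_fraction)
    return funding_tier(counted) < funding_tier(true_population)
```

In this sketch, a town of 52,000 where only 95% of residents are counted falls below the 50,000 threshold, while a town of 80,000 with the same undercount stays in its tier.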

John Koetsier: So 60% sounds pretty good. I mean, that’s a lot of people. What was your goal and are you ahead of it, are you behind it? Talk a little bit about that.

Stephen Buckner: Yeah, and I’m going to kick it over to Zack a little bit to talk about some of the technology stuff and why this census because we are online has really, really helped us. The internet, more people are responding than we thought by internet, and that’s really good. We’ve really outpaced those numbers. We projected about a 60.5% response rate, but that number was really calculated based on how much workload are we going to have at the end of self response, to have to go and follow up with households.

And so we’re nearing that mark, but survey response rates around the world and in the US have been declining over the decades, and so we projected about a 61% response rate in 2010, the last time we did it, and we finished with just around a 63.5-64%, I believe, response rate.

So we’d really like America to take a look at that and say, yeah, we may have met our 60.5 projection, but we want to see them do better than what they had in 2010, and we have thousands of partners across the country, from state and local governments to community organizations and nonprofit organizations, really trying to make sure that their community is accurately counted, and we couldn’t do this without them. And so the census is far from over.

We still have a long way to go and that’s sort of where we’re transitioning right now on a lot of our digital properties to just try to remind people through response rate maps and other social media and digital media channels, you know, why should people pay attention to the census when it’s a struggle out there for some people that have either lost their job or have been furloughed right now, trying to figure out how to make ends meet or get food on the table, and so we’ve got to be very sensitive to that.

But at the same time, we need to make sure we get an accurate count for them and not only today, but for tomorrow and the next 10 years.

John Koetsier: Yeah. Let’s talk a little bit about the technology challenges here. You digitized, you can do it online. That’s a wonderful thing, but we’ve seen big technology projects in the past that have really failed, right, or had significant challenges.

And you’re not talking about a small thing here. I mean, I’ve built sites that about 10 million people could use in a given day or something like that, and that was a challenge. You’re building a site that theoretically, I mean, theoretical maximum is you could have 300 million people come in at once. So maybe that’s maybe adults, maybe that’s too high, but at least a hundred million people could hit at once potentially.

How do you plan for that? How do you prep for that? What do you build for that?

Stephen Buckner: Yeah, I would just say this, and I’m going to let Zack sort of talk about concurrent users and how we scaled and how to make sure to deliver a really good experience for the American public.

I would say that government IT projects are sort of, you know, usually don’t get accolades for being successful, and going into the 2020 census there was a lot of discussions over whether the Census Bureau was ready, and whether or not it could really keep up with the demand for the number of people going to the internet. You know, we only do it once every 10 years. We can’t, we don’t get a do-over and we have a deadline. So with all that, you have all these constraints coming at you, and we did our best to calculate out and did a lot of things to project what the concurrent usage would be on 2020census.gov to make sure that when somebody came there they could fill out their form, and we’ve been extremely successful.

The site has stayed up the entire time period, there’s not been slow periods. If you go to 2020 census if you haven’t filled it out, it’s only going to take you a couple of minutes to do your form online, and you can do that by phone as well. So it’s really, really simple, and I’d like to maybe shift it to a little bit of the IT because we also went into the cloud for 2020census.gov, not the response site, but where the content is at again, because 300 million, it’s a lot of people to sort of plan for.

Zack?

Zachary Schwartz: Yeah. You really have to think about it, right? So we luckily saw and learned a lot from areas of the country that, or from other projects throughout the government that had large technology implementations. A lot of lessons learned. We partnered with a lot of different agencies to understand, commercial partners as well to understand what we needed to do.

So for 2020census.gov, which is our main landing site, it’s where advertising, a lot of search engines, that’s the home, real home of it. And we recognized when we started the project with 2020census.gov, we were on on-prem infrastructure, we realized we were not at where we needed to be to scale to meet 2020 census demands.

So we had an excellent project put together, a full cloud migration to move the 2020census.gov website to the cloud, recognizing what our concurrent users were going to look like, recognizing what the load was going to be, and that we wanted to have a strong user experience. You don’t want to go to a slow website, you don’t want to complete surveys on a slow website, you don’t want to browse a slow website. So there’s so much that we needed to do and we’re really pleased with the end results. And we truly believe this was a successful cloud implementation and something the government can reuse over and over again.

John Koetsier: Well, that’s exciting to hear.

Can you share a little bit, how many concurrent users did you see at any given time?

Zachary Schwartz: There were definitely periods of time where we had tens of thousands of concurrent users on the 2020 census.gov website, and certainly as well as the ISR: the internet self response tool. One of the biggest drivers of concurrent users was actually a great tech partnership that we had with Facebook and Instagram that actually drove some of the highest concurrent users.

They had a great call out, a civic engagement partnership with them free of charge, that they actually placed the link for 2020census.gov at the top of your Instagram or Facebook newsfeed, encouraged you to go there, encouraged you to share with your friends and family on those social media platforms, and it drove an unbelievable amount of traffic right there during the peak time of self response, and was really something we saw on our end and our analytics showed.

Our infrastructure stayed up, our website was responsive, and people had an excellent user experience completing the census and understanding what the census was at our website.

John Koetsier: Nice. And I’m guessing that you kind of intelligently staggered those sorts of announcements and maybe your ads as well, so you wouldn’t have everybody hitting all at once.

Zachary Schwartz: Yes, definitely.

Stephen Buckner: Yeah, I think part of the operational design, from a communication standpoint, we had to take advantage of how our Decennial colleagues were planning the operations, in that we did do a phased mailing from March 12th through the 20th to sort of allow different panels to come in and respond, because we were initially worried about some of the load that could possibly come on the system.

But we built it with a lot of extra, concurrent usage, to make sure that we never had any kind of downtimes. And then we also built in some backup plans. So if something were to go down, or slow, or fail, we had multiple redundant systems in place to be able to address it to where the public wouldn’t ever really know the difference. And that’s some of the great work of our IT counterparts across the Census Bureau and across the Decennial Directorate to make sure that it was easy, it was safe, and that people could do it with relative ease. And so it’s pretty exciting when you launch something like that, especially on the internet and you get the performance you expected, and you don’t get any complaints.

And so that’s been really exciting, especially with more people being at home now, not going into work, you know, it could have really taxed the system, but we just haven’t seen that. So really exciting.

John Koetsier: Congratulations on that, because I mean you guys said it yourself, large scale technology projects by the government don’t have a huge, amazing track record of success, but we haven’t heard anybody saying, hey, the site’s down, can’t get to it, can’t do it.

You’ve had great participation already, so that’s wonderful.

Now, let’s talk a little bit about security, and I’m sure it’s a constitutional requirement to have this data. This is not a survey that I might do to find out what people think about a particular brand of technology or something like that. This is significant data. What are you doing here to make sure that it’s secure, make sure that the people who are coming in are the people who they say they are, and that you’re getting the right data, and nobody’s spoofing it a thousand times to increase the population level in their area or anything like that?

Zachary Schwartz: Sure.

Stephen Buckner: We could probably answer this together Zack. I’ll start by saying, look it takes a lot of people to make sure that the system is secure and safe to do anything in today’s modern world, right. So we work with a lot of public officials across the CIO areas, reaching out to not only get support from our department, but other federal agencies, from intelligence agencies looking at real time threats and cyber security issues that may be out there, but also then bringing in a lot of private sector expertise and doing large scale IT projects to make sure that we had the best and the brightest working on this to give us advice along the way, and to constantly make sure that we were really looking at every single aspect of sort of that journey that either an employee, or a respondent would actually go through.

But we did countless pen testing [penetration testing] and doing bug bounties and things like that, you know, just constantly straining the system a little bit for years making sure.

John Koetsier: Wow.

Stephen Buckner: And again, there were a lot of naysayers out there that like to come out and say, well, the government’s not doing this, or census isn’t looking at this, but again, part of IT security is knowing that things are going to happen, but how do you isolate it to, you know, once you detect it, that you isolate it and then resolve it quickly, within that ecosystem.

And I can’t give enough credit to our CIO and the Decennial IT folks for really building those relationships across the federal space, but also the private sector space. And Zack, you want to talk a little bit, maybe more about some IT from the encryption and other things.

Zachary Schwartz: Yeah, so our data is encrypted. When you’re responding online, the data is encrypted both when it’s sitting at rest, as well as when it’s in transit, and we always like to say there’s a lot of doors, so your data may be in one room, but it’s in the next room pretty quickly. And next thing you know it’s back in our specific data lake where it is very safe and secure.

And some of the other things when it relates to data also, as you mentioned spoofing and others, we’re using top tier technology to understand when bots may be coming onto our tool, when there’s inauthentic human behavior that’s occurring, all of these different technologies, the things that are, we would say, modern 21st century, you know, anti-bot, anti-inauthentic content, those are the types of things that we have both in place with our internet self response tool as well as our website.

And one other area that I’ll just add real briefly and we’ll touch on I’m sure more, mis- and dis- information, right? Our ability to look at and understand what the public is talking about from a misinformation or disinformation around data security as well as many other topics, that’s another area that we’re focused on. We’re not just looking at what’s coming at our website. We’re being proactive and seeing if there might be attempts being planned out there on the web.

John Koetsier: Very, very interesting, Zack. So question would be, have you detected bots on the system? Have you had to kick them off the system, that sort of thing? Have you detected the attempts to sway the results?

Zachary Schwartz: We haven’t detected anything as a result of swaying. We’ve certainly seen attempts where bots are coming on, whether or not they’re bad bots or good bots, there’s a variety of things, but they’re not able to get very far really beyond the first page, because our detection systems are looking to see whether or not you have human behavior of click times and many other things that would be in place.

So we haven’t seen any of those types of issues come involved, and when we do have any types of concerns we isolate that session. We take and we have experts in the security field look at what was happening, look at the IP, look at all the data, look at the information that occurred in that session. And we’re able to isolate that and determine whether or not it was human or inauthentic.
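Editor's sketch: Schwartz mentions that detection looks at "human behavior of click times" without describing the actual system. The toy Python heuristic below is an assumption made purely for illustration, not the Census Bureau's method. The function name `looks_automated` and both cutoff values are hypothetical. It flags a session when clicks arrive implausibly fast or with implausibly regular spacing.

```python
from statistics import mean, pstdev


def looks_automated(click_times_s, min_mean_gap=0.3, min_jitter=0.05):
    """Flag a session whose clicks are implausibly fast or implausibly regular.

    click_times_s: click timestamps in seconds, in ascending order.
    min_mean_gap and min_jitter are illustrative cutoffs, not real values.
    """
    if len(click_times_s) < 3:
        return False  # too few clicks to judge either way
    # Time elapsed between each pair of consecutive clicks.
    gaps = [b - a for a, b in zip(click_times_s, click_times_s[1:])]
    too_fast = mean(gaps) < min_mean_gap
    too_regular = pstdev(gaps) < min_jitter
    return too_fast or too_regular
```

A real system would combine many more signals; the point here is only that timing patterns alone can separate a metronome-like script from a person reading and clicking at an uneven pace.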

John Koetsier: Excellent.

Zachary Schwartz: And you know what? It’s okay to kick someone out. And that’s what we’re able to do if we feel that we have concerns.

Stephen Buckner: Right, that real time monitoring is really, really key. And, you know, having people 24/7 in op-centers really monitoring all of the systems and having different flags there to take a look at and know what to do with, took a lot of practice in setting that up, but having a standard operating procedure to be able to deal with each one of those scenarios, I think has been really, really key as well.

John Koetsier: Cool. Talk about some of the technologies you use to build a site. What’s on the back end.

Stephen Buckner: Yeah. Zack, you want to take it?

Zachary Schwartz: Yeah, so we’ll start with 2020census.gov, that is an Adobe Experience Manager content-management system. It is again, what we see as one of the driving factors of having a modern website: clean, simple layout, easy to read, easy to understand. We wanted the ability that when you interact with the government, specifically the Census Bureau, you have a good user experience.

So that on the backend for how we customize how we deliver content is through Adobe Experience Manager.

We are hosted in AWS, Amazon Web Services cloud, GovCloud. That’s extremely important certainly understanding that as we talked about security, we wanted to make sure our infrastructure on the backend was some of the top tier safety, certainly. So between our AWS managed infrastructure, our application of Adobe Experience Manager, those are how the 2020 website is run to date. Our internet self response tool uses a mixture of off the shelf products certainly, as well as a lot of custom expertise Census Bureau built code for the actual survey instrument.

John Koetsier: Super interesting. I remember, some things I’ve done on AWS and when I’ve been analyzing a huge amount of data, I remember one query that I ran was about $3,000, so it was just a $3,000 query to get a bunch of data.

I’d hate to see your bills, I’m assuming they’re going to be significant, but I’m glad to see that you’re in the cloud and you’re burstable. There’s a handful of companies on the planet and most of them are in the US that have an absolutely global digital footprint, a massive digital footprint. We’re talking the Googles, the Facebooks, maybe add Twitter in that, Apple, other companies like that, they have seriously global scale infrastructure.

Have you worked with some of these companies? You mentioned doing some things in terms of getting the word out. Have you also worked with some of these companies in other ways?

Stephen Buckner: Yeah. I’ll take it at first. And you know, I’d start by saying we’ve worked with all of them, those mentioned plus a lot more. And I think that’s one of the keys to the success of the 2020 census and just the Census Bureau being a pretty innovative agency, trying to understand what is really going on there? What are some of the consumer trends? What can we learn from private sector entities in terms of, as we go more digital in the government, to be able to deliver a very positive and delightful experience in terms of what’s going on.

And being able to use real-time data, you know, Census is a very statistically based agency, obviously, but using real-time data to be able to make key decisions is another thing that we really worked on, being able to see what was coming in on response rates. How does that look in terms of our communications program and is it performing at what we thought?

So we had projections down to the census tract neighborhood levels in terms of what we expected by response so that we could really measure and see whether or not we were performing above or below that line, and then take steps to actually change it.

And given the size and the complexity of the decennial census, that’s no small feat on its own, but I would say forming the first ever trust and safety team really stemmed from conversations that Zack and I were having throughout the decade with the Facebooks, Googles, Twitters, Microsoft, and others, so that we actually could emulate some of the things that we started to see around misinformation, disinformation around the mid decade. You know, those attempts to sort of destroy democratic institutions, the foundation of our representational democracy, starts with the census. It’s in the constitution. So we knew we might be a really big target and we needed to set up that ecosystem early on. And we’re really happy with our tech partners in terms of the communications we have with them daily, but also in terms of setting up and following processes around misinformation, disinformation, under a trust and safety umbrella.

So we’re monitoring social media 24/7, we’re looking at things that pop up on the field because we have over 300,000 national partners, we have a great sensory network out there among community based organizations, nonprofits, civic organizations that really come in and tell us, ‘Hey, we’re seeing this, can you take a look at it?’ So we set up a rumors@census.gov email box to flag any kind of content.

We have a nice trust and safety page that basically dispels any kind of rumors that are out there around the census or 2020 census operations. And then we’re constantly having, you know, Zack leads a call weekly with two different groups to make sure that we’re sharing that information across that entire partnership network so that they can hear what we’re seeing and then help us defuse it before it becomes sort of out of control.
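Editor's sketch: earlier in this answer Buckner describes comparing actual response rates against projections at the census tract level to see where the count is running "above or below that line." A minimal Python version of that comparison might look like the following. The function name, the dictionary layout, and the tract IDs used with it are all hypothetical, and the Bureau's real tooling is not described in the interview.

```python
def tracts_behind_projection(projected, actual, tolerance=0.0):
    """Return (tract_id, shortfall) pairs where actual trails projected.

    projected, actual: dicts mapping a tract ID to a response rate in [0, 1].
    Tracts with no actual rate reported yet are treated as 0.0.
    Results are sorted with the largest shortfall first.
    """
    behind = []
    for tract_id, target in projected.items():
        shortfall = target - actual.get(tract_id, 0.0)
        if shortfall > tolerance:
            behind.append((tract_id, shortfall))
    return sorted(behind, key=lambda item: item[1], reverse=True)
```

Sorting by shortfall puts the tracts that most need outreach at the top of the list, which matches the stated goal of measuring performance and then taking steps to change it.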

John Koetsier: Couple super interesting things packed into what you were just saying there. I mean, one, just the idea that you are planning a decade for something that happens, that’s so foreign to my experience in tech companies and in tech journalism and everything else, that kind of blows my mind a little bit.

But the other thing, which I’m glad that you’re on top of and really working, is there are people out there, whether they’re confused, whether they’re being paid, whether they’re just trying to inject chaos, who are actively injecting disinformation into the social platforms, the news platforms, the media platforms that we have, right. We’ve seen it in COVID-19. We’ve seen it before that, we’ve seen it in election cycles. So I’m glad that you’re working on that.

Anything else you can say there just about this era of distrust of data and expertise. You know, how are you ensuring that the data that comes out of here is beyond reproach?

Stephen Buckner: Right. I would say it starts with people responding and people trusting the systems that we’ve put into place to make sure that we’re collecting their data in a safe way and then storing that separately to where nobody can ever get to it.

And we have very tight confidentiality laws at the Census Bureau that we all take an oath for life that we cannot share any personal information. We can’t share it with law enforcement agencies or any other group. We don’t sell the information we collect to private businesses in terms of your personal information.

And I think that’s really key in terms of that sort of trust equation, if you will, with our customers. In terms of the data itself, the census is such a complex machine, we usually, again have the data results by December 31st. Because of Covid we’ve delayed several operations, so the reference date of the census is April 1st. The further we get away from that, people might not remember where they were on April 1st so we’re working with our stakeholders to make sure they understand those nuances as we start to stand up different operations, things like college students that were on campus but then the universities and colleges closed, they went home. Well, they needed to be counted in their college towns, so where they lived in on-campus housing, the university would count them and let us know, or if they lived in off-campus housing it was on the students to make sure they respond to where they were living. Trying to get that message out.

We also do a lot of outdoor locations with homeless-base or service-base populations. We’ve had to postpone those operations. And so, working with stakeholders to go out and make sure we get an accurate count of those without usual residences is really key here too, at the end of the census. So, working in a transparent way, which the Census Bureau really tries to do its very, very best to make sure people understand what we’re doing, why we’re doing it, will help us on the back end when the data start coming out in 2021.

John Koetsier: Very, very interesting. One question I was going to ask, you’re doing it digitally now more than ever before, obviously, and one would assume that in the future that will be, it may transition to completely digital, it may not, there may be some people that you still need to do via the phone, or paper, or something like that.

Do you anticipate as we become more and more of a digital society that you’ll accelerate this decade pace, or is it the right kind of framework to be looking at for governmental programs and other uses of census data?

Stephen Buckner: Yeah, that’s a very, very good question. I think one that gets talked about quite a bit, I mean, the public sector gets dinged quite a bit for not being on the cutting edge of technology, but for probably good reasons, right?

We have to be a little bit more conservative in our approaches because it’s regarding taxpayer money and we want to make sure we implement it effectively, but we’re already standing up teams that are working on the 2030 census, things that we can start testing right now during the 2020 census, we have several little tests that we’re doing to make sure that when we’re done with this, we’ve learned something that will improve it down the road for the next count.

But I think if there’s anything Covid has taught us, it’s that we can do our work remotely a lot more easily than any of us thought. I mean, we haven’t been in the office for over 10 weeks based on social distancing here at the headquarters, so teleworking is a thing, it’s going to stick around. What the public sector looks like after that in terms of going in and out of the office is anybody’s guess right now. But I think the more technology can help us bridge those gaps to be able to communicate more easily and collaborate while being distanced from each other, maybe not being in the same room, will really add to some of the things that we come out with maybe for 2030.

We’re going to use a lot more government records that we collect that we can maybe share. So we do have administrative records we’re going to be using to help increase the accuracy of the census, and also fill in gaps that maybe people didn’t fill in that we have a good trusted source of data that we already collected. But 2030 maybe it’s more of those types of records at that point.

So, you know, there’s a lot of those types of things that I think feed into it. We stood up a household and business pulse survey right in the middle of Covid to help understand what kind of impacts households were having, loss of income, food security questions, things that were really impacting everybody in a matter of a couple of weeks. And we’re doing all that digitally and we’re trying something, doing it by texting and by email which is not something we traditionally have done in our ongoing surveys. And same for the business community, right? We’re sending out emails and saying, ‘How has your business been impacted by COVID?’

And so by having this data it not only helps us plan down the road, but also helps us react and help us mitigate things coming out of it. Like how do we recover from all of this? And we need good data to be able to do that, and that’s really what the Census Bureau is about. And so whatever we can do to get better data with the most efficient way and less disturbing way from the American public and business community, is certainly what our goal is, right? They want something to be really easy.

We’re going to do everything we can to make it easier in subsequent years.

John Koetsier: Wonderful. Wonderful. Well, thank you, Stephen. Thank you, Zack. I really appreciate your time.

Stephen Buckner: John, thanks so much.

Zachary Schwartz: Thank you so much.

John Koetsier: Excellent. For everybody else who’s been along, thank you for joining us on TechFirst. My name is John Koetsier, appreciate you being along for the ride.

Whatever platform you’re watching on, please like it, subscribe, share, comment, all of the above. If you’re on the podcast afterwards, you like this, please rate it, review it, that’d be a massive help. Thank you so much.

Until next time, this is John Koetsier with TechFirst.