With an ever-growing ecosystem of projects forming around Hadoop, each with its own requirements, it’s hard to understand how these pieces come together to form a single data processing platform. Beyond simply making these tools function, administrators are responsible for ensuring that a single identification, authentication, and authorization scheme is applied consistently and in accordance with data handling policies. This is a significant challenge, to use a well-worn, albeit applicable, cliché. However, there are a few things you can do to reduce the pain of building a secure, shared Hadoop platform.
Do your homework and make an informed decision about whether you need Hadoop security. It is not as simple as setting hadoop.security.authentication to kerberos and getting on with life. The repercussions of Kerberizing a cluster are significant in that every client of the cluster must be able to properly handle authentication. That is easy to say and difficult to make a reality. Can you trust clients to identify themselves truthfully? If not, you have your work cut out for you. Create a map of all access points to the cluster, understand which must support multiple users through a single process, and create a strategy to handle authentication. There’s no universal answer, although there are some established patterns you can use.
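As a rough illustration of the mapping exercise, the sketch below models an inventory of access points and suggests a starting strategy for each. The AccessPoint type, the auth_strategy and plan helpers, and the inventory entries are all hypothetical names invented for this example. The only Hadoop-specific detail assumed is that a shared service typically relies on the proxy-user settings (hadoop.proxyuser.<service>.hosts and hadoop.proxyuser.<service>.groups) to act on behalf of end users. Treat it as a planning aid under those assumptions, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPoint:
    """One way into the cluster (hypothetical model for planning only)."""
    name: str          # illustrative label for a client or service
    multi_user: bool   # True if one process acts for many end users


def auth_strategy(point: AccessPoint) -> str:
    """Suggest a starting point for how this access point might authenticate."""
    if point.multi_user:
        # A shared process cannot hold every user's credentials, so it
        # usually authenticates as itself and impersonates end users via
        # Hadoop's proxy-user configuration.
        return "service principal + proxy user (impersonation)"
    # A single-user client can authenticate as itself.
    return "direct Kerberos login (kinit or keytab)"


def plan(points):
    """Build a name -> strategy map for every access point in the inventory."""
    return {p.name: auth_strategy(p) for p in points}


# Illustrative inventory; these entries are assumptions, not a complete list.
inventory = [
    AccessPoint("analyst shell on edge node", multi_user=False),
    AccessPoint("nightly ETL job", multi_user=False),
    AccessPoint("web query portal", multi_user=True),
]

if __name__ == "__main__":
    for name, strategy in plan(inventory).items():
        print(f"{name}: {strategy}")
```

Even a list this simple makes the multi-user access points stand out, and those are usually where most of the authentication design effort goes.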