BIAM 530 Week 5 Step 1: Review Hadoop Tutorial and Answer Questions
Visit the US National Oceanic and Atmospheric Administration (NOAA) data access site at http://www.ncdc.noaa.gov/data-access/quick-links#ghcn and view the GHCN-Daily Sample PDF file illustrating the data used for this analysis. Besides the columns used in the tutorial, are there other data columns here that might be relevant to 311 call volume? If so, what are they, and how might you include them in the analysis?
Review what other data files are available on this NOAA National Centers for Environmental Information Quick Links page. Select one other file that might contain data of interest to a company or government agency, and briefly describe an analysis that could be performed using this data.
Visit the NYC Open Data site at https://data.cityofnewyork.us/ and follow the links to Social Services and the 311 Service Requests page. In addition to the data columns used in the tutorial, what other available data columns might be useful in an analysis of 311 calls? Suggest some analyses that might be … using these data.
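As one illustration, columns such as `Borough`, `Complaint Type`, `Created Date`, and `Closed Date` support questions like which complaint types dominate each borough and how long requests take to close. The Python sketch below assumes those column names and an `MM/DD/YYYY hh:mm:ss AM` timestamp format, both of which should be confirmed against the exported file; the three sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime

# Assumed timestamp format of the exported 311 CSV; verify before use.
TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"

# Invented sample rows that mimic a few assumed 311 export columns.
SAMPLE_CSV = """Created Date,Closed Date,Complaint Type,Borough
01/02/2014 08:00:00 AM,01/02/2014 02:00:00 PM,HEAT/HOT WATER,BRONX
01/02/2014 09:30:00 AM,01/03/2014 09:30:00 AM,Noise - Residential,BROOKLYN
01/03/2014 11:00:00 PM,,HEAT/HOT WATER,BRONX
"""


def summarize(rows):
    """Count requests per (borough, complaint type) and average hours to close."""
    counts = Counter()
    hours = defaultdict(list)
    for row in rows:
        key = (row["Borough"], row["Complaint Type"])
        counts[key] += 1
        if row["Closed Date"]:  # still-open requests have an empty closed date
            opened = datetime.strptime(row["Created Date"], TIME_FORMAT)
            closed = datetime.strptime(row["Closed Date"], TIME_FORMAT)
            hours[key].append((closed - opened).total_seconds() / 3600)
    avg_hours = {key: sum(v) / len(v) for key, v in hours.items()}
    return counts, avg_hours


counts, avg_hours = summarize(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for key, n in counts.most_common():
    print(key, n, "requests, avg hours to close:", avg_hours.get(key))
```

Grouping by a different column (for example an agency or ZIP code field, if present in the export) only requires changing the `key` tuple.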
Explore the NYC Open Data site to identify other data sets available besides the 311 service calls. Select one of these data sets and briefly describe how it could be used in an analysis for a government agency or business.
Summarize the major steps performed in the analysis shown in the tutorial and the IBM BigSheets functions it used.
Compare and contrast the analysis shown in the tutorial, which used Hadoop and IBM BigSheets, with how a similar analysis might be performed on a smaller data set using Microsoft Excel. What was similar, and what was different? What challenges would be … by an analyst skilled in Excel when adjusting to working with big data sets in Hadoop?
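One difference worth discussing is the programming model. An Excel pivot table counts calls per day in one pass over data held in a single workbook, while Hadoop's MapReduce model splits the work into a map step that emits key/value pairs, a shuffle that groups pairs by key, and a reduce step that aggregates each group, so the pieces can run in parallel across the cluster's nodes. The Python sketch below only simulates those phases on one machine with invented records to show the shape of the computation; it is not how BigSheets or Hadoop is actually invoked.

```python
from collections import Counter, defaultdict
from itertools import chain

# Invented 311 records: (created date, complaint type).
records = [
    ("2014-01-02", "HEAT/HOT WATER"),
    ("2014-01-02", "Noise - Residential"),
    ("2014-01-03", "HEAT/HOT WATER"),
    ("2014-01-02", "Street Condition"),
]


def map_phase(split):
    """Emit (date, 1) for every record in one input split."""
    return [(date, 1) for date, _complaint in split]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key to get calls per day."""
    return {key: sum(values) for key, values in groups.items()}


# Pretend the input was divided into two splits handled by separate mappers.
splits = [records[:2], records[2:]]
mapped = chain.from_iterable(map_phase(split) for split in splits)
calls_per_day = reduce_phase(shuffle(mapped))

# The single-machine equivalent of a pivot-table count.
pivot = Counter(date for date, _complaint in records)

print(calls_per_day)
print(dict(pivot))
```

Both approaches give the same counts; the MapReduce version pays extra bookkeeping so that each phase can be distributed when the data no longer fits on one machine.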
BIAM 530 Week 5 Step 3: Get Connection Details and View Ambari Console
Q1: What percentage of disk space on the Hadoop … File System (HDFS) is being …? How many gigabytes (GB) or terabytes (TB) does this represent? (Hint: Hover over the HDFS Disk Usage section and add the values for DFS Used and non-DFS Used.)
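The hint amounts to a small calculation: add DFS Used and non-DFS Used, then divide by the total capacity shown in the same section. The Python sketch below assumes the values are read off the Ambari hover text in GB and that 1 TB = 1024 GB (binary units); the example numbers are placeholders, not readings from the course cluster.

```python
GB_PER_TB = 1024  # assumption: binary units; use 1000 if the console reports decimal units


def hdfs_disk_usage(dfs_used_gb, non_dfs_used_gb, total_capacity_gb):
    """Return (used GB, used TB, percent of total capacity) for the HDFS hint."""
    used_gb = dfs_used_gb + non_dfs_used_gb
    used_tb = used_gb / GB_PER_TB
    percent = 100 * used_gb / total_capacity_gb
    return used_gb, used_tb, percent


# Placeholder readings, not values from any real cluster.
used_gb, used_tb, percent = hdfs_disk_usage(
    dfs_used_gb=12.5, non_dfs_used_gb=37.5, total_capacity_gb=500
)
print(f"{used_gb:.1f} GB ({used_tb:.3f} TB) used, {percent:.1f}% of capacity")
```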
Q2: How many DataNodes are in the cluster? (Hint: Look in the HDFS Links section.)
Q3: How long has the cluster been running since the last restart? (Hint: Look in the NameNode Uptime section.)
BIAM 530 Week 5 Step 4: Use Basic HDFS Commands
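The step's title refers to the `hdfs dfs` file-system shell. As a rough sketch only, the Python helper below assembles a few common `hdfs dfs` invocations (`-mkdir -p`, `-put`, `-ls`, `-tail`, `-du -h`) and prints them; it executes them through `subprocess` only when `dry_run=False` and an `hdfs` client is on the PATH. The local file name and HDFS directory are hypothetical, not the ones used in the course lab.

```python
import shlex
import subprocess

# Hypothetical paths for illustration; substitute the lab's actual locations.
LOCAL_FILE = "311_calls.csv"
HDFS_DIR = "/user/student/data"


def hdfs_commands(local_file, hdfs_dir):
    """Build a typical sequence of HDFS shell commands as argument lists."""
    remote_file = f"{hdfs_dir}/{local_file}"
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],      # create the target directory
        ["hdfs", "dfs", "-put", local_file, hdfs_dir],  # copy a local file into HDFS
        ["hdfs", "dfs", "-ls", hdfs_dir],               # list the directory contents
        ["hdfs", "dfs", "-tail", remote_file],          # show the last kilobyte of the file
        ["hdfs", "dfs", "-du", "-h", hdfs_dir],         # show space used, human-readable
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for args in commands:
        print(shlex.join(args))
        if not dry_run:
            subprocess.run(args, check=True)


run(hdfs_commands(LOCAL_FILE, HDFS_DIR))
```

Keeping `dry_run=True` as the default lets the command list be reviewed before anything touches the cluster.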