Executive Summary
This project uses demographic, socio-economic and crime rate data for the Greater London region, retrieved from the London Datastore. Three key analyses will be performed:
- Exploratory Data Analysis
- Clustering analysis
- Regression Modeling
Motivation
Because police resources are limited and crime can have serious adverse impacts, crime analytics dates back as far as the 1800s (Hunt, 2019). Crime occurrence has been found to follow spatial patterns, which suggests that predictive analytics should be possible. However, research into whether predictive policing leads to lower crime rates has produced mixed results (Meijer & Wessels, 2019). It is therefore more beneficial to use analytics to identify areas with a higher risk of crime and to discover the underlying factors behind that increased risk.
Traditionally, crime analysis has been done manually or with a spreadsheet program (RAND Corporation, 2013). This project gives users an easier way to perform the analysis through a web application.
Project Objectives
This project aims to deliver an interactive web application through which users can draw actionable insights from the three key analyses. Its objectives are:
- Understanding crime rate hot spots, with a visual map of Greater London
- Clustering areas using different techniques
- Forecasting possible hot spots using different regression models
- Providing data-driven insights to inform preventative measures, such as warnings and the allocation of police resources, and to influence ward planning policies
Datasets
Each borough is divided into wards, which are the primary units of English electoral geography for civil parishes and district councils. There are 32 boroughs in Greater London, excluding the City of London.
1. MPS Ward Level Crime (historic: Apr 2010 onward)
- Monthly data from Apr 2010 to Dec 2018
- Dimensions include: BoroughName, Wardcode, wardname, major category, minor category
2. MPS Ward Level Crime (most recent 24 months)
- Monthly data from Apr 2010 to Dec 2018
- Dimensions include: BoroughName, Wardcode, wardname, major category, minor category
3. Land Area & Population - Ward
- Yearly data from 2011 to 2050
- Dimensions include: BoroughName, Wardcode, wardname
- Attributes: Population, Hectares, Pop Per square km
4. Income of Taxpayers
- Data every 2 years from 1999 to 2018
- Dimensions: Code, Area
- Attributes: Mean of Tax, Median of Tax
5. Economic Activity Rate, Employment Rate and Unemployment Rate by Ethnic Group & Nationality, Borough
- Yearly data from 2005 to 2019
- Dimensions: Code, Area
- Attributes: Employment rate, Unemployment rate, Economic inactivity rate
6. Geographical Map of London (LOAC) SHP data
7. Local Authority District Names and Codes
- Dimension: BoroughName, Area Code
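The crime data above is recorded per ward, while the income and employment data are recorded per borough, so ward-level counts need to be aggregated before the datasets can be combined. The following is a minimal sketch of that step in Python, given only as an illustration (the project itself proposes R). The rows, ward codes and figures are hypothetical, and expressing the rate as crimes per 1,000 residents is an assumption, not a definition taken from the datasets.

```python
from collections import defaultdict

# Hypothetical ward-level rows: (borough, ward_code, crime_count, population).
# Codes and figures are made up for illustration only.
ward_rows = [
    ("Camden", "W001", 120, 10000),
    ("Camden", "W002", 80, 15000),
    ("Barnet", "W003", 50, 20000),
]


def borough_crime_rates(rows):
    """Sum ward counts per borough and return crimes per 1,000 residents."""
    crimes = defaultdict(int)
    population = defaultdict(int)
    for borough, _ward_code, count, pop in rows:
        crimes[borough] += count
        population[borough] += pop
    # Rate is computed from borough totals, not as an average of ward rates.
    return {b: 1000 * crimes[b] / population[b] for b in crimes}


rates = borough_crime_rates(ward_rows)
print(rates)
```

Computing the rate from summed counts and summed population avoids giving small wards the same weight as large ones, which averaging ward-level rates would do.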
Proposed Scope and Methodology
1. Data cleaning and preparation
- All datasets should have a consistent level of granularity (ward vs. borough) and cover the same time period
- Use the dplyr package for data manipulation
2. Choose the right R packages for visualisation:
- Based on our project objectives, we came up with storyboards and evaluated different versions of our interactive visualisation application. With the shortlisted storyboard in mind, we explored the R packages required to build the visualisation.
3. Data visualization and Analysis
Exploratory Data Analysis (EDA)
Exploratory Spatial Data Analysis (ESDA)
- Finding spatial hotspots, outliers and anomalies of wards with high crime rate
Time series of geospatial data
- Understanding how crime rates have changed over the years, broken down by wards
Clustering of Local Authority Districts (LAD): finding similar LADs
- Hierarchical Clustering (Hcluster)
- Hierarchical Clustering with Spatial Constraints (GeoCluster)
- Clustering of Spatio-Temporal Data (STC Model)
Regression: Forecasting of crime rate in each LAD
- Geographically weighted regression (GWR)
- Geographically And Temporally Weighted Regression (GTWR)
4. Building of Artifact - Web Application
- R Markdown development
- Functionality checks
The timeframe for this project is illustrated in the Gantt Chart below
Storyboard & Visualization Features
Board 1 - Exploratory Data Analysis
- To understand crime rate broken down by wards/borough
- To observe the distribution of crimes
- To observe the crime rate over time period
Board 2 - Clustering Analysis
- To create clusters based on the chosen parameters and clustering methods
Board 3 - Regression Analysis
- To forecast crime rate based on the chosen parameters and regression models
RStudio
R-Packages
Data Cleaning:
- dplyr - for data manipulation
EDA:
- sf - for encoding spatial vector data
- sp - for encoding spatial vector data
- ggplot2 - for data visualisation
- choroplethr - for creating choropleth maps
- tmap - for visualization of geospatial data
Clustering Analysis:
- fastcluster - for hierarchical clustering
- ClustGeo - for hierarchical clustering with geographical constraints
- SpaTimeClus - for clustering with spatio-temporal data
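The idea behind hierarchical clustering with geographical constraints can be illustrated by blending an attribute dissimilarity with a geographic one before clustering. The sketch below is a simplified Python illustration, not the algorithm used by ClustGeo: it uses average linkage, a single hypothetical attribute per area, made-up coordinates, and an assumed mixing weight `alpha` (0 uses attributes only, 1 uses geography only).

```python
import math


def mixed_dissimilarity(features, coords, alpha):
    """Blend normalised attribute and geographic distances with weight alpha."""
    n = len(features)
    d0 = [[abs(features[i] - features[j]) for j in range(n)] for i in range(n)]
    d1 = [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]
    # Scale each matrix to [0, 1] so the two are comparable before mixing.
    max0 = max(max(row) for row in d0) or 1.0
    max1 = max(max(row) for row in d1) or 1.0
    return [
        [(1 - alpha) * d0[i][j] / max0 + alpha * d1[i][j] / max1 for j in range(n)]
        for i in range(n)
    ]


def agglomerate(dist, k):
    """Average-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                link = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical crime rates and (x, y) centroids for four areas.
crime_rate = [1.0, 1.1, 5.0, 5.2]
centroids = [(0, 0), (10, 0), (0, 1), (10, 1)]

by_attribute = agglomerate(mixed_dissimilarity(crime_rate, centroids, 0.0), 2)
by_geography = agglomerate(mixed_dissimilarity(crime_rate, centroids, 1.0), 2)
print(by_attribute, by_geography)
```

With `alpha = 0` the areas group by similar crime rate, and with `alpha = 1` they group by proximity. Intermediate values trade one criterion against the other, which is the choice a user would explore in the clustering board.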
Regression Analysis:
- gstat - for interpolating data, spatial and spatio-temporal modelling, prediction, and simulation
- spgwr - for geographically weighted regression
- GWmodel - for geographically weighted regression
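Geographically weighted regression fits a separate weighted least squares model at each location, giving nearby observations more weight than distant ones. The sketch below shows that idea in Python for a single predictor. It is an illustration under stated assumptions, not the spgwr or GWmodel implementation: the Gaussian kernel, the fixed `bandwidth`, and all coordinates and values are hypothetical.

```python
import math


def gwr_local_fit(target, coords, x, y, bandwidth):
    """Weighted least squares fit of y = b0 + b1 * x, centred on `target`."""
    # Gaussian kernel: weight decays with distance from the target location.
    w = [math.exp(-0.5 * (math.dist(target, c) / bandwidth) ** 2) for c in coords]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


# Hypothetical data: three western and three eastern areas whose
# predictor-response relationship differs (slope 2 vs. slope -1).
coords = [(0, 0), (1, 0), (2, 0), (100, 0), (101, 0), (102, 0)]
predictor = [1, 2, 3, 1, 2, 3]
response = [2, 4, 6, 9, 8, 7]

west_b0, west_b1 = gwr_local_fit((1, 0), coords, predictor, response, bandwidth=5)
east_b0, east_b1 = gwr_local_fit((101, 0), coords, predictor, response, bandwidth=5)
print(west_b1, east_b1)
```

A single global regression over these six points would blur the two opposite slopes together, whereas the local fits recover a different coefficient at each location. The bandwidth controls how local each fit is and would need to be chosen for the real data.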
Web Application:
- shiny - for creating the web application
References
Data Sources