School dropout susceptibility mapping with fuzzy logic – a study in the District of Purulia, India

Mukunda Mishra & Soumendu Chatterjee

Multi-input prediction models are gradually finding their place in the social and economic sciences as tools to assess, locate and address the complicated socio-economic issues arising around the globe. These models treat a problem as the output of a complex interaction between a range of variables linked with the physical, socio-cultural, economic and ambient political systems. Dropout from the education system is a core concern of educational researchers, who are attempting to develop 'tools and techniques' for efficiently demarcating space according to a given degree of susceptibility to dropout and for examining the internal functions of the interacting variables associated with the process. In the present study, we apply fuzzy logic to map the spatial variation of the susceptibility to school dropout in the district of Purulia, a backward district of India in terms of the achieved level of human development. The training datasets for building the fuzzy model are based on the available secondary data from different reports published by the Government and on a range of primary data collected through a socio-economic survey. The model output is an index, namely the Index of Susceptibility of School Drop Out (ISDO), which reflects the levels of susceptibility to school dropout in different parts of the study area. The proposed model is expected to prove useful within the larger social and economic system.


Introduction
The measurement and mapping of the unequal status of development between different classes, communities and spatial units is one of the major research concerns of social scientists at the global level. A reliable scientific measure, in this perspective, would facilitate public policy discussion and enable rational decision-making at all levels on a firm basis (Pulselli et al., 2006; OECD, 2008). Sen (2001) makes a sensible statement about development: "development is about creating freedom for people and removing obstacles to attain greater freedom". Greater freedom enables people to choose their destiny, and education gives people the 'power' to choose their 'destiny' rationally. Economists consider education a kind of human capital, and established evidence indicates that a country's stock of human capital confers a positive growth rate on its economy (Barro, 1991; Mankiw et al., 1992). Investments in education yield significant positive returns (Bhaumik and Chakrabarty, 2009). Interestingly, ample examples show that this return is comparatively higher for people belonging to the more disadvantaged socio-economic classes (Krueger and Lindahl, 2001). Thus, development, when conceptualised as a process of sustainable human well-being, cannot be addressed properly without linking it with another parameter, 'education'.
Education occupies the most important strategic position in India's public development initiatives. The progressive development policies and the Five-Year national development plans have accorded a high priority to educational development (NUEPA, 2014). There has been a significant variation in enrolment and educational attainment across the Indian states since independence, as reflected by the National Sample Survey database (Filmer and Pritchett, 1998). Besides, India is no exception to the standard features of developing countries, where educational attainment is increasing but the rising average level of education is often accompanied by increased inequality in education (Pieters, 2009). The 'inequality of opportunity' in education between castes, communities and genders is linked with the low degree of social mobility (Asadullah and Yalonetzky, 2010), and it is all the more relevant to the present educational system in India given the prevalence of privatisation of education. Here, the widening gap in per capita expenditure on education between the 'wretched' and the 'affluent' economic classes invites a debate on the equity and quality of education.
School dropout is a social issue, and its roots are linked to the ambient environment, society, culture, ethnicity and politics. The complex interaction between several factors shapes the patterns of dropout differently over space. Drawing the 'contours' of dropout through a careful integration of all these factors will be helpful to: (i) understand the spatial variations in the level of an education-friendly socio-economic environment; (ii) assess the spatial differences in the response of the contributing or constraining factors to the choice of an individual towards the 'acceptance' or 'refusal' of undergoing an educational level; and (iii) obtain basic knowledge of the educational disparities in order to address the issue through future planning and policy formulation.
Multi-criteria prediction models are gradually making their place in the socio-economic sciences, as they can mathematise complex real-world variables within a virtual computational platform and provide an output by accepting multiple (in practice, as many as possible) inputs from the users. This study aims to map the susceptibility to school dropout using the basic algorithms of fuzzy set theory, popularly known as fuzzy logic.

Study area
The district of Purulia (Figure 1), located in the extreme west of the State of West Bengal in India, was selected as the study area. The district lies between 22.70295° N and 23.71335° N latitude and between 85.82007° E and 86.87508° E longitude. It covers a total area of 6259 km² and accommodates 2,930,115 inhabitants, with an average population density of 468 persons per km² (Census of India, 2011). The district has achieved only a marginal level of development in health, education and income, the three basic dimensions of Human Development (West Bengal Human Development Report 2009). Constraints such as infertile soils, extreme climate and the lack of irrigation opportunities prevent the district from achieving an agricultural yield beyond the subsistence level. Regarding the level of educational attainment, the rural and urban literacy rates in the district are 62.73 and 76.18 per cent, respectively. Moreover, the literacy rate amongst urban females is 67.15 per cent, while the female literacy rate in the rural areas has yet to reach the 50 per cent 'benchmark' (District Statistical Handbook 2013).

Sampling design for the primary data collection
The present study used primary data collected through household surveys conducted using a pre-printed survey schedule. The district of Purulia consists of 20 Community Development Blocks (C.D. Blocks), with a total of 170 Gram Panchayats (GPs) within the administrative jurisdiction of these blocks. There are also three urban municipalities in the district. The survey was designed to estimate simple proportions, without any cross-classifications, in a large population. The samples were collected randomly from each C.D. Block, provided that the sample covered at least one census village in each of the 170 GPs and one municipal ward in each of the three urban municipalities of the district, ensuring the representation of the entire study area. The sufficiency of the sample size collected from each unit was validated using the sample size formula of the Australian Bureau of Statistics (2016). The formula relates the sample size for set x of the population to the Z value at the chosen significance level, the population within set x, the expected proportion of the population having the attributes being estimated from the survey, and the confidence interval (c). For the present study, the expected proportion was assumed to be unknown and was set to 0.5 (i.e. 50%), as this produces a conservative estimate of variance. The value of the confidence interval (c) was set to 0.05. The coordinates of all the surveyed villages (referred to as 'sites' in the remainder of the paper) were recorded with a GPS handset so that the data could be plotted on a GIS software platform (Figure 2).
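The sufficiency check described above can be illustrated with a minimal sketch. It assumes the widely used sample size formula for a proportion with a finite population correction; the source formula itself is not reproduced here, so the exact form, the function name `required_sample_size` and the default Z value of 1.96 (a 95% level) are assumptions, while p = 0.5 and c = 0.05 follow the text.

```python
import math


def required_sample_size(population, z=1.96, p=0.5, c=0.05):
    """Sample size needed to estimate a proportion in a finite population.

    Assumed form (standard finite population correction):
        n0 = z^2 * p * (1 - p) / c^2
        n  = n0 / (1 + (n0 - 1) / population)
    z = 1.96 is an assumption; p = 0.5 and c = 0.05 follow the text.
    """
    n0 = (z ** 2) * p * (1 - p) / (c ** 2)   # size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)     # shrink for a finite population
    return math.ceil(n)                      # round up to whole respondents


if __name__ == "__main__":
    # Hypothetical unit populations, for illustration only.
    for population in (500, 100_000):
        print(population, required_sample_size(population))
```

Setting p to 0.5 maximises p(1 - p), which is why it gives the most conservative (largest) sample size when the true proportion is unknown.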

Secondary Sources of Data
A wide range of secondary data from different reliable sources was also employed in the present study. These data were collected mostly from the Primary Census Abstract, the Directory of Village Amenities and other enumeration reports published by the Census of India. The detailed variable-wise data sources are listed in the corresponding table later in this paper.

Software
The statistical calculations and the modelling algorithms were carried out using MS Excel v2010, SPSS v17.0 and MATLAB v7.12. The mapping was done using the open-source GIS software QGIS v2.8.

From classical set theory to fuzzy set approach
Fuzzy set theory was originally proposed by Zadeh (1965). The application of fuzzy logic has gained wide popularity in different areas of the spatial sciences for the construction of multi-criteria prediction models (e.g. Wang, Hall and Subakyoo, 1990; Burrough, MacMillan and Deursen, 1992; Smith, 1992; Bogardi, Bardossy and Duckstein, 1996; Mays, Bogardi and Bardossy, 1997; Hartkamp, White and Hoogenboom, 1999; Kurtener and Badenko, 2002, and many more). The strength of fuzzy logic as a tool for social researchers is its capability to convert primary field statement classes such as 'mostly favourable' and 'rarely favourable' into statistical classes like '0' and '1'. Classical set theory treats an observation (x) as either belonging to the set A or not. The corresponding membership function takes only two values, i.e. '0' (when x does not belong to A) and '1' (otherwise). A fuzzy set, however, uses the concept of a 'membership function', which is the statistical representation of the degree to which a particular observation belongs to classes with defined boundaries: the 'maximum degree of belonging' to a class is represented as '1' and the 'minimum degree of belonging' as '0', but the degree of membership can also take any value between '0' and '1' for observations with intermediate values. Following Ranst, a fuzzy set is thus defined by a membership function that maps each observation to the unit interval [0, 1] (Tang et al., 1996).

Input variables
Factors that cause dropout and have some degree of spatial linkage are considered as the input variables. The incidence of dropout is a multi-dimensional phenomenon and, further, not all the causes are equally relevant in different places. Illiteracy, poverty, inadequate earnings and the consequent poor standard of living have been emphasised as important factors of dropout in the relevant literature (Desai, 1991; Rao, 2000; Tilak, 2002; Chaudhury, 2006). The expenditure of a family on educating its children has a profound impact on attainment as well as dropout, while India lags behind in terms of public expenditure on higher education (UNESCO, 2007).
The religious and ethnic characteristics of the society also influence the dropout scenario of a region (Bhat and Zavier, 2005). Besides these, socio-economic concerns and the unavailability of educational institutions within a locality are often identified as crucial factors in school dropout (Sharriff, 1995; Sengupta and Guha, 2002; Barooah, 2003; Indian Institute of Education, 2004). Rural settings, connectivity issues, distance from urban educational services, isolated geographical location and rural-urban migration have also been examined as causes of educational dropout in India (Govindaraju and Venkatasen, 2010; Chung, 2011; Roy, Singh and Roy, 2015). Considering the factors previously examined in studies in different parts of India and contrasting them with the scenario of the district of Purulia, the present study finalised eight factors having a high degree of influence on the incidence of school dropout. One suitable indicator for each factor (i.e. eight variables) was used as the input variable for the fuzzy models (see Table 1 for the detailed structure of the variables and the corresponding data sources).
Table 1 (excerpt): input variables, listed as factor; code; indicator; unit
Access to urban educational goods and services; URB; weighted index of proximity to the nearest urban centres (see Table 4); decimal
Degree of connectivity (vis-à-vis isolation); CON; weighted index of the status of accessibility through roadways (see Table 5); decimal
Four of the variables, i.e. CII, MDS, URB and CON, are composite indices designed for the study. These composite indices reasonably reduced the necessity of incorporating a large number of indicators by merging many of them within a single composite index. In a district-level analysis, no single factor is likely to determine the level of susceptibility to dropout in a particular location completely; rather, the unequal spatial pattern of dropout susceptibility may be expressed as the resultant of the combined effect of all the variables interacting with each other, where the significance of each variable does not remain the same across regions. The present model examines how precisely the entire set of inputs can predict the outcome variable (which indicates the susceptibility to dropout).

Fuzzy classes for input variables
Each of the eight input variables (as mentioned in Table 1) is clustered into four classes using Jenks' natural breaks optimisation technique, a popular data clustering method designed to determine the best arrangement of the data into a targeted number of classes by seeking the minimum variance 'within' the classes and the maximum variance 'between' the classes (Jenks, 1967). The classes are labelled as: Very high (HH), High (H), Low (L) and Very low (LL). The detailed classification is given in Table 6.
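The classification objective described above (minimum variance within classes) can be illustrated with a small sketch. It is not the optimised algorithm used by GIS packages: it simply tries every possible placement of the class boundaries and keeps the one with the smallest total within-class sum of squared deviations, which is practical only for short lists. The function names and the sample values are hypothetical.

```python
from itertools import combinations


def sum_sq_dev(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def natural_breaks(values, n_classes=4):
    """Split the sorted values into n_classes contiguous groups with the
    smallest total within-class sum of squared deviations (exhaustive search).
    Requires at least n_classes values."""
    data = sorted(values)
    n = len(data)
    best_cost, best_groups = None, None
    # Choose the positions at which the sorted list is cut.
    for cuts in combinations(range(1, n), n_classes - 1):
        bounds = (0,) + cuts + (n,)
        groups = [data[bounds[i]:bounds[i + 1]] for i in range(n_classes)]
        cost = sum(sum_sq_dev(g) for g in groups)
        if best_cost is None or cost < best_cost:
            best_cost, best_groups = cost, groups
    return best_groups


if __name__ == "__main__":
    # Hypothetical indicator values for ten sites.
    sample = [1, 2, 3, 20, 21, 50, 52, 90, 95, 99]
    for label, group in zip(["LL", "L", "H", "HH"], natural_breaks(sample)):
        print(label, group)
```

Here the ascending groups are labelled LL to HH purely for illustration; whether a high value of an indicator means high or low susceptibility depends on the variable.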

Fuzzy classes for the output variable
The execution of the fuzzy logic generates the output as the 'Index of Susceptibility of Dropout' (ISDO), which is further disaggregated into four classes, namely 'Highly susceptible', 'Moderately susceptible', 'Marginally susceptible' and 'Rarely susceptible' (Table 7). The output range of the fuzzy model could be set freely, because the output is standardised before it is used for mapping; it was set to the range 0 to 10.
The output was checked against the Mean Years of Schooling of the population aged 25-65 years (MYS_25-65Y), computed as the sum, over all levels of education 'l', of the proportion of the population (belonging to the age group 25-65 years) attaining up to level 'l' multiplied by the official duration of level 'l' of attainment. A larger value of MYS_25-65Y signifies a lower incidence of dropout in a particular spatial unit, and vice versa. This variable was selected to assure the reliability of the output generated from the fuzzy model.
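The validating variable can be illustrated with a short sketch that computes mean years of schooling as the sum of (proportion attaining a level) times (duration associated with that level). The level names, proportions and durations below are hypothetical illustration values, not survey results, and the function name `mean_years_of_schooling` is an assumption.

```python
def mean_years_of_schooling(shares, durations):
    """Sum over education levels of (population share attaining the level)
    multiplied by (years of schooling associated with that level).

    shares    -- dict: level -> proportion of the 25-65 population (sums to 1)
    durations -- dict: level -> years of schooling for that level
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(shares[level] * durations[level] for level in shares)


if __name__ == "__main__":
    # Hypothetical values for one spatial unit.
    shares = {"none": 0.4, "primary": 0.3, "secondary": 0.2, "higher": 0.1}
    durations = {"none": 0, "primary": 4, "secondary": 10, "higher": 15}
    print(mean_years_of_schooling(shares, durations))
```

A unit where more of the adult population stayed in school longer obtains a larger value, which is why the index is expected to move in the opposite direction to dropout susceptibility.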

Assigning fuzzy membership functions
Membership functions express the degree to which the values of a variable that is likely to influence (directly or reciprocally) the incidence of dropout fall in a certain susceptibility class. The present study used the 'gauss' and 'gauss2' membership functions in the MATLAB Fuzzy Membership Function Editor for the input variables, and 'triangular' membership functions for the output variable (Figure 3).
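The three curve shapes named above can be sketched in a few lines. The definitions below follow the usual textbook forms of a Gaussian, a two-sided Gaussian and a triangular membership function; they are written independently of MATLAB, and the parameter values used in the example are hypothetical rather than those of Figure 3.

```python
import math


def gauss(x, sigma, c):
    """Gaussian membership: 1 at the centre c, falling off with width sigma."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))


def gauss2(x, sigma1, c1, sigma2, c2):
    """Two-sided Gaussian: a left Gaussian shoulder below c1, a right one
    above c2, and full membership (1.0) on the plateau between them."""
    left = gauss(x, sigma1, c1) if x < c1 else 1.0
    right = gauss(x, sigma2, c2) if x > c2 else 1.0
    return left * right


def triangular(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


if __name__ == "__main__":
    # Hypothetical class on a 0-10 scale.
    for x in (2.0, 5.0, 6.0, 8.0):
        print(x, round(gauss(x, 2.0, 5.0), 3),
              round(gauss2(x, 1.0, 5.0, 1.0, 7.0), 3),
              round(triangular(x, 0.0, 5.0, 10.0), 3))
```

The two-sided form is convenient for classes that should have full membership over a range of values rather than at a single point.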

Developing rules to link input variables with output
The quality of the output from the fuzzy model depends on how efficiently the real-world scenario is projected into the model by establishing links between the inputs and outputs with logical statements. The linking rules used in the current study are summarised in Figure 4.
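How IF-THEN rules link inputs to an output can be illustrated with a minimal Mamdani-style sketch (minimum for AND, maximum for aggregation, centroid defuzzification). Everything specific in it is an assumption made for illustration: the two inputs scaled to 0-1, the four rules and the output classes are hypothetical and do not reproduce the rule base of Figure 4 or the MATLAB implementation.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def low(x):
    """Hypothetical 'low' class for an input scaled to 0-1."""
    return max(0.0, min(1.0, 1.0 - x))


def high(x):
    """Hypothetical 'high' class for an input scaled to 0-1."""
    return max(0.0, min(1.0, x))


# Hypothetical output classes on the 0-10 susceptibility scale.
OUTPUT_SETS = {"low": (-5.0, 0.0, 5.0),
               "medium": (0.0, 5.0, 10.0),
               "high": (5.0, 10.0, 15.0)}


def infer(literacy, insecurity, steps=1000):
    """Return a crisp 0-10 susceptibility score from two 0-1 inputs."""
    # Rule strengths: AND is taken as the minimum of the memberships.
    strengths = {
        # IF literacy is low AND insecurity is high THEN susceptibility is high
        "high": min(low(literacy), high(insecurity)),
        # IF literacy is high AND insecurity is low THEN susceptibility is low
        "low": min(high(literacy), low(insecurity)),
        # IF both are high OR both are low THEN susceptibility is medium
        "medium": max(min(high(literacy), high(insecurity)),
                      min(low(literacy), low(insecurity))),
    }
    numerator = denominator = 0.0
    for i in range(steps + 1):
        y = 10.0 * i / steps
        # Clip each output set at its rule strength, then aggregate by maximum.
        mu = max(min(strengths[name], tri(y, *OUTPUT_SETS[name]))
                 for name in OUTPUT_SETS)
        numerator += y * mu
        denominator += mu
    if denominator == 0.0:
        raise ValueError("no rule fired for these inputs")
    return numerator / denominator  # centroid of the aggregated output


if __name__ == "__main__":
    print(infer(0.2, 0.9))   # low literacy, high income insecurity
    print(infer(0.9, 0.1))   # high literacy, low income insecurity
```

With more inputs the rule base grows quickly, which is one reason the rules need to be grounded in field knowledge rather than enumerated mechanically.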

Assigning inter-variable weights
The fuzzy model was initially executed without assigning any weight to the input variables. In this case, the output showed a very weak level of significance when tested against the validating variable (i.e. MYS_25-65Y). This is a clear indication that the input variables do not contribute equally in determining the levels of susceptibility to dropout, so a weight factor reflecting the relative effect of each input variable on the output had to be included in the fuzzy model. The rules regarding the relative effect were based on experience and on a summary of the respondents' statements collected during the field survey. The weight-assigning rules were as follows: (i) ILR, CII and EDE are 'two times stronger' than STP, MRG and MDS; (ii) ILR, CII and EDE are 'three times stronger' than URB and CON; (iii) STP, MRG and MDS are 'two times stronger' than URB and CON. These rules were converted into a pairwise comparison matrix by assigning numerical values according to subjective relevance in order to determine the relative importance of the variables (Table 9). From this matrix, the weight for each variable (i.e. the priority to be given to each factor) was determined using the technique of the Analytical Hierarchy Process (AHP) proposed by Saaty (1980; 2000). The Consistency Ratio (CR) for the matrix was less than 0.1, indicating a reasonable level of consistency (Malczewski, 1999). The calculated weights were standardised using the Mean and Standard Deviation for use in the fuzzy model. The effect of assigning weights to the different variables can be seen from the model comparison surfaces, some samples of which are shown in Figure 5.
To validate the model, the values of the two variables (ISDO and MYS_25-65Y) were plotted against each other, one along the Y-axis and the other along the X-axis. A Gauss2 curve was fitted to the plotted points (see Fig. 6) with an R² value of 0.6488.
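The weight derivation described in this section (pairwise comparison rules, AHP priorities and the Consistency Ratio) can be sketched as follows. The matrix is built only from the three stated rules; treating variables within the same group as equally important (ratio 1) is an assumption, so the matrix may differ from Table 9. The priorities are approximated by power iteration, and the random index of 1.41 is Saaty's tabulated value for an 8 × 8 matrix.

```python
NAMES = ["ILR", "CII", "EDE", "STP", "MRG", "MDS", "URB", "CON"]
GROUP = {"ILR": 0, "CII": 0, "EDE": 0,
         "STP": 1, "MRG": 1, "MDS": 1,
         "URB": 2, "CON": 2}
# Stated rules: group 0 is 2x group 1 and 3x group 2; group 1 is 2x group 2.
RATIO = {(0, 1): 2.0, (0, 2): 3.0, (1, 2): 2.0}


def comparison_matrix(names):
    """Reciprocal pairwise comparison matrix implied by the three rules."""
    def entry(a, b):
        ga, gb = GROUP[a], GROUP[b]
        if ga == gb:
            return 1.0                      # assumption: equal within a group
        if ga < gb:
            return RATIO[(ga, gb)]
        return 1.0 / RATIO[(gb, ga)]
    return [[entry(a, b) for b in names] for a in names]


def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector (priorities) and lambda_max."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]          # renormalise so weights sum to 1
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max


def consistency_ratio(lambda_max, n, random_index=1.41):
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    return ((lambda_max - n) / (n - 1)) / random_index


if __name__ == "__main__":
    m = comparison_matrix(NAMES)
    weights, lam = ahp_weights(m)
    for name, weight in zip(NAMES, weights):
        print(name, round(weight, 4))
    print("CR =", round(consistency_ratio(lam, len(NAMES)), 4))
```

The rules are not perfectly consistent (two times two is not three), so the Consistency Ratio is slightly above zero while remaining well under the 0.1 threshold mentioned in the text.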
The test of signicance of the relationship between the two variables gave the result of Pearson's Correlation Coefcient (r) as -0.775 and the relationship was found 'signicant at 0.01 level' (2 tailed). Therefore, it can be said that the model output showed a signicant level of reliability in assessing the degree of susceptibility of dropout of a particular spatial unit by taking the specied ranges of socio-economic variables as input.

Mapping, discussion and conclusion
The spatial mapping of the different levels of dropout susceptibility was the prime objective of the present work. The output of the fuzzy model consisted of point-wise values of ISDO, which were then standardised with reference to the Mean and Standard Deviation of the distribution. A total of 173 such points over the 6259 km² area of the district (i.e. on average 36 km² per point, or approximately one survey point for each 6 km × 6 km grid cell) were dense enough to express the spatial differentiation of dropout susceptibility fairly. All the points, with their respective standardised values, were used in QGIS 2.8 to generate the map of the spatial variation of favourability of educational attainment (Figure 7). The map was assigned the WGS 84 (EPSG:4326) Coordinate Reference System.
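The two calculations mentioned above, standardisation by mean and standard deviation and the average area represented by each survey point, can be checked with a short sketch. The use of the population standard deviation and the sample index values are assumptions; the area and point count are taken from the text.

```python
import math


def standardise(values):
    """Return z-scores: (value - mean) / standard deviation.
    The population standard deviation is used here as an assumption."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def area_per_point(total_area_km2, n_points):
    """Average area (km²) represented by one survey point."""
    return total_area_km2 / n_points


if __name__ == "__main__":
    # Hypothetical ISDO values at five sites.
    print([round(z, 3) for z in standardise([2.5, 3.9, 5.2, 6.4, 8.1])])
    area = area_per_point(6259, 173)          # figures from the text
    print(round(area, 1), round(math.sqrt(area), 1))
```

Standardised values are centred on zero, so positive and negative values on the map correspond to sites above and below the district average.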
A careful observation of Figure 8 reveals the spatial extent of the favourable socio-economic environment for greater attainment, as well as the zones with a low degree of favourability, which are highly susceptible to school dropout and are represented by very low values of the mapped index. However, identifying the specific regions with a tendency to favour or disfavour attainment required the preparation of a map with a more specific demarcation of the dropout zones. For this purpose, the whole district can be broadly categorised into two parts: (i) the areas with a level of susceptibility 'on and above the average' and (ii) the areas with a 'below-average' level of susceptibility.
The regions demarcated in this way show that most of the southern, south-western and western blocks of the district (except the whole of Jhalda-I block) can be characterised as areas with a higher susceptibility to dropout. There are also some isolated pockets in the eastern and middle blocks of the district with a similarly high susceptibility (Figure 8). All of the blocks of Baghmundi, Balarampur, Bandowan, Barbazar, Arsha, Jhalda-II and Jaypur are demarcated as highly susceptible. These blocks with higher dropout susceptibility also share some common geographic characteristics: firstly, they are located at the western edge of the district, which is also an inter-state boundary with the neighbouring state of Jharkhand; secondly, they have a higher share of Scheduled Tribe (ST) population; thirdly, they have a greater share of forest-covered area in the total geographical area; and lastly, they exhibit a comparatively lower rate of female literacy than the other blocks in the district.
The primary data on income insecurity indicated that the blocks that are highly susceptible to dropout also show a higher degree of insecurity in income-generating processes (evidenced by higher values of CII); the lack of a secure income trims the expenditure on education and discourages the long-term attainment process. Besides, the lack of essential educational amenities and services, which are strictly associated with urban spaces, causes the peripheral areas of the district to experience a higher susceptibility to school dropout than the areas surrounding the district headquarters and other urban municipal areas (marked in Figure 8). The above analysis brings to the fore a very interesting fact: the degree of susceptibility to school dropout of a given area is linked with the relative position of that area in the settlement hierarchy. Not only the urban areas but also the block headquarters and larger villages exhibit a lower susceptibility to dropout than the smaller villages at the lower orders of the hierarchy. The larger settlements, with a longer tradition of attainment, diverse occupational opportunities, better educational infrastructure, conscious communities and high-quality human resources associated with the education system, offer a situation that favours longer attainment and reduces the susceptibility to school dropout.
The factors causing the spatial differences in dropout are multidimensional in nature and, admittedly, not all socio-economic phenomena can be explained easily; thus the task of elaborating all the variables becomes extremely challenging. This study considered eight basic variables for the susceptibility mapping; however, further refinement of the data structure and the use of more relevant variables may add precision to the demarcation of advanced or vulnerable zones as well as provide meaningful insight towards addressing the causes of such a distribution. Besides this, finding relevant environmental and socio-political variables with a finer resolution and the 'mathematisation' of human behaviour and cognition are very challenging issues for socio-economic scientists, planners and researchers. Thus, bringing the accuracy of the output of such prediction models up to a desired 'benchmark' within the domain of human geography remains a tedious job.