A Web-Based Prototype Course Recommender System using Apache Mahout

Project Report, 2017
88 pages, Grade: BSc Honours in Computer Science


Abstract

Declaration

Acknowledgements

List of Figures

List of Tables

1 Introduction
1.1 Introduction
1.2 Definition of Problem
1.3 Importance of Problem
1.4 Overview of the Research
1.5 Structure of the Document
1.6 Conclusion

2 Background and Related Work
2.1 Introduction
2.2 Overview of Recommender Systems
2.2.1 Content-based Systems
2.2.2 Collaborative Systems
2.2.3 Hybrid Systems
2.3 Applications of Recommender Systems
2.3.1 Applications in Video Streaming
2.3.2 Applications in Music Selection
2.3.3 Applications in Public Transport
2.4 Algorithms in Collaborative Systems
2.4.1 Practical Use of Collaborative Systems
2.4.2 Association Rule Mining
2.4.3 User-Based Collaborative Algorithms
2.4.4 Item-Based Collaborative Algorithms
2.5 Related Work
2.6 A Platform for Recommender Creation: Apache Mahout
2.7 Conclusion

3 Research Method
3.1 Introduction
3.2 Research Questions
3.3 End-User System Architecture
3.4 Research Methodology
3.4.1 The CRISP-DM Methodology
3.4.2 Using SAS for Data Processing
3.4.3 Finding Correlations in the Data
3.4.4 Installation of Apache Mahout
3.4.5 Storage of Data in a Database
3.4.6 Using the Functionality of Apache Mahout
3.4.7 Developing the Web-Based Components of the Application
3.5 Evaluating the System
3.5.1 Evaluating Coverage
3.5.2 Evaluating Accuracy
3.5.3 Evaluating Volume of Data
3.6 Conclusion

4 Results
4.1 Introduction
4.2 Overall Dataset Statistics
4.3 Relationships Between Courses and Performance
4.3.1 Relationship Between Computer Science I and Algebra I
4.3.2 Relationship Between Computer Science I and Calculus I
4.3.3 Relationships Between Elective Courses and Performance
4.4 Courses with Low Grades
4.5 Strongest Determinant of Performance
4.6 Combinations of Courses to be Recommended
4.7 Coverage of Generated Recommendations
4.8 Accuracy of Generated Recommendations
4.9 Volume of Data and its Sufficiency
4.10 Web-Based Course Enrolment and Recommender System
4.10.1 Web Browser Specifications
4.10.2 Application Design and Use
4.11 Conclusion

5 Discussion
5.1 Introduction
5.2 The Significance of the Results
5.2.1 Courses and Performance
5.2.2 Reliability and Accuracy of Recommendations
5.2.3 The Web-Based Recommender Application
5.3 Implications of the Results
5.4 Limitations of the Research
5.5 Conclusion

6 Conclusion and Future Work

A Descriptive Statistics

Abstract

Most universities offer a wide range of courses in which students can enrol. As a result, students may feel overwhelmed by the many possibilities and the large amount of information, and may find it difficult to decide which courses to register for. There is therefore a need for a system that can assist students in this crucial process. Collaborative recommenders are known to be useful for finding similarities between students and the course combinations they have taken. We thus set out to develop a web-based recommender application that could generate a list of valuable, accurate course recommendations, taking into account a student’s likelihood of succeeding academically. We used a historical dataset containing data about past Computer Science I students at the University of the Witwatersrand, including the combinations of courses they took and their respective marks. We calculated the relationships between courses and performance to find which courses students did well or badly in. The dataset was also used with Apache Mahout, a free, open-source library that includes recommender algorithms, in order to generate course recommendations. This was done by using the Spearman correlation to determine similarities between students, so that new students could be recommended courses in which similar past students had performed well. The web components of the system were developed using JavaServer Pages (JSP) and Java Servlets. We evaluated the recommender system in terms of coverage and accuracy. We found strong correlations between individual course marks and overall year marks, indicating that recommending courses in which past students did well was likely to increase a student’s chances of doing well overall. The implemented system achieved a coverage of 100%, meaning that recommendations could be generated for every student in the test dataset.
The accuracy of the system, measured by the F1 metric, was 0.66 (reaching as high as 0.72 at smaller user neighbourhood sizes), meaning that the recommendations generated by the system were accurate in the majority of cases. This allowed us to conclude that the dataset used to train the system was sufficiently large. Finally, the web application that was developed was intuitive and easy to use, and successfully incorporated the elements of the recommender system to convey recommendations to students. We therefore conclude that a collaborative recommender approach, developed in a web-based environment using Apache Mahout, is suitable for suggesting relevant courses to students, striking a balance between the students’ own interests and the crucial field-related material needed for academic success. Such a system would be an asset to a university, increasing its students’ chances of passing and, as a result, enhancing its own reputation.

Declaration

I, Mike Nkongolo (Student Number: 1171635), am a student registered for BSc Honours in Computer Science in the year 2016. I hereby declare the following:

I am aware that plagiarism (the use of someone else’s work without their permission and/or without acknowledging the original source) is wrong.

I confirm that ALL the work submitted for assessment for the above course is my own unaided work except where I have explicitly indicated otherwise.

I have followed the required conventions in referencing the thoughts and ideas of others.

I understand that the University of the Witwatersrand may take disciplinary action against me if there is a belief that this is not my own unaided work or that I have failed to acknowledge the source of the ideas or words in my writing.

Signature: Date:

Acknowledgements

I would like to thank my supervisor, Mr. Michael Mchunu, for his helpful assistance and guidance with my research topic.

List of Figures

2.1 Types of recommender systems

2.2 Example of Item-Based Collaborative Recommendation

3.1 System design as seen by the end-user

3.2 Proposed structure of system

4.1 Frequency Distribution of Marks of All Students

4.2 Scatter Plot of COMS1000 Marks vs MATH1034 Marks

4.3 Scatter Plot of COMS1000 Marks vs MATH1036 Marks

4.4 Scatter Plot of Elective Course Marks vs Weighted Average Mark

4.5 The log in screen of the web application

4.6 Compulsory course page

4.7 Recommended courses page

4.8 Page displaying all possible courses

4.9 Confirmation of courses

A.1 Descriptive Statistics for the Student Dataset

A.2 Descriptive Statistics for the Student Dataset (Part Two)

List of Tables

2.1 A typical set of users and their ratings for items

3.1 Dataset Characteristics

3.2 Transformed Dataset Characteristics

4.1 Overall course results from all years

4.2 Compulsory Course Results

4.3 Average Marks Across All Elective Courses

4.4 Elective course results from all years

4.5 A typical set of recommendations produced by the course recommender system

4.6 Coverage Values for Different Correlation Methods and Neighbourhood Numbers

4.7 Precision and Recall of Generated Recommendations for Different Correlation Techniques and Neighbourhood Sizes

4.8 F1 Values of Generated Recommendations for Different Correlation Techniques and Neighbourhood Sizes

4.9 Latest Versions of Recommended Browsers for the Course Enrolment System

Chapter 1 Introduction

1.1 Introduction

In modern-day society, countless opportunities are open to young adults leaving secondary education. For many of them, going straight to work after finishing high school is the most attractive option. They may need to support themselves and their families financially, or they may simply be less academically inclined and therefore not keen on studying further. For other high school leavers, however, pursuing a higher level of education is seen as a safer option and a worthwhile investment in the future. People with a degree and more specialised subject knowledge generally have access to a wider choice of jobs and career paths and, in many instances, higher potential salaries.

Higher levels of study are commonly associated with universities and other tertiary institutions. Universities typically offer academic programmes encompassing a range of disciplines, leading to the conferment of a degree, diploma or certificate at the end of the programme. Popular fields of study include the arts, sciences, humanities, engineering and commerce. To qualify in any of these fields, students must work consistently throughout the programme and demonstrate a thorough understanding of their course content in order to be recognised as worthy of these prestigious qualifications.

In addition, the combination of courses that a student takes must be consistent with their chosen academic programme. The selected courses must offer content at a high enough level of difficulty for the resulting qualification to be meaningful and worth the time invested in acquiring it. For a student doing a degree in computer science, for example, it would make little sense for most of their courses to be related to biology or engineering. Courses should also follow on from each other from year to year, enabling students to use the knowledge gained in a previous year to appreciate and solve more challenging problems in later years. University faculties therefore face the considerable task of creating degree structures that not only make the degree competitive nationally and globally, but also ensure that it can be completed within a reasonable amount of time, so that students can enter the workplace should they wish to.

Selecting which courses to take in an academic programme cannot be left solely to decision makers at a university. While it is accepted that some courses are absolutely necessary in a particular field in order to gain crucial knowledge and skills, students must also be allowed some freedom in selecting the courses they would like to take. Students may have their own personal tastes and interests, and may favour certain optional courses over others. Unfortunately, they might not fully understand what each course entails in terms of the breadth and depth of its content. They would thus not be able to make an informed decision that maximises both their enjoyment of their studies and their chances of performing well academically.

This introductory chapter provides a greater understanding of this issue. In Section 1.2, we describe the course selection problem that students face in more detail. Section 1.3 then justifies the importance of this problem and explains why research on the topic is necessary. Next, Section 1.4 outlines the methods used in this research. Finally, the overall structure of this document is described in Section 1.5, and the chapter is concluded in Section 1.6.

1.2 Definition of Problem

As mentioned in Section 1.1, choosing an effective and stimulating set of courses is not an easy task for a student, and several factors are at play. One of these factors may be the perceived difficulty of a course that a student is considering taking [O’Mahony and Smyth 2007]. Of course, if the course is compulsory, the student has no choice but to enrol in it. However, where there are many different subjects to choose from, the student may shy away from optional courses that might pose a significant challenge, whether because of the workload or because the course content is hard to understand fully. Such courses could have a negative effect on the student’s academic performance. However, some students may be looking to be challenged, and for them, choosing more difficult courses would be an exciting prospect.

Another important consideration is to be aware of and take into account the student’s own personal interests [O’Mahony and Smyth 2007]. University study does not necessarily have to be restricted to only courses within the same subject area. Since a student will already have taken a number of courses that are compulsory, he or she may decide that they would like to take a course or a number of courses from a different field in order to broaden their skills and make their knowledge more well-rounded. However, this may come at the cost of students not learning some topics that may be useful for their understanding in later courses in their specific field. Thus, it is clear that there is a trade-off between interests and required subject knowledge that a student has to make when deciding what subjects to enrol for.

In addition, there are external factors to consider that may not be directly related to a student’s personal preferences. For example, a very wide selection of possible courses may impose a burden on students, who may make poor decisions as a result of having too many courses to choose from [O’Mahony and Smyth 2007]. Students may miss out on crucial courses relevant to their academic interests, and may thus not be able to take the courses they wish to take. Also, the relative importance of certain courses may not be fully understood, perhaps because of insufficient course descriptions. Where students want to take courses closely related to their particular career path, certain courses in that field may be more relevant than others in ways that are not immediately apparent, again resulting in poor decisions. Poor decisions of this kind increase the chances of students failing. Hence, a more effective way of selecting courses is needed, both for student satisfaction with what they learn and to improve their likelihood of good academic performance, in terms of marks and grades.

Undoubtedly, the problem discussed is extremely important and relevant to any particular university that accepts new students. A more detailed discussion of the importance of this problem is provided in Section 1.3.

1.3 Importance of Problem

Following the discussion of the problem in Section 1.2, the issue discussed can be summarised in two ways: the happiness of students with their courses, and the likelihood of them being successful in their studies. Clearly, these two factors have implications not only for students, but for the university they attend as well. A student who is unhappy with the material being taught in a course may choose to de-register from it. This may result in the student having to wait until the next semester or year in order to register for another course, which results in more time being wasted. From a university’s perspective, any self-funding students who choose to leave the university would result in a loss in fees that would have been paid, not only in that particular year but in future years as well. Furthermore, negative feedback from dissatisfied students may cause potential students to decide not to enrol in a course or at a particular university. This sentiment is reflected in [Mehboob et al. 2012], where it was shown that friends’ opinions have a significant impact on a student or potential student’s course and university selection. This negative publicity could have wide-reaching consequences.

In terms of academic performance, effective course selection is of utmost importance in ensuring a student is able to succeed in her or his studies and obtain their qualification(s). Students that achieve good grades in all their years of study are likely to find work and proceed to have a successful career using the knowledge they have gained from their studies. This is also important for a university, since it is able to advertise the success stories of its students and continue to attract future students. On the other hand, selecting irrelevant courses or courses that do not combine well together can cause students to fail or drop out of university, especially if they cannot afford to repeat due to financial problems. A large number of failures or dropouts reflects badly on a university, causing it to lose its reputation and its ability to stand out from other institutions. Hence, any improvement in the way students select courses is likely to have a profound effect on their success rate and the standing of their university, and thus all stakeholders would benefit.

1.4 Overview of the Research

In order to come to a solution for the issue of course selection, it is important to consider what data is available that could potentially be useful, and how to interpret and interact with this data. Clearly, there would have been a large number of students that have registered and succeeded in the same fields in the past. By looking at the combinations of courses some of these students took, and their performance in these courses, it is possible to determine certain course combinations that worked particularly well, and which course combinations were likely to result in failure. Information on courses that were often selected together by similar students in the past can also be of use in determining the combinations of courses students were generally interested in. This information can then be used to enable students to select courses in a less cumbersome manner.

To implement this approach, we used a historical dataset containing information about past Computer Science I students. This dataset, sourced from the School of Computer Science and Applied Mathematics, contains data on the academic performance of students over a number of years and the courses in which each student enrolled. We processed this data using various transformation procedures in order to obtain a clean, working dataset on which analysis could be performed. Using the insights gained, we then developed a collaborative course recommender system that students can use to make more informed decisions about the courses they register for. The system provides them with detailed information on relevant courses they can take, as well as their chances of passing when selecting particular course combinations. It was developed as a web-based Java program, which students can easily access from their computers or mobile devices in order to make their choices in a timely manner.

1.5 Structure of the Document

The following chapter, Chapter 2, takes a detailed look at the background knowledge required to understand the important concepts of recommendation and recommender systems, including the design and implementation of course recommender systems. We describe each commonly used type of recommender system, including its advantages and disadvantages, and provide examples of their application. A collaborative recommender system was implemented in this study. The algorithms used to implement recommender systems are explained in detail, in order to provide an understanding of their inner workings and why they are used. We highlight the use of these algorithms in some recent papers in order to demonstrate their effectiveness. Finally, we describe Apache Mahout, the platform we used to create the recommender system.

In Chapter 3, the motivation for this research is presented. We also state, in detail, the research questions that we answered in this research, and justify the necessity of investigating them. The system design for the end user is outlined, and the methods with which we implemented our intended system and as a result answered the research questions are also given.

In Chapter 4, we provide an in-depth look at the results of our research. We describe the statistics of the dataset we used, and we then move to the results for each specific research question we posed in Chapter 3. The design of the web application we created is also examined in detail.

Chapter 5 discusses the obtained results, going into detail about why certain results came about. We also discuss what effects the implementation of the system would bring.

Finally, Chapter 6 concludes the report, with an overall look at the study we carried out, as well as the types of future work that could add value to the research that was done.

1.6 Conclusion

Universities play an important role in preparing their students for their future careers. One of the most important considerations to be made is that of course selection, not only in terms of students’ individual learning desires but also for practical reasons related to the required knowledge in order to succeed, both in their university courses and in later life. An ill-informed combination of courses can impact negatively on the students themselves, and also on the university. It is thus in the university’s best interests to implement a system that can recommend effective course choices to students, in order to ensure that success is achieved.

Chapter 2 Background and Related Work

2.1 Introduction

The issue of course selection in university can pose a major challenge for students. Students may have to choose between courses restricted to their subject area and optional courses they may be interested in, for example. There is a need to develop a system that can assist students in making these decisions. For this purpose, recommender-based approaches are a worthwhile consideration for implementing such a system. There are different types of recommender systems, each having their positives and negatives. In addition, the platform on which to implement a chosen recommender system is also important to investigate. It is therefore important to discuss and understand the different types of recommender systems, and the platform that was used to implement the type of recommender system selected for this research.

In Section 2.2, we provide a comprehensive overview and discussion of the different types of recommender systems available, with subsections 2.2.1 to 2.2.3 each focusing on a different type of recommender system. Next, Section 2.3 looks at applications of recommender systems in various fields. In Section 2.4 we describe in detail the inner workings of collaborative filtering systems, with subsections 2.4.1 to 2.4.4 describing the workings of the algorithms involved. Section 2.5 looks at past work on collaborative recommender systems, with a focus on their use in educational settings. Section 2.6 uses the concepts introduced in earlier sections to describe and discuss Apache Mahout as the platform for the implementation of the recommender system. Finally, Section 2.7 concludes the chapter.

2.2 Overview of Recommender Systems

Generally, the main purpose of a recommender system is to identify useful links between the user of the system and certain objects or items, based on some form of cleaned input data, and to output these findings. The output may take the form of an ordered list or a single value, depending on whether the user is looking for a list of recommendations or a prediction [Vozalis and Margaritis 2003]. These outputs can then be used to assist the user in making decisions.


Figure 2.1: Types of recommender systems

Figure 2.1 outlines the different types of recommender systems: content-based, collaborative and hybrid. In this research, a web-based collaborative recommender system was implemented to recommend courses to first-time, first-year Computer Science students at the University of the Witwatersrand.

2.2.1 Content-based Systems

One of the most important and widely used types of recommender systems is the content-based filtering system. In this recommendation approach, the system uses data based on previously-recorded facts about a particular user, or any other relevant data currently contained by the system about the user. The system attempts to find similarities between the available data in order to effectively recommend a certain object or item to the user [Adomavicius and Tuzhilin 2005].

In the context of course selection systems at universities, for example, these recommender systems may look at the courses a student has taken in previous years in order to recommend current courses more effectively. The system may also use keyword selection on course descriptions to find courses similar to those already selected. O’Mahony and Smyth [2007] demonstrated a similar approach in creating a “More Like This” recommender system, in which course descriptions were compared with the descriptions of other courses in the course database, and the most similar courses were selected and ranked.
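The keyword-matching idea behind a “More Like This” recommender can be illustrated with a small sketch. It is not O’Mahony and Smyth’s implementation: it assumes plain bag-of-words vectors (no stemming, stop-word removal or TF-IDF weighting) compared with cosine similarity, and the course codes and descriptions in the usage example are invented.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count the lower-cased alphabetic tokens in a course description."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    # A Counter returns 0 for missing terms, so unshared words add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_this(target, descriptions, top_n=2):
    """Rank the other courses by how similar their descriptions are to the target's."""
    target_vec = bag_of_words(descriptions[target])
    scores = [(code, cosine(target_vec, bag_of_words(text)))
              for code, text in descriptions.items() if code != target]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical course codes and descriptions, for illustration only.
descriptions = {
    "COURSE_A": "algorithms data structures programming",
    "COURSE_B": "programming languages and algorithms",
    "COURSE_C": "cell biology and genetics",
}
print(more_like_this("COURSE_A", descriptions))  # [('COURSE_B', 0.5), ('COURSE_C', 0.0)]
```

Because only shared words contribute to the score, this sketch also shows the limitation discussed below: a biology course can never be suggested to a student whose history contains only programming courses.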

Clearly, this type of recommender system has the potential to be very effective, as it offers a relatively simple yet practical means of gathering and comparing information. However, it also has a number of limitations. The approach works well with simpler features such as text, but other types of data, such as real-world data, may be difficult for the system to process and understand accurately without significant time and processing power, due to the very dynamic nature of the information [Adomavicius and Tuzhilin 2005]. Also, the system may only recommend courses from one field or faculty, and may thus miss courses that could be valuable but are not directly related to the subject field. In this case, the system could be made to filter out the most similar suggestions in order to introduce a wider range of options.

2.2.2 Collaborative Systems

In contrast to content-based filtering, collaborative filtering methods do not rely solely on historical data about the user of the system and what their past decisions were. Instead, collaborative systems recommend items to the current user by using data containing the preferences of similar users, or similarities between different items in terms of ratings given to them by users. These constitute the two different types of collaborative recommender system: user-based recommendation and item-based recommendation.

In user-based recommendation, the similarities between the current user and other users are examined in depth [Adomavicius and Tuzhilin 2005]. In an online store, for example, these may be similarities in the types of goods purchased; in education, they may be similarities between the courses taken by different students. Users who give similar ratings to the same products or services are grouped together, and the system infers that items liked by one of these users are also likely to be liked by the others. In this way, useful and informative correlations between users are found. Using these correlations, the system can then make an informed prediction or recommendation that is likely to be helpful to the current user [Adomavicius and Tuzhilin 2005]. In the examples above, the system would suggest goods that the customer is likely to be interested in, or courses that a student might want to enrol for. The end result is that the user is assisted in making an effective decision of their own in the domain in which the system is being used.
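A minimal user-based sketch, using the Spearman correlation that this project relied on as its similarity measure, is shown below. It does not use Apache Mahout’s API: the `{student: {course: mark}}` layout, the function names, the neighbourhood size `k` and the prediction rule (a similarity-weighted average of the neighbours’ marks, without mean-centring) are assumptions chosen for illustration, and the marks are invented.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def spearman(u, v):
    """Spearman correlation: the Pearson correlation of the ranks of co-rated items."""
    common = sorted(set(u) & set(v))
    if len(common) < 2:
        return 0.0
    return pearson(average_ranks([u[c] for c in common]),
                   average_ranks([v[c] for c in common]))


def recommend(target, marks, k=2, top_n=3):
    """Score courses the target has not taken, using its k most similar students."""
    neighbours = sorted(
        ((spearman(marks[target], marks[other]), other)
         for other in marks if other != target),
        reverse=True,
    )[:k]
    totals, weights = {}, {}
    for sim, other in neighbours:
        if sim <= 0:
            continue  # skip students who are not positively correlated
        for course, mark in marks[other].items():
            if course not in marks[target]:
                totals[course] = totals.get(course, 0.0) + sim * mark
                weights[course] = weights.get(course, 0.0) + sim
    predicted = {c: totals[c] / weights[c] for c in totals}
    return sorted(predicted.items(), key=lambda p: p[1], reverse=True)[:top_n]


# Hypothetical marks; "new" has taken A, B and C only.
marks = {
    "s1": {"A": 80, "B": 70, "C": 60, "D": 75},
    "s2": {"A": 78, "B": 68, "C": 58, "E": 65},
    "s3": {"A": 50, "B": 60, "C": 70, "D": 40},
    "new": {"A": 82, "B": 71, "C": 59},
}
print(recommend("new", marks))  # D (via s1) ranks above E (via s2); s3 is not a neighbour
```

Students s1 and s2 rank courses A, B and C in the same order as the new student, so they form its neighbourhood, while s3 ranks them in the opposite order and is excluded. Each suggested course is therefore paired with a predicted mark, which matches the aim of recommending courses in which similar students performed well.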

In contrast, item-based recommendation methods take items themselves into consideration, rather than users [Adomavicius and Tuzhilin 2005]. The similarities in ratings between different items are examined and items that are rated similarly are grouped together. The system then compares these items to any items the user has already rated in order to find similarities, and if possible, recommend these items to the user.
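For item-based recommendation, a common simplification for purchase or enrolment data is to represent each item by the set of users who selected it and to compare items using cosine similarity. The sketch below assumes such binary data. It illustrates the general item-to-item idea rather than reproducing Amazon’s production system or any particular library, and the course names and enrolments are hypothetical.

```python
import math
from collections import defaultdict


def similar_items(viewed, selections, top_n=3):
    """Item-to-item collaborative filtering on binary (selected / not selected) data."""
    # Invert {user: set of items} into {item: set of users who selected it}.
    selectors = defaultdict(set)
    for user, items in selections.items():
        for item in items:
            selectors[item].add(user)
    target = selectors.get(viewed, set())
    scores = []
    for item, users in selectors.items():
        if item == viewed:
            continue
        overlap = len(target & users)
        if overlap:
            # Cosine similarity of two binary vectors: |A & B| / sqrt(|A| * |B|).
            scores.append((item, overlap / math.sqrt(len(target) * len(users))))
    scores.sort(key=lambda pair: (-pair[1], pair[0]))  # best first, ties by name
    return scores[:top_n]


# Hypothetical enrolments: which students took which courses.
selections = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"A", "C"},
    "u4": {"D"},
}
print(similar_items("B", selections))  # A (about 0.816) ranks above C (0.5); D shares no students
```

Item-to-item similarities depend only on the selection data, not on the current user, so they can be precomputed and then looked up whenever a user views an item.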


Figure 2.2: Example of Item-Based Collaborative Recommendation

Figure 2.2 shows an example of the results displayed by the item-based collaborative recommender that Amazon uses to recommend items similar to the one the user is currently viewing [Linden et al. 2003]. The goal is to maximise sales and profits by predicting which products current customers will be interested in, based on past purchases made by similar customers. This method of advertising has contributed significantly to Amazon’s growth into a leading online retailer [Linden et al. 2003].

Adomavicius and Tuzhilin [2005] state that one of the major advantages of using collaborative systems is the fact that, due to their approach of solely considering similarity with other users’ choices, the recommender system does not require a baseline understanding of the actual content of what is being recommended. This is in contrast to content-based systems, which look at actual keywords in text-based objects, as mentioned earlier. The variety of item categories that the collaborative recommender system can be applied to is generally much greater compared to content-based systems, since the raw content of the items is not being considered [Adomavicius and Tuzhilin 2005].

Collaborative filtering approaches do have some drawbacks. Considering the fact that collaborative filtering can be applied to a larger number of disciplines, certain disciplines with a very large number of users and/or items would require a large amount of processing power in order to scan through the available data and find effective recommendations [Vozalis and Margaritis 2003]. Also, the approach may potentially leave out a large number of items which would be relevant to the user. The filtering method only considers objects that have been favoured or selected by other users in the past, and so any items that have not been selected before are not considered [Vozalis and Margaritis 2003]. Collaborative systems may therefore not always be preferable over content-based approaches and the specific problem in question must be analysed carefully before selecting a type of system to use.

We describe the implementation of these systems in greater detail in Section 2.4.

2.2.3 Hybrid Systems

Content-based and collaborative systems each have advantages and disadvantages, some of which are complementary. A significant constraint of one filtering type, such as the restriction to text-based information in content-based filtering, may be entirely offset by the other type of system, as collaborative systems are able to process more diverse types of data. Combining the two types of systems may therefore improve the outcome of the recommendation process. Systems that combine content-based and collaborative components are known as hybrid systems.

There are a number of methods for implementing hybrid systems [Adomavicius and Tuzhilin 2005]. The first and most obvious is to transfer relevant features of one system into the other, creating a partially combined system and feature set. Such a system may compare the similarity of all users over a set of items while also taking into account the past decisions of the current user, arriving at a more well-rounded approach. In this way, common problems associated with collaborative systems, such as ignoring items that no other user has selected before, can be mitigated.

Another approach is to obtain the results of a content-based filter and, separately, the results of a distinct collaborative filter, and then consider the two result sets together to arrive at a final recommendation set [Adomavicius and Tuzhilin 2005]. An effective way of combining the results is to assign each set a weight based on the current state of the data [Vozalis and Margaritis 2003]. Depending on how many items in the database have already been selected by the user base, the system can place more or less weight on the content-based or the collaborative results. For instance, collaborative results become more meaningful once the vast majority of items in the database can be reached through their relation to users who have previously selected them. Hybrid systems may therefore be a good choice in certain situations.
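The weighting idea can be sketched in Python. This is an illustration under stated assumptions, not the method of Vozalis and Margaritis [2003] or of this project: the function name `hybrid_scores`, the score dictionaries and the rule that the collaborative weight equals the fraction of catalogue items already selected by at least one user are all hypothetical.

```python
from typing import Dict


def hybrid_scores(content: Dict[str, float],
                  collab: Dict[str, float],
                  selected_items: int,
                  total_items: int) -> Dict[str, float]:
    """Blend content-based and collaborative scores into one score per item.

    Hypothetical weighting rule: the collaborative weight is the fraction of
    catalogue items that at least one user has already selected, so the
    collaborative results count for more as the item base becomes covered.
    """
    w_collab = selected_items / total_items if total_items else 0.0
    w_content = 1.0 - w_collab
    items = set(content) | set(collab)
    # An item missing from one result set contributes 0 from that filter.
    return {item: w_content * content.get(item, 0.0)
            + w_collab * collab.get(item, 0.0)
            for item in items}


if __name__ == "__main__":
    content = {"A": 0.9, "B": 0.2}  # invented content-based scores
    collab = {"B": 0.8, "C": 0.6}   # invented collaborative scores
    scores = hybrid_scores(content, collab, selected_items=3, total_items=4)
    for item in sorted(scores, key=scores.get, reverse=True):
        print(item, round(scores[item], 3))
```

With three of four items already selected, the collaborative weight is 0.75, so item B (strong collaborative score) ranks first, then C, then A. With no selections at all, the ranking falls back entirely on the content-based scores.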

Having discussed the different types of recommender system, in Section 2.3 we focus on their applications in various fields.

2.3 Applications of Recommender Systems

Content-based, collaborative and hybrid recommender systems are all used extensively in a wide variety of disciplines. Lee and Hosanagar [2016] examined the impact of recommender systems on conversion rates (the proportion of potential customers who go on to buy a product) in electronic commerce. They found that conversion rates do increase after recommender systems are added to companies' web pages, reporting an increase of nearly 6%, which is significant, especially for large businesses. They also found that recommender systems could viably be used in place of traditional customer reviews to help customers make decisions. The use of recommender systems is therefore worth considering in many different scenarios. In the following subsections, we look at successful implementations of recommender systems in different applications.

2.3.1 Applications in Video Streaming

Perhaps the most common area where recommender systems are relevant is the Internet sector. Amazon, discussed earlier, is a prominent example of a successful recommender system implementation. Another major Internet-based application is Netflix, a subscription-based video streaming service [Gomez-Uribe and Hunt 2015]. As users browse the movies and television shows that Netflix offers, they are given recommendations of particular shows to watch. This recommendation procedure relies heavily on the hybrid approach discussed in Section 2.2.3: Netflix takes into account both the user's own past viewing preferences and the similarities between different users based on shows watched in common. From this information, effective recommendations can be generated. Gomez-Uribe and Hunt [2015] made findings similar to those of Lee and Hosanagar [2016] regarding the effect of recommender systems on business success. They explained that, because recommendations provide a continuous flow of interesting shows to watch, users who would otherwise have cancelled their subscriptions instead continue paying for the service. The service thus continues to generate very large amounts of income.

2.3.2 Applications in Music Selection

Another field with possible applications of recommender systems is the Internet-based music market, where a major focus is helping users choose songs they would like to listen to or purchase. iTunes is an example of a successful implementation of such a system [Yoshii et al. 2008]. It uses a collaborative approach, in which the purchase decisions of users who bought sets of music tracks are used when proposing products to similar customers. Yoshii et al. [2008] proposed extending such traditional systems into a hybrid system. In their system, recommendation draws primarily on content-based features of the music, in particular the sound waves contained within the musical piece. Aspects of the sound waves were broken up into distinct features for the system to analyse and, combined with the genre of the piece, were used to generate recommendations. In addition, collaborative features, such as music choices shared between users, were examined and factored in to increase the accuracy of recommendations. This hybrid approach was found to generate reliable recommendations, and would therefore likely be useful for boosting profits in business applications similar to those discussed earlier.

2.3.3 Applications in Public Transport

Recommender systems also have the potential to be applied successfully in public transport. Yuan et al. [2013] proposed a system designed to make locating a taxi more streamlined, both economically and logistically, for the end user and for the driver. Major goals included making taxi driving more financially viable for drivers and more convenient for customers. The proposed method involved finding optimal routes to the customer's pick-up location while factoring in elements likely to minimise the waiting time of both the driver and the customer. As this is a collaborative recommender approach, the system draws mainly on historical data about similar situations, identifying which routes proved most effective in the largest number of cases. In tests carried out in real-world situations, Yuan et al. [2013] found that the system reliably generated effective route and pick-up location suggestions, benefiting both drivers and customers.

These examples show how valuable recommender systems can be across a variety of disciplines. In Section 2.4, we discuss in detail the collaborative approach, which is used in this research.

2.4 Algorithms in Collaborative Systems

2.4.1 Practical Use of Collaborative Systems

The limitations of collaborative systems were discussed in Section 2.2.2. In course enrolment systems, some of these limitations are mitigated, which makes collaborative systems an attractive option. Specifically, a university is unlikely to introduce a large number of new courses in any given year, so the problem of new courses being ignored because no student has selected them before is relatively minor. The system could also be modified to give higher priority to recommending new courses, which would further reduce this problem.
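One possible way to give new courses priority is sketched below, assuming a hypothetical design in which any offered course with no enrolment history receives a fixed default score before ranking. The function `rank_courses`, the parameter `new_course_score` and the course codes are illustrative only and are not part of this project or of Apache Mahout.

```python
from typing import Dict, List, Set


def rank_courses(scores: Dict[str, float],
                 offered: Set[str],
                 new_course_score: float = 0.5,
                 top_n: int = 5) -> List[str]:
    """Rank courses, giving courses with no enrolment history a default score.

    `scores` holds collaborative scores for courses that students have taken
    before; a course in `offered` but absent from `scores` is treated as new.
    The default score is a hypothetical tuning parameter.
    """
    ranked = dict(scores)
    for course in offered - set(scores):
        ranked[course] = new_course_score
    # Highest score first; ties broken alphabetically for a stable order.
    order = sorted(ranked.items(), key=lambda kv: (-kv[1], kv[0]))
    return [course for course, _ in order[:top_n]]


if __name__ == "__main__":
    history_scores = {"CS1": 0.9, "ALG1": 0.3}  # invented scores
    offered = {"CS1", "ALG1", "DATA1"}          # DATA1 is a new course
    print(rank_courses(history_scores, offered))
```

Raising `new_course_score` pushes new courses higher; setting it to 0 effectively reproduces the plain collaborative behaviour in which unseen courses sink to the bottom.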

2.4.2 Association Rule Mining

Before delving deeper into collaborative systems, we discuss the concept of association rule mining. Vozalis and Margaritis [2003] provide a concise discussion of this topic, while Lee and Cho [2011] give a more detailed explanation. Essentially, an association rule describes the relationship between a number of items in a dataset based on how frequently they are chosen together by the same user. Two or more items that a user often selects together are said to be associated [Lee and Cho 2011]. Association rules are particularly useful in collaborative recommender systems, since recommendations can be made from rules generated over a large number of items and their user base.

These associations are created based on a few criteria. The first is the support value [Lee and Cho 2011]. The support value of an item or set of items is the proportion of all transactions (instances in which a user has chosen a set of items) that contain that item or set of items. These support values are compared with a minimum support value chosen as the threshold for making recommendations, and any item whose support value falls below this threshold is disregarded. In this way, rarely chosen items, which are unlikely to be related to the user's selections, are removed early, making the process more efficient.
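Support and the minimum-support filter can be illustrated with a short Python sketch. The enrolment records and course codes below are invented for illustration, and the brute-force enumeration of every itemset is chosen for clarity only; practical miners such as Apriori prune the search using the fact that any superset of an infrequent itemset is itself infrequent.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def frequent_itemsets(transactions: List[Set[str]],
                      min_support: float,
                      max_size: int = 2) -> Dict[FrozenSet[str], float]:
    """Keep every itemset of up to `max_size` items whose support reaches
    `min_support`. Brute force: all combinations are enumerated."""
    items = sorted(set().union(*transactions))
    kept: Dict[FrozenSet[str], float] = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset, transactions)
            if s >= min_support:  # below the threshold: disregarded
                kept[itemset] = s
    return kept


if __name__ == "__main__":
    # Invented enrolment records: one set of courses per student.
    enrolments = [
        {"CS1", "ALG1", "CALC1"},
        {"CS1", "ALG1"},
        {"CS1", "ALG1", "STATS1"},
        {"CS1", "CALC1"},
        {"ALG1", "STATS1"},
    ]
    for itemset, s in frequent_itemsets(enrolments, min_support=0.4).items():
        print(sorted(itemset), s)
```

Here the pair CS1 and ALG1 appears in three of the five records (support 0.6) and survives the 0.4 threshold, whereas CS1 and STATS1 appear together only once (support 0.2) and are dropped.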

Another important value in association rule mining is the confidence value [Lee and Cho 2011]. For a rule stating that a user who selects one item or set of items also selects another, the confidence is the likelihood that the second is selected given that the first is, i.e. the strength of the association between them; it is calculated as the support of both together divided by the support of the first. As with support, a minimum confidence level is chosen at the designer's discretion. After items below the support threshold have been removed, confidence values are calculated for the remaining items, and those with a high enough confidence are selected for recommendation. The algorithm then produces a final shortlist (e.g. 10 items) by sorting these items in descending order of confidence [Vozalis and Margaritis 2003].
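The two-stage filter and confidence ranking described above can be sketched as follows. This is a simplified illustration, not the algorithm used in this project: the function names and data are hypothetical, and the whole current selection is treated as a single rule antecedent, whereas full rule miners also consider its subsets. Counts are used instead of fractional supports so that confidence, support(X with Y) / support(X), is computed exactly.

```python
from typing import FrozenSet, List, Set, Tuple


def count(itemset: FrozenSet[str], transactions: List[Set[str]]) -> int:
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def recommend(selected: Set[str],
              transactions: List[Set[str]],
              min_support: float,
              min_confidence: float,
              top_n: int = 10) -> List[Tuple[str, float]]:
    """Score each unselected item by the confidence of the rule
    selected -> {item} and return the top_n items, highest first."""
    antecedent = frozenset(selected)
    base = count(antecedent, transactions)
    if base == 0:
        return []  # the selection never occurs, so no rule can be formed
    n = len(transactions)
    scored = []
    for item in set().union(*transactions) - antecedent:
        both = count(antecedent | {item}, transactions)
        if both / n < min_support:    # support threshold first
            continue
        conf = both / base            # same as support(both) / support(selected)
        if conf >= min_confidence:    # then confidence threshold
            scored.append((item, conf))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    enrolments = [  # invented enrolment records
        {"CS1", "ALG1", "CALC1"},
        {"CS1", "ALG1"},
        {"CS1", "ALG1", "STATS1"},
        {"CS1", "CALC1"},
        {"ALG1", "STATS1"},
    ]
    print(recommend({"CS1"}, enrolments, min_support=0.4, min_confidence=0.5))
```

For a student who has selected CS1, ALG1 follows in three of the four CS1 records (confidence 0.75) and CALC1 in two (0.5), while STATS1 fails the support threshold and is never scored.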

Frequently Asked Questions about the Reading Sample

What is the reading sample about?

This reading sample includes the title, table of contents, objectives and key themes, chapter summaries, and key words of the document. The document reports research on a web-based course recommender system.

What topics are covered in the Table of Contents?

The table of contents includes sections such as the Abstract, Declaration, Acknowledgements, List of Figures, List of Tables, Introduction, Background and Related Work, Research Method, Results, Discussion, Conclusion and Future Work, and Descriptive Statistics.

What does the Abstract discuss?

The abstract describes the development of a web-based recommender application designed to assist students in course selection. It takes into account a student's likelihood of academic success and utilizes collaborative recommender principles, using a historical dataset of Computer Science I students. Apache Mahout is used to generate course recommendations based on student similarities.

What does the Declaration section entail?

The Declaration is a statement from the student, Mike Nkongolo, declaring that the work is their own and acknowledging the policies against plagiarism.

What information is found in the Acknowledgements section?

The Acknowledgements section expresses gratitude to Mr. Michael Mchunu for his supervision and guidance on the research topic.

What kind of content is included in Chapter 1 (Introduction)?

Chapter 1 provides an overview of the challenges students face when selecting courses, the importance of informed course selection, and an outline of the research methodology. It defines the problem of course selection, explains its importance, and details the research's structure.

What topics are covered in Chapter 2 (Background and Related Work)?

Chapter 2 discusses recommender systems, including content-based, collaborative, and hybrid systems. It also explores applications of recommender systems in various fields like video streaming, music selection, and public transport, as well as algorithms used in collaborative systems and related work in the field.

What does Chapter 3 (Research Method) describe?

Chapter 3 outlines the research questions, end-user system architecture, and the research methodology used. It includes details about the CRISP-DM methodology, data processing using SAS, correlation analysis, Apache Mahout installation, data storage, system evaluation, and the development of web-based components.

What results are presented in Chapter 4 (Results)?

Chapter 4 presents the results of the research, including overall dataset statistics, relationships between courses and performance, courses with low grades, the strongest determinant of performance, combinations of courses to be recommended, coverage and accuracy of recommendations, and the web-based course enrollment and recommender system.

What discussions are included in Chapter 5 (Discussion)?

Chapter 5 discusses the significance and implications of the results, the reliability and accuracy of the recommendations, and the web-based recommender application. It also addresses the limitations of the research.

What is the focus of Chapter 6 (Conclusion and Future Work)?

Chapter 6 provides a conclusion to the report, summarizing the study and suggesting future work that could add value to the research.

What information is included in Appendix A (Descriptive Statistics)?

Appendix A presents the descriptive statistics for the student dataset used in the research.

What types of recommender systems are mentioned?

Content-based systems, collaborative systems (user-based and item-based), and hybrid systems are described.

What is Apache Mahout?

Apache Mahout is an open-source machine-learning library from the Apache Software Foundation that includes recommender algorithms; in this research it is used to generate course recommendations.

What is the CRISP-DM methodology?

CRISP-DM (Cross-Industry Standard Process for Data Mining) is a data mining methodology used in this research to structure data processing and analysis.

How was the system evaluated?

The system was evaluated on the coverage and accuracy of its recommendations and on the volume of data.

What is a collaborative recommender system?

A collaborative recommender system recommends items to users based on the preferences of similar users or the similarities between items in terms of ratings given to them by users.

What is the purpose of association rule mining in recommender systems?

Association rule mining identifies relationships between items based on how frequently they are selected together, which can be used to generate recommendations.

End of the reading sample (88 pages in total)

Details

Title: A Web-Based Prototype Course Recommender System using Apache Mahout
Course: Honors research project
Grade: BSc Honours in Computer Science
Author: Mike Nkongolo
Year of publication: 2017
Pages: 88
Catalogue number: V375739
ISBN (eBook): 9783668554351
ISBN (Book): 9783668554368
File size: 4418 KB
Language: English
Keywords: Wits
Product safety: GRIN Publishing GmbH
Price (Book): US$ 55.99

Cite this work: Mike Nkongolo (Author), 2017, A Web-Based Prototype Course Recommender System using Apache Mahout, München, GRIN Verlag OHG, https://www.diplomarbeiten24.de/document/375739