Master's Thesis, 2017
40 pages, Grade: 10
1. INTRODUCTION
1.1 IDEA AND MOTIVATION
1.2 LITERATURE SURVEY
2. PROBLEM DEFINITION AND SCOPE
2.1 SCOPE
2.2 SOFTWARE CONTEXT
2.3 SOFTWARE CONSTRAINTS
2.4 OUTCOMES
2.5 HARDWARE SPECIFICATION
2.6 S/W SPECIFICATION
2.7 AREA OF DISSERTATION
3. DISSERTATION PLAN
3.1 PROJECT PLAN
3.2 TIMELINE OF PROJECT
3.3 FEASIBILITY STUDY
3.3.1 Economic Feasibility
3.3.2 Technical Feasibility
3.3.3 Operational Feasibility
3.3.4 Time Feasibility
3.4 RISK MANAGEMENT
3.4.1 Project Risk
3.4.2 Risk Assessment
3.5 EFFORT AND COST ESTIMATION
3.5.1 Lines of code (LOC)
3.5.2 Effort
3.5.3 Development Time
3.5.4 Number of People
4. SOFTWARE REQUIREMENT SPECIFICATION
4.1 INTRODUCTION
4.1.1 Purpose
4.1.2 Scope of Document
4.1.3 Overview of responsibilities of developer
4.2 PRODUCT OVERVIEW
4.2.1 Block diagram
4.3 FUNCTIONAL MODEL
4.3.1 Flow diagram
4.3.2 Data Flow Diagram
4.3.3 UML Diagrams
4.3.3.1 Sequence diagram
4.3.3.2 Class diagram
4.3.4 Non-Functional Requirements
4.4 BEHAVIORAL MODEL AND DESCRIPTION
4.4.1 Description of software behavior
4.4.2 Use case diagram
5. DETAILED DESIGN
5.1 ARCHITECTURE DESIGN
5.1.1 Algorithms
5.2 INTERFACES
5.2.1 Human Interface
5.2.2 Database Interface
6. TESTING
6.1 INTRODUCTION
6.1.1 Goals and Objective
6.2 TESTING STRATEGY
6.2.1 White Box Testing
6.2.2 Black Box Testing
6.2.3 System testing
6.2.4 Performance testing
7. DATA TABLE AND DISCUSSION
7.1 INPUT TO THE SYSTEM
7.2 OUTPUT
7.3 PERFORMANCE OF PROPOSED SYSTEM
7.3.1 Performance of proposed system with respect to baseline algorithm
7.3.2 Performance of proposed system with respect to Blowfish encryption algorithm
7.4 RESULT
7.4.1 Difference between proposed algorithm and base algorithm, i.e. provider-aware algorithm
8. SUMMARY AND CONCLUSION
8.1 FUTURE ENHANCEMENT
The primary objective of this work is to develop an effective data mining and anonymization method to protect sensitive information during collaborative data publishing in distributed network environments, aiming to achieve higher performance compared to existing encryption-based approaches.
5.1.1 Algorithms

Slicing depends on attribute and tuple partitioning. In attribute partitioning (vertical partitioning), the data is split into column groups such as {Name}, {Age, Zip}, and {Disease}; in tuple partitioning (horizontal partitioning), the rows are split into buckets such as {t1, t2, t3, t4, t5, t6}. Age and Zip are grouped together because they are highly correlated quasi-identifiers (QI), and QI values may already be known to an attacker. During tuple partitioning, the system must check l-diversity on the sensitive-attribute (SA) column of each bucket. The algorithm runs as follows.
1. Initialize: bucket size k = n, row counter i = rowcount, column count = C, Q = {D}  // D = data in the database; ArrayList a[i]
2. While Q is not empty:
       If i <= n then
           check l-diversity of the current bucket
       Else
           i = i + 1
       Return D*
3. Q = Q - {D* + a[i]}
4. Repeat steps 2 and 3 with the next tuple in Q
5. D* = D* ∪ A[D]  // next anonymized view of the data D
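The partition-and-check loop above can be sketched in Python. This is an illustrative simplification under assumptions not stated in the thesis: a fixed bucket size, the distinct form of l-diversity, and a seeded shuffle; the function names are hypothetical.

```python
import random

def l_diverse(bucket, sa_index, l):
    # Distinct l-diversity: the sensitive-attribute column of a bucket
    # must contain at least l different values before it may be published.
    return len({row[sa_index] for row in bucket}) >= l

def slice_table(rows, attribute_groups, sa_index, bucket_size, l, rng=None):
    # Vertical partition into column groups (e.g. {Age, Zip} and {Disease}),
    # then horizontal partition into fixed-size buckets of tuples.
    rng = rng or random.Random(0)  # seeded so runs are reproducible
    published = []
    for start in range(0, len(rows), bucket_size):
        bucket = rows[start:start + bucket_size]
        if not l_diverse(bucket, sa_index, l):
            continue  # suppress buckets that would expose the SA
        # Permute each column group independently within the bucket so
        # quasi-identifier values can no longer be linked to the SA.
        columns = []
        for group in attribute_groups:
            col = [tuple(row[i] for i in group) for row in bucket]
            rng.shuffle(col)
            columns.append(col)
        published.append(list(zip(*columns)))
    return published
```

With six hospital-style rows, attribute groups `[(1, 2), (3,)]` (Age/Zip and Disease, dropping the explicit Name identifier), bucket size 3, and l = 2, both buckets pass the check and are published with their column groups decoupled.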
1. INTRODUCTION: Discusses the necessity of privacy-preserving data sharing in distributed environments and introduces the motivation behind using slicing techniques over traditional encryption models.
2. PROBLEM DEFINITION AND SCOPE: Defines the goal of creating an anonymized, attack-immune data view and outlines the software context, constraints, and hardware/software environment.
3. DISSERTATION PLAN: Provides a comprehensive project timeline, feasibility study (economical, technical, operational, and time), and detailed risk assessment and cost estimation metrics.
4. SOFTWARE REQUIREMENT SPECIFICATION: Details the functional and behavioral models, including block diagrams, data flow diagrams, and UML diagrams for the proposed system.
5. DETAILED DESIGN: Explains the architectural design and the specific slicing and L-diversity algorithms implemented to maintain data privacy.
6. TESTING: Describes the testing strategies employed, including white box, black box, and performance testing, to validate system reliability and computation efficiency.
7. DATA TABLE AND DISCUSSION: Evaluates the system results by comparing the computation time and performance complexity of the proposed slicing algorithm versus baseline and blowfish encryption models.
8. SUMMARY AND CONCLUSION: Summarizes the research findings regarding privacy protection in collaborative data publishing and suggests potential future improvements, such as ad hoc grid computing implementation.
Data Mining, Slicing, Anonymization, L-diversity, Distributed Database, Privacy Preserving, Blowfish Encryption, Collaborative Data Publishing, Insider Attack, Quasi Identifiers, Sensitive Attribute, Security, Software Requirement Specification, Data Utility, Performance Metrics
The research focuses on enhancing data privacy during collaborative data publishing across distributed databases by utilizing slicing techniques as an alternative to computationally expensive encryption methods.
Key areas include data anonymization, attribute and tuple partitioning, performance measurement of algorithms, the L-diversity model, and security assurance in distributed networks.
The aim is to provide a privacy-preserved, anonymized view of integrated data from different providers that is immune to attacks while maintaining optimal computation time.
The work employs a comparative methodology, using algorithmic analysis and performance evaluation (computation time and complexity) to contrast the proposed Slicing method with existing encryption and provider-aware algorithms.
The main body focuses on the design of the system, the development of the Slicing and L-diversity algorithms, and the rigorous testing (black/white box) of these implementations against various threat scenarios.
The work is best characterized by terms such as Data Mining, Slicing, L-diversity, Distributed Database, Anonymization, and Privacy Preserving.
Yes, the system is designed to detect and protect against "insider attacks" where entities may attempt to breach privacy using background knowledge.
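The insider-attack protection rests on the l-diversity check: even an attacker who knows which bucket a victim falls into still faces several plausible sensitive values. A minimal sketch of the distinct-l-diversity condition (a simplification; the function name and sample values are illustrative, not from the thesis):

```python
def is_l_diverse(sa_values, l):
    # A bucket resists background-knowledge attacks only if its
    # sensitive-attribute column holds at least l distinct values.
    return len(set(sa_values)) >= l

# A homogeneous bucket fails: knowing "my neighbour is in this bucket"
# immediately reveals the sensitive attribute.
print(is_l_diverse(["HIV", "HIV", "HIV"], 2))   # homogeneity attack succeeds
print(is_l_diverse(["HIV", "Flu", "Cancer"], 2))  # bucket may be published
```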
The proposed method offers a significant reduction in computation time, making it more efficient for systems like hospital patient data management or banking where rapid access is required alongside privacy.
Validation is performed through a multi-stage testing process including white-box and black-box testing, alongside graphical performance comparisons illustrating CPU usage and execution time.
GRIN Verlag has specialized in the publication of academic eBooks and books since 1998. GRIN was thus the first company to stand for user-generated quality content. The publishing sites GRIN.com, Hausarbeiten.de and Diplomarbeiten24 offer university lecturers, graduates and students the ideal platform for presenting academic texts such as term papers, presentations, bachelor's theses, master's theses, diploma theses, dissertations and scholarly essays to a broad audience.

