Bachelor's Thesis, 2013
77 pages, grade: A
1. Introduction
1.1. Text Mining
1.2. Information Extraction
1.3. SMS Marketing
1.4. Project description
1.5. Benefits that will come from solution
1.6. Limitations
1.7. Need for Solution
1.8. Outline
2. Basic Concepts
2.1. Web Extraction
2.2. Limitations
3. Literature Review
3.1. Information Extraction
3.2. Learning Extractors from Unlabeled Text using Relevant Databases
3.3. Information Extraction from the Web: Techniques and Applications
3.4. A Survey of Web Information Extraction Systems
3.5. Information Extraction: A Survey
4. System Design
4.1. Proposed Model Design
4.2. Flow Chart
5. Implementation
5.1. System Development tool
5.2. System Requirements
5.3. Software Requirements
5.4. Hardware Requirements
5.5. System Description
5.6. Hybrid Approach
6. Testing
6.1. Test case 01
6.2. Test case 02
6.3. Test case 03
6.4. Test case 04
6.5. Results
7. Conclusion & Future Work
7.1. Conclusion
7.2. Future Work
8. References
This work aims to develop an efficient, cost-effective software solution for SMS marketing by automating the extraction of targeted mobile phone numbers from websites, thereby reducing manual effort and human error in data gathering.
5.6. Hybrid Approach
In this project, a hybrid approach that combines pattern mining with regular expressions and Document Object Model (DOM) techniques is applied to mine web links and phone numbers from websites. The hybrid technique consists of the following steps:
a. Fetch the page source code using the HttpWebRequest and HttpWebResponse classes.
b. Use the DOM to find the links on the website.
c. Use regular expressions to mine phone numbers from the source code of the website.
d. Clean the data using regular expressions and conditional statements.
e. Save the phone numbers to an external file.
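Steps a and b can be sketched as follows. The thesis implements them in Visual C#.NET; the Python below is only an illustrative stand-in, and the names `LinkCollector`, `fetch_source`, and `extract_links` are hypothetical. Python's `html.parser` walks the tag stream instead of building a full DOM tree, so it approximates the DOM lookup and does not reproduce it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag while the parser walks the markup."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the URL of the page itself.
            self.links.append(urljoin(self.base_url, href))


def fetch_source(url, timeout=10):
    """Step a: download the page source (requires an internet connection)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(html, base_url):
    """Step b: return the absolute URLs of all links found in the page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

A caller would pass the result of `fetch_source(url)` to `extract_links(html, url)` and then visit each returned link in turn. Anchors without an `href` attribute are skipped.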
1. Introduction: Provides an overview of text mining, information extraction, and the specific needs for automated SMS marketing solutions.
2. Basic Concepts: Outlines the fundamental techniques for web extraction, specifically focusing on web interaction and request/response models.
3. Literature Review: Surveys existing research and systems regarding web information extraction, including various learning extractors and survey studies.
4. System Design: Details the architectural model, process flow, and specific use cases for the proposed software system.
5. Implementation: Describes the development tools, system requirements, and the technical implementation of the hybrid extraction approach.
6. Testing: Analyzes the functionality of the system through defined test cases and evaluates the performance and accuracy of the extraction process.
7. Conclusion & Future Work: Summarizes the achievements of the project and suggests potential future optimizations such as FPGA implementation.
8. References: Lists the academic and technical sources consulted during the research.
Text Mining, Information Extraction, SMS Marketing, Web Scraping, .NET Framework, Regular Expressions, Document Object Model, Data Parser, Automated Extraction, Software Testing, Hybrid Approach, Phone Number Mining
The research focuses on creating an automated software solution that extracts mobile phone numbers from websites to facilitate SMS marketing for businesses, eliminating the need for manual data entry.
The work covers text mining, information extraction, web scraping, and software development within the context of digital marketing automation.
The primary goal is to provide a reliable, low-cost system that identifies and harvests targeted customer phone numbers from unstructured web pages with high efficiency.
The system uses a hybrid approach combining the Document Object Model (DOM) for navigating website structures and Regular Expressions for identifying and cleaning specific phone number patterns.
The implementation focuses on using the .NET Framework and Visual C#.NET to build a functional tool that can navigate URLs, extract links, filter phone numbers, and save the output in text or CSV formats.
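The final export step might look like the sketch below. It is written in Python for illustration, not in the C# of the thesis; the function name `save_numbers`, the `phone_number` column header, and the rule of choosing the format from the file extension are all assumptions.

```python
import csv
from pathlib import Path


def save_numbers(numbers, path):
    """Write phone numbers to a CSV file or, for any other extension, a text file."""
    path = Path(path)
    if path.suffix.lower() == ".csv":
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["phone_number"])  # hypothetical header row
            writer.writerows([number] for number in numbers)
    else:
        # Plain text output: one number per line.
        text = "".join(number + "\n" for number in numbers)
        path.write_text(text, encoding="utf-8")
    return path
```

Writing one number per row keeps the output directly importable into spreadsheet or bulk SMS tools that accept single-column lists.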
The work is defined by terms such as Information Extraction, SMS Marketing, Web Scraping, and automated data processing.
The system's accuracy is evaluated through specific test cases, resulting in a reported average accuracy rate of 80% during experimental testing.
Key limitations include the necessity of an active internet connection and the challenge that some websites employ anti-scraping defenses that block automated bots.
Regular Expressions are essential for defining search patterns that accurately capture sequences of digits matching mobile phone number formats in different regions.
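As a minimal illustration of such a pattern, the Python sketch below matches runs of 10 to 13 digits with an optional leading `+` and optional spaces or hyphens between digits, then normalizes and de-duplicates the hits. The digit range, the allowed separators, and the names `PHONE_PATTERN`, `find_phone_numbers`, and `clean_numbers` are assumptions chosen for the example; the thesis does not state the exact expressions it uses.

```python
import re

# Assumed format: optional "+", then 10 to 13 digits, each optionally
# preceded by a single space or hyphen. The lookarounds stop the pattern
# from matching a slice out of a longer digit run.
PHONE_PATTERN = re.compile(r"(?<!\d)\+?\d(?:[ -]?\d){9,12}(?!\d)")


def find_phone_numbers(source):
    """Return every candidate phone number found in the page source."""
    return PHONE_PATTERN.findall(source)


def clean_numbers(raw_numbers):
    """Strip separators and drop duplicates while keeping first-seen order."""
    cleaned = []
    seen = set()
    for raw in raw_numbers:
        digits = re.sub(r"[ -]", "", raw)
        if digits not in seen:
            seen.add(digits)
            cleaned.append(digits)
    return cleaned


sample = "Call 0300-1234567 or +92 300 1234567. Order id 12345. Again: 03001234567"
print(clean_numbers(find_phone_numbers(sample)))
```

A pattern this permissive will also match other long digit sequences, such as order or account numbers, which is one reason a separate cleaning step with region-specific rules is needed.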
The authors suggest optimizing performance, refining data cleaning algorithms, and exploring hardware acceleration via FPGA to increase processing speed.

