Personal notes I made during Andrew Ng's Stanford Machine Learning class.
Check them here: https://github.com/ppant/MI-Class-Standord/blob/master/personal_notes.txt
I am updating the notes weekly.
Stay tuned.
I have made a small project that shows the water quality of the River Ganga (India) at various places along its route (year 2012), as part of the JHU Coursera Data Science specialization.
The project has two parts:
1. A Shiny application that presents the water-quality measurements interactively:
https://ppant.shinyapps.io/Course-Project-Data-Products-Shiny-Application/
2. A presentation describing the application:
http://rpubs.com/ppant/DevDataProductsPres
References: the data set is taken from https://data.gov.in (Open Government Data Platform India).
I will keep improving it to give more precise results and better visualization.
Check the code on GitHub.
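For anyone new to Shiny, the overall structure of such an app looks roughly like the sketch below. This is only an illustration, not the actual app code; the data file name and the columns (location, dissolved_oxygen) are placeholders.

## Minimal Shiny skeleton, for illustration only. The file ganga_water_quality_2012.csv and
## the columns "location" and "dissolved_oxygen" are placeholders, not the real data layout.
library(shiny)

water <- read.csv("ganga_water_quality_2012.csv")

ui <- fluidPage(
  titlePanel("River Ganga Water Quality (2012)"),
  sidebarLayout(
    sidebarPanel(
      selectInput("location", "Monitoring location:", choices = unique(water$location))
    ),
    mainPanel(plotOutput("qualityPlot"))
  )
)

server <- function(input, output) {
  output$qualityPlot <- renderPlot({
    d <- subset(water, location == input$location)
    barplot(d$dissolved_oxygen, main = input$location, ylab = "Dissolved oxygen (mg/l)")
  })
}

shinyApp(ui = ui, server = server)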
One must also check out the caret package in R; it has plenty of functions for common ML tasks such as classification and model training. Finally, CRAN is the place to visit for R packages.
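For example, a basic classification workflow with caret looks like the small sketch below, shown on the built-in iris data just to illustrate the train/predict pattern:

## Small caret sketch: split the data, train a classifier with cross-validation, evaluate.
library(caret)
set.seed(123)

in_train <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training <- iris[in_train, ]
testing  <- iris[-in_train, ]

## Train a decision tree (rpart) with 10-fold cross-validation
fit <- train(Species ~ ., data = training, method = "rpart",
             trControl = trainControl(method = "cv", number = 10))

## Evaluate on the held-out set
pred <- predict(fit, newdata = testing)
confusionMatrix(pred, testing$Species)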
I have created a text prediction application as part of the Coursera Johns Hopkins University Capstone project.
Check below for resources.
Next Word Text Prediction Algorithm — Data Science Capstone Project by JHU and SwiftKey
Presentation:
http://rpubs.com/ppant/capstone-presentation
Application:
https://ppant.shinyapps.io/nextWordPredict/
Code:
https://github.com/ppant/Coursera-Data-Science-Capstone-Project
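To give a flavour of how such a predictor works, here is a toy sketch of next-word prediction from bigram counts. It only illustrates the general idea; the application itself is built on a much larger corpus and a more elaborate model.

## Toy next-word prediction from bigram counts (illustration only).
corpus <- c("i love data science", "i love machine learning", "data science is fun")

## Build a bigram frequency table, sentence by sentence
bigram_list <- lapply(strsplit(corpus, " "),
                      function(w) paste(head(w, -1), tail(w, -1)))
bigram_freq <- table(unlist(bigram_list))

## Predict the next word: the most frequent bigram that starts with the given word
predict_next <- function(word) {
  candidates <- bigram_freq[grep(paste0("^", word, " "), names(bigram_freq))]
  if (length(candidates) == 0) return(NA)
  sub(paste0("^", word, " "), "", names(which.max(candidates)))
}

predict_next("i")     # "love"
predict_next("data")  # "science"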
Please try it out and share your suggestions for improvement.
Thanks
Andrew Ng's Machine Learning class is the best class I have taken online so far.
Apart from the course videos, the lecture slides are also useful for quick reference. For quite some time I was looking for them, as they are not available on the course home page.
All the lecture slides are available at:
https://d396qusza40orc.cloudfront.net/ml/docs/slides/Lecture1.pdf
Lecture2.pdf
Lecture3.pdf
Lecture4.pdf
and so on…
In my experience, the slides only make sense if you go through the full video course. The professor is an amazing teacher.
Enjoy learning.
Brief notes on my learning from the course project of the Getting and Cleaning Data course from Johns Hopkins University.
The purpose of this project is to demonstrate the ability to collect, work with, and clean a data set. The final goal is to prepare tidy data that can be used for later analysis.
One of the most exciting areas in data science right now is wearable computing: companies like Fitbit, Nike, TomTom and Garmin are racing to develop the most advanced algorithms to attract new users. In this case study, the data is collected from the accelerometers of the Samsung Galaxy S smartphone. A full description is available at the site where the data was obtained:
http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
Here is the dataset for the project:
https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
I have created an R script called run_analysis.R that performs the required cleaning steps. Useful links:
http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
https://www.coursera.org/learn/data-cleaning
https://github.com/ppant/getting-and-cleaning-data-project-coursera
For the working code and the tidy data set, please check my GitHub repo.
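As a reference, here is a simplified sketch of the main steps. The file paths follow the standard UCI HAR Dataset layout; the full run_analysis.R in the repo is more detailed.

## Simplified sketch of the main cleaning steps (the script in the repo is more complete).
## Assumes the UCI HAR Dataset zip has been unzipped into the working directory.
features   <- read.table("UCI HAR Dataset/features.txt", stringsAsFactors = FALSE)
activities <- read.table("UCI HAR Dataset/activity_labels.txt", stringsAsFactors = FALSE)

read_set <- function(set) {
  x       <- read.table(file.path("UCI HAR Dataset", set, paste0("X_", set, ".txt")))
  y       <- read.table(file.path("UCI HAR Dataset", set, paste0("y_", set, ".txt")))
  subject <- read.table(file.path("UCI HAR Dataset", set, paste0("subject_", set, ".txt")))
  names(x) <- features$V2
  cbind(subject = subject$V1, activity = activities$V2[y$V1], x)
}

## 1. Merge the training and test sets
merged <- rbind(read_set("train"), read_set("test"))

## 2. Keep only the mean() and std() measurements
merged <- merged[, c(TRUE, TRUE, grepl("mean\\(\\)|std\\(\\)", features$V2))]

## 3. Tidy data set: average of each variable for each activity and each subject
tidy <- aggregate(merged[, -(1:2)],
                  by = list(subject = merged$subject, activity = merged$activity),
                  FUN = mean)
write.table(tidy, "tidy_data.txt", row.names = FALSE)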
Modern APIs provided by Google, Twitter, Facebook, GitHub, etc. use OAuth for authentication and authorization. In this example I am using the GitHub API; we get a JSON response which can be used to fetch specific information. The code uses my own GitHub account and is written in the R programming language.
Here are the steps:
1. Find the OAuth settings for GitHub
2. Create an application on GitHub
3. Add/modify the secret keys
4. Get OAuth credentials
5. Finally, use the API and parse the JSON response
## Load required modules
library(httr)
library(httpuv)
require(jsonlite)
# 1. Find OAuth settings for github:
# http://developer.github.com/v3/oauth/
oauth_endpoints("github")
# 2. To make your own application, register at
# https://github.com/settings/applications.
## https://github.com/settings/applications/321837
## Use any URL for the homepage URL (http://github.com is fine)
## and http://localhost:1410 as the callback URL. You will need httpuv for the local callback.
## Add the key and secret
## These can be obtained from the GitHub application settings page
myapp <- oauth_app("github",
key = "7cd28c82639b7cf76fcc",
secret = "d1c90e32e12baa81dabec79cd1ea7d8edfd6bf53")
# 3. Get OAuth credentials
github_token <- oauth2.0_token(oauth_endpoints("github"), myapp)
## Authentication will be done automatically
# 4. Use API
gtoken <- config(token = github_token)
req <- GET("https://api.github.com/users/ppant/repos", gtoken)
stop_for_status(req)
##content(req)
output <- content(req)
## Either content() above or BROWSE() below can be used to fetch the required info:
## the name and creation date of the repo ProgrammingAssignment3 (element 30 of the parsed list)
out <- list(output[[30]]$name, output[[30]]$created_at)
BROWSE("https://api.github.com/users/ppant/repos",authenticate("Access Token","x-oauth-basic","basic"))
# OR:
req <- with_config(gtoken, GET("https://api.github.com/users/ppant/repos"))
stop_for_status(req)
content(req)
This program will be helpful if someone wants to create recurring date patterns based on a given criterion (yearly, monthly, weekly or daily). The program is written in Perl using an older version of the Date::Manip CPAN module.
#!/usr/local/bin/perl -w
# Script to calculate recurrence dates based on given criteria using the Perl Date::Manip module.
# All the input dates here are hard coded; they could instead be passed in from an external program.
use strict;
use Date::Manip;
use Data::Dumper;
# Calculate the dates for each recurrence pattern.
&yearly();
&monthly();
&weekly();
&daily();
sub yearly {
my $base = "2015-10-29";
my $start_date = "2015-10-29";
my $end_date = "2018-01-01";
my $yearly_recur_every ="1";
my $yearly_on_month = "10";
my $yearly_on_week = "0";
my $yearly_on_day = "29";
my $yearly_on_the_month = "10";
my $yearly_on_the_week = "1";
my $yearly_on_the_day = "1";
my $frequency = "";
my $frequency_pattern_yearly_on = "$yearly_recur_every*$yearly_on_month:$yearly_on_week:$yearly_on_day:0:0:0";
my $frequency_pattern_yearly_on_the = "$yearly_recur_every*$yearly_on_the_month:$yearly_on_the_week:$yearly_on_the_day:0:0:0";
my @yearly_dates_on = ParseRecur($frequency_pattern_yearly_on,$base,$start_date,$end_date); # On a certain day of a month
my @yearly_dates_on_the = ParseRecur($frequency_pattern_yearly_on_the,$base,$start_date,$end_date); # First Monday of Oct
print "\n";
print "******************************************************************************\n";
print "**************************** YEARLY *******************************************\n";
print "*******************************************************************************\n";
print "Start date :". $start_date."\n";
print "End date :". $end_date."\n";
print "\n";
print "******************************************************************************\n";
print "Temporal expression: every 1 year on October 29\n";
print "Rule: ".$frequency_pattern_yearly_on;
print "\n";
print "Dates:\n";
print Dumper (\@yearly_dates_on);
print "\n";
print "Temporal expression: every 1 year on the first Monday of October\n";
print "Rule: ".$frequency_pattern_yearly_on_the;
print "\n";
print "Dates:\n";
print Dumper (\@yearly_dates_on_the);
print "\n";
}
# Monthly
sub monthly () {
my $base = "2015-10-29";
my $start_date = "2016-01-22";
my $end_date = "2017-06-01";
my $monthly_recur_every ="1";
my $monthly_day_of = "29";
my $monthly_the_day = "1";
my $monthly_the_week = "1";
my $frequency = "";
my $frequency_pattern_monthly_day = "0:$monthly_recur_every*0:$monthly_day_of:0:0:0";
my $frequency_pattern_monthly_the_day ="0:1*-2:5:0:0:0"; # Every month on the 2nd last Friday
my @monthly_dates_day = ParseRecur($frequency_pattern_monthly_day,$base,$start_date,$end_date); # On a certain day of a month
my @monthly_dates_the_day = ParseRecur($frequency_pattern_monthly_the_day,$base,$start_date,$end_date); # The 2nd last Friday of every month
print "\n";
print "******************************************************************************\n";
print "**************************** MONTHLY *******************************************\n";
print "*******************************************************************************\n";
print "Start date :". $start_date."\n";
print "End date :". $end_date."\n";
print "\n";
print "******************************************************************************\n";
print "Temporal expression: Day 29 of every 1 month\n";
print "Rule: ".$frequency_pattern_monthly_day;
print "\n";
print "Dates:\n";
print Dumper (\@monthly_dates_day);
print "\n";
print "Temporal expression: The 2nd last Friday of every month\n";
print "Rule: ".$frequency_pattern_monthly_the_day;
print "\n";
print "Dates:\n";
print Dumper (\@monthly_dates_the_day);
print "\n";
}
# Weekly
sub weekly () {
my $base = "2015-10-29";
my $start_date = "2016-01-22";
my $end_date = "2016-03-01";
my $weekly_recur_every ="1";
# Append a comma to each day value selected in the UI; if a day is not selected,
# the variable stays empty and no comma is added.
my $first_day_of_the_week = ""; # Monday
my $second_day_of_the_week = "2,"; # Tuesday
my $third_day_of_the_week = ""; # Wednesday
my $fourth_day_of_the_week = "4,"; # Thursday
my $fifth_day_of_the_week = ""; # Friday
my $sixth_day_of_the_week = ""; # Saturday
my $seventh_day_of_the_week = ""; # Sunday
# my $weekly_the_day = "1";
# my $weekly_the_week = "1";
my $frequency = "";
my $frequency_pattern_weekly_day = "0:0:$weekly_recur_every*$first_day_of_the_week$second_day_of_the_week$third_day_of_the_week$fourth_day_of_the_week$fifth_day_of_the_week$sixth_day_of_the_week$seventh_day_of_the_week:0:0:0";
my @weekly_dates_day = ParseRecur($frequency_pattern_weekly_day,$base,$start_date,$end_date); # On the selected days of every week
print "\n";
print "******************************************************************************\n";
print "**************************** WEEKLY *******************************************\n";
print "*******************************************************************************\n";
print "Start date :". $start_date."\n";
print "End date :". $end_date."\n";
print "\n";
print "Temporal expression: Every week on Tuesday and Thursday\n";
print "Rule: ".$frequency_pattern_weekly_day;
print "\n";
print "Dates:\n";
print Dumper (\@weekly_dates_day);
print "\n";
}
# Daily
sub daily () {
my $base = "2015-10-29";
my $start_date = "2016-01-22";
my $end_date = "2016-02-05";
my $daily_recur_everyday ="1";
# Append a comma to each weekday value selected in the UI; if a day is not selected,
# the variable stays empty and no comma is added.
my $first_day_of_the_weekday = "1,"; # Monday
my $second_day_of_the_weekday = "2,"; # Tuesday
my $third_day_of_the_weekday = "3,"; # Wednesday
my $fourth_day_of_the_weekday = "4,"; # Thursday
my $fifth_day_of_the_weekday = "5"; # Friday
my $daily_start_time = "8:00"; # 8AM
my $frequency = "";
my $frequency_pattern_daily_everyday = "0:0:0:$daily_recur_everyday*0:0:0";
# 0:1*1-5:$dow:0:0:0";
# "0:0:0:$n*0:0:0"; # Every nth day
my $frequency_pattern_daily_every_weekday = "0:0:$daily_recur_everyday*$first_day_of_the_weekday$second_day_of_the_weekday$third_day_of_the_weekday$fourth_day_of_the_weekday$fifth_day_of_the_weekday:0:0:0";
my @daily_dates_everyday = ParseRecur($frequency_pattern_daily_everyday,$base,$start_date,$end_date); # Every day
my @daily_dates_every_weekday = ParseRecur($frequency_pattern_daily_every_weekday,$base,$start_date,$end_date); # Every weekday (Monday to Friday)
print "\n";
print "******************************************************************************\n";
print "**************************** DAILY *******************************************\n";
print "******************************************************************************\n";
print "Start date: ". $start_date."\n";
print "End date: ". $end_date."\n";
print "\n";
print "Temporal expression: Every day\n";
print "Rule: ".$frequency_pattern_daily_everyday;
print "\n";
print "Dates:\n";
print Dumper (\@daily_dates_everyday);
print "\n";
print "Temporal expression: Every weekday\n";
print "Rule: ".$frequency_pattern_daily_every_weekday;
print "\n";
print "Dates:\n";
print Dumper (\@daily_dates_every_weekday);
print "\n";
}
# End of script
Full working code is available on GitHub with documentation.
Enjoy,
I have been investigating search engines, this time Apache Lucy 0.4.2. Below are a basic indexer and a small search application. The indexer takes documents one by one and indexes them; the search script takes a query term as a command-line argument and shows the matching results.
This is a pure command-line utility, just to show how basic indexing and searching work with Apache Lucy.
indexer.pl
#!/usr/local/bin/perl
use strict;
use warnings;
use Lucy::Simple;

# Ensure the index directory is both available and empty.
my $index = "/ppant/LucyTest/index";
system( "rm", "-rf", $index );
system( "mkdir", "-p", $index );

# Create the helper... a new Lucy::Simple object
my $lucy = Lucy::Simple->new(
    path     => $index,
    language => 'en',
);

# Add the first "document". (We are mainly adding metadata of the document.)
my %one = (
    title => "This is a title of first article",
    body  => "some text inside the body we need to test the implementaion of lucy",
    id    => 1,
);
$lucy->add_doc( \%one );

# Add the second "document".
my %two = (
    title => "This is another article",
    body  => "I am putting some basic content, using some words which are also in first document like implementation",
    id    => 2,
);
$lucy->add_doc( \%two );

# Both documents are now indexed under $index.
Once indexing of the documents is done, we'll make a small search script.
search.cgi
#!/usr/local/bin/perl
use strict;
use warnings;
use Lucy::Search::IndexSearcher;
my $term = shift || die "Usage: $0 search-term";
my $searcher = Lucy::Search::IndexSearcher->new( index => '/ppant/LucyTest/index' );
# A basic command-line search: look up the query term in the index and show,
# for each hit, the document title and ID
my $hits = $searcher->hits( query => $term );
while ( my $hit = $hits->next ) {
    print "Title: $hit->{title} - ID: $hit->{id}\n";
}
# End of search.cgi
***********************************************************************
If you want to explore more, check the full code on GitHub.
Create a new index
curl -X PUT "192.168.0.37:9200/test" -d '{
"settings" : { "index" : { "number_of_shards" : 1, "number_of_replicas" : 0 }}
}'
Mapping the attachment type (this requires the Elasticsearch mapper-attachments plugin)
curl -X PUT "192.168.0.37:9200/test/attachment/_mapping" -d '{
"attachment" : {
"properties" : {
"file" : {
"type" : "attachment",
"fields" : {
"title" : { "store" : "yes" },
"file" : { "term_vector":"with_positions_offsets", "store":"yes" }
} } } } }'
Shell script to convert the document (TestPDF.pdf) to base64 encoding and index it
#!/bin/sh
coded=`cat TestPDF.pdf | perl -MMIME::Base64 -ne 'print encode_base64($_)'`
json="{\"file\":\"${coded}\"}"
echo "$json" > json.file
curl -X POST "192.168.0.37:9200/test/attachment/" -d @json.file
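Search the indexed content and highlight where the query matches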
curl "192.168.0.37:9200/_search?pretty=true" -d '{
"fields" : ["title"],
"query" : {
"query_string" : {
"query" : "Cycling tips"
}
},
"highlight" : {
"fields" : {
"file" : {}
} } }'
***********************************************************************
If you want to explore more, check the full code on GitHub.