Language/library choice for data analysis of URLs and earnings

bernard

BuSo Pro
Joined
Dec 31, 2016
Messages
2,599
Likes
2,302
Degree
6
Knowing which URLs bring income is obviously important, but merely knowing the total income is not enough.

I grab income data from my affiliate networks with their APIs, which deliver a CSV of URL, date and commission. Naturally these URLs can have various parameters and such.

I want to:

1) See earnings per URL for a given time period, say the last month, last 3 months, or last year
2) See earnings per URL as a time series, so that I can calculate an average payback period for outsourced content

Currently I use Python and import into Google Sheets.

I was wondering if NumPy might be the correct choice here?
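Concretely, here's a rough sketch of what I mean with (1) and (2), assuming the CSV has url, date and commission columns (the sample data and URL cleaning below are just illustrative, not a finished solution):

```python
from io import StringIO
from urllib.parse import urlsplit, urlunsplit

import pandas as pd

# Hypothetical sample of a network's CSV export (columns: url, date, commission)
csv_data = StringIO("""url,date,commission
https://example.com/page-a?utm_source=x,2024-01-05,10.00
https://example.com/page-a,2024-02-10,5.00
https://example.com/page-b?ref=abc,2024-02-11,7.50
""")

df = pd.read_csv(csv_data, parse_dates=["date"])

# Normalize URLs by dropping query strings and fragments, so parameter
# variants of the same page aggregate together
df["clean_url"] = df["url"].map(
    lambda u: urlunsplit(urlsplit(u)._replace(query="", fragment=""))
)

# (1) Earnings per URL for a window, e.g. the last 3 months of the data
cutoff = df["date"].max() - pd.DateOffset(months=3)
last_3m = df[df["date"] >= cutoff].groupby("clean_url")["commission"].sum()

# (2) Monthly time series per URL, for payback-period calculations
monthly = df.groupby(["clean_url", df["date"].dt.to_period("M")])["commission"].sum()

print(last_3m)
print(monthly)
```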
 
Yeah, Python + Pandas and Jupyter Notebooks. Write different Python scripts to pull the data with requests, put it into a SQL database, and run them on crontab.

I've written this for auto-optimizing traffic, but there I pulled the data from a PPC tracker, and the affiliate networks posted conversions back via S2S postback. This is probably the best way to go, as you get the data in a more normalized form.

Run all your clicks through a tracker, and run one cron job per affiliate network/account.
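A rough sketch of one such cron script (the endpoint, table and column names are all hypothetical, since every network's API looks different):

```python
import csv
import io
import sqlite3

import requests  # third-party HTTP client (pip install requests)

# Hypothetical report endpoint; replace with your network's actual API
API_URL = "https://api.example-network.com/report.csv"


def store_csv(csv_text: str, conn: sqlite3.Connection) -> int:
    """Parse a url,date,commission CSV into sqlite; returns rows parsed.

    INSERT OR IGNORE plus the UNIQUE constraint makes the script safe to
    re-run, so overlapping report windows don't create duplicates.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS conversions (
               url TEXT, date TEXT, commission REAL,
               UNIQUE (url, date, commission)
           )"""
    )
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.executemany(
        "INSERT OR IGNORE INTO conversions VALUES (:url, :date, :commission)",
        rows,
    )
    conn.commit()
    return len(rows)


def pull(db_path: str = "earnings.db") -> None:
    """Fetch the report and store it; one cron entry per network/account."""
    resp = requests.get(API_URL, timeout=30)
    resp.raise_for_status()
    conn = sqlite3.connect(db_path)
    try:
        store_csv(resp.text, conn)
    finally:
        conn.close()
```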

Some ideas you can try:

Code:
Table: Clicks

ID | Click DateTime | Click URL | Click Param 1 | Click Param 2 | Click Param 3

Code:
Table: Affiliate Conversions
ID | Click_ID (foreign key) | Conversion DateTime | Payout | Status (refund, pending, upsell, accepted, duplicate) | Whatever | Whatever

You could have another table for affiliate networks, and maybe one for websites, and set foreign keys on the tables to link to these. Then you can summarize by affiliate network or by website. Maybe one for offers too. Many options.

Then just join the tables and aggregate them however you please. Clicks to conversions should be one-to-many: one click can have multiple sales, e.g. pending, upsell or refund (with a negative amount, to support partial refunds). But a sale will have exactly one click; even if the user clicked many times, only one click ID will fire (first or last, depending on attribution).

Maybe you can also allow Conversions to have a NULL click, and attribute these to "direct".

Don't store the data pre-aggregated; store the individual clicks. If you store it aggregated, it becomes a bunch of junk you can't work with.
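Here's a self-contained sketch of the two tables and the one-to-many join, using sqlite with simplified, illustrative column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Simplified versions of the Clicks and Affiliate Conversions tables above
conn.executescript("""
    CREATE TABLE clicks (
        id         INTEGER PRIMARY KEY,
        clicked_at TEXT NOT NULL,
        url        TEXT NOT NULL
    );
    CREATE TABLE conversions (
        id           INTEGER PRIMARY KEY,
        click_id     INTEGER REFERENCES clicks(id),  -- NULL = "direct"
        converted_at TEXT NOT NULL,
        payout       REAL NOT NULL,                  -- negative for refunds
        status       TEXT NOT NULL
    );
    INSERT INTO clicks VALUES
        (1, '2024-01-01', 'https://example.com/page-a'),
        (2, '2024-01-02', 'https://example.com/page-b');
    INSERT INTO conversions (click_id, converted_at, payout, status) VALUES
        (1, '2024-01-03',  50.0, 'accepted'),
        (1, '2024-01-20', -10.0, 'refund'),   -- partial refund, same click
        (2, '2024-01-04',  30.0, 'pending');
""")

# One click can own many conversions; LEFT JOIN also keeps zero-sale clicks
rows = conn.execute("""
    SELECT c.url, COALESCE(SUM(v.payout), 0) AS earnings
    FROM clicks c
    LEFT JOIN conversions v ON v.click_id = c.id
    GROUP BY c.url
    ORDER BY c.url
""").fetchall()
print(rows)
```

The refund row with a negative payout nets out against the original sale in the aggregate, which is exactly why you want raw rows rather than pre-aggregated totals.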
 
Is there anything Pandas/Jupyter Notebooks do that Google Sheets doesn't? Sheets has the benefit that the data is available everywhere.
 
Maybe pivot tables can handle some of this?
 
Well, Pandas has a more data-centric approach and is much more powerful. You could do some of the same work using Google Apps Script, but then again, if you're going to code it, you might as well use the data-science Swiss army knife that Pandas is.

Once you have something you like in Jupyter, there's nothing stopping you from turning it into an actual script/application that does this work and displays it in a web app control panel.

You can hook Jupyter and Pandas up to a SQL database, execute SQL, easily remove N/A values (without needing to write formulas and add new columns), or do any other kind of manipulation (e.g. aggregating the data into time series).
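For example, a minimal sketch of that workflow (the table and column names are illustrative):

```python
import sqlite3

import pandas as pd

# Stand-in database with a NULL payout row, like a real export might have
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE conversions (url TEXT, converted_at TEXT, payout REAL);
    INSERT INTO conversions VALUES
        ('https://example.com/a', '2024-01-05', 10.0),
        ('https://example.com/a', '2024-02-10', NULL),
        ('https://example.com/b', '2024-02-11', 7.5);
""")

# Execute SQL straight into a DataFrame
df = pd.read_sql("SELECT * FROM conversions", conn,
                 parse_dates=["converted_at"])

# Remove N/A rows in one call -- no formulas, no helper columns
df = df.dropna(subset=["payout"])

# Aggregate into a monthly time series per URL
monthly = (
    df.groupby([df["converted_at"].dt.to_period("M"), "url"])["payout"]
      .sum()
)
print(monthly)
```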

It's not a spreadsheet or a WYSIWYG tool -- you need to know Python, and there are a lot of cans with a lot of worms in them. I'd recommend just installing Anaconda or something similar.

I'm by no means an expert, but it's something I'm playing with now and then.
 