I need data exports to build tools around my personal data, and the actual process of exporting it from a silo is the first step.
After I export it I use it to build a 'data mirror'.
Here I mostly keep the notes about the data I haven't finished exporting.
The ones I have already/mostly finished are mentioned here:
Similar to backups.
[2020-08-29]
seanbreckenridge/ffexport: export and interface with firefox history/visits and site metadata
[2020-02-01]
motivation for github backups [[exports]]
[2019-09-01]
Usage of /users/{ids}/favorites GET - Stack Exchange API [[promnesia]]
[2019-09-16]
shit. seems that no way to get upvoted posts… https://meta.stackexchange.com/questions/299264/how-to-get-the-list-of-all-posts-ive-upvoted-via-the-api
[2019-09-16]
https://meta.stackexchange.com/questions/148008/how-can-i-see-comments-that-ive-upvoted
[2019-09-16]
fuck. I guess I'm gonna have to scrape votes… https://stackoverflow.com/users/706389/karlicoss?tab=votes
[2020-01-11]
kensanata/mastodon-backup: Archive your statuses, favorites and media using the Mastodon API (i.e. login required)
[2019-12-29]
halcy/Mastodon.py: Python wrapper for the Mastodon ( https://github.com/tootsuite/mastodon/ ) API. [[mastodon]]
[2020-07-31]
alexattia/Maps-Location-History: Get, Concatenate and Process your location history from Google Maps TimeLine [[location]] [[timeline]] [[qs]]
[2020-10-25]
Garmin Connect [[garmin]]
[2020-12-30]
Notice: This project is unmaintained · Issue #613 · fbchat-dev/fbchat [[facebook]]
[2019-06-13]
joeyates/imap-backup: Backup GMail (or other IMAP) accounts to disk [[email]]
[2020-12-13]
https://bandcamp.com/developer: no listening history though…
[2019-04-08]
python - Steam API get historical player count of specific game - Stack Overflow
[2019-07-14]
fabianonline/telegrambackup: Java app to download all your telegram data.
[2020-10-03]
Statify: Pull your playlist and listening data from the Spotify API to a Sqlite database /r/coolgithubprojects
[2019-04-23]
feedbin/feedbin-api: Feedbin API Documentation [[feedbin]]
[2020-11-27]
Success Stories · tcgoetz/GarminDB Wiki [[garmin]]
[2020-12-19]
Importing your Goodreads & Accessing them with Open Library's APIs
[2020-06-24]
Telegram Now Lets You Export Your Chats, View Notification Exceptions | Technology News [[telegram]]
[2019-09-02]
vincaslt/memparse: A Memrise courses parser https://github.com/vincaslt/memparse
[2019-04-01]
Polar AccessLink Api Daily Activity Goal /r/Polarfitness
[2020-03-05]
signalnerve/roam-backup: Automated Roam Research backups using GitHub Actions and AWS S3
[2020-02-03]
Data lake - Wikipedia [[dal]] [[exports]]
[2020-04-21]
fucking hell. so materialistic export stopped working [[phone]]
[2017-01-21]
Playing around with IMDB: thought I'd have to pull the watchlist items out with Beautiful Soup, but there's a JSON sitting right in the React state [[exports]]
[2019-12-27]
'hostage model' is a good term [[toblog]] [[dataliberation]] [[sadinfra]]
[2020-01-15]
Hi, Camlistore author here. Andrew Gerrand worked with me on Camlistore too and… | Hacker News [[infra]] [[exports]]
[2020-03-17]
"GitHub blocked me and all my libraries" https://news.ycombinator.com/item?id=22593595[2020-05-31]
Own your content on Social Media using the IndieWeb - YouTube [[dataliberation]][2019-10-03]
another big goal is having little operational overhead. I’d rather set up a (potentially elaborate) system once and tthen never have to update it and think how it works [[exports]] [[infra]][2019-10-03]
that involves automatic ci [[ci]][2019-10-03]
continuous cloud sync [[cloud]][2019-10-03]
automation/cron jobs for orger [[dron]][2021-03-04]
Importance of agnostic exports: ofter you start backing up before you process the data[2020-01-01]
ChromeDevTools/devtools-protocol: Chrome DevTools Protocol [[exports]] [[scrape]]https://github.com/ChromeDevTools/devtools-protocol
[2020-04-19]
open files using utf-8 encoding (fixes #5) by miguelrochefort · Pull Request #6 · karlicoss/rexportapply this to export helper…
Easiest option is just to have separate scripts to run regularly?
so the backup script could provide TODO
so need a way to trigger backup from promnesia indexer itself? Fairly easy to achieve as it's all just python code?
in promnesia:

    def index_reddit():
        from exporters.reddit import export
        # TODO?
        return
might be annoying to implement token retrieval on JS only?
[2020-04-12]
add this to myinfra repository??
[2020-05-27]
dunno, I'm a bit tired and not as motivated to build it… but could post it so someone else picks it up [[toblog]]
Twitter is a big pain in the ass; they've become very hostile towards API access.
Even the archives are somewhat incomplete (e.g. favorites lack some metadata).
E.g. from Apply for API — Twitter Developers:
> Be thorough
> We need to completely understand your use case before we can approve it. So, please include as much detail as possible in your application.
[2020-04-28]
shit.. also RTs are shortened?? so I need to get retweets properly?
[2021-01-19]
bisguzar/twitter-scraper: Scrape the Twitter Frontend API without authentication. [[twitter]] [[exports]]
Even though Twint uses a db, they seem to treat it as temporary storage, so the schema might change.
I'm also not super convinced by how reliable the code is (from a quick glance), so I'd worry about data loss.
[2019-07-28]
jonbakerfish/TweetScraper: TweetScraper is a simple crawler/spider for Twitter Search without using API [[twitter]]
[2021-02-09]
doesn't work, this error :( https://github.com/bisguzar/twitter-scraper/issues/168
[2019-07-29]
taspinar/twitterscraper: Scrape Twitter for Tweets
> One of the bigger disadvantages of the Search API is that you can only access Tweets written in the past 7 days. This is a major bottleneck for anyone looking for older past data to make a model from. With TwitterScraper there is no such limitation.
[2021-02-09]
https://github.com/taspinar/twitterscraper/issues/344 broken as well
compare tw-before.org (twint) and tw-after.org (twidump) in views: retweets in twint are def missing
[2020-04-05]
Our plan is for the next version of HN's API to simply serve a JSON version of e… | Hacker News [[hackernews]]
https://news.ycombinator.com/item?id=22788526
> Our plan is for the next version of HN's API to simply serve a JSON version of every page. I'm hoping to get to that this year.
[2020-04-07]
Profile: karlicoss | Hacker News
https://news.ycombinator.com/user?id=karlicoss
user: karlicoss
created: August 25, 2016
karma: 757
capture HN karma? maybe on all comments
[2020-04-29]
need to mirror HN… [[hackernews]] [[exports]]
could also have an 'exact' time notion and an 'approximate' time, for when it's guessed from the file timestamp etc
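A sketch of what that exact/approximate distinction could look like (names and fields are made up, not from any existing module):

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class Timestamp:
        dt: datetime
        exact: bool  # False when guessed, e.g. from a backup file's mtime

    api_item  = Timestamp(datetime(2020, 4, 29, 12, 30, 15), exact=True)
    recovered = Timestamp(datetime(2020, 4, 1), exact=False)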
[2021-03-05]
it's impressive that pretty much every tool for exporting has some flaws [[hackernews]]
e.g. they don't have ci
https://github.com/HackerNews/API
https://hacker-news.firebaseio.com/v0/user/karlicoss.json?print=pretty – get user data
extract ‘submitted’
https://hacker-news.firebaseio.com/v0/item/25971799.json?print=pretty – comment
https://hacker-news.firebaseio.com/v0/item/25971380.json?print=pretty – type: "story"
dunno if useful to keep scores over time?
not sure if should dump everything in a single json? or split by files?
can change later I guess
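A minimal sketch of mirroring via the official Firebase API, following the links above (assumes the requests library; one JSON file per item is just one possible layout):

    import json
    from pathlib import Path

    import requests

    API = "https://hacker-news.firebaseio.com/v0"

    def fetch(path: str) -> dict:
        resp = requests.get(f"{API}/{path}.json")
        resp.raise_for_status()
        return resp.json()

    # the user object contains karma and the ids of all submitted items
    user = fetch("user/karlicoss")
    print("karma:", user["karma"])

    out = Path("hn")  # hypothetical output dir, one JSON per item
    out.mkdir(exist_ok=True)
    for item_id in user["submitted"]:
        item = fetch(f"item/{item_id}")  # comments and stories alike
        (out / f"{item_id}.json").write_text(json.dumps(item))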
Google Takeout doesn't have a proper API, and periodic exports are kind of annoying… would be good to automate them.
Another difficulty is that the data seems to have a certain retention,
so you can't just take the latest takeout; for some data you need to merge all of them.
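A minimal sketch of the merging idea, assuming each takeout has been unpacked into its own timestamped directory (the takeouts/ layout is hypothetical):

    from pathlib import Path

    # oldest first, e.g. takeouts/20190523/..., takeouts/20200122/...
    takeouts = sorted(p for p in Path("takeouts").iterdir() if p.is_dir())

    merged = {}
    for takeout in takeouts:
        for f in takeout.rglob("*"):
            if f.is_file():
                # newer takeouts win for paths present in both; paths that
                # fell out of retention survive from the older takeouts
                merged[str(f.relative_to(takeout))] = f

    print(len(merged), "unique paths across", len(takeouts), "takeouts")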
[2019-09-28]
life-vault/selenium_takeout.py at master · ThorbenJensen/life-vault
https://github.com/ThorbenJensen/life-vault/blob/master/src/takeout/selenium_takeout.py
maybe release my module for 2FA separately?
[2021-01-10]
Hypothesis [[takeout]]
> Seriously, check out ratarmount if you haven't. Since the Google Takeout spans multiple 50GB tgz files (I'm at ~14, not including Google Drive in the takeout), ratarmount is brilliant. It merges all of the tgz contents into a single folder structure so /path/a/1.jpg and /path/a/1.json might be in different tgz folders but are mounted in to the same folder.
[2019-06-11]
eh, recompressing to .tar.xz only saved 100 mb [[takeout]]
20180807 My Activity/Discover/MyActivity.html 20190523
20181015 My Activity/Discover/MyActivity.html 20190522
20181213 My Activity/Discover/MyActivity.html 20200122
[2020-04-23]
I've found Google Takeouts to silently remove old data | beepb00p
huh, so with my script to search takeout duplicates, I've figured out that from 2015 at least Search/MyActivity.html hasn't been erased? interesting
but looks like Chrome/MyActivity.html is still being removed
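Not the actual script, but a minimal sketch of such a duplicate search under the same hypothetical unpacked layout: hash the same relative path across takeouts and see where the content changes or disappears.

    import hashlib
    from pathlib import Path

    def digest(f: Path) -> str:
        return hashlib.sha256(f.read_bytes()).hexdigest()[:12]

    target = "Takeout/My Activity/Search/MyActivity.html"  # illustrative path
    for takeout in sorted(Path("takeouts").iterdir()):
        f = takeout / target
        print(takeout.name, digest(f) if f.exists() else "<missing>")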
[2020-04-24]
Takeout/My Activity/Search data is limited to last 10 years. Please remove limit - Google Search Community
[2020-04-29]
> I've already pulled down my 2-300GB Google Photos archive How? I've tried sev… | Hacker News
cuu508:
> Takeout doesn't work in practice for bigger collections (archive creation routinely fails, timeouts while downloading, 50GB max size results in many splits)
> I've used this 3rd party tool and it worked OK: https://github.com/gilesknap/gphotos-sync/
geekgonecrazy:
> I forgot to mention this. But yes the export failed several dozen times. I believe I ended up doing it in chunks. It was hard to get them off
[2020-05-04]
I replied to a similar point about hashing here - https://news.ycombinator.com/i… | Hacker News
> You're correct that the methods I described are a far cry from actually guaranteeing that the backup has no errors. In the same way that a unit test doesn't prove code is error-free, but _can_ justify increased confidence in the code, I'm interested in techniques that can justify increased confidence in my backups. Particularly in cases where I don't have direct access to the original data, and where exhaustively checking the data manually is too time-consuming to be worth it.
yes!
[2020-01-01]
perkeep/gphotos-cdp: This program uses the Chrome DevTools Protocol to drive a Chrome session that downloads your photos stored in Google Photos. [[scrape]]
https://github.com/perkeep/gphotos-cdp
> In our original Perkeep issue, @bradfitz said that we might have to give up on APIs and resort to scraping, noting that the Chrome DevTools Protocol makes this pretty easy.
[2019-06-28]
After hoarding over 50k YouTube videos, here is the youtube-dl command I settled on. : DataHoarder
https://www.reddit.com/r/DataHoarder/comments/c6fh4x/after_hoarding_over_50k_youtube_videos_here_is/
[2020-01-01]
perkeep/gphotos-cdp https://github.com/perkeep/gphotos-cdp
> we'd like our photos mirrored in seconds or minutes, not weeks.
Emfit QS is my sleep tracker.
[2018-08-18]
Emfit has local API; would be nice to use it… [[emfit]]
https://gist.github.com/harperreed/9d063322eb84e88bc2d0580885011bdd
https://gist.github.com/karlicoss/3361f6a239048a451daa2a02982ee180
[2020-09-11]
sanielfishawy/emfitdatagetter: Gets heart rate and respiration rate from an Emfit QS device on the same local network. [[emfit]]
[2019-12-17]
downloadEmfitAPI.py https://gist.github.com/vanne02135/6901cc2b92315881080d0ce0f07c1a17
[2021-02-06]
I think I ended up just using login + password. meh
[2020-05-29]
emfit API didn't work for about three days straight… [[emfit]] [[backup]]
[2019-12-21]
samuelmr/emfit-qs: Unofficial Node client for Emfit QS
https://github.com/samuelmr/emfit-qs
> Exchange username and password to a token (expires in 7 days). You can also log in to qs.emfit.com and check the 'remember_token' parameter passed to API calls (e.g. with developer tools of your browser).
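A sketch of that token exchange; the endpoint and field names are assumptions based on the emfit-qs README (the cloud API is undocumented, so verify against real traffic first):

    import requests

    API = "https://qs-api.emfit.com/api/v1"  # hypothetical base URL

    def login(username: str, password: str) -> str:
        resp = requests.post(f"{API}/login",
                             json={"username": username, "password": password})
        resp.raise_for_status()
        return resp.json()["remember_token"]  # reportedly expires in 7 days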
[2019-09-29]
yeah, could elaborate on backing up android data, could be quite generic? [[android]]
[2019-09-10]
er, I guess for orger need to extract a simple reddit provider that just merges various timestamped backups?
[2019-08-25]
yep, it def happens; promnesia triggers it
[2020-01-11]
Getting Started — PRAW 3.6.0 documentation [[reddit]]
https://praw.readthedocs.io/en/v3.6.0/pages/getting_started.html#connecting-to-reddit
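For reference, a minimal sketch with current PRAW (the linked 3.6.0 API is outdated; the credentials are placeholders for a 'script' app registered at reddit.com/prefs/apps):

    import praw

    reddit = praw.Reddit(
        client_id="...",
        client_secret="...",
        username="...",
        password="...",
        user_agent="export sketch",
    )

    # saved items are lazily paginated; limit=None walks all pages
    for item in reddit.user.me().saved(limit=None):
        print(item.id)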
[2019-04-19]
Pinboard on Twitter: "Next question is, does a raw API call give the same results as the website? The API and website search engine run off of different indexes.… https://t.co/CZrLE7YNWo" [[pinboard]]
[2020-04-13]
twintproject/twint: An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
[2020-04-23]
MatthieuBizien/roam-to-git: Automatic RoamResearch backup to Git
> Format [[links]]
> Format #links
> Format attribute::
> Format [[ [[link 1]] [[link 2]] ]]
> Format ((link))
[2020-04-28]
timgrossmann/InstaPy: 📷 Instagram Bot - Tool for automated Instagram interactions
[2021-02-05]
Chiaki/VKBK: tool for creating and syncing a local backup of your VKontakte profile (Profile backup & synchronization tool for Vk.com) [[vk]] [[exports]]
ugh fuck.. apache & mysql? a bit much for me :(
[2021-02-07]
Against developer terms of service? · Issue #171 · Tyrrrz/DiscordChatExporter
if it's a single file, don't do anything just yet?
or treat it as 'simple' with month duration or something
just so it doesn't warn immediately. could be a takeout archive or something
(although it only maintains two?)
[2021-02-25]
ryanmcgrath/twython: Actively maintained, pure Python wrapper for the Twitter API. Supports both normal and streaming Twitter APIs. [[python]] [[twitter]]
hmm, still working? nice…
[2021-02-04]
Privacy Policy - October 15, 2020 - Reddit [[reddit]] [[exports]]
ugh. gdpr takeout has to be emailed?
[2021-02-05]
Rapptz/discord.py: An API wrapper for Discord written in Python. [[discord]] [[exports]]
[2021-02-08]
Oura ring vs. Emfit QS (My detailed comparison) - What do you think? - Quantified Self / Sports, Physical Activity, and Fitness - Quantified Self Forum [[emfit]] [[exports]]
> Can only store 10 hours of data on the device & 360 days in the cloud
huh? motivation for exports I guess
so it could cooperate with HPI… egh, not sure
[2021-03-10]
Quickstart — StackAPI 0.1.12 documentation [[exports]]
> By default, StackAPI will return up to 500 items in a single call. It may be less than this, if there are less than 500 items to return. This is common on new or low traffic sites.
> The number of results can be modified by changing the page_size and max_pages values. These are multiplied together to get the maximum total number of results. The API paginates the results and StackAPI recombines those pages into a single result.
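A minimal sketch of those pagination knobs, straight from the quoted docs (the user id is the one from the Stack Overflow votes link above):

    from stackapi import StackAPI

    site = StackAPI("stackoverflow")
    # page_size * max_pages caps the total number of fetched results
    site.page_size = 100
    site.max_pages = 10

    # pages get recombined into a single result dict
    answers = site.fetch("users/{ids}/answers", ids=[706389])
    print(len(answers["items"]))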
[2021-03-07]
Exporting my own comment content from Disqus? · Discuss Disqus · Disqus [[disqus]] [[exports]]
seems hostile against exporting your own data
[2021-03-23]
Your eBay data [[ebay]]
can request data takeout here… takes ages to complete though, like a week
vimdiff <(rg -A 363 -B 1 15538293160 events_20210317T120954Z.json) <(rg -A 363 -B 1 15538293166 events_20210317T120954Z.json)
[2021-04-18]
exobrain/data/exportsgdpr at master · seanbreckenridge/exobrain [[exports]] [[gdpr]]
> Some thoughts on how easy to parse/use GDPR/get data exports from different services. A lot of these I did just because I was curious what information/context I could glean into the past about…