I Automated My Entire Research Workflow With 10 Free APIs

**MyrinNew** · 03-25-2026, 03:26 AM

Two weeks ago, I started a research project that required:

Academic papers from multiple databases
Patent data
Clinical trial information
Security checks on all downloaded files

Manually, this would take days. With 10 free APIs, I automated it in an afternoon.

Here's the stack I built.

The Research Pipeline

Query → OpenAlex (papers) → Crossref (metadata) → Unpaywall (free PDFs)
→ PubMed (medical) → ClinicalTrials.gov (trials) → Patents (USPTO)
→ Semantic Scholar (AI summaries) → Export → Analyze

Each step is one Python function. Total code: ~200 lines.

Step 1: Find Papers (OpenAlex)

import requests

def find_papers(topic, limit=20):
resp = requests.get('https://api.openalex.org/works', params={
'search': topic, 'per_page': limit,
'sort': 'cited_by_count:desc'
})
return [{
'title': w['title'],
'doi': w.get('doi'),
'citations': w['cited_by_count'],
'year': w.get('publication_year')
} for w in resp.json()['results']]

papers = find_papers('CRISPR gene editing therapy')
print(f"Found {len(papers)} papers, top cited: {papers[0]['citations']}")

Step 2: Enrich Metadata (Crossref)

def get_metadata(doi):
if not doi: return {}
doi_id = doi.replace('https://doi.org/', '')
resp = requests.get(f'https://api.crossref.org/works/{doi_id}')
if resp.status_code != 200: return {}
item = resp.json()['message']
return {
'publisher': item.get('publisher'),
'journal': item.get('container-title', [''])[0],
'references': item.get('references-count', 0)
}

Step 3: Find Free PDFs (Unpaywall)

def find_pdf(doi):
if not doi: return None
doi_id = doi.replace('https://doi.org/', '')
resp = requests.get(f'https://api.unpaywall.org/v2/{doi_id}',
params={'email': 'research@example.com'})
data = resp.json()
if data.get('is_oa'):
return data['best_oa_location'].get('url_for_pdf')
return None

Step 4: Get AI Summaries (Semantic Scholar)

def get_tldr(title):
resp = requests.get('https://api.semanticscholar.org/graph/v1/paper/search',
params={'query': title, 'limit': 1, 'fields': 'tldr'})
papers = resp.json().get('data', [])
if papers and papers[0].get('tldr'):
return papers[0]['tldr']['text']
return 'No summary available'

Step 5: Check Related Trials (ClinicalTrials.gov)

def find_trials(topic, limit=5):
resp = requests.get('https://clinicaltrials.gov/api/v2/studies', params={
'query.term': topic, 'pageSize': limit, 'format': 'json'
})
return [{
'nct_id': s['protocolSection']['identificationModule']['nctId'],
'title': s['protocolSection']['identificationModule']['briefTitle'],
'status': s['protocolSection']['statusModule']['overallStatus']
} for s in resp.json().get('studies', [])]

Step 6: Check Patents (USPTO)

def find_patents(topic, limit=5):
resp = requests.post('https://api.patentsview.org/patents/query', json={
'q': {'_text_any': {'patent_abstract': topic}},
'f': ['patent_number', 'patent_title', 'patent_date'],
'o': {'per_page': limit},
's': [{'patent_date': 'desc'}]
})
return resp.json().get('patents', [])

The Full Pipeline

def research(topic):
print(f"Researching: {topic}\n")

# Papers
papers = find_papers(topic, limit=10)
print(f"📚 {len(papers)} papers found")

# Enrich top 5 with metadata + PDFs
for p in papers[:5]:
meta = get_metadata(p['doi'])
pdf = find_pdf(p['doi'])
tldr = get_tldr(p['title'])
print(f" • {p['title'][:60]}")
print(f" Citations: {p['citations']} | Journal: {meta.get('journal', 'N/A')}")
print(f" PDF: {'✅' if pdf else '❌'} | TLDR: {tldr[:80]}...")

# Clinical trials
trials = find_trials(topic)
print(f"\n🏥 {len(trials)} clinical trials")
for t in trials:
print(f" [{t['status']}] {t['title'][:60]}")

# Patents
patents = find_patents(topic)
print(f"\n📜 {len(patents)} patents")
for p in patents:
print(f" [{p['patent_date']}] {p['patent_title'][:60]}")

research('CRISPR gene editing therapy')

Results

For one query, I got:

10 highly-cited papers with metadata
4 free PDFs (via Unpaywall)
AI summaries for all papers
5 active clinical trials
5 related patents

All in under 30 seconds.

All Toolkits (Open Source)

I packaged each step into its own toolkit:

1	OpenAlex	250M+ academic works
2	Crossref	150M+ article metadata
3	PubMed	36M+ medical papers
4	Semantic Scholar	AI summaries
5	arXiv	2.4M+ preprints
6	CORE	300M+ open access
7	Unpaywall	Find free PDFs
8	ClinicalTrials.gov	500K+ trials
9	USPTO Patents	8M+ patents
10	Security Scanner	5 security APIs

Full collection: awesome-free-research-apis

What would you automate if you had all these APIs in one pipeline? I'm curious about creative use cases.

Need custom data pipelines? My tools | GitHub

More...