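Mining Newsfeed Text with Python and Extracting Its Viewpoints
1. Environment setup
This article uses an Anaconda Python 3.6 virtual environment and requires the following additional packages: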
- tqdm (a progress bar python utility): pip install tqdm
- nltk (for natural language processing): pip install nltk
- bokeh (for interactive data viz): conda install bokeh
- gensim: pip install --upgrade gensim
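Before moving on, it can help to confirm that the packages above are visible to the interpreter in the active environment. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` list are illustrative and not part of the original code.

```python
from importlib.util import find_spec

# Import names of the packages listed above (top-level module names only).
REQUIRED = ["tqdm", "nltk", "bokeh", "gensim"]


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

If anything is reported as missing, rerun the matching install command from the list above inside the same virtual environment.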
import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime
from tqdm import tqdm, tqdm_notebook
from functools import reduce
def getSources():
    # Fetch the list of English-language sources and return their ids.
    source_url = 'https://newsapi.org/v1/sources?language=en'
    response = requests.get(source_url).json()
    sources = []
    for source in response['sources']:
        sources.append(source['id'])
    return sources
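As a rough illustration of what getSources() does with the response, here is a slightly more defensive sketch. It assumes the endpoint returns JSON with a `sources` list whose entries carry an `id` key, as the code above expects. The names `extract_source_ids` and `fetch_source_ids`, the timeout value, and the sample payload are all hypothetical and are not taken from the original article or from real API output.

```python
def extract_source_ids(payload):
    """Collect the 'id' of every entry under payload['sources']."""
    return [source["id"] for source in payload.get("sources", []) if "id" in source]


def fetch_source_ids(url="https://newsapi.org/v1/sources?language=en", timeout=10):
    """Download the source list and return the ids (assumed response shape)."""
    import requests  # imported here so the parsing helper runs without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of a KeyError later
    return extract_source_ids(response.json())


# Made-up payload shaped like the response the original code expects.
sample = {
    "status": "ok",
    "sources": [
        {"id": "source-a", "name": "Source A"},
        {"id": "source-b", "name": "Source B"},
    ],
}
print(extract_source_ids(sample))  # ['source-a', 'source-b']
```

Separating the parsing step from the network call makes it possible to check the extraction logic offline, and the timeout keeps a slow or unreachable server from hanging the script.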