Create an application in Ravello from a clone:
I have two machines: k1 runs Elasticsearch (no direct ssh access), and saltok is the ssh jump host, reached with a private key.
Installers & instructions:
Elasticsearch: https://www.elastic.co/downloads/elasticsearch
cerebro: https://github.com/lmenezes/cerebro
fscrawler: https://github.com/dadoonet/fscrawler
Configure and start Elasticsearch (the settings go in config/elasticsearch.yml, one per line):
vi elasticsearch-6.0.0/config/elasticsearch.yml
network.host: 0.0.0.0
transport.host: localhost
transport.tcp.port: 9300
elasticsearch-6.0.0/bin/elasticsearch-plugin install x-pack
elasticsearch-6.0.0/bin/elasticsearch
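Once the node is up, the cluster health API is a quick way to confirm it answers. The Python sketch below (standard library only) builds a GET request for /_cluster/health with HTTP Basic auth, because the x-pack plugin installed above adds security. The host, port 9200 and the password are assumptions: use the credentials you set up for the elastic user.

```python
import base64
import json
import urllib.request


def build_health_request(host, password, port=9200, user="elastic"):
    """Build (but do not send) a GET request for the cluster health API."""
    url = f"http://{host}:{port}/_cluster/health"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def cluster_status(request, timeout=5.0):
    """Send the request and return the 'status' field (green, yellow or red)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["status"]
```

For example, run on k1: print(cluster_status(build_health_request("localhost", "your-password"))). A single node typically reports yellow once it holds indices with replicas, since replicas cannot be allocated to the same node.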
Crawl a website into a directory:
wget --no-clobber --convert-links --random-wait -r -p --level 10 -E -e robots=off -U mozilla https://javiermugueta.wordpress.com
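With -E, wget saves pages with an .html extension, so counting those files gives a rough idea of how many pages were mirrored before you index them (fscrawler will also see the CSS, images and other assets). A minimal Python sketch; the root directory is an assumption, normally the javiermugueta.wordpress.com folder wget creates under the current directory.

```python
from pathlib import Path


def count_mirrored_files(root, suffixes=(".html", ".htm")):
    """Count files under root whose extension is in suffixes (case-insensitive)."""
    wanted = {s.lower() for s in suffixes}
    return sum(
        1
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )
```

For example: print(count_mirrored_files("javiermugueta.wordpress.com")).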
Configure fscrawler:
fscrawler-2.4/bin/fscrawler javi --loop 1 --rest --username elastic --upgrade
Edit the config file (/home/oracle/.fscrawler/javi/_settings.json) and set the directory to index:
{ "name" : "javi", "fs" : { "url" : "/oracle/javi", ...
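If you prefer to script this step, here is a small Python sketch that loads the job's _settings.json, sets fs.url and writes the file back, leaving the other keys untouched. The path and /oracle/javi come from the steps above; the function name set_fs_url is hypothetical.

```python
import json
from pathlib import Path


def set_fs_url(settings_path, crawl_dir):
    """Point an fscrawler job at crawl_dir by rewriting fs.url in _settings.json."""
    path = Path(settings_path)
    settings = json.loads(path.read_text(encoding="utf-8"))
    # Create the "fs" section if it is missing, then set only its "url" key.
    settings.setdefault("fs", {})["url"] = crawl_dir
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return settings
```

For example: set_fs_url("/home/oracle/.fscrawler/javi/_settings.json", "/oracle/javi").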
Launch fscrawler:
fscrawler-2.4/bin/fscrawler javi --loop 1 --username elastic
Launch cerebro:
cerebro-0.7.1/bin/cerebro
Connect to the cerebro UI:
http://k1-cotscfbp21sep20172-yji6xufr.srv.ravcloud.com:9000
Make a query:
javi/_search?q=Almost every cloud should have its (i)PaaS
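The same query can be sent from any HTTP client. The q parameter is a URI search, so the text must be URL-encoded. The Python sketch below builds that URL and reduces a response to its total and (id, score) pairs. The base URL is an assumption. The total handling covers both the plain integer returned by Elasticsearch 6.x and the object used by later versions.

```python
from urllib.parse import urlencode


def search_url(base, index, query, size=10):
    """Build a URI-search URL: <base>/<index>/_search?q=<query>&size=<size>."""
    return f"{base.rstrip('/')}/{index}/_search?" + urlencode({"q": query, "size": size})


def summarize_hits(response):
    """Return (total, [(id, score), ...]) from a parsed search response."""
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):  # 7.x and later: {"value": n, "relation": "eq"}
        total = total["value"]
    return total, [(h["_id"], h["_score"]) for h in hits["hits"]]
```

The query text is passed to Elasticsearch's query string syntax, where parentheses act as grouping operators, so results for "(i)PaaS" may differ from a plain phrase match.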
In addition, I’ve crawled the whole of www.intratext.com into a directory and indexed it with fscrawler: more than 1.6 million docs indexed!
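To check a document count like that, the _count API returns the number of documents in an index. A minimal Python sketch. The index name intratext and the base URL are hypothetical. With x-pack security enabled, you need to send credentials for the elastic user, for example an HTTP Basic Authorization header passed in headers.

```python
import json
import urllib.request


def count_url(base, index):
    """URL of the _count API for one index."""
    return f"{base.rstrip('/')}/{index}/_count"


def parse_count(body):
    """Extract the document total from a _count response body (str or bytes)."""
    return json.loads(body)["count"]


def fetch_count(base, index, headers=None, timeout=10.0):
    """GET <base>/<index>/_count and return the number of documents."""
    req = urllib.request.Request(count_url(base, index), headers=headers or {})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_count(resp.read())
```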
Enjoy 😉