Harassing Town Halls and Police “on side”?

Lately, in the news, there has been a lot of coverage of the “outrage” expressed by “the public” at various congressional Town Hall meetings. This is typically presented as a spontaneous outburst by “constituents” (with the implication they are Republicans) at Town Hall meetings held by Republicans.

This is simply not true. It is an orchestrated bit of Street Theater.

But what could possibly be worse?

How about holding your organizing meetings at the Police Station?

Wonder why the police in San Jose just stand around and watch Trump Supporters get beaten? Perhaps they are “on side” with one of the “teams”… (Yes, it is possible there is some innocent reason a political revolutionary activist group might be holding their meeting at a Police Station… just like it is possible I could run for Governor and win… )

One of the organizing sites, and some quotes about their desired “membership” for the groups on their map:

https://www.indivisibleguide.com/groups-nav

Locate or register Indivisible groups, group meetings, or actions in your area. Come prepared to make plans for action and meet others who are working to resist Trump’s agenda. We’ll be planning next steps to encourage our Members of Congress in Senate and House to represent us, not Trump. If we stand together, indivisible, we will win. Welcome to the resistance!

Groups in our directory are wholly independent; they are listed provided they agree to resist Trump’s agenda, focus on local, defensive congressional advocacy, and embrace progressive values. Meeting and actions are listed provided their hosts agree to resist Trump’s agenda; focus on local, defensive congressional advocacy; and embrace progressive values.

In other words, only if you are NOT among those who elected Trump (and WANT his agenda). So they are directly stating the desire to get “Congress in Senate and House” to NOT represent the will of the voters as expressed in the election, but only to represent the losers.

Political Organizing at the Milpitas Police Station


You can click on the image to get a larger one.

Note that the center ‘text bubble’ has a pointer to the Milpitas area. You can click on dots on their map to identify the particular groups. I chose this one as it is near me and I’ve done work for the City Of Milpitas. Notice this meeting is held regularly at the Milpitas Police Department…

Now I realize there can be all sorts of legitimate reasons for this apparent conflict of interest. Perhaps there is a general public available meeting hall in a building that just happens to be named “Police Department”… In any case, it is a very bad set of optics to your typical Trump Voter (especially so given the history of lack of protection from just such “advocacy groups” (i.e. Rioters On Parade) at places like the San Jose Trump rally).

Now look at the block of items along the right side of the page. These are “actions” planned. It is very clear they are targeting Congress Members for “direct action” events. So, want to know when the next bunch of loudmouth disruptors having a tantrum will show up at your local Town Hall? Click it… Just don’t make the mistake of thinking these things are spontaneous, or that they represent “the will of the people”. Only The Vote represents the will of the people. These folks represent the will of the Useful Idiots Of The Left… after having been told what their ‘will’ is by their masters and the Soros Brigades…

If you would like to know more about what they do, and how they do it, you can download their “guide” here:

https://www.indivisibleguide.com/download-the-guide

It has interesting things in it. Here’s from just the title page:

A PRACTICAL GUIDE FOR RESISTING THE TRUMP AGENDA

Former congressional staffers reveal best practices for making Congress listen.

Note it is focused on pressuring Congress members. NOT on respecting the vote.

From “chapter four”, one sample:

https://www.indivisibleguide.com/web#header-chapter-four

I’ve bolded a few things that you can watch for on the “News” (that is increasingly just the Propaganda Producers…) MoC and MoCs is Member Of Congress. They seem to like code words.

OPPORTUNITY 1

TOWN HALLS/LISTENING SESSIONS

MoCs regularly hold local “town halls” or public listening sessions throughout their districts or state. Tea Partiers used these events to great effect — both to directly pressure their MoCs and to attract media to their cause.
Preparation

Find out when your MoC’s next public town hall event is. Sometimes these are announced well in advance, and sometimes, although they are technically “public,” only select constituents are notified about them shortly before the event. If you can’t find announcements online, call your MoC directly to find out. When you call, be friendly and say to the staffer, “Hi, I’m a constituent, and I’d like to know when his/her next town hall forum will be.” If they don’t know, ask to be added to the email list so that you get notified when they do.
Send out a notice of the town hall to your group, and get commitments from members to attend. Distribute to all of them whatever information you have on your MoC’s voting record, as well as the prepared questions.

Prepare several questions ahead of time for your group to ask. Your questions should be sharp and fact-based, ideally including information on the MoC’s record, votes they’ve taken, or statements they’ve made. Thematically, questions should focus on a limited number of issues to maximize impact. Prepare 5-10 of these questions and hand them out to your group ahead of the meeting.
Example question:

“I and many district families in Springfield rely on Medicare. I don’t think we should be rationing health care for seniors, and the plan to privatize Medicare will create serious financial hardship for seniors who can’t afford it. You haven’t gone on the record opposing this. Will you commit here and now to vote no on Bill X to cut Medicare?”

SHOULD I BRING A SIGN?

Signs can be useful for reinforcing the sense of broad agreement with your message. However, if you’re holding an oppositional sign, staffers will almost certainly not give you or the people with you the chance to get the mic or ask a question. If you have enough people to both ask questions and hold signs, though, then go for it!

At the Town Hall

Get there early, meet up, and get organized. Meet outside or in the parking lot for a quick huddle before the event. Distribute the handout of questions, and encourage members to ask the questions on the sheet or something similar.

Get seated and spread out. Head into the venue a bit early to grab seats at the front half of the room, but do not all sit together. Sit by yourself or in groups of two, and spread out throughout the room. This will help reinforce the impression of broad consensus.

Make your voices heard by asking good questions. When the MoC opens the floor for questions, everyone in the group should put their hands up and keep them there. Look friendly or neutral so that staffers will call on you. When you’re asking a question, remember the following guidelines:

Stick with the prepared list of questions. Don’t be afraid to read it straight from the printout if you need to.

Be polite but persistent, and demand real answers. MoCs are very good at deflecting or dodging questions they don’t want to answer. If the MoC dodges, ask a follow-up question. If they aren’t giving you real answers, then call them out for it. Other group members around the room should amplify by either booing the MoC or applauding you.

Don’t give up the mic until you’re satisfied with the answer. If you’ve asked a hostile question, a staffer will often try to limit your ability to follow up by taking the microphone back immediately after you finish speaking. They can’t do that if you keep a firm hold on the mic. No staffer in their right mind wants to look like they’re physically intimidating a constituent, so they will back off. If they object, then say politely but loudly: “I’m not finished. The MoC is dodging my question. Why are you trying to stop me from following up?”

Keep the pressure on. After one member of the group finishes, everyone should raise their hands again. The next member of the group to be called on should move down the list of questions and ask the next one.

Support the group and reinforce the message. After one member of your group asks a question, everyone should applaud to show that the feeling is shared throughout the audience. Whenever someone from your group gets the mic, they should note that they’re building on the previous questions — amplifying the fact that you’re part of a broad group.

Record everything! Assign someone in the group to use their smart phone or video camera to record other advocates asking questions and the MoC’s response. While written transcripts are nice, unfavorable exchanges caught on video can be devastating for MoCs. These clips can be shared through social media and picked up by local and national media. Please familiarize yourself with your state and local laws that govern recording, along with any applicable Senate or House rules, prior to recording. These laws and rules vary substantially from jurisdiction to jurisdiction.

After the Town Hall

Reach out to media, during and after the town hall. If there’s media at the town hall, the people who asked questions should approach them afterward and offer to speak about their concerns. When the event is over, you should engage local reporters on Twitter or by email and offer to provide an in-person account of what happened, as well as the video footage you collected. Example Twitter outreach:

“.@reporter I was at Rep. Smith’s town hall in Springfield today. Large group asked about Medicare privatization. I have video & happy to chat.”
Note: It’s important to make this a public tweet by including the period before the journalist’s Twitter handle. Making this public will make the journalist more likely to respond to ensure they get the intel first.

Ensure that the members of your group who are directly affected by specific threats are the ones whose voices are elevated when you reach out to media.

Share everything. Post pictures, video, your own thoughts about the event, etc., to social media afterward. Tag the MoC’s office and encourage others to share widely.

So if you were wondering at all about why there is a sudden nation wide outbreak of Town Hall Chaos, it is simple. The Lords Of Chaos (Democratic Machine and Soros Organizations) are doing their usual of hiding behind (i.e. indirectly setting up and funding) “independent” organizations to disrupt and harass. They also make sure cohorts in “the media” are in attendance to “amplify” their “message”.

Just do not ever mistake any of this for “the will of the people” or what “the majority” want or even anything remotely “spontaneous”.

When your plans, questions, answers, signs, etc. etc. are all prescribed in advance, that is called a “script”, and this is just badly done Street Theater.

Post Script

There may well be lots more of interest or use inside their organizing How To Guide, but I’m not able to read through it all right now. Others are encouraged to “take a look” and see what pops up.

It might also be fruitful to wander their “groups” map and see if any interesting names and places show up. I clicked on 2 whole spots before having the Milpitas Police Department pop up. One wonders what would show up in Beverly Hills or Martha’s Vineyard ;-)


Posted in Political Current Events | 25 Comments

Flood In San Jose

Hey, San Jose Government: Can I water my lawn now?

http://www.sanjoseca.gov/index.aspx?NID=4717

Water Use Rules for Residents

Outdoor Water Conservation Rules & Recommendations*

Outdoor water use is probably the easiest place to reduce water use since it accounts for roughly half of the average water bill. Please follow these rules:

Be cool – water when it’s cool, by HAND held hose with an automatic shut off nozzle or irrigation system before 10:00 a.m. and after 8:00 p.m. With a SPRINKLER system, water before 10:00 a.m. and after 8:00 p.m. only on three designated days:

Odd numbered addresses may water on Mondays, Thursdays and Saturdays;
Even numbered addresses may water on Tuesdays, Fridays and Sundays;
Properties without an address may water on Mondays, Thursdays and Saturdays.
Watering outdoors at other times is not allowed. Less evaporation occurs in the cooler evening and early morning hours — so you can use less water and your plants and landscape will absorb more of it.
Lawns are incredibly resilient and can tolerate the dry conditions of summer, if left alone. Letting your lawn go dormant and turn brown is okay. The grass will bounce back when rainfall and cool temperatures return in the fall and winter months. Learn more lawn watering tips.

Be a sharp shooter — with automatic shut-off nozzles. Cars can be washed at home, but only using hoses with a nozzle that shuts off automatically when the handle is released. This helps you aim and control the water and can save many gallons.
Be quick — fix water leaks as soon as possible. Fix visible leaks as soon as possible. If notified of a leak in your system, fix it within 5 working days. Visit our leaks page to find out how to detect water leaks and fix them.
Be in control — don’t let water flow into gutters or streets. Beyond minor splashing of surfaces, sprinkler and drip systems and hand watering that cause water to flow into gutters and streets or that make large puddles is not allowed.
Be a sweeper — sweep hard surfaces. Use a broom instead of a hose to clean patios, sidewalks, driveways, parking lots, or other hard surfaces. Note: Hosing is allowed when health and safety issues are a concern.
Be frugal — water less often and consider rebates. Many plants can survive on less water, especially when the weather starts to cool. Consider replacing lawns and thirsty plants with drought tolerant landscaping and get a rebate from the Santa Clara Valley Water District! For more information visit www.valleywater.org.
Be resourceful — don’t water after it rains. Watering outdoors within 48 hours after measurable rain is not allowed.

Image from article here:

http://www.cnn.com/2017/02/21/us/san-jose-flood/index.html


Posted in News Related | 12 Comments

Scraping GISS, CDIAC, NCDC / NCEI, and Me

This is partly just an “aggregator” of things already discussed. Some in specific articles, some in “tips” as I was just making some notes as I went along. I’m putting this up for some added information and so that finding the other bits is easier in the future.

First off, what is “scraping” a site, and why do it?

Scraping is in essence just making a full copy of it for use later as an archive, or as an offline copy. You do it to preserve what is there either at a point in time or as protection from loss.

For reasons beyond my ken, some site operators don’t like that. Partially, I can see it if they are being hit hard by a bunch of site scrapers and all of them are wide open on fast links. It can saturate their internet connection and is a sort of ‘denial of service’ to others. For those of us on slow home links, this isn’t an issue, but we tend to be whacked by the same “protective” measures used against the others. Oh Well.

There are fairly trivial ways to bypass that kind of block, and for starters one can just use polite settings in the site scraping script. Most such ‘scripts’ are really just a one line command, but I put them in an executable file anyway, so it is a trivial kind of script.

The preferred command is “wget” (at least, it is my preferred command), which stands for “Web Get”, as that is what it does. It goes out on the web and gets stuff. There are many parameters you can set. Most of them can be ignored. But if you run into issues, RTFM on wget. Read The (um) “Friendly” Manual.

Prior postings have looked specifically at doing a site scrape of the NOAA/NCDC (now renamed to protect the guilty to NCEI though the links / paths have the old name) data and site, along with the CDIAC site (Carbon Dioxide Information Analysis Center). Since CDIAC has posted a “Going Off Line Real Soon Now” notice on their site, I figured it would be a “very good thing” to capture and preserve what I could since it is unclear where, or if, it will come back on line.

NOTICE (August 2016): CDIAC as currently configured and hosted by ORNL will cease operations on September 30, 2017. Data will continue to be available through this portal until that time. Data transition plans are being developed with DOE to ensure preservation and availability beyond 2017.

So it says it will be preserved and available, but… So I snagged a copy of what was publicly available. This also means that, over time, I don’t need to whack their site just to look at a particular bit of data nor do I need to take the network traffic load. All good things. My take on it is here:

https://chiefio.wordpress.com/2017/01/30/scraping-noaa-and-cdiac/

So how big is this bundle? I have a little command named DU that tots up disk usage, sorts it, and prints out a nice summary in a dated file. It looks like this:

root@odroid32:/WD4/ext/7Feb2017_Scrape# cat ~chiefio/bin/DU

du -BMB -s * .[a-z,A-Z]* | sort -rn > 1DU_`date +%Y%b%d` &

#du -ks * .[a-z]* .[A-Z]* | sort -rn > 1DU_`date +%Y%b%d` &

The -BMB causes the Macintosh to barf, so you can use -ms instead of “-BMB -s” and it is fine. The -BMB form counts in decimal megabytes (1,000,000 bytes each) while -m counts in binary ones (1,048,576 bytes each), so most folks will not care about the difference. I also have a commented out “-ks” form that gives the KB count for things too small for MB to be informative… All that .[a-z] .[A-Z] stuff is to catch the hidden files in your home directory, the ones starting with a “.” that are not normally displayed.
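For anyone wanting a version of that DU helper that behaves the same on Linux and the Mac, here is a minimal sketch. It is not the script shown above: the function name du_summary is made up for illustration, it reports kilobytes via the POSIX -k flag rather than megabytes, and it leans on find with -mindepth / -maxdepth (present in both GNU and BSD find, though not in strict POSIX) to pick up hidden entries instead of the .[a-z]* globs.

```shell
#!/bin/sh
# du_summary [DIR]: write a size-sorted usage report for every top-level
# entry in DIR (hidden ones included) to a dated file 1DU_<date> in DIR.
# The name and layout are illustrative, not the author's actual script.
du_summary() {
    dir=${1:-.}
    out="1DU_$(date +%Y%b%d)"
    (
        cd "$dir" || exit 1
        # find lists dot-files too, so no shell globs are needed.
        # Earlier 1DU_* reports are skipped so they do not list themselves.
        find . -mindepth 1 -maxdepth 1 ! -name '1DU_*' -exec du -sk {} + |
            sort -rn > "$out"
    )
}

# Typical use, from the top of the scrape directory:
#   du_summary /WD4/ext/7Feb2017_Scrape
```

The report from this sketch is in KB and each name carries a leading “./”, so it will not look exactly like the MB listings in this posting.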

root@odroid32:/WD4/ext/7Feb2017_Scrape# cat 1DU_2017Feb17 
163382MB	Temps
142051MB	cdiac.ornl.gov
15875MB	GHCN_Daily_NOAA_NCDC
2413MB	Old_Logs
1MB	lost+found
1MB	1DU_2017Jan22

So the scrape of NOAA / NCDC was about 15.9 GB, and that of CDIAC was 142 GB. A lot, but quite manageable. The commands used were a mixed set over time. (wget is smart and doesn’t download a new copy of things that have not changed.) I’ve commented out various iterations as I’d at times used flags to slow total bandwidth, or be simpler. All of them worked, though in slightly different ways. I broke up the fetches into chunks, so I could get any given bit updated just by commenting out, or uncommenting, various bits.

Note that the only active line at present is the first one, which lacks the “-np” flag. By leaving off that “no parent”, it fetches all of USHCN Daily first, then wanders up the parent directory and back down again, collecting most everything not blocked. That would normally be an “error” (so you see the others have “-np”) but as I wanted to preserve the site, I let it walk the whole tree, parent directories included.

# cdiac.ornl.gov USHCN Daily

echo
echo Doing cdiac.ornl.gov USHCN Daily
echo

wget -m http://cdiac.ornl.gov/ftp/ushcn_daily

#wget -m -np http://cdiac.ornl.gov/ftp/ushcn_daily
#wget -m -np -w 10 http://cdiac.ornl.gov/ftp/ushcn_daily

#wget -w 10 --limit-rate=100k -np -m http://cdiac.ornl.gov/ftp/ushcn_daily
#wget -r -N -l inf --no-remove-listing -w 10 --limit-rate=100k -np http://cdiac.ornl.gov/ftp/ushcn_daily

echo
echo Doing World Weather Records
echo

#wget -np -m ftp://ftp.ncdc.noaa.gov/pub/data/wwr/
#wget -np -m -w 20 ftp://ftp.ncdc.noaa.gov/pub/data/wwr/

#wget --limit-rate=100k -np -m ftp://ftp.ncdc.noaa.gov/pub/data/wwr/

#wget --limit-rate=100k -nc -np -r -l inf ftp://ftp.ncdc.noaa.gov/pub/data/wwr/

echo
echo Doing World War II Data
echo

#wget -np -m ftp://ftp.ncdc.noaa.gov/pub/data/ww-ii-data/

#wget -np -m -w 20  ftp://ftp.ncdc.noaa.gov/pub/data/ww-ii-data/

#wget --limit-rate=100k -np -m ftp://ftp.ncdc.noaa.gov/pub/data/ww-ii-data/

#wget --limit-rate=100k -nc -np -r -l inf ftp://ftp.ncdc.noaa.gov/pub/data/ww-ii-data/
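As an alternative to commenting and uncommenting lines, the same “chunks” idea could be sketched with a small helper that only runs the chunks named in a variable. This is an illustration, not the script I actually ran: the function fetch_chunk and the variables CHUNKS and WGET are hypothetical names. The URLs and the -m / -np flags are the ones from the listing above.

```shell
#!/bin/sh
# fetch_chunk NAME URL [EXTRA_WGET_FLAGS...]
# Mirrors URL only when NAME appears in the space-separated CHUNKS variable.
# WGET can be overridden (for example WGET="echo wget") for a dry run.
fetch_chunk() {
    name=$1
    url=$2
    shift 2
    case " ${CHUNKS:-} " in
        *" $name "*) ;;          # selected, carry on
        *) return 0 ;;           # not selected, skip quietly
    esac
    echo
    echo "Doing $name"
    echo
    ${WGET:-wget} -m "$@" "$url"
}

# With CHUNKS unset nothing below touches the network.
fetch_chunk ushcn_daily http://cdiac.ornl.gov/ftp/ushcn_daily
fetch_chunk wwr         ftp://ftp.ncdc.noaa.gov/pub/data/wwr/ -np
fetch_chunk ww-ii-data  ftp://ftp.ncdc.noaa.gov/pub/data/ww-ii-data/ -np
```

Saved as, say, fetch.sh, a run like CHUNKS="wwr" sh fetch.sh would refresh only World Weather Records, and adding WGET="echo wget" in front prints the command instead of running it.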

Of all the directories and files that are grabbed, only a portion exceed one MB of size:

root@odroid32:/WD4/ext/7Feb2017_Scrape/cdiac.ornl.gov# cat 1DU_mb_out 
125576	ftp
574	oceans
167	epubs
74	trends
70	SOCCR
25	programs
22	carbonmanagement
19	newsletr
16	images
11	wwwstat.html
4	science-meeting
3	ndps
2	datasets

All the rest are 1 MB or smaller. Here’s the listing:

root@odroid32:/WD4/ext/7Feb2017_Scrape/cdiac.ornl.gov# ls
1DU_mb_out		     ftp.2
about			     ftpdir
aerosol_parameters.html      GCP
aerosol_particle_types.html  glossary.html
aerosols.html		     halons.html
authors			     hcfc.html
backgrnds		     hfcs.html
by_new			     home.html
carbon_cycle_data.html	     hydrogen.html
carbon_cycle.html	     ice_core_no.html
carbonisotopes.html	     ice_cores_aerosols.html
carbonmanagement	     icons
carbonmanagement.1	     images
carbonmanagement.10	     includes
carbonmanagement.11	     index.html
carbonmanagement.12	     js
carbonmanagement.13	     land_use.html
carbonmanagement.14	     library
carbonmanagement.2	     methane.html
carbonmanagement.3	     methylchloride.html
carbonmanagement.4	     methylchloroform.html
carbonmanagement.5	     mission.html
carbonmanagement.6	     modern_aerosols.html
carbonmanagement.7	     modern_halogens.html
carbonmanagement.8	     modern_no.html
carbonmanagement.9	     ndps
cdiac			     new
cdiac_welcome.au	     newsletr
cfcs.html		     newsletter.html
chcl3.html		     no.html
climate			     oceans
CO2_Emission		     oceans.1
CO2_Emission.1		     oceans.10
CO2_Emission.10		     oceans.2
CO2_Emission.11		     oceans.3
CO2_Emission.12		     oceans.4
CO2_Emission.13		     oceans.5
CO2_Emission.14		     oceans.6
CO2_Emission.15		     oceans.7
CO2_Emission.16		     oceans.8
CO2_Emission.2		     oceans.9
CO2_Emission.3		     oxygenisotopes.html
CO2_Emission.4		     ozone.html
CO2_Emission.5		     permission.html
CO2_Emission.6		     pns
CO2_Emission.7		     programs
CO2_Emission.8		     recent_publications.html
CO2_Emission.9		     science-meeting
comments.html		     search.html
css			     sfsix.html
data			     shutdown-notice.css
data_catalog.html	     SOCCR
datasets		     staff.html
datasubmission.html	     tetrachloroethene.html
deuterium.html		     trace_gas_emissions.html
disclaimers.html	     tracegases.html
epubs			     trends
factsdata.html		     vegetation.html
faq.html		     wdca
frequent_data_products.html  wdcinfo.html
ftp			     whatsnew.html
ftp.1			     wwwstat.html

You can see that a lot of it is just the html files that make the site go.

Most of the actual volume is the ftp site, as you would expect.

OK, that’s how you can grab a copy of CDIAC before the world changes…

NOAA NCDC / NCEI

The NOAA/NCDC scrape was a similar command. You will note in this listing all of it is commented out except the last bit that is getting “superghcnd”. That was added after this first scrape, and it is HUGE. So not in the above size information (it isn’t done yet). As I had just finished the other bits, I commented them out. Now it only chews on a chunk of superghcnd when I launch it:

echo
echo Doing NOAA set
echo

#wget -np -m  ftp://ftp.ncdc.noaa.gov/pub/data/noaa/

#wget -np -m  -w 10 ftp://ftp.ncdc.noaa.gov/pub/data/noaa/

#wget --limit-rate=100k -np -m ftp://ftp.ncdc.noaa.gov/pub/data/noaa/

#wget -nc -np -r -l inf ftp://ftp.ncdc.noaa.gov/pub/data/noaa/

echo
echo Doing Global Data Bank set
echo

#wget  -np -m ftp://ftp.ncdc.noaa.gov/pub/data/globaldatabank/

#wget -w 10 --limit-rate=100k -np -m ftp://ftp.ncdc.noaa.gov/pub/data/globaldatabank/

#wget  -np -m -w 10 ftp://ftp.ncdc.noaa.gov/pub/data/globaldatabank/

echo
echo Doing GHCN
echo

#wget  -np -m ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/

#wget -w 10 --limit-rate=100k -np -m ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/

#wget  -np -m -w 10 ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/

echo
echo Doing GHCN -daily-   SuperGHCNd
echo

wget -np -m -w 10 ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/superghcnd/superghcnd_full_20170204.csv.gz

#wget  -np -m ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/

SO FAR I’m at 2.5 TB or so of that hourly daily data. It is 10 GB / day and about 1.5 years worth.

root@odroid32:/LVM/ftp.ncdc.noaa.gov/pub/data/ghcn/daily/superghcnd# du -ms .
2573927	.

I’m figuring on about 4 TB when it is done, so be advised…

My Site

I also grabbed a locally readable mirror of my site. This lets me look at it with a browser offline. Nice for checking old articles without creating web traffic. Like when on a slow link (let it scrape all night, then browse lightning fast during the day). It is a ‘snapshot’ so not useful for things like recent comments and / or interaction. Some images may get downloaded, other things remain live links to the outside world (like video from youtube) so it isn’t 100% network free. (Tuning parameters to wget can grab more stuff outside the original site on links, but I’ve not done that yet. It is tricky to not end up scraping the entire world… so set the depth to capture all links, and all depths, and you end up putting the whole internet on your disk drive…)

What command did I use?

wget -U Mozilla -mkEpnp https://chiefio.wordpress.com

I was testing the “-U Mozilla” prior to doing GISS and didn’t want an error of syntax to lock me out for a day again… (GISS is picky about scraping, so gave me a one day lockout on my first scrape attempt)
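That bundled “-mkEpnp” is terse, so here is the same command spelled out with wget’s long option names, wrapped in a sketch that can print the command instead of running it. The function name mirror_site and the DRY_RUN variable are hypothetical. The long names assume a reasonably recent wget (older releases called --adjust-extension by the name --html-extension).

```shell
#!/bin/sh
# mirror_site URL: long-option spelling of "wget -U Mozilla -mkEpnp URL".
#   -U Mozilla  ->  --user-agent=Mozilla   (what to claim to be)
#   -m          ->  --mirror               (recursive, timestamped, no depth limit)
#   -k          ->  --convert-links        (rewrite links for local browsing)
#   -E          ->  --adjust-extension     (save pages with an .html suffix)
#   -p          ->  --page-requisites      (also fetch images, CSS and the like)
#   -np         ->  --no-parent            (never climb above the start URL)
# Set DRY_RUN to any non-empty value to print the command rather than run it.
mirror_site() {
    url=$1
    set -- wget \
        --user-agent=Mozilla \
        --mirror \
        --convert-links \
        --adjust-extension \
        --page-requisites \
        --no-parent \
        "$url"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example: DRY_RUN=1 mirror_site https://chiefio.wordpress.com
```

The dry run is there so a typo in the options can be spotted before a site sees any traffic.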

How much disk did that take?

root@odroid32:/LVM/chiefiowp# du -ms chiefio.wordpress.com/
1373	chiefio.wordpress.com/

1.3 GB. Not bad, but I can see I need to check where the “free” limit on disk is located on WordPress ;-)

GISS

This one was more problematic. With the news being that President Trump would be refocusing NASA on space, and out of the politicized field of Climate, I’d figured a nice thing to do would be to preserve a copy. A couple of folks “tipped” this, but this is the link I can find at the moment. From P.G. here:

https://chiefio.wordpress.com/2017/02/01/tips-february-2017/#comment-79793

https://www.europebreakingnews.net/2017/02/trump-scrapping-nasa-climate-research-division-in-crackdown-on-politicized-science/

Trump scrapping NASA climate research division in crackdown on ‘politicized science’

February 19, 2017
Donald Trump is poised to eliminate all climate change research conducted by Nasa as part of a crackdown on “politicized science”, his senior adviser on issues relating to the space agency has said. Nasa’s Earth science division is set to be stripped of funding in favor of exploration of deep space, with the president-elect having set a goal during the campaign to explore the entire solar system by the end of the century. This would mean the elimination of Nasa’s world-renowned research into temperature, ice, clouds and other climate phenomena. Nasa’s network of satellites provide a wealth of information on climate change, with the Earth science division’s budget set to grow to $2bn next year. By comparison, space exploration has been scaled back somewhat, with a proposed budget of $2.8bn in 2017. Bob Walker, a senior Trump campaign adviser, said there was no need for Nasa to do what he has previously described as “politically correct environmental monitoring”. “We see Nasa in an exploration role, in deep space research,” Walker told the Guardian. “Earth-centric science is better placed at other agencies where it is their prime mission. “My guess is that it would be difficult to stop all ongoing Nasa programs but future programs should definitely be placed with other agencies. I believe that climate research is necessary but it has been heavily politicized, which has undermined a lot of the work that researchers have been doing. Mr Trump’s decisions will be based upon solid science, not politicized science.”

Well, to me, that sure sounded like GISS climate work, and GIStemp, were likely to get the boot. So being responsible for backups and archives at companies for much of my professional life, I naturally thought: “Make a Golden Master Archive” of what you can.

Well, my first attempt was immediately slapped down by a bot assassin. Details in comments here:

https://chiefio.wordpress.com/2017/02/01/tips-february-2017/#comment-79749

https://chiefio.wordpress.com/2017/02/01/tips-february-2017/#comment-79767

https://chiefio.wordpress.com/2017/02/01/tips-february-2017/#comment-79820

The bottom line of all that is that NASA GISS has anti-site scraper settings in their robots.txt file. I did get the scrape to work, after waiting a day or two for the block to expire. The command that worked is:

wget -U Mozilla --wait=10 --limit-rate=50K -mkEpnp https://data.giss.nasa.gov

Most likely one could leave out the “--limit-rate” and even the “--wait” options, but as I’m still working off the “superghcnd” TB wad, I didn’t want to slow that download down. The “--wait” says to pause that many seconds between fetches (so it looks like someone clicked a key) and the “--limit-rate” makes it polite about being a bandwidth hog. The “-U Mozilla” says to tell the site, when asked, that I’m really the Mozilla browser. You can put many different browser types in that spot, as you like.

As of now (all of a few hours of running, waiting and rate-limiting) I’ve already got some data on downloads. Here’s what I’ve got so far:

root@odroid32:/LVM/GISS/data.giss.nasa.gov# ls
cassini   dust_tegen  impacts	  mineralfrac  precip_cru  sageii
ch4_fung  efficacy    index.html  modelE       precip_dai  seawifs
co2_fung  gistemp     landuse	  modelforce   robots.txt  stormtracks
csci	  imbalance   mcrates	  o18data      rsp_air	   swing2
root@odroid32:/LVM/GISS/data.giss.nasa.gov# du -ms *
2	cassini
22	ch4_fung
1	co2_fung
8	csci
21	dust_tegen
7	efficacy
1	gistemp
1	imbalance
130	impacts
1	index.html
1	landuse
3	mcrates
49	mineralfrac
259	modelE
2	modelforce
1	o18data
1	precip_cru
1	precip_dai
1	robots.txt
2	rsp_air
1	sageii
7	seawifs
5	stormtracks
349	swing2

So there are 14 out of 22 directories either done or in progress. (One of them is actively downloading at the moment; I can see in another window that it is the modelE directory.)

That leaves only 8 more directories to go. The other two items in the listing are files, not directories: ‘index.html’ and robots.txt. A total of 879 MB so far. Unless something is very very large in the other directories, not a big scrape load, really. We’ll see when it completes.
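For a rough feel of how long a rate-limited scrape like this has to take, the arithmetic can be sketched in shell. Treat it as a lower bound under stated assumptions: it takes the “K” in --limit-rate to mean 1024 bytes, takes the du figure as binary megabytes, and ignores connection overhead. The function name estimate_seconds is made up, and the file count argument is something you would have to supply yourself (it is set to 0 here because I have not counted the files).

```shell
#!/bin/sh
# estimate_seconds TOTAL_MB RATE_KB FILES WAIT_S
# Minimum run time: transfer time at the rate cap plus one wait per file.
estimate_seconds() {
    echo $(( $1 * 1024 / $2 + $3 * $4 ))
}

# 879 MB at --limit-rate=50K, per-file waits left out (file count unknown):
secs=$(estimate_seconds 879 50 0 10)
echo "at least $(( secs / 3600 )) hours of transfer time"
```

Each fetched file adds another --wait interval on top of that, so a site made of many small files takes a good deal longer than the byte count alone suggests.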

Now, about that robots file… Sites can publish a file that your code fetches and that says, basically, “If you are not a human, but are a computer robot doing a task for a human, don’t do this list of things.” Here’s the robots file from GISS:

root@odroid32:/LVM/GISS/data.giss.nasa.gov# cat robots.txt 
User-agent: *
Disallow: /cgi-bin/
Disallow: /gistemp/graphs/
Disallow: /gfx/
Disallow: /modelE/transient/
Disallow: /outgoing/
Disallow: /pub/
Disallow: /tmp/

User-agent: msnbot
Crawl-delay: 480
Disallow: /cgi-bin/
Disallow: /gfx/
Disallow: /modelE/transient/
Disallow: /tmp/

User-agent: Slurp
Crawl-delay: 480
Disallow: /cgi-bin/
Disallow: /gfx/
Disallow: /modelE/transient/
Disallow: /tmp/

User-agent: Scooter
Crawl-delay: 480
Disallow: /cgi-bin/
Disallow: /gfx/
Disallow: /modelE/transient/
Disallow: /tmp/

User-agent: discobot
Disallow: /

Now I don’t really care about a robots.txt file, I just “flow around it” by spoofing and saying I’m not a robot. So I’ve never really learned how to read one. To me, it looks like “IF your ‘user agent’ text is FOO, forbid / Disallow these directories”. Looks like “discobot” gets screwed with nothing allowed, and “msnbot”, “Scooter” and “Slurp” get a speed limit and some various transitory things blocked, all else OK. Everyone else gets even more blocked (but not all like discobot). That “*” is a wild card that usually says “match everything”.
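That reading can be checked mechanically. Here is a simplified sketch that prints the rules a given user-agent name would fall under: the group that names it if there is one, otherwise the “*” group. The function name robots_rules is made up, the match is an exact case-insensitive comparison, and it only understands one User-agent line per group, so it is a toy reader and not a full robots.txt parser. It also says nothing about what wget itself does with the file.

```shell
#!/bin/sh
# robots_rules AGENT FILE
# Print the Disallow / Crawl-delay lines that apply to AGENT in FILE:
# the group naming AGENT if one exists, else the catch-all "*" group.
robots_rules() {
    awk -v agent="$1" '
        BEGIN { agent = tolower(agent) }
        { sub(/\r$/, ""); sub(/[ \t]*#.*/, "") }     # drop CRs and comments
        tolower($1) == "user-agent:" {
            group = tolower($2)                      # start of a new group
            if (group == agent) found = 1
            next
        }
        tolower($1) == "disallow:" || tolower($1) == "crawl-delay:" {
            if (group == agent)    specific = specific $0 "\n"
            else if (group == "*") generic  = generic  $0 "\n"
        }
        END { printf "%s", (found ? specific : generic) }
    ' "$2"
}

# Example, run in the scraped data.giss.nasa.gov directory:
#   robots_rules discobot robots.txt
#   robots_rules Mozilla  robots.txt
```

Against the GISS file above, the first call prints the single “Disallow: /” line, and the second falls through to the seven Disallow lines of the “*” group, since no group names Mozilla.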

I’m not sure if being Mozilla gets me past that, or not. We’ll see when this scrape is done, if those directories are all missing, or not. (I may need to spoof a different user-agent string in a future scrape). Re-runs of scrapes only pick up what has changed or has been added (IF you set the flags right), so a rerun on a mostly static site can go very very fast. It does not hurt to re-run a scrape in those conditions.

In Conclusion

So there you have it. How to snag huge chunks of data and such from various climate related sites.

You could do similar things for just about any site out there (depending on how tight they are on robots.txt, how creative you are getting past it, and how much disk you have).

I can now point my browser at that local file set and read the pages from my own disk, if desired. This is an example URL from my browser title bar:

file:///LVM/GISS/data.giss.nasa.gov/index.html

And I’m looking at the top page of the data.giss.nasa.gov site as of the time I scraped it.

Nice, eh?

Posted in AGW Science and Background, Earth Sciences, Tech Bits | Tagged , , , , | 12 Comments

30 miles due west of Oroville, in the flat

Yet more “Flooding Drought”.

This is very near my old home town. It is the Seasonal Flood as in my childhood. This area is “protected” by Shasta Dam on the Sacramento River, so this flood is from water in that catchment, plus local rains. This IS NOT related to the Oroville dam problems.

Yet it is flooded.

The horrible thing this means is that releases from Lake Shasta are too high for present rain conditions, which implies the operators are very worried about something. Perhaps a too full lake with too much snow above it and a warmish atmospheric river starting?

The pictures of I-5 at Williams as a water-covered road are also not good. That roadbed is raised relative to the land on each side, and has good drainage (normally).

This is what is downslope from Central Valley reservoirs that are full.

This (toward Colusa and Williams) is where Oroville residents head to leave town (though 30 miles from Oroville). The main road west goes directly to this flood zone. The southern road is along the banks of the Feather River, so the prime flood direction. The northern road to Chico took 3 hours to go a 25 minute drive.

And we have 3 more days of steady rain ahead.

From this posting:

https://www.iceagenow.info/residents-flee-northern-california-town-video/

Posted in Human Interest, News Related | Tagged , , | 55 Comments

RAID, LVM, Gaggle Of Disks…

What to do if you have a stack of “modest” sized disks, say a couple of TB each, but you need a single directory of about 6 TB?

I suppose you could go out and buy a new 8 TB disk (some is lost in formatting and such). Or move some of the files to another disk (and put symbolic links in the original location – I’m running a wget, so if the files are just gone, they would be downloaded again). But the first one is expensive and requires moving a lot of data up front. The other has ongoing need to move data around and assuring that the wget is structured so that it really doesn’t try to download all that stuff again. All of it is a kludge.

There are alternatives.

RAID

The first one most folks think of is a RAID group. Redundant Array of Inexpensive Disks. This is most often used to make a group of disks where any one disk can fail, be replaced, and you lose no data. There are a bunch of RAID levels. Mirrors (2 sets of disks with one copy of the data each). Striped Groups (where each file has blocks on each disk, usually done to increase read and write speed, as you can have a block buffered and being read or written on each disk). And higher RAID types. Most often this is RAID 5, where blocks are spread over several disks along with a block of parity data, enough to reconstruct the data blocks on any one disk were it to crash.

More on RAID levels:

https://en.wikipedia.org/wiki/Standard_RAID_levels

In computer storage, the standard RAID levels comprise a basic set of RAID (redundant array of independent disks) configurations that employ the techniques of striping, mirroring, or parity to create large reliable data stores from multiple general-purpose computer hard disk drives (HDDs). The most common types are RAID 0 (striping), RAID 1 and its variants (mirroring), RAID 5 (distributed parity), and RAID 6 (dual parity).

RAID levels cover things like glueing together a set of disks, but often have a large time cost in building, and changing, the structure. When you add or remove a disk, the RAID does a “rebuild” and it can take a long time, especially on slow hardware like the Pi.

A striped group gives performance improvement as reads / writes are spread over several disk spindles and heads.

A mirror group gives data security, but at a high cost in duplicated disks and reads / writes.

RAID 3 and 4 are fairly specialized combinations of bit or byte striping and parity (on a dedicated disk for RAID 4).

RAID 5 has the parity distributed over all the disks, and RAID 6 keeps two independent parity blocks per stripe so that you can lose 2 disks and survive.

All that parity has a large cost in computes, especially when the compute engine is small. Thus the very long rebuild times. Even adding a new empty disk involves a ‘rebuild’ as the data and parity get spread over that new disk and recomputed.

I built a RAID as my first cut at this problem, and then found that the ‘rebuild’ when I added a third disk was going to take a day. During that time, the RAID array is at risk. Every time I would add or remove a disk, that same process would happen. Furthermore, one disk is lost to parity, so for 3 disks, you get 2 disks of storage. Each disk improves efficiency, so more smaller disks is better than 2 giant disks. My USB Hub has only 4 slots, so at best I could get 3 disks worth of space usable. For 6 TB that would mean using 4 x 2 TB disks, and that would be “close” on total space. When it ran out, I’d be basically stuck. Adding another hub and more disks would start to get pricy and then there would be the rebuild time.

Oh, and since for RAID 5, the basis is a striped group:

“RAID 5 consists of block-level striping with distributed parity.”

Each disk (or partition) must be of the same size. Well, some can be bigger than others, but the only space used will be the size of the smallest disk or partition. So if you have 4 x 1 TB disks, but one of them has a 100 GB partition set aside for something else, you will get 4 chunks of 900 GB each used, and only 3 x 900 GB available after parity. Spending 4 TB to get 2.7 TB starts to bite pretty quickly, especially when after formatting you are closer to 2.5 TB.
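That arithmetic is easy to sketch in shell: every member is cut down to the size of the smallest one, and one member’s worth goes to parity. The function name raid5_usable is made up for illustration, sizes are whole numbers in whatever unit you like, and formatting overhead is not counted.

```shell
#!/bin/sh
# Usable RAID 5 space = (number of members - 1) * smallest member.
# raid5_usable is a hypothetical helper; file system overhead is ignored.
raid5_usable() {
    if [ "$#" -lt 3 ]; then
        echo "RAID 5 needs at least 3 members" >&2
        return 1
    fi
    count=$#
    smallest=$1
    for size in "$@"; do
        if [ "$size" -lt "$smallest" ]; then
            smallest=$size
        fi
    done
    echo $(( (count - 1) * smallest ))
}

# Four 1 TB disks, one of them with 100 GB set aside (sizes in GB):
raid5_usable 1000 1000 1000 900    # prints 2700
```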

For anyone wanting to play with making a RAID, pretty good directions are here:

http://projpi.com/diy-home-projects-with-a-raspberry-pi/raspberry-pi-raid-array-with-usb-hdds/

The very abbreviated form is:

If your Debian / Devuan has been a while since the last update, bring it up to date:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade

Personally, I’d skip the dist-upgrade, especially since it can screw up your Devuan on Pi in some cases (replacing the kernel on BerryBoot seems to kill it).

The program that implements RAID on the Debian family is “mdadm” (multi-disk admin?) so install it.

apt-get install mdadm

Then you plug in your disks and create your RAID. Quoting the article:

mdadm -Cv /dev/md0 -l0 -n2 /dev/sd[ab]1 ( configure mdadm and create a raid array at /dev/md0 using raid0 with 2 disks ; sda1 and sdb1. To create a raid1, replace the line to read mdadm -Cv /dev/md0 -l1 -n2 /dev/sd[ab]1 )

Clearly for RAID 5 you would use -l5 instead. Also note that you can list the disks explicitly without the wildcard [ab] bit. So like:

mdadm -Cv /dev/md0 -l5 -n3 /dev/sda1 /dev/sdb1 /dev/sdc3

I’ve not tested that command, but think I have the syntax right and no typos… one hopes. IIRC, that’s what I did with my test case. Note that you can use different partitions on different disks and your particular disk partition names will vary. Note that you now have a RAID group on /dev/md0 but not a file system. So make one:

mkfs -t ext4 /dev/md0

You can now mount /dev/md0 like any other disk. I mounted it as /RAID for my testing.

For a fair time I searched for how to keep it straight what disk was in the RAID. They get marked with a magic number on the disk itself and assembled at boot time. Removing it can be a challenge.
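For what it is worth, mdadm itself can report that on-disk marker and wipe it. The sketch below is an assumption about how one might go about it, not something tested on the hardware described here: the helper names (run, raid_inspect, raid_retire) are made up, the device names are examples, and by default it only prints the commands. Zeroing the superblock destroys the array, so treat it accordingly.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to really do it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# raid_inspect ARRAY MEMBER...   show what the kernel and the disks think
raid_inspect() {
    array="$1"; shift
    run cat /proc/mdstat              # arrays the kernel has assembled
    run mdadm --detail "$array"       # state and member list of one array
    for member in "$@"; do
        run mdadm --examine "$member" # the RAID superblock on one partition
    done
}

# raid_retire ARRAY MEMBER...    take the array apart (data is lost)
raid_retire() {
    array="$1"; shift
    run umount "$array"
    run mdadm --stop "$array"
    for member in "$@"; do
        run mdadm --zero-superblock "$member"   # remove the RAID marker
    done
}

# Usage: raid_inspect /dev/md0 /dev/sda1 /dev/sdb1
```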

Is there something less complicated, that takes less computes, and is more efficient with the disks?

LVM, an easier way

Logical Volume Manager.

The purpose of LVM is different from that of RAID. RAID is to handle data protection and performance, while LVM is for the purpose of making volume management easy.

Before anyone asks, yes, you can use the two together ( IFF you are prone to loving hyper-complex environments and enough levels of indirection to cause your eyes to glaze… but folks have used RAID to build the underlying data vault then used LVM on top of it to make administering the disks easier).

With LVM you can “glue together” a gaggle of disks so that they look like one giant disk to the world. Or break up one gaggle of disks into a different gaggle of logical disks.

I just used it to create what looked like one giant disk by glueing together a 4 TB disk, a 2 TB disk, and a 1.5 TB disk. Notice that volume sizes can be anything and we’re not talking about data preservation or speed of access here. Just one BIG file system made out of several different disk bits.

The LVM Wiki is pretty good:

https://wiki.debian.org/LVM

First you do the usual “upgrade / update” of the system. Then you install the LVM code and start the service:

sudo apt-get install lvm2
sudo service lvm2 start

The wiki has you install a graphical management bit, but I didn’t bother.

apt-get install system-config-lvm

Now there is a 3 level set of “stuff” to keep track of during the rest. Physical disks or disk partitions. Groups of “volumes” (called volume groups). And logical volumes created inside a volume group. There are commands to create, inspect, and manage things at each level. (So you can see how adding RAID above or below and adding a couple of more levels can be a bit confusing…)

OK, at the physical level we need to assign disks or disk partitions to the Volume Group. You can pretty much mix and match bits of disks at this level, though the pages encourage slugging in whole disks as simpler to manage. I built mine out of partitions and put a swap partition as slice ‘b’ on each disk. Why? Because I’m an old school surly curmudgeon who doesn’t like the idea of running swap onto a splotch of disk on a LVM volume in an LVM Group on a gaggle of physical disk partitions… but you can put swap on an LVM volume if you like, then just slug in whole disks for space. So instead of using /dev/sda1 for disk space and /dev/sda2 for swap (and partitioning accordingly) you can just add /dev/sda to the LVM group and parcel it out as desired to files or swap.

So once the LVM service is installed, how do you hand it disks or partitions?

As usual for all things systems admin, you either put a “sudo” in front of commands or run them as root. Just a reminder… So what is that command?

pvcreate /dev/sda2

This marks that partition as part of the LVM batch. If you used “pvcreate /dev/sda” you would assign the whole disk.

There are a bunch of physical volume commands, but I’ve not found one that tells you how much real data is on any given physical disk.

PV commands list

pvchange — Change attributes of a Physical Volume.
pvck — Check Physical Volume metadata.
pvcreate — Initialize a disk or partition for use by LVM.
pvdisplay — Display attributes of a Physical Volume.
pvmove — Move Physical Extents.
pvremove — Remove a Physical Volume.
pvresize — Resize a disk or partition in use by LVM2.
pvs — Report information about Physical Volumes.
pvscan — Scan all disks for Physical Volumes.

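You would think pvs would tell you how much of each physical volume had data on it. It doesn’t. It tells you how much of each one has been handed out to Logical Volumes (PFree is what is left unallocated), not how much of the file system built on top is actually in use: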

root@orangepione:~# df /LVM
Filesystem                                 1K-blocks       Used  Available Use% Mounted on
/dev/mapper/TemperatureData-NoaaCdiacData 7207579544 2688218088 4189744724  40% /LVM

root@orangepione:~# pvs
  PV         VG              Fmt  Attr PSize PFree
  /dev/sda1  TemperatureData lvm2 a--  3.64t    0 
  /dev/sdb1  TemperatureData lvm2 a--  1.82t    0 
  /dev/sdc1  TemperatureData lvm2 a--  1.36t    0 

So with 60% empty, pvs shows nothing free. OK… It makes a certain kind of sense in that I can’t add a new Logical Volume as the space is committed to the /LVM mount point (made from the Volume Group “TemperatureData” and the Logical Volume “NoaaCdiacData” – and yes, I wish I’d used shorter names ;-)

As I understand it, unless you make it a striped group, then files are allotted in order from first disk to last disk, so I can assume that the 2.6 TB used is all on that first /dev/sda1 physical volume at this point… but I’d really like a command that let me know for sure…
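To compare the two reports above, it helps to put them in the same units. The df figures are 1K blocks, while the lowercase “t” in the pvs output is a power-of-1024 unit. A minimal conversion sketch (the function name kib_to_tib is made up):

```shell
#!/bin/sh
# Convert a count of 1K blocks, as printed by df, to TiB with two decimals.
# kib_to_tib is a hypothetical helper name.
kib_to_tib() {
    awk -v kib="$1" 'BEGIN { printf "%.2f\n", kib / (1024 * 1024 * 1024) }'
}

kib_to_tib 7207579544   # size of the file system on /LVM
kib_to_tib 2688218088   # used
kib_to_tib 4189744724   # available
```

That works out to about 6.71 total, 2.50 used and 3.90 available, against 3.64 + 1.82 + 1.36 = 6.82 from pvs. The gap is presumably file system overhead. For the allocation side, “pvdisplay -m” and “lvs -o +devices” report which stretches of each physical volume are mapped to which logical volume, though that is still the extent layout, not where ext4 has chosen to put file data.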

OK, you have handed over some disk or partition to the physical volume list. Now how to do that Volume Group and Logical Volume stuff?

Create your Volume Group. I used TemperatureData as the name and wish I’d used Tgroup…

vgcreate myVirtualGroup1 /dev/sda2

Then add another disk or partition to it with:

vgextend myVirtualGroup1 /dev/sda3

There are lots of things you can do with Volume Groups:

VG commands list

vgcfgbackup — Backup Volume Group descriptor area.
vgcfgrestore — Restore Volume Group descriptor area.
vgchange — Change attributes of a Volume Group.
vgck — Check Volume Group metadata.
vgconvert — Convert Volume Group metadata format.
vgcreate — Create a Volume Group.
vgdisplay — Display attributes of Volume Groups.
vgexport — Make volume Groups unknown to the system.
vgextend — Add Physical Volumes to a Volume Group.
vgimport — Make exported Volume Groups known to the system.
vgimportclone — Import and rename duplicated Volume Group (e.g. a hardware snapshot).
vgmerge — Merge two Volume Groups.
vgmknodes — Recreate Volume Group directory and Logical Volume special files
vgreduce — Reduce a Volume Group by removing one or more Physical Volumes.
vgremove — Remove a Volume Group.
vgrename — Rename a Volume Group.
vgs — Report information about Volume Groups.
vgscan — Scan all disks for Volume Groups and rebuild caches.
vgsplit — Split a Volume Group into two, moving any logical volumes from one Volume Group to another by moving entire Physical Volumes.

I’ve not explored most of those commands…

OK, you have a nice big volume group, now what? How to split out what looks like a disk to the system and mount it? Create a Logical Volume.

lvcreate -n myLogicalVolume1 -L 10g myVirtualGroup1

Now I used NoaaCdiacData for my logical volume name and wish I’d used NCData…

lvcreate -n NCData -L 100g Tgroup

Then format it to ext4 (or something else if you have good reason to).

mkfs -t ext4 /dev/Tgroup/NCData

You could now do a mount on /test to see if it worked:

mount /dev/Tgroup/NCData /test

There are lots of Logical Volume commands too:

LV commands

lvchange — Change attributes of a Logical Volume.
lvconvert — Convert a Logical Volume from linear to mirror or snapshot.
lvcreate — Create a Logical Volume in an existing Volume Group.
lvdisplay — Display the attributes of a Logical Volume.
lvextend — Extend the size of a Logical Volume.
lvreduce — Reduce the size of a Logical Volume.
lvremove — Remove a Logical Volume.
lvrename — Rename a Logical Volume.
lvresize — Resize a Logical Volume.
lvs — Report information about Logical Volumes.
lvscan — Scan (all disks) for Logical Volumes.

I had to add disks to my Volume Group after it was built, and then extend the size of the file system to include those other disks. The “lvextend” command does that, and then the resize2fs command to expand the file system to fill that extended space.

Essentially that’s it. If folks want examples of the lvextend and resize2fs commands, let me know and I’ll add it, but it is fairly simple.
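As a rough sketch of that grow step, assuming the shorter names Tgroup and NCData from above and a hypothetical new partition /dev/sdb1: the run and grow_lv helpers are made up, “-l +100%FREE” hands the logical volume all unallocated space in the group, and by default the script only prints what it would do.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to really do it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# grow_lv VG LV NEW_PV   (hypothetical helper)
grow_lv() {
    vg="$1"; lv="$2"; pv="$3"
    run pvcreate "$pv"                          # mark the new partition for LVM
    run vgextend "$vg" "$pv"                    # add it to the volume group
    run lvextend -l +100%FREE "/dev/$vg/$lv"    # give the LV all the free extents
    run resize2fs "/dev/$vg/$lv"                # grow ext4 to fill the bigger LV
}

# Usage: grow_lv Tgroup NCData /dev/sdb1
```

To grow by a fixed amount instead, lvextend also takes a size such as “-L +500g”.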

In Conclusion

So that’s where I’m at on scraping the approximately 6 TB of “superghcnd” that looks like it is hourly data for a selection of GHCN sites. About 10 GB / day… I chose to use LVM just to avoid several days worth of “rebuild” on RAID volumes, and because I could glue together a gaggle of different disks into one logical volume image. I risk that any disk loss can cause all of it to be lost, but since it is a duplicate of an online server, I’m able to reload it if needed (as long as NOAA keeps it up).

Sometime after I have a full copy, should I desire more security, I could make a RAID volume (and maybe put LVM on top of it), then gradually grow the RAID as I copied over data and shrink the LVM group… Or just toss a couple of $Hundred more at a couple of added 4 TB disks. It’s a full 3 weeks until the download is finished, and I’ve got plenty of raw space at the moment, so lots of time to think about the next step. At the moment, I’m happy to just have it all download and then leave the disks turned off 90% of the time. Turned off disks in a drawer have a long MTBF (Mean Time Between Failure).

At present I have about 2 TB additional empty disk, beyond the 4 TB free in the LVM Logical Volume at the moment, so things are fine for now. I think I’m going to need about 2.5 of that 4 TB to have the download finish. Then I’ll decide on “safe in a drawer” or “Move to a RAID”. I already have a simple copy of the data that does not include the “superGHCNd” mammoth chunk, so the only bit at risk is that huge chunk of unclear value. I think just moving one day of that data and the ‘diff’ files to a duplicate is enough “protection”.

So there you have it. The “joys” of slugging TB of data around and how to do it.

Posted in Tech Bits | Tagged , , | 10 Comments

Trump Campaign Stop in Melbourne (FL that is…)

Well, we’ve finally done it. We’ve reached the point of perpetual Presidential Campaigning.

The cleverness of it is impressive, though. Blocks NGOs from attacking him without compliance with the whole PAC thing. Lets him hold support rallies for the President as he governs and lets him get loads of “earned media” while bypassing the news cycle.
Just love it.

Remember, you can watch it live, and without talking heads telling you what to think, at RightSide Media:

http://rsbn.tv/president-trump-rally-in-melbourne-fl-2182017/

Happening now as a live feed with reruns later…

Posted in Political Current Events | Tagged , , | 61 Comments

Frittatomelette – a hybrid

The spouse was especially fond of the omelette I made this morning, so I thought I would share the single detail that makes it different.

Americans often call a frittata an omelette. The difference is subtle, but important.

In a classical omelette, the egg mix is cooked almost to done, then filling is laid in the middle and the egg folded over it. In a frittata, the filling is placed into the frying pan and cooked some, then the egg mix is added, and when near the finished point, folded and served.

Over the years, I’ve played with both. Trying to work out what is best. Making a cheese frittata is an exercise in eggs that don’t set up right. Making a ham omelette means having cold ham with undercooked egg mix on the bits and lacking that browning that enhances the flavor. Just how can one make a decent Ham & Cheese omelette with that problem?

My solution is a hybrid. Place the ham bits in a lump of butter in the skillet. Saute or fry them until browned just enough. Add the egg mix (couple of eggs beaten with a Tbs or two of milk) and let it cook to the almost all set stage (lowering the heat helps here so the bottom surface doesn’t overcook while the top layer is not cooked yet… at lower heat the whole depth warms more evenly). Just about the time it’s ready to set up on the top layer, at that gelatinous but still not set up stage, sprinkle on finely shredded cheese. I use the Mexican Taco Mix shreds. Fold, and finish (that, for me, means let it sit just long enough for the folded flaps to stick, then turn the whole thing over to seal and finish).

For things like a Denver, I also fry the onion and peppers bits with the ham. Essentially, I make a frittata out of any bits that fry well, and an omelette with the bits that ought not be fried, like cheese and avocado and whatnot. The Frittatomelette.

With that, time to refill the coffee cup and admire the stormy weather with a hot cup a Joe and a full tummy ;-)

Posted in Food | Tagged , , , | 13 Comments